hadoop - Complex Hive query
Hi, I have the following table:

id  | time
=============
5   | 200101
3   | 200102
2   | 200103
12  | 200101
16  | 200103
18  | 200106
Now I want to know how often each month of a year appears. I can't use a plain GROUP BY, because it only counts the months that actually appear in the table. I also want a 0 when a month of the year does not appear. The output should look like this:
time   | count
==============
200101 | 2
200102 | 1
200103 | 2
200104 | 0
200105 | 0
200106 | 1
Sorry for the bad table format; I hope it is still clear what I mean. I would appreciate any help.
You can provide a year-month table containing the year and month information. I wrote a script to generate such a CSV file:
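#!/bin/bash
# year_month.sh
# Writes one YYYYMM value per line for every month in the year range.
start_year=1970
end_year=2015
for year in $( seq ${start_year} ${end_year} ); do
  for month in $( seq 1 12 ); do
    # Zero-pad the month to two digits, e.g. 1 -> 01.
    echo ${year}$( echo ${month} | awk '{printf("%02d\n", $1)}');
  done;
done > year_month.csv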
Save it as year_month.sh and run it. You will get a file year_month.csv containing every year and month from 1970 to 2015. You can change start_year and end_year to specify a different year range.
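As a variation on the script above, here is a minimal sketch that lets printf do the zero-padding instead of calling awk for every line. The function name gen_year_month is hypothetical (it is not part of the original script), and it assumes seq is available, as the original script does.

```shell
# Hypothetical helper: print one YYYYMM value per line
# for every month from year $1 through year $2 (inclusive).
gen_year_month() {
  for year in $(seq "$1" "$2"); do
    for month in $(seq 1 12); do
      # %02d zero-pads the month to two digits.
      printf '%d%02d\n' "$year" "$month"
    done
  done
}

# Example: only the year that the sample data in the question covers.
gen_year_month 2001 2001
```

Redirecting the output, for example gen_year_month 1970 2015 > year_month.csv, should produce the same kind of file as the script.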
Then, upload the year_month.csv file to HDFS. For example:
hadoop fs -mkdir /user/joe/year_month
hadoop fs -put year_month.csv /user/joe/year_month/
After that, you can load year_month.csv into Hive. For example:
create external table if not exists year_month (time int) location '/user/joe/year_month';
Finally, you can join the new table with your table to get the final result. For example, assuming your table is named id_time:
from (
  select year_month.time as time, id_time.id as id
  from year_month
  left outer join id_time on year_month.time = id_time.time
) temp
select time, count(id) as count
group by time;
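Note: you will need to make small modifications (such as the path and the column types) to the statements above. Because count(id) ignores NULLs, months with no matching row in id_time get a count of 0.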
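To sanity-check the expected counts without a cluster, the left outer join plus count can be imitated locally with awk. This is only a sketch under stated assumptions: the function name count_by_month and the temporary file names are hypothetical, and the input is the time column of the sample table from the question.

```shell
# Hypothetical helper: $1 is a file with one time value per row of the data table,
# $2 is a file listing every year-month that should appear in the result.
count_by_month() {
  # First file: tally each time value. Second file: print every month with its
  # tally, where "+ 0" turns a missing tally into 0 (the outer-join behaviour).
  awk 'NR == FNR { seen[$1]++; next } { print $1, seen[$1] + 0 }' "$1" "$2"
}

tmp=$(mktemp -d)
# The time column of the sample table in the question.
printf '%s\n' 200101 200102 200103 200101 200103 200106 > "$tmp/id_time.txt"
# The months 200101 through 200106, standing in for the year_month table.
seq 200101 200106 > "$tmp/months.txt"

count_by_month "$tmp/id_time.txt" "$tmp/months.txt"
```

Months that are listed in the second file but never occur in the first are still printed, with a count of 0, which is the behaviour the Hive query relies on.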