How to output files with a specific extension (like .csv) in Hadoop, using the MultipleOutputs class
I have a MapReduce program that uses MultipleOutputs to write the result to multiple files. The reducer looks like this:
    private MultipleOutputs<NullWritable, Text> mo = new MultipleOutputs<NullWritable, Text>(context);

    ...

    public void reduce(Edge keys, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        String date = records.formatDate(millis);
        out.set(keys.get(0) + "\t" + keys.get(1));
        parser.parse(keys);
        String filePath = String.format("%s/part", parser.getFileId());
        mo.write(noVal, out, filePath);
    }
This is similar to an example in the book Hadoop: The Definitive Guide. However, the problem is that the output files are plain text. I want them to be written as .csv files, and I haven't managed to find an explanation of how to do this in the book or online.

How can this be done?
Have you tried iterating through the output folder after the Job object completes in your driver and renaming the files?
As long as you emit Text in your reducer (the text should be a line of the CSV, with its values separated by a semicolon or whatever delimiter you need), you can give this a try:
    Job job = new Job(getConf());
    // ...
    // your job setup, including output config
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    // ...
    boolean success = job.waitForCompletion(true);
    if (success) {
        FileSystem hdfs = FileSystem.get(getConf());
        FileStatus fs[] = hdfs.listStatus(new Path(outputPath));
        if (fs != null) {
            for (FileStatus aFile : fs) {
                if (!aFile.isDir()) {
                    // append the .csv extension to every output file
                    hdfs.rename(aFile.getPath(), new Path(aFile.getPath().toString() + ".csv"));
                }
            }
        }
    }
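Putting the two pieces together: the reducer writes each record as a single semicolon-separated line of Text, and the rename loop in the driver then gives every part file a .csv extension. Below is a minimal sketch of such a reducer using the standard MultipleOutputs API; the CsvReducer class name, the Text key/value types, and the "csvout/part" base output path are illustrative assumptions, not the code from the question.

    import java.io.IOException;

    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    // Hypothetical reducer: each emitted Text value is one semicolon-separated
    // CSV line, so the renamed part files are valid CSV.
    public class CsvReducer extends Reducer<Text, Text, NullWritable, Text> {

        private MultipleOutputs<NullWritable, Text> mos;
        private final Text out = new Text();

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<NullWritable, Text>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // build one CSV line per record: key;value
                out.set(key.toString() + ";" + value.toString());
                mos.write(NullWritable.get(), out, "csvout/part");
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // MultipleOutputs must be closed, or buffered output may be lost
            mos.close();
        }
    }

Note that MultipleOutputs only controls the base path (Hadoop still appends a suffix such as -r-00000 to it), so the rename loop in the driver is what actually adds the .csv extension.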