java - How to output files with a specific extension (like .csv) in Hadoop, using the MultipleOutputs class

April 15, 2013

I have a MapReduce program that uses MultipleOutputs to write its results to multiple files. The reducer looks like this:

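```java
private MultipleOutputs<NullWritable, Text> mo;

@Override
protected void setup(Context context) {
    mo = new MultipleOutputs<NullWritable, Text>(context); // needs the task context, so not a field initializer
}
...
public void reduce(Edge keys, Iterable<NullWritable> values, Context context)
        throws IOException, InterruptedException {
    String date = records.formatDate(millis);
    out.set(keys.get(0) + "\t" + keys.get(1));
    parser.parse(keys);
    String filePath = String.format("%s/part", parser.getFileId()); // one subdirectory per file ID
    mo.write(noval, out, filePath);
}
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    mo.close(); // flush and close every file MultipleOutputs opened
}
```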

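This is similar to the example in the book Hadoop: The Definitive Guide. However, the problem is that it outputs the files as plain text files with no extension. I want the files to be output as .csv files, and I haven't managed to find an explanation of how to do this in the book or online.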

How can this be done?
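To see why the files come out without an extension: as I understand Hadoop's `FileOutputFormat.getUniqueFile`, each output file is named as the base name (for MultipleOutputs, the base output path such as `"%s/part"` above), a dash, `m` or `r` for map or reduce, a dash, the partition number zero-padded to five digits, and then an extension. `TextOutputFormat` passes an empty extension unless output compression is enabled, in which case it uses the codec's default extension (for example `.gz`). The sketch below only mimics that naming in plain Java; `OutputNaming`, `uniqueFile`, and the sample file ID `"graph42"` are hypothetical and not Hadoop API.

```java
import java.util.Locale;

public class OutputNaming {
    // Mimics the "<name>-<r|m>-<5-digit partition><extension>" naming pattern (illustrative only)
    static String uniqueFile(String baseOutputPath, boolean reduce, int partition, String extension) {
        return baseOutputPath + "-" + (reduce ? 'r' : 'm') + "-"
                + String.format(Locale.ROOT, "%05d", partition) + extension;
    }

    public static void main(String[] args) {
        // Base path built the same way as in the reducer: "<fileId>/part"
        String base = String.format("%s/part", "graph42");
        System.out.println(uniqueFile(base, true, 0, ""));     // graph42/part-r-00000
        System.out.println(uniqueFile(base, true, 3, ".csv")); // graph42/part-r-00003.csv
    }
}
```

This suggests an alternative to renaming afterwards: a subclass of `TextOutputFormat` that overrides `getDefaultWorkFile(context, extension)` to put `.csv` in front of the extension it receives. Treat that as an idea to verify against your Hadoop version rather than a tested recipe.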

Have you tried iterating through the output folder in the driver after the job completes and renaming the files?

As long as you emit text in the reducer (each Text value should be one CSV line, with values separated by a semicolon or whatever separator you need), you can give this a try:

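```java
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Inside the driver's run() method (getConf() comes from Configured/Tool)
Job job = new Job(getConf()); // deprecated in Hadoop 2; use Job.getInstance(getConf()) there
//...
// your job setup, including output config
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
//...
boolean success = job.waitForCompletion(true);
if (success) {
    FileSystem hdfs = FileSystem.get(getConf());
    FileStatus[] fs = hdfs.listStatus(new Path(outputPath)); // outputPath: the job's output directory
    if (fs != null) {
        for (FileStatus aFile : fs) {
            // rename regular files only; isDir() is deprecated in Hadoop 2 in favor of isDirectory()
            if (!aFile.isDir()) {
                hdfs.rename(aFile.getPath(), new Path(aFile.getPath().toString() + ".csv"));
            }
        }
    }
}
```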
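Two small helpers may make the approach above more robust. `toCsvLine` builds the single CSV line the reducer is expected to emit, quoting fields that contain the separator, a double quote, or a line break in the RFC 4180 style. `csvTarget` decides the new name inside the rename loop so that marker or hidden files, such as the `_SUCCESS` file the job writes by default, are not turned into `_SUCCESS.csv`, and files that already end in `.csv` are left alone. Both `CsvHelpers` methods are hypothetical plain Java, not part of Hadoop.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class CsvHelpers {
    // Joins fields with the given separator, quoting any field that needs it (RFC 4180 style)
    static String toCsvLine(char sep, String... fields) {
        return Arrays.stream(fields)
                .map(f -> quoteIfNeeded(f, sep))
                .collect(Collectors.joining(String.valueOf(sep)));
    }

    // Wraps the field in double quotes and doubles embedded quotes when required
    private static String quoteIfNeeded(String field, char sep) {
        boolean needsQuotes = field.indexOf(sep) >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // Returns the new file name, or null if the file should not be renamed:
    // names starting with "_" or "." (e.g. _SUCCESS) and names already ending in .csv
    static String csvTarget(String fileName) {
        if (fileName.startsWith("_") || fileName.startsWith(".") || fileName.endsWith(".csv")) {
            return null;
        }
        return fileName + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(';', "a", "b"));           // a;b
        System.out.println(toCsvLine(',', "x,y", "say \"hi\"")); // "x,y","say ""hi"""
        System.out.println(csvTarget("part-r-00000"));          // part-r-00000.csv
        System.out.println(csvTarget("_SUCCESS"));              // null
    }
}
```

In the rename loop, `csvTarget(aFile.getPath().getName())` would replace the unconditional `toString() + ".csv"`, skipping the rename when it returns null. Note also that with the reducer in the question, MultipleOutputs writes into subdirectories such as `<fileId>/part-r-00000`, which the loop above skips because it ignores directories; on Hadoop 2, `FileSystem.listFiles(path, true)` iterates over files recursively and could be used instead of `listStatus`.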
