Parallel processing of files in Java with ExecutorService does not use all of the CPU power
I have a directory containing thousands of CSV files that need to be parsed. I implemented this as an ExecutorService job in Java, where each thread is assigned one CSV file to parse. My machine has 4 cores. Efficiency has improved compared to the single-threaded application; however, when I look at CPU utilization (using Task Manager), it does not seem to be using all of the CPU power: only 30%-40% of the CPU is used. I wanted to know whether my approach is correct.
File dir = new File(file);
if (dir.isDirectory()) {
    File[] files = dir.listFiles();
    for (File f : files) {
        String file_abs_path = f.getAbsolutePath();
        int index = file_abs_path.lastIndexOf("/") + 1;
        file_name = file_abs_path.substring(index);
        futuresList.add(eService.submit(new MyParser(file_abs_path)));
    }
    for (Future<List<MyObj>> future : futuresList) {
        try {
            List<MyObj> docs = future.get();
            for (MyObj obj : docs) {
                doc = createDocument(file_name, obj);
                try {
                    // someFunction(doc);
                } catch (Exception e) {
                    // log and continue with the next record
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // the parser task failed; inspect e.getCause()
        }
    }
}
I was wondering whether this approach is correct. Any help is appreciated. Thanks.
The parser code:
public List<MyObj> call() {
    ColumnPositionMappingStrategy<MyObj> strat = new ColumnPositionMappingStrategy<MyObj>();
    strat.setType(MyObj.class);
    String[] columns = new String[] { /* list of columns in the CSV file */ };
    strat.setColumnMapping(columns);
    CsvToBean<MyObj> csv = new CsvToBean<MyObj>();
    BufferedReader reader = null;
    String doc_line;
    StringBuilder doc = new StringBuilder();  // avoid repeated String concatenation in the loop
    File dir = new File(file_path);
    try {
        reader = new BufferedReader(new FileReader(dir));
        while ((doc_line = reader.readLine()) != null) {
            String[] docs = doc_line.split(",");
            for (String field : docs) {
                doc.append(field).append(" ");
            }
        }
        reader.close();
    } catch (IOException e) {
        // e.printStackTrace();
    }
    return csv.parse(strat, new StringReader(doc.toString()));
}
As commented, your task is I/O bound, as tasks involving hard-drive I/O usually are. Adding more parser threads does not help much, because they all end up waiting on the same disk.
The best performance you can hope for comes from decoupling the reading from the processing. Most likely a single reading thread, reading blocks of data as large as possible and feeding a queue of processing threads, will yield the best overall throughput. The number of processing threads should be whatever is necessary to keep up with the reading.
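A minimal sketch of that producer-consumer layout, using `java.util.concurrent`. Note this is not your parser: the class name, the `Row` record, and the in-memory `Map` standing in for files on disk are all made up for illustration; in real code the reader thread would stream lines from the file system, and the workers would run your OpenCSV logic.

```java
import java.util.*;
import java.util.concurrent.*;

public class DecoupledCsvReader {
    // Hypothetical stand-in for the asker's MyObj.
    record Row(List<String> fields) {}

    // Sentinel object: one is enqueued per worker to signal end-of-stream.
    static final List<String> POISON = List.of();

    public static List<Row> parseAll(Map<String, String> csvContents, int workers)
            throws InterruptedException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(1024);
        List<Row> results = Collections.synchronizedList(new ArrayList<>());

        // Single reader thread: the only one touching "the disk" (here, a Map).
        Thread reader = new Thread(() -> {
            try {
                for (String content : csvContents.values()) {
                    for (String line : content.split("\n")) {
                        queue.put(List.of(line));   // blocks if workers fall behind
                    }
                }
                for (int i = 0; i < workers; i++) {
                    queue.put(POISON);              // tell each worker to stop
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // CPU-bound workers: take lines off the queue and split them into fields.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        List<String> item = queue.take();
                        if (item == POISON) break;  // reference comparison on the sentinel
                        results.add(new Row(Arrays.asList(item.get(0).split(","))));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        reader.start();
        reader.join();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> files = Map.of(
                "a.csv", "1,2,3\n4,5,6",
                "b.csv", "7,8,9");
        System.out.println(parseAll(files, 4).size()); // 3 lines parsed in total
    }
}
```

A bounded queue matters here: if the workers are slower than the reader, `put` blocks and the reader naturally throttles instead of buffering the whole directory in memory.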