hadoop - How do I include a resource file in Java project to be used with just new File()?


I'm writing a UDF for Pig in Java. It works fine, but Pig doesn't give me an option to separate environments. My Pig script does geolocation lookups from IP addresses.

Here's the code for the geolocation part.

    private static final String GEO_DB = "geolite2-city.mmdb";
    private static final String GEO_FILE = "/geo/" + GEO_DB;

    public Map<String, Object> geodata(String ipStr) {
        Map<String, Object> geoMap = new HashMap<String, Object>();

        DatabaseReader reader = new DatabaseReader.Builder(new File(GEO_DB)).build();
        // other stuff
    }

geolite2-city.mmdb exists in HDFS, which is why I can refer to it by the absolute path /geo/geolite2-city.mmdb.

However, I can't JUnit test this unless I create /geo/geolite2-city.mmdb on my local machine and on Jenkins, which is not ideal. I'm trying to figure out a way to make the test pass while using new File(GEO_DB) and not getClass().getResourceAsStream("./geo/geolite2-city.mmdb"), because

    getClass().getResourceAsStream("./geo/geolite2-city.mmdb")

doesn't work in Hadoop.

And if I run the JUnit test, it fails because I don't have /geo/geolite2-city.mmdb on my local machine.

Is there any way I can overcome this? I want the tests to pass without changing the code to use getClass().getResourceAsStream, and I can't put an if/else around it because Pig doesn't give me a way to pass in a parameter. Or maybe I'm missing something.

And the JUnit test:

    @Test
    @Ignore
    public void shouldGetGeoData() throws Exception {
        String ipTest = "128.101.101.101";

        Map<String, Object> geoJson = new LogLine2Json().geodata(ipTest);

        assertThat(geoJson.get("lla").toString(), is(equalTo("44.9759")));
        assertThat(geoJson.get("llo").toString(), is(equalTo("-93.2166")));
    }

This works if I read the database file from the resources folder. That's why I have @Ignore.

Besides, the whole code looks pretty un-testable.

Every time you directly call new in your production code, you prevent dependency injection, and thereby make it harder to test your code.

The point is to not call new File() within your production code. Instead, use a factory that gives you a "ready to use" DatabaseReader object. Then you can test that the factory does the right thing, and you can mock that factory when testing your code (to return a mocked database reader).
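A minimal sketch of what that could look like, under some assumptions: GeoLookup, GeoLookupFactory and GeoService are hypothetical names invented for illustration, and GeoLookup stands in for whatever part of DatabaseReader your UDF really uses. In production, the factory implementation would be the single place that builds the reader from the file; in a test you hand in a stub instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over the lookup the UDF needs
// (a stand-in for the real DatabaseReader, not its actual API).
interface GeoLookup {
    Map<String, Object> lookup(String ip);
}

// Hypothetical factory: returns a "ready to use" lookup object.
// A production implementation would be the only code that opens the database file.
interface GeoLookupFactory {
    GeoLookup create();
}

// The code under test no longer calls new File(); it only talks to the factory.
class GeoService {
    private final GeoLookupFactory factory;

    GeoService(GeoLookupFactory factory) {
        this.factory = factory;
    }

    Map<String, Object> geodata(String ipStr) {
        GeoLookup lookup = factory.create();
        // Copy the result so callers cannot modify the lookup's own map.
        return new HashMap<>(lookup.lookup(ipStr));
    }
}
```

In a unit test, the factory can be a lambda that returns canned coordinates, so no /geo/geolite2-city.mmdb file is needed on the local machine or on Jenkins. How you get the production factory into a Pig UDF (for example through a default constructor) depends on your setup and is left open here.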

So that one File instance is just the tip of your "testing problems" here.

Honestly: don't write your production code first. Do TDD: write your test cases first, and you will learn that production code like what you are presenting here is hard to test. When you apply TDD, you start from the "test perspective" and create production code that really is testable.

