R - Load spark-csv from RStudio under Windows environment


Can anyone tell me if I can import the spark-csv package into SparkR using RStudio under a Windows 7 environment? My local machine has R 3.2.2, spark-1.6.1-bin-hadoop2.6 and Java installed, but not Maven, Scala, etc. I don't know if I am missing something needed to call spark-csv. Should I install the package (.jar file) and put it in some folder?

Here is my script:

library(rJava)
Sys.setenv(SPARK_HOME = 'c:/users/***/spark-1.6.1-bin-hadoop2.6')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
library(SparkR)

Sys.setenv('SPARKR_SUBMIT_ARGS' = '"--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell"')

sc <- sparkR.init(master = "local[*]", sparkEnvir = list(spark.driver.memory = "2g"))
sqlContext <- sparkRSQL.init(sc)

I am able to call the SparkR library and initiate sc; here is the message:

Launching java with spark-submit command c:/users/***/spark-1.6.1-bin-hadoop2.6/bin/spark-submit.cmd   --driver-memory "2g" "--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell" c:\users\hwu\appdata\local\temp\2\rtmp46mvve\backend_port13b423eed9c

Then, when I try to load a local csv file, it fails. I have already put the csv file under R's current working directory.

flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header = "true")

I got this error message:

Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NullPointerException
  at java.lang.ProcessBuilder.start(Unknown Source)
  at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
  at org.apache.hadoop.util.Shell.r...(Shell.java:455)
  at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
  at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
  at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:406)
  at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
  at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
  at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:7

Thank you for any advice.

Instead of this:

Sys.setenv('SPARKR_SUBMIT_ARGS' = '"--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell"')

try this (one plain string, without the embedded quotes):

Sys.setenv(SPARKR_SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.4.0 sparkr-shell")

Or perhaps this:

sc <- sparkR.init(master = "local[*]", appName = "yourapp", sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
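
For context, here is a minimal end-to-end sketch of the sparkPackages route, reusing the paths and file names from the question (the SPARK_HOME path, the "yourapp" name and nycflights13.csv are the asker's own values, not verified here):

Sys.setenv(SPARK_HOME = 'c:/users/***/spark-1.6.1-bin-hadoop2.6')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
library(SparkR)

# let sparkR.init fetch spark-csv itself instead of hand-building SPARKR_SUBMIT_ARGS
sc <- sparkR.init(master = "local[*]",
                  appName = "yourapp",
                  sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
sqlContext <- sparkRSQL.init(sc)

# header = "true" keeps the first csv row as column names
flights <- read.df(sqlContext, "nycflights13.csv",
                   source = "com.databricks.spark.csv", header = "true")
head(flights)

sparkR.stop()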
