R - Load spark-csv from RStudio under a Windows environment
Can anyone tell me whether I can import the spark-csv package into SparkR using RStudio under a Windows 7 environment? My local machine has R 3.2.2, spark-1.6.1-bin-hadoop2.6, and Java installed, but not Maven, Scala, etc. I don't know if I am missing something needed to call spark-csv. Should I install the package (a .jar file) and put it in some folder?
Here is my script:
library(rJava)
Sys.setenv(SPARK_HOME = 'C:/Users/***/spark-1.6.1-bin-hadoop2.6')
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))
library(SparkR)
Sys.setenv('SPARKR_SUBMIT_ARGS' = '"--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell"')
sc <- sparkR.init(master = "local[*]", sparkEnvir = list(spark.driver.memory = "2g"))
sqlContext <- sparkRSQL.init(sc)
I am able to call the SparkR library and initiate sc; here is the message:
Launching java with spark-submit command C:/Users/***/spark-1.6.1-bin-hadoop2.6/bin/spark-submit.cmd --driver-memory "2g" "--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell" c:\users\hwu\appdata\local\temp\2\rtmp46mvve\backend_port13b423eed9c
Then, when I try to load a local CSV file, it fails. I have already put the CSV file in R's current working directory.
flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header = "true")
I got this error message:
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(Unknown Source)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
    at org.apache.hadoop.util.Shell.r...(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
    at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:406)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:7
Thanks for any advice.
Instead of this:
Sys.setenv('SPARKR_SUBMIT_ARGS' = '"--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell"')
try this:
Sys.setenv(SPARKR_SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.4.0 sparkr-shell")
Or perhaps this:
sc <- sparkR.init(master = "local[*]", appName = "yourapp", sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
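Putting that last suggestion together with the script from the question, a minimal sketch of the whole flow could look like the following (the SPARK_HOME path is elided as in the question, and "yourapp" is just a placeholder application name):

library(SparkR)
Sys.setenv(SPARK_HOME = 'C:/Users/***/spark-1.6.1-bin-hadoop2.6')   # elided path, as in the question
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))

# Request the spark-csv package through sparkR.init instead of SPARKR_SUBMIT_ARGS
sc <- sparkR.init(master = "local[*]",
                  appName = "yourapp",
                  sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
sqlContext <- sparkRSQL.init(sc)

# Read the CSV through the spark-csv data source (file name taken from the question)
flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header = "true")

This way the package is resolved when the SparkR backend is launched, so spark-csv is already on the classpath by the time read.df is called.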