Scala - How to provide CSV input to a naive Bayes classifier
Hi, I am working on disease classification using a naïve Bayes model. I have a CSV file that lists diseases along with their symptoms. The format of the CSV is: symptom-1 symptom-2 symptom-3 disease. How do I provide this CSV to the naïve Bayes model and classify the disease based on the symptoms? Is there standard code to read the CSV and pass it to the naïve Bayes model to perform the classification? I am using the Spark machine learning library for this.
This is a modified example from the MLlib docs:
    import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // sc is the SparkContext (predefined in spark-shell)
    val data = sc.textFile("your csv path")
    val parsedData = data.map { line =>
      val parts = line.split(',')
      // LabeledPoint(disease, (symptom 1, 2, 3)),
      // assuming all of them are numeric (multinomial naive Bayes also needs non-negative features)
      LabeledPoint(parts(3).toDouble, Vectors.dense(parts(0).toDouble, parts(1).toDouble, parts(2).toDouble))
    }

    // Split the data into training (60%) and test (40%) sets.
    val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0)
    val test = splits(1)

    val model = NaiveBayes.train(training, lambda = 1.0, modelType = "multinomial")

    // Pair each prediction with the true label and compute the fraction that match.
    val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
    val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()

    // Save and load the model
    model.save(sc, "target/tmp/naivebayesmodel")
    val sameModel = NaiveBayesModel.load(sc, "target/tmp/naivebayesmodel")
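The code above assumes every column is already numeric. If the disease column holds names rather than numbers, the labels need to be mapped to doubles first. Here is a minimal sketch of that step in plain Scala (no Spark needed to try it). The sample rows, the comma delimiter, and the names `rows`, `diseaseIndex` and `parseLine` are all made up for illustration, and the symptoms are assumed to be 0/1 flags:

```scala
// Hypothetical sample rows in the layout: symptom-1,symptom-2,symptom-3,disease
val rows = Seq(
  "1,0,1,flu",
  "0,1,0,cold",
  "1,1,0,flu"
)

// Give each distinct disease name a stable numeric label (0.0, 1.0, ...),
// ordered alphabetically so the mapping is reproducible.
val diseaseIndex: Map[String, Double] =
  rows.map(_.split(',')(3).trim).distinct.sorted.zipWithIndex
    .map { case (name, i) => name -> i.toDouble }
    .toMap

// Parse one CSV line into (label, features).
def parseLine(line: String): (Double, Array[Double]) = {
  val parts = line.split(',').map(_.trim)
  val features = parts.take(3).map(_.toDouble) // the three symptom columns
  (diseaseIndex(parts(3)), features)           // the disease name becomes a double
}

val parsed = rows.map(parseLine)
```

Inside the Spark job, each pair could then be turned into `LabeledPoint(label, Vectors.dense(features))`. With a real dataset the name-to-label map would have to be built from the data itself (for example by collecting the distinct disease names) before mapping over the lines.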