spark ML 使用Word2Vec
1.创建DF
val documentDF = sqlContext.createDataFrame(Seq(
"Hi I heard about Spark".split(" "),
"I wish Java could use case classes".split(" "),
"Logistic regression models are neat".split(" ")
).map(Tuple1.apply)).toDF("text")
JSON的结构:
{"text":["I","wish","Java","could","use","case","classes"]}
{"text":["Logistic","regression","models","are","neat"]}
{"text":["Hi","I","heard","about","Spark"]}
2.创建word2vec
val word2Vec = new Word2Vec()
.setInputCol("text")
.setOutputCol("result")
.setVectorSize(3)
.setMinCount(0)
setVectorSize:把一个words组转换成多少纬度的向量,我们这里选择三个
3.model
val model = word2Vec.fit(documentDF)
val result = model.transform(documentDF)
result.select("result").take(3).foreach(println)
4.
scala> result.select("result").take(3).foreach(println)
[[-7.559644058346749E-4,-0.0235147787258029,9.437099099159241E-4]]
[[-0.06844028996835862,-0.029905967015240873,0.07320201684654291]]
[[0.006268330290913582,0.02445013374090195,0.06141428500413895]]
声明:该文观点仅代表作者本人,入门客AI创业平台信息发布平台仅提供信息存储空间服务,如有疑问请联系rumenke@qq.com。
- 上一篇:没有了
- 下一篇:没有了
