入门客AI创业平台(我带你入门,你带我飞行)
博文笔记

spark ML 使用Word2Vec

创建时间:2016-05-22 投稿人: 浏览次数:3181

1.创建DF

val documentDF = sqlContext.createDataFrame(Seq(
  "Hi I heard about Spark".split(" "),
  "I wish Java could use case classes".split(" "),
  "Logistic regression models are neat".split(" ")
).map(Tuple1.apply)).toDF("text")

JSON的结构:

{"text":["I","wish","Java","could","use","case","classes"]}
{"text":["Logistic","regression","models","are","neat"]}
{"text":["Hi","I","heard","about","Spark"]}

 

2.创建word2vec

val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(3)
  .setMinCount(0)

setVectorSize:把一个words组转换成多少纬度的向量,我们这里选择三个

 

3.model

val model = word2Vec.fit(documentDF)
val result = model.transform(documentDF)
result.select("result").take(3).foreach(println)

 

4.

scala> result.select("result").take(3).foreach(println)
[[-7.559644058346749E-4,-0.0235147787258029,9.437099099159241E-4]]
[[-0.06844028996835862,-0.029905967015240873,0.07320201684654291]]
[[0.006268330290913582,0.02445013374090195,0.06141428500413895]]

声明:该文观点仅代表作者本人,入门客AI创业平台信息发布平台仅提供信息存储空间服务,如有疑问请联系rumenke@qq.com。
  • 上一篇:没有了
  • 下一篇:没有了
未上传头像