Using Spark to interact with Elasticsearch

June 07, 2017

Use the elasticsearch-hadoop package; you can find the project by searching for it on GitHub.

Example

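import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark._
import org.elasticsearch.spark.sql._

// Point the connector at the Elasticsearch node and configure the Spark application
val conf = new SparkConf()
    .set("es.nodes", "192.168.47.155")
    .set("es.port", "9200")
    .setMaster("spark://...")
    .setAppName("es_hdfs")

// Create the SparkContext from the configuration, then the SQLContext used for DataFrames
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Query the documents whose partner code is "abc"
val query = """{"query":{"match":{"activity.partnerCode": "abc"}}}"""

// Load the results of the Elasticsearch query as an RDD
val esRdd = sc.esRDD("index/type", query)
// Load all documents of the index directly as a DataFrame
val esDf = sqlContext.esDF("index/type")

// Process the RDD/DataFrame that was read
esRdd.map(r=>{...})
esDf.flatMap(r=>{......})

// Write the RDD/DataFrame back to Elasticsearch
esRdd.saveToEs("index/type")
// resultDf is assumed to be a DataFrame produced by the processing step above
resultDf.saveToEs("index/type")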
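
The body of the map step in the example is left elided. As a rough illustration of what such a step might look like: each element of the RDD returned by esRDD is a pair of the document ID and a map of the document's fields, with nested objects represented as nested maps. The sketch below uses a small hard-coded local collection as a stand-in for that RDD, so it runs without a cluster. The names EsRddShape, EsDoc, sample and partnerCode, as well as the sample documents, are hypothetical and only serve the illustration.

```scala
object EsRddShape {
  // Shape of one element of sc.esRDD(...): (document id, field map)
  type EsDoc = (String, collection.Map[String, AnyRef])

  // Hypothetical sample data standing in for documents read from Elasticsearch
  val sample: Seq[EsDoc] = Seq(
    ("doc-1", Map[String, AnyRef]("activity" -> Map[String, AnyRef]("partnerCode" -> "abc"))),
    ("doc-2", Map[String, AnyRef]("activity" -> Map[String, AnyRef]("partnerCode" -> "xyz"))),
    ("doc-3", Map[String, AnyRef]("other" -> "no activity field"))
  )

  // Read the nested field activity.partnerCode, if the document has it
  def partnerCode(doc: EsDoc): Option[String] =
    doc._2.get("activity") match {
      case Some(m: collection.Map[_, _]) =>
        m.asInstanceOf[collection.Map[String, AnyRef]]
          .get("partnerCode")
          .map(_.toString)
      case _ => None
    }

  def main(args: Array[String]): Unit =
    sample.foreach { doc =>
      println(s"${doc._1} -> ${partnerCode(doc)}")
    }
}
```

Against a real RDD, a function like this could be passed to map (for example esRdd.map(EsRddShape.partnerCode)), assuming the indexed documents actually have this nested shape.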
