StreamingPro again supports Structured Streaming

Preface
StreamingPro already supported Spark Structured Streaming before this article was written, but only experimentally, because the Spark 2.0+ line was itself still experimental at the time; the focus was on the Spark 1.6 series. Time moves on, though, and Spark 2.0+ is clearly the trend, so this release reworks much of the underlying layer. StreamingPro currently supports three engines: Flink, Spark 1.6+, and Spark 2.0+.

Preparation

Download the StreamingPro package built for Spark 2.0, and then download the Spark 2.1 installation package. You can also find the Spark 1.6+ and Flink builds in the streamingpro directory. The latest builds follow this naming convention:

streamingpro-spark-0.4.14-SNAPSHOT.jar               for Spark 1.6+, Scala 2.10
streamingpro-spark-2.0-0.4.14-SNAPSHOT.jar           for Spark 2.0+, Scala 2.11
streamingpro.flink-0.4.14-SNAPSHOT-online-1.2.0.jar  for Flink 1.2.0, Scala 2.10
 
Test example

Write a JSON file ss.json as follows:

{
  "scalamaptojson": {
    "desc": "test",
    "strategy": "spark",
    "algorithm": [],
    "ref": [
    ],
    "compositor": [
      {
        "name": "ss.sources",
        "params": [
          {
            "format": "socket",
            "outputTable": "test",
            "port":"9999",
            "host":"localhost",
            "path": "-"
          },
          {
            "format": "com.databricks.spark.csv",
            "outputTable": "sample",
            "header":"true",
            "path": "/Users/allwefantasy/streamingpro/sample.csv"
          }
        ]
      },
      {
        "name": "ss.sql",
        "params": [
          {
            "sql": "select city from test left join sample on test.value == sample.name",
            "outputTableName": "test3"
          }
        ]
      },
      {
        "name": "ss.outputs",
        "params": [
          {
            "mode": "append",
            "format": "console",
            "inputTableName": "test3",
            "path": "-"
          }
        ]
      }
    ],
    "configParams": {
    }
  }
}
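For reference, the pipeline above corresponds roughly to the following program written directly against the Structured Streaming API. This is only a sketch, assuming Spark 2.1: the object name SsJsonEquivalent is invented here, while the formats, options, table names, and SQL are taken from the config.

```scala
import org.apache.spark.sql.SparkSession

object SsJsonEquivalent {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("test")
      .getOrCreate()

    // ss.sources: streaming socket source registered as table "test"
    val socket = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
    socket.createOrReplaceTempView("test")

    // ss.sources: batch CSV source registered as table "sample"
    val sample = spark.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("/Users/allwefantasy/streamingpro/sample.csv")
    sample.createOrReplaceTempView("sample")

    // ss.sql: stream-to-batch join, producing the "test3" result
    val joined = spark.sql(
      "select city from test left join sample on test.value == sample.name")

    // ss.outputs: console sink in append mode
    joined.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

StreamingPro's compositors essentially generate this kind of plumbing from the JSON description, so the SQL in ss.sql runs unchanged against the registered tables.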
 

The configuration defines two sources: a socket source and a sample CSV file. The socket source is streaming, while the sample file is processed as a batch. sample.csv reads as follows:
 
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
 
Then run nc -lk 9999 in a terminal to start the socket server.
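Each line typed into nc becomes one row of the streaming test table, in a single column named value, so typing names that exist in sample.csv makes the join produce output. A non-interactive way to serve the same data (a hypothetical session; run this before submitting the Spark job):

```shell
# Serve two rows to the socket source; -k keeps nc listening so the
# Spark job can (re)connect. Run in the background, then submit the job.
printf 'david\neason\n' | nc -lk 9999 &
```

With the sample.csv above, these two names both resolve to the city shenzhen.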

Then run the Spark program:

SHome=/Users/allwefantasy/streamingpro
./bin/spark-submit --class streaming.core.StreamingApp \
--master local[2] \
--name test \
$SHome/streamingpro-spark-2.0-0.4.14-SNAPSHOT.jar \
-streaming.name test \
-streaming.platform spark_structrued_streaming \
-streaming.job.file.path file://$SHome/ss.json
 
 
 
  
