StreamingPro again supports Structured Streaming

mai 30, 2017

Preface
Before the article has been written, StreamingPro support Spark Structured Streaming , but was only the nature of the game, because the Spark 2.0 + version is actually only try nature, the focus is on the spark 1.6 series. But the time is going, Spark 2.0+ version is still the trend. So this version of the bottom to do a lot of reconstruction, StreamingPro currently supports Flink , Spark 1.6 +, Spark 2.0 + three engines.

Preparing for a job Download the package for streaming software for spark 2.0 and then download the spark 2.1 installation package.
You can also find the spark 1.6+ or flink version in the streamingpro directory . The latest general format in accordance with the following format:

streamingpro-spark-0.4.14-SNAPSHOT.jar  适配  spark 1.6+,scala 2.10
streamingpro-spark-2.0-0.4.14-SNAPSHOT.jar  适配  spark 2.0+,scala 2.11
streamingpro.flink-0.4.14-SNAPSHOT-online-1.2.0.jar 适配 flink 1.2.0, scala 2.10

Test example <br> Write a json file ss.json, as follows:

{
  "scalamaptojson": {
    "desc": "测试",
    "strategy": "spark",
    "algorithm": [],
    "ref": [
    ],
    "compositor": [
      {
        "name": "ss.sources",
        "params": [
          {
            "format": "socket",
            "outputTable": "test",
            "port":"9999",
            "host":"localhost",
            "path": "-"
          },
          {
            "format": "com.databricks.spark.csv",
            "outputTable": "sample",
            "header":"true",
            "path": "/Users/allwefantasy/streamingpro/sample.csv"
          }
        ]
      },
      {
        "name": "ss.sql",
        "params": [
          {
            "sql": "select city from test left join sample on test.value == sample.name",
            "outputTableName": "test3"
          }
        ]
      },
      {
        "name": "ss.outputs",
        "params": [
          {
            "mode": "append",
            "format": "console",
            "inputTableName": "test3",
            "path": "-"
          }
        ]
      }
    ],
    "configParams": {
    }
  }
}

Is roughly a socket source, a sample file.  Socket source is streaming, sample file is batch processing.

 Sample.csv reads as follows:

id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35

Then you in the terminal implementation nc -lk 9999 just fine. 

  Then run the spark program:

SHome=/Users/allwefantasy/streamingpro
./bin/spark-submit   --class streaming.core.StreamingApp \
--master local[2] \
--name test \
$SHome/streamingpro-spark-2.0-0.4.14-SNAPSHOT.jar    \
-streaming.name test    \
-streaming.platform spark_structrued_streaming \
-streaming.job.file.path file://$SHome/ss.json

Rechercher dans ce blog

Big data

StreamingPro again supports Structured Streaming

Commentaires

Enregistrer un commentaire

Posts les plus consultés de ce blog

Controlling Parallelism in Spark by controlling the input partitions by controlling the input partitions

Spark optimization

Spark performance optimization: shuffle tuning