Structured Streaming Programming Guide
Miscellaneous Notes
- Several configurations are not modifiable after the query has run. To change them, discard the checkpoint and start a new query. These configurations include:
    - spark.sql.shuffle.partitions- This is due to the physical partitioning of state: state is partitioned via applying hash function to key, hence the number of partitions for state should be unchanged.
- If you want to run fewer tasks for stateful operations, coalescewould help with avoiding unnecessary repartitioning.- After coalesce, the number of (reduced) tasks will be kept unless another shuffle happens.
 
- After 
 
- spark.sql.streaming.stateStore.providerClass: To read the previous state of the query properly, the class of state store provider should be unchanged.
- spark.sql.streaming.multipleWatermarkPolicy: Modification of this would lead inconsistent watermark value when query contains multiple watermarks, hence the policy should be unchanged.
 
Related Resources
Further Reading
- See and run the
Python/Scala/Java/R
examples.
    - Instructions on how to run Spark examples
 
- Read about integrating with Kafka in the Structured Streaming Kafka Integration Guide
- Read more details about using DataFrames/Datasets in the Spark SQL Programming Guide
- Third-party Blog Posts
Talks
- Spark Summit Europe 2017
    - Easy, Scalable, Fault-tolerant Stream Processing with Structured Streaming in Apache Spark - Part 1 slides/video, Part 2 slides/video
- Deep Dive into Stateful Stream Processing in Structured Streaming - slides/video
 
- Spark Summit 2016
    - A Deep Dive into Structured Streaming - slides/video
 
Migration Guide
The migration guide is now archived on this page.