path: root/docs/streaming-programming-guide.md
Commit message | Author | Age | Files | Lines
* [SPARK-19550][BUILD][CORE][WIP] Remove Java 7 support | Sean Owen | 2017-02-16 | 1 | -146/+73
  - Move external/java8-tests tests into core, streaming, sql and remove
  - Remove MaxPermGen and related options
  - Fix some reflection / TODOs around Java 8+ methods
  - Update doc references to 1.7/1.8 differences
  - Remove Java 7/8 related build profiles
  - Update some plugins for better Java 8 compatibility
  - Fix a few Java-related warnings

  For the future:
  - Update Java 8 examples to fully use Java 8
  - Update Java tests to use lambdas for simplicity
  - Update Java internal implementations to use lambdas

  ## How was this patch tested?
  Existing tests

  Author: Sean Owen <sowen@cloudera.com>
  Closes #16871 from srowen/SPARK-19493.
* [SPARK-19520][STREAMING] Do not encrypt data written to the WAL. | Marcelo Vanzin | 2017-02-13 | 1 | -0/+3
  Spark's I/O encryption uses an ephemeral key for each driver instance. So driver B cannot decrypt data written by driver A since it doesn't have the correct key.

  The write ahead log is used for recovery, thus needs to be readable by a different driver. So it cannot be encrypted by Spark's I/O encryption code.

  The BlockManager APIs used by the WAL code to write the data automatically encrypt data, so changes are needed so that callers can opt out of encryption.

  Aside from that, the "putBytes" API in the BlockManager does not do encryption, so a separate situation arose where the WAL would write unencrypted data to the BM and, when those blocks were read, decryption would fail. So the WAL code needs to ask the BM to encrypt that data when encryption is enabled; this code is not optimal since it results in a (temporary) second copy of the data block in memory, but should be OK for now until a more performant solution is added. The non-encryption case should not be affected.

  Tested with new unit tests, and by running streaming apps that do recovery using the WAL data with I/O encryption turned on.

  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #16862 from vanzin/SPARK-19520.
* [SPARK-19003][DOCS] Add Java example in Spark Streaming Guide, section Design Patterns for using foreachRDD | adesharatushar | 2016-12-29 | 1 | -0/+72
  ## What changes were proposed in this pull request?
  Added missing Java example under section "Design Patterns for using foreachRDD". Now this section has examples in all 3 languages, improving consistency of documentation.

  ## How was this patch tested?
  Manual. Generated docs using command "SKIP_API=1 jekyll build" and verified generated HTML page manually. The syntax of example has been tested for correctness using sample code on Java1.7 and Spark 2.2.0-SNAPSHOT.

  Author: adesharatushar <tushar_adeshara@persistent.com>
  Closes #16408 from adesharatushar/streaming-doc-fix.
* [SPARK-18073][DOCS][WIP] Migrate wiki to spark.apache.org web site | Sean Owen | 2016-11-23 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  Updates links to the wiki to links to the new location of content on spark.apache.org.

  ## How was this patch tested?
  Doc builds

  Author: Sean Owen <sowen@cloudera.com>
  Closes #15967 from srowen/SPARK-18073.1.
* [SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that`/`'''Note:'''` across Scala/Java API documentation | hyukjinkwon | 2016-11-19 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  It seems in Scala/Java,
  - `Note:`
  - `NOTE:`
  - `Note that`
  - `'''Note:'''`
  - `note`

  This PR proposes to fix those to `note` to be consistent.

  **Before**
  - Scala ![2016-11-17 6 16 39](https://cloud.githubusercontent.com/assets/6477701/20383180/1a7aed8c-acf2-11e6-9611-5eaf6d52c2e0.png)
  - Java ![2016-11-17 6 14 41](https://cloud.githubusercontent.com/assets/6477701/20383096/c8ffc680-acf1-11e6-914a-33460bf1401d.png)

  **After**
  - Scala ![2016-11-17 6 16 44](https://cloud.githubusercontent.com/assets/6477701/20383167/09940490-acf2-11e6-937a-0d5e1dc2cadf.png)
  - Java ![2016-11-17 6 13 39](https://cloud.githubusercontent.com/assets/6477701/20383132/e7c2a57e-acf1-11e6-9c47-b849674d4d88.png)

  ## How was this patch tested?
  The notes were found via

  ```bash
  grep -r "NOTE: " . | \ # Note:|NOTE:|Note that|'''Note:'''
  grep -v "// NOTE: " | \ # starting with // does not appear in API documentation.
  grep -E '.scala|.java' | \ # java/scala files
  grep -v Suite | \ # exclude tests
  grep -v Test | \ # exclude tests
  grep -e 'org.apache.spark.api.java' \ # packages appear in API documentation
  -e 'org.apache.spark.api.java.function' \ # note that this is a regular expression. So actual matches were mostly `org/apache/spark/api/java/functions ...`
  -e 'org.apache.spark.api.r' \
  ...
  ```

  ```bash
  grep -r "Note that " . | \ # Note:|NOTE:|Note that|'''Note:'''
  grep -v "// Note that " | \ # starting with // does not appear in API documentation.
  grep -E '.scala|.java' | \ # java/scala files
  grep -v Suite | \ # exclude tests
  grep -v Test | \ # exclude tests
  grep -e 'org.apache.spark.api.java' \ # packages appear in API documentation
  -e 'org.apache.spark.api.java.function' \
  -e 'org.apache.spark.api.r' \
  ...
  ```

  ```bash
  grep -r "Note: " . | \ # Note:|NOTE:|Note that|'''Note:'''
  grep -v "// Note: " | \ # starting with // does not appear in API documentation.
  grep -E '.scala|.java' | \ # java/scala files
  grep -v Suite | \ # exclude tests
  grep -v Test | \ # exclude tests
  grep -e 'org.apache.spark.api.java' \ # packages appear in API documentation
  -e 'org.apache.spark.api.java.function' \
  -e 'org.apache.spark.api.r' \
  ...
  ```

  ```bash
  grep -r "'''Note:'''" . | \ # Note:|NOTE:|Note that|'''Note:'''
  grep -v "// '''Note:''' " | \ # starting with // does not appear in API documentation.
  grep -E '.scala|.java' | \ # java/scala files
  grep -v Suite | \ # exclude tests
  grep -v Test | \ # exclude tests
  grep -e 'org.apache.spark.api.java' \ # packages appear in API documentation
  -e 'org.apache.spark.api.java.function' \
  -e 'org.apache.spark.api.r' \
  ...
  ```

  And then fixed one by one comparing with API documentation/access modifiers. After that, manually tested via `jekyll build`.

  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #15889 from HyukjinKwon/SPARK-18437.
* [DOCS] Reorganize explanation of Accumulators and Broadcast Variables | José Hiram Soltren | 2016-09-29 | 1 | -164/+164
  ## What changes were proposed in this pull request?
  The discussion of the interaction of Accumulators and Broadcast Variables should logically follow the discussion on Checkpointing. As currently written, this section discusses Checkpointing before it is formally introduced. To remedy this:
  - Rename this section to "Accumulators, Broadcast Variables, and Checkpoints", and
  - Move this section after "Checkpointing".

  ## How was this patch tested?
  Testing: ran `$ SKIP_API=1 jekyll build`, and verified changes in a Web browser pointed at docs/_site/index.html.

  Author: José Hiram Soltren <jose@cloudera.com>
  Closes #15281 from jsoltren/doc-changes.
* [SPARK-17445][DOCS] Reference an ASF page as the main place to find third-party packages | Sean Owen | 2016-09-14 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  Point references to spark-packages.org to https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects

  This will be accompanied by a parallel change to the spark-website repo, and additional changes to this wiki.

  ## How was this patch tested?
  Jenkins tests.

  Author: Sean Owen <sowen@cloudera.com>
  Closes #15075 from srowen/SPARK-17445.
* Streaming doc correction. | Satendra Kumar | 2016-09-09 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  (Please fill in changes proposed in this fix)
  Streaming doc correction.

  ## How was this patch tested?
  (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
  (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

  Author: Satendra Kumar <satendra@knoldus.com>
  Closes #14996 from satendrakumar06/patch-1.
* [MINOR][DOCS] Fix minor typos in python example code | Dmitriy Sokolov | 2016-08-30 | 1 | -17/+16
  ## What changes were proposed in this pull request?
  Fix minor typos in Python example code in the streaming programming guide

  ## How was this patch tested?
  N/A

  Author: Dmitriy Sokolov <silentsokolov@gmail.com>
  Closes #14805 from silentsokolov/fix-typos.
* [SPARK-17242][DOCUMENT] Update links of external dstream projects | Shixiong Zhu | 2016-08-25 | 1 | -6/+2
  ## What changes were proposed in this pull request?
  Updated links of external dstream projects.

  ## How was this patch tested?
  Just document changes.

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #14814 from zsxwing/dstream-link.
* [SPARK-12370][DOCUMENTATION] Documentation should link to examples … | Jagadeesan | 2016-08-13 | 1 | -11/+11
  ## What changes were proposed in this pull request?
  When documentation is built it should reference examples from the same build. There are times when the docs have links that point to files in the GitHub head which may not be valid on the current release. Changed that in URLs to make them point to the right tag in git using `SPARK_VERSION_SHORT`

  …from its own release version] [Streaming programming guide]

  Author: Jagadeesan <as2@us.ibm.com>
  Closes #14596 from jagadeesanas2/SPARK-12370.
* [MINOR][DOC] Fix style in examples across documentation | hyukjinkwon | 2016-08-12 | 1 | -14/+14
  ## What changes were proposed in this pull request?
  This PR fixes the documentation as below:
  - Python has 4 spaces and Java and Scala have 2 spaces (See https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide).
  - Avoid excessive parentheses and curly braces for anonymous functions. (See https://github.com/databricks/scala-style-guide#anonymous)

  ## How was this patch tested?
  N/A

  Author: hyukjinkwon <gurwls223@gmail.com>
  Closes #14593 from HyukjinKwon/minor-documentation.
* [SPARK-16911] Fix the links in the programming guide | Shivansh | 2016-08-07 | 1 | -45/+0
  ## What changes were proposed in this pull request?
  Fix the broken links in the programming guide of the Graphx Migration and understanding closures

  ## How was this patch tested?
  By running the test cases and checking the links.

  Author: Shivansh <shiv4nsh@gmail.com>
  Closes #14503 from shiv4nsh/SPARK-16911.
* [SPARK-16312][STREAMING][KAFKA][DOC] Doc for Kafka 0.10 integration | cody koeninger | 2016-08-05 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  Doc for the Kafka 0.10 integration

  ## How was this patch tested?
  Scala code examples were taken from my example repo, so hopefully they compile.

  Author: cody koeninger <cody@koeninger.org>
  Closes #14385 from koeninger/SPARK-16312.
* [MINOR][DOC] missing keyword new | Bartek Wiśniewski | 2016-07-27 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  added missing keyword for java example

  ## How was this patch tested?
  wasn't

  Author: Bartek Wiśniewski <wedi@Ava.local>
  Closes #14381 from wedi-dev/quickfix/missing_keyword.
* [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide | Joseph K. Bradley | 2016-07-15 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  Made DataFrame-based API primary
  * Spark doc menu bar and other places now link to ml-guide.html, not mllib-guide.html
  * mllib-guide.html keeps RDD-specific list of features, with a link at the top redirecting people to ml-guide.html
  * ml-guide.html includes a "maintenance mode" announcement about the RDD-based API
  * **Reviewers: please check this carefully**
  * (minor) Titles for DF API no longer include "- spark.ml" suffix. Titles for RDD API have "- RDD-based API" suffix
  * Moved migration guide to ml-guide from mllib-guide
  * Also moved past guides from mllib-migration-guides to ml-migration-guides, with a redirect link on mllib-migration-guides
  * **Reviewers**: I did not change any of the content of the migration guides.

  Reorganized DataFrame-based guide:
  * ml-guide.html mimics the old mllib-guide.html page in terms of content: overview, migration guide, etc.
  * Moved Pipeline description into ml-pipeline.html and moved tuning into ml-tuning.html
  * **Reviewers**: I did not change the content of these guides, except some intro text.
  * Sidebar remains the same, but with pipeline and tuning sections added

  Other:
  * ml-classification-regression.html: Moved text about linear methods to new section in page

  ## How was this patch tested?
  Generated docs locally

  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #14213 from jkbradley/ml-guide-2.0.
* [DOC][SQL] update out-of-date code snippets using SQLContext in all documents. | WeichenXu | 2016-07-06 | 1 | -18/+21
  ## What changes were proposed in this pull request?
  I searched the whole documents directory for uses of SQLContext and updated the following places:
  - docs/configuration.md, sparkR code snippets.
  - docs/streaming-programming-guide.md, several code examples.

  ## How was this patch tested?
  N/A

  Author: WeichenXu <WeichenXu123@outlook.com>
  Closes #14025 from WeichenXu123/WIP_SQLContext_update.
* [SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT POINTS.] | Nirman Narang | 2016-06-15 | 1 | -0/+19
  Updated the SparkStreaming Doc with some important points.

  Author: Nirman Narang <narang@us.ibm.com>
  Closes #11114 from nirmannarang/SPARK-7848.
* [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API | Sean Owen | 2016-06-12 | 1 | -4/+4
  ## What changes were proposed in this pull request?
  - Deprecate old Java accumulator API; should use Scala now
  - Update Java tests and examples
  - Don't bother testing old accumulator API in Java 8 (too)
  - (fix a misspelling too)

  ## How was this patch tested?
  Jenkins tests

  Author: Sean Owen <sowen@cloudera.com>
  Closes #13606 from srowen/SPARK-15086.
* [MINOR] fix typo in documents | WeichenXu | 2016-06-07 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  I used spell-check tools to find typos in the Spark documents and fixed them.

  ## How was this patch tested?
  N/A

  Author: WeichenXu <WeichenXu123@outlook.com>
  Closes #13538 from WeichenXu123/fix_doc_typo.
* [SPARK-15208][WIP][CORE][STREAMING][DOCS] Update Spark examples with AccumulatorV2 | Liwei Lin | 2016-06-02 | 1 | -6/+6
  ## What changes were proposed in this pull request?
  The patch updates the codes & docs in the example module as well as the related doc module:
  - [ ] [docs] `streaming-programming-guide.md`
    - [x] scala code part
    - [ ] java code part
    - [ ] python code part
  - [x] [examples] `RecoverableNetworkWordCount.scala`
  - [ ] [examples] `JavaRecoverableNetworkWordCount.java`
  - [ ] [examples] `recoverable_network_wordcount.py`

  ## How was this patch tested?
  Ran the examples and verified results manually.

  Author: Liwei Lin <lwlin7@gmail.com>
  Closes #12981 from lw-lin/accumulatorV2-examples.
* [DOCS] fix example code issues in documentation | Matthew Wise | 2016-05-30 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  Fixed broken java code examples in streaming documentation

  Attn: tdas

  Author: Matthew Wise <matthew.rs.wise@gmail.com>
  Closes #13388 from mawise/fix_docs_java_streaming_example.
* [MINOR] Fix Typos 'a -> an' | Zheng RuiFeng | 2016-05-26 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  `a` -> `an`

  I use regex to generate potential error lines: `grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml/*/*scala` and review them line by line.

  ## How was this patch tested?
  local build
  `lint-java` checking

  Author: Zheng RuiFeng <ruifengz@foxmail.com>
  Closes #13317 from zhengruifeng/a_an.
* [SPARK-15171][SQL] Remove the references to deprecated method dataset.registerTempTable | Sean Zhong | 2016-05-18 | 1 | -6/+6
  ## What changes were proposed in this pull request?
  Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`.

  ## How was this patch tested?
  This PR only changes the unit test code, examples, and comments. It should be safe. This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged.

  Author: Sean Zhong <seanzhong@databricks.com>
  Closes #13098 from clockfly/spark-15171-remove-deprecation.
* [SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact | cody koeninger | 2016-05-11 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  Renaming the streaming-kafka artifact to include kafka version, in anticipation of needing a different artifact for later kafka versions

  ## How was this patch tested?
  Unit tests

  Author: cody koeninger <cody@koeninger.org>
  Closes #12946 from koeninger/SPARK-15085.
* [SPARK-14342][CORE][DOCS][TESTS] Remove straggler references to Tachyon | Liwei Lin | 2016-04-02 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  Straggler references to Tachyon were removed:
  - for docs, `tachyon` has been generalized as `off-heap memory`;
  - for Mesos test suites, the key-value `tachyon:true`/`tachyon:false` has been changed to `os:centos`/`os:ubuntu`, since `os` is an example constraint used by the [Mesos official docs](http://mesos.apache.org/documentation/attributes-resources/).

  ## How was this patch tested?
  Existing test suites.

  Author: Liwei Lin <lwlin7@gmail.com>
  Closes #12129 from lw-lin/tachyon-cleanup.
* [SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter | Shixiong Zhu | 2016-03-26 | 1 | -54/+13
  ## What changes were proposed in this pull request?
  This PR removes all docs about the old streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter projects since I have already copied them to https://github.com/spark-packages

  Also remove mqtt_wordcount.py that I forgot to remove previously.

  ## How was this patch tested?
  Jenkins PR Build.

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #11824 from zsxwing/remove-doc.
* [SPARK-13888][DOC] Remove Akka Receiver doc and refer to the DStream Akka project | Shixiong Zhu | 2016-03-14 | 1 | -17/+7
  ## What changes were proposed in this pull request?
  I have copied the docs of Streaming Akka to https://github.com/spark-packages/dstream-akka/blob/master/README.md

  So we can remove them from Spark now.

  ## How was this patch tested?
  Only document changes.

  (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #11711 from zsxwing/remove-akka-doc.
* [SPARK-13702][CORE][SQL][MLLIB] Use diamond operator for generic instance creation in Java code. | Dongjoon Hyun | 2016-03-09 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  In order to make `docs/examples` (and other related code) simpler, more readable and more user-friendly, this PR replaces existing code like the following by using the `diamond` operator.

  ```
  - final ArrayList<Product2<Object, Object>> dataToWrite =
  -   new ArrayList<Product2<Object, Object>>();
  + final ArrayList<Product2<Object, Object>> dataToWrite = new ArrayList<>();
  ```

  Java 7 or higher supports the **diamond** operator, which replaces the type arguments required to invoke the constructor of a generic class with an empty set of type parameters (<>). Currently, Spark Java code uses a mix of both styles.

  ## How was this patch tested?
  Manual. Pass the existing tests.

  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #11541 from dongjoon-hyun/SPARK-13702.
* [SPARK-13705][DOCS] UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount | rmishra | 2016-03-07 | 1 | -4/+1
  ## What changes were proposed in this pull request?
  The reference to StatefulNetworkWordCount.scala from updateStatesByKey documentation should be removed, till there is an example for updateStatesByKey.

  ## How was this patch tested?
  Have tested the new documentation with jekyll build.

  Author: rmishra <rmishra@pivotal.io>
  Closes #11545 from rishitesh/SPARK-13705.
* [MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments | Dongjoon Hyun | 2016-02-22 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too.

  ## How was this patch tested?
  manual tests.

  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #11300 from dongjoon-hyun/minor_fix_typos.
* [SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold | Sean Owen | 2016-02-19 | 1 | -2/+2
  Clarify that reduce functions need to be commutative, and fold functions do not

  See https://github.com/apache/spark/pull/11091

  Author: Sean Owen <sowen@cloudera.com>
  Closes #11217 from srowen/SPARK-13339.
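The commutativity requirement named in the commit above can be illustrated with a small plain-Python sketch. It is not Spark code: `combine_partials` is a hypothetical helper that reduces each partition locally and then merges the partial results in a given arrival order, on the assumption that a distributed reduce cannot promise that order.

```python
from functools import reduce
from operator import add


def combine_partials(func, partitions, order):
    """Reduce each partition locally, then merge the partial results
    in the given arrival order (a stand-in for nondeterministic scheduling)."""
    partials = [reduce(func, part) for part in partitions]
    return reduce(func, [partials[i] for i in order])


int_parts = [[1, 2], [3, 4], [5, 6]]
str_parts = [["a", "b"], ["c", "d"], ["e", "f"]]

# Integer addition is commutative and associative: arrival order is irrelevant.
sum_in_order = combine_partials(add, int_parts, [0, 1, 2])
sum_shuffled = combine_partials(add, int_parts, [2, 0, 1])

# String concatenation is associative but NOT commutative: the result
# depends on which partial result arrives first.
str_in_order = combine_partials(add, str_parts, [0, 1, 2])
str_shuffled = combine_partials(add, str_parts, [2, 0, 1])
```

With addition both orders give the same total, while with concatenation the two orders give different strings, which is why a non-commutative function is unsafe to pass to a reduce whose merge order is not fixed.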
* [SPARK-12414][CORE] Remove closure serializer | Sean Owen | 2016-02-10 | 1 | -2/+0
  Remove spark.closure.serializer option and use JavaSerializer always

  CC andrewor14 rxin I see there's a discussion in the JIRA but just thought I'd offer this for a look at what the change would be.

  Author: Sean Owen <sowen@cloudera.com>
  Closes #11150 from srowen/SPARK-12414.
* [SPARK-3369][CORE][STREAMING] Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator | Sean Owen | 2016-01-26 | 1 | -2/+2
  Fix Java function API methods for flatMap and mapPartitions to require producing only an Iterator, not Iterable. Also fix DStream.flatMap to require a function producing TraversableOnce only, not Traversable.

  CC rxin pwendell for API change; tdas since it also touches streaming.

  Author: Sean Owen <sowen@cloudera.com>
  Closes #10413 from srowen/SPARK-3369.
* [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project | Shixiong Zhu | 2016-01-20 | 1 | -2/+2
  Include the following changes:
  1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream
  2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream"
  3. Update the ActorWordCount example and add the JavaActorWordCount example
  4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10744 from zsxwing/streaming-akka-2.
* [SPARK-4819] Remove Guava's "Optional" from public API | Sean Owen | 2016-01-08 | 1 | -1/+0
  Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`)

  See also https://github.com/apache/spark/pull/10512

  Author: Sean Owen <sowen@cloudera.com>
  Closes #10513 from srowen/SPARK-4819.
* [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming | Shixiong Zhu | 2016-01-07 | 1 | -7/+5
  /cc tdas brkyvz

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10453 from zsxwing/streaming-conf.
* [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming | Shixiong Zhu | 2015-12-22 | 1 | -0/+165
  This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing.

  Author: Shixiong Zhu <shixiong@databricks.com>
  Closes #10385 from zsxwing/accumulator-broadcast-example.
* [SPARK-11910][STREAMING][DOCS] Update twitter4j dependency version | Luciano Resende | 2015-11-23 | 1 | -1/+1
  Author: Luciano Resende <lresende@apache.org>
  Closes #9892 from lresende/SPARK-11910.
* [SPARK-11751] Doc describe error in the "Spark Streaming Programming Guide" page | yangping.wu | 2015-11-17 | 1 | -2/+1
  In the **[Task Launching Overheads](http://spark.apache.org/docs/latest/streaming-programming-guide.html#task-launching-overheads)** section,

  > Task Serialization: Using Kryo serialization for serializing tasks can reduce the task sizes, and therefore reduce the time taken to send them to the slaves.

  As we know, **Task Serialization** is configured by the **spark.closure.serializer** parameter, but currently only the Java serializer is supported. If we set **spark.closure.serializer** to **org.apache.spark.serializer.KryoSerializer**, then this will throw an exception.

  Author: yangping.wu <wyphao.2007@163.com>
  Closes #9734 from 397090770/397090770-patch-1.
* [DOCS] Fix typo for Python section on unifying Kafka streams | chriskang90 | 2015-11-09 | 1 | -2/+2
  1) kafkaStreams is a list. The list should be unpacked when passing it into the streaming context union method, which accepts a variable number of streams.
  2) print() should be pprint() for pyspark.

  This contribution is my original work, and I license the work to the project under the project's open source license.

  Author: chriskang90 <jckang@uchicago.edu>
  Closes #9545 from c-kang/streaming_python_typo.
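The unpacking issue described in point 1 of the commit above can be sketched in plain Python. Here `union` is a hypothetical stand-in for a variadic union method and the nested lists are stand-ins for streams; nothing below touches the real streaming API.

```python
def union(*streams):
    """Hypothetical stand-in for a variadic union: merges any number of streams."""
    merged = []
    for stream in streams:
        merged.extend(stream)
    return merged


# Stand-ins for several input streams collected in a list.
kafka_streams = [["a", "b"], ["c"], ["d", "e"]]

# Correct: the star unpacks the list so each stream is its own argument.
unified = union(*kafka_streams)

# Mistake: without the star, the whole list is passed as ONE "stream".
not_unpacked = union(kafka_streams)
```

Without the star the function receives a single argument (the list itself), so the result is a list of lists rather than the merged elements.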
* [SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation about rate limiting and backpressure | Tathagata Das | 2015-09-08 | 1 | -1/+12
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #8656 from tdas/SPARK-10492 and squashes the following commits:
  986cdd6 [Tathagata Das] Added information on backpressure
* [SPARK-10440] [STREAMING] [DOCS] Update python API stuff in the programming guides and python docs | Tathagata Das | 2015-09-04 | 1 | -10/+4
  - Fixed information around Python API tags in streaming programming guides
  - Added missing stuff in python docs

  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #8595 from tdas/SPARK-10440.
* Update streaming-programming-guide.md | Keiji Yoshida | 2015-08-23 | 1 | -1/+1
  Update `See the Scala example` to `See the Java example`.

  Author: Keiji Yoshida <yoshida.keiji.84@gmail.com>
  Closes #8376 from yosssi/patch-1.
* [SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python | Prabeesh K | 2015-08-10 | 1 | -1/+1
  This PR is based on #4229, thanks prabeesh.

  Closes #4229

  Author: Prabeesh K <prabsmails@gmail.com>
  Author: zsxwing <zsxwing@gmail.com>
  Author: prabs <prabsmails@gmail.com>
  Author: Prabeesh K <prabeesh.k@namshi.com>
  Closes #7833 from zsxwing/pr4229 and squashes the following commits:
  9570bec [zsxwing] Fix the variable name and check null in finally
  4a9c79e [zsxwing] Fix pom.xml indentation
  abf5f18 [zsxwing] Merge branch 'master' into pr4229
  935615c [zsxwing] Fix the flaky MQTT tests
  47278c5 [zsxwing] Include the project class files
  478f844 [zsxwing] Add unpack
  5f8a1d4 [zsxwing] Make the maven build generate the test jar for Python MQTT tests
  734db99 [zsxwing] Merge branch 'master' into pr4229
  126608a [Prabeesh K] address the comments
  b90b709 [Prabeesh K] Merge pull request #1 from zsxwing/pr4229
  d07f454 [zsxwing] Register StreamingListerner before starting StreamingContext; Revert unncessary changes; fix the python unit test
  a6747cb [Prabeesh K] wait for starting the receiver before publishing data
  87fc677 [Prabeesh K] address the comments:
  97244ec [zsxwing] Make sbt build the assembly test jar for streaming mqtt
  80474d1 [Prabeesh K] fix
  1f0cfe9 [Prabeesh K] python style fix
  e1ee016 [Prabeesh K] scala style fix
  a5a8f9f [Prabeesh K] added Python test
  9767d82 [Prabeesh K] implemented Python-friendly class
  a11968b [Prabeesh K] fixed python style
  795ec27 [Prabeesh K] address comments
  ee387ae [Prabeesh K] Fix assembly jar location of mqtt-assembly
  3f4df12 [Prabeesh K] updated version
  b34c3c1 [prabs] adress comments
  3aa7fff [prabs] Added Python streaming mqtt word count example
  b7d42ff [prabs] Mqtt streaming support in Python
* [SPARK-9601] [DOCS] Fix JavaPairDStream signature for stream-stream and windowed join in streaming guide doc | Namit Katariya | 2015-08-05 | 1 | -2/+2
  Author: Namit Katariya <katariya.namit@gmail.com>
  Closes #7935 from namitk/SPARK-9601 and squashes the following commits:
  03b5784 [Namit Katariya] [SPARK-9601] Fix signature of JavaPairDStream for stream-stream and windowed join in streaming guide doc
* [SPARK-9490] [DOCS] [MLLIB] MLlib evaluation metrics guide example python code uses deprecated print statement | Sean Owen | 2015-07-31 | 1 | -1/+1
  Use print(x) not print x for Python 3 in eval examples

  CC sethah mengxr -- just wanted to close this out before 1.5

  Author: Sean Owen <sowen@cloudera.com>
  Closes #7822 from srowen/SPARK-9490 and squashes the following commits:
  01abeba [Sean Owen] Change "print x" to "print(x)" in the rest of the docs too
  bd7f7fb [Sean Owen] Use print(x) not print x for Python 3 in eval examples
* [DOCS] Added important updateStateByKey details | Michael Vogiatzis | 2015-07-09 | 1 | -0/+2
  Runs for *all* existing keys and returning "None" will remove the key-value pair.

  Author: Michael Vogiatzis <michaelvogiatzis@gmail.com>
  Closes #7229 from mvogiatzis/patch-1 and squashes the following commits:
  e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
  00283ed [Michael Vogiatzis] Removed space
  c2656f9 [Michael Vogiatzis] Moved description farther up
  0a42551 [Michael Vogiatzis] Added important updateStateByKey details
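The two details in the commit above (the update function runs for every existing key, and returning `None` removes the key) can be mimicked with a plain-Python simulation. This is a sketch of the described semantics only, not Spark's implementation: `apply_batch` and the drop-when-idle policy in `update_count` are hypothetical.

```python
def update_count(new_values, running_count):
    """Update function: return the new state, or None to drop the key."""
    if not new_values and running_count is not None:
        # Hypothetical policy: forget keys that received no data in this batch.
        return None
    return sum(new_values) + (running_count or 0)


def apply_batch(state, batch, update_func):
    """Simulate one batch of a keyed state update in plain Python."""
    new_values = {}
    for key, value in batch:
        new_values.setdefault(key, []).append(value)
    next_state = {}
    # The update function is called for ALL existing keys, not only keys
    # that appear in the current batch.
    for key in set(state) | set(new_values):
        result = update_func(new_values.get(key, []), state.get(key))
        if result is not None:  # None removes the key-value pair
            next_state[key] = result
    return next_state


state = apply_batch({}, [("a", 1), ("b", 1), ("a", 1)], update_count)
state_after = apply_batch(state, [("a", 1)], update_count)
```

In the second batch the key `"b"` has no new values, the update function is still invoked for it, and because it returns `None` the key disappears from the state.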
* [SPARK-8378] [STREAMING] Add the Python API for Flume | zsxwing | 2015-07-01 | 1 | -1/+1
  Author: zsxwing <zsxwing@gmail.com>
  Closes #6830 from zsxwing/flume-python and squashes the following commits:
  78dfdac [zsxwing] Fix the compile error in the test code
  f1bf3c0 [zsxwing] Address TD's comments
  0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly
  e93736b [zsxwing] Fix the test case for determine_modules_to_test
  9d5821e [zsxwing] Fix pyspark_core dependencies
  f9ee681 [zsxwing] Merge branch 'master' into flume-python
  7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py
  b96b0de [zsxwing] Merge branch 'master' into flume-python
  ce85e83 [zsxwing] Fix incompatible issues for Python 3
  01cbb3d [zsxwing] Add import sys
  152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3
  14ba0ff [zsxwing] Add flume-assembly for sbt building
  b8d5551 [zsxwing] Merge branch 'master' into flume-python
  4762c34 [zsxwing] Fix the doc
  0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API
  9f33873 [zsxwing] Add the Python API for Flume
* [SPARK-8320] [STREAMING] Add example in streaming programming guide that shows union of multiple input streams | Neelesh Srinivas Salian | 2015-06-18 | 1 | -0/+8
  Added python code to https://spark.apache.org/docs/latest/streaming-programming-guide.html to the Level of Parallelism in Data Receiving section.

  Please review and let me know if there are any additional changes that are needed. Thank you.

  Author: Neelesh Srinivas Salian <nsalian@cloudera.com>
  Closes #6862 from nssalian/SPARK-8320 and squashes the following commits:
  4bfd126 [Neelesh Srinivas Salian] Changed loop structure to be more in line with Python style
  e5345de [Neelesh Srinivas Salian] Changes to kafak append, for loop and show to print()
  3fc5c6d [Neelesh Srinivas Salian] SPARK-8320