spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	[SPARK-11627] Add initial input rate limit for spark streaming backpressure mechanism	junhao	2016-02-16	1	-1/+8
	https://issues.apache.org/jira/browse/SPARK-11627 The Spark Streaming backpressure mechanism has no initial input rate limit, which can cause an OOM exception. In the first batch, receivers ingest data as fast as they can, which may exhaust executor memory. Adding an initial input rate limit ensures that the streaming job can complete its first batch successfully; after that, the backpressure mechanism adjusts the receiving rate adaptively. Author: junhao <junhao@mogujie.com> Closes #9593 from junhaoMg/junhao-dev.
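The rate-selection idea described above can be sketched in plain Scala. This is a hypothetical helper, not Spark's own rate controller: it assumes the estimator produces no rate until a batch has completed, that an unset initial rate means "no limit", and that a non-positive maximum means "not configured". To our understanding, the corresponding Spark 2.x settings are `spark.streaming.backpressure.initialRate` and `spark.streaming.receiver.maxRate`.

```scala
object BackpressureRateSketch {
  /**
   * Hypothetical helper: the per-receiver ingestion limit in records/sec,
   * or None for "unlimited".
   *  - `estimated` is the backpressure estimator's rate; None until a batch completes.
   *  - `initialRate` bounds ingestion while no estimate exists (the first batch).
   *  - `maxRate` is a hard upper bound; a value <= 0 means "not configured".
   */
  def effectiveRate(estimated: Option[Long], initialRate: Option[Long], maxRate: Long): Option[Long] = {
    // Prefer the estimator's rate; fall back to the initial rate before one exists.
    val candidate = estimated.orElse(initialRate)
    // Apply the hard cap if configured; with no candidate, the cap alone applies.
    if (maxRate > 0) Some(candidate.fold(maxRate)(r => math.min(r, maxRate)))
    else candidate
  }
}
```

With neither an estimate nor an initial rate (and no maximum), the helper returns `None`, which is the unbounded first batch the commit message warns about.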
*	[SPARK-13280][STREAMING] Use a better logger name for FileBasedWriteAheadLog.	Marcelo Vanzin	2016-02-16	1	-5/+15
	The new logger name is under the org.apache.spark namespace. Detection of the caller name was also improved to ignore some common frames that show up in the call stack. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11165 from vanzin/SPARK-13280.
*	[SPARK-13172][CORE][SQL] Stop using RichException.getStackTraceString; it is deprecated	Sean Owen	2016-02-13	1	-2/+2
	Replace `getStackTraceString` with `Utils.exceptionString`. Author: Sean Owen <sowen@cloudera.com> Closes #11182 from srowen/SPARK-13172.
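For readers unfamiliar with the replacement, a helper in the spirit of `Utils.exceptionString` can be written as below. This is a sketch under the assumption that it renders `printStackTrace` output into a `String`; the object name is hypothetical and Spark's own code may differ in detail.

```scala
import java.io.{PrintWriter, StringWriter}

object ExceptionStringSketch {
  /** Renders a throwable's full stack trace, including "Caused by" chains, as a String. */
  def exceptionString(e: Throwable): String =
    if (e == null) ""
    else {
      val sw = new StringWriter()
      // printStackTrace writes e.toString, the stack frames, and any causes.
      e.printStackTrace(new PrintWriter(sw))
      sw.toString
    }
}
```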
*	[STREAMING][TEST] Fix flaky streaming.FailureSuite	Tathagata Das	2016-02-11	2	-2/+6
	In some corner cases, the test suite failed to shut down the SparkContext, causing cascading failures. This fix does two things: - Makes sure no SparkContext is active after every test - Makes sure the StreamingContext is always shut down (which also prevents leaking StreamingContexts, just in case) Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #11166 from tdas/fix-failuresuite.
*	[SPARK-13170][STREAMING] Investigate replacing SynchronizedQueue as it is deprecated	Sean Owen	2016-02-09	3	-20/+34
	Replace SynchronizedQueue with synchronized access to a Queue. Author: Sean Owen <sowen@cloudera.com> Closes #11111 from srowen/SPARK-13170.
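The pattern named above, a plain `Queue` with explicit synchronization, might look like this. `PendingQueue` is a hypothetical class for illustration, not a Spark type.

```scala
import scala.collection.mutable

/** Hypothetical queue of pending items shared between threads. */
class PendingQueue[T] {
  // A plain mutable.Queue; every access is guarded by the queue's own monitor.
  private val queue = new mutable.Queue[T]()

  def push(item: T): Unit = queue.synchronized { queue.enqueue(item) }

  /** Removes and returns the head, or None if empty, as a single atomic step. */
  def poll(): Option[T] = queue.synchronized {
    if (queue.isEmpty) None else Some(queue.dequeue())
  }

  def size: Int = queue.synchronized { queue.size }
}
```

Holding the lock across `isEmpty` and `dequeue` is the point: per-method synchronization, which is what the deprecated trait provided, cannot make such check-then-act sequences atomic.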
*	[SPARK-13165][STREAMING] Replace deprecated synchronizedBuffer in streaming	Holden Karau	2016-02-09	15	-176/+188
	Building with Scala 2.11 results in the warning "trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative". We already use ConcurrentLinkedQueue elsewhere, so let's replace it. Notes for reviewers on how behaviour differs: a Seq obtained from a SynchronizedBuffer by implicit conversion continued to receive updates, but the same conversion done explicitly on a ConcurrentLinkedQueue does not. Hence some of the (internal and test) APIs now take an Iterable instead. toSeq is safe to use once there are no more updates. Author: Holden Karau <holden@us.ibm.com> Author: tedyu <yuzhihong@gmail.com> Closes #11067 from holdenk/SPARK-13165-replace-deprecated-synchronizedBuffer-in-streaming.
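The behavioural difference in the reviewers' note, a copied `Seq` versus a live `Iterable`, can be seen with a plain `ConcurrentLinkedQueue`. This sketch assumes Scala 2.13+ for `scala.jdk.CollectionConverters` (on 2.11/2.12, `scala.collection.JavaConverters` offers the same `asScala`); the object and method names are hypothetical.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.jdk.CollectionConverters._

object QueueSnapshotDemo {
  /** Returns (a Seq copied after the first add, a live Iterable view over the queue). */
  def demo(): (Seq[String], Iterable[String]) = {
    // Thread-safe queue used in place of an ArrayBuffer with SynchronizedBuffer.
    val events = new ConcurrentLinkedQueue[String]()
    events.add("batch-1")
    val snapshot: Seq[String] = events.asScala.toList // copied now: frozen at one element
    val live: Iterable[String] = events.asScala       // a wrapper: sees later additions
    events.add("batch-2")
    (snapshot, live)
  }
}
```

`snapshot` holds only `batch-1`, while iterating `live` yields both entries, which is why callers that must observe later updates are handed an `Iterable`.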
*	[SPARK-13195][STREAMING] Fix NoSuchElementException when a state is not set but timeoutThreshold is defined	Shixiong Zhu	2016-02-04	2	-1/+7
	Check that the state exists before calling get. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11081 from zsxwing/SPARK-13195.
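The guard can be illustrated with a simplified, hypothetical state holder whose `get` throws `NoSuchElementException` when nothing is set, as the commit title implies Spark's state does. The timeout filter checks `exists` before `get`; treating the stored value as a last-updated timestamp is an assumption made only for this sketch.

```scala
/** Hypothetical, simplified per-key state holder (not Spark's State class). */
final class SimpleState[S] {
  private var value: Option[S] = None
  def exists: Boolean = value.isDefined
  // Throws NoSuchElementException when no state has been set.
  def get: S = value.getOrElse(throw new NoSuchElementException("State is not set"))
  def update(s: S): Unit = value = Some(s)
}

object TimeoutCheck {
  /**
   * Hypothetical filter: with a threshold defined, a key times out only if it HAS
   * state and that state (a last-updated time in ms) is older than the threshold.
   * Checking `exists` first means `get` is never called on an unset state.
   */
  def isTimedOut(state: SimpleState[Long], timeoutThreshold: Option[Long]): Boolean =
    timeoutThreshold.exists(threshold => state.exists && state.get < threshold)
}
```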
*	[SPARK-12739][STREAMING] Details of batch in Streaming tab uses two Duration columns	Mario Briggs	2016-02-03	2	-4/+5
	I have clearly prefixed the two 'Duration' columns in the 'Details of Batch' Streaming tab as 'Output Op Duration' and 'Job Duration'. Author: Mario Briggs <mario.briggs@in.ibm.com> Author: mariobriggs <mariobriggs@in.ibm.com> Closes #11022 from mariobriggs/spark-12739.
*	[SPARK-13121][STREAMING] java mapWithState mishandles scala Option	Gabriele Nizzoli	2016-02-02	1	-1/+1
	This change was already merged into the 1.6 branch; this PR commits the same change to master. Author: Gabriele Nizzoli <mail@nizzoli.net> Closes #11028 from gabrielenizzoli/patch-1.
*	[SPARK-6847][CORE][STREAMING] Fix stack overflow issue when updateStateByKey is followed by a checkpointed dstream	Shixiong Zhu	2016-02-01	3	-2/+79
	Add a local property that indicates whether to checkpoint all RDDs marked with the checkpoint flag, and enable it in Streaming. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10934 from zsxwing/recursive-checkpoint.
*	[SPARK-6363][BUILD] Make Scala 2.11 the default Scala version	Josh Rosen	2016-01-30	1	-2/+2
	This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds). The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance). After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break. Author: Josh Rosen <joshrosen@databricks.com> Closes #10608 from JoshRosen/SPARK-6363.
*	[SPARK-3369][CORE][STREAMING] Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator	Sean Owen	2016-01-26	4	-18/+16
	Fix the Java function API methods for flatMap and mapPartitions to require producing only an Iterator, not an Iterable. Also fix DStream.flatMap to require a function producing only TraversableOnce, not Traversable. CC rxin pwendell for the API change; tdas since it also touches streaming. Author: Sean Owen <sowen@cloudera.com> Closes #10413 from srowen/SPARK-3369.
*	[STREAMING][MINOR] Scaladoc + logs	Jacek Laskowski	2016-01-23	4	-6/+5
	Found while doing code review. Author: Jacek Laskowski <jacek@japila.pl> Closes #10878 from jaceklaskowski/streaming-scaladoc-logs-tiny-fixes.
*	[SPARK-11137][STREAMING] Make StreamingContext.stop() exception-safe	jayadevanmurali	2016-01-23	1	-4/+12
	Make StreamingContext.stop() exception-safe. Author: jayadevanmurali <jayadevan.m@tcs.com> Closes #10807 from jayadevanmurali/branch-0.1-SPARK-11137.
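One common way to make a multi-step `stop()` exception-safe is to run each step independently and catch non-fatal errors so later steps still execute. The sketch below is hypothetical, names included, and collects error messages where real code would log them.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object SafeShutdown {
  /**
   * Runs every named shutdown step in order, even if earlier steps throw.
   * Non-fatal exceptions are collected and returned (real code would log them);
   * fatal errors such as OutOfMemoryError still propagate.
   */
  def runAll(steps: Seq[(String, () => Unit)]): List[String] = {
    val failures = ArrayBuffer.empty[String]
    for ((name, step) <- steps) {
      try step()
      catch { case NonFatal(e) => failures += s"$name: ${e.getMessage}" }
    }
    failures.toList
  }
}
```

A failing first step (for example, stopping receivers) no longer prevents a later step (for example, stopping the SparkContext) from running.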
*	[SPARK-12859][STREAMING][WEB UI] Names of input streams with receivers don't fit in Streaming page	Alex Bozarth	2016-01-23	1	-1/+1
	Added a CSS style to force the names of input streams with receivers to wrap. Author: Alex Bozarth <ajbozart@us.ibm.com> Closes #10873 from ajbozarth/spark12859.
*	[SPARK-7997][CORE] Remove Akka from Spark Core and Streaming	Shixiong Zhu	2016-01-22	1	-1/+0
	- Remove the Akka dependency from core. Note: the streaming-akka project still uses Akka. - Remove HttpFileServer - Remove Akka configs from SparkConf and SSLOptions - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it. - Update comments and docs Author: Shixiong Zhu <shixiong@databricks.com> Closes #10854 from zsxwing/remove-akka.
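The renamed setting still bounds how a task result travels back to the driver. A hypothetical sketch of that decision follows; it assumes the limit is expressed in megabytes, as `spark.rpc.message.maxSize` is documented in Spark 2.x, while the `reservedBytes` overhead and the block-id format are illustrative only.

```scala
object TaskResultRouting {
  sealed trait TaskResult
  /** The serialized result travels inside the RPC message itself. */
  final case class Direct(bytes: Array[Byte]) extends TaskResult
  /** The result is stored elsewhere (e.g. a block store); only a reference is sent. */
  final case class Indirect(blockId: String, size: Int) extends TaskResult

  /**
   * Hypothetical rule: send the result directly only if it fits in one RPC message
   * of `maxMessageSizeMb` megabytes, keeping `reservedBytes` for message overhead.
   */
  def route(taskId: Long, bytes: Array[Byte], maxMessageSizeMb: Int, reservedBytes: Int): TaskResult = {
    val limitBytes = maxMessageSizeMb.toLong * 1024 * 1024 - reservedBytes
    if (bytes.length < limitBytes) Direct(bytes)
    else Indirect(s"taskresult_$taskId", bytes.length)
  }
}
```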
*	[SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project	Shixiong Zhu	2016-01-20	3	-332/+1
	Includes the following changes: 1. Add the "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream 2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream" 3. Update the ActorWordCount example and add the JavaActorWordCount example 4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly Author: Shixiong Zhu <shixiong@databricks.com> Closes #10744 from zsxwing/streaming-akka-2.
*	[SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events	Shixiong Zhu	2016-01-20	6	-25/+83
	Includes the following changes: 1. Add StreamingListenerForwardingBus, which processes WrappedStreamingListenerEvents in `onOtherEvent` and forwards them to StreamingListeners 2. Remove StreamingListenerBus 3. Merge AsynchronousListenerBus and LiveListenerBus into a single class, LiveListenerBus 4. Add a `logEvent` method to SparkListenerEvent so that EventLoggingListener can use it to ignore WrappedStreamingListenerEvents Author: Shixiong Zhu <shixiong@databricks.com> Closes #10779 from zsxwing/streaming-listener.
*	[SPARK-10985][CORE] Avoid passing evicted blocks throughout BlockManager	Josh Rosen	2016-01-18	1	-4/+4
	This patch refactors portions of the BlockManager and CacheManager in order to avoid having to pass `evictedBlocks` lists throughout the code. It appears that these lists were only consumed by `TaskContext.taskMetrics`, so the new code now directly updates the metrics from the lower-level BlockManager methods. Author: Josh Rosen <joshrosen@databricks.com> Closes #10776 from JoshRosen/SPARK-10985.
*	[SPARK-12652][PYSPARK] Upgrade Py4J to 0.9.1	Shixiong Zhu	2016-01-12	1	-10/+0
	- [x] Upgrade Py4J to 0.9.1 - [x] SPARK-12657: Revert SPARK-12617 - [x] SPARK-12658: Revert SPARK-12511 - Still keep the change that reads the checkpoint only once. This is a manual change and is worth reviewing carefully. https://github.com/zsxwing/spark/commit/bfd4b5c040eb29394c3132af3c670b1a7272457c - [x] Verify there is no leak any more after reverting our workarounds Author: Shixiong Zhu <shixiong@databricks.com> Closes #10692 from zsxwing/py4j-0.9.1.
*	[SPARK-12692][BUILD][STREAMING] Scala style: Fix the style violation (Space before "," or ":")	Kousuke Saruta	2016-01-11	21	-67/+67
	Fix the style violation (space before , and :). This PR is a follow-up to #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10685 from sarutak/SPARK-12692-followup-streaming.
*	[SPARK-3873][BUILD] Enable import ordering error checking.	Marcelo Vanzin	2016-01-10	3	-5/+4
	Turn import ordering violations into build errors, plus a few adjustments to account for how the checker behaves. I'm a little on the fence about whether the existing code is right, but it's easier to appease the checker than to discuss what the more correct order is here. Plus a few fixes to imports that crept in since my recent cleanups. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10612 from vanzin/SPARK-3873-enable.
*	[SPARK-4819] Remove Guava's "Optional" from public API	Sean Owen	2016-01-08	4	-10/+11
	Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`). See also https://github.com/apache/spark/pull/10512 Author: Sean Owen <sowen@cloudera.com> Closes #10513 from srowen/SPARK-4819.
*	[SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 edition	Sean Owen	2016-01-08	3	-49/+33
	Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs. Author: Sean Owen <sowen@cloudera.com> Closes #10570 from srowen/SPARK-12618.
*	[SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo	Shixiong Zhu	2016-01-07	2	-34/+133
	The default serializer in Kryo is FieldSerializer, which ignores transient fields and never calls `writeObject` or `readObject`. So we should register OpenHashMapBasedStateMap using `DefaultSerializer` to make it work with Kryo. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10609 from zsxwing/SPARK-12591.
*	[SPARK-12510][STREAMING] Refactor ActorReceiver to support Java	Shixiong Zhu	2016-01-07	2	-12/+56
	This PR includes the following changes: 1. Rename `ActorReceiver` to `ActorReceiverSupervisor` 2. Remove `ActorHelper` 3. Add a new `ActorReceiver` for Scala and `JavaActorReceiver` for Java 4. Add `JavaActorWordCount` example Author: Shixiong Zhu <shixiong@databricks.com> Closes #10457 from zsxwing/java-actor-stream.
*	[STREAMING][MINOR] More contextual information in logs + minor code improvements	Jacek Laskowski	2016-01-07	8	-66/+61
	Please review and merge at your convenience. Thanks! Author: Jacek Laskowski <jacek@japila.pl> Closes #10595 from jaceklaskowski/streaming-minor-fixes.
*	[SPARK-7689] Remove TTL-based metadata cleaning in Spark 2.0	Josh Rosen	2016-01-06	3	-24/+12
	This PR removes `spark.cleaner.ttl` and the associated TTL-based metadata cleaning code. Now that we have the `ContextCleaner` and a timer to trigger periodic GCs, I don't think that `spark.cleaner.ttl` is necessary anymore. The TTL-based cleaning isn't enabled by default, isn't included in our end-to-end tests, and has been a source of user confusion when it is misconfigured. If the TTL is set too low, data that is still being used may be evicted or deleted, leading to hard-to-diagnose bugs. For all of these reasons, I think that we should remove this functionality in Spark 2.0. Additional benefits include marginally reduced memory usage, since we no longer need to store timestamps in hashmaps, and a handful fewer threads. Author: Josh Rosen <joshrosen@databricks.com> Closes #10534 from JoshRosen/remove-ttl-based-cleaning.
*	[SPARK-12604][CORE] Java count(ApproxDistinct)ByKey methods return Scala Long not Java	Sean Owen	2016-01-06	2	-12/+13
	Change the return types of Java countByKey and countApproxDistinctByKey to use Java Long, not Scala Long; update similar methods to use java.lang.Long.valueOf consistently, with no API change. Author: Sean Owen <sowen@cloudera.com> Closes #10554 from srowen/SPARK-12604.
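To illustrate why the return type matters to Java callers: a Scala `Map[K, Long]` is seen from Java with `Object` values, whereas boxing each count with `java.lang.Long.valueOf` yields a properly typed `java.util.Map<K, Long>`. The helper below is a hypothetical sketch, not the code from this commit.

```scala
import java.{lang => jl, util => ju}

object JavaLongCounts {
  /**
   * Hypothetical helper: copies Scala counts into a java.util.Map whose values
   * are explicitly boxed java.lang.Long, so Java code needs no casts.
   */
  def toJavaCounts[K](counts: scala.collection.Map[K, Long]): ju.Map[K, jl.Long] = {
    val out = new ju.HashMap[K, jl.Long]()
    counts.foreach { case (k, v) => out.put(k, jl.Long.valueOf(v)) }
    out
  }
}
```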
*	Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url."	Shixiong Zhu	2016-01-06	1	-3/+2
	This reverts commit 19e4e9febf9bb4fd69f6d7bc13a54844e4e096f1. Will merge #10618 instead.
*	[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of default root path to gain the streaming batch url.	huangzhaowei	2016-01-06	1	-2/+3
	Author: huangzhaowei <carlmartinmax@gmail.com> Closes #10617 from SaintBacchus/SPARK-12672.
*	[SPARK-3873][TESTS] Import ordering fixes.	Marcelo Vanzin	2016-01-05	17	-48/+46
	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10582 from vanzin/SPARK-3873-tests.
*	[SPARK-12511] [PYSPARK] [STREAMING] Make sure PythonDStream.registerSerializer is called only once	Shixiong Zhu	2016-01-05	1	-0/+12
	There is an issue where Py4J's PythonProxyHandler.finalize blocks forever (https://github.com/bartdag/py4j/pull/184). Py4J creates a PythonProxyHandler in Java for "transformer_serializer" when "registerSerializer" is called. If we call "registerSerializer" twice, the second PythonProxyHandler overrides the first one, which is then GCed, triggering "PythonProxyHandler.finalize". To avoid that, we should not call "registerSerializer" more than once, so that the Java-side "PythonProxyHandler" is not GCed. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10514 from zsxwing/SPARK-12511.
*	[SPARK-12608][STREAMING] Remove submitJobThreadPool since submitJob doesn't create a separate thread to wait for the job result	Shixiong Zhu	2016-01-04	1	-6/+1
	Before #9264, submitJob would create a separate thread to wait for the job result. `submitJobThreadPool` was a workaround in `ReceiverTracker` to run these waiting-job-result threads. Now that #9264 has been merged to master and has resolved this blocking issue, `submitJobThreadPool` can be removed. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10560 from zsxwing/remove-submitJobThreadPool.
*	[SPARK-12513][STREAMING] SocketReceiver hang in Netcat example	guoxu1231	2016-01-04	1	-14/+24
	Explicitly close the client-side socket connection before restarting the socket receiver. Author: guoxu1231 <guoxu1231@gmail.com> Author: Shawn Guo <guoxu1231@gmail.com> Closes #10464 from guoxu1231/SPARK-12513.
*	[SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs and reflection that supported 1.x	Sean Owen	2016-01-02	3	-11/+5
	Remove use of deprecated Hadoop APIs now that 2.2+ is required. Author: Sean Owen <sowen@cloudera.com> Closes #10446 from srowen/SPARK-12481.
*	[SPARK-3873][STREAMING] Import order fixes for streaming.	Marcelo Vanzin	2015-12-31	53	-125/+126
	Also included a few miscellaneous other modules that had very few violations. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10532 from vanzin/SPARK-3873-streaming.
*	[SPARK-12311][CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property	Kazuaki Ishizaki	2015-12-24	9	-21/+70
	Restore the original value of the os.arch property after each test. Since some tests force the os.arch property to a specific value, we need to restore the original value afterwards. Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #10289 from kiszk/SPARK-12311.
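A save-and-restore helper of the kind this commit calls for might look like the following. `withSystemProperty` is a hypothetical name; the key detail is that a property that was originally unset must be cleared rather than reset, because `System.setProperty` rejects a `null` value.

```scala
object SystemPropertyFixture {
  /**
   * Runs `body` with system property `key` temporarily set to `value`, then
   * restores the previous state: the old value if there was one, otherwise
   * no property at all. Restoration happens even if `body` throws.
   */
  def withSystemProperty[T](key: String, value: String)(body: => T): T = {
    val previous = Option(System.getProperty(key))
    System.setProperty(key, value)
    try body
    finally previous match {
      case Some(v) => System.setProperty(key, v)
      case None    => System.clearProperty(key)
    }
  }
}
```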
*	[MINOR] Fix typos in JavaStreamingContext	Shixiong Zhu	2015-12-21	1	-4/+4
	Author: Shixiong Zhu <shixiong@databricks.com> Closes #10424 from zsxwing/typo.
*	Bump master version to 2.0.0-SNAPSHOT.	Reynold Xin	2015-12-19	1	-1/+1
	Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.
*	[SPARK-11749][STREAMING] Duplicate creating the RDD in file stream when recovering from checkpoint data	jhu-chang	2015-12-17	2	-9/+62
	Add a transient flag, `DStream.restoredFromCheckpointData`, to control the restore process in DStream and avoid duplicate work: `DStream.restoreCheckpointData` checks this flag first and runs the restore process only when it is `false`. Author: jhu-chang <gt.hu.chang@gmail.com> Closes #9765 from jhu-chang/SPARK-11749.
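A flag that makes restoration idempotent can be sketched as follows. `RecoverableStream` is hypothetical, and the diamond-shaped setup (one source stream feeding two downstream streams) is an assumption about how duplicate restores could arise, chosen to make the effect visible.

```scala
import scala.collection.mutable.ArrayBuffer

/** Hypothetical stream node that recreates its batches from checkpoint data. */
class RecoverableStream(val name: String, parents: Seq[RecoverableStream]) extends Serializable {
  // Transient, as in the commit: the flag describes this JVM's recovery only
  // and is not meant to be written into a checkpoint.
  @transient private var restoredFromCheckpointData = false
  val restoredBatches = ArrayBuffer.empty[Long]

  def restoreCheckpointData(checkpointedTimes: Seq[Long]): Unit = {
    // Check the flag first; only the first call actually restores.
    if (!restoredFromCheckpointData) {
      restoredBatches ++= checkpointedTimes // stands in for recreating one RDD per batch time
      restoredFromCheckpointData = true
      parents.foreach(_.restoreCheckpointData(checkpointedTimes))
    }
  }
}
```

If two downstream streams `a` and `b` share one source, restoring both reaches the source twice, but the flag ensures its batches are recreated only once.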
*	[SPARK-12410][STREAMING] Fix places that use '.' and '|' directly in split	Shixiong Zhu	2015-12-17	1	-1/+1
	String.split accepts a regular expression, so we should escape "." and "|". Author: Shixiong Zhu <shixiong@databricks.com> Closes #10361 from zsxwing/reg-bug.
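The pitfall is easy to reproduce in Scala. The object below is illustrative only; it contrasts the raw regex call with escaped, quoted, and `Char`-based alternatives (the `Char` overload comes from Scala's `StringOps`, which escapes the separator for you).

```scala
import java.util.regex.Pattern

object SplitEscaping {
  // String.split(String) treats its argument as a regular expression:
  // "." matches any character, and "|" is alternation of two empty patterns.
  def naiveDot(s: String): Array[String] = s.split(".")

  // Escaping the metacharacter, or quoting the whole separator, splits literally.
  def escapedDot(s: String): Array[String] = s.split("\\.")
  def quotedPipe(s: String): Array[String] = s.split(Pattern.quote("|"))

  // Scala's StringOps.split(Char) overload escapes the character itself.
  def charPipe(s: String): Array[String] = s.split('|')
}
```

`naiveDot("host.example.com")` returns an empty array: every character matches the pattern, so all tokens are empty, and trailing empty strings are discarded.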
*	[SPARK-12304][STREAMING] Make Spark Streaming web UI display more friendly Receiver graphs	proflin	2015-12-15	1	-1/+7
	Currently, the Spark Streaming web UI uses the same maxY when displaying 'Input Rate Times & Histograms' and 'Per-Receiver Times & Histograms'. This can lead to unfriendly graphs: once we have tens of Receivers or more, every 'Per-Receiver Times' line almost hits the ground. This issue proposes calculating a new maxY, separate from the original one, that is shared among all the 'Per-Receiver Times & Histograms' graphs. Before: ![before-5](https://cloud.githubusercontent.com/assets/15843379/11761362/d790c356-a0fa-11e5-860e-4b834603de1d.png) After: ![after-5](https://cloud.githubusercontent.com/assets/15843379/11761361/cfabf692-a0fa-11e5-97d0-4ad124aaca2a.png) Author: proflin <proflin.me@gmail.com> Closes #10318 from proflin/SPARK-12304.
*	[STREAMING][MINOR] Fix typo in function name of StateImpl	jerryshao	2015-12-15	3	-3/+3
	cc tdas zsxwing, please review. Thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #10305 from jerryshao/fix-typo-state-impl.
*	[SPARK-12273][STREAMING] Make Spark Streaming web UI list Receivers in order	proflin	2015-12-11	1	-2/+3
	Currently the Streaming web UI does NOT list Receivers in order; however, it seems more convenient for the users if Receivers are listed in order. ![spark-12273](https://cloud.githubusercontent.com/assets/15843379/11736602/0bb7f7a8-a00b-11e5-8e86-96ba9297fb12.png) Author: proflin <proflin.me@gmail.com> Closes #10264 from proflin/Spark-12273.
*	[SPARK-11713] [PYSPARK] [STREAMING] Initial RDD updateStateByKey for PySpark	Bryan Cutler	2015-12-10	1	-2/+12
	Add the ability to define an initial state RDD for use with updateStateByKey in PySpark. Added a unit test and changed the stateful_network_wordcount example to use an initial RDD. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #10082 from BryanCutler/initial-rdd-updateStateByKey-SPARK-11713.
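The semantics of an initial state can be modelled without Spark. This single-machine sketch is hypothetical: its update function has the `(Seq[V], Option[S]) => Option[S]` shape used by the Scala `updateStateByKey` API, and the `previous` map plays the role of the initial state RDD for the first batch.

```scala
object StatefulWordCountSketch {
  /**
   * Hypothetical local model of updateStateByKey: for every key in the batch or
   * already in the state, call update(newValues, previousState).
   * Returning None drops the key from the state.
   */
  def updateStateByKey[K, V, S](
      previous: Map[K, S],
      batch: Seq[(K, V)],
      update: (Seq[V], Option[S]) => Option[S]): Map[K, S] = {
    val grouped: Map[K, Seq[V]] = batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    val keys = previous.keySet ++ grouped.keySet
    keys.iterator.flatMap { k =>
      update(grouped.getOrElse(k, Seq.empty), previous.get(k)).map(k -> _)
    }.toMap
  }

  /** Running word count: add this batch's counts to the previous total. */
  val sumCounts: (Seq[Int], Option[Int]) => Option[Int] =
    (values, state) => Some(values.sum + state.getOrElse(0))
}
```

Starting from `Map("hello" -> 1, "world" -> 1)` and a batch containing "hello" twice and "spark" once, the new state is `hello -> 3`, `world -> 1`, `spark -> 1`.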
*	[SPARK-12136][STREAMING] rddToFileName does not properly handle prefix and suffix parameters	bomeng	2015-12-10	1	-6/+7
	The original code does not properly handle the case where the prefix is null but the suffix is not: the suffix should be used but is not. The fix uses a StringBuilder to construct the proper file name. Author: bomeng <bmeng@us.ibm.com> Author: Bo Meng <mengbo@bos-macbook-pro.usca.ibm.com> Closes #10185 from bomeng/SPARK-12136.
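The prefix/suffix handling can be sketched with a `StringBuilder`. This assumes the `<prefix>-<time>.<suffix>` naming format with either part optional; the timestamp is a plain `Long` in milliseconds here instead of Spark's `Time`, and the enclosing object is hypothetical.

```scala
object RddFileNames {
  /**
   * Builds "<prefix>-<timeMs>.<suffix>", omitting the prefix and/or suffix part
   * when it is null or empty. Each part is handled independently.
   */
  def rddToFileName(prefix: String, suffix: String, timeMs: Long): String = {
    val sb = new StringBuilder
    if (prefix != null && prefix.nonEmpty) sb.append(prefix).append('-')
    sb.append(timeMs)
    if (suffix != null && suffix.nonEmpty) sb.append('.').append(suffix)
    sb.toString
  }
}
```

`rddToFileName(null, "txt", 1000L)` yields `1000.txt`, the null-prefix case the commit message describes.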
*	[SPARK-12244][SPARK-12245][STREAMING] Rename trackStateByKey to mapWithState and change tracking function signature	Tathagata Das	2015-12-09	10	-358/+367
	SPARK-12244: Based on feedback from early users and personal experience attempting to explain it, the name trackStateByKey had two problems: "trackState" is a completely new term that does not give any intuition about what the operation does, and the resultant data stream of objects returned by the function is called the "emitted" data in the docs, for lack of a better term. "mapWithState" makes sense because the API is like a mapping function, (Key, Value) => T, with State as an additional parameter, and the resultant data stream is "mapped data". So both problems are solved. SPARK-12245: From initial experience, not having the key in the function makes it hard to return mapped data, as the full information of the record is not available. Basically, the user is restricted to doing something like mapValue() instead of map(). So the key is added as a parameter. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #10224 from tdas/rename.
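The new function shape, with the key passed in alongside an optional value and the state, can be simulated in plain Scala. To our understanding this shape appears in Spark 2.x as `StateSpec.function`; everything below (`SimpleState`, `mapWithState` over a local map) is a hypothetical model, not Spark's implementation.

```scala
import scala.collection.mutable

object MapWithStateSketch {
  /** Hypothetical minimal state holder, loosely modeled on a per-key state. */
  final class SimpleState[S](private var value: Option[S]) {
    def exists: Boolean = value.isDefined
    def get: S = value.getOrElse(throw new NoSuchElementException("State is not set"))
    def update(s: S): Unit = value = Some(s)
  }

  /**
   * Applies a (key, value, state) => mapped function to each record of one batch,
   * in order, writing updated states back into `states` and returning mapped records.
   */
  def mapWithState[K, V, S, T](
      states: mutable.Map[K, S],
      batch: Seq[(K, V)],
      func: (K, Option[V], SimpleState[S]) => T): Seq[T] =
    batch.map { case (k, v) =>
      val state = new SimpleState[S](states.get(k))
      val mapped = func(k, Some(v), state)
      if (state.exists) states(k) = state.get
      mapped
    }

  /** Running word count that emits (word, total); emitting the word needs the key. */
  val runningCount: (String, Option[Int], SimpleState[Int]) => (String, Int) =
    (word, one, state) => {
      val sum = one.getOrElse(0) + (if (state.exists) state.get else 0)
      state.update(sum)
      (word, sum)
    }
}
```

Because `runningCount` receives the key, it can emit `(word, total)` pairs directly instead of being limited to `mapValue()`-style output, which is the point made in SPARK-12245.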
*	[SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present	Tathagata Das	2015-12-07	6	-84/+258
	The reason is that TrackStateRDDs generated by trackStateByKey expect the previous batch's TrackStateRDDs to have a partitioner. However, when recovering from DStream checkpoints, the RDDs recovered from RDD checkpoints do not have a partitioner attached, because RDD checkpoints do not preserve the partitioner (SPARK-12004). While #9983 solves SPARK-12004 by preserving the partitioner through RDD checkpoints, there may be a non-zero chance that saving or recovery fails. To be resilient, this PR repartitions the previous state RDD if no partitioner is detected. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #9988 from tdas/SPARK-11932.
*	[SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when Jenkins load is high	Burak Yavuz	2015-12-07	2	-6/+14
	We need to make sure that the last entry is indeed the last entry in the queue. Author: Burak Yavuz <brkyvz@gmail.com> Closes #10110 from brkyvz/batch-wal-test-fix.