spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Various broken links in documentation	Patrick Wendell	2013-12-07	1	-4/+4
\|
*	Add a `repartition` operator.	Patrick Wendell	2013-10-24	1	-0/+4
\|	This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful:
\|	1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but they want to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing.
\|	2. If a user has input data where the number of partitions is not known. E.g. > sc.textFile("some file").coalesce(50).... This is both vague semantically (am I growing or shrinking this RDD) but also, may not work correctly if the base RDD has fewer than 50 partitions.
\|	The new operator forces shuffles every time, so it will always produce exactly the number of new partitions. It also throws an exception rather than silently not-working if a bad input is passed.
\|	I am currently adding streaming tests (requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.
*	Add docs for standalone scheduler fault tolerance	Aaron Davidson	2013-10-08	1	-3/+2
\|	Also fix a couple HTML/Markdown issues in other files.
*	More fixes	Matei Zaharia	2013-09-01	1	-2/+10
\|
*	Fix more URLs in docs	Matei Zaharia	2013-09-01	1	-3/+3
\|
*	Update docs for new package	Matei Zaharia	2013-09-01	1	-14/+14
\|
*	Change build and run instructions to use assemblies	Matei Zaharia	2013-08-29	1	-2/+2
\|	This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc).
\|	This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.
*	Linking custom receiver guide	Prashant Sharma	2013-08-23	1	-0/+3
\|
*	Fixes typos in Spark Streaming Programming Guide	Andy Konwinski	2013-07-12	1	-2/+2
\|	These typos were reported on the spark-users mailing list, see: https://groups.google.com/d/msg/spark-users/SyLGgJlKCrI/LpeBypOkSMUJ
*	Typos: cluser -> cluster	Andrew Ash	2013-04-10	1	-2/+2
\|
*	More doc tweaks	Matei Zaharia	2013-02-26	1	-0/+1
\|
*	Merge pull request #500 from pwendell/streaming-docs	Tathagata Das	2013-02-25	1	-2/+2
\|\	Minor changes based on feedback
\| *	meta-data	Patrick Wendell	2013-02-25	1	-1/+1
\| \|
\| *	One more change done with TD	Patrick Wendell	2013-02-25	1	-1/+1
\| \|
\| *	Minor changes based on feedback	Patrick Wendell	2013-02-25	1	-2/+2
\| \|
* \|	Merge branch 'master' of github.com:mesos/spark	Matei Zaharia	2013-02-25	1	-4/+6
\|\\|
\| *	Some changes to streaming failure docs.	Patrick Wendell	2013-02-25	1	-4/+6
\| \|	TD gave me the go-ahead to just make these changes:
\| \|	- Define stateful dstream
\| \|	- Some minor wording fixes
* \|	Allow passing sparkHome and JARs to StreamingContext constructor	Matei Zaharia	2013-02-25	1	-7/+3
\| \|	Also warns if spark.cleaner.ttl is not set in the version where you pass your own SparkContext.
* \|	Some tweaks to docs	Matei Zaharia	2013-02-25	1	-5/+5
\|/
*	Fixed class paths and dependencies based on Matei's comments.	Tathagata Das	2013-02-24	1	-3/+3
\|
*	Updated streaming programming guide with Java API info, and comments from Patrick.	Tathagata Das	2013-02-23	1	-11/+74
*	Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs.	Tathagata Das	2013-02-23	1	-1/+1
\|
*	Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver.	Tathagata Das	2013-02-18	1	-5/+5
*	Added checkpointing and fault-tolerance semantics to the programming guide.	Tathagata Das	2013-02-18	1	-52/+194
\|	Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs.
*	Added documentation for PairDStreamFunctions.	Tathagata Das	2013-01-13	1	-20/+25
\|
*	Renamed examples and added documentation.	Tathagata Das	2013-01-07	1	-7/+7
\|
*	Updated Streaming Programming Guide.	Tathagata Das	2013-01-01	1	-13/+154
\|
*	Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.	Tathagata Das	2012-12-29	1	-26/+30
*	Streaming programming guide. STREAMING-2 #resolve	Patrick Wendell	2012-11-13	1	-0/+163
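
The "Add a `repartition` operator." commit (2013-10-24) in the log above contrasts `repartition` with `coalesce`. The behaviour it describes can be sketched with a small plain-Python model. This is only an illustration under stated assumptions, not Spark's implementation: partitions are modelled as ordinary lists, the round-robin placement is a stand-in for a real shuffle, and treating a non-positive partition count as the "bad input" is an assumption, since the commit message does not say which inputs are rejected.

```python
from typing import List


def coalesce(parts: List[list], num_partitions: int, shuffle: bool = False) -> List[list]:
    """Toy stand-in for coalesce; each partition is just a Python list."""
    if shuffle:
        # Full redistribution: every element is reassigned round-robin,
        # so the partition count can grow as well as shrink.
        shuffled = [[] for _ in range(num_partitions)]
        flat = [x for part in parts for x in part]
        for i, x in enumerate(flat):
            shuffled[i % num_partitions].append(x)
        return shuffled
    if num_partitions >= len(parts):
        # Without a shuffle, partitions can only be merged, never split,
        # so asking for more partitions leaves the count unchanged.
        return [list(part) for part in parts]
    merged = [[] for _ in range(num_partitions)]
    for i, part in enumerate(parts):
        merged[i % num_partitions].extend(part)
    return merged


def repartition(parts: List[list], num_partitions: int) -> List[list]:
    """Toy stand-in for repartition: always shuffles, always yields num_partitions."""
    if num_partitions <= 0:
        # Assumption: the commit message does not specify what counts as bad input.
        raise ValueError("num_partitions must be positive")
    return coalesce(parts, num_partitions, shuffle=True)


if __name__ == "__main__":
    parts = [[1, 2, 3], [4, 5]]
    print(len(coalesce(parts, 50)))     # 2: cannot grow without a shuffle
    print(len(repartition(parts, 50)))  # 50: exactly the requested count
```

In this model, `coalesce(parts, 50)` quietly keeps two partitions, which mirrors the "may not work correctly if the base RDD has fewer than 50 partitions" case from the commit message, while `repartition` either returns exactly the requested number of partitions or raises.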