spark - Mirror of Apache Spark

Commit message	Author	Age	Files	Lines
...
\| * \| \| \| \| \| \|	Adding fix covering combineCombinersByKey as well	Patrick Wendell	2014-01-14	1	-4/+12
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	Deprecate rather than remove old combineValuesByKey function	Patrick Wendell	2014-01-14	1	-2/+6
\| \|/ / / / / /
* \| \| \| \| \| \|	Merge pull request #425 from rxin/scaladoc	Reynold Xin	2014-01-14	5	-17/+115
\|\ \ \ \ \ \ \	API doc update & make Broadcast public
\| \|/ / / / / /	In #413, Broadcast was mistakenly made private[spark]. I changed it to public again. Also exposing id publicly, since the R frontend requires it. Copied some of the documentation from the programming guide to the API doc for Broadcast and Accumulator. This should be cherry-picked into branch-0.9 as well for the 0.9.0 release.
\|/\| \| \| \| \| \|
\| * \| \| \| \| \|	Fixed a typo in JavaSparkContext's API doc.	Reynold Xin	2014-01-14	1	-5/+6
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Maintain Serializable API compatibility by reverting back to java.io.Serializable for Broadcast and Accumulator.	Reynold Xin	2014-01-14	2	-1/+2
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Added license header for package.scala in the Java API package.	Reynold Xin	2014-01-14	1	-0/+17
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Added package doc for the Java API.	Reynold Xin	2014-01-14	1	-0/+6
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Updated API doc for Accumulable and Accumulator.	Reynold Xin	2014-01-14	1	-9/+31
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Broadcast variable visibility change & doc update.	Reynold Xin	2014-01-14	2	-3/+54
\|/ / / / / /	Note that previously the Broadcast class was accidentally marked as private[spark]. It needs to be public for broadcast variables to work. Also exposing the broadcast variable id.
* \| \| \| \| \|	Merge pull request #423 from jegonzal/GraphXProgrammingGuide	Reynold Xin	2014-01-14	1	-26/+37
\|\ \ \ \ \ \	Improving the graphx-programming-guide
\| \| \| \| \| \| \|	This PR will track a few minor improvements to the content and formatting of the graphx-programming-guide.
\| * \| \| \| \| \|	Improving the graphx-programming-guide.	Joseph E. Gonzalez	2014-01-14	1	-26/+37
\|/ / / / / /
* \| \| \| \| \|	Merge pull request #420 from pwendell/header-files	Patrick Wendell	2014-01-14	55	-0/+935
\|\ \ \ \ \ \	Add missing header files
\| * \| \| \| \| \|	Add missing header files	Patrick Wendell	2014-01-14	55	-0/+935
\|/ / / / / /
* \| \| \| \| \|	Merge pull request #416 from tdas/filestream-fix	Patrick Wendell	2014-01-14	21	-107/+110
\|\ \ \ \ \ \	Removed unnecessary DStream operations and updated docs
\| \| \|/ / / /	Removed StreamingContext.registerInputStream and registerOutputStream; they were useless. InputDStream has been made to register itself, and just registering a DStream as an output stream causes RDD objects to be created, but the RDDs will not be computed at all. Also made DStream.register() private[streaming] for the same reasons.
\| \|/\| \| \| \|	Updated docs, especially adding package documentation for the streaming package. Also changed NetworkWordCount's input storage level to MEMORY_ONLY, because replication on the local machine causes warning messages (as replication fails), which is scary for a new user trying out his/her first example.
\| * \| \| \| \|	Fixed loose ends in docs.	Tathagata Das	2014-01-14	2	-4/+2
\| \| \| \| \| \|
\| * \| \| \| \|	Merge remote-tracking branch 'apache/master' into filestream-fix	Tathagata Das	2014-01-13	134	-211/+7772
\| \|\ \ \ \ \	Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
\| \| \| \|_\|/ /
\| \| \|/\| \| \|
\| * \| \| \| \|	Removed StreamingContext.registerInputStream and registerOutputStream; they were useless as InputDStream has been made to register itself.	Tathagata Das	2014-01-13	21	-107/+115
\| \| \| \| \| \|	Also made DStream.register() private[streaming], since exposing the confusing function is not useful. Updated a lot of documentation.
\| * \| \| \| \|	Merge remote-tracking branch 'apache/master' into filestream-fix	Tathagata Das	2014-01-13	1	-1/+1
\| \|\ \ \ \ \
\| \| \| \|_\|/ /
\| \| \|/\| \| \|
* \| \| \| \| \|	Merge pull request #415 from pwendell/shuffle-compress	Patrick Wendell	2014-01-13	2	-2/+2
\|\ \ \ \ \ \	Enable compression by default for spills
\| \|_\|_\|/ / /
\|/\| \| \| \| \|
\| * \| \| \| \|	Enable compression by default for spills	Patrick Wendell	2014-01-13	2	-2/+2
\|/ / / / /
* \| \| \| \|	Merge pull request #380 from mateiz/py-bayes	Patrick Wendell	2014-01-13	20	-58/+297
\|\ \ \ \ \	Add Naive Bayes to Python MLlib, and some API fixes
\| \| \| \| \| \|	- Added a Python wrapper for Naive Bayes
\| \| \| \| \| \|	- Updated the Scala Naive Bayes to match the style of our other algorithms better and in particular make it easier to call from Java (added builder pattern, removed default value in train method)
\| \| \| \| \| \|	- Updated Python MLlib functions to not require a SparkContext; we can get that from the RDD the user gives
\| \| \| \| \| \|	- Added a toString method in LabeledPoint
\| \| \| \| \| \|	- Made the Python MLlib tests run as part of run-tests as well (before they could only be run individually through each file)
\| * \| \| \| \|	Disable MLlib tests for now while Jenkins is still on Python 2.6	Matei Zaharia	2014-01-13	1	-5/+5
\| \| \| \| \| \|
\| * \| \| \| \|	Fix Scala version in docs (it was printed as 2.1)	Matei Zaharia	2014-01-12	1	-1/+1
\| \| \| \| \| \|
\| * \| \| \| \|	Update Python required version to 2.7, and mention MLlib support	Matei Zaharia	2014-01-12	1	-1/+7
\| \| \| \| \| \|
\| * \| \| \| \|	Log Python exceptions to stderr as well	Matei Zaharia	2014-01-12	1	-0/+4
\| \| \| \| \| \|	This helps in case the exception happened while serializing a record to be sent to Java, leaving the stream to Java in an inconsistent state where PythonRDD won't be able to read the error.
\| * \| \| \| \|	Added Java unit test, data, and main method for Naive Bayes	Matei Zaharia	2014-01-11	8	-4/+111
\| \| \| \| \| \|	Also fixes the main methods of a few other algorithms to print the final model.
\| * \| \| \| \|	Update some Python MLlib parameters to use camelCase, and tweak docs	Matei Zaharia	2014-01-11	3	-21/+30
\| \| \| \| \| \|	We've used camel case in other Spark methods, so it felt reasonable to keep using it here and make the code match Scala/Java as much as possible. Note that parameter names matter in Python because Python allows passing optional parameters by name.
\| * \| \| \| \|	Add Naive Bayes to Python MLlib, and some API fixes	Matei Zaharia	2014-01-11	10	-37/+150
\| \| \| \| \| \|	- Added a Python wrapper for Naive Bayes
\| \| \| \| \| \|	- Updated the Scala Naive Bayes to match the style of our other algorithms better and in particular make it easier to call from Java (added builder pattern, removed default value in train method)
\| \| \| \| \| \|	- Updated Python MLlib functions to not require a SparkContext; we can get that from the RDD the user gives
\| \| \| \| \| \|	- Added a toString method in LabeledPoint
\| \| \| \| \| \|	- Made the Python MLlib tests run as part of run-tests as well (before they could only be run individually through each file)
* \| \| \| \| \|	Merge pull request #367 from ankurdave/graphx	Patrick Wendell	2014-01-13	76	-21/+7132
\|\ \ \ \ \ \	GraphX: Unifying Graphs and Tables
\| \| \| \| \| \| \|	GraphX extends Spark's distributed fault-tolerant collections API and interactive console with a new graph API which leverages recent advances in graph systems (e.g., [GraphLab](http://graphlab.org)) to enable users to easily and interactively build, transform, and reason about graph-structured data at scale. See http://amplab.github.io/graphx/.
\| \| \| \| \| \| \|	Thanks to @jegonzal, @rxin, @ankurdave, @dcrankshaw, @jianpingjwang, @amatsukawa, @kellrott, and @adamnovak.
\| \| \| \| \| \| \|	Tasks left:
\| \| \| \| \| \| \|	- [x] Graph-level uncache
\| \| \| \| \| \| \|	- [x] Uncache previous iterations in Pregel
\| \| \| \| \| \| \|	- [x] ~~Uncache previous iterations in GraphLab~~ (postponed to post-release)
\| \| \| \| \| \| \|	  - [x] Describe GC issue with GraphLab
\| \| \| \| \| \| \|	- [ ] Write `docs/graphx-programming-guide.md`
\| \| \| \| \| \| \|	  - [x] Mention future Bagel support in docs
\| \| \| \| \| \| \|	  - [ ] Section on caching/uncaching in docs: As with Spark, cache something that is used more than once. In an iterative algorithm, try to cache and force (i.e., materialize) something every iteration, then uncache the cached things that depended on the newly materialized RDD but that won't be referenced again.
\| \| \| \| \| \| \|	- [x] Undo modifications to core collections and instead copy them to org.apache.spark.graphx
\| \| \| \| \| \| \|	- [x] Make Graph serializable to work around capture in Spark shell
\| \| \| \| \| \| \|	- [x] Rename graph -> graphx in package name and subproject
\| \| \| \| \| \| \|	- [x] Remove standalone PageRank
\| \| \| \| \| \| \|	- [x] ~~Fix amplab/graphx#52 by checking `iter.hasNext`~~
\| * \| \| \| \| \|	Adding minimal additional functionality to EdgeRDD	Joseph E. Gonzalez	2014-01-13	1	-0/+17
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	adding documentation about EdgeRDD	Joseph E. Gonzalez	2014-01-13	1	-2/+40
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Fix all code examples in guide	Ankur Dave	2014-01-13	2	-29/+30
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Finish 6f6f8c928ce493357d4d32e46971c5e401682ea8	Ankur Dave	2014-01-13	1	-2/+4
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Fix bug in GraphLoader.edgeListFile that caused srcId > dstId	Ankur Dave	2014-01-13	1	-1/+1
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Edge object must be public for Edge case class	Ankur Dave	2014-01-13	1	-2/+2
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Wrap methods in the appropriate class/object declaration	Ankur Dave	2014-01-13	1	-64/+85
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Write Graph Builders section in guide	Ankur Dave	2014-01-13	1	-5/+49
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Remove K-Core and LDA sections from guide; they are unimplemented	Ankur Dave	2014-01-13	1	-4/+0
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Improve scaladoc links	Ankur Dave	2014-01-13	2	-6/+6
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Fix Pregel SSSP example in programming guide	Ankur Dave	2014-01-13	1	-8/+14
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Fix infinite loop in GraphGenerators.generateRandomEdges	Ankur Dave	2014-01-13	1	-8/+1
\| \| \| \| \| \| \|	The loop occurred when numEdges < numVertices. This commit fixes it by allowing generateRandomEdges to generate a multigraph.
\| * \| \| \| \| \|	Make Graph{,Impl,Ops} serializable to work around capture	Ankur Dave	2014-01-13	3	-3/+3
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Remove Graph.statistics and GraphImpl.printLineage	Ankur Dave	2014-01-13	3	-77/+1
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Finished documenting VertexRDD.	Joseph E. Gonzalez	2014-01-13	1	-0/+53
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx	Reynold Xin	2014-01-13	1	-15/+35
\| \|\ \ \ \ \ \
\| \| * \| \| \| \| \|	Finished second pass on pregel docs.	Joseph E. Gonzalez	2014-01-13	1	-12/+33
\| \| \| \| \| \| \| \|
\| \| * \| \| \| \| \|	Minor changes in graphx programming guide.	Joseph E. Gonzalez	2014-01-13	1	-3/+2
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	Updated doc for PageRank.	Reynold Xin	2014-01-13	1	-47/+39
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	More cleanup.	Reynold Xin	2014-01-13	4	-9/+10
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	Moved SVDPlusPlusConf into SVDPlusPlus object itself.	Reynold Xin	2014-01-13	2	-15/+17
\| \|/ / / / / /