spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	[maven-release-plugin] prepare release v1.0.0-rc11	Tathagata Das	2014-05-25	1	-1/+1
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc10"	Tathagata Das	2014-05-25	1	-1/+1
	This reverts commit d807023479ce10aec28ef3c1ab646ddefc2e663c.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-25	1	-1/+1
	This reverts commit 67dd53d2556f03ce292e6889128cf441f1aa48f8.
*	[maven-release-plugin] prepare for next development iteration	Tathagata Das	2014-05-20	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc10	Tathagata Das	2014-05-20	1	-1/+1
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc9"	Tathagata Das	2014-05-19	1	-1/+1
	This reverts commit 920f947eb5a22a679c0c3186cf69ee75f6041c75.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Tathagata Das	2014-05-19	1	-1/+1
	This reverts commit f8e611955096c5c1c7db5764b9d2851b1d295f0d.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-17	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc9	Patrick Wendell	2014-05-17	1	-1/+1
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc8"	Patrick Wendell	2014-05-16	1	-1/+1
	This reverts commit 80eea0f111c06260ffaa780d2f3f7facd09c17bc.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-16	1	-1/+1
	This reverts commit e5436b8c1a79ce108f3af402455ac5f6dc5d1eb3.
*	bugfix: overflow of graphx Edge compare function	Zhen Peng	2014-05-16	2	-2/+47
	Author: Zhen Peng <zhenpeng01@baidu.com>
	Closes #769 from zhpengg/bugfix-graphx-edge-compare and squashes the following commits:
	8a978ff [Zhen Peng] add ut for graphx Edge.lexicographicOrdering.compare
	413c258 [Zhen Peng] there maybe a overflow for two Long's substraction
	(cherry picked from commit fa6de408a131a3e84350a60af74a92c323dfc5eb)
	Signed-off-by: Reynold Xin <rxin@apache.org>
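The kind of overflow this commit describes can be illustrated with a minimal Java sketch. It is not the GraphX code (which is Scala), and the class and method names are hypothetical. It shows why a comparator that takes its sign from the difference of two `long` values can report the wrong order, and how a direct comparison avoids the problem.

```java
// Illustrative sketch only; not the GraphX code, which is written in Scala.
final class LongCompareSketch {
    // Subtraction-based comparison: the sign is wrong whenever a - b overflows.
    public static int naiveCompare(long a, long b) {
        return Long.signum(a - b);
    }

    // Overflow-safe comparison: compare the values directly, never subtract.
    public static int safeCompare(long a, long b) {
        return Long.compare(a, b);
    }

    // Lexicographic order on (src, dst) id pairs, built from the safe comparison.
    public static int compareEdges(long srcA, long dstA, long srcB, long dstB) {
        int bySrc = Long.compare(srcA, srcB);
        return bySrc != 0 ? bySrc : Long.compare(dstA, dstB);
    }

    public static void main(String[] args) {
        long a = Long.MIN_VALUE;
        long b = 1L;
        // a < b, but a - b wraps around to Long.MAX_VALUE, so the naive sign is positive.
        System.out.println("naive: " + naiveCompare(a, b));
        System.out.println("safe:  " + safeCompare(a, b));
    }
}
```

A sort that relies on the subtraction-based variant can silently misorder elements whose ids are far apart, which is why the direct comparison is usually preferred.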
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-16	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc8	Patrick Wendell	2014-05-16	1	-1/+1
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc7"	Patrick Wendell	2014-05-16	1	-1/+1
	This reverts commit 9212b3e5bb5545ccfce242da8d89108e6fb1c464.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-16	1	-1/+1
	This reverts commit c4746aa6fe4aaf383e69e34353114d36d1eb9ba6.
*	Fixes a misplaced comment.	Prashant Sharma	2014-05-15	1	-2/+2
	Fixes a misplaced comment from #785. @pwendell
	Author: Prashant Sharma <prashant.s@imaginea.com>
	Closes #788 from ScrapCodes/patch-1 and squashes the following commits:
	3ef6a69 [Prashant Sharma] Update package-info.java
	67d9461 [Prashant Sharma] Update package-info.java
	(cherry picked from commit e1e3416c4e5f6f32983597d74866dbb809cf6a5e)
	Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-15	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc7	Patrick Wendell	2014-05-15	1	-1/+1
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc6"	Patrick Wendell	2014-05-14	1	-1/+1
	This reverts commit 54133abdce0246f6643a1112a5204afb2c4caa82.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-14	1	-1/+1
	This reverts commit e480bcfbd269ae1d7a6a92cfb50466cf192fe1fb.
*	Package docs	Prashant Sharma	2014-05-14	5	-0/+110
	These are a few changes based on the original patch by @scrapcodes.
	Author: Prashant Sharma <prashant.s@imaginea.com>
	Author: Patrick Wendell <pwendell@gmail.com>
	Closes #785 from pwendell/package-docs and squashes the following commits:
	c32b731 [Patrick Wendell] Changes based on Prashant's patch
	c0463d3 [Prashant Sharma] added eof new line
	ce8bf73 [Prashant Sharma] Added eof new line to all files.
	4c35f2e [Prashant Sharma] SPARK-1563 Add package-info.java and package.scala files for all packages that appear in docs
	(cherry picked from commit 46324279dae2fa803267d788f7c56b0ed643b4c8)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-14	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc6	Patrick Wendell	2014-05-14	1	-1/+1
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc5"	Patrick Wendell	2014-05-14	1	-1/+1
	This reverts commit 18f062303303824139998e8fc8f4158217b0dbc3.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-14	1	-1/+1
	This reverts commit d08e9604fc9958b7c768e91715c8152db2ed6fd0.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-13	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc5	Patrick Wendell	2014-05-13	1	-1/+1
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc4"	Patrick Wendell	2014-05-12	1	-1/+1
	This reverts commit 3d0a44833ab50360bf9feccc861cb5e8c44a4866.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-12	1	-1/+1
	This reverts commit 9772d85c6f3893d42044f4bab0e16f8b6287613a.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-13	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc4	Patrick Wendell	2014-05-13	1	-1/+1
*	Rollback versions for 1.0.0-rc4	Patrick Wendell	2014-05-12	1	-1/+1
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-12	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc4	Patrick Wendell	2014-05-12	1	-1/+1
*	SPARK-1798. Tests should clean up temp files	Sean Owen	2014-05-12	1	-1/+1
	Three issues related to temp files that tests generate; these should be touched up for hygiene but are not urgent.
	1. Modules have a log4j.properties which directs the unit-test.log output file to a directory like `[module]/target/unit-test.log`. But this ends up creating `[module]/[module]/target/unit-test.log` instead of the former.
	2. The `work/` directory is not deleted by "mvn clean", in the parent and in modules. Neither is the `checkpoint/` directory created under the various external modules.
	3. Many tests create a temp directory, which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and trying to call `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method.
	_If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._
	Author: Sean Owen <sowen@cloudera.com>
	Closes #732 from srowen/SPARK-1798 and squashes the following commits:
	5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
	b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
	bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
	(cherry picked from commit 7120a2979d0a9f0f54a88b2416be7ca10e74f409)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
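The temp-directory cleanup described in this commit message can be sketched roughly as follows. This is a minimal, self-contained Java illustration under stated assumptions: `TempDirSketch`, `createTempDir`, and the local `deleteRecursively` are hypothetical stand-ins, not Spark's `Utils.deleteRecursively`, and a `try`/`finally` block stands in for a test framework's `@After` method.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Illustrative sketch only; the helper names are hypothetical.
final class TempDirSketch {
    // Create a temp directory and register it for deletion at JVM exit.
    // deleteOnExit() alone removes a directory only if it is empty by then,
    // so explicit recursive cleanup is still needed.
    public static File createTempDir() throws IOException {
        File dir = Files.createTempDirectory("test-tmp-").toFile();
        dir.deleteOnExit();
        return dir;
    }

    // Delete children first, then the file or directory itself.
    public static void deleteRecursively(File file) {
        File[] children = file.listFiles(); // null when 'file' is not a directory
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (!file.delete() && file.exists()) {
            throw new IllegalStateException("Failed to delete " + file);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = createTempDir();
        try {
            // Simulate a test writing output into its temp directory.
            File out = new File(dir, "part-00000");
            Files.write(out.toPath(), "data".getBytes(StandardCharsets.UTF_8));
        } finally {
            // Plays the role of an @After method: always clean up.
            deleteRecursively(dir);
        }
        System.out.println("still exists: " + dir.exists());
    }
}
```

Registering `deleteOnExit()` at creation is only a safety net; the explicit recursive delete is what removes a directory that still has contents.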
*	SPARK-1786: Reopening PR 724	Ankur Dave	2014-05-12	11	-23/+44
	Addressing issue in MimaBuild.scala.
	Author: Ankur Dave <ankurdave@gmail.com>
	Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>
	Closes #742 from jegonzal/edge_partition_serialization and squashes the following commits:
	8ba6e0d [Ankur Dave] Add concatenation operators to MimaBuild.scala
	cb2ed3a [Joseph E. Gonzalez] addressing missing exclusion in MimaBuild.scala
	5d27824 [Ankur Dave] Disable reference tracking to fix serialization test
	c0a9ae5 [Ankur Dave] Add failing test for EdgePartition Kryo serialization
	a4a3faa [Joseph E. Gonzalez] Making EdgePartition serializable.
	(cherry picked from commit 0e2bde2030f8e455c5a269fc38d4ff05b395ca32)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Revert "SPARK-1786: Edge Partition Serialization"	Patrick Wendell	2014-05-12	11	-44/+23
	This reverts commit 09e7aa4eed8834b446c0f59ebfc1034e1f109ed6.
*	SPARK-1786: Edge Partition Serialization	Ankur Dave	2014-05-11	11	-23/+44
	This appears to address the issue with edge partition serialization. The solution appears to be just registering the `PrimitiveKeyOpenHashMap`. However I noticed that we appear to have forked that code in GraphX but retained the same name (which is confusing). I also renamed our local copy to `GraphXPrimitiveKeyOpenHashMap`. We should consider dropping that and using the one in Spark if possible.
	Author: Ankur Dave <ankurdave@gmail.com>
	Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>
	Closes #724 from jegonzal/edge_partition_serialization and squashes the following commits:
	b0a525a [Ankur Dave] Disable reference tracking to fix serialization test
	bb7f548 [Ankur Dave] Add failing test for EdgePartition Kryo serialization
	67dac22 [Joseph E. Gonzalez] Making EdgePartition serializable.
	(cherry picked from commit a6b02fb7486356493474c7f42bb714c9cce215ca)
	Signed-off-by: Matei Zaharia <matei@databricks.com>
*	Fix error in 2d Graph Partitioner	Joseph E. Gonzalez	2014-05-11	1	-2/+2
	There was a minor bug in which negative partition ids could be generated when constructing a 2D partitioning of a graph. This could lead to an inefficient 2D partition for large vertex id values.
	Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com>
	Closes #709 from jegonzal/fix_2d_partitioning and squashes the following commits:
	937c562 [Joseph E. Gonzalez] fixing bug in 2d partitioning algorithm where negative partition ids could be generated.
	(cherry picked from commit f938a155b2a9c126b292d5403aca31de83d5105a)
	Signed-off-by: Matei Zaharia <matei@databricks.com>
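One way negative partition ids can arise in a grid-style partitioner is sketched below in Java. This is an assumption-laden illustration, not the GraphX implementation or necessarily the fix made in this commit: the mixing constant, the grid layout, and all names are hypothetical. The point is that a multiplicative hash of a `long` id can wrap to a negative value, and Java's `%` keeps the sign of the dividend, whereas `Math.floorMod` always yields a value in `[0, n)`.

```java
// Illustrative sketch only; constant, layout, and names are hypothetical.
final class GridPartitionSketch {
    // Hypothetical mixing constant used to spread ids across the grid.
    static final long MIXING_PRIME = 1125899906842597L;

    // Naive column: abs() is applied before the multiplication, so the product
    // can still wrap to a negative value, and % preserves that sign.
    public static int naiveColumn(long vertexId, int gridSide) {
        return (int) ((Math.abs(vertexId) * MIXING_PRIME) % gridSide);
    }

    // Safe column: floorMod maps any long, including wrapped products, into [0, gridSide).
    public static int safeColumn(long vertexId, int gridSide) {
        return (int) Math.floorMod(vertexId * MIXING_PRIME, (long) gridSide);
    }

    // Assign an edge (src, dst) to a cell of a ceil(sqrt(numParts)) square grid.
    public static int partition(long src, long dst, int numParts) {
        int side = (int) Math.ceil(Math.sqrt(numParts));
        int col = safeColumn(src, side);
        int row = safeColumn(dst, side);
        return (col * side + row) % numParts;
    }

    public static void main(String[] args) {
        long id = 10_000L; // large enough that id * MIXING_PRIME exceeds Long.MAX_VALUE and wraps
        System.out.println("naive column: " + naiveColumn(id, 7));
        System.out.println("safe column:  " + safeColumn(id, 7));
        System.out.println("partition:    " + partition(id, 42L, 10));
    }
}
```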
*	Unify GraphImpl RDDs + other graph load optimizations	Ankur Dave	2014-05-10	26	-843/+1337
	This PR makes the following changes, primarily in e4fbd329aef85fe2c38b0167255d2a712893d683:
	1. Unify RDDs to avoid zipPartitions. A graph used to be four RDDs: vertices, edges, routing table, and triplet view. This commit merges them down to two: vertices (with routing table), and edges (with replicated vertices).
	2. Avoid duplicate shuffle in graph building. We used to do two shuffles when building a graph: one to extract routing information from the edges and move it to the vertices, and another to find nonexistent vertices referred to by edges. With this commit, the latter is done as a side effect of the former.
	3. Avoid no-op shuffle when joins are fully eliminated. This is a side effect of unifying the edges and the triplet view.
	4. Join elimination for mapTriplets.
	5. Ship only the needed vertex attributes when upgrading the triplet view. If the triplet view already contains source attributes, and we now need both attributes, only ship destination attributes rather than re-shipping both. This is done in `ReplicatedVertexView#upgrade`.
	Author: Ankur Dave <ankurdave@gmail.com>
	Closes #497 from ankurdave/unify-rdds and squashes the following commits:
	332ab43 [Ankur Dave] Merge remote-tracking branch 'apache-spark/master' into unify-rdds
	4933e2e [Ankur Dave] Exclude RoutingTable from binary compatibility check
	5ba8789 [Ankur Dave] Add GraphX upgrade guide from Spark 0.9.1
	13ac845 [Ankur Dave] Merge remote-tracking branch 'apache-spark/master' into unify-rdds
	a04765c [Ankur Dave] Remove unnecessary toOps call
	57202e8 [Ankur Dave] Replace case with pair parameter
	75af062 [Ankur Dave] Add explicit return types
	04d3ae5 [Ankur Dave] Convert implicit parameter to context bound
	c88b269 [Ankur Dave] Revert upgradeIterator to if-in-a-loop
	0d3584c [Ankur Dave] EdgePartition.size should be val
	2a928b2 [Ankur Dave] Set locality wait
	10b3596 [Ankur Dave] Clean up public API
	ae36110 [Ankur Dave] Fix style errors
	e4fbd32 [Ankur Dave] Unify GraphImpl RDDs + other graph load optimizations
	d6d60e2 [Ankur Dave] In GraphLoader, coalesce to minEdgePartitions
	62c7b78 [Ankur Dave] In Analytics, take PageRank numIter
	d64e8d4 [Ankur Dave] Log current Pregel iteration
	(cherry picked from commit 905173df57b90f90ebafb22e43f55164445330e6)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1708. Add a ClassTag on Serializer and things that depend on it	Matei Zaharia	2014-05-10	2	-23/+27
	This pull request contains a rebased patch from @heathermiller (https://github.com/heathermiller/spark/pull/1) to add ClassTags on Serializer and types that depend on it (Broadcast and AccumulableCollection). Putting these in the public API signatures now will allow us to use Scala Pickling for serialization down the line without breaking binary compatibility.
	One question remaining is whether we also want them on Accumulator -- Accumulator is passed as part of a bigger Task or TaskResult object via the closure serializer so it doesn't seem super useful to add the ClassTag there. Broadcast and AccumulableCollection in contrast were being serialized directly.
	CC @rxin, @pwendell, @heathermiller
	Author: Matei Zaharia <matei@databricks.com>
	Closes #700 from mateiz/spark-1708 and squashes the following commits:
	1a3d8b0 [Matei Zaharia] Use fake ClassTag in Java
	3b449ed [Matei Zaharia] test fix
	2209a27 [Matei Zaharia] Code style fixes
	9d48830 [Matei Zaharia] Add a ClassTag on Serializer and things that depend on it
*	SPARK-1565, update examples to be used with spark-submit script.	Prashant Sharma	2014-05-08	1	-7/+11
	Commit for initial feedback. Basically, I am curious whether we should prompt the user to provide args, especially when they are mandatory, and whether we can skip the prompt when they are not.
	Also, a few other things did not work, like `bin/spark-submit examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop1.0.4.jar --class org.apache.spark.examples.SparkALS --arg 100 500 10 5 2`. Not all the args get passed properly. Maybe I have messed something up; I will try to sort it out.
	Author: Prashant Sharma <prashant.s@imaginea.com>
	Closes #552 from ScrapCodes/SPARK-1565/update-examples and squashes the following commits:
	669dd23 [Prashant Sharma] Review comments
	2727e70 [Prashant Sharma] SPARK-1565, update examples to be used with spark-submit script.
	(cherry picked from commit 44dd57fb66bb676d753ad8d9757f9f4c03364113)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[SPARK-1460] Returning SchemaRDD instead of normal RDD on Set operations...	Kan Zhang	2014-05-07	2	-16/+4
	... that do not change schema
	Author: Kan Zhang <kzhang@apache.org>
	Closes #448 from kanzhang/SPARK-1460 and squashes the following commits:
	111e388 [Kan Zhang] silence MiMa errors in EdgeRDD and VertexRDD
	91dc787 [Kan Zhang] Taking into account newly added Ordering param
	79ed52a [Kan Zhang] [SPARK-1460] Returning SchemaRDD on Set operations that do not change schema
	(cherry picked from commit 967635a2425a769b932eea0984fe697d6721cab0)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-04-29	1	-1/+1
*	[maven-release-plugin] prepare release v1.0.0-rc3	Patrick Wendell	2014-04-29	1	-1/+1
*	Manual revert of rc2 version changes.	Patrick Wendell	2014-04-28	1	-1/+1
*	Improved build configuration	witgo	2014-04-28	1	-14/+0
	1. Fix SPARK-1441: compile spark core error with hadoop 0.23.x
	2. Fix SPARK-1491: maven hadoop-provided profile fails to build
	3. Fix inconsistent dependency versions for org.scala-lang:* and org.apache.avro:*
	4. Reformat sql/catalyst/pom.xml, sql/hive/pom.xml, and sql/core/pom.xml (four-space indentation changed to two spaces)
	Author: witgo <witgo@qq.com>
	Closes #480 from witgo/format_pom and squashes the following commits:
	03f652f [witgo] review commit
	b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
	bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence
	7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
	6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence
	0da4bc3 [witgo] merge master
	d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
	e345919 [witgo] add avro dependency to yarn-alpha
	77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
	62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency
	1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom
	934f24d [witgo] review commit
	cf46edc [witgo] exclude jruby
	06e7328 [witgo] Merge branch 'SparkBuild' into format_pom
	99464d2 [witgo] fix maven hadoop-provided profile fails to build
	0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x
	6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml
	(cherry picked from commit 030f2c2126d5075576cd6d83a1ee7462c48b953b)
	Conflicts: sql/catalyst/pom.xml sql/core/pom.xml sql/hive/pom.xml
*	Fix Scala Style	Sandeep	2014-04-24	1	-2/+5
	Any comments are welcome
	Author: Sandeep <sandeep@techaddict.me>
	Closes #531 from techaddict/stylefix-1 and squashes the following commits:
	7492730 [Sandeep] Pass 4
	98b2428 [Sandeep] fix rxin suggestions
	b5e2e6f [Sandeep] Pass 3
	05932d7 [Sandeep] fix if else styling 2
	08690e5 [Sandeep] fix if else styling
	(cherry picked from commit a03ac222d84025a1036750e1179136a13f75dea7)
	Signed-off-by: Reynold Xin <rxin@apache.org>
*	Mark all fields of EdgePartition, Graph, and GraphOps transient	Ankur Dave	2014-04-23	3	-12/+12
	These classes are only serializable to work around closure capture, so their fields should all be marked `@transient` to avoid wasteful serialization.
	This PR supersedes apache/spark#519 and fixes the same bug.
	Author: Ankur Dave <ankurdave@gmail.com>
	Closes #520 from ankurdave/graphx-transient and squashes the following commits:
	6431760 [Ankur Dave] Mark all fields of EdgePartition, Graph, and GraphOps `@transient`
	(cherry picked from commit 1d6abe3a4b58f28fc4e0e690e02c19b2568ce1ee)
	Signed-off-by: Reynold Xin <rxin@apache.org>
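The effect of marking fields transient can be illustrated with plain Java serialization, where the `transient` keyword plays the role of Scala's `@transient` annotation. This is a minimal sketch with hypothetical names (`GraphHandle`, `edgeData`), not the GraphX classes: a transient field is simply skipped when the object is written, so it costs nothing on the wire and comes back as `null`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative sketch only; class and field names are hypothetical.
final class TransientSketch {
    public static final class GraphHandle implements Serializable {
        private static final long serialVersionUID = 1L;

        public final String name;          // serialized
        public transient long[] edgeData;  // skipped by Java serialization

        public GraphHandle(String name, long[] edgeData) {
            this.name = name;
            this.edgeData = edgeData;
        }
    }

    // Serialize to an in-memory buffer and read the object back.
    public static GraphHandle roundTrip(GraphHandle in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream oin =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (GraphHandle) oin.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        GraphHandle copy = roundTrip(new GraphHandle("g", new long[1_000_000]));
        System.out.println("name kept: " + copy.name);
        System.out.println("edge data dropped: " + (copy.edgeData == null));
    }
}
```

The trade-off is that code running after deserialization must not rely on the transient fields, which fits the commit's premise that these classes are serializable only to work around closure capture.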