spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins	hyukjinkwon	2017-04-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR proposes to run Spark unidoc to test Javadoc 8 build as Javadoc 8 is easily re-breakable. There are several problems with it: - It introduces little extra bit of time to run the tests. In my case, it took 1.5 mins more (`Elapsed :[94.8746569157]`). How it was tested is described in "How was this patch tested?". - > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up. (see joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627)) To complete this automated build, It also suggests to fix existing Javadoc breaks / ones introduced by test codes as described above. There fixes are similar instances that previously fixed. Please refer https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013 Note that this only fixes errors not warnings. Please see my observation https://github.com/apache/spark/pull/17389#issuecomment-288438704 for spurious errors by warnings. ## How was this patch tested? Manually via `jekyll build` for building tests. Also, tested via running `./dev/run-tests`. This was tested via manually adding `time.time()` as below: ```diff profiles_and_goals = build_profiles + sbt_goals print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ", " ".join(profiles_and_goals)) + import time + st = time.time() exec_sbt(profiles_and_goals) + print("Elapsed :[%s]" % str(time.time() - st)) ``` produces ``` ... ======================================================================== Building Unidoc API Documentation ======================================================================== ... [info] Main Java API documentation successful. ... Elapsed :[94.8746569157] ... Author: hyukjinkwon <gurwls223@gmail.com> Closes #17477 from HyukjinKwon/SPARK-18692.
*	[SPARK-18847][GRAPHX] PageRank gives incorrect results for graphs with sinks	Andrew Ray	2017-03-17	2	-59/+144
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Graphs with sinks (vertices with no outgoing edges) don't have the expected rank sum of n (or 1 for personalized). We fix this by normalizing to the expected sum at the end of each implementation. Additionally this fixes the dynamic version of personal pagerank which gave incorrect answers that were not detected by existing unit tests. ## How was this patch tested? Revamped existing and additional unit tests with reference values (and reproduction code) from igraph and NetworkX. Note that for comparison on personal pagerank we use the arpack algorithm in igraph as prpack (the current default) redistributes rank to all vertices uniformly instead of just to the personalization source. We could take the alternate convention (redistribute rank to all vertices uniformly) but that would involve more extensive changes to the algorithms (the dynamic version would no longer be able to use Pregel). Author: Andrew Ray <ray.andrew@gmail.com> Closes #16483 from aray/pagerank-sink2.
*	[SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD	Tathagata Das	2017-01-25	2	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? EdgeRDD/VertexRDD overrides checkpoint() and isCheckpointed() to forward these to the internal partitionRDD. So when checkpoint() is called on them, its the partitionRDD that actually gets checkpointed. However since isCheckpointed() also overridden to call partitionRDD.isCheckpointed, EdgeRDD/VertexRDD.isCheckpointed returns true even though this RDD is actually not checkpointed. This would have been fine except the RDD's internal logic for computing the RDD depends on isCheckpointed(). So for VertexRDD/EdgeRDD, since isCheckpointed is true, when computing Spark tries to read checkpoint data of VertexRDD/EdgeRDD even though they are not actually checkpointed. Through a crazy sequence of call forwarding, it reads checkpoint data of partitionsRDD and tries to cast it to types in Vertex/EdgeRDD. This leads to ClassCastException. The minimal fix that does not change any public behavior is to modify RDD internal to not use public override-able API for internal logic. ## How was this patch tested? New unit tests. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #15396 from tdas/SPARK-14804.
*	[SPARK-3249][DOC] Fix links in ScalaDoc that cause warning messages in ↵	hyukjinkwon	2017-01-17	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`sbt/sbt unidoc` ## What changes were proposed in this pull request? This PR proposes to fix ambiguous link warnings by simply making them as code blocks for both javadoc and scaladoc. ``` [warn] .../spark/core/src/main/scala/org/apache/spark/Accumulator.scala:20: The link target "SparkContext#accumulator" is ambiguous. Several members fit the target: [warn] .../spark/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala:281: The link target "runMiniBatchSGD" is ambiguous. Several members fit the target: [warn] .../spark/mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala:83: The link target "run" is ambiguous. Several members fit the target: ... ``` This PR also fixes javadoc8 break as below: ``` [error] .../spark/sql/core/target/java/org/apache/spark/sql/LowPrioritySQLImplicits.java:7: error: reference not found [error] * newProductEncoder - to disambiguate for {link List}s which are both {link Seq} and {link Product} [error] ^ [error] .../spark/sql/core/target/java/org/apache/spark/sql/LowPrioritySQLImplicits.java:7: error: reference not found [error] * newProductEncoder - to disambiguate for {link List}s which are both {link Seq} and {link Product} [error] ^ [error] .../spark/sql/core/target/java/org/apache/spark/sql/LowPrioritySQLImplicits.java:7: error: reference not found [error] * newProductEncoder - to disambiguate for {link List}s which are both {link Seq} and {link Product} [error] ^ [info] 3 errors ``` ## How was this patch tested? Manually via `sbt unidoc > output.txt` and the checked it via `cat output.txt \| grep ambiguous` and `sbt unidoc \| grep error`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #16604 from HyukjinKwon/SPARK-3249.
*	[SPARK-18845][GRAPHX] PageRank has incorrect initialization value that leads ↵	Andrew Ray	2016-12-15	2	-16/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	to slow convergence ## What changes were proposed in this pull request? Change the initial value in all PageRank implementations to be `1.0` instead of `resetProb` (default `0.15`) and use `outerJoinVertices` instead of `joinVertices` so that source vertices get updated in each iteration. This seems to have been introduced a long time ago in https://github.com/apache/spark/commit/15a564598fe63003652b1e24527c432080b5976c#diff-b2bf3f97dcd2f19d61c921836159cda9L90 With the exception of graphs with sinks (which currently give incorrect results see SPARK-18847) this gives faster convergence as the sum of ranks is already correct (sum of ranks should be number of vertices). Convergence comparision benchmark for small graph: http://imgur.com/a/HkkZf Code for benchmark: https://gist.github.com/aray/a7de1f3801a810f8b1fa00c271a1fefd ## How was this patch tested? (corrected) existing unit tests and additional test that verifies against result of igraph and NetworkX on a loop with a source. Author: Andrew Ray <ray.andrew@gmail.com> Closes #16271 from aray/pagerank-initial-value.
*	[SPARK-18615][DOCS] Switch to multi-line doc to avoid a genjavadoc bug for ↵	hyukjinkwon	2016-11-29	3	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	backticks ## What changes were proposed in this pull request? Currently, single line comment does not mark down backticks to `<code>..</code>` but prints as they are (`` `..` ``). For example, the line below: ```scala /** Return an RDD with the pairs from `this` whose keys are not in `other`. / ``` So, we could work around this as below: ```scala /* * Return an RDD with the pairs from `this` whose keys are not in `other`. / ``` - javadoc - Before* ![2016-11-29 10 39 14](https://cloud.githubusercontent.com/assets/6477701/20693606/e64c8f90-b622-11e6-8dfc-4a029216e23d.png) - After ![2016-11-29 10 39 08](https://cloud.githubusercontent.com/assets/6477701/20693607/e7280d36-b622-11e6-8502-d2e21cd5556b.png) - scaladoc (this one looks fine either way) - Before ![2016-11-29 10 38 22](https://cloud.githubusercontent.com/assets/6477701/20693640/12c18aa8-b623-11e6-901a-693e2f6f8066.png) - After ![2016-11-29 10 40 05](https://cloud.githubusercontent.com/assets/6477701/20693642/14eb043a-b623-11e6-82ac-7cd0000106d1.png) I suspect this is related with SPARK-16153 and genjavadoc issue in ` typesafehub/genjavadoc#85`. ## How was this patch tested? I found them via ``` grep -r "\/\\.*\`" . \| grep .scala ```` and then checked if each is in the public API documentation with manually built docs (`jekyll build`) with Java 7. Author: hyukjinkwon <gurwls223@gmail.com> Closes #16050 from HyukjinKwon/javadoc-markdown.
*	[SPARK-3359][DOCS] Make javadoc8 working for unidoc/genjavadoc compatibility ↵	hyukjinkwon	2016-11-29	4	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in Java API documentation ## What changes were proposed in this pull request? This PR make `sbt unidoc` complete with Java 8. This PR roughly includes several fixes as below: - Fix unrecognisable class and method links in javadoc by changing it from `[[..]]` to `` `...` `` ```diff - * A column that will be computed based on the data in a [[DataFrame]]. + * A column that will be computed based on the data in a `DataFrame`. ``` - Fix throws annotations so that they are recognisable in javadoc - Fix URL links to `<a href="http..."></a>`. ```diff - * [[http://en.wikipedia.org/wiki/Decision_tree_learning Decision tree]] model for regression. + * <a href="http://en.wikipedia.org/wiki/Decision_tree_learning"> + * Decision tree (Wikipedia)</a> model for regression. ``` ```diff - * see http://en.wikipedia.org/wiki/Receiver_operating_characteristic + * see <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic"> + * Receiver operating characteristic (Wikipedia)</a> ``` - Fix < to > to - `greater than`/`greater than or equal to` or `less than`/`less than or equal to` where applicable. - Wrap it with `{{{...}}}` to print them in javadoc or use `{code ...}` or `{literal ..}`. Please refer https://github.com/apache/spark/pull/16013#discussion_r89665558 - Fix `</p>` complaint ## How was this patch tested? Manually tested by `jekyll build` with Java 7 and 8 ``` java version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode) ``` ``` java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) ``` Author: hyukjinkwon <gurwls223@gmail.com> Closes #16013 from HyukjinKwon/SPARK-3359-errors-more.
*	[SPARK-3359][BUILD][DOCS] More changes to resolve javadoc 8 errors that will ↵	hyukjinkwon	2016-11-25	6	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	help unidoc/genjavadoc compatibility ## What changes were proposed in this pull request? This PR only tries to fix things that looks pretty straightforward and were fixed in other previous PRs before. This PR roughly fixes several things as below: - Fix unrecognisable class and method links in javadoc by changing it from `[[..]]` to `` `...` `` ``` [error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/DataStreamReader.java:226: error: reference not found [error] * Loads text files and returns a {link DataFrame} whose schema starts with a string column named ``` - Fix an exception annotation and remove code backticks in `throws` annotation Currently, sbt unidoc with Java 8 complains as below: ``` [error] .../java/org/apache/spark/sql/streaming/StreamingQuery.java:72: error: unexpected text [error] * throws StreamingQueryException, if <code>this</code> query has terminated with an exception. ``` `throws` should specify the correct class name from `StreamingQueryException,` to `StreamingQueryException` without backticks. (see [JDK-8007644](https://bugs.openjdk.java.net/browse/JDK-8007644)). - Fix `[[http..]]` to `<a href="http..."></a>`. ```diff - * [[https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https Oracle - * blog page]]. + * <a href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https"> + * Oracle blog page</a>. ``` `[[http...]]` link markdown in scaladoc is unrecognisable in javadoc. - It seems class can't have `return` annotation. So, two cases of this were removed. ``` [error] .../java/org/apache/spark/mllib/regression/IsotonicRegression.java:27: error: invalid use of return [error] * return New instance of IsotonicRegression. ``` - Fix < to `<` and > to `>` according to HTML rules. - Fix `</p>` complaint - Exclude unrecognisable in javadoc, `constructor`, `todo` and `groupname`. ## How was this patch tested? Manually tested by `jekyll build` with Java 7 and 8 ``` java version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode) ``` ``` java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) ``` Note: this does not yet make sbt unidoc suceed with Java 8 yet but it reduces the number of errors with Java 8. Author: hyukjinkwon <gurwls223@gmail.com> Closes #15999 from HyukjinKwon/SPARK-3359-errors.
*	[SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note ↵	hyukjinkwon	2016-11-19	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	that`/`'''Note:'''` across Scala/Java API documentation ## What changes were proposed in this pull request? It seems in Scala/Java, - `Note:` - `NOTE:` - `Note that` - `'''Note:'''` - `note` This PR proposes to fix those to `note` to be consistent. Before - Scala ![2016-11-17 6 16 39](https://cloud.githubusercontent.com/assets/6477701/20383180/1a7aed8c-acf2-11e6-9611-5eaf6d52c2e0.png) - Java ![2016-11-17 6 14 41](https://cloud.githubusercontent.com/assets/6477701/20383096/c8ffc680-acf1-11e6-914a-33460bf1401d.png) After - Scala ![2016-11-17 6 16 44](https://cloud.githubusercontent.com/assets/6477701/20383167/09940490-acf2-11e6-937a-0d5e1dc2cadf.png) - Java ![2016-11-17 6 13 39](https://cloud.githubusercontent.com/assets/6477701/20383132/e7c2a57e-acf1-11e6-9c47-b849674d4d88.png) ## How was this patch tested? The notes were found via ```bash grep -r "NOTE: " . \| \ # Note:\|NOTE:\|Note that\|'''Note:''' grep -v "// NOTE: " \| \ # starting with // does not appear in API documentation. grep -E '.scala\|.java' \| \ # java/scala files grep -v Suite \| \ # exclude tests grep -v Test \| \ # exclude tests grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation -e 'org.apache.spark.api.java.function' \ # note that this is a regular expression. So actual matches were mostly `org/apache/spark/api/java/functions ...` -e 'org.apache.spark.api.r' \ ... ``` ```bash grep -r "Note that " . \| \ # Note:\|NOTE:\|Note that\|'''Note:''' grep -v "// Note that " \| \ # starting with // does not appear in API documentation. grep -E '.scala\|.java' \| \ # java/scala files grep -v Suite \| \ # exclude tests grep -v Test \| \ # exclude tests grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation -e 'org.apache.spark.api.java.function' \ -e 'org.apache.spark.api.r' \ ... ``` ```bash grep -r "Note: " . \| \ # Note:\|NOTE:\|Note that\|'''Note:''' grep -v "// Note: " \| \ # starting with // does not appear in API documentation. grep -E '.scala\|.java' \| \ # java/scala files grep -v Suite \| \ # exclude tests grep -v Test \| \ # exclude tests grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation -e 'org.apache.spark.api.java.function' \ -e 'org.apache.spark.api.r' \ ... ``` ```bash grep -r "'''Note:'''" . \| \ # Note:\|NOTE:\|Note that\|'''Note:''' grep -v "// '''Note:''' " \| \ # starting with // does not appear in API documentation. grep -E '.scala\|.java' \| \ # java/scala files grep -v Suite \| \ # exclude tests grep -v Test \| \ # exclude tests grep -e 'org.apache.spark.api.java' \ # packages appear in API documenation -e 'org.apache.spark.api.java.function' \ -e 'org.apache.spark.api.r' \ ... ``` And then fixed one by one comparing with API documentation/access modifiers. After that, manually tested via `jekyll build`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #15889 from HyukjinKwon/SPARK-18437.
*	[SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for ↵	Zheng RuiFeng	2016-11-14	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	runParallelPersonalizedPageRank ## What changes were proposed in this pull request? add the param checking to keep in line with other algos ## How was this patch tested? existing tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #15876 from zhengruifeng/param_check_runParallelPersonalizedPageRank.
*	[SPARK-11496][GRAPHX] Parallel implementation of personalized pagerank	Yves Raimond	2016-09-10	3	-1/+116
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	(Updated version of [PR-9457](https://github.com/apache/spark/pull/9457), rebased on latest Spark master, and using mllib-local). This implements a parallel version of personalized pagerank, which runs all propagations for a list of source vertices in parallel. I ran a few benchmarks on the full [DBpedia](http://dbpedia.org/) graph. When running personalized pagerank for only one source node, the existing implementation is twice as fast as the parallel one (because of the SparseVector overhead). However for 10 source nodes, the parallel implementation is four times as fast. When increasing the number of source nodes, this difference becomes even greater. ![image](https://cloud.githubusercontent.com/assets/2491/10927702/dd82e4fa-8256-11e5-89a8-4799b407f502.png) Author: Yves Raimond <yraimond@netflix.com> Closes #14998 from moustaki/parallel-ppr.
*	[SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add ↵	Holden Karau	2016-08-08	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	much and remove whitelisting ## What changes were proposed in this pull request? Avoid using postfix operation for command execution in SQLQuerySuite where it wasn't whitelisted and audit existing whitelistings removing postfix operators from most places. Some notable places where postfix operation remains is in the XML parsing & time units (seconds, millis, etc.) where it arguably can improve readability. ## How was this patch tested? Existing tests. Author: Holden Karau <holden@us.ibm.com> Closes #14407 from holdenk/SPARK-16779.
*	[SPARK-16694][CORE] Use for/foreach rather than map for Unit expressions ↵	Sean Owen	2016-07-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	whose side effects are required ## What changes were proposed in this pull request? Use foreach/for instead of map where operation requires execution of body, not actually defining a transformation ## How was this patch tested? Jenkins Author: Sean Owen <sowen@cloudera.com> Closes #14332 from srowen/SPARK-16694.
*	[SPARK-16478] graphX (added graph caching in strongly connected components)	Michał Wesołowski	2016-07-19	1	-36/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? I added caching in every iteration for sccGraph that is returned in strongly connected components. Without this cache strongly connected components returned graph that needed to be computed from scratch when some intermediary caches didn't existed anymore. ## How was this patch tested? I tested it by running code similar to the one [on databrics](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html). Basically I generated large graph and computed strongly connected components with changed code, than simply run count on vertices and edges. Count after this update takes few seconds instead 20 minutes. # statement contribution is my original work and I license the work to the project under the project's open source license. Author: Michał Wesołowski <michal.wesolowski@bzwbk.pl> Closes #14137 from wesolowskim/SPARK-16478.
*	[SPARK-3359][DOCS] More changes to resolve javadoc 8 errors that will help ↵	Sean Owen	2016-07-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	unidoc/genjavadoc compatibility ## What changes were proposed in this pull request? These are yet more changes that resolve problems with unidoc/genjavadoc and Java 8. It does not fully resolve the problem, but gets rid of as many errors as we can from this end. ## How was this patch tested? Jenkins build of docs Author: Sean Owen <sowen@cloudera.com> Closes #14221 from srowen/SPARK-3359.3.
*	[MINOR] Fix Typos 'an -> a'	Zheng RuiFeng	2016-06-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? `an -> a` Use cmds like `find . -name '*.R' \| xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one. ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #13515 from zhengruifeng/an_a.
*	[SPARK-15057][GRAPHX] Remove stale TODO comment for making `enum` in ↵	Dongjoon Hyun	2016-05-03	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GraphGenerators ## What changes were proposed in this pull request? This PR removes a stale TODO comment in `GraphGenerators.scala` ## How was this patch tested? Just comment removed. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12839 from dongjoon-hyun/SPARK-15057.
*	[MINOR][DOCS] Minor typo fixes	Jacek Laskowski	2016-04-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Minor typo fixes (too minor to deserve separate a JIRA) ## How was this patch tested? local build Author: Jacek Laskowski <jacek@japila.pl> Closes #12469 from jaceklaskowski/minor-typo-fixes.
*	[SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix ↵	Dongjoon Hyun	2016-04-24	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lint-java errors ## What changes were proposed in this pull request? Spark uses `NewLineAtEofChecker` rule in Scala by ScalaStyle. And, most Java code also comply with the rule. This PR aims to enforce the same rule `NewlineAtEndOfFile` by CheckStyle explicitly. Also, this fixes lint-java errors since SPARK-14465. The followings are the items. - Adds a new line at the end of the files (19 files) - Fixes 25 lint-java errors (12 RedundantModifier, 6 ArrayTypeStyle, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder) ## How was this patch tested? After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins dose not run lint-java.) ```bash $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12632 from dongjoon-hyun/SPARK-14868.
*	[SPARK-14134][CORE] Change the package name used for shading classes.	Marcelo Vanzin	2016-04-06	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current package name uses a dash, which is a little weird but seemed to work. That is, until a new test tried to mock a class that references one of those shaded types, and then things started failing. Most changes are just noise to fix the logging configs. For reference, SPARK-8815 also raised this issue, although at the time it did not cause any issues in Spark, so it was not addressed. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11941 from vanzin/SPARK-14134.
*	Added omitted word in error message	Victor Chima	2016-04-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Added an omitted word in the error message displayed by the Graphx Pregel API when `maxIterations <= 0` ## How was this patch tested? Manual test Author: Victor Chima <blazy2k9@gmail.com> Closes #12205 from blazy2k9/hotfix/pregel-error-message.
*	[MINOR][DOCS] Use multi-line JavaDoc comments in Scala code.	Dongjoon Hyun	2016-04-02	2	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR aims to fix all Scala-Style multiline comments into Java-Style multiline comments in Scala codes. (All comment-only changes over 77 files: +786 lines, −747 lines) ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.
*	[SPARK-14219][GRAPHX] Fix `pickRandomVertex` not to fall into infinite loops ↵	Dongjoon Hyun	2016-03-28	2	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	for graphs with one vertex ## What changes were proposed in this pull request? Currently, `GraphOps.pickRandomVertex()` falls into infinite loops for graphs having only one vertex. This PR fixes it by modifying the following termination-checking condition. ```scala - if (selectedVertices.count > 1) { + if (selectedVertices.count > 0) { ``` ## How was this patch tested? Pass the Jenkins tests (including new test case). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12018 from dongjoon-hyun/SPARK-14219.
*	[MINOR] Fix newly added java-lint errors	Dongjoon Hyun	2016-03-26	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR fixes some newly added java-lint errors(unused-imports, line-lengsth). ## How was this patch tested? Pass the Jenkins tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11968 from dongjoon-hyun/SPARK-14167.
*	[SPARK-13928] Move org.apache.spark.Logging into ↵	Wenchen Fan	2016-03-17	5	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	org.apache.spark.internal.Logging ## What changes were proposed in this pull request? Logging was made private in Spark 2.0. If we move it, then users would be able to create a Logging trait themselves to avoid changing their own code. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #11764 from cloud-fan/logger.
*	[SPARK-13816][GRAPHX] Add parameter checks for algorithms in Graphx	Zheng RuiFeng	2016-03-16	6	-2/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-13816 ## What changes were proposed in this pull request? Add parameter checks for algorithms in Graphx: Pregel,LabelPropagation,PageRank,SVDPlusPlus ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11655 from zhengruifeng/graphx_param_check.
*	[MINOR][DOCS] Fix more typos in comments/strings.	Dongjoon Hyun	2016-03-14	7	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR fixes 135 typos over 107 files: * 121 typos in comments * 11 typos in testcase name * 3 typos in log messages ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11689 from dongjoon-hyun/fix_more_typos.
*	[SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> ↵	Sean Owen	2016-03-13	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	byte[] conversions (and remaining Coverity items) ## What changes were proposed in this pull request? - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8 - Same for `InputStreamReader` and `OutputStreamWriter` constructors - Standardizes on UTF-8 everywhere - Standardizes specifying the encoding with `StandardCharsets.UTF-8`, not the Guava constant or "UTF-8" (which means handling `UnuspportedEncodingException`) - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c ) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11657 from srowen/SPARK-13823.
*	[MINOR] Fix typos in comments and testcase name of code	Dongjoon Hyun	2016-03-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR fixes typos in comments and testcase name of code. ## How was this patch tested? manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.
*	[SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule	Dongjoon Hyun	2016-03-03	15	-38/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time. This issue aims remove unused imports from Java/Scala code and add `UnusedImports` checkstyle rule to help developers. ## How was this patch tested? ``` ./dev/lint-java ./build/sbt compile ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11438 from dongjoon-hyun/SPARK-13583.
*	[SPARK-13423][WIP][CORE][SQL][STREAMING] Static analysis fixes for 2.x	Sean Owen	2016-03-03	10	-25/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Make some cross-cutting code improvements according to static analysis. These are individually up for discussion since they exist in separate commits that can be reverted. The changes are broadly: - Inner class should be static - Mismatched hashCode/equals - Overflow in compareTo - Unchecked warnings - Misuse of assert, vs junit.assert - get(a) + getOrElse(b) -> getOrElse(a,b) - Array/String .size -> .length (occasionally, -> .isEmpty / .nonEmpty) to avoid implicit conversions - Dead code - tailrec - exists(_ == ) -> contains find + nonEmpty -> exists filter + size -> count - reduce(_+_) -> sum map + flatten -> map The most controversial may be .size -> .length simply because of its size. It is intended to avoid implicits that might be expensive in some places. ## How was the this patch tested? Existing Jenkins unit tests. Author: Sean Owen <sowen@cloudera.com> Closes #11292 from srowen/SPARK-13423.
*	[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns ↵	Dongjoon Hyun	2016-02-22	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was the this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.
*	[SPARK-3650][GRAPHX] Triangle Count handles reverse edges incorrectly	Robin East	2016-02-21	4	-23/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	jegonzal ankurdave please could you review ## What changes were proposed in this pull request? Reworking of jegonzal PR #2495 to address the issue identified in SPARK-3650. Code amended to use the convertToCanonicalEdges method. ## How was the this patch tested? Patch was tested using the unit tests created in PR #2495 Author: Robin East <robin.east@xense.co.uk> Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com> Closes #11290 from insidedctm/spark-3650.
*	[SPARK-13416][GraphX] Add positive check for option 'numIter' in ↵	Zheng RuiFeng	2016-02-21	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	StronglyConnectedComponents JIRA: https://issues.apache.org/jira/browse/SPARK-13416 ## What changes were proposed in this pull request? The output of StronglyConnectedComponents with numIter no greater than 1 may make no sense. So I just add require check in it. ## How was the this patch tested? unit tests passed Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11284 from zhengruifeng/scccheck.
*	[SPARK-13386][GRAPHX] ConnectedComponents should support maxIteration option	Zheng RuiFeng	2016-02-20	2	-6/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-13386 ## What changes were proposed in this pull request? add maxIteration option for ConnectedComponents algorithm ## How was the this patch tested? unit tests passed Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11268 from zhengruifeng/ccwithmax.
*	[SPARK-12995][GRAPHX] Remove deprecate APIs from Pregel	Takeshi YAMAMURO	2016-02-15	6	-134/+36
\| \| \| \| \| \|	Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10918 from maropu/RemoveDeprecateInPregel.
*	[SPARK-12655][GRAPHX] GraphX does not unpersist RDDs	Jason Lee	2016-01-15	3	-2/+20
\| \| \| \| \| \| \| \| \| \|	Some VertexRDD and EdgeRDD are created during the intermediate step of g.connectedComponents() but unnecessarily left cached after the method is done. The fix is to unpersist these RDDs once they are no longer in use. A test case is added to confirm the fix for the reported bug. Author: Jason Lee <cjlee@us.ibm.com> Closes #10713 from jasoncl/SPARK-12655.
*	[SPARK-12692][BUILD][GRAPHX] Scala style: Fix the style violation (Space ↵	Kousuke Saruta	2016-01-10	5	-9/+8
\| \| \| \| \| \| \| \| \| \| \|	before "," or ":") Fix the style violation (space before `,` and `:`). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10683 from sarutak/SPARK-12692-followup-graphx.
*	[SPARK-12665][CORE][GRAPHX] Remove Vector, VectorSuite and ↵	Kousuke Saruta	2016-01-06	1	-49/+0
\| \| \| \| \| \| \| \| \| \|	GraphKryoRegistrator which are deprecated and no longer used Whole code of Vector.scala, VectorSuite.scala and GraphKryoRegistrator.scala are no longer used so it's time to remove them in Spark 2.0. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10613 from sarutak/SPARK-12665.
*	[SPARK-3873][TESTS] Import ordering fixes.	Marcelo Vanzin	2016-01-05	2	-4/+2
\| \| \| \| \| \|	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10582 from vanzin/SPARK-3873-tests.
*	[SPARK-3873][GRAPHX] Import order fixes.	Marcelo Vanzin	2015-12-30	23	-50/+33
\| \| \| \| \| \| \| \|	There's one warning left, caused by a bug in the checker. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10537 from vanzin/SPARK-3873-graphx.
*	[SPARK-5882][GRAPHX] Add a test for GraphLoader.edgeListFile	Takeshi YAMAMURO	2015-12-21	1	-0/+47
\| \| \| \| \| \|	Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #4674 from maropu/AddGraphLoaderSuite.
*	[SPARK-12112][BUILD] Upgrade to SBT 0.13.9	Josh Rosen	2015-12-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin). I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations. Author: Josh Rosen <joshrosen@databricks.com> Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
*	Fixed error in scaladoc of convertToCanonicalEdges	Gaurav Kumar	2015-11-12	1	-1/+1
\| \| \| \| \| \| \| \|	The code convertToCanonicalEdges is such that srcIds are smaller than dstIds but the scaladoc suggested otherwise. Have fixed the same. Author: Gaurav Kumar <gauravkumar37@gmail.com> Closes #9666 from gauravkumar37/patch-1.
*	[SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled ↵	Josh Rosen	2015-11-11	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	classes This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8. In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml. http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes. I also added a new regression tests in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code which was compiled targeting Java 8. Author: Josh Rosen <joshrosen@databricks.com> Closes #9512 from JoshRosen/SPARK-6152.
*	[SPARK-11432][GRAPHX] Personalized PageRank shouldn't use uniform initialization	Yves Raimond	2015-11-02	2	-15/+27
\| \| \| \| \| \| \| \|	Changes the personalized pagerank initialization to be non-uniform. Author: Yves Raimond <yraimond@netflix.com> Closes #9386 from moustaki/personalized-pagerank-init.
*	[SPARK-10598] [DOCS]	Robin East	2015-09-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Comments preceding toMessage method state: "The edge partition is encoded in the lower * 30 bytes of the Int, and the position is encoded in the upper 2 bytes of the Int.". References to bytes should be changed to bits. This contribution is my original work and I license the work to the Spark project under it's open source license. Author: Robin East <robin.east@xense.co.uk> Closes #8756 from insidedctm/master.
*	[SPARK-10576] [BUILD] Move .java files out of src/main/scala	Sean Owen	2015-09-14	2	-0/+0
\| \| \| \| \| \| \| \|	Move .java files in `src/main/scala` to `src/main/java` root, except for `package-info.java` (to stay next to package.scala) Author: Sean Owen <sowen@cloudera.com> Closes #8736 from srowen/SPARK-10576.
*	[SPARK-10227] fatal warnings with sbt on Scala 2.11	Luc Bourlier	2015-09-09	3	-7/+7
\| \| \| \| \| \| \| \| \| \| \|	The bulk of the changes are on `transient` annotation on class parameter. Often the compiler doesn't generate a field for this parameters, so the the transient annotation would be unnecessary. But if the class parameter are used in methods, then fields are created. So it is safer to keep the annotations. The remainder are some potential bugs, and deprecated syntax. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #8433 from skyluc/issue/sbt-2.11.
*	[SPARK-9960] [GRAPHX] sendMessage type fix in LabelPropagation.scala	zc he	2015-08-14	1	-1/+1
\| \| \| \| \| \|	Author: zc he <farseer90718@gmail.com> Closes #8188 from farseer90718/farseer-patch-1.