path: root/examples
Commit message | Author | Age | Files | Lines
* Preparing Spark release v1.2.1-rc3 [tag: v1.2.1] | Patrick Wendell | 2015-02-03 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.1-rc2" | Patrick Wendell | 2015-02-02 | 1 | -1/+1
  This reverts commit b77f87673d1f9f03d4c83cf583158227c551359b.
* Revert "Preparing development version 1.2.2-SNAPSHOT" | Patrick Wendell | 2015-02-02 | 1 | -1/+1
  This reverts commit 0a16abadc59082b7d3a24d7f3625236658632813.
* Preparing development version 1.2.2-SNAPSHOT | Patrick Wendell | 2015-01-28 | 1 | -1/+1
* Preparing Spark release v1.2.1-rc2 | Patrick Wendell | 2015-01-28 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.1-rc1" | Patrick Wendell | 2015-01-27 | 1 | -1/+1
  This reverts commit 3e2d7d310b76c293b9ac787f204e6880f508f6ec.
* Revert "Preparing development version 1.2.2-SNAPSHOT" | Patrick Wendell | 2015-01-27 | 1 | -1/+1
  This reverts commit f53a4319ba5f0843c077e64ae5a41e2fac835a5b.
* Preparing development version 1.2.2-SNAPSHOT | Patrick Wendell | 2015-01-27 | 1 | -1/+1
* Preparing Spark release v1.2.1-rc1 | Patrick Wendell | 2015-01-27 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.1-rc1" | Patrick Wendell | 2015-01-26 | 1 | -1/+1
  This reverts commit e87eb2b42f137c22194cfbca2abf06fecdf943da.
* Revert "Preparing development version 1.2.2-SNAPSHOT" | Patrick Wendell | 2015-01-26 | 1 | -1/+1
  This reverts commit adfed7086f10fa8db4eeac7996c84cf98f625e9a.
* Preparing development version 1.2.2-SNAPSHOT | Ubuntu | 2015-01-27 | 1 | -1/+1
* Preparing Spark release v1.2.1-rc1 | Ubuntu | 2015-01-27 | 1 | -1/+1
* [SPARK-5233][Streaming] Fix error replaying of WAL introduced bug | jerryshao | 2015-01-22 | 1 | -1/+1
  Because `BlockAllocationEvent` is missing from WAL recovery, dangling events get mixed into the new batch, which leads to wrong results. Details can be found in [SPARK-5233](https://issues.apache.org/jira/browse/SPARK-5233).
  Author: jerryshao <saisai.shao@intel.com>
  Closes #4032 from jerryshao/SPARK-5233 and squashes the following commits:
  - f0b0c0b [jerryshao] Further address the comments
  - a237c75 [jerryshao] Address the comments
  - e356258 [jerryshao] Fix bug in unit test
  - 558bdc3 [jerryshao] Correctly replay the WAL log when recovering from failure
  (cherry picked from commit 3c3fa632e6ba45ce536065aa1145698385301fb2)
  Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
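To make the failure mode concrete, here is a toy Scala model of write-ahead-log replay. It is not Spark's streaming code: the event types `BlockAdded` and `BatchAllocated`, the `Tracker` class, and the `recover` helper are all hypothetical. The same log is replayed twice, once applying allocation events and once ignoring them; ignoring them mimics the bug, because blocks that were already assigned to a batch before the failure stay unallocated and get swept into the next new batch.

```scala
import scala.collection.mutable

// Toy model of WAL replay; every name here is hypothetical.
object WalReplaySketch {
  sealed trait Event
  final case class BlockAdded(blockId: String) extends Event
  final case class BatchAllocated(batchTime: Long, blockIds: Seq[String]) extends Event

  final class Tracker {
    val unallocated: mutable.Queue[String] = mutable.Queue.empty[String]
    val batches: mutable.Map[Long, Seq[String]] = mutable.LinkedHashMap.empty[Long, Seq[String]]

    // Applies one logged event; replayAllocations = false mimics the bug.
    def replay(event: Event, replayAllocations: Boolean): Unit = event match {
      case BlockAdded(id) =>
        unallocated.enqueue(id)
      case BatchAllocated(time, ids) if replayAllocations =>
        // Rebuild the recorded batch and drop its blocks from the queue.
        batches(time) = ids
        unallocated.dequeueAll(id => ids.contains(id))
      case BatchAllocated(_, _) =>
        () // allocation ignored: its blocks stay "dangling" in the queue
    }

    // Assigns every still-unallocated block to a new batch.
    def allocateNext(time: Long): List[String] = {
      val ids = unallocated.dequeueAll(_ => true).toList
      batches(time) = ids
      ids
    }
  }

  def recover(log: Seq[Event], replayAllocations: Boolean): Tracker = {
    val tracker = new Tracker
    log.foreach(e => tracker.replay(e, replayAllocations))
    tracker
  }

  def main(args: Array[String]): Unit = {
    val log = Seq(BlockAdded("b1"), BlockAdded("b2"), BatchAllocated(1000L, Seq("b1", "b2")), BlockAdded("b3"))
    // Correct replay: only b3 goes into the next batch.
    println(recover(log, replayAllocations = true).allocateNext(2000L).sorted.mkString(","))
    // Buggy replay: b1 and b2 leak into the next batch as well.
    println(recover(log, replayAllocations = false).allocateNext(2000L).sorted.mkString(","))
  }
}
```

Running `main` prints `b3` for the correct replay and `b1,b2,b3` for the buggy one.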
* [SPARK-4033][Examples]Input of the SparkPi too big causes the emption exception | huangzhaowei | 2015-01-16 | 1 | -2/+2
  If the SparkPi argument is larger than 25000, the integer `n` inside the code overflows and may become negative. The range `(0 until n)` is then empty, so the `reduce` action throws `UnsupportedOperationException("empty collection")`. The maximum size of the input to `sc.parallelize` is `Int.MaxValue - 1`, not `Int.MaxValue`.
  Author: huangzhaowei <carlmartinmax@gmail.com>
  Closes #2874 from SaintBacchus/SparkPi and squashes the following commits:
  - 62d7cd7 [huangzhaowei] Add a commit to explain the modify
  - 4cdc388 [huangzhaowei] Update SparkPi.scala
  - 9a2fb7b [huangzhaowei] Input of the SparkPi is too big
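A minimal Scala sketch of the overflow and one possible guard, assuming SparkPi derives `n` as `100000 * slices` (that constant and the helper names `naiveN` and `clampedN` are assumptions for illustration, not necessarily the merged fix). Under that assumption the product already overflows for any argument of 21475 or more. Multiplying in `Long` and clamping to `Int.MaxValue` keeps `n` positive, and a range starting at 1 then has exactly `Int.MaxValue - 1` elements, matching the limit stated in the commit message.

```scala
object SparkPiOverflowSketch {
  // Assumed per-slice sample count, mirroring "n = 100000 * slices".
  val SamplesPerSlice: Int = 100000

  // Int arithmetic wraps around silently on overflow.
  def naiveN(slices: Int): Int = SamplesPerSlice * slices

  // Multiply in Long, then clamp so the result always fits in an Int.
  def clampedN(slices: Int): Int =
    math.min(SamplesPerSlice.toLong * slices, Int.MaxValue.toLong).toInt

  def main(args: Array[String]): Unit = {
    val slices = 25000
    println(naiveN(slices))                   // -1794967296 (wrapped around)
    println((0 until naiveN(slices)).isEmpty) // true: reduce on it would fail
    println(clampedN(slices))                 // 2147483647
    // 1 until Int.MaxValue has exactly Int.MaxValue - 1 elements.
    println((1 until clampedN(slices)).size)  // 2147483646
  }
}
```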
* [SPARK-5234][ml]examples for ml don't have sparkContext.stop | Yuhao Yang | 2015-01-14 | 3 | -0/+6
  JIRA issue: https://issues.apache.org/jira/browse/SPARK-5234
  Simply add the call.
  Author: Yuhao Yang <yuhao@yuhaodevbox.sh.intel.com>
  Closes #4044 from hhbyyh/addscStop and squashes the following commits:
  - c1f75ac [Yuhao Yang] add SparkContext.stop to 3 ml examples
  (cherry picked from commit 76389c5b99183e456ff85fd92ea68d95c4c13e82)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-1010] Clean up uses of System.setProperty in unit tests | Josh Rosen | 2014-12-31 | 1 | -4/+2
  Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures.
  This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself).
  For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class which snapshots the system properties before individual tests and automatically restores them on test completion / failure. See the block comment at the top of the ResetSystemProperties class for more details.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits:
  - 0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools
  - 3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext
  - 4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties
  - 4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering.
  - 0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite.
  - 7a3d224 [Josh Rosen] Fix trait ordering
  - 3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite
  - bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite
  - 655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite
  - 3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite
  - cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite
  - 8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait.
  - 633a84a [Josh Rosen] Remove use of system properties in FileServerSuite
  - 25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite
  - 1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite
  - dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite
  - b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite
  - e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite
  - 5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite
  - 0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite
  - c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite
  - 51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite
  - 60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite
  - 14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite
  - 628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite
  - 9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite.
  - 4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class.
  (cherry picked from commit 352ed6bbe3c3b67e52e298e7c535ae414d96beca)
  Signed-off-by: Josh Rosen <joshrosen@databricks.com>
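The snapshot-and-restore idea behind `ResetSystemProperties` can be sketched without ScalaTest as a loan-pattern helper. This is not the trait from the Spark code base; `withRestoredSystemProperties` and the property names are hypothetical. The helper copies `System.getProperties` before the body runs and reinstalls the copy with `System.setProperties` in a `finally` block, so changes are undone whether the body succeeds or throws.

```scala
import java.util.Properties

object SystemPropertiesSketch {
  // Loan pattern: snapshot the JVM system properties, run `body`, and restore
  // the snapshot afterwards, even if `body` throws.
  def withRestoredSystemProperties[T](body: => T): T = {
    // Properties.clone() copies the property entries (a shallow copy).
    val snapshot = System.getProperties.clone().asInstanceOf[Properties]
    try body
    finally System.setProperties(snapshot)
  }

  def main(args: Array[String]): Unit = {
    val key = "sketch.example.property" // hypothetical property name
    withRestoredSystemProperties {
      System.setProperty(key, "temporary")
      println(System.getProperty(key)) // temporary
    }
    println(System.getProperty(key)) // null: the change did not leak
  }
}
```

A test-fixture mixin could apply the same two steps around each test (snapshot before, restore after) instead of wrapping individual blocks.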
* [SPARK-4932] Add help comments in Analytics | Takeshi Yamamuro | 2014-12-23 | 1 | -0/+4
  Trivial modifications for usability.
  Author: Takeshi Yamamuro <linguin.m.s@gmail.com>
  Closes #3775 from maropu/AddHelpCommentInAnalytics and squashes the following commits:
  - fbea8f5 [Takeshi Yamamuro] Add help comments in Analytics
  (cherry picked from commit 9c251c555f5ee527143d0cdb9e6c3cb7530fc8f8)
  Signed-off-by: Josh Rosen <joshrosen@databricks.com>
* [Minor] Improve some code in BroadcastTest for short | carlmartin | 2014-12-22 | 1 | -4/+1
  Use `val arr1 = (0 until num).toArray` instead of `val arr1 = new Array[Int](num); for (i <- 0 until arr1.length) { arr1(i) = i }` for brevity.
  Author: carlmartin <carlmartinmax@gmail.com>
  Closes #3750 from SaintBacchus/BroadcastTest and squashes the following commits:
  - 43adb70 [carlmartin] Improve some code in BroadcastTest for short
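The two forms described in this commit build the same array. This small Scala check compares them; the object and method names are made up for the example.

```scala
object BroadcastArraySketch {
  // Original style: allocate an array, then fill each slot in a loop.
  def fillWithLoop(num: Int): Array[Int] = {
    val arr1 = new Array[Int](num)
    for (i <- 0 until arr1.length) {
      arr1(i) = i
    }
    arr1
  }

  // Shorter style from the commit: materialize the range directly.
  def fillWithRange(num: Int): Array[Int] = (0 until num).toArray

  def main(args: Array[String]): Unit = {
    println(fillWithRange(5).mkString(","))                 // 0,1,2,3,4
    println(fillWithLoop(5).sameElements(fillWithRange(5))) // true
  }
}
```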
* [SPARK-4880] remove spark.locality.wait in Analytics | Ernest | 2014-12-18 | 1 | -1/+1
  `spark.locality.wait` was set to 100000 in examples/graphx/Analytics.scala; this setting should be left to the user.
  Author: Ernest <earneyzxl@gmail.com>
  Closes #3730 from Earne/SPARK-4880 and squashes the following commits:
  - d79ed04 [Ernest] remove spark.locality.wait in Analytics
  (cherry picked from commit a7ed6f3cc537f57de87d28e8466ca88fbfff53b5)
  Signed-off-by: Reynold Xin <rxin@databricks.com>
* Preparing development version 1.2.1-SNAPSHOT | Patrick Wendell | 2014-12-10 | 1 | -1/+1
* Preparing Spark release v1.2.0-rc2 [tag: v1.2.0] | Patrick Wendell | 2014-12-10 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.0-rc2" | Patrick Wendell | 2014-12-10 | 1 | -1/+1
  This reverts commit 2b72c569a674cccf79ebbe8d067b8dbaaf78007f.
* Revert "Preparing development version 1.2.1-SNAPSHOT" | Patrick Wendell | 2014-12-10 | 1 | -1/+1
  This reverts commit bc05df8a23ba7ad485f6844f28f96551b13ba461.
* [SPARK-4774] [SQL] Makes HiveFromSpark more portable | Kostas Sakellis | 2014-12-08 | 1 | -2/+11
  HiveFromSpark read kv1.txt from SPARK_HOME/examples/src/main/resources/kv1.txt, which assumed a checked-out source tree. Now the kv1.txt file is copied to a temporary file that is deleted when the JVM shuts down, so the example can run outside a Spark source tree.
  Author: Kostas Sakellis <kostas@cloudera.com>
  Closes #3628 from ksakellis/kostas-spark-4774 and squashes the following commits:
  - 6770f83 [Kostas Sakellis] [SPARK-4774] [SQL] Makes HiveFromSpark more portable
  (cherry picked from commit d6a972b3e4dc35a2d95df47d256462b325f4bda6)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
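A sketch of the copy-to-temporary-file approach using only JDK APIs (`Files.createTempFile`, `Files.copy`, `File.deleteOnExit`). The commit's actual code may differ, and `copyToTempFile` is a hypothetical helper. In the example an in-memory stream with placeholder content stands in for the bundled resource; a real program might obtain the stream with `getClass.getResourceAsStream`. Note that `deleteOnExit` only takes effect on normal JVM termination.

```scala
import java.io.{ByteArrayInputStream, File, InputStream}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, StandardCopyOption}

object TempResourceSketch {
  // Copies a stream (for example, a classpath resource) into a fresh temporary
  // file and asks the JVM to delete that file on normal shutdown.
  def copyToTempFile(in: InputStream, prefix: String, suffix: String): File = {
    val path = Files.createTempFile(prefix, suffix)
    // REPLACE_EXISTING is needed because createTempFile already created the file.
    try Files.copy(in, path, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    val file = path.toFile
    file.deleteOnExit()
    file
  }

  def main(args: Array[String]): Unit = {
    // Placeholder content standing in for a bundled data file.
    val in = new ByteArrayInputStream("k1\tv1\nk2\tv2\n".getBytes(StandardCharsets.UTF_8))
    val file = copyToTempFile(in, "kv1", ".txt")
    println(file.exists()) // true
    val text = new String(Files.readAllBytes(file.toPath), StandardCharsets.UTF_8)
    println(text.split("\n").length) // 2
  }
}
```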
* Preparing development version 1.2.1-SNAPSHOT | Patrick Wendell | 2014-12-04 | 1 | -1/+1
* Preparing Spark release v1.2.0-rc2 | Patrick Wendell | 2014-12-04 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.0-rc1" | Patrick Wendell | 2014-12-04 | 1 | -1/+1
  This reverts commit 1056e9ec13203d0c51564265e94d77a054498fdb.
* Revert "Preparing development version 1.2.1-SNAPSHOT" | Patrick Wendell | 2014-12-04 | 1 | -1/+1
  This reverts commit 00316cc87983b844f6603f351a8f0b84fe1f6035.
* [FIX][DOC] Fix broken links in ml-guide.md | Xiangrui Meng | 2014-12-04 | 2 | -2/+1
  Also includes some minor changes in ScalaDoc.
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #3601 from mengxr/SPARK-4575-fix and squashes the following commits:
  - c559768 [Xiangrui Meng] minor code update
  - ce94da8 [Xiangrui Meng] Java Bean -> JavaBean
  - 0b5c182 [Xiangrui Meng] fix links in ml-guide
  (cherry picked from commit 7e758d709286e73d2c878d4a2d2b4606386142c7)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes | Joseph K. Bradley | 2014-12-04 | 6 | -5/+457
  Documentation:
  - Added ml-guide.md, linked from mllib-guide.md
  - Updated mllib-guide.md with a small section pointing to ml-guide.md
  Examples:
  - CrossValidatorExample
  - SimpleParamsExample
  - (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md)
  Bug fixes:
  - PipelineModel: did not use ParamMaps correctly
  - UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!)
  CC: mengxr shivaram etrain
  Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete.
  Author: Joseph K. Bradley <joseph@databricks.com>
  Author: jkbradley <joseph.kurata.bradley@gmail.com>
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #3588 from jkbradley/ml-package-docs and squashes the following commits:
  - d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml
  - c38469c [Joseph K. Bradley] Updated ml-guide with CV examples
  - 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold.
  - ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs
  - 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype
  - 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
  (cherry picked from commit 469a6e5f3bdd5593b3254bc916be8236e7c6cb74)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-4580] [SPARK-4610] [mllib] [docs] Documentation for tree ensembles + DecisionTree API fix | Joseph K. Bradley | 2014-12-04 | 6 | -10/+241
  Major changes:
  - Added programming guide sections for tree ensembles
  - Added examples for tree ensembles
  - Updated DecisionTree programming guide with more info on parameters
  - **API change**: Standardized the tree parameter for the number of classes (for classification)
  Minor changes:
  - Updated decision tree documentation
  - Updated existing tree and tree ensemble examples
  - Use train/test split, and compute test error instead of training error.
  - Fixed decision_tree_runner.py to actually use the number of classes it computes from data. (small bug fix)
  Note: I know this is a lot of lines, but most of it is covered by:
  - Programming guide sections for gradient boosting and random forests. (The changes are probably best viewed by generating the docs locally.)
  - New examples (which were copied from the programming guide)
  - The "numClasses" renaming
  I have run all examples and relevant unit tests.
  CC: mengxr manishamde codedeft
  Author: Joseph K. Bradley <joseph@databricks.com>
  Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>
  Closes #3461 from jkbradley/ensemble-docs and squashes the following commits:
  - 70a75f3 [Joseph K. Bradley] updated forest vs boosting comparison
  - d1de753 [Joseph K. Bradley] Added note about toString and toDebugString for DecisionTree to migration guide
  - 8e87f8f [Joseph K. Bradley] Combined GBT and RandomForest guides into one ensembles guide
  - 6fab846 [Joseph K. Bradley] small fixes based on review
  - b9f8576 [Joseph K. Bradley] updated decision tree doc
  - 375204c [Joseph K. Bradley] fixed python style
  - 2b60b6e [Joseph K. Bradley] merged Java RandomForest examples into 1 file. added header. Fixed small bug in same example in the programming guide.
  - 706d332 [Joseph K. Bradley] updated python DT runner to print full model if it is small
  - c76c823 [Joseph K. Bradley] added migration guide for mllib
  - abe5ed7 [Joseph K. Bradley] added examples for random forest in Java and Python to examples folder
  - 07fc11d [Joseph K. Bradley] Renamed numClassesForClassification to numClasses everywhere in trees and ensembles. This is a breaking API change, but it was necessary to correct an API inconsistency in Spark 1.1 (where Python DecisionTree used numClasses but Scala used numClassesForClassification).
  - cdfdfbc [Joseph K. Bradley] added examples for GBT
  - 6372a2b [Joseph K. Bradley] updated decision tree examples to use random split. tested all of them.
  - ad3e695 [Joseph K. Bradley] added gbt and random forest to programming guide. still need to update their examples
  (cherry picked from commit 657a88835d8bf22488b53d50f75281d7dc32442e)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-4710] [mllib] Eliminate MLlib compilation warnings | Joseph K. Bradley | 2014-12-03 | 2 | -8/+10
  Renamed StreamingKMeans to StreamingKMeansExample to avoid a warning about a name conflict with the StreamingKMeans class. Added an import to DecisionTreeRunner to eliminate a warning.
  CC: mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #3568 from jkbradley/ml-compilation-warnings and squashes the following commits:
  - 64d6bc4 [Joseph K. Bradley] Updated DecisionTreeRunner.scala and StreamingKMeans.scala to eliminate compilation warnings, including renaming StreamingKMeans to StreamingKMeansExample.
  (cherry picked from commit 4ac21511547dc6227d05bf61821cd2d9ab5ede74)
  Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SQL] Minor fix for doc and comment | wangfei | 2014-12-01 | 1 | -3/+4
  Author: wangfei <wangfei1@huawei.com>
  Closes #3533 from scwf/sql-doc1 and squashes the following commits:
  - 962910b [wangfei] doc and comment fix
  (cherry picked from commit 7b79957879db4dfcc7c3601cb40ac4fd576259a5)
  Signed-off-by: Michael Armbrust <michael@databricks.com>
* Preparing development version 1.2.1-SNAPSHOT | Patrick Wendell | 2014-11-28 | 1 | -1/+1
* Preparing Spark release v1.2.0-rc1 | Patrick Wendell | 2014-11-28 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.0-rc1" | Patrick Wendell | 2014-11-28 | 1 | -1/+1
  This reverts commit 39c7d1c1f9a7785285cf4c20dfbffd96f72d5634.
* Revert "Preparing development version 1.2.1-SNAPSHOT" | Patrick Wendell | 2014-11-28 | 1 | -1/+1
  This reverts commit fc7bff00ac731d2632213a98cd92dc5e84ce7dcd.
* Preparing development version 1.2.1-SNAPSHOT | Patrick Wendell | 2014-11-28 | 1 | -1/+1
* Preparing Spark release v1.2.0-rc1 | Patrick Wendell | 2014-11-28 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.0-rc1" | Patrick Wendell | 2014-11-26 | 1 | -1/+1
  This reverts commit cc2c05e4ee81d2f34873a2ebb9a5272867cb65c2.
* Revert "Preparing development version 1.2.1-SNAPSHOT" | Patrick Wendell | 2014-11-26 | 1 | -1/+1
  This reverts commit 380eba5f49eca1dbd4084e6c84e19866fffd4efa.
* Preparing development version 1.2.1-SNAPSHOT | Patrick Wendell | 2014-11-26 | 1 | -1/+1
* Preparing Spark release v1.2.0-rc1 | Patrick Wendell | 2014-11-26 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.0-rc1" | Patrick Wendell | 2014-11-26 | 1 | -1/+1
  This reverts commit 5247dd859b95a440baa562b9827bdeb26aa6530e.
* Revert "Preparing development version 1.2.1-SNAPSHOT" | Patrick Wendell | 2014-11-26 | 1 | -1/+1
  This reverts commit 79df6b43ae762263a8120f423ddb4a0811dd4b6f.
* Preparing development version 1.2.1-SNAPSHOT | Patrick Wendell | 2014-11-26 | 1 | -1/+1
* Preparing Spark release v1.2.0-rc1 | Patrick Wendell | 2014-11-26 | 1 | -1/+1
* Revert "Preparing Spark release v1.2.0-rc1" | Patrick Wendell | 2014-11-26 | 1 | -1/+1
  This reverts commit db7f4a898af22a02b36428507f8ef2b429d78dc1.
* Revert "Preparing development version 1.2.1-SNAPSHOT" | Patrick Wendell | 2014-11-26 | 1 | -1/+1
  This reverts commit d7b1ecb25676d228deb6fe05efdb4e2ab9c3e30b.