path: root/docs/scala-programming-guide.md
Commit message  [Author, Age, Files, Lines]
* misleading task number of groupByKey  [Chen Chao, 2014-04-16, 1 file, -2/+2]
|   "By default, this uses only 8 parallel tasks to do the grouping." is very misleading.
|   Please refer to https://github.com/apache/spark/pull/389
|
|   The details are in the following code:
|
|       def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
|         val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
|         for (r <- bySize if r.partitioner.isDefined) {
|           return r.partitioner.get
|         }
|         if (rdd.context.conf.contains("spark.default.parallelism")) {
|           new HashPartitioner(rdd.context.defaultParallelism)
|         } else {
|           new HashPartitioner(bySize.head.partitions.size)
|         }
|       }
|
|   Author: Chen Chao <crazyjvm@gmail.com>
|
|   Closes #403 from CrazyJvm/patch-4 and squashes the following commits:
|
|   42f6c9e [Chen Chao] fix format
|   829a995 [Chen Chao] fix format
|   1568336 [Chen Chao] misleading task number of groupByKey
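The selection rule in the code quoted by this commit can be tried without a Spark installation. The sketch below is only a model of that rule: ToyRdd, ToyPartitioner, PartitionerChoice and the defaultParallelism argument are hypothetical stand-ins introduced here, not Spark classes. It shows the three cases in order: reuse a partitioner that an input already has, otherwise use the configured parallelism, otherwise fall back to the partition count of the largest input.

```scala
// Hypothetical stand-ins for Spark's RDD and Partitioner, kept minimal so the
// selection rule can run on its own.
final case class ToyPartitioner(numPartitions: Int)
final case class ToyRdd(numPartitions: Int, partitioner: Option[ToyPartitioner] = None)

object PartitionerChoice {
  /** Models the rule from the quoted defaultPartitioner.
    * `defaultParallelism` stands in for the "spark.default.parallelism"
    * setting; None means the property is not set. */
  def choose(defaultParallelism: Option[Int], rdd: ToyRdd, others: ToyRdd*): ToyPartitioner = {
    // Largest inputs first, as in the quoted code.
    val bySize = (Seq(rdd) ++ others).sortBy(_.numPartitions).reverse
    bySize.flatMap(_.partitioner).headOption                 // 1. reuse an existing partitioner
      .orElse(defaultParallelism.map(ToyPartitioner(_)))     // 2. configured parallelism
      .getOrElse(ToyPartitioner(bySize.head.numPartitions))  // 3. size of the largest input
  }
}
```

With no property set and inputs of 4 and 16 partitions, the model picks 16 partitions rather than a fixed number, which is the point the commit makes about the old wording.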
* SPARK-1099: Introduce local[*] mode to infer number of cores  [Aaron Davidson, 2014-04-07, 1 file, -2/+3]
|   This is the default mode for running spark-shell and pyspark, intended to allow users running spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core.
|
|   Author: Aaron Davidson <aaron@databricks.com>
|
|   Closes #182 from aarondav/110 and squashes the following commits:
|
|   a88294c [Aaron Davidson] Rebased changes for new spark-shell
|   a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores
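As a rough illustration of the behaviour described in this commit, the sketch below maps a local master string to a thread count. LocalMaster and threads are hypothetical names made up for this example and are not Spark's own parser; the only assumptions are that "local" means exactly one core, "local[N]" means N threads, and "local[*]" means one thread per available core.

```scala
object LocalMaster {
  // Matches the explicit form local[N], where N is a decimal number.
  private val LocalN = """local\[([0-9]+)\]""".r

  /** Thread count implied by a local master string, or None when the string
    * is not one of the local forms. Hypothetical helper, not Spark code. */
  def threads(master: String,
              cores: Int = Runtime.getRuntime.availableProcessors()): Option[Int] =
    master match {
      case "local"    => Some(1)       // unchanged behaviour: exactly one core
      case "local[*]" => Some(cores)   // new mode: infer the number of cores
      case LocalN(n)  => Some(n.toInt) // explicit thread count
      case _          => None          // not a local master (e.g. a cluster URL)
    }
}
```

Passing `cores` explicitly, as in `LocalMaster.threads("local[*]", cores = 4)`, keeps the example deterministic on any machine.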
* SPARK-1305: Support persisting RDD's directly to Tachyon  [Haoyuan Li, 2014-04-04, 1 file, -34/+93]
|   Move the PR#468 of apache-incubator-spark to the apache-spark
|   "Adding an option to persist Spark RDD blocks into Tachyon."
|
|   Author: Haoyuan Li <haoyuan@cs.berkeley.edu>
|   Author: RongGu <gurongwalker@gmail.com>
|
|   Closes #158 from RongGu/master and squashes the following commits:
|
|   72b7768 [Haoyuan Li] merge master
|   9f7fa1b [Haoyuan Li] fix code style
|   ae7834b [Haoyuan Li] minor cleanup
|   a8b3ec6 [Haoyuan Li] merge master branch
|   e0f4891 [Haoyuan Li] better check offheap.
|   55b5918 [RongGu] address matei's comment on the replication of offHeap storagelevel
|   7cd4600 [RongGu] remove some logic code for tachyonstore's replication
|   51149e7 [RongGu] address aaron's comment on returning value of the remove() function in tachyonstore
|   8adfcfa [RongGu] address arron's comment on inTachyonSize
|   120e48a [RongGu] changed the root-level dir name in Tachyon
|   5cc041c [Haoyuan Li] address aaron's comments
|   9b97935 [Haoyuan Li] address aaron's comments
|   d9a6438 [Haoyuan Li] fix for pspark
|   77d2703 [Haoyuan Li] change python api.git status
|   3dcace4 [Haoyuan Li] address matei's comments
|   91fa09d [Haoyuan Li] address patrick's comments
|   589eafe [Haoyuan Li] use TRY_CACHE instead of MUST_CACHE
|   64348b2 [Haoyuan Li] update conf docs.
|   ed73e19 [Haoyuan Li] Merge branch 'master' of github.com:RongGu/spark-1
|   619a9a8 [RongGu] set number of directories in TachyonStore back to 64; added a TODO tag for duplicated code from the DiskStore
|   be79d77 [RongGu] find a way to clean up some unnecessay metods and classed to make the code simpler
|   49cc724 [Haoyuan Li] update docs with off_headp option
|   4572f9f [RongGu] reserving the old apply function API of StorageLevel
|   04301d3 [RongGu] rename StorageLevel.TACHYON to Storage.OFF_HEAP
|   c9aeabf [RongGu] rename the StorgeLevel.TACHYON as StorageLevel.OFF_HEAP
|   76805aa [RongGu] unifies the config properties name prefix; add the configs into docs/configuration.md
|   e700d9c [RongGu] add the SparkTachyonHdfsLR example and some comments
|   fd84156 [RongGu] use randomUUID to generate sparkapp directory name on tachyon;minor code style fix
|   939e467 [Haoyuan Li] 0.4.1-thrift from maven central
|   86a2eab [Haoyuan Li] tachyon 0.4.1-thrift is in the staging repo. but jenkins failed to download it. temporarily revert it back to 0.4.1
|   16c5798 [RongGu] make the dependency on tachyon as tachyon-0.4.1-thrift
|   eacb2e8 [RongGu] Merge branch 'master' of https://github.com/RongGu/spark-1
|   bbeb4de [RongGu] fix the JsonProtocolSuite test failure problem
|   6adb58f [RongGu] Merge branch 'master' of https://github.com/RongGu/spark-1
|   d827250 [RongGu] fix JsonProtocolSuie test failure
|   716e93b [Haoyuan Li] revert the version
|   ca14469 [Haoyuan Li] bump tachyon version to 0.4.1-thrift
|   2825a13 [RongGu] up-merging to the current master branch of the apache spark
|   6a22c1a [Haoyuan Li] fix scalastyle
|   8968b67 [Haoyuan Li] exclude more libraries from tachyon dependency to be the same as referencing tachyon-client.
|   77be7e8 [RongGu] address mateiz's comment about the temp folder name problem. The implementation followed mateiz's advice.
|   1dcadf9 [Haoyuan Li] typo
|   bf278fa [Haoyuan Li] fix python tests
|   e82909c [Haoyuan Li] minor cleanup
|   776a56c [Haoyuan Li] address patrick's and ali's comments from the previous PR
|   8859371 [Haoyuan Li] various minor fixes and clean up
|   e3ddbba [Haoyuan Li] add doc to use Tachyon cache mode.
|   fcaeab2 [Haoyuan Li] address Aaron's comment
|   e554b1e [Haoyuan Li] add python code
|   47304b3 [Haoyuan Li] make tachyonStore in BlockMananger lazy val; add more comments StorageLevels.
|   dc8ef24 [Haoyuan Li] add old storelevel constructor
|   e01a271 [Haoyuan Li] update tachyon 0.4.1
|   8011a96 [RongGu] fix a brought-in mistake in StorageLevel
|   70ca182 [RongGu] a bit change in comment
|   556978b [RongGu] fix the scalastyle errors
|   791189b [RongGu] "Adding an option to persist Spark RDD blocks into Tachyon." move the PR#468 of apache-incubator-spark to the apache-spark
* Removed reference to incubation in Spark user docs.  [Reynold Xin, 2014-02-27, 1 file, -1/+1]
|   Author: Reynold Xin <rxin@apache.org>
|
|   Closes #2 from rxin/docs and squashes the following commits:
|
|   08bbd5f [Reynold Xin] Removed reference to incubation in Spark user docs.
* SPARK-1117: update accumulator docs  [Xiangrui Meng, 2014-02-21, 1 file, -1/+1]
|   The current doc hints spark doesn't support accumulators of type `Long`, which is wrong.
|
|   JIRA: https://spark-project.atlassian.net/browse/SPARK-1117
|
|   Author: Xiangrui Meng <meng@databricks.com>
|
|   Closes #631 from mengxr/acc and squashes the following commits:
|
|   45ecd25 [Xiangrui Meng] update accumulator docs
* [SPARK-1105] fix site scala version error in docs  [CodingCat, 2014-02-19, 1 file, -3/+3]
|   https://spark-project.atlassian.net/browse/SPARK-1105
|
|   fix site scala version error
|
|   Author: CodingCat <zhunansjtu@gmail.com>
|
|   Closes #618 from CodingCat/doc_version and squashes the following commits:
|
|   39bb8aa [CodingCat] more fixes
|   65bedb0 [CodingCat] fix site scala version error in doc
* Deprecate mapPartitionsWithSplit in PySpark.  [Josh Rosen, 2014-01-23, 1 file, -2/+2]
|   Also, replace the last reference to it in the docs. This fixes SPARK-1026.
* Code review feedback  [Holden Karau, 2014-01-05, 1 file, -1/+1]
|
* Merge remote-tracking branch 'apache-github/master' into remove-binaries  [Patrick Wendell, 2014-01-03, 1 file, -7/+7]
|\    Conflicts:
| |       core/src/test/scala/org/apache/spark/DriverSuite.scala
| |       docs/python-programming-guide.md
| * run-example -> bin/run-example  [Prashant Sharma, 2014-01-02, 1 file, -2/+2]
| |
| * spark-shell -> bin/spark-shell  [Prashant Sharma, 2014-01-02, 1 file, -5/+5]
| |
* | Merge branch 'master' into spark-1002-remove-jars  [Prashant Sharma, 2014-01-03, 1 file, -1/+3]
|\|
| * Updated docs for SparkConf and handled review comments  [Matei Zaharia, 2013-12-30, 1 file, -1/+3]
| |
* | Removed sbt folder and changed docs accordingly  [Prashant Sharma, 2014-01-02, 1 file, -1/+1]
| |
* | Revert "Merge pull request #310 from jyunfan/master"  [Reynold Xin, 2013-12-28, 1 file, -1/+1]
| |   This reverts commit 79b20e4dbe3dcd8559ec8316784d3334bb55868b, reversing changes made to 7375047d516c5aa69221611f5f7b0f1d367039af.
* | Fix typo in the Accumulators section  [Jyun-Fan Tsai, 2013-12-29, 1 file, -1/+1]
|/    val => var
* changed the example links in the scala-programming-guid  [fengdong, 2013-12-18, 1 file, -1/+1]
|
* Fixed the example link.  [fengdong, 2013-12-18, 1 file, -1/+1]
|
* Docs: Fix links to RDD API documentation  [Aaron Davidson, 2013-10-22, 1 file, -3/+3]
|
* More fair scheduler docs and property names.  [Matei Zaharia, 2013-09-08, 1 file, -2/+2]
|   Also changed uses of "job" terminology to "application" when they referred to an entire Spark program, to avoid confusion.
* Move some classes to more appropriate packages:  [Matei Zaharia, 2013-09-01, 1 file, -1/+1]
|   * RDD, *RDDFunctions -> org.apache.spark.rdd
|   * Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
|   * JavaSerializer, KryoSerializer -> org.apache.spark.serializer
* Fix more URLs in docs  [Matei Zaharia, 2013-09-01, 1 file, -1/+5]
|
* Update docs for new package  [Matei Zaharia, 2013-09-01, 1 file, -8/+8]
|
* Update docs about HDFS versions  [Matei Zaharia, 2013-08-30, 1 file, -3/+11]
|
* Change build and run instructions to use assemblies  [Matei Zaharia, 2013-08-29, 1 file, -1/+1]
|   This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR.
|
|   Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them.
|
|   As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.
* ADD_JARS environment variable for spark-shell  [Matei Zaharia, 2013-06-22, 1 file, -2/+8]
|
* Docs: Mention spark shell's default for MASTER  [Andrew Ash, 2013-05-15, 1 file, -0/+2]
|
* Allow passing sparkHome and JARs to StreamingContext constructor  [Matei Zaharia, 2013-02-25, 1 file, -2/+2]
|   Also warns if spark.cleaner.ttl is not set in the version where you pass your own SparkContext.
* Merge branch 'master' of https://github.com/mesos/spark into commutative  [Mark Hamstra, 2013-02-08, 1 file, -1/+2]
|\    Conflicts:
| |       core/src/main/scala/spark/RDD.scala
| * Made StorageLevel constructor private, and added StorageLevels.create() to the Java API. Updates scala and java programming guides.  [Tathagata Das, 2013-01-23, 1 file, -1/+2]
| |
* | Change docs on 'reduce' since the merging of local reduces no longer preserves ordering, so the reduce function must also be commutative.  [Mark Hamstra, 2013-02-05, 1 file, -1/+1]
|/
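The requirement stated in this commit message can be seen in a small model: when per-partition results are merged in whatever order they arrive, only a function that is commutative as well as associative gives a stable answer. The sketch below uses plain Scala collections; ReduceOrder, reduceInOrder and the sample data are hypothetical and do not reflect how Spark schedules or merges results.

```scala
object ReduceOrder {
  /** Reduce each partition locally, then merge the partial results in the
    * given arrival order. A toy model of a distributed reduce, not Spark code. */
  def reduceInOrder[A](partitions: Seq[Seq[A]], arrival: Seq[Int])(f: (A, A) => A): A =
    arrival.map(i => partitions(i).reduce(f)).reduce(f)

  val parts: Seq[Seq[String]] = Seq(Seq("a", "b"), Seq("c", "d"))

  // String concatenation is associative but not commutative,
  // so the result depends on which partition finishes first.
  val concatInOrder: String  = reduceInOrder(parts, Seq(0, 1))(_ + _) // "abcd"
  val concatReversed: String = reduceInOrder(parts, Seq(1, 0))(_ + _) // "cdab"

  // Integer addition is associative and commutative,
  // so the arrival order does not matter.
  val nums: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4))
  val sumInOrder: Int  = reduceInOrder(nums, Seq(0, 1))(_ + _)        // 10
  val sumReversed: Int = reduceInOrder(nums, Seq(1, 0))(_ + _)        // 10
}
```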
* Fix Spark groupId in Scala Programming Guide.  [Josh Rosen, 2012-10-26, 1 file, -1/+1]
|
* Merge branch 'dev' of github.com:mesos/spark into dev  [Matei Zaharia, 2012-10-12, 1 file, -1/+7]
|\
| * Updating programming guide with new link instructions  [Patrick Wendell, 2012-10-09, 1 file, -1/+7]
| |
* | Tweak  [Matei Zaharia, 2012-10-12, 1 file, -1/+1]
|/
* Updates to documentation:  [Matei Zaharia, 2012-10-09, 1 file, -2/+6]
|   - Edited quick start and tuning guide to simplify them a little
|   - Simplified top menu bar
|   - Made private a SparkContext constructor parameter that was left as public
|   - Various small fixes
* Updating lots of docs to use the new special version number variables, also adding the version to the navbar so it is easy to tell which version of Spark these docs were compiled for.  [Andy Konwinski, 2012-10-08, 1 file, -1/+1]
|
* Adds liquid variables to docs templating system so that they can be used throughout the docs: SPARK_VERSION, SCALA_VERSION, and MESOS_VERSION.  [Andy Konwinski, 2012-10-08, 1 file, -12/+12]
|   To use them, e.g. use {{site.SPARK_VERSION}}.
|
|   Also removes uses of {{HOME_PATH}} which were being resolved to "" by the templating system anyway.
* Added mapPartitionsWithSplit to the programming guide.  [Reynold Xin, 2012-09-29, 1 file, -0/+6]
|
* Allow controlling number of splits in distinct().  [Josh Rosen, 2012-09-28, 1 file, -0/+4]
|
* Renamed storage levels to something cleaner; fixes #223.  [Matei Zaharia, 2012-09-27, 1 file, -12/+12]
|
* Updates to standalone cluster, web UI and deploy docs.  [Matei Zaharia, 2012-09-26, 1 file, -1/+1]
|
* More updates to docs, including tuning guide  [Matei Zaharia, 2012-09-26, 1 file, -33/+30]
|
* Doc tweaks  [Matei Zaharia, 2012-09-26, 1 file, -2/+2]
|
* Fixes to Java guide  [Matei Zaharia, 2012-09-25, 1 file, -1/+6]
|
* Various enhancements to the programming guide and HTML/CSS  [Matei Zaharia, 2012-09-25, 1 file, -32/+171]
|
* More updates to documentation  [Matei Zaharia, 2012-09-25, 1 file, -10/+16]
|
* - Add docs/api to .gitignore  [Andy Konwinski, 2012-09-16, 1 file, -0/+187]
    - Rework/expand the nav bar with more of the docs site
    - Removing parts of docs about EC2 and Mesos that differentiate between running 0.5 and before
    - Merged subheadings from running-on-amazon-ec2.html that are still relevant (i.e., "Using a newer version of Spark" and "Accessing Data in S3") into ec2-scripts.html and deleted running-on-amazon-ec2.html
    - Added some TODO comments to a few docs
    - Updated the blurb about AMP Camp
    - Renamed programming-guide to spark-programming-guide
    - Fixing typos/etc. in Standalone Spark doc