aboutsummaryrefslogtreecommitdiff
path: root/docs/scala-programming-guide.md
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and JavadocsMatei Zaharia2014-04-211-5/+5
| | | | | | | | | | | | | | | | | | | | | | I used the sbt-unidoc plugin (https://github.com/sbt/sbt-unidoc) to create a unified Scaladoc of our public packages, and generate Javadocs as well. One limitation is that I haven't found an easy way to exclude packages in the Javadoc; there is a SBT task that identifies Java sources to run javadoc on, but it's been very difficult to modify it from outside to change what is set in the unidoc package. Some SBT-savvy people should help with this. The Javadoc site also lacks package-level descriptions and things like that, so we may want to look into that. We may decide not to post these right now if it's too limited compared to the Scala one. Example of the built doc site: http://people.csail.mit.edu/matei/spark-unified-docs/ Author: Matei Zaharia <matei@databricks.com> This patch had conflicts when merged, resolved by Committer: Patrick Wendell <pwendell@gmail.com> Closes #457 from mateiz/better-docs and squashes the following commits: a63d4a3 [Matei Zaharia] Skip Java/Scala API docs for Python package 5ea1f43 [Matei Zaharia] Fix links to Java classes in Java guide, fix some JS for scrolling to anchors on page load f05abc0 [Matei Zaharia] Don't include java.lang package names 995e992 [Matei Zaharia] Skip internal packages and class names with $ in JavaDoc a14a93c [Matei Zaharia] typo 76ce64d [Matei Zaharia] Add groups to Javadoc index page, and a first package-info.java ed6f994 [Matei Zaharia] Generate JavaDoc as well, add titles, update doc site to use unified docs acb993d [Matei Zaharia] Add Unidoc plugin for the projects we want Unidoced
* Clean up and simplify Spark configurationPatrick Wendell2014-04-211-22/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Over time as we've added more deployment modes, this have gotten a bit unwieldy with user-facing configuration options in Spark. Going forward we'll advise all users to run `spark-submit` to launch applications. This is a WIP patch but it makes the following improvements: 1. Improved `spark-env.sh.template` which was missing a lot of things users now set in that file. 2. Removes the shipping of SPARK_CLASSPATH, SPARK_JAVA_OPTS, and SPARK_LIBRARY_PATH to the executors on the cluster. This was an ugly hack. Instead it introduces config variables spark.executor.extraJavaOpts, spark.executor.extraLibraryPath, and spark.executor.extraClassPath. 3. Adds ability to set these same variables for the driver using `spark-submit`. 4. Allows you to load system properties from a `spark-defaults.conf` file when running `spark-submit`. This will allow setting both SparkConf options and other system properties utilized by `spark-submit`. 5. Made `SPARK_LOCAL_IP` an environment variable rather than a SparkConf property. This is more consistent with it being set on each node. Author: Patrick Wendell <pwendell@gmail.com> Closes #299 from pwendell/config-cleanup and squashes the following commits: 127f301 [Patrick Wendell] Improvements to testing a006464 [Patrick Wendell] Moving properties file template. b4b496c [Patrick Wendell] spark-defaults.properties -> spark-defaults.conf 0086939 [Patrick Wendell] Minor style fixes af09e3e [Patrick Wendell] Mention config file in docs and clean-up docs b16e6a2 [Patrick Wendell] Cleanup of spark-submit script and Scala quick start guide af0adf7 [Patrick Wendell] Automatically add user jar a56b125 [Patrick Wendell] Responses to Tom's review d50c388 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into config-cleanup a762901 [Patrick Wendell] Fixing test failures ffa00fe [Patrick Wendell] Review feedback fda0301 [Patrick Wendell] Note 308f1f6 [Patrick Wendell] Properly escape quotes and other clean-up for YARN e83cd8f [Patrick Wendell] Changes to allow re-use of test applications be42f35 [Patrick Wendell] Handle case where SPARK_HOME is not set c2a2909 [Patrick Wendell] Test compile fixes 4ee6f9d [Patrick Wendell] Making YARN doc changes consistent afc9ed8 [Patrick Wendell] Cleaning up line limits and two compile errors. b08893b [Patrick Wendell] Additional improvements. ace4ead [Patrick Wendell] Responses to review feedback. b72d183 [Patrick Wendell] Review feedback for spark env file 46555c1 [Patrick Wendell] Review feedback and import clean-ups 437aed1 [Patrick Wendell] Small fix 761ebcd [Patrick Wendell] Library path and classpath for drivers 7cc70e4 [Patrick Wendell] Clean up terminology inside of spark-env script 5b0ba8e [Patrick Wendell] Don't ship executor envs 84cc5e5 [Patrick Wendell] Small clean-up 1f75238 [Patrick Wendell] SPARK_JAVA_OPTS --> SPARK_MASTER_OPTS for master settings 4982331 [Patrick Wendell] Remove SPARK_LIBRARY_PATH 6eaf7d0 [Patrick Wendell] executorJavaOpts 0faa3b6 [Patrick Wendell] Stash of adding config options in submit script and YARN ac2d65e [Patrick Wendell] Change spark.local.dir -> SPARK_LOCAL_DIRS
* misleading task number of groupByKeyChen Chao2014-04-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | "By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to https://github.com/apache/spark/pull/389 detail is as following code : def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = { val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse for (r <- bySize if r.partitioner.isDefined) { return r.partitioner.get } if (rdd.context.conf.contains("spark.default.parallelism")) { new HashPartitioner(rdd.context.defaultParallelism) } else { new HashPartitioner(bySize.head.partitions.size) } } Author: Chen Chao <crazyjvm@gmail.com> Closes #403 from CrazyJvm/patch-4 and squashes the following commits: 42f6c9e [Chen Chao] fix format 829a995 [Chen Chao] fix format 1568336 [Chen Chao] misleading task number of groupByKey
* SPARK-1099: Introduce local[*] mode to infer number of coresAaron Davidson2014-04-071-2/+3
| | | | | | | | | | | This is the default mode for running spark-shell and pyspark, intended to allow users running spark for the first time to see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core. Author: Aaron Davidson <aaron@databricks.com> Closes #182 from aarondav/110 and squashes the following commits: a88294c [Aaron Davidson] Rebased changes for new spark-shell a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores
* SPARK-1305: Support persisting RDD's directly to TachyonHaoyuan Li2014-04-041-34/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the PR#468 of apache-incubator-spark to the apache-spark "Adding an option to persist Spark RDD blocks into Tachyon." Author: Haoyuan Li <haoyuan@cs.berkeley.edu> Author: RongGu <gurongwalker@gmail.com> Closes #158 from RongGu/master and squashes the following commits: 72b7768 [Haoyuan Li] merge master 9f7fa1b [Haoyuan Li] fix code style ae7834b [Haoyuan Li] minor cleanup a8b3ec6 [Haoyuan Li] merge master branch e0f4891 [Haoyuan Li] better check offheap. 55b5918 [RongGu] address matei's comment on the replication of offHeap storagelevel 7cd4600 [RongGu] remove some logic code for tachyonstore's replication 51149e7 [RongGu] address aaron's comment on returning value of the remove() function in tachyonstore 8adfcfa [RongGu] address arron's comment on inTachyonSize 120e48a [RongGu] changed the root-level dir name in Tachyon 5cc041c [Haoyuan Li] address aaron's comments 9b97935 [Haoyuan Li] address aaron's comments d9a6438 [Haoyuan Li] fix for pspark 77d2703 [Haoyuan Li] change python api.git status 3dcace4 [Haoyuan Li] address matei's comments 91fa09d [Haoyuan Li] address patrick's comments 589eafe [Haoyuan Li] use TRY_CACHE instead of MUST_CACHE 64348b2 [Haoyuan Li] update conf docs. ed73e19 [Haoyuan Li] Merge branch 'master' of github.com:RongGu/spark-1 619a9a8 [RongGu] set number of directories in TachyonStore back to 64; added a TODO tag for duplicated code from the DiskStore be79d77 [RongGu] find a way to clean up some unnecessay metods and classed to make the code simpler 49cc724 [Haoyuan Li] update docs with off_headp option 4572f9f [RongGu] reserving the old apply function API of StorageLevel 04301d3 [RongGu] rename StorageLevel.TACHYON to Storage.OFF_HEAP c9aeabf [RongGu] rename the StorgeLevel.TACHYON as StorageLevel.OFF_HEAP 76805aa [RongGu] unifies the config properties name prefix; add the configs into docs/configuration.md e700d9c [RongGu] add the SparkTachyonHdfsLR example and some comments fd84156 [RongGu] use randomUUID to generate sparkapp directory name on tachyon;minor code style fix 939e467 [Haoyuan Li] 0.4.1-thrift from maven central 86a2eab [Haoyuan Li] tachyon 0.4.1-thrift is in the staging repo. but jenkins failed to download it. temporarily revert it back to 0.4.1 16c5798 [RongGu] make the dependency on tachyon as tachyon-0.4.1-thrift eacb2e8 [RongGu] Merge branch 'master' of https://github.com/RongGu/spark-1 bbeb4de [RongGu] fix the JsonProtocolSuite test failure problem 6adb58f [RongGu] Merge branch 'master' of https://github.com/RongGu/spark-1 d827250 [RongGu] fix JsonProtocolSuie test failure 716e93b [Haoyuan Li] revert the version ca14469 [Haoyuan Li] bump tachyon version to 0.4.1-thrift 2825a13 [RongGu] up-merging to the current master branch of the apache spark 6a22c1a [Haoyuan Li] fix scalastyle 8968b67 [Haoyuan Li] exclude more libraries from tachyon dependency to be the same as referencing tachyon-client. 77be7e8 [RongGu] address mateiz's comment about the temp folder name problem. The implementation followed mateiz's advice. 1dcadf9 [Haoyuan Li] typo bf278fa [Haoyuan Li] fix python tests e82909c [Haoyuan Li] minor cleanup 776a56c [Haoyuan Li] address patrick's and ali's comments from the previous PR 8859371 [Haoyuan Li] various minor fixes and clean up e3ddbba [Haoyuan Li] add doc to use Tachyon cache mode. fcaeab2 [Haoyuan Li] address Aaron's comment e554b1e [Haoyuan Li] add python code 47304b3 [Haoyuan Li] make tachyonStore in BlockMananger lazy val; add more comments StorageLevels. dc8ef24 [Haoyuan Li] add old storelevel constructor e01a271 [Haoyuan Li] update tachyon 0.4.1 8011a96 [RongGu] fix a brought-in mistake in StorageLevel 70ca182 [RongGu] a bit change in comment 556978b [RongGu] fix the scalastyle errors 791189b [RongGu] "Adding an option to persist Spark RDD blocks into Tachyon." move the PR#468 of apache-incubator-spark to the apache-spark
* Removed reference to incubation in Spark user docs.Reynold Xin2014-02-271-1/+1
| | | | | | | | Author: Reynold Xin <rxin@apache.org> Closes #2 from rxin/docs and squashes the following commits: 08bbd5f [Reynold Xin] Removed reference to incubation in Spark user docs.
* SPARK-1117: update accumulator docsXiangrui Meng2014-02-211-1/+1
| | | | | | | | | | | | The current doc hints spark doesn't support accumulators of type `Long`, which is wrong. JIRA: https://spark-project.atlassian.net/browse/SPARK-1117 Author: Xiangrui Meng <meng@databricks.com> Closes #631 from mengxr/acc and squashes the following commits: 45ecd25 [Xiangrui Meng] update accumulator docs
* [SPARK-1105] fix site scala version error in docsCodingCat2014-02-191-3/+3
| | | | | | | | | | | | | https://spark-project.atlassian.net/browse/SPARK-1105 fix site scala version error Author: CodingCat <zhunansjtu@gmail.com> Closes #618 from CodingCat/doc_version and squashes the following commits: 39bb8aa [CodingCat] more fixes 65bedb0 [CodingCat] fix site scala version error in doc
* Deprecate mapPartitionsWithSplit in PySpark.Josh Rosen2014-01-231-2/+2
| | | | | | Also, replace the last reference to it in the docs. This fixes SPARK-1026.
* Code review feedbackHolden Karau2014-01-051-1/+1
|
* Merge remote-tracking branch 'apache-github/master' into remove-binariesPatrick Wendell2014-01-031-7/+7
|\ | | | | | | | | | | Conflicts: core/src/test/scala/org/apache/spark/DriverSuite.scala docs/python-programming-guide.md
| * run-example -> bin/run-examplePrashant Sharma2014-01-021-2/+2
| |
| * spark-shell -> bin/spark-shellPrashant Sharma2014-01-021-5/+5
| |
* | Merge branch 'master' into spark-1002-remove-jarsPrashant Sharma2014-01-031-1/+3
|\|
| * Updated docs for SparkConf and handled review commentsMatei Zaharia2013-12-301-1/+3
| |
* | Removed sbt folder and changed docs accordinglyPrashant Sharma2014-01-021-1/+1
| |
* | Revert "Merge pull request #310 from jyunfan/master"Reynold Xin2013-12-281-1/+1
| | | | | | | | | | This reverts commit 79b20e4dbe3dcd8559ec8316784d3334bb55868b, reversing changes made to 7375047d516c5aa69221611f5f7b0f1d367039af.
* | Fix typo in the Accumulators sectionJyun-Fan Tsai2013-12-291-1/+1
|/ | | val => var
* changed the example links in the scala-programming-guidfengdong2013-12-181-1/+1
|
* Fixed the example link.fengdong2013-12-181-1/+1
|
* Docs: Fix links to RDD API documentationAaron Davidson2013-10-221-3/+3
|
* More fair scheduler docs and property names.Matei Zaharia2013-09-081-2/+2
| | | | | Also changed uses of "job" terminology to "application" when they referred to an entire Spark program, to avoid confusion.
* Move some classes to more appropriate packages:Matei Zaharia2013-09-011-1/+1
| | | | | | * RDD, *RDDFunctions -> org.apache.spark.rdd * Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util * JavaSerializer, KryoSerializer -> org.apache.spark.serializer
* Fix more URLs in docsMatei Zaharia2013-09-011-1/+5
|
* Update docs for new packageMatei Zaharia2013-09-011-8/+8
|
* Update docs about HDFS versionsMatei Zaharia2013-08-301-3/+11
|
* Change build and run instructions to use assembliesMatei Zaharia2013-08-291-1/+1
| | | | | | | | | | | | | | | | This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.
* ADD_JARS environment variable for spark-shellMatei Zaharia2013-06-221-2/+8
|
* Docs: Mention spark shell's default for MASTERAndrew Ash2013-05-151-0/+2
|
* Allow passing sparkHome and JARs to StreamingContext constructorMatei Zaharia2013-02-251-2/+2
| | | | | Also warns if spark.cleaner.ttl is not set in the version where you pass your own SparkContext.
* Merge branch 'master' of https://github.com/mesos/spark into commutativeMark Hamstra2013-02-081-1/+2
|\ | | | | | | | | Conflicts: core/src/main/scala/spark/RDD.scala
| * Made StorageLevel constructor private, and added StorageLevels.create() to ↵Tathagata Das2013-01-231-1/+2
| | | | | | | | the Java API. Updates scala and java programming guides.
* | Change docs on 'reduce' since the merging of local reduces no longer preservesMark Hamstra2013-02-051-1/+1
|/ | | | ordering, so the reduce function must also be commutative.
* Fix Spark groupId in Scala Programming Guide.Josh Rosen2012-10-261-1/+1
|
* Merge branch 'dev' of github.com:mesos/spark into devMatei Zaharia2012-10-121-1/+7
|\
| * Updating programming guide with new link instructionsPatrick Wendell2012-10-091-1/+7
| |
* | TweakMatei Zaharia2012-10-121-1/+1
|/
* Updates to documentation:Matei Zaharia2012-10-091-2/+6
| | | | | | | | - Edited quick start and tuning guide to simplify them a little - Simplified top menu bar - Made private a SparkContext constructor parameter that was left as public - Various small fixes
* Updating lots of docs to use the new special version number variables,Andy Konwinski2012-10-081-1/+1
| | | | | also adding the version to the navbar so it is easy to tell which version of Spark these docs were compiled for.
* Adds liquid variables to docs templating system so that they can be usedAndy Konwinski2012-10-081-12/+12
| | | | | | | | | throughout the docs: SPARK_VERSION, SCALA_VERSION, and MESOS_VERSION. To use them, e.g. use {{site.SPARK_VERSION}}. Also removes uses of {{HOME_PATH}} which were being resolved to "" by the templating system anyway.
* Added mapPartitionsWithSplit to the programming guide.Reynold Xin2012-09-291-0/+6
|
* Allow controlling number of splits in distinct().Josh Rosen2012-09-281-0/+4
|
* Renamed storage levels to something cleaner; fixes #223.Matei Zaharia2012-09-271-12/+12
|
* Updates to standalone cluster, web UI and deploy docs.Matei Zaharia2012-09-261-1/+1
|
* More updates to docs, including tuning guideMatei Zaharia2012-09-261-33/+30
|
* Doc tweaksMatei Zaharia2012-09-261-2/+2
|
* Fixes to Java guideMatei Zaharia2012-09-251-1/+6
|
* Various enhancements to the programming guide and HTML/CSSMatei Zaharia2012-09-251-32/+171
|
* More updates to documentationMatei Zaharia2012-09-251-10/+16
|
* - Add docs/api to .gitignoreAndy Konwinski2012-09-161-0/+187
- Rework/expand the nav bar with more of the docs site - Removing parts of docs about EC2 and Mesos that differentiate between running 0.5 and before - Merged subheadings from running-on-amazon-ec2.html that are still relevant (i.e., "Using a newer version of Spark" and "Accessing Data in S3") into ec2-scripts.html and deleted running-on-amazon-ec2.html - Added some TODO comments to a few docs - Updated the blurb about AMP Camp - Renamed programming-guide to spark-programming-guide - Fixing typos/etc. in Standalone Spark doc