[SPARK-5811] Added documentation for maven coordinates and added Spark Packages support

Documentation for maven coordinates + Spark Package support. Added pyspark tests for `--packages`

Author: Burak Yavuz <brkyvz@gmail.com>
Author: Davies Liu <davies@databricks.com>

Closes #4662 from brkyvz/SPARK-5811 and squashes the following commits:

56ccccd [Burak Yavuz] fixed broken test
64cb8ee [Burak Yavuz] passed pep8 on local
c07b81e [Burak Yavuz] fixed pep8
a8bd6b7 [Burak Yavuz] submit PR
4ef4046 [Burak Yavuz] ready for PR
8fb02e5 [Burak Yavuz] merged master
25c9b9f [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into python-jar
560d13b [Burak Yavuz] before PR
17d3f76 [Davies Liu] support .jar as python package
a3eb717 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5811
c60156d [Burak Yavuz] [SPARK-5811] Added documentation for maven coordinates

(cherry picked from commit ae6cfb3acdbc2721d25793698a4a440f0519dbec)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
author: Burak Yavuz <brkyvz@gmail.com> 2015-02-17 17:15:43 -0800
committer: Patrick Wendell <patrick@databricks.com> 2015-02-17 17:23:30 -0800
commit: cb905841b2eaa19e28a1327cab0e5d51f805d233 (patch)
tree: c401bca6ed8865d8d5ab6a860d202bf7d49f3eb2 /docs
parent: 81202350a08c50685676300218270929c76f648a (diff)
download: spark-cb905841b2eaa19e28a1327cab0e5d51f805d233.tar.gz
spark-cb905841b2eaa19e28a1327cab0e5d51f805d233.tar.bz2
spark-cb905841b2eaa19e28a1327cab0e5d51f805d233.zip
2 files changed, 21 insertions, 3 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 118701549a..4e4af76316 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -173,8 +173,11 @@ in-process.
 In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the
 variable called `sc`. Making your own SparkContext will not work. You can set which master the
 context connects to using the `--master` argument, and you can add JARs to the classpath
-by passing a comma-separated list to the `--jars` argument.
-For example, to run `bin/spark-shell` on exactly four cores, use:
+by passing a comma-separated list to the `--jars` argument. You can also add dependencies 
+(e.g. Spark Packages) to your shell session by supplying a comma-separated list of maven coordinates 
+to the `--packages` argument. Any additional repositories where dependencies might exist (e.g. SonaType)
+can be passed to the `--repositories` argument. For example, to run `bin/spark-shell` on exactly
+four cores, use:
 
 {% highlight bash %}
 $ ./bin/spark-shell --master local[4]
@@ -186,6 +189,12 @@ Or, to also add `code.jar` to its classpath, use:
 $ ./bin/spark-shell --master local[4] --jars code.jar
 {% endhighlight %}
 
+To include a dependency using maven coordinates:
+
+{% highlight bash %}
+$ ./bin/spark-shell --master local[4] --packages "org.example:example:0.1"
+{% endhighlight %}
+
 For a complete list of options, run `spark-shell --help`. Behind the scenes,
 `spark-shell` invokes the more general [`spark-submit` script](submitting-applications.html).
 
@@ -196,7 +205,11 @@ For a complete list of options, run `spark-shell --help`. Behind the scenes,
 In the PySpark shell, a special interpreter-aware SparkContext is already created for you, in the
 variable called `sc`. Making your own SparkContext will not work. You can set which master the
 context connects to using the `--master` argument, and you can add Python .zip, .egg or .py files
-to the runtime path by passing a comma-separated list to `--py-files`.
+to the runtime path by passing a comma-separated list to `--py-files`. You can also add dependencies
+(e.g. Spark Packages) to your shell session by supplying a comma-separated list of maven coordinates
+to the `--packages` argument. Any additional repositories where dependencies might exist (e.g. SonaType)
+can be passed to the `--repositories` argument. Any python dependencies a Spark Package has (listed in 
+the requirements.txt of that package) must be manually installed using pip when necessary.
 For example, to run `bin/pyspark` on exactly four cores, use:
 
 {% highlight bash %}
diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index 14a87f8436..57b074778f 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -174,6 +174,11 @@ This can use up a significant amount of space over time and will need to be clea
 is handled automatically, and with Spark standalone, automatic cleanup can be configured with the
 `spark.worker.cleanup.appDataTtl` property.
 
+Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates 
+with `--packages`. All transitive dependencies will be handled when using this command. Additional 
+repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag `--repositories`. 
+These commands can be used with `pyspark`, `spark-shell`, and `spark-submit` to include Spark Packages.
+
 For Python, the equivalent `--py-files` option can be used to distribute `.egg`, `.zip` and `.py` libraries
 to executors.
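
The documentation added by this patch describes `--packages` and `--repositories` as comma-delimited lists but only shows a single coordinate. As a rough sketch of how the two flags might be combined, the snippet below assembles a `spark-shell` command line and prints it instead of running it. The second coordinate (`org.example:other:0.2`) and the repository URL (`https://repo.example.org/maven`) are hypothetical placeholders, not real artifacts; each coordinate is assumed to follow the usual `groupId:artifactId:version` form.

```shell
# Hypothetical Maven coordinates (groupId:artifactId:version), comma-separated.
PACKAGES="org.example:example:0.1,org.example:other:0.2"

# Hypothetical extra repository to search; also a comma-separated list.
REPOSITORIES="https://repo.example.org/maven"

# Assemble the command as a string so it can be inspected before use.
SPARK_CMD="./bin/spark-shell --master local[4] --packages $PACKAGES --repositories $REPOSITORIES"

# Print the command instead of executing it.
echo "$SPARK_CMD"
```

According to the text added above, the same two flags should also be accepted by `pyspark` and `spark-submit`, so swapping the script name in `SPARK_CMD` is presumably all that is needed there.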