author    Reynold Xin <rxin@apache.org>  2013-11-04 21:02:36 -0800
committer Reynold Xin <rxin@apache.org>  2013-11-04 21:02:36 -0800
commit    551a43fd3dfe24beed12961050b58aa0c0379b4c (patch)
tree      6fd3a8a66b4efce4aa81bae444acc0d5eea1bce1 /docs
parent    99bfcc91e010ba29852ec7dd0b4270805b7b2377 (diff)
parent    7a26104ab7cb492b347ba761ef1f17ca1b9078e4 (diff)
Merge branch 'master' of github.com:apache/incubator-spark into mergemerge
Conflicts:
	README.md
	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
Diffstat (limited to 'docs')
-rw-r--r--  docs/cluster-overview.md | 14
-rw-r--r--  docs/ec2-scripts.md      |  2
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index f679cad713..5927f736f3 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -13,7 +13,7 @@ object in your main program (called the _driver program_).
Specifically, to run on a cluster, the SparkContext can connect to several types of _cluster managers_
(either Spark's own standalone cluster manager or Mesos/YARN), which allocate resources across
applications. Once connected, Spark acquires *executors* on nodes in the cluster, which are
-worker processes that run computations and store data for your application.
+worker processes that run computations and store data for your application.
Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to
the executors. Finally, SparkContext sends *tasks* for the executors to run.
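
To make this sequence concrete, here is a minimal Scala sketch of a driver program connecting to a standalone cluster manager, acquiring executors, and sending tasks to run. The master URL, application name, paths, and input file are hypothetical placeholders, not values taken from this commit.

```scala
import org.apache.spark.SparkContext

// Hypothetical master URL, app name, Spark home, and application JAR --
// substitute values for your own cluster.
val sc = new SparkContext(
  "spark://master-host:7077",      // the cluster manager to connect to
  "ExampleApp",                    // application name shown in the web UI
  "/opt/spark",                    // Spark install location on worker nodes
  Seq("target/example-app.jar"))   // JARs shipped to each executor

// With executors acquired, the driver can send tasks for them to run:
val counts = sc.textFile("hdfs://namenode:9000/data/input.txt")
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.take(5).foreach(println)
sc.stop()
```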
@@ -57,6 +57,18 @@ which takes a list of JAR files (Java/Scala) or .egg and .zip libraries (Python)
worker nodes. You can also dynamically add new files to be sent to executors with `SparkContext.addJar`
and `addFile`.
+## URIs for addJar / addFile
+
+- **file:** - Absolute paths and `file:/` URIs are served by the driver's HTTP file server, and every executor
+  pulls the file from that server.
+- **hdfs:**, **http:**, **https:**, **ftp:** - these pull down files and JARs from the URI as expected.
+- **local:** - a URI starting with `local:/` is expected to exist as a local file on each worker node. This
+  means that no network IO is incurred, which works well for large files/JARs that are pushed to each worker
+  or shared via NFS, GlusterFS, etc.
+
+Note that JARs and files are copied to the working directory for each SparkContext on the executor nodes.
+Over time this can use up a significant amount of space and will need to be cleaned up.
+
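
As a hedged sketch of these schemes in use (the host names and paths below are invented for illustration):

```scala
import org.apache.spark.SparkContext

val sc = new SparkContext("spark://master-host:7077", "UriSchemesExample")

// file: -- served by the driver's HTTP file server; executors pull it down.
sc.addJar("file:/opt/libs/helper.jar")

// hdfs: -- each executor fetches the file directly from the URI.
sc.addFile("hdfs://namenode:9000/shared/lookup-table.txt")

// local: -- assumed to already exist at this path on every worker node
// (pre-pushed, or on an NFS/GlusterFS mount), so no network IO is incurred.
sc.addJar("local:/opt/libs/large-deps.jar")
```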
# Monitoring
Each driver program has a web UI, typically on port 4040, that displays information about running
diff --git a/docs/ec2-scripts.md b/docs/ec2-scripts.md
index 1e5575d657..156a727026 100644
--- a/docs/ec2-scripts.md
+++ b/docs/ec2-scripts.md
@@ -98,7 +98,7 @@ permissions on your private key file, you can run `launch` with the
`bin/hadoop` script in that directory. Note that the data in this
HDFS goes away when you stop and restart a machine.
- There is also a *persistent HDFS* instance in
- `/root/presistent-hdfs` that will keep data across cluster restarts.
+ `/root/persistent-hdfs` that will keep data across cluster restarts.
Typically each node has relatively little space for persistent data
(about 3 GB), but you can use the `--ebs-vol-size` option to
`spark-ec2` to attach a persistent EBS volume to each node for