Diffstat (limited to 'README.md')
-rw-r--r-- README.md 60
1 files changed, 27 insertions, 33 deletions
diff --git a/README.md b/README.md
index df9e73e4bd..b0fc3524fa 100644
--- a/README.md
+++ b/README.md
@@ -6,23 +6,21 @@ Lightning-Fast Cluster Computing - <http://www.spark-project.org/>
## Online Documentation
You can find the latest Spark documentation, including a programming
-guide, on the project wiki at <http://github.com/mesos/spark/wiki>. This
-file only contains basic setup instructions.
+guide, on the project webpage at <http://spark-project.org/documentation.html>.
+This README file only contains basic setup instructions.
## Building
-Spark requires Scala 2.9.1. This version has been tested with 2.9.1.final.
+Spark requires Scala 2.9.2. The project is built using Simple Build Tool (SBT),
+which is packaged with it. To build Spark and its example programs, run:
-The project is built using Simple Build Tool (SBT), which is packaged with it.
-To build Spark and its example programs, run:
+ sbt/sbt package
- sbt/sbt compile
-
-To run Spark, you will need to have Scala's bin in your `PATH`, or you
-will need to set the `SCALA_HOME` environment variable to point to where
+To run Spark, you will need to have Scala's bin directory in your `PATH`, or
+you will need to set the `SCALA_HOME` environment variable to point to where
you've installed Scala. Scala must be accessible through one of these
-methods on Mesos slave nodes as well as on the master.
+methods on your cluster's worker nodes as well as its master.
To run one of the examples, use `./run <class> <params>`. For example:
@@ -32,12 +30,12 @@ will run the Logistic Regression example locally on 2 CPUs.
Each of the example programs prints usage help if no params are given.
-All of the Spark samples take a `<host>` parameter that is the Mesos master
-to connect to. This can be a Mesos URL, or "local" to run locally with one
-thread, or "local[N]" to run locally with N threads.
+All of the Spark samples take a `<host>` parameter that is the cluster URL
+to connect to. This can be a mesos:// or spark:// URL, or "local" to run
+locally with one thread, or "local[N]" to run locally with N threads.
-## A Note About Hadoop
+## A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the HDFS API has changed in different versions of
@@ -48,27 +46,23 @@ of `project/SparkBuild.scala`, then rebuilding Spark.
## Configuration
-Spark can be configured through two files: `conf/java-opts` and
-`conf/spark-env.sh`.
-
-In `java-opts`, you can add flags to be passed to the JVM when running Spark.
-
-In `spark-env.sh`, you can set any environment variables you wish to be available
-when running Spark programs, such as `PATH`, `SCALA_HOME`, etc. There are also
-several Spark-specific variables you can set:
-
-- `SPARK_CLASSPATH`: Extra entries to be added to the classpath, separated by ":".
+Please refer to the "Configuration" guide in the online documentation for a
+full overview of how to configure Spark. At a minimum, you will need to
+create a `conf/spark-env.sh` script (copy `conf/spark-env.sh.template`) and
+set the following two variables:
-- `SPARK_MEM`: Memory for Spark to use, in the format used by java's `-Xmx`
- option (for example, `-Xmx200m` means 200 MB, `-Xmx1g` means 1 GB, etc).
+- `SCALA_HOME`: Location where Scala is installed.
-- `SPARK_LIBRARY_PATH`: Extra entries to add to `java.library.path` for locating
- shared libraries.
+- `MESOS_NATIVE_LIBRARY`: Your Mesos library (only needed if you want to run
+ on Mesos). For example, this might be `/usr/local/lib/libmesos.so` on Linux.
-- `SPARK_JAVA_OPTS`: Extra options to pass to JVM.
-- `MESOS_NATIVE_LIBRARY`: Your Mesos library, if you want to run on a Mesos
- cluster. For example, this might be `/usr/local/lib/libmesos.so` on Linux.
+## Contributing to Spark
-Note that `spark-env.sh` must be a shell script (it must be executable and start
-with a `#!` header to specify the shell to use).
+Contributions via GitHub pull requests are gladly accepted from their original
+author. Along with any pull requests, please state that the contribution is
+your original work and that you license the work to the project under the
+project's open source license. Whether or not you state this explicitly, by
+submitting any copyrighted material via pull request, email, or other means
+you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
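For reference, a minimal `conf/spark-env.sh` that sets the two variables named in the patched Configuration section might look like the sketch below. Both paths are assumed placeholders, not values taken from the patch; the Mesos line can be dropped when not running on Mesos. The pre-patch README also noted that this file must be an executable shell script starting with a `#!` header, so after copying the template you may want to run `chmod +x conf/spark-env.sh`.

```shell
#!/usr/bin/env bash
# conf/spark-env.sh: a minimal sketch, created by copying
# conf/spark-env.sh.template. Both paths below are placeholders.

# Location where Scala 2.9.2 is installed (assumed path; adjust to yours).
export SCALA_HOME=/usr/local/scala-2.9.2

# Only needed to run on Mesos. On Linux this might be:
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
```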
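The patched README lists four forms for the `<host>` parameter: a `mesos://` URL, a `spark://` URL, `local`, and `local[N]`. As an illustration only, a hypothetical bash helper (`describe_master`, not part of Spark) that sorts a value into those forms could look like this; Spark's own parsing is not shown here and may treat edge cases differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): classify a <host> argument
# into the forms listed in the README. Uses bash's ${var:offset} syntax.
describe_master() {
  local n
  case "$1" in
    local)
      echo "local, 1 thread" ;;
    local\[*\])
      # Drop the leading "local[" (6 characters) and the trailing "]".
      n="${1:6}"
      n="${n%?}"
      echo "local, $n threads" ;;
    mesos://*)
      echo "Mesos cluster at ${1#mesos://}" ;;
    spark://*)
      echo "Spark cluster at ${1#spark://}" ;;
    *)
      echo "unrecognized: $1" >&2
      return 1 ;;
  esac
}

describe_master "local[2]"
```

Called as `describe_master "local[2]"`, it prints `local, 2 threads`; an unrecognized value is reported on stderr with a non-zero exit status.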