path: root/docs/quick-start.md
author     Patrick Wendell <pwendell@gmail.com>   2014-01-03 21:29:33 -0800
committer  Patrick Wendell <pwendell@gmail.com>   2014-01-03 21:29:33 -0800
commit     604fad9c39763012d97b404941f7ba7137ec2eed (patch)
tree       a327c39fcf9ac53e17fbeb5dfedb11e04f505f3c /docs/quick-start.md
parent     9e6f3bdcda1ab48159afa4f54b64d05e42a8688e (diff)
parent     c4d6145f7fde8a516024e886314bf8fecde817ea (diff)
Merge remote-tracking branch 'apache-github/master' into remove-binaries
Conflicts:
    core/src/test/scala/org/apache/spark/DriverSuite.scala
    docs/python-programming-guide.md
Diffstat (limited to 'docs/quick-start.md')
-rw-r--r--  docs/quick-start.md  8
1 file changed, 4 insertions, 4 deletions
diff --git a/docs/quick-start.md b/docs/quick-start.md
index 1882ea75c0..9b9261cfff 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -20,7 +20,7 @@ $ sbt assembly
## Basics
Spark's interactive shell provides a simple way to learn the API, as well as a powerful tool to analyze datasets interactively.
-Start the shell by running `./spark-shell` in the Spark directory.
+Start the shell by running `./bin/spark-shell` in the Spark directory.
Spark's primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). RDDs can be created from Hadoop InputFormats (such as HDFS files) or by transforming other RDDs. Let's make a new RDD from the text of the README file in the Spark source directory:
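As an aside, the shell workflow described above amounts to a session roughly like the sketch below. The `README.md` path and the specific commands are illustrative assumptions, not part of the committed change; `sc` is the SparkContext that `bin/spark-shell` creates for you.
{% highlight scala %}
scala> // Create an RDD from a local text file in the Spark directory
scala> val textFile = sc.textFile("README.md")

scala> textFile.count()                     // total number of lines

scala> // Transform the RDD into a new one containing only matching lines
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))

scala> linesWithSpark.count()               // lines that mention Spark
{% endhighlight %}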
@@ -99,7 +99,7 @@ scala> linesWithSpark.count()
res9: Long = 15
{% endhighlight %}
-It may seem silly to use Spark to explore and cache a 30-line text file. The interesting part is that these same functions can be used on very large data sets, even when they are striped across tens or hundreds of nodes. You can also do this interactively by connecting `spark-shell` to a cluster, as described in the [programming guide](scala-programming-guide.html#initializing-spark).
+It may seem silly to use Spark to explore and cache a 30-line text file. The interesting part is that these same functions can be used on very large data sets, even when they are striped across tens or hundreds of nodes. You can also do this interactively by connecting `bin/spark-shell` to a cluster, as described in the [programming guide](scala-programming-guide.html#initializing-spark).
# A Standalone App in Scala
Now say we want to write a standalone application using the Spark API. We will walk through a simple application in Scala (with SBT), Java (with Maven), and Python. If you are using other build systems, consider using the Spark assembly JAR described in the developer guide.
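For context, a standalone Scala application of the kind this section introduces could look roughly like the sketch below. The `local` master and the `README.md` path are assumptions for illustration; the code follows the general shape of the guide's `SimpleApp`, not its exact text.
{% highlight scala %}
/* SimpleApp.scala: a minimal standalone-app sketch.
 * The "local" master and README.md path are illustrative assumptions. */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "README.md"                        // any text file will do
    val sc = new SparkContext("local", "Simple App") // run locally in-process
    val logData = sc.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
{% endhighlight %}
Built with SBT, such a project needs only a small build definition declaring the spark-core dependency, after which `sbt run` executes the application against the local master.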
@@ -277,11 +277,11 @@ We can pass Python functions to Spark, which are automatically serialized along
For applications that use custom classes or third-party libraries, we can add those code dependencies to SparkContext to ensure that they will be available on remote machines; this is described in more detail in the [Python programming guide](python-programming-guide.html).
`SimpleApp` is simple enough that we do not need to specify any code dependencies.
-We can run this application using the `pyspark` script:
+We can run this application using the `bin/pyspark` script:
{% highlight python %}
$ cd $SPARK_HOME
-$ ./pyspark SimpleApp.py
+$ ./bin/pyspark SimpleApp.py
...
Lines with a: 46, Lines with b: 23
{% endhighlight %}