 docs/configuration.md                    | 2 +-
 docs/index.md                            | 2 +-
 docs/programming-guide.md                | 6 +++---
 docs/streaming-kafka-0-10-integration.md | 2 +-
 docs/submitting-applications.md          | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index a6b1f15fda..b7f10e69f3 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -435,7 +435,7 @@ Apart from these, the following properties are also available, and may be useful
<td><code>spark.jars.packages</code></td>
<td></td>
<td>
- Comma-separated list of maven coordinates of jars to include on the driver and executor
+ Comma-separated list of Maven coordinates of jars to include on the driver and executor
classpaths. The coordinates should be groupId:artifactId:version. If <code>spark.jars.ivySettings</code>
is given artifacts will be resolved according to the configuration in the file, otherwise artifacts
will be searched for in the local maven repo, then maven central and finally any additional remote
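As a quick illustration of the property this hunk documents (not part of the patch; the coordinate below is the same placeholder used elsewhere in these docs), `spark.jars.packages` can also be supplied at launch time via `--conf`:

{% highlight bash %}
# Placeholder coordinate; substitute a real groupId:artifactId:version
$ ./bin/spark-shell --conf spark.jars.packages="org.example:example:0.1"
{% endhighlight %}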
diff --git a/docs/index.md b/docs/index.md
index 57b9fa848f..023e06ada3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -15,7 +15,7 @@ It also supports a rich set of higher-level tools including [Spark SQL](sql-prog
Get Spark from the [downloads page](http://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions.
Users can also download a "Hadoop free" binary and run Spark with any Hadoop version
[by augmenting Spark's classpath](hadoop-provided.html).
-Scala and Java users can include Spark in their projects using its maven cooridnates and in the future Python users can also install Spark from PyPI.
+Scala and Java users can include Spark in their projects using its Maven coordinates and in the future Python users can also install Spark from PyPI.
If you'd like to build Spark from
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index a4017b5b97..db8b048fce 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -185,7 +185,7 @@ In the Spark shell, a special interpreter-aware SparkContext is already created
variable called `sc`. Making your own SparkContext will not work. You can set which master the
context connects to using the `--master` argument, and you can add JARs to the classpath
by passing a comma-separated list to the `--jars` argument. You can also add dependencies
-(e.g. Spark Packages) to your shell session by supplying a comma-separated list of maven coordinates
+(e.g. Spark Packages) to your shell session by supplying a comma-separated list of Maven coordinates
to the `--packages` argument. Any additional repositories where dependencies might exist (e.g. Sonatype)
can be passed to the `--repositories` argument. For example, to run `bin/spark-shell` on exactly
four cores, use:
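The command that sentence introduces falls outside the hunk shown above; a sketch consistent with the `--jars` variant in the next hunk would be:

{% highlight bash %}
# Run the shell against a local master with four worker threads
$ ./bin/spark-shell --master local[4]
{% endhighlight %}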
@@ -200,7 +200,7 @@ Or, to also add `code.jar` to its classpath, use:
$ ./bin/spark-shell --master local[4] --jars code.jar
{% endhighlight %}
-To include a dependency using maven coordinates:
+To include a dependency using Maven coordinates:
{% highlight bash %}
$ ./bin/spark-shell --master local[4] --packages "org.example:example:0.1"
@@ -217,7 +217,7 @@ In the PySpark shell, a special interpreter-aware SparkContext is already create
variable called `sc`. Making your own SparkContext will not work. You can set which master the
context connects to using the `--master` argument, and you can add Python .zip, .egg or .py files
to the runtime path by passing a comma-separated list to `--py-files`. You can also add dependencies
-(e.g. Spark Packages) to your shell session by supplying a comma-separated list of maven coordinates
+(e.g. Spark Packages) to your shell session by supplying a comma-separated list of Maven coordinates
to the `--packages` argument. Any additional repositories where dependencies might exist (e.g. Sonatype)
can be passed to the `--repositories` argument. Any Python dependencies a Spark package has (listed in
the requirements.txt of that package) must be manually installed using `pip` when necessary.
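For illustration only (not part of the patch; the archive name and repository URL are placeholders), a PySpark session combining these flags might look like:

{% highlight bash %}
# deps.zip and the repository URL are placeholders for this sketch
$ ./bin/pyspark --master local[4] \
  --py-files deps.zip \
  --packages "org.example:example:0.1" \
  --repositories https://repo.example.com/maven-releases
{% endhighlight %}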
diff --git a/docs/streaming-kafka-0-10-integration.md b/docs/streaming-kafka-0-10-integration.md
index b645d3c3a4..6ef54ac210 100644
--- a/docs/streaming-kafka-0-10-integration.md
+++ b/docs/streaming-kafka-0-10-integration.md
@@ -183,7 +183,7 @@ stream.foreachRDD(new VoidFunction<JavaRDD<ConsumerRecord<String, String>>>() {
Note that the typecast to `HasOffsetRanges` will only succeed if it is done in the first method called on the result of `createDirectStream`, not later down a chain of methods. Be aware that the one-to-one mapping between RDD partition and Kafka partition does not remain after any methods that shuffle or repartition, e.g. reduceByKey() or window().
### Storing Offsets
-Kafka delivery semantics in the case of failure depend on how and when offsets are stored. Spark output operations are [at-least-once](streaming-programming-guide.html#semantics-of-output-operations). So if you want the equivalent of exactly-once semantics, you must either store offsets after an idempotent output, or store offsets in an atomic transaction alongside output. With this integration, you have 3 options, in order of increasing reliablity (and code complexity), for how to store offsets.
+Kafka delivery semantics in the case of failure depend on how and when offsets are stored. Spark output operations are [at-least-once](streaming-programming-guide.html#semantics-of-output-operations). So if you want the equivalent of exactly-once semantics, you must either store offsets after an idempotent output, or store offsets in an atomic transaction alongside output. With this integration, you have 3 options, in order of increasing reliability (and code complexity), for how to store offsets.
#### Checkpoints
If you enable Spark [checkpointing](streaming-programming-guide.html#checkpointing), offsets will be stored in the checkpoint. This is easy to enable, but there are drawbacks. Your output operation must be idempotent, since you will get repeated outputs; transactions are not an option. Furthermore, you cannot recover from a checkpoint if your application code has changed. For planned upgrades, you can mitigate this by running the new code at the same time as the old code (since outputs need to be idempotent anyway, they should not clash). But for unplanned failures that require code changes, you will lose data unless you have another way to identify known good starting offsets.
diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index b8b4cc3a53..d23dbcf10d 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -189,7 +189,7 @@ This can use up a significant amount of space over time and will need to be clea
is handled automatically, and with Spark standalone, automatic cleanup can be configured with the
`spark.worker.cleanup.appDataTtl` property.
-Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates
+Users may also include any other dependencies by supplying a comma-delimited list of Maven coordinates
with `--packages`. All transitive dependencies will be handled when using this command. Additional
repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag `--repositories`.
(Note that credentials for password-protected repositories can be supplied in some cases in the repository URI,