author     Andy Konwinski <andyk@berkeley.edu>    2012-09-12 19:27:44 -0700
committer  Andy Konwinski <andyk@berkeley.edu>    2012-09-12 19:27:44 -0700
commit     35adccd0088e5f0baa0eb97f8ca21e0d1c1ff71f (patch)
tree       704f740ed4d3977b429f648e1efdf12a5a964915
parent     bf54ad2fe0926ad3f277500ad7280fabb1cd7257 (diff)
Adds syntax highlighting (via pygments), and some style tweaks to make things
easier to read.
-rw-r--r--   docs/README.md                  15
-rw-r--r--   docs/_config.yml                 1
-rwxr-xr-x   docs/_layouts/global.html       17
-rwxr-xr-x   docs/css/main.css               25
-rw-r--r--   docs/css/pygments-default.css   76
-rw-r--r--   docs/programming-guide.md       58
6 files changed, 161 insertions, 31 deletions
diff --git a/docs/README.md b/docs/README.md
index e2ae05722f..9f179a437a 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -4,10 +4,25 @@ This readme will walk you through navigating and building the Spark documentatio
Read on to learn more about viewing the documentation in plain text (i.e., markdown) or building the documentation that corresponds to whichever version of Spark you currently have checked out of revision control.
+## Generating the Documentation HTML
+
We include the Spark documentation as part of the source (as opposed to using a hosted wiki as the definitive documentation) to enable the documentation to evolve along with the source code and be captured by revision control (currently git). This way the code automatically includes the version of the documentation that is relevant regardless of which version or release you have checked out or downloaded.
In this directory you will find textfiles formatted using Markdown, with an ".md" suffix. You can read those text files directly if you want. Start with index.md.
To make things quite a bit prettier and make the links easier to follow, generate the HTML version of the documentation based on the src directory by running `jekyll` in the docs directory (you will need to have Jekyll installed; the easiest way to do this is via a Ruby gem). This will create a directory called _site containing index.html as well as the rest of the compiled files. Read more about Jekyll at https://github.com/mojombo/jekyll/wiki.
+## Pygments
+
+We also use pygments (http://pygments.org) for syntax highlighting, so you will need to install it (it requires Python) by running `sudo easy_install Pygments`.
+
+To mark a block of code in your markdown to be syntax highlighted by jekyll during the compile phase, use the following syntax:
+
+ {% highlight scala %}
// Your Scala code goes here; you can replace scala with many other
// supported languages too.
+ {% endhighlight %}
+
+## Scaladoc
+
You can build just the Spark scaladoc by running `sbt/sbt doc` from the SPARK_PROJECT_ROOT directory.
diff --git a/docs/_config.yml b/docs/_config.yml
new file mode 100644
index 0000000000..b136b57555
--- /dev/null
+++ b/docs/_config.yml
@@ -0,0 +1 @@
+pygments: true
diff --git a/docs/_layouts/global.html b/docs/_layouts/global.html
index a2f1927e6b..402adca72c 100755
--- a/docs/_layouts/global.html
+++ b/docs/_layouts/global.html
@@ -10,17 +10,18 @@
<meta name="description" content="">
<meta name="viewport" content="width=device-width">
- <link rel="stylesheet" href="css/bootstrap.min.css">
+ <link rel="stylesheet" href="{{HOME_PATH}}css/bootstrap.min.css">
<style>
body {
padding-top: 60px;
padding-bottom: 40px;
}
</style>
- <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
- <link rel="stylesheet" href="css/main.css">
+ <link rel="stylesheet" href="{{HOME_PATH}}css/bootstrap-responsive.min.css">
+ <link rel="stylesheet" href="{{HOME_PATH}}css/main.css">
- <script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+ <script src="{{HOME_PATH}}js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+ <link rel="stylesheet" href="{{HOME_PATH}}css/pygments-default.css">
</head>
<body>
<!--[if lt IE 7]>
@@ -37,13 +38,13 @@
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</a>
- <a class="brand" href="#">Spark</a>
+ <a class="brand" href="{{HOME_PATH}}index.html">Spark</a>
<div class="nav-collapse collapse">
<ul class="nav">
<!--TODO(andyk): Add class="active" attribute to li somehow.-->
- <li><a href="/">Home</a></li>
- <li><a href="/programming-guide.html">Programming Guide</a></li>
- <li><a href="/api">API (Scaladoc)</a></li>
+ <li><a href="{{HOME_PATH}}index.html">Home</a></li>
+ <li><a href="{{HOME_PATH}}programming-guide.html">Programming Guide</a></li>
+ <li><a href="{{HOME_PATH}}api">API (Scaladoc)</a></li>
<!--
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Versions ({{ page.spark-version }})<b class="caret"></b></a>
diff --git a/docs/css/main.css b/docs/css/main.css
index b351c82415..8432d0f911 100755
--- a/docs/css/main.css
+++ b/docs/css/main.css
@@ -1,3 +1,28 @@
+---
+---
/* ==========================================================================
Author's custom styles
========================================================================== */
+
+/*.brand {
+ background: url({{HOME_PATH}}img/spark-logo.jpg) no-repeat left center;
+ height: 40px;
+ width: 100px;
+}
+*/
+
+body {
+ line-height: 1.6; /* Inspired by Github's wiki style */
+}
+
+h1 {
+ font-size: 28px;
+}
+
+code {
+ color: #333;
+}
+
+.container {
+ max-width: 914px;
+}
diff --git a/docs/css/pygments-default.css b/docs/css/pygments-default.css
new file mode 100644
index 0000000000..f5815c25ca
--- /dev/null
+++ b/docs/css/pygments-default.css
@@ -0,0 +1,76 @@
+/*
+Documentation for pygments (and Jekyll for that matter) is super sparse.
+To generate this, I had to run
+ `pygmentize -S default -f html > pygments-default.css`
+But first I had to install pygments via easy_install pygments
+
+I had to override the conflicting bootstrap style rules by linking to
+this stylesheet lower in the html than the bootstrap css.
+
+Also, I was thrown off for a while at first when I used a markdown
+code block (indented 4 spaces) inside my {% highlight scala %} ... {% endhighlight %}
+tags; it turns out that pygments will insert the code (or pre?) tags for you.
+
+*/
+.hll { background-color: #ffffcc }
+.c { color: #408080; font-style: italic } /* Comment */
+.err { border: 1px solid #FF0000 } /* Error */
+.k { color: #008000; font-weight: bold } /* Keyword */
+.o { color: #666666 } /* Operator */
+.cm { color: #408080; font-style: italic } /* Comment.Multiline */
+.cp { color: #BC7A00 } /* Comment.Preproc */
+.c1 { color: #408080; font-style: italic } /* Comment.Single */
+.cs { color: #408080; font-style: italic } /* Comment.Special */
+.gd { color: #A00000 } /* Generic.Deleted */
+.ge { font-style: italic } /* Generic.Emph */
+.gr { color: #FF0000 } /* Generic.Error */
+.gh { color: #000080; font-weight: bold } /* Generic.Heading */
+.gi { color: #00A000 } /* Generic.Inserted */
+.go { color: #808080 } /* Generic.Output */
+.gp { color: #000080; font-weight: bold } /* Generic.Prompt */
+.gs { font-weight: bold } /* Generic.Strong */
+.gu { color: #800080; font-weight: bold } /* Generic.Subheading */
+.gt { color: #0040D0 } /* Generic.Traceback */
+.kc { color: #008000; font-weight: bold } /* Keyword.Constant */
+.kd { color: #008000; font-weight: bold } /* Keyword.Declaration */
+.kn { color: #008000; font-weight: bold } /* Keyword.Namespace */
+.kp { color: #008000 } /* Keyword.Pseudo */
+.kr { color: #008000; font-weight: bold } /* Keyword.Reserved */
+.kt { color: #B00040 } /* Keyword.Type */
+.m { color: #666666 } /* Literal.Number */
+.s { color: #BA2121 } /* Literal.String */
+.na { color: #7D9029 } /* Name.Attribute */
+.nb { color: #008000 } /* Name.Builtin */
+.nc { color: #0000FF; font-weight: bold } /* Name.Class */
+.no { color: #880000 } /* Name.Constant */
+.nd { color: #AA22FF } /* Name.Decorator */
+.ni { color: #999999; font-weight: bold } /* Name.Entity */
+.ne { color: #D2413A; font-weight: bold } /* Name.Exception */
+.nf { color: #0000FF } /* Name.Function */
+.nl { color: #A0A000 } /* Name.Label */
+.nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
+.nt { color: #008000; font-weight: bold } /* Name.Tag */
+.nv { color: #19177C } /* Name.Variable */
+.ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
+.w { color: #bbbbbb } /* Text.Whitespace */
+.mf { color: #666666 } /* Literal.Number.Float */
+.mh { color: #666666 } /* Literal.Number.Hex */
+.mi { color: #666666 } /* Literal.Number.Integer */
+.mo { color: #666666 } /* Literal.Number.Oct */
+.sb { color: #BA2121 } /* Literal.String.Backtick */
+.sc { color: #BA2121 } /* Literal.String.Char */
+.sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
+.s2 { color: #BA2121 } /* Literal.String.Double */
+.se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
+.sh { color: #BA2121 } /* Literal.String.Heredoc */
+.si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
+.sx { color: #008000 } /* Literal.String.Other */
+.sr { color: #BB6688 } /* Literal.String.Regex */
+.s1 { color: #BA2121 } /* Literal.String.Single */
+.ss { color: #19177C } /* Literal.String.Symbol */
+.bp { color: #008000 } /* Name.Builtin.Pseudo */
+.vc { color: #19177C } /* Name.Variable.Class */
+.vg { color: #19177C } /* Name.Variable.Global */
+.vi { color: #19177C } /* Name.Variable.Instance */
+.il { color: #666666 } /* Literal.Number.Integer.Long */
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 15351bf661..94d304e23a 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -14,15 +14,19 @@ To write a Spark application, you will need to add both Spark and its dependenci
In addition, you'll need to import some Spark classes and implicit conversions. Add the following lines at the top of your program:
- import spark.SparkContext
- import SparkContext._
+{% highlight scala %}
+import spark.SparkContext
+import SparkContext._
+{% endhighlight %}
# Initializing Spark
The first thing a Spark program must do is to create a `SparkContext` object, which tells Spark how to access a cluster.
This is done through the following constructor:
- new SparkContext(master, jobName, [sparkHome], [jars])
+{% highlight scala %}
+new SparkContext(master, jobName, [sparkHome], [jars])
+{% endhighlight %}
The `master` parameter is a string specifying a [Mesos]({{HOME_PATH}}running-on-mesos.html) cluster to connect to, or a special "local" string to run in local mode, as described below. `jobName` is a name for your job, which will be shown in the Mesos web UI when running on a cluster. Finally, the last two parameters are needed to deploy your code to a cluster if running on Mesos, as described later.
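As a minimal sketch (the master string and job name below are illustrative values, not from this commit), creating a context for local mode might look like:

{% highlight scala %}
import spark.SparkContext
import SparkContext._

// "local[2]" runs Spark locally with two worker threads; "Example Job" is just a
// label that would show up in the Mesos web UI when running on a cluster.
val sc = new SparkContext("local[2]", "Example Job")
{% endhighlight %}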
@@ -60,11 +64,13 @@ Spark revolves around the concept of a _resilient distributed dataset_ (RDD), wh
Parallelized collections are created by calling `SparkContext`'s `parallelize` method on an existing Scala collection (a `Seq` object). The elements of the collection are copied to form a distributed dataset that can be operated on in parallel. For example, here is some interpreter output showing how to create a parallel collection from an array:
- scala> val data = Array(1, 2, 3, 4, 5)
- data: Array[Int] = Array(1, 2, 3, 4, 5)
-
- scala> val distData = sc.parallelize(data)
- distData: spark.RDD[Int] = spark.ParallelCollection@10d13e3e
+{% highlight scala %}
+scala> val data = Array(1, 2, 3, 4, 5)
+data: Array[Int] = Array(1, 2, 3, 4, 5)
+
+scala> val distData = sc.parallelize(data)
+distData: spark.RDD[Int] = spark.ParallelCollection@10d13e3e
+{% endhighlight %}
Once created, the distributed dataset (`distData` here) can be operated on in parallel. For example, we might call `distData.reduce(_ + _)` to add up the elements of the array. We describe operations on distributed datasets later on.
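For illustration, that reduce call might look like this in the interpreter (the `res` line is a sketch, not captured output; the value follows from the array above):

{% highlight scala %}
scala> distData.reduce(_ + _)   // sum of Array(1, 2, 3, 4, 5)
res0: Int = 15
{% endhighlight %}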
@@ -76,8 +82,10 @@ Spark can create distributed datasets from any file stored in the Hadoop distrib
Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3n://`, `kfs://`, etc. URI). Here is an example invocation:
- scala> val distFile = sc.textFile("data.txt")
- distFile: spark.RDD[String] = spark.HadoopRDD@1d4cee08
+{% highlight scala %}
+scala> val distFile = sc.textFile("data.txt")
+distFile: spark.RDD[String] = spark.HadoopRDD@1d4cee08
+{% endhighlight %}
Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `distFile.map(_.size).reduce(_ + _)`.
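For illustration only, that map/reduce pipeline might look like this in the interpreter (the result below is a made-up value that depends entirely on the contents of data.txt):

{% highlight scala %}
scala> distFile.map(_.size).reduce(_ + _)   // total characters across all lines of data.txt
res1: Int = 1234                            // illustrative value only
{% endhighlight %}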
@@ -142,11 +150,13 @@ Broadcast variables allow the programmer to keep a read-only variable cached on
Broadcast variables are created from a variable `v` by calling `SparkContext.broadcast(v)`. The broadcast variable is a wrapper around `v`, and its value can be accessed by calling the `value` method. The interpreter session below shows this:
- scala> val broadcastVar = sc.broadcast(Array(1, 2, 3))
- broadcastVar: spark.Broadcast[Array[Int]] = spark.Broadcast(b5c40191-a864-4c7d-b9bf-d87e1a4e787c)
+{% highlight scala %}
+scala> val broadcastVar = sc.broadcast(Array(1, 2, 3))
+broadcastVar: spark.Broadcast[Array[Int]] = spark.Broadcast(b5c40191-a864-4c7d-b9bf-d87e1a4e787c)
- scala> broadcastVar.value
- res0: Array[Int] = Array(1, 2, 3)
+scala> broadcastVar.value
+res0: Array[Int] = Array(1, 2, 3)
+{% endhighlight %}
After the broadcast variable is created, it should be used instead of the value `v` in any functions run on the cluster so that `v` is not shipped to the nodes more than once. In addition, the object `v` should not be modified after it is broadcast in order to ensure that all nodes get the same value of the broadcast variable (e.g. if the variable is shipped to a new node later).
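As a minimal sketch of that usage (the RDD and computation here are made up for illustration), tasks read `broadcastVar.value` inside the closure instead of capturing the original array:

{% highlight scala %}
// Tasks reference broadcastVar.value rather than closing over the original
// array, so the data is shipped to each node at most once.
val offsets = sc.parallelize(1 to 4)
val shifted = offsets.map(i => i + broadcastVar.value.sum)   // adds 1 + 2 + 3 = 6 to each element
{% endhighlight %}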
@@ -158,15 +168,17 @@ An accumulator is created from an initial value `v` by calling `SparkContext.acc
The interpreter session below shows an accumulator being used to add up the elements of an array:
- scala> val accum = sc.accumulator(0)
- accum: spark.Accumulator[Int] = 0
-
- scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
- ...
- 10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s
-
- scala> accum.value
- res2: Int = 10
+{% highlight scala %}
+scala> val accum = sc.accumulator(0)
+accum: spark.Accumulator[Int] = 0
+
+scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
+...
+10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s
+
+scala> accum.value
+res2: Int = 10
+{% endhighlight %}
# Where to Go from Here