author    Matei Zaharia <matei@databricks.com>  2014-05-30 00:34:33 -0700
committer Patrick Wendell <pwendell@gmail.com>  2014-05-30 00:34:33 -0700
commit    c8bf4131bc2a2e147e977159fc90e94b85738830 (patch)
tree      a2f885df8fb6654bd7750bb344b97a6cb6889bf3 /docs/sql-programming-guide.md
parent    eeee978a348ec2a35cc27865cea6357f9db75b74 (diff)
[SPARK-1566] consolidate programming guide, and general doc updates
This is a fairly large PR that cleans up and updates the docs for 1.0. The major changes are:
* A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
* New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
* Spark-submit guide moved to a separate page and expanded slightly
* Various cleanups of the menu system, security docs, and others
* Updated look of title bar to differentiate the docs from previous Spark versions
You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.
Author: Matei Zaharia <matei@databricks.com>
Closes #896 from mateiz/1.0-docs and squashes the following commits:
03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
0779508 [Matei Zaharia] tweak
ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
1bf4112 [Matei Zaharia] Review comments
4414f88 [Matei Zaharia] tweaks
d04e979 [Matei Zaharia] Fix some old links to Java guide
a34ed33 [Matei Zaharia] tweak
541bb3b [Matei Zaharia] miscellaneous changes
fcefdec [Matei Zaharia] Moved submitting apps to separate doc
61d72b4 [Matei Zaharia] stuff
181f217 [Matei Zaharia] migration guide, remove old language guides
e11a0da [Matei Zaharia] Add more API functions
6a030a9 [Matei Zaharia] tweaks
8db0ae3 [Matei Zaharia] Added key-value pairs section
318d2c9 [Matei Zaharia] tweaks
1c81477 [Matei Zaharia] New section on basics and function syntax
e38f559 [Matei Zaharia] Actually added programming guide to Git
a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
3b6a876 [Matei Zaharia] More CSS tweaks
01ec8bf [Matei Zaharia] More CSS tweaks
e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
Diffstat (limited to 'docs/sql-programming-guide.md')
-rw-r--r--  docs/sql-programming-guide.md | 29
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 8a785450ad..a506457eba 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -2,7 +2,6 @@ layout: global
 title: Spark SQL Programming Guide
 ---
-**Spark SQL is currently an Alpha component. Therefore, the APIs may be changed in future releases.**
 * This will become a table of contents (this text will be scraped).
 {:toc}
@@ -17,10 +16,10 @@ Spark.  At the core of this component is a new type of RDD,
 [SchemaRDD](api/scala/index.html#org.apache.spark.sql.SchemaRDD).  SchemaRDDs are composed
 [Row](api/scala/index.html#org.apache.spark.sql.catalyst.expressions.Row) objects along with
 a schema that describes the data types of each column in the row.  A SchemaRDD is similar to a table
-in a traditional relational database.  A SchemaRDD can be created from an existing RDD, parquet
+in a traditional relational database.  A SchemaRDD can be created from an existing RDD, [Parquet](http://parquet.io)
 file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).

-**All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell`.**
+All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell`.

 </div>

@@ -30,7 +29,7 @@ Spark.  At the core of this component is a new type of RDD,
 [JavaSchemaRDD](api/scala/index.html#org.apache.spark.sql.api.java.JavaSchemaRDD).  JavaSchemaRDDs are composed
 [Row](api/scala/index.html#org.apache.spark.sql.api.java.Row) objects along with
 a schema that describes the data types of each column in the row.  A JavaSchemaRDD is similar to a table
-in a traditional relational database.  A JavaSchemaRDD can be created from an existing RDD, parquet
+in a traditional relational database.  A JavaSchemaRDD can be created from an existing RDD, [Parquet](http://parquet.io)
 file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).

 </div>

@@ -41,13 +40,15 @@ Spark.  At the core of this component is a new type of RDD,
 [SchemaRDD](api/python/pyspark.sql.SchemaRDD-class.html).  SchemaRDDs are composed
 [Row](api/python/pyspark.sql.Row-class.html) objects along with
 a schema that describes the data types of each column in the row.  A SchemaRDD is similar to a table
-in a traditional relational database.  A SchemaRDD can be created from an existing RDD, parquet
+in a traditional relational database.  A SchemaRDD can be created from an existing RDD, [Parquet](http://parquet.io)
 file, or by running HiveQL against data stored in [Apache Hive](http://hive.apache.org/).

-**All of the examples on this page use sample data included in the Spark distribution and can be run in the `pyspark` shell.**
+All of the examples on this page use sample data included in the Spark distribution and can be run in the `pyspark` shell.

 </div>
 </div>

+**Spark SQL is currently an alpha component. While we will minimize API changes, some APIs may change in future releases.**
+
 ***************************************************************************************************

 # Getting Started
@@ -240,8 +241,8 @@ Users that want a more complete dialect of SQL should look at the HiveQL support

 ## Using Parquet

-Parquet is a columnar format that is supported by many other data processing systems.  Spark SQL
-provides support for both reading and writing parquet files that automatically preserves the schema
+[Parquet](http://parquet.io) is a columnar format that is supported by many other data processing systems.
+Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema
 of the original data.  Using the data from the above example:

 <div class="codetabs">
@@ -254,11 +255,11 @@ import sqlContext._
 val people: RDD[Person] = ... // An RDD of case class objects, from the previous example.

-// The RDD is implicitly converted to a SchemaRDD, allowing it to be stored using parquet.
+// The RDD is implicitly converted to a SchemaRDD, allowing it to be stored using Parquet.
 people.saveAsParquetFile("people.parquet")

 // Read in the parquet file created above.  Parquet files are self-describing so the schema is preserved.
-// The result of loading a parquet file is also a JavaSchemaRDD.
+// The result of loading a Parquet file is also a JavaSchemaRDD.
 val parquetFile = sqlContext.parquetFile("people.parquet")

 //Parquet files can also be registered as tables and then used in SQL statements.
@@ -275,10 +276,10 @@ teenagers.collect().foreach(println)
 JavaSchemaRDD schemaPeople = ... // The JavaSchemaRDD from the previous example.

-// JavaSchemaRDDs can be saved as parquet files, maintaining the schema information.
+// JavaSchemaRDDs can be saved as Parquet files, maintaining the schema information.
 schemaPeople.saveAsParquetFile("people.parquet");

-// Read in the parquet file created above.  Parquet files are self-describing so the schema is preserved.
+// Read in the Parquet file created above.  Parquet files are self-describing so the schema is preserved.
 // The result of loading a parquet file is also a JavaSchemaRDD.
 JavaSchemaRDD parquetFile = sqlCtx.parquetFile("people.parquet");
@@ -297,10 +298,10 @@ JavaSchemaRDD teenagers = sqlCtx.sql("SELECT name FROM parquetFile WHERE age >=
 peopleTable # The SchemaRDD from the previous example.

-# SchemaRDDs can be saved as parquet files, maintaining the schema information.
+# SchemaRDDs can be saved as Parquet files, maintaining the schema information.
 peopleTable.saveAsParquetFile("people.parquet")

-# Read in the parquet file created above.  Parquet files are self-describing so the schema is preserved.
+# Read in the Parquet file created above.  Parquet files are self-describing so the schema is preserved.
 # The result of loading a parquet file is also a SchemaRDD.
 parquetFile = sqlCtx.parquetFile("people.parquet")
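The patch above edits the guide's Parquet examples in fragments. Pieced together, the full round-trip they describe can be sketched as a standalone Scala program against the Spark 1.0-era API (the `Person` case class, local master, and file path here are illustrative assumptions, not part of the patch):

```scala
// Sketch of the Parquet round-trip described in the guide, assuming Spark 1.0 APIs.
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int) // hypothetical schema from the guide's earlier example

object ParquetRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "ParquetRoundTrip")
    val sqlContext = new SQLContext(sc)
    import sqlContext._ // brings the RDD-to-SchemaRDD implicit conversion into scope

    // An RDD of case class objects; implicitly converted to a SchemaRDD when saved.
    val people = sc.parallelize(Seq(Person("Alice", 29), Person("Bob", 13)))
    people.saveAsParquetFile("people.parquet")

    // Parquet files are self-describing, so the schema (name: String, age: Int)
    // is recovered on load without being redeclared.
    val parquetFile = sqlContext.parquetFile("people.parquet")
    parquetFile.registerAsTable("parquetFile")
    val teenagers = sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
    teenagers.collect().foreach(println)
  }
}
```

This mirrors the Scala tab of the guide; the Java and Python tabs in the patch perform the same save/load/query sequence through `JavaSchemaRDD` and `pyspark` respectively.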