author     Liquan Pei <liquanpei@gmail.com>          2014-10-08 17:16:54 -0700
committer  Michael Armbrust <michael@databricks.com> 2014-10-08 17:16:54 -0700
commit     00b7791720e50119a98084b2e8755e1b593ca55f (patch)
tree       90033dd78d8ca8a8f5c20f44047f0ba1b5d0a912 /sql/README.md
parent     a42cc08d219c579019f613faa8d310e6069c06fe (diff)
[SQL][Doc] Keep Spark SQL README.md up to date
marmbrus

Update README.md to be consistent with Spark 1.1

Author: Liquan Pei <liquanpei@gmail.com>

Closes #2706 from Ishiihara/SparkSQL-readme and squashes the following commits:

33b9d4b [Liquan Pei] keep README.md up to date
Diffstat (limited to 'sql/README.md')
-rw-r--r--  sql/README.md  31
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/sql/README.md b/sql/README.md
index 31f9152344..c84534da9a 100644
--- a/sql/README.md
+++ b/sql/README.md
@@ -44,38 +44,37 @@ Type in expressions to have them evaluated.
Type :help for more information.
scala> val query = sql("SELECT * FROM (SELECT * FROM src) a")
-query: org.apache.spark.sql.ExecutedQuery =
-SELECT * FROM (SELECT * FROM src) a
-=== Query Plan ===
-Project [key#6:0.0,value#7:0.1]
- HiveTableScan [key#6,value#7], (MetastoreRelation default, src, None), None
+query: org.apache.spark.sql.SchemaRDD =
+== Query Plan ==
+== Physical Plan ==
+HiveTableScan [key#10,value#11], (MetastoreRelation default, src, None), None
```
Query results are RDDs and can be operated as such.
```
scala> query.collect()
-res8: Array[org.apache.spark.sql.execution.Row] = Array([238,val_238], [86,val_86], [311,val_311]...
+res2: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86], [311,val_311], [27,val_27]...
```
You can also build further queries on top of these RDDs using the query DSL.
```
-scala> query.where('key === 100).toRdd.collect()
-res11: Array[org.apache.spark.sql.execution.Row] = Array([100,val_100], [100,val_100])
+scala> query.where('key === 100).collect()
+res3: Array[org.apache.spark.sql.Row] = Array([100,val_100], [100,val_100])
```
-From the console you can even write rules that transform query plans. For example, the above query has redundant project operators that aren't doing anything. This redundancy can be eliminated using the `transform` function that is available on all [`TreeNode`](http://databricks.github.io/catalyst/latest/api/#catalyst.trees.TreeNode) objects.
+From the console you can even write rules that transform query plans. For example, the above query has redundant project operators that aren't doing anything. This redundancy can be eliminated using the `transform` function that is available on all [`TreeNode`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala) objects.
```scala
-scala> query.logicalPlan
-res1: catalyst.plans.logical.LogicalPlan =
-Project {key#0,value#1}
- Project {key#0,value#1}
+scala> query.queryExecution.analyzed
+res4: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
+Project [key#10,value#11]
+ Project [key#10,value#11]
MetastoreRelation default, src, None
-scala> query.logicalPlan transform {
+scala> query.queryExecution.analyzed transform {
| case Project(projectList, child) if projectList == child.output => child
| }
-res2: catalyst.plans.logical.LogicalPlan =
-Project {key#0,value#1}
+res5: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
+Project [key#10,value#11]
MetastoreRelation default, src, None
```
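
For reference, the following is a minimal sketch of the Spark 1.1-style workflow described by the updated README text above. It assumes a build with Hive support, an existing `SparkContext` named `sc`, and a Hive `src` table already created (as in the README's setup steps); it is an illustration of the diff, not code from the commit itself.

```scala
// Minimal sketch of the updated workflow (assumed setup: Spark 1.1 with Hive
// support, an existing SparkContext `sc`, and a Hive table named `src`).
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.catalyst.plans.logical.Project

val hiveContext = new HiveContext(sc)
import hiveContext._  // brings sql(...) and the Scala DSL implicits into scope

// sql(...) now returns a SchemaRDD rather than the removed ExecutedQuery.
val query = sql("SELECT * FROM (SELECT * FROM src) a")

// Query results are ordinary RDDs of Rows.
query.collect()

// Further queries via the DSL; collect() replaces the old toRdd.collect().
query.where('key === 100).collect()

// The analyzed logical plan is reached through queryExecution.analyzed, and
// the redundant Project can be eliminated with transform, as in the README.
val simplified = query.queryExecution.analyzed transform {
  case Project(projectList, child) if projectList == child.output => child
}
```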