[SPARK-7035] Encourage __getitem__ over __getattr__ on column access in the Python DataFrame API

Author: ksonj <kson@siberie.de> Closes #5971 from ksonj/doc and squashes the following commits: dadfebb [ksonj] __getitem__ is cleaner than __getattr__
author: ksonj <kson@siberie.de> 2015-05-07 01:02:00 -0700
committer: Reynold Xin <rxin@databricks.com> 2015-05-07 01:02:00 -0700
commit: fae4e2d6094de57a438ee4188ce47fc5b01b96fe (patch)
tree: ee6e489e407349e2bda3f4c70b0375e0e57a971a /docs
parent: fa8fddffd52f8146ccceb72c2990607aaf5b2131 (diff)
download: spark-fae4e2d6094de57a438ee4188ce47fc5b01b96fe.tar.gz
spark-fae4e2d6094de57a438ee4188ce47fc5b01b96fe.tar.bz2
spark-fae4e2d6094de57a438ee4188ce47fc5b01b96fe.zip
1 files changed, 8 insertions, 3 deletions
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index b8233ae06f..df4c123bdd 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -139,7 +139,6 @@ DataFrames provide a domain-specific language for structured data manipulation i
 
 Here we include some basic examples of structured data processing using DataFrames:
 
-
 <div class="codetabs">
 <div data-lang="scala"  markdown="1">
 {% highlight scala %}
@@ -242,6 +241,12 @@ df.groupBy("age").count().show();
 </div>
 
 <div data-lang="python"  markdown="1">
+In Python it's possible to access a DataFrame's columns either by attribute
+(`df.age`) or by indexing (`df['age']`). While the former is convenient for
+interactive data exploration, users are highly encouraged to use the
+latter form, which is future proof and won't break with column names that
+are also attributes on the DataFrame class.
+
 {% highlight python %}
 from pyspark.sql import SQLContext
 sqlContext = SQLContext(sc)
@@ -270,14 +275,14 @@ df.select("name").show()
 ## Justin
 
 # Select everybody, but increment the age by 1
-df.select(df.name, df.age + 1).show()
+df.select(df['name'], df['age'] + 1).show()
 ## name    (age + 1)
 ## Michael null
 ## Andy    31
 ## Justin  20
 
 # Select people older than 21
-df.filter(df.age > 21).show()
+df.filter(df['age'] > 21).show()
 ## age name
 ## 30  Andy
author	ksonj <kson@siberie.de>	2015-05-07 01:02:00 -0700
committer	Reynold Xin <rxin@databricks.com>	2015-05-07 01:02:00 -0700
commit	fae4e2d6094de57a438ee4188ce47fc5b01b96fe (patch)
tree	ee6e489e407349e2bda3f4c70b0375e0e57a971a /docs
parent	fa8fddffd52f8146ccceb72c2990607aaf5b2131 (diff)
download	spark-fae4e2d6094de57a438ee4188ce47fc5b01b96fe.tar.gz spark-fae4e2d6094de57a438ee4188ce47fc5b01b96fe.tar.bz2 spark-fae4e2d6094de57a438ee4188ce47fc5b01b96fe.zip