field | value | date |
---|---|---|
author | Davies Liu <davies@databricks.com> | 2015-02-10 19:40:12 -0800 |
committer | Michael Armbrust <michael@databricks.com> | 2015-02-10 19:40:12 -0800 |
commit | ea60284095cad43aa7ac98256576375d0e91a52a | |
tree | 35ac6e3935e1e7c731f7b9a850f2daa9640387d1 /examples/src/main/python/sql.py | |
parent | a60aea86b4d4b716b5ec3bff776b509fe0831342 | |
[SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns
Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which takes an optional `schema` argument to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.
Author: Davies Liu <davies@databricks.com>
Closes #4498 from davies/create and squashes the following commits:
08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns
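The deprecation described in the commit message follows a common pattern: the old methods stay callable, emit a warning, and forward to the single new entry point. Below is a minimal pure-Python sketch of that pattern. It assumes a hypothetical `ToyContext` class standing in for `SQLContext`; it returns plain dicts instead of DataFrames and is not PySpark's actual code.

```python
import warnings


class ToyContext:
    """Hypothetical stand-in for SQLContext (illustration only, not PySpark)."""

    def createDataFrame(self, rows, schema=None):
        # Single entry point: `schema` is optional; None means "infer it".
        return {"rows": list(rows), "schema": schema}

    def inferSchema(self, rows):
        # Deprecated alias: warn, then delegate without a schema.
        warnings.warn("inferSchema is deprecated, use createDataFrame instead",
                      DeprecationWarning, stacklevel=2)
        return self.createDataFrame(rows)

    def applySchema(self, rows, schema):
        # Deprecated alias: warn, then delegate with the caller's schema.
        warnings.warn("applySchema is deprecated, use createDataFrame instead",
                      DeprecationWarning, stacklevel=2)
        return self.createDataFrame(rows, schema)
```

In this sketch, `ctx.applySchema(rows, schema)` returns the same result as `ctx.createDataFrame(rows, schema)` and also emits a `DeprecationWarning`, which Python's default warning filters may hide unless warnings are enabled (for example with `python -W default`).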
Diffstat (limited to 'examples/src/main/python/sql.py')
mode | file | lines changed |
---|---|---|
-rw-r--r-- | examples/src/main/python/sql.py | 4 |
1 files changed, 2 insertions, 2 deletions
```diff
diff --git a/examples/src/main/python/sql.py b/examples/src/main/python/sql.py
index 7f5c68e3d0..47202fde75 100644
--- a/examples/src/main/python/sql.py
+++ b/examples/src/main/python/sql.py
@@ -31,7 +31,7 @@ if __name__ == "__main__":
                                Row(name="Smith", age=23),
                                Row(name="Sarah", age=18)])
     # Infer schema from the first row, create a DataFrame and print the schema
-    some_df = sqlContext.inferSchema(some_rdd)
+    some_df = sqlContext.createDataFrame(some_rdd)
     some_df.printSchema()
 
     # Another RDD is created from a list of tuples
@@ -40,7 +40,7 @@ if __name__ == "__main__":
     schema = StructType([StructField("person_name", StringType(), False),
                          StructField("person_age", IntegerType(), False)])
     # Create a DataFrame by applying the schema to the RDD and print the schema
-    another_df = sqlContext.applySchema(another_rdd, schema)
+    another_df = sqlContext.createDataFrame(another_rdd, schema)
     another_df.printSchema()
     # root
     # |-- age: integer (nullable = true)
```
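After this change, `createDataFrame()` covers three cases: no `schema` (infer from the first row, as `inferSchema()` did), a list of column names, or an explicit schema (as `applySchema()` did). The pure-Python sketch below models how such a `schema` argument could be resolved. Everything in it is hypothetical: `Person` is a namedtuple standing in for `Row`, a list of `(name, type)` pairs stands in for a `StructType`, and the type names are Python's own (`str`, `int`) rather than Spark SQL types. It illustrates the behavior described in the commit message, not Spark's implementation.

```python
from collections import namedtuple

# Hypothetical row type standing in for pyspark.sql.Row in this sketch.
Person = namedtuple("Person", ["name", "age"])


def resolve_schema(rows, schema=None):
    """Return (column_name, type_name) pairs for `rows` (illustrative model).

    * schema is None           -> names and types inferred from the first row
    * list of str              -> names given, types inferred from the first row
    * list of (name, type)     -> used as-is (stand-in for an explicit StructType)
    """
    rows = list(rows)
    first = rows[0] if rows else None

    if schema is None:
        # Like the old inferSchema(): needs named rows to recover column names.
        if first is None or not hasattr(first, "_fields"):
            raise ValueError("cannot infer column names without named rows")
        return [(name, type(value).__name__)
                for name, value in zip(first._fields, first)]

    if all(isinstance(col, str) for col in schema):
        # New form: caller supplies names, types still come from the first row.
        if first is None:
            raise ValueError("cannot infer column types from empty input")
        if len(schema) != len(first):
            raise ValueError("number of names does not match row width")
        return [(name, type(value).__name__)
                for name, value in zip(schema, first)]

    # Like the old applySchema(): the explicit schema is taken as given.
    return [(name, type_name) for name, type_name in schema]


if __name__ == "__main__":
    people = [Person("Smith", 23), Person("Sarah", 18)]
    print(resolve_schema(people))  # [('name', 'str'), ('age', 'int')]
    print(resolve_schema([("Smith", 23)], ["person_name", "person_age"]))
```

Passing column names is the lightest-weight option when the rows are plain tuples: the names come from the caller while the types are still inferred, so no full schema has to be spelled out.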