author    hyukjinkwon <gurwls223@gmail.com>  2017-01-07 12:52:41 +0000
committer Sean Owen <sowen@cloudera.com>  2017-01-07 12:52:41 +0000
commit    68ea290b3aa89b2a539d13ea2c18bdb5a651b2bf (patch)
tree      9c81ab0539bceef888db09c8722762107a50c4dd
parent    d60f6f62d00ffccc40ed72e15349358fe3543311 (diff)
[SPARK-13748][PYSPARK][DOC] Add the description for explicitly setting None for a named argument for a Row
## What changes were proposed in this pull request?

A `dict` is allowed to omit a key in order to represent a value that is `None` or missing, as below:

```python
spark.createDataFrame([{"x": 1}, {"y": 2}]).show()
```

```
+----+----+
|   x|   y|
+----+----+
|   1|null|
|null|   2|
+----+----+
```

However, the same is not allowed for `Row`:

```python
spark.createDataFrame([Row(x=1), Row(y=2)]).show()
```

```
16/06/19 16:25:56 ERROR Executor: Exception in task 6.0 in stage 66.0 (TID 316)
java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 1 values are provided.
	at org.apache.spark.sql.execution.python.EvaluatePython$.fromJava(EvaluatePython.scala:147)
	at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656)
	at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:247)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780)
```

This behaviour is correct, but it can confuse users, as the JIRA report shows. This PR adds an explanation of it to the `Row` class docstring.

## How was this patch tested?

N/A

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #13771 from HyukjinKwon/SPARK-13748.
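The workaround the docstring change describes (passing `None` explicitly for a missing field) can be illustrated without a running Spark cluster. The sketch below is an analogy, not PySpark code: it mimics `Row`'s behaviour of sorting named arguments into a fixed-width tuple, to show why rows built from different field sets cannot share one schema, while explicit `None` values keep the shape consistent.

```python
# Analogy: Row(x=1) behaves like a tuple whose fields are sorted by name.
def make_row(**kwargs):
    fields = sorted(kwargs)  # Row sorts named arguments by field name
    return fields, tuple(kwargs[f] for f in fields)

# Omitting a field yields rows with different field sets: the schema
# inferred from the first row cannot describe the second one.
f1, r1 = make_row(x=1)
f2, r2 = make_row(y=2)
assert f1 == ["x"] and f2 == ["y"]  # incompatible field sets

# Explicitly setting None keeps every row on the same schema.
f3, r3 = make_row(x=1, y=None)
f4, r4 = make_row(x=None, y=2)
assert f3 == f4 == ["x", "y"]
assert r3 == (1, None) and r4 == (None, 2)
```

In real PySpark the equivalent fix is `Row(x=1, y=None)` instead of `Row(x=1)` whenever the two rows must share a schema.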
-rw-r--r-- python/pyspark/sql/types.py | 4
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 4a023123b6..26b54a7fb3 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -1389,7 +1389,9 @@ class Row(tuple):
``key in row`` will search through row keys.
Row can be used to create a row object by using named arguments,
- the fields will be sorted by names.
+ the fields will be sorted by names. It is not allowed to omit
+ a named argument to represent the value is None or missing. This should be
+ explicitly set to None in this case.
>>> row = Row(name="Alice", age=11)
>>> row