aboutsummaryrefslogtreecommitdiff
path: root/python
diff options
context:
space:
mode:
authorDavies Liu <davies@databricks.com>2016-05-19 12:12:42 -0700
committerAndrew Or <andrew@databricks.com>2016-05-19 12:12:42 -0700
commit5ccecc078aa757d3f1f6632aa6df5659490f602f (patch)
treee8f37bd325eb96eceb4413a2c8dcaea8ea9c6dc5 /python
parent4e3cb7a5d965fd490390398ecfe35f1fc05e8511 (diff)
downloadspark-5ccecc078aa757d3f1f6632aa6df5659490f602f.tar.gz
spark-5ccecc078aa757d3f1f6632aa6df5659490f602f.tar.bz2
spark-5ccecc078aa757d3f1f6632aa6df5659490f602f.zip
[SPARK-15392][SQL] fix default value of size estimation of logical plan
## What changes were proposed in this pull request? We use autoBroadcastJoinThreshold + 1L as the default value of size estimation; that is not good in 2.0, because we calculate the size based on the size of the schema, so the estimation could be less than autoBroadcastJoinThreshold if you have a SELECT on top of a DataFrame created from an RDD. This PR changes the default value to Long.MaxValue. ## How was this patch tested? Added regression tests. Author: Davies Liu <davies@databricks.com> Closes #13183 from davies/fix_default_size.
Diffstat (limited to 'python')
-rw-r--r--python/pyspark/sql/dataframe.py2
1 file changed, 1 insertion, 1 deletion
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index a68ef33d39..4fa799ac55 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -576,7 +576,7 @@ class DataFrame(object):
>>> df_as2 = df.alias("df_as2")
>>> joined_df = df_as1.join(df_as2, col("df_as1.name") == col("df_as2.name"), 'inner')
>>> joined_df.select("df_as1.name", "df_as2.name", "df_as2.age").collect()
- [Row(name=u'Alice', name=u'Alice', age=2), Row(name=u'Bob', name=u'Bob', age=5)]
+ [Row(name=u'Bob', name=u'Bob', age=5), Row(name=u'Alice', name=u'Alice', age=2)]
"""
assert isinstance(alias, basestring), "alias should be a string"
return DataFrame(getattr(self._jdf, "as")(alias), self.sql_ctx)