Commit message | Author | Age | Files | Lines
* [SPARK-7320] [SQL] [Minor] Move the testData into beforeAll() | Cheng Hao | 2015-05-21 | 1 file | -7/+3
    Follow-up of #6340, to avoid the test report going missing when it fails.
    Author: Cheng Hao <hao.cheng@intel.com>
    Closes #6312 from chenghao-intel/rollup_minor and squashes the following commits:
      b03a25f [Cheng Hao] simplify the testData instantiation
      09b7e8b [Cheng Hao] move the testData into beforeAll()
    (cherry picked from commit feb3a9d3f81f19850fddbd9639823f59a60efa52)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
* [SPARK-7745] Change asserts to requires for user input checks in Spark Streaming | Burak Yavuz | 2015-05-21 | 7 files | -38/+38
    Assertions can be turned off. `require` throws an `IllegalArgumentException`, which makes more sense for a user-set variable.
    Author: Burak Yavuz <brkyvz@gmail.com>
    Closes #6271 from brkyvz/streaming-require and squashes the following commits:
      d249484 [Burak Yavuz] fix merge conflict
      264adb8 [Burak Yavuz] addressed comments v1.0
      6161350 [Burak Yavuz] fix tests
      16aa766 [Burak Yavuz] changed more assertions to more meaningful errors
      afd923d [Burak Yavuz] changed some assertions to require
    (cherry picked from commit 1ee8eb431e04db16f95f0bcb3a546ad6e14b616f)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
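For context, a minimal sketch of the distinction this commit relies on; `validateWindow` and its parameters are hypothetical, not Spark code:

```scala
// Hypothetical input check in the style the commit adopts. Scala's `assert`
// can be compiled away (-Xelide-below, or disabled JVM assertions for Java
// asserts), while `require` always runs and throws IllegalArgumentException,
// the right signal for bad user input.
def validateWindow(batchIntervalMs: Long, windowDurationMs: Long): Unit = {
  require(batchIntervalMs > 0,
    s"Batch interval must be positive: $batchIntervalMs")
  require(windowDurationMs % batchIntervalMs == 0,
    s"Window duration $windowDurationMs must be a multiple of batch interval $batchIntervalMs")
}
```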
* [SPARK-7753] [MLLIB] Update KernelDensity API | Xiangrui Meng | 2015-05-20 | 3 files | -48/+82
    Update `KernelDensity` API to make it extensible to different kernels in the future. `bandwidth` is used instead of `standardDeviation`. The static `kernelDensity` method is removed from `Statistics`. The implementation is updated using BLAS, while the algorithm remains the same. sryza srowen
    Author: Xiangrui Meng <meng@databricks.com>
    Closes #6279 from mengxr/SPARK-7753 and squashes the following commits:
      4cdfadc [Xiangrui Meng] add example code in the doc
      767fd5a [Xiangrui Meng] update KernelDensity API
    (cherry picked from commit 947ea1cf5f6986aa687631d6cf9f0fb974ee7caf)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
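The builder-style API the message describes would be used roughly like this (a sketch based on the description above; the sample values and query points are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.stat.KernelDensity

val sc = new SparkContext(new SparkConf().setAppName("kde-example").setMaster("local[*]"))
val sample = sc.parallelize(Seq(1.0, 1.5, 2.0, 4.2, 5.1))

// Builder-style configuration: setBandwidth replaces the old
// standardDeviation argument of the removed static Statistics method.
val kd = new KernelDensity()
  .setSample(sample)
  .setBandwidth(3.0)

// Evaluate the estimated density at a few query points.
val densities: Array[Double] = kd.estimate(Array(-1.0, 2.0, 5.0))
println(densities.mkString(", "))
```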
* [SPARK-7606] [SQL] [PySpark] add version to Python SQL API docs | Davies Liu | 2015-05-20 | 7 files | -18/+170
    Add version info for public Python SQL API. cc rxin
    Author: Davies Liu <davies@databricks.com>
    Closes #6295 from davies/versions and squashes the following commits:
      cfd91e6 [Davies Liu] add more version for DataFrame API
      600834d [Davies Liu] add version to SQL API docs
    (cherry picked from commit 8ddcb25b3990ec691463f87d4071e7425f4909a9)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-7746] [SQL] Add FetchSize parameter for JDBC driver | Liang-Chi Hsieh | 2015-05-20 | 2 files | -3/+38
    JIRA: https://issues.apache.org/jira/browse/SPARK-7746
    Looks like an easy-to-add parameter, but it can yield a significant performance improvement if the JDBC driver accepts it.
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    Closes #6283 from viirya/jdbc_fetchsize and squashes the following commits:
      de47f94 [Liang-Chi Hsieh] Don't keep fetchSize as single parameter.
      b7bff2f [Liang-Chi Hsieh] Add FetchSize parameter for JDBC driver.
    (cherry picked from commit d0eb9ffe978c663b7aa06e908cadee81767d23d1)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
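A hedged sketch of how such an option could be passed through the DataFrame JDBC source. It assumes a SQLContext named `sqlContext` in scope; the option key `fetchSize` is inferred from the commit title, and the connection details are invented:

```scala
// A fetch-size hint lets the driver stream rows in batches instead of
// row-by-row round trips. The exact option key is an assumption here.
val df = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url"       -> "jdbc:postgresql://db.example.com/sales", // hypothetical connection
    "dbtable"   -> "orders",
    "fetchSize" -> "1000")) // hint: fetch rows in batches of 1000
  .load()
```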
* [SPARK-7774] [MLLIB] add sqlContext to MLlibTestSparkContext | Xiangrui Meng | 2015-05-20 | 14 files | -79/+20
    To simplify test suites that require a SQLContext.
    Author: Xiangrui Meng <meng@databricks.com>
    Closes #6303 from mengxr/SPARK-7774 and squashes the following commits:
      0622b5a [Xiangrui Meng] update some other test suites
      e1f9b8d [Xiangrui Meng] add sqlContext to MLlibTestSparkContext
    (cherry picked from commit ddec173cba63df723cd94508121d8c06d8c153c6)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-7320] [SQL] Add Cube / Rollup for dataframe | Cheng Hao | 2015-05-20 | 3 files | -28/+237
    This is a follow-up for #6257, which broke the Maven test. Add cube & rollup for DataFrame. For example:
    ```scala
    testData.rollup($"a" + $"b", $"b").agg(sum($"a" - $"b"))
    testData.cube($"a" + $"b", $"b").agg(sum($"a" - $"b"))
    ```
    Author: Cheng Hao <hao.cheng@intel.com>
    Closes #6304 from chenghao-intel/rollup and squashes the following commits:
      04bb1de [Cheng Hao] move the table register/unregister into beforeAll/afterAll
      a6069f1 [Cheng Hao] cancel the implicit keyword
      ced4b8f [Cheng Hao] remove the unnecessary code changes
      9959dfa [Cheng Hao] update the code as comments
      e1d88aa [Cheng Hao] update the code as suggested
      03bc3d9 [Cheng Hao] Remove the CubedData & RollupedData
      5fd62d0 [Cheng Hao] hiden the CubedData & RollupedData
      5ffb196 [Cheng Hao] Add Cube / Rollup for dataframe
    (cherry picked from commit 42c592adb381ff20832cce55e0849ed68dd7eee4)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
* [SPARK-7777] [STREAMING] Fix the flaky test in org.apache.spark.streaming.BasicOperationsSuite | zsxwing | 2015-05-20 | 1 file | -0/+7
    Just added a guard to make sure a batch has completed before moving to the next batch.
    Author: zsxwing <zsxwing@gmail.com>
    Closes #6306 from zsxwing/SPARK-7777 and squashes the following commits:
      ecee529 [zsxwing] Fix the failure message
      58634fe [zsxwing] Fix the flaky test in org.apache.spark.streaming.BasicOperationsSuite
    (cherry picked from commit 895baf8f77e630ce32b0e25b00bf5ee45d17398f)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [SPARK-7750] [WEBUI] Rename endpoints from `json` to `api` to allow further extension to non-json outputs too | Hari Shreedharan | 2015-05-20 | 7 files | -25/+28
    Author: Hari Shreedharan <hshreedharan@apache.org>
    Closes #6273 from harishreedharan/json-to-api and squashes the following commits:
      e14b73b [Hari Shreedharan] Rename `getJsonServlet` to `getServletHandler` i
      42f8acb [Hari Shreedharan] Import order fixes.
      2ef852f [Hari Shreedharan] [SPARK-7750][WebUI] Rename endpoints from `json` to `api` to allow further extension to non-json outputs too.
    (cherry picked from commit a70bf06b790add5f279a69607df89ed36155b0e4)
    Signed-off-by: Imran Rashid <irashid@cloudera.com>
* [SPARK-7719] Re-add UnsafeShuffleWriterSuite test that was removed for Java 6 compat | Josh Rosen | 2015-05-20 | 1 file | -0/+15
    This patch re-adds a test which was removed in 9ebb44f8abb1a13f045eed60190954db904ffef7 due to a Java 6 compatibility issue. We now use Guava's `Iterators.emptyIterator()` in place of `Collections.emptyIterator()`, which isn't present in all Java 6 versions.
    Author: Josh Rosen <joshrosen@databricks.com>
    Closes #6298 from JoshRosen/SPARK-7719-fix-java-6-test-code and squashes the following commits:
      5c9bd85 [Josh Rosen] Re-add UnsafeShuffleWriterSuite.emptyIterator() test which was removed due to Java 6 issue
    (cherry picked from commit 5196efff53af4965ff216a9d5c0f8b2b4fc98652)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
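The swap is visible in one line; a sketch from Scala (Guava's `Iterators.emptyIterator()` is the real static factory, while the surrounding assertion is ours):

```scala
import com.google.common.collect.Iterators

// java.util.Collections.emptyIterator() only exists from Java 7 onward, so
// code that must still compile on Java 6 can use Guava's equivalent factory.
val empty: java.util.Iterator[String] = Iterators.emptyIterator[String]()
assert(!empty.hasNext)
```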
* Preparing development version 1.4.0-SNAPSHOT | Patrick Wendell | 2015-05-20 | 30 files | -30/+30
* Preparing Spark release rc-test | Patrick Wendell | 2015-05-20 | 30 files | -30/+30
* [SPARK-7762] [MLLIB] set default value for outputCol | Xiangrui Meng | 2015-05-20 | 5 files | -4/+42
    Set a default value for `outputCol` instead of forcing users to name it. This is useful for intermediate transformers in the pipeline. jkbradley
    Author: Xiangrui Meng <meng@databricks.com>
    Closes #6289 from mengxr/SPARK-7762 and squashes the following commits:
      54edebc [Xiangrui Meng] merge master
      bff8667 [Xiangrui Meng] update unit test
      171246b [Xiangrui Meng] add unit test for outputCol
      a4321bd [Xiangrui Meng] set default value for outputCol
    (cherry picked from commit c330e52dae6a3ec7e67ca82e2c2f4ea873976458)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
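A sketch of the idea, patterned on (but not claimed to be) the spark.ml shared-param source: derive a unique default from the stage's `uid` so unnamed intermediate columns don't collide in a pipeline.

```scala
import org.apache.spark.ml.param.{Param, Params}

// Illustrative trait name; the real shared param lives inside spark.ml.
trait HasOutputColSketch extends Params {
  final val outputCol: Param[String] =
    new Param[String](this, "outputCol", "output column name")
  // Default derived from the stage's uid instead of forcing the user to name it.
  setDefault(outputCol, s"${uid}__output")
  final def getOutputCol: String = $(outputCol)
}
```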
* Preparing development version 1.4.0-SNAPSHOT | pwendell | 2015-05-20 | 30 files | -30/+30
* Preparing Spark release rc-test | pwendell | 2015-05-20 | 30 files | -30/+30
* Preparing development version 1.4.0-SNAPSHOT | jenkins | 2015-05-20 | 30 files | -30/+30
* Preparing Spark release rc-test | jenkins | 2015-05-20 | 30 files | -30/+30
* [SPARK-7251] Perform sequential scan when iterating over BytesToBytesMap | Josh Rosen | 2015-05-20 | 5 files | -53/+274
    This patch modifies `BytesToBytesMap.iterator()` to iterate through records in the order that they appear in the data pages rather than iterating through the hashtable pointer arrays. This results in fewer random memory accesses, significantly improving performance for scan-and-copy operations. This is possible because our data pages are laid out as sequences of `[keyLength][data][valueLength][data]` entries. In order to mark the end of a partially-filled data page, we write `-1` as a special end-of-page length (BytesToBytesMap supports empty/zero-length keys and values, which is why we had to use a negative length). This patch incorporates / closes #5836.
    Author: Josh Rosen <joshrosen@databricks.com>
    Closes #6159 from JoshRosen/SPARK-7251 and squashes the following commits:
      05bd90a [Josh Rosen] Compare capacity, not size, to MAX_CAPACITY
      2a20d71 [Josh Rosen] Fix maximum BytesToBytesMap capacity
      bc4854b [Josh Rosen] Guard against overflow when growing BytesToBytesMap
      f5feadf [Josh Rosen] Add test for iterating over an empty map
      273b842 [Josh Rosen] [SPARK-7251] Perform sequential scan when iterating over entries in BytesToBytesMap
    (cherry picked from commit f2faa7af30662e3bdf15780f8719c71108f8e30b)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
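A toy Scala sketch of the page layout described above, with an on-heap `ByteBuffer` standing in for Spark's raw data pages (the real code is Java over unsafe memory):

```scala
import java.nio.ByteBuffer

// Records sit back-to-back as [keyLength][key bytes][valueLength][value bytes];
// -1 marks the end of a partially filled page. Scanning front-to-back touches
// memory sequentially, unlike chasing hash-table pointers.
def scanPage(page: ByteBuffer): Iterator[(Array[Byte], Array[Byte])] =
  new Iterator[(Array[Byte], Array[Byte])] {
    private var nextKeyLen = if (page.remaining() >= 4) page.getInt() else -1
    def hasNext: Boolean = nextKeyLen != -1
    def next(): (Array[Byte], Array[Byte]) = {
      val key = new Array[Byte](nextKeyLen); page.get(key)
      val value = new Array[Byte](page.getInt()); page.get(value)
      nextKeyLen = if (page.remaining() >= 4) page.getInt() else -1
      (key, value)
    }
  }

// One ("k" -> "v") record followed by the end-of-page marker.
val page = ByteBuffer.allocate(64)
page.putInt(1).put('k'.toByte).putInt(1).put('v'.toByte).putInt(-1)
page.flip()
scanPage(page).foreach { case (k, v) => println(new String(k) + " -> " + new String(v)) }
```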
* [SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation | Josh Rosen | 2015-05-20 | 1 file | -2/+55
    When on-heap memory allocation is used, ExecutorMemoryManager should maintain a cache / pool of buffers for re-use by tasks. This will significantly improve the performance of the new Tungsten sort-shuffle for jobs with many short-lived tasks by eliminating a major source of GC. This pull request is a minimum-viable-implementation of this idea. In its current form, this patch significantly improves performance on a stress test which launches huge numbers of short-lived shuffle map tasks back-to-back in the same JVM.
    Author: Josh Rosen <joshrosen@databricks.com>
    Closes #6227 from JoshRosen/SPARK-7698 and squashes the following commits:
      fd6cb55 [Josh Rosen] SoftReference -> WeakReference
      b154e86 [Josh Rosen] WIP sketch of pooling in ExecutorMemoryManager
    (cherry picked from commit 7956dd7ab03e1542d89dd94c043f1e5131684199)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
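A simplified Scala sketch of the pooling idea under the stated design (weak references, per the squash note "SoftReference -> WeakReference"); the class and method names are hypothetical, and the real allocator is Java code in Spark's unsafe module:

```scala
import java.lang.ref.WeakReference
import scala.collection.mutable

// Keep weak references to freed byte arrays, bucketed by size, so short-lived
// tasks reuse buffers instead of churning the garbage collector. Weak
// references let the GC still reclaim pooled buffers under memory pressure.
class PooledAllocator {
  private val pool =
    mutable.Map.empty[Int, mutable.Stack[WeakReference[Array[Byte]]]]

  def allocate(size: Int): Array[Byte] = synchronized {
    val stack = pool.getOrElse(size, mutable.Stack.empty)
    while (stack.nonEmpty) {
      val buf = stack.pop().get()
      if (buf != null) return buf // reuse a buffer the GC has not collected yet
    }
    new Array[Byte](size) // pool miss: fall back to a fresh allocation
  }

  def free(buffer: Array[Byte]): Unit = synchronized {
    pool.getOrElseUpdate(buffer.length, mutable.Stack.empty)
      .push(new WeakReference(buffer))
  }
}
```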
* Preparing development version 1.4.0-SNAPSHOT | Patrick Wendell | 2015-05-20 | 30 files | -30/+30
* Preparing Spark release rc-test | Patrick Wendell | 2015-05-20 | 30 files | -30/+30
* [SPARK-7767] [STREAMING] Added test for checkpoint serialization in StreamingContext.start() | Tathagata Das | 2015-05-20 | 4 files | -36/+89
    Currently, the background checkpointing thread fails silently if the checkpoint is not serializable. That is hard to debug, so it's best to fail fast at `start()` when checkpointing is enabled and the checkpoint is not serializable.
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Closes #6292 from tdas/SPARK-7767 and squashes the following commits:
      51304e6 [Tathagata Das] Addressed comments.
      c35237b [Tathagata Das] Added test for checkpoint serialization in StreamingContext.start()
    (cherry picked from commit 3c434cbfd0d6821e5bcf572be792b787a514018b)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
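A hedged sketch of the fail-fast check the message argues for; `validateSerializable` is a hypothetical helper, not the actual StreamingContext code:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Attempt Java serialization eagerly so a non-serializable checkpoint
// surfaces as an exception at start() rather than a silent failure on the
// background checkpointing thread.
def validateSerializable(checkpoint: AnyRef): Unit =
  try {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    oos.writeObject(checkpoint)
    oos.close()
  } catch {
    case e: NotSerializableException =>
      throw new IllegalArgumentException(
        "Checkpoint is not serializable; fix the offending field before start()", e)
  }
```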
* [SPARK-7237] [SPARK-7741] [CORE] [STREAMING] Clean more closures that need cleaning | Andrew Or | 2015-05-20 | 9 files | -37/+249
    SPARK-7741 is the equivalent of SPARK-7237 in streaming. This is an alternative to #6268.
    Author: Andrew Or <andrew@databricks.com>
    Closes #6269 from andrewor14/clean-moar and squashes the following commits:
      c51c9ab [Andrew Or] Add periods (trivial)
      6c686ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
      79a435b [Andrew Or] Fix tests
      d18c9f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
      65ef07b [Andrew Or] Fix tests?
      4b487a3 [Andrew Or] Add tests for closures passed to DStream operations
      328139b [Andrew Or] Do not forget foreachRDD
      5431f61 [Andrew Or] Clean streaming closures
      72b7b73 [Andrew Or] Clean core closures
    (cherry picked from commit 9b84443dd43777e25b0b00468c61814fe6d26c23)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [SPARK-7511] [MLLIB] pyspark ml seed param should be random by default or 42 is quite funny but not very random | Holden Karau | 2015-05-20 | 8 files | -64/+96
    Author: Holden Karau <holden@pigscanfly.ca>
    Closes #6139 from holdenk/SPARK-7511-pyspark-ml-seed-param-should-be-random-by-default-or-42-is-quite-funny-but-not-very-random and squashes the following commits:
      591f8e5 [Holden Karau] specify old seed for doc tests
      2470004 [Holden Karau] Fix a bunch of seeds with default values to have None as the default which will then result in using the hash of the class name
      cbad96d [Holden Karau] Add the setParams function that is used in the real code
      423b8d7 [Holden Karau] Switch the test code to behave slightly more like production code. also don't check the param map value only check for key existence
      140d25d [Holden Karau] remove extra space
      926165a [Holden Karau] Add some missing newlines for pep8 style
      8616751 [Holden Karau] merge in master
      58532e6 [Holden Karau] its the __name__ method, also treat None values as not set
      56ef24a [Holden Karau] fix test and regenerate base
      afdaa5c [Holden Karau] make sure different classes have different results
      68eb528 [Holden Karau] switch default seed to hash of type of self
      89c4611 [Holden Karau] Merge branch 'master' into SPARK-7511-pyspark-ml-seed-param-should-be-random-by-default-or-42-is-quite-funny-but-not-very-random
      31cd96f [Holden Karau] specify the seed to randomforestregressor test
      e1b947f [Holden Karau] Style fixes
      ce90ec8 [Holden Karau] merge in master
      bcdf3c9 [Holden Karau] update docstring seeds to none and some other default seeds from 42
      65eba21 [Holden Karau] pep8 fixes
      0e3797e [Holden Karau] Make seed default to random in more places
      213a543 [Holden Karau] Simplify the generated code to only include set default if there is a default rather than having None is note None in the generated code
      1ff17c2 [Holden Karau] Make the seed random for HasSeed in python
    (cherry picked from commit 191ee474527530246ac3164ae9631e01bdd1e647)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
* Revert "[SPARK-7320] [SQL] Add Cube / Rollup for dataframe"Patrick Wendell2015-05-203-230/+28
| | | | This reverts commit 10698e1131f665addb454cd498669920699a91b2.
* [SPARK-7579] [ML] [DOC] User guide update for OneHotEncoder | Sandy Ryza | 2015-05-20 | 1 file | -0/+95
    Author: Sandy Ryza <sandy@cloudera.com>
    Closes #6126 from sryza/sandy-spark-7579 and squashes the following commits:
      5af803d [Sandy Ryza] SPARK-7579 [MLLIB] User guide update for OneHotEncoder
    (cherry picked from commit 829f1d95bac9153e7b646fbc0d55566ecf896200)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
* [SPARK-7537] [MLLIB] spark.mllib API updates | Xiangrui Meng | 2015-05-20 | 2 files | -0/+13
    Minor updates to the spark.mllib APIs:
    1. Add `DeveloperApi` to `PMMLExportable` and add `Experimental` to `toPMML` methods.
    2. Mention `RankingMetrics.of` in the `RankingMetrics` constructor.
    Author: Xiangrui Meng <meng@databricks.com>
    Closes #6280 from mengxr/SPARK-7537 and squashes the following commits:
      1bd2583 [Xiangrui Meng] organize imports
      94afa7a [Xiangrui Meng] mark all toPMML methods experimental
      4c40da1 [Xiangrui Meng] mention the factory method for RankingMetrics for Java users
      88c62d0 [Xiangrui Meng] add DeveloperApi to PMMLExportable
    (cherry picked from commit 2ad4837cfa66fcedc96b0819a8c2f4c3d70b0aaa)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-7713] [SQL] Use shared broadcast hadoop conf for partitioned table scan. | Yin Huai | 2015-05-20 | 4 files | -48/+387
    https://issues.apache.org/jira/browse/SPARK-7713
    I tested the performance with the following code:
    ```scala
    import sqlContext._
    import sqlContext.implicits._

    (1 to 5000).foreach { i =>
      val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b").save(s"/tmp/partitioned/i=$i")
    }

    sqlContext.sql("""
    CREATE TEMPORARY TABLE partitionedParquet
    USING org.apache.spark.sql.parquet
    OPTIONS (
      path '/tmp/partitioned'
    )""")

    table("partitionedParquet").explain(true)
    ```
    On current master, `explain` takes 40s on my laptop. With this PR, `explain` takes 14s.
    Author: Yin Huai <yhuai@databricks.com>
    Closes #6252 from yhuai/broadcastHadoopConf and squashes the following commits:
      6fa73df [Yin Huai] Address comments of Josh and Andrew.
      807fbf9 [Yin Huai] Make the new buildScan and SqlNewHadoopRDD private sql.
      e393555 [Yin Huai] Cheng's comments.
      2eb53bb [Yin Huai] Use a shared broadcast Hadoop Configuration for partitioned HadoopFsRelations.
    (cherry picked from commit b631bf73b9f288f37c98b806be430b22485880e5)
    Signed-off-by: Yin Huai <yhuai@databricks.com>
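A sketch of the underlying trick, assuming the standard `SerializableWritable` wrapper (Hadoop's `Configuration` is a `Writable`); this is illustrative, not quoted from the patch:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SerializableWritable, SparkContext}
import org.apache.spark.broadcast.Broadcast

// Wrap the Configuration once and broadcast it, so every partition's scan
// task reads the same broadcast value instead of serializing a fresh
// Configuration per partition.
def shareHadoopConf(sc: SparkContext, conf: Configuration)
    : Broadcast[SerializableWritable[Configuration]] =
  sc.broadcast(new SerializableWritable(conf))

// Inside a task, shared.value.value recovers the Configuration.
```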
* [SPARK-6094] [MLLIB] Add MultilabelMetrics in PySpark/MLlib | Yanbo Liang | 2015-05-20 | 2 files | -0/+125
    Add MultilabelMetrics in PySpark/MLlib.
    Author: Yanbo Liang <ybliang8@gmail.com>
    Closes #6276 from yanboliang/spark-6094 and squashes the following commits:
      b8e3343 [Yanbo Liang] Add MultilabelMetrics in PySpark/MLlib
    (cherry picked from commit 98a46f9dffec294386f6c39acafa7f11adb87a8f)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-7654] [MLLIB] Migrate MLlib to the DataFrame reader/writer API | Xiangrui Meng | 2015-05-20 | 10 files | -12/+12
    parquetFile -> read.parquet. rxin
    Author: Xiangrui Meng <meng@databricks.com>
    Closes #6281 from mengxr/SPARK-7654 and squashes the following commits:
      a79b612 [Xiangrui Meng] parquetFile -> read.parquet
    (cherry picked from commit 589b12f8e62ec5d10713ce057756ebc791e7ddc6)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
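The migration pattern in miniature (the path is hypothetical; `parquetFile` remains available but deprecated in 1.4):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("migrate").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// Old one-off loader, deprecated in 1.4:
val before = sqlContext.parquetFile("/tmp/model/data")
// Unified reader API that the commit migrates MLlib to:
val after = sqlContext.read.parquet("/tmp/model/data")
```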
* [SPARK-7320] [SQL] Add Cube / Rollup for dataframe | Cheng Hao | 2015-05-20 | 3 files | -28/+230
    Add `cube` & `rollup` for DataFrame. For example:
    ```scala
    testData.rollup($"a" + $"b", $"b").agg(sum($"a" - $"b"))
    testData.cube($"a" + $"b", $"b").agg(sum($"a" - $"b"))
    ```
    Author: Cheng Hao <hao.cheng@intel.com>
    Closes #6257 from chenghao-intel/rollup and squashes the following commits:
      7302319 [Cheng Hao] cancel the implicit keyword
      a66e38f [Cheng Hao] remove the unnecessary code changes
      a2869d4 [Cheng Hao] update the code as comments
      c441777 [Cheng Hao] update the code as suggested
      84c9564 [Cheng Hao] Remove the CubedData & RollupedData
      279584c [Cheng Hao] hiden the CubedData & RollupedData
      ef357e1 [Cheng Hao] Add Cube / Rollup for dataframe
    (cherry picked from commit 09265ad7c85c6de6b568ec329daad632d4a79fa3)
    Signed-off-by: Cheng Lian <lian@databricks.com>
* [SPARK-7656] [SQL] use CatalystConf in FunctionRegistry | scwf | 2015-05-19 | 3 files | -7/+9
    Follow-up for #5806.
    Author: scwf <wangfei1@huawei.com>
    Closes #6164 from scwf/FunctionRegistry and squashes the following commits:
      15e6697 [scwf] use catalogconf in FunctionRegistry
    (cherry picked from commit 60336e3bc02a2587fdf315f9011bbe7c9d3a58c4)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-7744] [DOCS] [MLLIB] "Distributed matrix" section in MLlib "Data Types" documentation should be reordered | Mike Dusenberry | 2015-05-19 | 1 file | -64/+64
    The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the latter three types, and RowMatrix is considered the "basic" distributed matrix. This will improve comprehensibility of the "Distributed matrix" section, especially for the new reader.
    Author: Mike Dusenberry <dusenberrymw@gmail.com>
    Closes #6270 from dusenberrymw/Reorder_MLlib_Data_Types_Distributed_matrix_docs and squashes the following commits:
      6313bab [Mike Dusenberry] The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, as BlockMatrix references the later three types, and RowMatrix is considered the "basic" distributed matrix. This will improve comprehensibility of the "Distributed matrix" section, especially for the new reader.
    (cherry picked from commit 3860520633770cc5719b2cdebe6dc3608798386d)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-7662] [SQL] Resolve correct names for generator in projection | Cheng Hao | 2015-05-19 | 3 files | -4/+42
    ```
    select explode(map(value, key)) from src;
    ```
    Throws exception:
    ```
    org.apache.spark.sql.AnalysisException: The number of aliases supplied in the AS clause does not match the number of columns output by the UDTF expected 2 aliases but got _c0 ;
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:43)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveGenerate$$makeGeneratorOutput(Analyzer.scala:605)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16$$anonfun$22.apply(Analyzer.scala:562)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16$$anonfun$22.apply(Analyzer.scala:548)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16.applyOrElse(Analyzer.scala:548)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate$$anonfun$apply$16.applyOrElse(Analyzer.scala:538)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:222)
    ```
    Author: Cheng Hao <hao.cheng@intel.com>
    Closes #6178 from chenghao-intel/explode and squashes the following commits:
      916fbe9 [Cheng Hao] add more strict rules for TGF alias
      5c3f2c5 [Cheng Hao] fix bug in unit test
      e1d93ab [Cheng Hao] Add more unit test
      19db09e [Cheng Hao] resolve names for generator in projection
    (cherry picked from commit bcb1ff81468eb4afc7c03b2bca18e99cc1ccf6b8)
    Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-7738] [SQL] [PySpark] add reader and writer API in Python | Davies Liu | 2015-05-19 | 6 files | -92/+430
    cc rxin, please take a quick look, I'm working on tests.
    Author: Davies Liu <davies@databricks.com>
    Closes #6238 from davies/readwrite and squashes the following commits:
      c7200eb [Davies Liu] update tests
      9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
      f0c5a04 [Davies Liu] use sqlContext.read.load
      5f68bc8 [Davies Liu] update tests
      6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
      bcc6668 [Davies Liu] add reader amd writer API in Python
    (cherry picked from commit 4de74d2602f6577c3c8458aa85377e89c19724ca)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with BLAS | Liang-Chi Hsieh | 2015-05-19 | 1 file | -17/+24
    JIRA: https://issues.apache.org/jira/browse/SPARK-7652
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    Closes #6189 from viirya/naive_bayes_blas_prediction and squashes the following commits:
      ab611fd [Liang-Chi Hsieh] Remove unnecessary space.
      ddc48b9 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into naive_bayes_blas_prediction
      b5772b4 [Liang-Chi Hsieh] Fix binary compatibility.
      2f65186 [Liang-Chi Hsieh] Remove toDense.
      1b6cdfe [Liang-Chi Hsieh] Update the implementation of naive Bayes prediction with BLAS.
    (cherry picked from commit c12dff9b82e4869f866a9b96ce0bf05503dd7dda)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
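For intuition, a hedged Breeze sketch of why BLAS fits here; this is the textbook multinomial naive Bayes formulation, not necessarily the patch's exact code:

```scala
import breeze.linalg.{argmax, DenseMatrix, DenseVector}

// Prediction is argmax over (log class priors + log-likelihood matrix times
// the feature vector): one dense matrix-vector multiply (BLAS gemv) replaces
// a hand-rolled loop over classes and features.
// Shapes: theta is numClasses x numFeatures; pi has length numClasses.
def predict(pi: DenseVector[Double],
            theta: DenseMatrix[Double],
            features: DenseVector[Double]): Int =
  argmax(pi + theta * features)
```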
* [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package | Xusen Yin | 2015-05-19 | 2 files | -0/+165
    CC jkbradley. JIRA issue: https://issues.apache.org/jira/browse/SPARK-7586
    Author: Xusen Yin <yinxusen@gmail.com>
    Closes #6181 from yinxusen/SPARK-7586 and squashes the following commits:
      77014c5 [Xusen Yin] comment fix
      57a4c07 [Xusen Yin] small fix for docs
      1178c8f [Xusen Yin] remove the correctness check in java suite
      1c3f389 [Xusen Yin] delete sbt commit
      1af152b [Xusen Yin] check python example code
      1b5369e [Xusen Yin] add docs of word2vec
    (cherry picked from commit 68fb2a46edc95f867d4b28597d20da2597f008c1)
    Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
* [SPARK-7726] Fix Scaladoc false errors | Iulian Dragos | 2015-05-19 | 6 files | -3/+15
    Visibility rules for static members differ between Scala and Java, and this case requires an explicit static import. Even though these are Java files, they are run through scaladoc, which enforces Scala rules. Also reverts the commit that reverted the upgrade to 2.11.6.
    Author: Iulian Dragos <jaguarul@gmail.com>
    Closes #6260 from dragos/issue/scaladoc-false-error and squashes the following commits:
      f2e998e [Iulian Dragos] Revert "[HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6""
      0bad052 [Iulian Dragos] Fix scaladoc faux-error.
    (cherry picked from commit 3c4c1f96474b3e66fa1d44ac0177f548cf5a3a10)
    Signed-off-by: Patrick Wendell <patrick@databricks.com>
* [SPARK-7678] [ML] Fix default random seed in HasSeed | Joseph K. Bradley | 2015-05-19 | 6 files | -12/+14
    Changed shared param HasSeed to have a default based on the hashCode of the class name, instead of a random number. Also removed fixed random seeds from Word2Vec and ALS. CC: mengxr
    Author: Joseph K. Bradley <joseph@databricks.com>
    Closes #6251 from jkbradley/scala-fixed-seed and squashes the following commits:
      0e37184 [Joseph K. Bradley] Fixed Word2VecSuite, ALSSuite in spark.ml to use original fixed random seeds
      678ec3a [Joseph K. Bradley] Removed fixed random seeds from Word2Vec and ALS. Changed shared param HasSeed to have default based on hashCode of class name, instead of random number.
    (cherry picked from commit 7b16e9f2118fbfbb1c0ba957161fe500c9aff82a)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
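A sketch of the described default, patterned on the message rather than quoted from Spark's source:

```scala
import org.apache.spark.ml.param.{LongParam, Params}

// A seed derived from the class name's hashCode is stable across runs for a
// given estimator type, unlike a default drawn from a random number
// generator when the class loads. Trait name here is illustrative.
trait HasSeedSketch extends Params {
  final val seed: LongParam = new LongParam(this, "seed", "random seed")
  setDefault(seed, this.getClass.getName.hashCode.toLong)
  final def getSeed: Long = $(seed)
}
```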
* [SPARK-7047] [ML] ml.Model optional parent support | Joseph K. Bradley | 2015-05-19 | 3 files | -1/+7
    Made Model.parent transient. Added Model.hasParent to test for a null parent. CC: mengxr
    Author: Joseph K. Bradley <joseph@databricks.com>
    Closes #5914 from jkbradley/parent-optional and squashes the following commits:
      d501774 [Joseph K. Bradley] Made Model.parent transient. Added Model.hasParent to test for null parent
    (cherry picked from commit fb90273212dc7241c9a0c3446e25e0e0b9377750)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-7704] Updating Programming Guides per SPARK-4397 | Dice | 2015-05-19 | 1 file | -6/+5
    The change in SPARK-4397 makes the implicit objects in SparkContext visible to the compiler automatically, so we no longer need to import o.a.s.SparkContext._ explicitly and can remove the statements about these implicit conversions from the latest Programming Guides (1.3.0 and higher).
    Author: Dice <poleon.kd@gmail.com>
    Closes #6234 from daisukebe/patch-1 and squashes the following commits:
      b77ecd9 [Dice] fix a typo
      45dfcd3 [Dice] rewording per Sean's advice
      a094bcf [Dice] Adding a note for users on any previous releases
      a29be5f [Dice] Updating Programming Guides per SPARK-4397
    (cherry picked from commit 32fa611b19c6b95d4563be631c5a8ff0cdf3438f)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
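What the guide change means in practice (a minimal runnable example; the data is made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Before SPARK-4397 this import was needed to pick up the implicit
// conversion to PairRDDFunctions; since 1.3.0 the implicits live on the
// RDD companion object and are found automatically:
// import org.apache.spark.SparkContext._   // no longer required

val sc = new SparkContext(new SparkConf().setAppName("implicits").setMaster("local[*]"))
val counts = sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3))
  .reduceByKey(_ + _) // resolves without the old explicit import
println(counts.collect().mkString(", "))
```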
* [SPARK-7681] [MLLIB] remove mima excludes for 1.3 | Xiangrui Meng | 2015-05-19 | 1 file | -8/+1
    These excludes are unnecessary for 1.3 because the changes were made in 1.4.x.
    Author: Xiangrui Meng <meng@databricks.com>
    Closes #6254 from mengxr/SPARK-7681-mima and squashes the following commits:
      7f0cea0 [Xiangrui Meng] remove mima excludes for 1.3
    (cherry picked from commit 6845cb2ff475fd794b30b01af5ebc80714b880f0)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
* Preparing development version 1.4.1-SNAPSHOT | Patrick Wendell | 2015-05-19 | 30 files | -30/+30
* Preparing Spark release v1.4.0-rc1 | Patrick Wendell | 2015-05-19 | 30 files | -30/+30
* CHANGES.txt updates | Patrick Wendell | 2015-05-19 | 1 file | -0/+35
* [SPARK-7723] Fix string interpolation in pipeline examples | Saleem Ansari | 2015-05-19 | 1 file | -2/+2
    https://issues.apache.org/jira/browse/SPARK-7723
    Author: Saleem Ansari <tuxdna@gmail.com>
    Closes #6258 from tuxdna/master and squashes the following commits:
      2bb5a42 [Saleem Ansari] Merge branch 'master' into mllib-pipeline
      e39db9c [Saleem Ansari] Fix string interpolation in pipeline examples
    (cherry picked from commit df34793ad4e76214fc4c0a22af1eb89b171a32e4)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
* [HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6" | Patrick Wendell | 2015-05-19 | 2 files | -3/+3
    This reverts commit a11c8683c76c67f45749a1b50a0912a731fd2487.
    For more information see: https://issues.apache.org/jira/browse/SPARK-7726
* Revert "Preparing Spark release v1.4.0-rc1"Patrick Wendell2015-05-1930-30/+30
| | | | This reverts commit 79fb01a3be07b5086134a6fe103248e9a33a9500.
* Revert "Preparing development version 1.4.1-SNAPSHOT"Patrick Wendell2015-05-1930-30/+30
| | | | This reverts commit a1d896b85bd3fb88284f8b6758d7e5f0a1bb9eb3.
* Fixing a few basic typos in the Programming Guide. | Mike Dusenberry | 2015-05-19 | 1 file | -3/+3
    Just a few minor fixes in the guide, so a new JIRA issue was not created per the guidelines.
    Author: Mike Dusenberry <dusenberrymw@gmail.com>
    Closes #6240 from dusenberrymw/Fix_Programming_Guide_Typos and squashes the following commits:
      ffa76eb [Mike Dusenberry] Fixing a few basic typos in the Programming Guide.
    (cherry picked from commit 61f164d3fdd1c8dcdba8c9d66df05ff4069aa6e6)
    Signed-off-by: Sean Owen <sowen@cloudera.com>