* [SPARK-8001] [CORE] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout (zsxwing, 2015-06-03, 6 files, -29/+30)

  Some places forget to call `assert` to check the return value of `AsynchronousListenerBus.waitUntilEmpty`. Instead of adding `assert` in these places, I think it's better to make `AsynchronousListenerBus.waitUntilEmpty` throw `TimeoutException`.

  Author: zsxwing <zsxwing@gmail.com>
  Closes #6550 from zsxwing/SPARK-8001 and squashes the following commits:
  607674a [zsxwing] Make AsynchronousListenerBus.waitUntilEmpty throw TimeoutException if timeout
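  To illustrate the API change, a minimal sketch of the new contract (the class and queue below are illustrative stand-ins, not Spark's actual listener bus):

  ```scala
  import java.util.concurrent.{ConcurrentLinkedQueue, TimeoutException}

  // Sketch only: a stand-in for the listener bus, not Spark's actual class.
  class ListenerBusSketch {
    private val eventQueue = new ConcurrentLinkedQueue[AnyRef]()

    /** Throws instead of returning a boolean that callers may forget to assert. */
    def waitUntilEmpty(timeoutMillis: Long): Unit = {
      val deadline = System.currentTimeMillis + timeoutMillis
      while (!eventQueue.isEmpty) {
        if (System.currentTimeMillis > deadline) {
          throw new TimeoutException(s"The event queue is not empty after $timeoutMillis ms")
        }
        Thread.sleep(10)
      }
    }
  }
  ```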
* [SPARK-8059] [YARN] Wake up allocation thread when new requests arrive. (Marcelo Vanzin, 2015-06-03, 2 files, -4/+19)

  This should help reduce latency for new executor allocations.

  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #6600 from vanzin/SPARK-8059 and squashes the following commits:
  8387a3a [Marcelo Vanzin] [SPARK-8059] [yarn] Wake up allocation thread when new requests arrive.
* [SPARK-8083] [MESOS] Use the correct base path in mesos driver page. (Timothy Chen, 2015-06-03, 1 file, -1/+1)

  Author: Timothy Chen <tnachen@gmail.com>
  Closes #6615 from tnachen/mesos_driver_path and squashes the following commits:
  4f47b7c [Timothy Chen] Use the correct base path in mesos driver page.
* [MINOR] [UI] Improve confusing message on log page (Andrew Or, 2015-06-03, 4 files, -11/+115)

  It's good practice to check if the input path is in the directory we expect, to avoid potentially confusing error messages.
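  A hedged sketch of the kind of containment check described (hypothetical helper, not the actual Spark code):

  ```scala
  import java.io.File

  // Hypothetical helper: true if `path` resolves inside `baseDir` after
  // canonicalization, so "../" tricks or typos produce a clear error message
  // instead of a confusing one deeper in the stack.
  def isInDirectory(baseDir: File, path: File): Boolean = {
    val base = baseDir.getCanonicalFile
    var current = path.getCanonicalFile
    while (current != null && current != base) {
      current = current.getParentFile
    }
    current == base
  }
  ```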
* [SPARK-8054] [MLLIB] Added several Java-friendly APIs + unit tests (Joseph K. Bradley, 2015-06-03, 13 files, -19/+284)

  Java-friendly APIs added:
  * GaussianMixture.run()
  * GaussianMixtureModel.predict()
  * DistributedLDAModel.javaTopicDistributions()
  * StreamingKMeans: trainOn, predictOn, predictOnValues
  * Statistics.corr
  * params
    * added doc to w() since Java docs do not inherit doc
    * removed non-Java-friendly w() from StringArrayParam and DoubleArrayParam
    * made DoubleArrayParam Java-friendly w() actually Java-friendly

  I generated the doc and verified all changes.

  CC: mengxr

  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #6562 from jkbradley/java-api-1.4 and squashes the following commits:
  c16821b [Joseph K. Bradley] Small fixes based on code review.
  d955581 [Joseph K. Bradley] unit test fixes
  29b6b0d [Joseph K. Bradley] small fixes
  fe6dcfe [Joseph K. Bradley] Added several Java-friendly APIs + unit tests: NaiveBayes, GaussianMixture, LDA, StreamingKMeans, Statistics.corr, params
* Update documentation for [SPARK-7980] [SQL] Support SQLContext.range(end) (Reynold Xin, 2015-06-03, 2 files, -10/+12)
* [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures (Reynold Xin, 2015-06-03, 2 files, -17/+17)

  Author: Reynold Xin <rxin@databricks.com>
  Closes #6608 from rxin/parquet-analysis and squashes the following commits:
  b5dc8e2 [Reynold Xin] Code review feedback.
  5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.
* [SPARK-8063] [SPARKR] Spark master URL conflict between MASTER env variable and --master command line option (Sun Rui, 2015-06-03, 1 file, -1/+1)

  Author: Sun Rui <rui.sun@intel.com>
  Closes #6605 from sun-rui/SPARK-8063 and squashes the following commits:
  51ca48b [Sun Rui] [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option.
* [SPARK-7161] [HISTORY SERVER] Provide REST API to download event logs from History Server (Hari Shreedharan, 2015-06-03, 14 files, -19/+367)

  This PR adds a new API that allows the user to download event logs for an application as a zip file. APIs have been added to download all logs for a given application or just for a specific attempt. This also adds an additional method to the ApplicationHistoryProvider to get the raw files, zipped.

  Author: Hari Shreedharan <hshreedharan@apache.org>
  Closes #5792 from harishreedharan/eventlog-download and squashes the following commits:
  221cc26 [Hari Shreedharan] Update docs with new API information.
  a131be6 [Hari Shreedharan] Fix style issues.
  5528bd8 [Hari Shreedharan] Merge branch 'master' into eventlog-download
  6e8156e [Hari Shreedharan] Simplify tests, use Guava stream copy methods.
  d8ddede [Hari Shreedharan] Remove unnecessary case in EventLogDownloadResource.
  ffffb53 [Hari Shreedharan] Changed interface to use zip stream. Added more tests.
  1100b40 [Hari Shreedharan] Ensure that `Path` does not appear in interfaces, by refactoring interfaces.
  5a5f3e2 [Hari Shreedharan] Fix test ordering issue.
  0b66948 [Hari Shreedharan] Minor formatting/import fixes.
  4fc518c [Hari Shreedharan] Fix rat failures.
  a48b91f [Hari Shreedharan] Refactor to make attemptId optional in the API. Also added tests.
  0fc1424 [Hari Shreedharan] File download now works for individual attempts and the entire application.
  350d7e8 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into eventlog-download
  fd6ab00 [Hari Shreedharan] Fix style issues
  32b7662 [Hari Shreedharan] Use UIRoot directly in ApiRootResource. Also, use `Response` class to set headers.
  7b362b2 [Hari Shreedharan] Almost working.
  3d18ebc [Hari Shreedharan] [WIP] Try getting the event log download to work.
* [SPARK-7980] [SQL] Support SQLContext.range(end) (animesh, 2015-06-03, 4 files, -2/+31)

  1. range() overloaded in SQLContext.scala
  2. range() modified in python sql context.py
  3. Tests added accordingly in DataFrameSuite.scala and python sql tests.py

  Author: animesh <animesh@apache.spark>
  Closes #6609 from animeshbaranawal/SPARK-7980 and squashes the following commits:
  935899c [animesh] SPARK-7980:python+scala changes
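  Usage of the new overload, spark-shell style (`sqlContext` is provided by the shell):

  ```scala
  // New overload: range(end) is equivalent to range(0, end).
  val df = sqlContext.range(1000)   // single LongType column named "id"
  df.show(3)
  ```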
* [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 (Patrick Wendell, 2015-06-03, 34 files, -34/+61)

  Author: Patrick Wendell <patrick@databricks.com>
  Closes #6328 from pwendell/spark-1.5-update and squashes the following commits:
  2f42d02 [Patrick Wendell] A few more excludes
  4bebcf0 [Patrick Wendell] Update to RC4
  61aaf46 [Patrick Wendell] Using new release candidate
  55f1610 [Patrick Wendell] Another exclude
  04b4f04 [Patrick Wendell] More issues with transient 1.4 changes
  36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
* [SPARK-7973] [SQL] Increase the timeout of two CliSuite tests (Yin Huai, 2015-06-03, 1 file, -2/+2)

  https://issues.apache.org/jira/browse/SPARK-7973

  Author: Yin Huai <yhuai@databricks.com>
  Closes #6525 from yhuai/SPARK-7973 and squashes the following commits:
  763b821 [Yin Huai] Also change the timeout of "Single command with -e" to 2 minutes.
  e598a08 [Yin Huai] Increase the timeout to 3 minutes.
* [SPARK-7983] [MLLIB] Add require for one-based indices in loadLibSVMFile (Yuhao Yang, 2015-06-03, 2 files, -0/+47)

  JIRA: https://issues.apache.org/jira/browse/SPARK-7983

  Customers frequently use zero-based indices in their LIBSVM files. No warnings or errors are reported by Spark during the subsequent computation, and it usually leads to weird results for many algorithms (like GBDT). This adds a quick check.

  Author: Yuhao Yang <hhbyyh@gmail.com>
  Closes #6538 from hhbyyh/loadSVM and squashes the following commits:
  79d9c11 [Yuhao Yang] optimization as respond to comments
  4310710 [Yuhao Yang] merge conflict
  96460f1 [Yuhao Yang] merge conflict
  20a2811 [Yuhao Yang] use require
  6e4f8ca [Yuhao Yang] add check for ascending order
  9956365 [Yuhao Yang] add ut for 0-based loadlibsvm exception
  5bd1f9a [Yuhao Yang] add require for one-based in loadLIBSVM
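  LIBSVM rows look like `label index1:value1 index2:value2 ...` with one-based, ascending indices. A minimal sketch of the kind of validation added, using a hypothetical parser rather than MLlib's actual one:

  ```scala
  // Parse one LIBSVM line, enforcing one-based, strictly ascending indices.
  def parseLine(line: String): (Double, Array[(Int, Double)]) = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    var previous = 0   // indices must start at 1 and strictly increase
    val features = tokens.tail.map { item =>
      val Array(idx, value) = item.split(":")
      val i = idx.toInt
      require(i > previous,
        s"indices should be one-based and in ascending order; found $i after $previous")
      previous = i
      (i, value.toDouble)
    }
    (label, features)
  }

  // parseLine("1.0 0:2.5 3:1.0") now fails fast instead of silently mis-training.
  ```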
* [SPARK-7562] [SPARK-6444] [SQL] Improve error reporting for expression data type mismatch (Wenchen Fan, 2015-06-03, 17 files, -421/+583)

  It seems hard to find a common pattern for checking types in `Expression`. Sometimes we know exactly what input types we need (like `And`, which needs two booleans), and sometimes we just have some rules (like `Add`, which needs two numeric types that are equal). So I defined a general interface `checkInputDataTypes` in `Expression` which returns a `TypeCheckResult`. `TypeCheckResult` can tell whether this expression passes the type checking or what the type mismatch is. This PR mainly applies input type checking to arithmetic and predicate expressions.

  TODO: apply the type checking interface to more expressions.

  Author: Wenchen Fan <cloud0fan@outlook.com>
  Closes #6405 from cloud-fan/6444 and squashes the following commits:
  b5ff31b [Wenchen Fan] address comments
  b917275 [Wenchen Fan] rebase
  39929d9 [Wenchen Fan] add todo
  0808fd2 [Wenchen Fan] make constructor of TypeCheckResult private
  3bee157 [Wenchen Fan] add decimal type coercion rule for binary comparison
  8883025 [Wenchen Fan] apply type check interface to CaseWhen
  cffb67c [Wenchen Fan] to have resolved call the data type check function
  6eaadff [Wenchen Fan] add equal type constraint to EqualTo
  3affbd8 [Wenchen Fan] more fixes
  654d46a [Wenchen Fan] improve tests
  e0a3628 [Wenchen Fan] improve error message
  1524ff6 [Wenchen Fan] fix style
  69ca3fe [Wenchen Fan] add error message and tests
  c71d02c [Wenchen Fan] fix hive tests
  6491721 [Wenchen Fan] use value class TypeCheckResult
  7ae76b9 [Wenchen Fan] address comments
  cb77e4f [Wenchen Fan] Improve error reporting for expression data type mismatch
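  A sketch of the interface shape described in the message; the `TypeCheckResult` name and its success/failure cases follow the commit, while the trait layout is illustrative:

  ```scala
  // Result of type checking: either success or a failure with a message.
  sealed trait TypeCheckResult { def isSuccess: Boolean }
  case object TypeCheckSuccess extends TypeCheckResult { val isSuccess = true }
  case class TypeCheckFailure(message: String) extends TypeCheckResult { val isSuccess = false }

  trait ExpressionSketch {
    // Default: no constraints. And/Add-style expressions override this with
    // "two booleans", "two equal numeric types", etc.
    def checkInputDataTypes(): TypeCheckResult = TypeCheckSuccess
  }
  ```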
* [SPARK-8060] Improve DataFrame Python test coverage and documentation. (Reynold Xin, 2015-06-03, 20 files, -227/+180)

  Author: Reynold Xin <rxin@databricks.com>
  Closes #6601 from rxin/python-read-write-test-and-doc and squashes the following commits:
  baa8ad5 [Reynold Xin] Code review feedback.
  f081d47 [Reynold Xin] More documentation updates.
  c9902fa [Reynold Xin] [SPARK-8060] Improve DataFrame Python reader/writer interface doc and testing.
* [SPARK-8032] [PYSPARK] Make version checking for NumPy in MLlib more robust (MechCoder, 2015-06-02, 1 file, -1/+3)

  The current check compares version strings, so `1.x` is treated as less than `1.4`. This breaks when x has more than one digit: numerically x > 4, yet as strings `1.x` < `1.4`. It fails on my system since I have version `1.10` :P

  Author: MechCoder <manojkumarsivaraj334@gmail.com>
  Closes #6579 from MechCoder/np_ver and squashes the following commits:
  15430f8 [MechCoder] fix syntax error
  893fb7e [MechCoder] remove equal to
  e35f0d4 [MechCoder] minor
  e89376c [MechCoder] Better checking
  22703dd [MechCoder] [SPARK-8032] Make version checking for NumPy in MLlib more robust
* [SPARK-8043] [MLLIB] [DOC] update NaiveBayes and SVM examples in doc (Yuhao Yang, 2015-06-02, 3 files, -18/+14)

  JIRA: https://issues.apache.org/jira/browse/SPARK-8043

  I found some issues while testing the save/load examples in the markdown documents, as part of the 1.4 QA plan.

  Author: Yuhao Yang <hhbyyh@gmail.com>
  Closes #6584 from hhbyyh/naiveDocExample and squashes the following commits:
  a01a206 [Yuhao Yang] fix for Gaussian mixture
  2fb8b96 [Yuhao Yang] update NaiveBayes and SVM examples in doc
* [MINOR] make the launcher project name consistent with others (WangTaoTheTonic, 2015-06-02, 1 file, -1/+1)

  I found this by chance while building Spark and think it is better to keep its name consistent with the other sub-projects (Spark Project *). I am not gonna file a JIRA as it is a pretty small issue.

  Author: WangTaoTheTonic <wangtao111@huawei.com>
  Closes #6603 from WangTaoTheTonic/projName and squashes the following commits:
  994b3ba [WangTaoTheTonic] make the project name consistent
* [SPARK-8053] [MLLIB] renamed scalingVector to scalingVec (Joseph K. Bradley, 2015-06-02, 2 files, -8/+8)

  I searched the Spark codebase for all occurrences of "scalingVector".

  CC: mengxr

  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #6596 from jkbradley/scalingVec-rename and squashes the following commits:
  d3812f8 [Joseph K. Bradley] renamed scalingVector to scalingVec
* [SPARK-7691] [SQL] Refactor CatalystTypeConverter to use type-specific row accessors (Josh Rosen, 2015-06-02, 5 files, -265/+382)

  This patch significantly refactors CatalystTypeConverters to both clean up the code and enable these conversions to work with future Project Tungsten features. At a high level, I've reorganized the code so that all functions dealing with the same type are grouped together into type-specific subclasses of `CatalystTypeConverter`. In addition, I've added new methods that allow the Catalyst Row -> Scala Row conversions to access the Catalyst row's fields through type-specific `getTYPE()` methods rather than the generic `get()` / `Row.apply` methods. This refactoring is a blocker to being able to unit test new operators that I'm developing as part of Project Tungsten, since those operators may output `UnsafeRow` instances which don't support the generic `get()`.

  The stricter usage of types here has uncovered some bugs in other parts of Spark SQL:
  - #6217: DescribeCommand is assigned wrong output attributes in SparkStrategies
  - #6218: DataFrame.describe() should cast all aggregates to String
  - #6400: Use output schema, not relation schema, for data source input conversion

  Spark SQL currently has undefined behavior for what happens when you try to create a DataFrame from user-specified rows whose values don't match the declared schema. According to the `createDataFrame()` Scaladoc:

  > It is important to make sure that the structure of every [[Row]] of the provided RDD matches the provided schema. Otherwise, there will be runtime exception.

  Given this, it sounds like it's technically not a breach of our API contract to fail fast when the data types don't match. However, there appear to be many cases where we don't fail even though the types don't match. For example, `JavaHashingTFSuite.hasingTF` passes a column of integer values for a "label" column which is supposed to contain floats. This column isn't actually read or modified as part of query processing, so its actual concrete type doesn't seem to matter. In other cases, there could be situations where we have generic numeric aggregates that tolerate being called with different numeric types than the schema specified, but this can be okay due to numeric conversions.

  In the long run, we will probably want to come up with precise semantics for implicit type conversions / widening when converting Java / Scala rows to Catalyst rows. Until then, though, I think that failing fast with a ClassCastException is a reasonable behavior; this is the approach taken in this patch. Note that certain optimizations in the inbound conversion functions for primitive types mean that we'll probably preserve the old undefined behavior in a majority of cases.

  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #6222 from JoshRosen/catalyst-converters-refactoring and squashes the following commits:
  740341b [Josh Rosen] Optimize method dispatch for primitive type conversions
  befc613 [Josh Rosen] Add tests to document Option-handling behavior.
  5989593 [Josh Rosen] Use new SparkFunSuite base in CatalystTypeConvertersSuite
  6edf7f8 [Josh Rosen] Re-add convertToScala(), since a Hive test still needs it
  3f7b2d8 [Josh Rosen] Initialize converters lazily so that the attributes are resolved first
  6ad0ebb [Josh Rosen] Fix JavaHashingTFSuite ClassCastException
  677ff27 [Josh Rosen] Fix null handling bug; add tests.
  8033d4c [Josh Rosen] Fix serialization error in UserDefinedGenerator.
  85bba9d [Josh Rosen] Fix wrong input data in InMemoryColumnarQuerySuite
  9c0e4e1 [Josh Rosen] Remove last use of convertToScala().
  ae3278d [Josh Rosen] Throw ClassCastException errors during inbound conversions.
  7ca7fcb [Josh Rosen] Comments and cleanup
  1e87a45 [Josh Rosen] WIP refactoring of CatalystTypeConverters
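  A sketch of the type-specific converter pattern the message describes (names and signatures are illustrative, not the actual Catalyst internals):

  ```scala
  // Stand-in for a Catalyst row with type-specific accessors: the point of the
  // refactoring is that UnsafeRow-style rows support these but not a generic get().
  trait InternalRowLike {
    def getInt(ordinal: Int): Int
    def getDouble(ordinal: Int): Double
  }

  // Each converter groups all logic for one type behind a common interface.
  abstract class CatalystTypeConverterSketch[ScalaT] {
    def toCatalyst(value: ScalaT): Any
    def toScala(row: InternalRowLike, ordinal: Int): ScalaT
  }

  object IntConverter extends CatalystTypeConverterSketch[Int] {
    def toCatalyst(value: Int): Any = value
    // Reads through getInt() rather than get()/apply(), so it also works
    // against rows that only expose type-specific accessors.
    def toScala(row: InternalRowLike, ordinal: Int): Int = row.getInt(ordinal)
  }
  ```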
* [SPARK-7547] [ML] Scala Example code for ElasticNet (DB Tsai, 2015-06-02, 7 files, -9/+314)

  This is Scala example code for both linear and logistic regression. Python and Java versions are to be added.

  Author: DB Tsai <dbt@netflix.com>
  Closes #6576 from dbtsai/elasticNetExample and squashes the following commits:
  e7ca406 [DB Tsai] fix test
  6bb6d77 [DB Tsai] fix suite and remove duplicated setMaxIter
  136e0dd [DB Tsai] address feedback
  1ec29d4 [DB Tsai] fix style
  9462f5f [DB Tsai] add example
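  For flavor, a minimal linear regression with elastic net against the Spark 1.4 ML API; `training` is an assumed DataFrame of (label, features):

  ```scala
  import org.apache.spark.ml.regression.LinearRegression

  // elasticNetParam interpolates between L2 (0.0, ridge) and L1 (1.0, lasso).
  val lr = new LinearRegression()
    .setMaxIter(100)
    .setRegParam(0.3)
    .setElasticNetParam(0.8)

  val model = lr.fit(training)
  println(s"Weights: ${model.weights} Intercept: ${model.intercept}")
  ```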
* [SPARK-7387] [ML] [DOC] CrossValidator example code in Python (Ram Sriharsha, 2015-06-02, 2 files, -2/+98)

  Author: Ram Sriharsha <rsriharsha@hw11853.local>
  Closes #6358 from harsha2010/SPARK-7387 and squashes the following commits:
  63efda2 [Ram Sriharsha] more examples for classifier to distinguish mapreduce from spark properly
  aeb6bb6 [Ram Sriharsha] Python Style Fix
  54a500c [Ram Sriharsha] Merge branch 'master' into SPARK-7387
  615e91c [Ram Sriharsha] cleanup
  204c4e3 [Ram Sriharsha] Merge branch 'master' into SPARK-7387
  7246d35 [Ram Sriharsha] [SPARK-7387][ml][doc] CrossValidator example code in Python
* [SQL] [TEST] [MINOR] Follow-up of PR #6493, use Guava API to ensure Java 6 friendliness (Cheng Lian, 2015-06-02, 1 file, -4/+5)

  This is a follow-up of PR #6493, which has been reverted in branch-1.4 because it uses Java 7 specific APIs and breaks the Java 6 build. This PR replaces those APIs with equivalent Guava ones to ensure Java 6 friendliness.

  cc andrewor14 pwendell, this should also be back-ported to branch-1.4.

  Author: Cheng Lian <lian@databricks.com>
  Closes #6547 from liancheng/override-log4j and squashes the following commits:
  c900cfd [Cheng Lian] Addresses Shixiong's comment
  72da795 [Cheng Lian] Uses Guava API to ensure Java 6 friendliness
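  For context, the kind of substitution involved: Java 7 NIO file helpers swapped for Guava equivalents that also run on Java 6. A hedged sketch, not the actual diff:

  ```scala
  import java.io.File
  import com.google.common.base.Charsets
  import com.google.common.io.Files

  // Java 7 only: java.nio.file.Files.write(path, bytes, ...) breaks the Java 6 build.
  // Guava equivalents below work on Java 6. The path is illustrative.
  val file = new File("/tmp/log4j.properties")
  Files.write("log4j.rootCategory=INFO, console", file, Charsets.UTF_8)
  val contents = Files.toString(file, Charsets.UTF_8)
  ```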
* [SPARK-8049] [MLLIB] drop tmp col from OneVsRest output (Xiangrui Meng, 2015-06-02, 2 files, -0/+10)

  The temporary column should be dropped after we get the prediction column. harsha2010

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #6592 from mengxr/SPARK-8049 and squashes the following commits:
  1d89107 [Xiangrui Meng] use SparkFunSuite
  6ee70de [Xiangrui Meng] drop tmp col from OneVsRest output
* [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise() (Davies Liu, 2015-06-02, 1 file, -3/+28)

  Thanks ogirardot, closes #6580.

  cc rxin JoshRosen

  Author: Davies Liu <davies@databricks.com>
  Closes #6590 from davies/when and squashes the following commits:
  c0f2069 [Davies Liu] fix Column.when() and otherwise()
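  The fix is on the Python side, but the intended semantics mirror the Scala `Column` API, shown here against an assumed DataFrame `df` with an `age` column:

  ```scala
  import org.apache.spark.sql.functions.{col, when}

  // Chained conditions with a default branch; omitting otherwise()
  // yields null for rows that match no condition.
  val bucket = when(col("age") < 18, "minor")
    .when(col("age") < 65, "adult")
    .otherwise("senior")

  df.select(col("age"), bucket.as("bucket")).show()
  ```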
* [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append (Cheng Lian, 2015-06-02, 5 files, -32/+67)

  The current code references the schema of the DataFrame to be written before checking save mode. This triggers expensive metadata discovery prematurely. For save modes other than `Append`, this metadata discovery is useless since we either ignore the result (for `Ignore` and `ErrorIfExists`) or delete existing files (for `Overwrite`) later. This PR fixes the issue by deferring metadata discovery until after the save mode check.

  Author: Cheng Lian <lian@databricks.com>
  Closes #6583 from liancheng/spark-8014 and squashes the following commits:
  1aafabd [Cheng Lian] Updates comments
  088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined
  8fbd93f [Cheng Lian] Fixes SPARK-8014
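  The gist of the reordering, as an illustrative control-flow sketch (not the actual code):

  ```scala
  // Sketch: the save-mode decision now happens before any metadata is touched.
  object SaveSketch {
    def destinationExists: Boolean = false
    def deleteDestination(): Unit = ()
    def discoverMetadataAndWrite(): Unit = ()  // the expensive part

    def save(mode: String): Unit = mode match {
      case "ErrorIfExists" if destinationExists => sys.error("path already exists")
      case "Ignore" if destinationExists        => ()  // metadata never computed
      case "Overwrite" => deleteDestination(); discoverMetadataAndWrite()
      case _           => discoverMetadataAndWrite()   // Append or fresh write
    }
  }
  ```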
* [SPARK-7985] [ML] [MLlib] [Docs] Remove "fittingParamMap" references. Updating ML Doc "Estimator, Transformer, and Param" examples. (Mike Dusenberry, 2015-06-02, 11 files, -14/+14)

  Updating the ML Doc's *"Estimator, Transformer, and Param"* example to use `model.extractParamMap` instead of `model.fittingParamMap`, which no longer exists. mengxr, I believe this addresses (part of) the *update documentation* TODO list item from [PR 5820](https://github.com/apache/spark/pull/5820).

  Author: Mike Dusenberry <dusenberrymw@gmail.com>
  Closes #6514 from dusenberrymw/Fix_ML_Doc_Estimator_Transformer_Param_Example and squashes the following commits:
  6366e1f [Mike Dusenberry] Updating instances of model.extractParamMap to model.parent.extractParamMap, since the Params of the parent Estimator could possibly differ from those of the Model.
  d850e0e [Mike Dusenberry] Removing all references to "fittingParamMap" throughout Spark, since it has been removed.
  0480304 [Mike Dusenberry] Updating the ML Doc "Estimator, Transformer, and Param" Java example to use model.extractParamMap() instead of model.fittingParamMap(), which no longer exists.
  7d34939 [Mike Dusenberry] Updating ML Doc "Estimator, Transformer, and Param" example to use model.extractParamMap instead of model.fittingParamMap, which no longer exists.
* [SPARK-8015] [FLUME] Remove Guava dependency from flume-sink. (Marcelo Vanzin, 2015-06-02, 4 files, -7/+77)

  The minimal change would be to disable shading of Guava in the module and rely on the transitive dependency from other libraries instead. But since Guava's use is so localized, I think it's better to just not use it at all, so I replaced that code and removed all traces of Guava from the module's build.

  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #6555 from vanzin/SPARK-8015 and squashes the following commits:
  c0ceea8 [Marcelo Vanzin] Add comments about dependency management.
  c38228d [Marcelo Vanzin] Add guava dep in test scope.
  b7a0349 [Marcelo Vanzin] Add libthrift exclusion.
  6e0942d [Marcelo Vanzin] Add comment in pom.
  2d79260 [Marcelo Vanzin] [SPARK-8015] [flume] Remove Guava dependency from flume-sink.
* [SPARK-8037] [SQL] Ignores files whose name starts with dot in HadoopFsRelation (Cheng Lian, 2015-06-03, 3 files, -6/+26)

  Author: Cheng Lian <lian@databricks.com>
  Closes #6581 from liancheng/spark-8037 and squashes the following commits:
  d08e97b [Cheng Lian] Ignores files whose name starts with dot in HadoopFsRelation
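  The rule itself is small; a sketch of the filter (hypothetical helper):

  ```scala
  // Dot-prefixed file names are treated as hidden and skipped during listing.
  def isHiddenFile(name: String): Boolean = name.startsWith(".")

  Seq(".DS_Store", "._checkpoint", "part-00000").filterNot(isHiddenFile)
  // -> List(part-00000)
  ```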
* [SPARK-7432] [MLLIB] fix flaky CrossValidator doctest (Xiangrui Meng, 2015-06-02, 1 file, -10/+9)

  The new test uses CV to compare `maxIter=0` and `maxIter=1`, and validates the evaluation result. jkbradley

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #6572 from mengxr/SPARK-7432 and squashes the following commits:
  c236bb8 [Xiangrui Meng] fix flaky cv doctest
* [SPARK-8021] [SQL] [PYSPARK] make Python read/write API consistent with Scala (Davies Liu, 2015-06-02, 1 file, -27/+94)

  Adds schema()/format()/options() for the reader, and mode()/format()/options()/partitionBy() for the writer.

  cc rxin yhuai pwendell

  Author: Davies Liu <davies@databricks.com>
  Closes #6578 from davies/readwrite and squashes the following commits:
  720d293 [Davies Liu] address comments
  b65dfa2 [Davies Liu] Update readwriter.py
  1299ab6 [Davies Liu] make Python API consistent with Scala
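  The Scala reader/writer chain that the Python API now mirrors; `sqlContext` and the paths are placeholders:

  ```scala
  // Reader: format/schema/options, matching the new Python methods.
  val df = sqlContext.read
    .format("json")
    .options(Map("samplingRatio" -> "0.1"))
    .load("/path/to/input")

  // Writer: mode/format/options/partitionBy.
  df.write
    .format("parquet")
    .mode("overwrite")
    .partitionBy("year")
    .save("/path/to/output")
  ```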
* [SPARK-8023] [SQL] Add "deterministic" attribute to Expression to avoid collapsing nondeterministic projects (Yin Huai, 2015-06-02, 6 files, -2/+137)

  This closes #6570.

  Author: Yin Huai <yhuai@databricks.com>
  Author: Reynold Xin <rxin@databricks.com>
  Closes #6573 from rxin/deterministic and squashes the following commits:
  356cd22 [Reynold Xin] Added unit test for the optimizer.
  da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023
  da56200 [Yin Huai] Comments.
  e38f264 [Yin Huai] Comment.
  f9d6a73 [Yin Huai] Add a deterministic method to Expression.
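  The idea, sketched with illustrative types: expressions advertise determinism, and a project-collapsing rule must skip plans containing a nondeterministic expression, or `rand()` would be evaluated a different number of times than written:

  ```scala
  trait ExprSketch { def deterministic: Boolean = true }
  case class RandSketch(seed: Long) extends ExprSketch {
    override def deterministic: Boolean = false  // same input, different output
  }

  // Merging adjacent projections is only safe when everything is deterministic.
  def safeToCollapse(exprs: Seq[ExprSketch]): Boolean = exprs.forall(_.deterministic)
  ```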
* [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early (Yin Huai, 2015-06-02, 1 file, -3/+22)

  https://issues.apache.org/jira/browse/SPARK-8020

  Author: Yin Huai <yhuai@databricks.com>
  Closes #6571 from yhuai/SPARK-8020-1 and squashes the following commits:
  0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive.
* [SPARK-6917] [SQL] DecimalType is not read back when non-native type exists (Davies Liu, 2015-06-01, 2 files, -1/+16)

  cc yhuai

  Author: Davies Liu <davies@databricks.com>
  Closes #6558 from davies/decimalType and squashes the following commits:
  c877ca8 [Davies Liu] Update ParquetConverter.scala
  48cc57c [Davies Liu] Update ParquetConverter.scala
  b43845c [Davies Liu] add test
  3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists
* [SPARK-7582] [MLLIB] user guide for StringIndexer (Xiangrui Meng, 2015-06-01, 2 files, -0/+193)

  This PR adds a Java unit test and user guide for `StringIndexer`. I put it before `OneHotEncoder` because they are closely related. jkbradley

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #6561 from mengxr/SPARK-7582 and squashes the following commits:
  4bba4f1 [Xiangrui Meng] fix example
  ba1cd1b [Xiangrui Meng] fix style
  7fa18d1 [Xiangrui Meng] add user guide for StringIndexer
  136cb93 [Xiangrui Meng] add a Java unit test for StringIndexer
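  Typical usage, in the style of the new guide (Spark 1.4 ML API; toy data):

  ```scala
  import org.apache.spark.ml.feature.StringIndexer

  val df = sqlContext.createDataFrame(
    Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"))
  ).toDF("id", "category")

  // Labels are indexed by frequency: the most frequent ("a") gets index 0.0.
  val indexed = new StringIndexer()
    .setInputCol("category")
    .setOutputCol("categoryIndex")
    .fit(df)
    .transform(df)

  indexed.show()
  ```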
* Fixed typo in the previous commit. (Reynold Xin, 2015-06-01, 1 file, -1/+1)
* [SPARK-7965] [SPARK-7972] [SQL] Handle expressions containing multiple window expressions and make parser match window frames in case insensitive way (Yin Huai, 2015-06-01, 3 files, -32/+134)

  JIRAs:
  https://issues.apache.org/jira/browse/SPARK-7965
  https://issues.apache.org/jira/browse/SPARK-7972

  Author: Yin Huai <yhuai@databricks.com>
  Closes #6524 from yhuai/7965-7972 and squashes the following commits:
  c12c79c [Yin Huai] Add doc for returned value.
  de64328 [Yin Huai] Address rxin's comments.
  fc9b1ad [Yin Huai] wip
  2996da4 [Yin Huai] scala style
  20b65b7 [Yin Huai] Handle expressions containing multiple window expressions.
  9568b21 [Yin Huai] case insensitive matches
  41f633d [Yin Huai] Failed test case.
* [SPARK-8025] [Streaming] Add JavaDoc style deprecation for deprecated Streaming methods (zsxwing, 2015-06-01, 3 files, -0/+19)

  The Scala `deprecated` annotation doesn't actually show up in JavaDoc.

  Author: zsxwing <zsxwing@gmail.com>
  Closes #6564 from zsxwing/SPARK-8025 and squashes the following commits:
  2faa2bb [zsxwing] Add JavaDoc style deprecation for deprecated Streaming methods
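  The pattern involved: Scala's `@deprecated` annotation is invisible in generated JavaDoc, so a JavaDoc `@deprecated` tag is added to the doc comment as well. An illustrative method, not from the diff:

  ```scala
  object DeprecationSketch {
    /**
     * Counts events in the window. (Hypothetical method for illustration.)
     *
     * @deprecated As of 1.4.0, use countByWindow instead. (visible in JavaDoc)
     */
    @deprecated("use countByWindow", "1.4.0")  // visible to the Scala compiler only
    def countEvents(): Long = 0L
  }
  ```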
* Revert "[SPARK-8020] Spark SQL in spark-defaults.conf make metadataHive get ↵Reynold Xin2015-06-012-66/+4
| | | | | | constructed too early" This reverts commit 91f6be87bc5cff41ca7a9cca9fdcc4678a4e7086.
* [SPARK-8020] Spark SQL in spark-defaults.conf make metadataHive get constructed too early (Yin Huai, 2015-06-01, 2 files, -4/+66)

  https://issues.apache.org/jira/browse/SPARK-8020

  Author: Yin Huai <yhuai@databricks.com>
  Closes #6563 from yhuai/SPARK-8020 and squashes the following commits:
  4e5addc [Yin Huai] style
  bf766c6 [Yin Huai] Failed test.
  0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive.
* [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API (Reynold Xin, 2015-06-01, 2 files, -0/+15)

  Author: Reynold Xin <rxin@databricks.com>
  Closes #6569 from rxin/freqItemsWarning and squashes the following commits:
  7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.
* [SPARK-8027] [SPARKR] Add maven profile to build R package docs (Shivaram Venkataraman, 2015-06-01, 2 files, -8/+31)

  Also use that profile in create-release.sh.

  cc pwendell -- Note that this means we need `knitr` and `roxygen` installed on the machines used for building the release. Let me know if you need help with that.

  Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
  Closes #6567 from shivaram/SPARK-8027 and squashes the following commits:
  8dc8ecf [Shivaram Venkataraman] Add maven profile to build R package docs. Also use that profile in create-release.sh
* [SPARK-8026] [SQL] Add Column.alias to Scala/Java DataFrame API (Reynold Xin, 2015-06-01, 2 files, -0/+18)

  Author: Reynold Xin <rxin@databricks.com>
  Closes #6565 from rxin/alias and squashes the following commits:
  286d880 [Reynold Xin] [SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API
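  Usage is one line; `df` is an assumed DataFrame:

  ```scala
  import org.apache.spark.sql.functions.col

  // alias() sits alongside as(); handy from Java, where `as` reads oddly.
  df.select(col("colA").alias("a"), col("colB").as("b"))
  ```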
* [SPARK-7982] [SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear (Reynold Xin, 2015-06-01, 2 files, -5/+8)

  Author: Reynold Xin <rxin@databricks.com>
  Closes #6566 from rxin/crosstab and squashes the following commits:
  e0ace1c [Reynold Xin] [SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear
* [SPARK-8028] [SPARKR] Use addJar instead of setJars in SparkR (Shivaram Venkataraman, 2015-06-01, 1 file, -2/+5)

  This prevents spark.jars from being cleared while using `--packages` or `--jars`.

  cc pwendell davies brkyvz

  Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
  Closes #6568 from shivaram/SPARK-8028 and squashes the following commits:
  3a9cf1f [Shivaram Venkataraman] Use addJar instead of setJars in SparkR. This prevents spark.jars from being cleared
* [MINOR] [UI] Improve error message on log page (Andrew Or, 2015-06-01, 2 files, -0/+76)

  Currently, if a bad log type is specified, we get a blank page. We should provide a more informative error message.
* [SPARK-7958] [STREAMING] Handled exception in StreamingContext.start() to prevent leaking of actors (Tathagata Das, 2015-06-01, 3 files, -4/+33)

  StreamingContext.start() can throw an exception because DStream.validateAtStart() fails (say, the checkpoint directory is not set for a StateDStream). But by then JobScheduler, JobGenerator, and ReceiverTracker have already started, along with their actors, and those cannot be shut down because the only way to do that is to call StreamingContext.stop(), which cannot be called as the context has not been marked as ACTIVE. The solution in this PR is to stop the internal scheduler if start() throws an exception, and mark the context as STOPPED.

  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #6559 from tdas/SPARK-7958 and squashes the following commits:
  20b2ec1 [Tathagata Das] Added synchronized
  790b617 [Tathagata Das] Handled exception in StreamingContext.start()
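  The shape of the fix, sketched (state names follow the commit message; the scheduler stub is illustrative):

  ```scala
  object ContextState extends Enumeration { val Initialized, Active, Stopped = Value }

  class StreamingContextSketch {
    private var state = ContextState.Initialized
    private object scheduler { def start(): Unit = (); def stop(graceful: Boolean): Unit = () }

    def start(): Unit = synchronized {
      try {
        scheduler.start()           // validateAtStart() may throw in here
        state = ContextState.Active
      } catch {
        case e: Throwable =>
          // Tear down whatever already started so actors are not leaked,
          // then mark the context STOPPED instead of leaving it half-started.
          scheduler.stop(graceful = false)
          state = ContextState.Stopped
          throw e
      }
    }
  }
  ```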
* [SPARK-7584] [MLLIB] User guide for VectorAssembler (Xiangrui Meng, 2015-06-01, 2 files, -0/+192)

  This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java. jkbradley

  Author: Xiangrui Meng <meng@databricks.com>
  Closes #6556 from mengxr/SPARK-7584 and squashes the following commits:
  11313f6 [Xiangrui Meng] simplify Java example
  0cd47f3 [Xiangrui Meng] update user guide
  fd36292 [Xiangrui Meng] update Java unit test
  ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler
  e399942 [Xiangrui Meng] scala/python example code
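  Representative usage (Spark 1.4 ML API; `dataset` is an assumed DataFrame with the listed columns):

  ```scala
  import org.apache.spark.ml.feature.VectorAssembler

  // Merges several columns into a single vector column for downstream ML stages.
  val assembler = new VectorAssembler()
    .setInputCols(Array("hour", "mobile", "userFeatures"))
    .setOutputCol("features")

  val output = assembler.transform(dataset)
  println(output.select("features").first())
  ```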
* [SPARK-7497] [PYSPARK] [STREAMING] fix streaming flaky tests (Davies Liu, 2015-06-01, 1 file, -8/+8)

  Increase the duration and timeout in streaming python tests.

  Author: Davies Liu <davies@databricks.com>
  Closes #6239 from davies/flaky_tests and squashes the following commits:
  d6aee8f [Davies Liu] fix window tests
  26317f7 [Davies Liu] Merge branch 'master' of github.com:apache/spark into flaky_tests
  7947db6 [Davies Liu] fix streaming flaky tests
* [DOC] Minor modification to Streaming docs with regards to parallel data receiving (Nishkam Ravi, 2015-06-01, 1 file, -4/+4)

  pwendell tdas

  Author: Nishkam Ravi <nravi@cloudera.com>
  Author: nishkamravi2 <nishkamravi@gmail.com>
  Author: nravi <nravi@c1704.halxg.cloudera.com>
  Closes #6544 from nishkamravi2/master_nravi and squashes the following commits:
  46e8c03 [Nishkam Ravi] Slight modification to streaming docs