path: root/python
Commit message (Author, Age, Files, Lines)
* [SPARK-11322] [PYSPARK] Keep full stack trace in captured exception (Liang-Chi Hsieh, 2015-10-28, 2 files, -4/+21)
  JIRA: https://issues.apache.org/jira/browse/SPARK-11322
  As reported by JoshRosen in [databricks/spark-redshift/issues/89](https://github.com/databricks/spark-redshift/issues/89#issuecomment-149828308), the exception-masking behavior sometimes makes debugging harder. To deal with this issue, we should keep the full stack trace in the captured exception.
  Author: Liang-Chi Hsieh <viirya@appier.com>
  Closes #9283 from viirya/py-exception-stacktrace.
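  The idea, as a minimal sketch (not the actual pyspark.sql.utils code; `capture_exception` is a hypothetical name): keep the formatted original traceback inside the re-raised exception instead of discarding it.
  ```
  import traceback

  def capture_exception(e):
      # Hedged sketch: wrap the exception but carry the full original
      # stack trace in the message so the failure point stays visible.
      stack = traceback.format_exc()
      return RuntimeError("%s\n\nFull stack trace:\n%s" % (e, stack))
  ```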
* [SPARK-11292] [SQL] Python API for text data source (Reynold Xin, 2015-10-28, 2 files, -2/+27)
  Adds DataFrameReader.text and DataFrameWriter.text.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #9259 from rxin/SPARK-11292.
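  Illustrative usage, assuming an active SQLContext named sqlContext and a text file at the given path; each line becomes one row:
  ```
  df = sqlContext.read.text("/tmp/people.txt")   # one string column, one row per line
  df.write.text("/tmp/people_copy")              # writes the column back out as text
  ```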
* [SPARK-11367][ML][PYSPARK] Python LinearRegression should support setting solver (Yanbo Liang, 2015-10-28, 3 files, -22/+37)
  [SPARK-10668](https://issues.apache.org/jira/browse/SPARK-10668) provided a ```WeightedLeastSquares``` solver ("normal") in ```LinearRegression``` with L2 regularization in Scala and R; the Python ML ```LinearRegression``` should also support setting the solver ("auto", "normal", "l-bfgs").
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #9328 from yanboliang/spark-11367.
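  Illustrative usage after this change:
  ```
  from pyspark.ml.regression import LinearRegression

  # solver may now be set to "auto", "normal", or "l-bfgs"
  lr = LinearRegression(maxIter=10, regParam=0.1, solver="normal")
  ```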
* [SPARK-11302][MLLIB] 2) Multivariate Gaussian Model with Covariance matrix returns incorrect answer in some cases (Sean Owen, 2015-10-27, 1 file, -2/+2)
  Fix computation of root-sigma-inverse in multivariate Gaussian; add a test and fix the related Python mixture model test.
  Supersedes https://github.com/apache/spark/pull/9293
  Author: Sean Owen <sowen@cloudera.com>
  Closes #9309 from srowen/SPARK-11302.2.
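  For reference, the standard definitions involved (stated from the usual multivariate Gaussian formulas, not lifted from the patch): with the eigendecomposition Σ = U D Uᵀ, the root-sigma-inverse and the log-density it feeds are
  ```
  \Sigma^{-1/2} = U D^{-1/2} U^{\top}, \qquad
  \log p(x) = -\tfrac{1}{2}\bigl\|\Sigma^{-1/2}(x-\mu)\bigr\|^{2}
              - \tfrac{d}{2}\log(2\pi) - \tfrac{1}{2}\log\lvert\Sigma\rvert
  ```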
* [SPARK-10024][PYSPARK] Python API RF and GBT related params clean-up (vectorijk, 2015-10-27, 2 files, -338/+168)
  Implement {RandomForest, GBT, TreeEnsemble, TreeClassifier, TreeRegressor}Params for the Python API in pyspark/ml/{classification, regression}.py.
  Author: vectorijk <jiangkai@gmail.com>
  Closes #9233 from vectorijk/spark-10024.
* [SPARK-6488][MLLIB][PYTHON] Support addition/multiplication in PySpark's BlockMatrix (Mike Dusenberry, 2015-10-27, 1 file, -0/+68)
  This PR adds addition and multiplication to PySpark's `BlockMatrix` class via `add` and `multiply` functions.
  Author: Mike Dusenberry <mwdusenb@us.ibm.com>
  Closes #9139 from dusenberrymw/SPARK-6488_Add_Addition_and_Multiplication_to_PySpark_BlockMatrix.
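  Illustrative usage of the new functions, assuming an active SparkContext named sc:
  ```
  from pyspark.mllib.linalg import Matrices
  from pyspark.mllib.linalg.distributed import BlockMatrix

  # One 2x2 dense block at grid position (0, 0).
  blocks = sc.parallelize([((0, 0), Matrices.dense(2, 2, [1, 2, 3, 4]))])
  mat = BlockMatrix(blocks, 2, 2)

  summed = mat.add(mat)        # element-wise addition
  product = mat.multiply(mat)  # matrix multiplication
  ```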
* [SPARK-11270][STREAMING] Add improved equality testing for TopicAndPartition from the Kafka Streaming API (Nick Evans, 2015-10-27, 2 files, -0/+20)
  jerryshao tdas I know this is kind of minor, and I know you all are busy, but this brings this class in line with the `OffsetRange` class, and makes tests a little more concise. Instead of doing something like:
  ```
  assert topic_and_partition_instance._topic == "foo"
  assert topic_and_partition_instance._partition == 0
  ```
  you can do something like:
  ```
  assert topic_and_partition_instance == TopicAndPartition("foo", 0)
  ```
  Before:
  ```
  >>> from pyspark.streaming.kafka import TopicAndPartition
  >>> TopicAndPartition("foo", 0) == TopicAndPartition("foo", 0)
  False
  ```
  After:
  ```
  >>> from pyspark.streaming.kafka import TopicAndPartition
  >>> TopicAndPartition("foo", 0) == TopicAndPartition("foo", 0)
  True
  ```
  I couldn't find any tests - am I missing something?
  Author: Nick Evans <me@nicolasevans.org>
  Closes #9236 from manygrams/topic_and_partition_equality.
* [SPARK-10271][PYSPARK][MLLIB] Added @since tags to pyspark.mllib.clustering (noelsmith, 2015-10-26, 1 file, -1/+68)
  Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings). Added since to methods + "versionadded::" to classes (derived from the git file history in pyspark).
  Author: noelsmith <mail@noelsmith.com>
  Closes #8627 from noel-smith/SPARK-10271-since-mllib-clustering.
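  A hedged sketch of what such a decorator can look like (the real pyspark helper also normalizes docstring indentation); the tweak for functions without docstrings is the `or ""` guard:
  ```
  def since(version):
      """Sketch: append a Sphinx ``versionadded`` note to the docstring,
      tolerating functions that have no docstring at all."""
      def deco(f):
          f.__doc__ = (f.__doc__ or "").rstrip() + \
              "\n\n.. versionadded:: %s" % version
          return f
      return deco
  ```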
* [SPARK-11279][PYSPARK] Add DataFrame#toDF in PySpark (Jeff Zhang, 2015-10-26, 1 file, -0/+12)
  Author: Jeff Zhang <zjffdu@apache.org>
  Closes #9248 from zjffdu/SPARK-11279.
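  Illustrative usage: `toDF` returns a new DataFrame with the columns renamed (sqlContext assumed):
  ```
  df = sqlContext.createDataFrame([(1, "a"), (2, "b")])
  renamed = df.toDF("id", "label")  # new DataFrame with columns id, label
  ```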
* [SPARK-10277] [MLLIB] [PYSPARK] Add @since annotation to pyspark.mllib.regression (Yu ISHIKAWA, 2015-10-23, 1 file, -1/+101)
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
  Closes #8684 from yu-iskw/SPARK-10277.
* [SPARK-7021] Add JUnit output for Python unit tests (Gábor Lipták, 2015-10-22, 5 files, -9/+48)
  WIP
  Author: Gábor Lipták <gliptak@gmail.com>
  Closes #8323 from gliptak/SPARK-7021.
* [SPARK-11205][PYSPARK] Delegate to Scala DataFrame API rather than print in Python (Jeff Zhang, 2015-10-20, 1 file, -1/+2)
  No test needed. Verified manually in the pyspark shell.
  Author: Jeff Zhang <zjffdu@apache.org>
  Closes #9177 from zjffdu/SPARK-11205.
* [MINOR][ML] Fix doc warnings (Xiangrui Meng, 2015-10-20, 1 file, -0/+1)
  Without an empty line, Sphinx treats the doctest as part of the docstring and reports undefined substitution references. holdenk
  ~~~
  /Users/meng/src/spark/python/pyspark/ml/feature.py:docstring of pyspark.ml.feature.CountVectorizer:3: ERROR: Undefined substitution referenced: "label|raw |vectors | +-----+---------------+-------------------------+ |0 |[a, b, c] |(3,[0,1,2],[1.0,1.0,1.0])".
  /Users/meng/src/spark/python/pyspark/ml/feature.py:docstring of pyspark.ml.feature.CountVectorizer:3: ERROR: Undefined substitution referenced: "1 |[a, b, b, c, a]|(3,[0,1,2],[2.0,2.0,1.0])".
  ~~~
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9188 from mengxr/py-count-vec-doc-fix.
* [SPARK-10767][PYSPARK] Make pyspark shared params codegen more consistent (Holden Karau, 2015-10-20, 3 files, -65/+65)
  Namely, "." shows up in some places in the template when using the param docstring and not in others.
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #9017 from holdenk/SPARK-10767-Make-pyspark-shared-params-codegen-more-consistent.
* [SPARK-10269][PYSPARK][MLLIB] Add @since annotation to pyspark.mllib.classification (noelsmith, 2015-10-20, 1 file, -4/+66)
  Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings). Added since to methods + "versionadded::" to classes derived from the file history.
  Note - some methods are inherited from the regression module (i.e. LinearModel.intercept), so these won't have version numbers in the API docs until that module is updated.
  Author: noelsmith <mail@noelsmith.com>
  Closes #8626 from noel-smith/SPARK-10269-since-mlib-classification.
* [SPARK-10272][PYSPARK][MLLIB] Added @since tags to pyspark.mllib.evaluation (noelsmith, 2015-10-20, 1 file, -0/+41)
  Duplicated the since decorator from pyspark.sql into pyspark (also tweaked to handle functions without docstrings). Added since to public methods + "versionadded::" to classes (derived from the git file history in pyspark).
  Note - I also added the tags to MultilabelMetrics even though it isn't declared as public in the __all__ statement... if that's incorrect, I'll remove them.
  Author: noelsmith <mail@noelsmith.com>
  Closes #8628 from noel-smith/SPARK-10272-since-mllib-evalutation.
* [SPARK-10447][SPARK-3842][PYSPARK] Upgrade pyspark to Py4J 0.9 (Holden Karau, 2015-10-20, 9 files, -57/+25)
  Upgrade to Py4J 0.9.
  Author: Holden Karau <holden@pigscanfly.ca>
  Author: Holden Karau <holden@us.ibm.com>
  Closes #8615 from holdenk/SPARK-10447-upgrade-pyspark-to-py4j0.9.
* [SPARK-11114][PYSPARK] Add getOrCreate for SparkContext/SQLContext in Python (Davies Liu, 2015-10-19, 4 files, -2/+59)
  Also added SQLContext.newSession().
  Author: Davies Liu <davies@databricks.com>
  Closes #9122 from davies/py_create.
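  Illustrative usage:
  ```
  from pyspark import SparkContext
  from pyspark.sql import SQLContext

  sc = SparkContext.getOrCreate()          # reuses the active context if one exists
  sqlContext = SQLContext.getOrCreate(sc)  # likewise for the SQLContext
  session2 = sqlContext.newSession()       # separate session, shared SparkContext
  ```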
* [SPARK-7018][BUILD] Refactor dev/run-tests-jenkins into Python (Brennon York, 2015-10-18, 1 file, -18/+1)
  This commit refactors the `run-tests-jenkins` script into Python. The refactoring was done by brennonyork in #7401; this PR contains a few minor edits from joshrosen to bring it up to date with other recent changes.
  From the original PR description (by brennonyork): a few things are left out that could, and I think should, become smaller JIRAs after this:
  1. There are still a few areas where we use environment variables where we don't need to (like `CURRENT_BLOCK`). I might get around to fixing this one in lieu of everything else, but wanted to point it out.
  2. The PR tests are still written in bash. I opted not to change those and just rewrite the runner in Python. This is a great follow-on JIRA IMO.
  3. All of the linting scripts are still in bash as well; those would likely do well as follow-on JIRAs too.
  Closes #7401.
  Author: Brennon York <brennon.york@capitalone.com>
  Closes #9161 from JoshRosen/run-tests-jenkins-refactoring.
* [SPARK-11158][SQL] Modified _verify_type() to be more informative on errors by presenting the object (Mahmoud Lababidi, 2015-10-18, 1 file, -3/+3)
  The _verify_type() function raised errors on type-conversion issues but left out the object in question. The object is now included in the error, so the user no longer has to debug through the code to figure out which object failed the type conversion. The use case for me was a Pandas DataFrame that contained 'nan' as values for columns of strings.
  Author: Mahmoud Lababidi <mahmoud@thehumangeo.com>
  Author: Mahmoud Lababidi <lababidi@gmail.com>
  Closes #9149 from lababidi/master.
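  A hedged sketch of the improved message (not the exact pyspark.sql.types code; the function name is illustrative):
  ```
  def _verify_type_sketch(obj, expected):
      # Name the failing value in the error so the user can see which
      # object broke the type conversion.
      if not isinstance(obj, expected):
          raise TypeError("%s can not accept object %r in type %s"
                          % (expected, obj, type(obj)))
  ```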
* [SPARK-10185] [SQL] Support SQL comma-separated paths (Koert Kuipers, 2015-10-17, 2 files, -1/+15)
  Make sure comma-separated paths get processed correctly in ResolvedDataSource for a HadoopFsRelationProvider.
  Author: Koert Kuipers <koert@tresata.com>
  Closes #8416 from koertkuipers/feat-sql-comma-separated-paths.
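  A hypothetical example of the behavior described (the paths are made up): a single path string containing commas now resolves to all of the listed locations.
  ```
  # Both files end up in one DataFrame.
  df = sqlContext.read.text("/data/part-2015-09.txt,/data/part-2015-10.txt")
  ```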
* [SPARK-11084] [ML] [PYTHON] Check if index can contain non-zero value before binary search (zero323, 2015-10-16, 2 files, -2/+12)
  At the moment `SparseVector.__getitem__` executes `np.searchsorted` first and checks if the result is in the expected range afterwards. It is possible to check whether the index can contain a non-zero value before executing `np.searchsorted`.
  Author: zero323 <matthew.szymkiewicz@gmail.com>
  Closes #9098 from zero323/sparse_vector_getitem_improved.
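  A hedged sketch of the short-circuit (standalone rather than the actual `SparseVector` method):
  ```
  import numpy as np

  def sparse_getitem_sketch(inds, vals, index):
      # Indices outside [inds[0], inds[-1]] cannot hold a non-zero value,
      # so return 0.0 without running the binary search at all.
      if inds.size == 0 or index < inds[0] or index > inds[-1]:
          return 0.0
      insert_index = np.searchsorted(inds, index)
      if inds[insert_index] == index:
          return vals[insert_index]
      return 0.0

  print(sparse_getitem_sketch(np.array([1, 3]), np.array([2.0, 4.0]), 0))  # 0.0
  ```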
* [SPARK-11050] [MLLIB] PySpark SparseVector can return wrong index in error message (Bhargav Mangipudi, 2015-10-16, 1 file, -2/+3)
  For negative indices in the SparseVector, we update the index value. If we have an incorrect index at this point, the error message contains the incorrect *updated* index instead of the original one. This change contains the fix for the same.
  Author: Bhargav Mangipudi <bhargav.mangipudi@gmail.com>
  Closes #9069 from bhargav/spark-10759.
* [PYTHON] [MINOR] List modules in PySpark tests when given bad name (Joseph K. Bradley, 2015-10-13, 1 file, -1/+2)
  Output the list of supported modules in the error message when the Python test runner is given a bad module name.
  CC: davies
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #9088 from jkbradley/python-tests-modules.
* [SPARK-8170] [PYTHON] Add signal handler to trap Ctrl-C in pyspark and cancel all running jobs (Ashwin Shankar, 2015-10-12, 1 file, -0/+7)
  This patch adds a signal handler to trap Ctrl-C and cancel running jobs.
  Author: Ashwin Shankar <ashankar@netflix.com>
  Closes #9033 from ashwinshankar77/master.
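  A hedged sketch of the mechanism (the shipped handler lives in the pyspark shell startup code; sc is an active SparkContext):
  ```
  import signal

  def sigint_handler(signum, frame):
      sc.cancelAllJobs()       # stop the jobs running on the cluster
      raise KeyboardInterrupt  # keep the usual Ctrl-C behavior in the shell

  signal.signal(signal.SIGINT, sigint_handler)
  ```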
* [SPARK-10535] Sync up API for matrix factorization model between Scala and PySpark (Vladimir Vladimirov, 2015-10-09, 1 file, -4/+28)
  Support for recommendUsersForProducts and recommendProductsForUsers in the matrix factorization model for PySpark.
  Author: Vladimir Vladimirov <vladimir.vladimirov@magnetic.com>
  Closes #8700 from smartkiwi/SPARK-10535_.
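  Illustrative usage of the two new methods, assuming an active SparkContext named sc:
  ```
  from pyspark.mllib.recommendation import ALS, Rating

  ratings = sc.parallelize([Rating(1, 1, 5.0), Rating(1, 2, 1.0),
                            Rating(2, 1, 4.0), Rating(2, 2, 2.0)])
  model = ALS.train(ratings, rank=10, iterations=5)

  top_users = model.recommendUsersForProducts(2)     # top 2 users per product
  top_products = model.recommendProductsForUsers(2)  # top 2 products per user
  ```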
* [SPARK-10959] [PYSPARK] StreamingLogisticRegressionWithSGD does not train with given regParam and convergenceTol parameters (Bryan Cutler, 2015-10-08, 2 files, -2/+3)
  These params were being passed into the StreamingLogisticRegressionWithSGD constructor but not transferred to the call for model training. The same applied to StreamingLinearRegressionWithSGD. I added the params as named arguments to the call and also fixed the intercept parameter, which was being passed as the regularization value.
  Author: Bryan Cutler <bjcutler@us.ibm.com>
  Closes #9002 from BryanCutler/StreamingSGD-convergenceTol-bug-10959.
* [SPARK-10973] [ML] [PYTHON] __getitem__ method throws IndexError exception when accessing an index after the last non-zero entry (zero323, 2015-10-08, 2 files, -5/+10)
  The __getitem__ method throws an IndexError exception when we try to access an index after the last non-zero entry:
  ```
  from pyspark.mllib.linalg import Vectors
  sv = Vectors.sparse(5, {1: 3})
  sv[0]
  ## 0.0
  sv[1]
  ## 3.0
  sv[2]
  ## Traceback (most recent call last):
  ##   File "<stdin>", line 1, in <module>
  ##   File "/python/pyspark/mllib/linalg/__init__.py", line 734, in __getitem__
  ##     row_ind = inds[insert_index]
  ## IndexError: index out of bounds
  ```
  Author: zero323 <matthew.szymkiewicz@gmail.com>
  Closes #9009 from zero323/sparse_vector_index_error.
* [SPARK-9774] [ML] [PYSPARK] Add Python API for ml regression IsotonicRegression (Holden Karau, 2015-10-07, 3 files, -1/+149)
  Add the Python API for IsotonicRegression.
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #8214 from holdenk/SPARK-9774-add-python-api-for-ml-regression-isotonicregression.
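  Illustrative usage (training_df is an assumed DataFrame with the default "features" and "label" columns):
  ```
  from pyspark.ml.regression import IsotonicRegression

  ir = IsotonicRegression()            # isotonic=True by default
  model = ir.fit(training_df)          # training_df assumed to exist
  predictions = model.transform(training_df)
  ```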
* [SPARK-10779] [PYSPARK] [MLLIB] Set initialModel for KMeans model in PySpark (spark.mllib) (Evan Chen, 2015-10-07, 1 file, -2/+15)
  Provide an initialModel param for pyspark.mllib.clustering.KMeans.
  Author: Evan Chen <chene@us.ibm.com>
  Closes #8967 from evanyc15/SPARK-10779-pyspark-mllib.
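  Illustrative usage: seed training with known centers instead of random or k-means|| initialization (sc assumed):
  ```
  from pyspark.mllib.clustering import KMeans, KMeansModel

  data = sc.parallelize([[0.0], [1.0], [9.0], [10.0]])
  initial = KMeansModel([[1.0], [9.0]])  # start from these centers
  model = KMeans.train(data, k=2, initialModel=initial)
  ```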
* [SPARK-10957] [ML] setParams changes quantileProbabilities unexpectedly in PySpark's AFTSurvivalRegression (Xiangrui Meng, 2015-10-06, 1 file, -5/+1)
  If the user doesn't specify `quantileProbs` in `setParams`, it gets reset to the default value. We don't need special handling here. vectorijk yanboliang
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #9001 from mengxr/SPARK-10957.
* [SPARK-10688] [ML] [PYSPARK] Python API for AFTSurvivalRegression (vectorijk, 2015-10-06, 1 file, -2/+169)
  Implement the Python API for AFTSurvivalRegression.
  Author: vectorijk <jiangkai@gmail.com>
  Closes #8926 from vectorijk/spark-10688.
* [SPARK-10782] [PYTHON] Update dropDuplicates documentation (asokadiggs, 2015-09-29, 1 file, -0/+2)
  The documentation for dropDuplicates() and drop_duplicates() is one and the same. Resolved the error in the example for drop_duplicates using the same approach used for groupby and groupBy: indicating that dropDuplicates and drop_duplicates are aliases.
  Author: asokadiggs <asoka.diggs@intel.com>
  Closes #8930 from asokadiggs/jira-10782.
* [SPARK-6919] [PYSPARK] Add asDict method to StatCounter (Erik Shilts, 2015-09-29, 2 files, -0/+42)
  Add a method to easily convert a StatCounter instance into a Python dict.
  https://issues.apache.org/jira/browse/SPARK-6919
  Note: This is my original work and the existing Spark license applies.
  Author: Erik Shilts <erik.shilts@opower.com>
  Closes #5516 from eshilts/statcounter-asdict.
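  Illustrative usage (sc assumed; the printed keys are indicative, not exhaustive):
  ```
  stats = sc.parallelize([1.0, 2.0, 3.0, 4.0]).stats()
  print(stats.asDict())
  # e.g. {'count': 4, 'mean': 2.5, 'sum': 10.0, 'min': 1.0, 'max': 4.0, ...}
  ```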
* [SPARK-10415] [PYSPARK] [MLLIB] [DOCS] Enhance Navigation Sidebar in PySpark API (noelsmith, 2015-09-29, 4 files, -2/+197)
  These are CSS/JavaScript changes to make navigation in the PySpark API docs a bit simpler by adding the following to the sidebar:
  * Classes
  * Functions
  * Tags to highlight experimental features
  Online example here: https://dl.dropboxusercontent.com/u/20821334/pyspark-api-nav-enhance/pyspark.mllib.html
  (The contribution is my original work and I license the work to the project under the project's open source license.)
  Author: noelsmith <mail@noelsmith.com>
  Closes #8571 from noel-smith/pyspark-api-nav-enhance.
* [SPARK-9681] [ML] Support R feature interactions in RFormula (Eric Liang, 2015-09-25, 1 file, -1/+1)
  This integrates the Interaction feature transformer with SparkR R formula support (i.e. supports `:`). To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side benefit of cleaning up the double underscores in the attributes generated for non-interaction terms. mengxr
  Author: Eric Liang <ekl@databricks.com>
  Closes #8830 from ericl/interaction-2.
* [SPARK-10731] [SQL] Delegate to Scala's DataFrame.take implementation in Python DataFrame (Reynold Xin, 2015-09-23, 1 file, -1/+4)
  Python DataFrame.head/take now requires scanning all the partitions. This pull request changes them to delegate the actual implementation to Scala's DataFrame (by calling DataFrame.take). This is more of a hack for fixing this issue in 1.5.1. A more proper fix is to change executeCollect and executeTake to return InternalRow rather than Row, and thus eliminate the extra round-trip conversion.
  Author: Reynold Xin <rxin@databricks.com>
  Closes #8876 from rxin/SPARK-10731.
* [SPARK-10446][SQL] Support specifying the join type when calling join with usingColumns (Liang-Chi Hsieh, 2015-09-21, 1 file, -1/+5)
  JIRA: https://issues.apache.org/jira/browse/SPARK-10446
  Currently the method `join(right: DataFrame, usingColumns: Seq[String])` only supports inner join. It is more convenient to have it support other join types.
  Author: Liang-Chi Hsieh <viirya@appier.com>
  Closes #8600 from viirya/usingcolumns_df.
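  On the Python side this means a join on named columns can carry an explicit join type (df1 and df2 are assumed DataFrames sharing a customer_id column):
  ```
  joined = df1.join(df2, ["customer_id"], "left_outer")
  ```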
* [SPARK-10577] [PYSPARK] DataFrame hint for broadcast join (Jian Feng, 2015-09-21, 2 files, -0/+27)
  https://issues.apache.org/jira/browse/SPARK-10577
  Author: Jian Feng <jzhang.chs@gmail.com>
  Closes #8801 from Jianfeng-chs/master.
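  Illustrative usage: mark the smaller DataFrame so the planner ships it to every executor instead of shuffling both sides (large_df and small_df assumed):
  ```
  from pyspark.sql.functions import broadcast

  joined = large_df.join(broadcast(small_df), "key")
  ```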
* [SPARK-10716] [BUILD] spark-1.5.0-bin-hadoop2.6.tgz file doesn't uncompress on OS X due to hidden file (Sean Owen, 2015-09-21, 1 file, -0/+0)
  Remove the ._SUCCESS.crc hidden file, which may cause problems in the distribution tar archive and is not used.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #8846 from srowen/SPARK-10716.
* [SPARK-9821] [PYSPARK] reduceByKey should take a custom partitioner (Holden Karau, 2015-09-21, 1 file, -13/+16)
  From the issue: In Scala, I can supply a custom partitioner to reduceByKey (and other aggregation/repartitioning methods like aggregateByKey and combineByKey), but as far as I can tell from the PySpark API, there's no way to do the same in Python. Here's an example of my code in Scala:
  ```
  weblogs.map(s => (getFileType(s), 1)).reduceByKey(new FileTypePartitioner(), _+_)
  ```
  But I can't figure out how to do the same in Python. The closest I can get is to call repartition before reduceByKey, like so:
  ```
  weblogs.map(lambda s: (getFileType(s), 1)).partitionBy(3, hash_filetype).reduceByKey(lambda v1, v2: v1 + v2).collect()
  ```
  But that defeats the purpose, because I'm shuffling twice instead of once, so my performance is worse instead of better.
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #8569 from holdenk/SPARK-9821-pyspark-reduceByKey-should-take-a-custom-partitioner.
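  After this change the partition function goes straight into reduceByKey, so the data is shuffled once (weblogs, getFileType, and hash_filetype are hypothetical stand-ins from the issue):
  ```
  counts = (weblogs.map(lambda s: (getFileType(s), 1))
                   .reduceByKey(lambda v1, v2: v1 + v2, 3, hash_filetype))
  ```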
* [DOC] [PYSPARK] [MLLIB] Added newlines to docstrings to fix parameter formatting (noelsmith, 2015-09-21, 8 files, -1/+14)
  Added newlines before `:param ...:` and `:return:` markup. Without these, parameter lists aren't formatted correctly in the API docs.
  Author: noelsmith <mail@noelsmith.com>
  Closes #8851 from noel-smith/docstring-missing-newline-fix.
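  An illustrative docstring (hypothetical method) showing the blank line Sphinx needs before the `:param:` block:
  ```
  def train(self, data, iterations=100):
      """Train the model on an RDD of LabeledPoint.

      :param data: training data, as an RDD of LabeledPoint.
      :param iterations: number of passes over the data (default 100).
      :return: the fitted model.
      """
  ```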
* [SPARK-9769] [ML] [PY] Add Python API for CountVectorizerModel (Holden Karau, 2015-09-21, 1 file, -6/+142)
  From JIRA: Add Python API, user guide and example for ml.feature.CountVectorizerModel.
  Author: Holden Karau <holden@pigscanfly.ca>
  Closes #8561 from holdenk/SPARK-9769-add-python-api-for-countvectorizermodel.
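  Illustrative usage (sqlContext assumed; the column names match the CountVectorizer doctest referenced in the doc-warning fix above):
  ```
  from pyspark.ml.feature import CountVectorizer

  df = sqlContext.createDataFrame(
      [(["a", "b", "c"],), (["a", "b", "b", "c", "a"],)], ["raw"])
  cv = CountVectorizer(inputCol="raw", outputCol="vectors")
  model = cv.fit(df)
  model.transform(df).show(truncate=False)
  ```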
* [SPARK-10631] [DOCUMENTATION, MLLIB, PYSPARK] Added documentation for a few APIs (vinodkc, 2015-09-20, 1 file, -5/+17)
  There are some missing API docs in pyspark.mllib.linalg.Vector (including DenseVector and SparseVector). We should add them based on their Scala counterparts.
  Author: vinodkc <vinod.kc.in@gmail.com>
  Closes #8834 from vinodkc/fix_SPARK-10631.
* [SPARK-10710] Remove ability to disable spilling in core and SQL (Josh Rosen, 2015-09-19, 3 files, -60/+8)
  It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
* [SPARK-10615] [PYSPARK] Change assertEquals to assertEqual (Yanbo Liang, 2015-09-18, 4 files, -99/+99)
  As ```assertEquals``` is deprecated, we need to change ```assertEquals``` to ```assertEqual``` in the existing Python unit tests.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #8814 from yanboliang/spark-10615.
* [SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys (Liang-Chi Hsieh, 2015-09-17, 1 file, -1/+4)
  JIRA: https://issues.apache.org/jira/browse/SPARK-10642
  When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` will return a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.
  Author: Liang-Chi Hsieh <viirya@appier.com>
  Closes #8796 from viirya/fix-pyrdd-lookup.
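  Illustrative reproduction of the fixed case (sc assumed):
  ```
  rdd = sc.parallelize([(('a', 1), u'one'), (('b', 2), u'two')])
  rdd.lookup(('a', 1))  # previously failed with the ClassCastException above
  ```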
* [SPARK-10282] [ML] [PYSPARK] [DOCS] Add @since annotation to pyspark.ml.recommendation (Yu ISHIKAWA, 2015-09-17, 1 file, -0/+28)
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
  Closes #8692 from yu-iskw/SPARK-10282.
* [SPARK-10274] [MLLIB] Add @since annotation to pyspark.mllib.fpm (Yu ISHIKAWA, 2015-09-17, 1 file, -1/+9)
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
  Closes #8665 from yu-iskw/SPARK-10274.
* [SPARK-10279] [MLLIB] [PYSPARK] [DOCS] Add @since annotation to pyspark.mllib.util (Yu ISHIKAWA, 2015-09-17, 1 file, -2/+26)
  Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
  Closes #8689 from yu-iskw/SPARK-10279.