...
* Updated DataFrame.saveAsTable Hive warning to include SPARK-7550 ticket. (Reynold Xin, 2015-05-11; 1 file, -6/+6)
So users that are interested in this can track it easily.
Author: Reynold Xin <rxin@databricks.com>
Closes #6067 from rxin/SPARK-7550 and squashes the following commits: ee0e34c [Reynold Xin] Updated DataFrame.saveAsTable Hive warning to include SPARK-7550 ticket.
(cherry picked from commit 87229c95c6b597f5b84e36d518b9830e3ba63424) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames. (Reynold Xin, 2015-05-11; 3 files, -3/+73)
Author: Reynold Xin <rxin@databricks.com>
Closes #6062 from rxin/agg-retain-doc and squashes the following commits: 43e511e [Reynold Xin] [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.
(cherry picked from commit 3a9b6997df3fef1052d8c410f32319018c52acff) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-7084] improve saveAsTable documentation (madhukar, 2015-05-11; 1 file, -0/+18)
Author: madhukar <phatak.dev@gmail.com>
Closes #5654 from phatak-dev/master and squashes the following commits: 386f407 [madhukar] #5654 updated for all the methods; 2c997c5 [madhukar] Merge branch 'master' of https://github.com/apache/spark; 00bc819 [madhukar] Merge branch 'master' of https://github.com/apache/spark; 2a802c6 [madhukar] #5654 updated the doc according to comments; 866e8df [madhukar] [SPARK-7084] improve saveAsTable documentation
(cherry picked from commit 57255dcd794222f4db5df1e549ebc7b896cebfdc) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SQL] Show better error messages for incorrect join types in DataFrames. (Reynold Xin, 2015-05-11; 1 file, -0/+10)
As a follow-up to https://github.com/apache/spark/pull/5944
Author: Reynold Xin <rxin@databricks.com>
Closes #6064 from rxin/jointype-better-error and squashes the following commits: 7629bf7 [Reynold Xin] [SQL] Show better error messages for incorrect join types in DataFrames.
(cherry picked from commit 4f4dbb030c208caba18f314a1ef1751696627d26) Signed-off-by: Reynold Xin <rxin@databricks.com>
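For readers of this log: the join type this error message guards is a plain string argument on `DataFrame.join`. A minimal sketch of the API in question (the data and column names are illustrative):
```scala
import org.apache.spark.sql.SQLContext

// Assumes a running SparkContext `sc`.
val sqlContext = new SQLContext(sc)
val df1 = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "v")
val df2 = sqlContext.createDataFrame(Seq((1, "x"))).toDF("id", "w")

// "leftsemi" keeps only the rows of df1 that have a match in df2.
// A mistyped join type now fails with a message listing the valid types
// instead of an opaque match error.
val joined = df1.join(df2, df1("id") === df2("id"), "leftsemi")
joined.show()
```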
* Update Documentation: leftsemi instead of semijoin (LCY Vincent, 2015-05-11; 1 file, -1/+1)
This should sync up with: https://github.com/apache/spark/blob/119f45d61d7b48d376cca05e1b4f0c7fcf65bfa8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala#L26
Author: LCY Vincent <lauchunyin@gmail.com>
Closes #5944 from vincentlaucy/master and squashes the following commits: fc0e454 [LCY Vincent] Update DataFrame.scala
(cherry picked from commit a8ea09683acc071cd81b244e8d2b7d9638b1aced) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [STREAMING] [MINOR] Close files correctly when iterator is finished in streaming WAL recovery (jerryshao, 2015-05-11; 1 file, -2/+3)
Currently there is no chance to close the file correctly after the iteration is finished; change to `CompletionIterator` to avoid a resource leak.
Author: jerryshao <saisai.shao@intel.com>
Closes #6050 from jerryshao/close-file-correctly and squashes the following commits: 52dfaf5 [jerryshao] Close files correctly when iterator is finished
(cherry picked from commit 25c01c54840a9ab768f8b917de7edc2bc2d61b9e) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
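`CompletionIterator` is a private Spark utility, so the sketch below only illustrates the pattern the fix relies on: run a cleanup action exactly when the wrapped iterator is exhausted (all names here are illustrative):
```scala
import java.io.BufferedReader

// Wrap an iterator so the reader is closed as soon as iteration completes,
// rather than leaking the handle when callers never touch the file again.
def closingLines(reader: BufferedReader): Iterator[String] = {
  val underlying = Iterator.continually(reader.readLine()).takeWhile(_ != null)
  new Iterator[String] {
    def hasNext: Boolean = {
      val more = underlying.hasNext
      if (!more) reader.close() // completion action; close() is idempotent here
      more
    }
    def next(): String = underlying.next()
  }
}
```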
* [SPARK-7516] [Minor] [DOC] Replace deprecated inferSchema() with createDataFrame() (gchen, 2015-05-11; 1 file, -1/+1)
JIRA: https://issues.apache.org/jira/browse/SPARK-7516
In sql-programming-guide, the deprecated Python DataFrame API inferSchema() should be replaced by createDataFrame():
```python
schemaPeople = sqlContext.inferSchema(people)      # deprecated
schemaPeople = sqlContext.createDataFrame(people)  # replacement
```
Author: gchen <chenguancheng@gmail.com>
Closes #6041 from gchen/python-docs and squashes the following commits: c27eb7c [gchen] replace inferSchema() with createDataFrame()
(cherry picked from commit 8e674331d9ce98068b44e4d483b6d35cef0648fa) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-7508] JettyUtils-generated servlets to log & report all errors (Steve Loughran, 2015-05-11; 1 file, -0/+6)
Patch for SPARK-7508. This logs a warning and then generates a response which includes the message body and stack trace as text/plain, no-cache. The status code is 500.
In practice (in some tests in SPARK-1537, to be precise), Jetty is getting in between this servlet and the web response the user sees: the body of the response is lost for any error response (500, even 404 and bad request). The standard Jetty handlers must be getting in the way. This patch doesn't address that; it ensures that (1) if the Jetty handlers were put to one side, the users would see the errors, and (2) at least the exceptions appear in the server-side logs. This is better than users saying "I saw a 500 error" and you not having anything in the logs to see what went wrong.
Author: Steve Loughran <stevel@hortonworks.com>
Closes #6033 from steveloughran/stevel/feature/SPARK-7508-JettyUtils and squashes the following commits: 584836f [Steve Loughran] SPARK-7508 drop trailing semicolon; ad6f185 [Steve Loughran] SPARK-7508: jetty handles exception reporting itself; spark just sets this up and logs exceptions before being relayed; 258d9f9 [Steve Loughran] SPARK-7508 fix typo manually-edited before patch pushed; 69c8263 [Steve Loughran] SPARK-7508 JettyUtils-generated servlets to log & report all errors
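A hedged sketch of the behavior described, written against the plain servlet API; the class and wiring below are illustrative, not the actual JettyUtils code:
```scala
import java.io.{PrintWriter, StringWriter}
import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

// Illustrative servlet: log the failure server-side, then report it to the
// client as text/plain with no-cache headers and status 500.
class ErrorReportingServlet(doWork: HttpServletRequest => String) extends HttpServlet {
  override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
    try {
      resp.setStatus(HttpServletResponse.SC_OK)
      resp.getWriter.print(doWork(req))
    } catch {
      case e: Exception =>
        // 1. make sure the exception lands in the server-side logs
        System.err.println(s"Error handling ${req.getRequestURI}: $e")
        // 2. surface the message and stack trace to the client
        val trace = new StringWriter()
        e.printStackTrace(new PrintWriter(trace))
        resp.setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR)
        resp.setContentType("text/plain;charset=utf-8")
        resp.setHeader("Cache-Control", "no-cache, no-store")
        resp.getWriter.print(trace.toString)
    }
  }
}
```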
* [SPARK-7462] By default retain group by columns in aggregate (Reynold Xin, 2015-05-11; 10 files, -172/+218)
Updated Java, Scala, Python, and R.
Author: Reynold Xin <rxin@databricks.com>
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #5996 from rxin/groupby-retain and squashes the following commits: aac7119 [Reynold Xin] Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-retain; f6858f6 [Reynold Xin] Merge branch 'master' into groupby-retain; 5f923c0 [Reynold Xin] Merge pull request #15 from shivaram/sparkr-groupby-retrain; c1de670 [Shivaram Venkataraman] Revert workaround in SparkR to retain grouped cols, based on reverting code added in commit https://github.com/amplab-extras/spark/commit/9a6be746efc9fafad88122fa2267862ef87aa0e1; b8b87e1 [Reynold Xin] Fixed DataFrameJoinSuite.; d910141 [Reynold Xin] Updated rest of the files; 1e6e666 [Reynold Xin] [SPARK-7462] By default retain group by columns in aggregate
(cherry picked from commit 0a4844f90a712e796c9404b422cea76d21a5d2e3) Signed-off-by: Reynold Xin <rxin@databricks.com>
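What the new default looks like from user code, plus the escape hatch back to the old semantics; `spark.sql.retainGroupColumns` is the conf key associated with this change, but treat the snippet as a sketch (assumes an existing `sqlContext` and a DataFrame `df`):
```scala
import org.apache.spark.sql.functions.max

// With this change, grouping columns appear in the aggregate output:
// (department, MAX(age)) rather than MAX(age) alone.
val agg = df.groupBy("department").agg(max("age"))

// To get the old behavior (grouping columns dropped):
sqlContext.setConf("spark.sql.retainGroupColumns", "false")
```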
* [SPARK-7361] [STREAMING] Throw unambiguous exception when attempting to start multiple StreamingContexts in the same JVM (Tathagata Das, 2015-05-11; 2 files, -8/+58)
Currently, attempting to start a StreamingContext while another one is already started throws a confusing exception saying that the actor name JobScheduler is already registered. Since this is not supported, it is better to throw a proper, unambiguous exception.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #5907 from tdas/SPARK-7361 and squashes the following commits: fb81c4a [Tathagata Das] Fix typo; a9cd5bb [Tathagata Das] Added startSite to StreamingContext; 5fdfc0d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7361; 5870e2b [Tathagata Das] Added check for multiple streaming contexts
(cherry picked from commit 1b46556999ca126cb593ef052d24afcb75383223) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
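An illustrative sketch of the kind of check added; the names below are assumptions, not the actual StreamingContext internals:
```scala
// Track the single active context per JVM and fail fast on a second start.
object ActiveContextGuard {
  private var active: Option[String] = None // creation site of the running context

  def onStart(creationSite: String): Unit = synchronized {
    active match {
      case Some(site) =>
        throw new IllegalStateException(
          "Only one StreamingContext may be started in this JVM. " +
          s"The currently running StreamingContext was started at $site")
      case None => active = Some(creationSite)
    }
  }

  def onStop(): Unit = synchronized { active = None }
}
```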
* [SPARK-7522] [EXAMPLES] Removed angle brackets from dataFormat option (Bryan Cutler, 2015-05-11; 5 files, -5/+5)
As is, to specify this option on the command line, you have to escape the angle brackets.
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #6049 from BryanCutler/dataFormat-option-7522 and squashes the following commits: b34afb4 [Bryan Cutler] [SPARK-7522] Removed angle brackets from dataFormat option
(cherry picked from commit 4f8a15519267ac205424270155254382cc2d3690) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-6092] [MLLIB] Add RankingMetrics in PySpark/MLlib (Yanbo Liang, 2015-05-11; 2 files, -2/+86)
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #6044 from yanboliang/spark-6092 and squashes the following commits: 726a9b1 [Yanbo Liang] add newRankingMetrics; 33f649c [Yanbo Liang] Add RankingMetrics in PySpark/MLlib
(cherry picked from commit 042dda3c5c25b5ecb6ae4fd37c85b211b01c187b) Signed-off-by: Xiangrui Meng <meng@databricks.com>
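The Python class added here mirrors the existing Scala `RankingMetrics`; a small Scala usage sketch for orientation (assumes a SparkContext `sc`):
```scala
import org.apache.spark.mllib.evaluation.RankingMetrics

// Each pair is (ranked predictions, ground-truth relevant items).
val predictionAndLabels = sc.parallelize(Seq(
  (Array(1, 2, 3, 4), Array(1, 2)),
  (Array(5, 6, 7), Array(7, 8))
))
val metrics = new RankingMetrics(predictionAndLabels)
println(metrics.precisionAt(2))       // precision at cut-off 2
println(metrics.meanAveragePrecision) // MAP over all queries
```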
* [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time (Wesley Miao, 2015-05-11; 3 files, -8/+22)
tdas https://issues.apache.org/jira/browse/SPARK-7326
The problem most likely resides in the DStream.slice() implementation, as shown below.
```scala
def slice(fromTime: Time, toTime: Time): Seq[RDD[T]] = {
  if (!isInitialized) {
    throw new SparkException(this + " has not been initialized")
  }
  if (!(fromTime - zeroTime).isMultipleOf(slideDuration)) {
    logWarning("fromTime (" + fromTime + ") is not a multiple of slideDuration ("
      + slideDuration + ")")
  }
  if (!(toTime - zeroTime).isMultipleOf(slideDuration)) {
    logWarning("toTime (" + fromTime + ") is not a multiple of slideDuration ("
      + slideDuration + ")")
  }
  val alignedToTime = toTime.floor(slideDuration, zeroTime)
  val alignedFromTime = fromTime.floor(slideDuration, zeroTime)
  logInfo("Slicing from " + fromTime + " to " + toTime +
    " (aligned to " + alignedFromTime + " and " + alignedToTime + ")")
  alignedFromTime.to(alignedToTime, slideDuration).flatMap(time => {
    if (time >= zeroTime) getOrCompute(time) else None
  })
}
```
Here, after performing floor() on both fromTime and toTime, the results (alignedFromTime - zeroTime) and (alignedToTime - zeroTime) may no longer be multiples of the slideDuration, making the isTimeValid() check fail for all the remaining computation. The fix is to add a new floor() function in Time.scala that respects zeroTime while performing the floor:
```scala
def floor(that: Duration, zeroTime: Time): Time = {
  val t = that.milliseconds
  new Time(((this.millis - zeroTime.milliseconds) / t) * t + zeroTime.milliseconds)
}
```
and then change DStream.slice to call this new floor function, passing in its zeroTime:
```scala
val alignedToTime = toTime.floor(slideDuration, zeroTime)
val alignedFromTime = fromTime.floor(slideDuration, zeroTime)
```
This way alignedToTime and alignedFromTime are *really* aligned with respect to zeroTime, whose value is not actually 0.
Author: Wesley Miao <wesley.miao@gmail.com>
Author: Wesley <wesley.miao@autodesk.com>
Closes #5871 from wesleymiao/spark-7326 and squashes the following commits: 82a4d8c [Wesley Miao] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream dosen't work all the time; 48b4dc0 [Wesley] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time; 6ade399 [Wesley] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time; 2611745 [Wesley Miao] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time
(cherry picked from commit d70a076892e0677acceccaba665908cdf664f1b4) Signed-off-by: Sean Owen <sowen@cloudera.com>
* [SPARK-7519] [SQL] fix minor bugs in thrift server UI (tianyi, 2015-05-11; 2 files, -6/+8)
Bugs description:
1. There are extra commas at the top of the session list.
2. The time format in the "Start at:" field is not the same as the others.
3. The total number of online sessions is wrong.
Author: tianyi <tianyi.asiainfo@gmail.com>
Closes #6048 from tianyi/SPARK-7519 and squashes the following commits: ed366b7 [tianyi] fix bug
(cherry picked from commit 2242ab31e99227a102b0918d73db67e99899fd24) Signed-off-by: Cheng Lian <lian@databricks.com>
* [SPARK-7512] [SPARKR] Fix RDD's show method to use getJRDD (Shivaram Venkataraman, 2015-05-10; 1 file, -2/+2)
Since the RDD object might be a pipelined RDD, we should use `getJRDD` to get the right handle to the Java object.
Fixes the bug reported at http://stackoverflow.com/questions/30057702/sparkr-filterrdd-and-flatmap-not-working
cc concretevitamin
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6035 from shivaram/sparkr-show-bug and squashes the following commits: d70145c [Shivaram Venkataraman] Fix RDD's show method to use getJRDD
(cherry picked from commit 0835f1edd4c9c05439df85c248faf6787d45f7b7) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
* [SPARK-7427] [PYSPARK] Make sharedParams match in Scala, Python (Glenn Weidner, 2015-05-10; 4 files, -22/+20)
Modified 2 files: python/pyspark/ml/param/_shared_params_code_gen.py and python/pyspark/ml/param/shared.py.
Generated shared.py on Linux using Python 2.6.6 on Red Hat Enterprise Linux Server 6.6: python _shared_params_code_gen.py > shared.py
Only changed maxIter, regParam, rawPredictionCol based on strings from SharedParamsCodeGen.scala. Note: a warning was displayed when committing shared.py: "warning: LF will be replaced by CRLF in python/pyspark/ml/param/shared.py".
Author: Glenn Weidner <gweidner@us.ibm.com>
Closes #6023 from gweidner/br-7427 and squashes the following commits: db72e32 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python; 825e4a9 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python; e6a865e [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python; 1eee702 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'; 1ac10e5 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'; cafd104 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'; 9bea1eb [Glenn Weidner] Merge remote-tracking branch 'upstream/master'; 4a35c20 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'; 9790cbe [Glenn Weidner] Merge remote-tracking branch 'upstream/master'; d9c30f4 [Glenn Weidner] [SPARK-7275] [SQL] [WIP] Make LogicalRelation public
(cherry picked from commit c5aca0c27be31e94ffdb01ef2eb29d3b373d7f4c) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
* [SPARK-5521] PCA wrapper for easy transform vectors (Kirill A. Korinskiy, 2015-05-10; 4 files, -2/+213)
I implemented a simple PCA wrapper for easy transformation of vectors by PCA, for example LabeledPoint or another complicated structure.
Example of usage:
```scala
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.feature.PCA

val data = sc.textFile("data/mllib/ridge-data/lpsa.data").map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}.cache()

val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

val pca = PCA.create(training.first().features.size / 2, data.map(_.features))
val training_pca = training.map(p => p.copy(features = pca.transform(p.features)))
val test_pca = test.map(p => p.copy(features = pca.transform(p.features)))

val numIterations = 100
val model = LinearRegressionWithSGD.train(training, numIterations)
val model_pca = LinearRegressionWithSGD.train(training_pca, numIterations)

val valuesAndPreds = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}
val valuesAndPreds_pca = test_pca.map { point =>
  val score = model_pca.predict(point.features)
  (score, point.label)
}

val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
val MSE_pca = valuesAndPreds_pca.map { case (v, p) => math.pow((v - p), 2) }.mean()

println("Mean Squared Error = " + MSE)
println("PCA Mean Squared Error = " + MSE_pca)
```
Author: Kirill A. Korinskiy <catap@catap.ru>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4304 from catap/pca and squashes the following commits: 501bcd9 [Joseph K. Bradley] Small updates: removed k from Java-friendly PCA fit(). In PCASuite, converted results to set for comparison. Added an error message for bad k in PCA.; 9dcc02b [Kirill A. Korinskiy] [SPARK-5521] fix scala style; 1892a06 [Kirill A. Korinskiy] [SPARK-5521] PCA wrapper for easy transform vectors
(cherry picked from commit 8c07c75c9831d6c34f69fe840edb6470d4dfdfef) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
* [SPARK-7431] [ML] [PYTHON] Made CrossValidatorModel call parent init in PySpark (Joseph K. Bradley, 2015-05-10; 3 files, -3/+4)
Fixes a bug where the PySpark cvModel did not have a UID. Also made small PySpark fixes: Evaluator should inherit from Params; MockModel should inherit from Model.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #5968 from jkbradley/pyspark-cv-uid and squashes the following commits: 57f13cd [Joseph K. Bradley] Made CrossValidatorModel call parent init in PySpark
(cherry picked from commit 3038443e58b9320c56f7785d9e36d4f85a563e6b) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
* [MINOR] [SQL] Fixes variable name typo (Cheng Lian, 2015-05-10; 8 files, -9/+9)
Author: Cheng Lian <lian@databricks.com>
Closes #6038 from liancheng/fix-typo and squashes the following commits: 572c2a4 [Cheng Lian] Fixes variable name typo
(cherry picked from commit 6bf9352fa5d740d01ffdafbbb23d9732752a8d87) Signed-off-by: Cheng Lian <lian@databricks.com>
* [SPARK-7345][SQL] Spark cannot detect renamed columns using JDBC connector (Oleg Sidorkin, 2015-05-10; 2 files, -1/+17)
The issue appears when one tries to create a DataFrame using a sqlContext.load("jdbc", ...) statement where "dbtable" contains a query with renamed columns. If the original column is used in the SQL query once, the resulting DataFrame will contain the non-renamed column. If the original column is used in the SQL query several times with different aliases, sqlContext.load will fail.
The original implementation of JDBCRDD.resolveTable uses getColumnName to detect column names in the RDD schema. The suggested implementation uses getColumnLabel to handle column renames in the SQL statement, since it is aware of the SQL "AS" clause.
Readings:
http://stackoverflow.com/questions/4271152/getcolumnlabel-vs-getcolumnname
http://stackoverflow.com/questions/12259829/jdbc-getcolumnname-getcolumnlabel-db2
The official documentation is unfortunately a bit misleading in its definition of the "suggested title" purpose, but it clearly defines the behavior of the AS keyword in a SQL statement: http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html
getColumnLabel - Gets the designated column's suggested title for use in printouts and displays. The suggested title is usually specified by the SQL AS clause. If a SQL AS is not specified, the value returned from getColumnLabel will be the same as the value returned by the getColumnName method.
Author: Oleg Sidorkin <oleg.sidorkin@gmail.com>
Closes #6032 from osidorkin/master and squashes the following commits: 10fc44b [Oleg Sidorkin] [SPARK-7345][SQL] Regression test for JDBCSuite (resolved scala style test error); 2aaf6f7 [Oleg Sidorkin] [SPARK-7345][SQL] Regression test for JDBCSuite (renamed fields in JDBC query); b7d5b22 [Oleg Sidorkin] [SPARK-7345][SQL] Regression test for JDBCSuite; 09559a0 [Oleg Sidorkin] [SPARK-7345][SQL] Spark cannot detect renamed columns using JDBC connector
(cherry picked from commit d7a37bcaf123389fb0828eefb92659c6d9cb3460) Signed-off-by: Reynold Xin <rxin@databricks.com>
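The distinction that drives the fix, shown against plain JDBC; the in-memory H2 URL and table below are placeholders, and exact name/label behavior varies by driver:
```scala
import java.sql.DriverManager

// Placeholder in-memory database; any JDBC source works.
val conn = DriverManager.getConnection("jdbc:h2:mem:demo")
conn.createStatement().execute("CREATE TABLE people(name VARCHAR)")

val rs = conn.createStatement().executeQuery("SELECT name AS n, name AS m FROM people")
val meta = rs.getMetaData
for (i <- 1 to meta.getColumnCount) {
  // getColumnName tends to report the underlying column ("name") for both,
  // while getColumnLabel reports the aliases "n" and "m" from the AS clause.
  println(s"column ${meta.getColumnName(i)} / label ${meta.getColumnLabel(i)}")
}
// Resolving the schema by label keeps the two aliases distinct, which is
// exactly what JDBCRDD.resolveTable needs for renamed columns.
```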
* [SPARK-6091] [MLLIB] Add MulticlassMetrics in PySpark/MLlib (Yanbo Liang, 2015-05-10; 2 files, -0/+137)
https://issues.apache.org/jira/browse/SPARK-6091
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #6011 from yanboliang/spark-6091 and squashes the following commits: bb3e4ba [Yanbo Liang] trigger jenkins; 53c045d [Yanbo Liang] keep compatibility for python 2.6; 972d5ac [Yanbo Liang] Add MulticlassMetrics in PySpark/MLlib
(cherry picked from commit bf7e81a51cd81706570615cd67362c86602dec88) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-7475] [MLLIB] adjust ldaExample for online LDA (Yuhao Yang, 2015-05-09; 1 file, -6/+25)
JIRA: https://issues.apache.org/jira/browse/SPARK-7475
Add a new argument to specify the algorithm applied to LDA, to exhibit the basic usage of LDAOptimizer.
cc jkbradley
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #6000 from hhbyyh/ldaExample and squashes the following commits: 0a7e2bc [Yuhao Yang] fix according to comments; 5810b0f [Yuhao Yang] adjust ldaExample for online LDA
(cherry picked from commit b13162b364aeff35e3bdeea9c9a31e5ce66f8c9a) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
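The optimizer switch the example now exposes, in MLlib terms; a sketch assuming a pre-built corpus `RDD[(Long, Vector)]` named `corpus`:
```scala
import org.apache.spark.mllib.clustering.LDA

// "em" is the default batch optimizer; "online" selects the online
// variational Bayes optimizer this example was adjusted to demonstrate.
val ldaModel = new LDA()
  .setK(10)
  .setOptimizer("online")
  .run(corpus)
```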
* [BUILD] Reference fasterxml.jackson.version in sql/core/pom.xml (tedyu, 2015-05-09; 1 file, -1/+1)
Author: tedyu <yuzhihong@gmail.com>
Closes #6031 from tedyu/master and squashes the following commits: 5c2580c [tedyu] Reference fasterxml.jackson.version in sql/core/pom.xml; ff2a44f [tedyu] Merge branch 'master' of github.com:apache/spark; 28c8394 [tedyu] Upgrade version of jackson-databind in sql/core/pom.xml
(cherry picked from commit bd74301ff87f545e5808e13dd50dea12edd3db92) Signed-off-by: Michael Armbrust <michael@databricks.com>
* Upgrade version of jackson-databind in sql/core/pom.xml (tedyu, 2015-05-09; 1 file, -1/+1)
Currently the version of jackson-databind in sql/core/pom.xml is 2.3.0, which is older than the version specified in the root pom.xml. This PR upgrades the version in sql/core/pom.xml so that they're consistent.
Author: tedyu <yuzhihong@gmail.com>
Closes #6028 from tedyu/master and squashes the following commits: 28c8394 [tedyu] Upgrade version of jackson-databind in sql/core/pom.xml
(cherry picked from commit 3071aac387ca0b80201022c9c2f245437c77a375) Signed-off-by: Michael Armbrust <michael@databricks.com>
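Taken together with the previous entry, the change amounts to replacing a hard-coded version with the shared property; a sketch of the resulting dependency block (the coordinates are the standard jackson-databind ones, shown as an illustration rather than a quote of the actual POM):
```xml
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <!-- inherit the version pinned in the root pom.xml instead of hard-coding 2.3.0 -->
  <version>${fasterxml.jackson.version}</version>
</dependency>
```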
* [STREAMING] [DOCS] Fix wrong url about API docs of StreamingListener (dobashim, 2015-05-09; 1 file, -1/+1)
A small fix for a wrong URL in the API documentation (org.apache.spark.streaming.scheduler.StreamingListener).
Author: dobashim <dobashim@oss.nttdata.co.jp>
Closes #6024 from dobashim/master and squashes the following commits: ac9a955 [dobashim] [STREAMING][DOCS] Fix wrong url about API docs of StreamingListener
(cherry picked from commit 7d0f17208cda641651dcbd1bc0da639cd74307e7) Signed-off-by: Sean Owen <sowen@cloudera.com>
* [SPARK-7403] [WEBUI] Link URL in objects on Timeline View is wrong in case of running on YARN (Kousuke Saruta, 2015-05-09; 3 files, -20/+30)
When we use Spark on YARN and open AllJobsPage via the ResourceManager's proxy, the link URL in the objects which represent each job on the timeline view is wrong.
In timeline-view.js, the link is generated as follows:
```javascript
window.location.href = "job/?id=" + getJobId(this);
```
This assumes the URL displayed in the web browser ends with "jobs/", but when we access AllJobsPage via the proxy, the displayed URL does not end with "jobs/". The proxy doesn't return status code 301 or 302, so the displayed URL still shows the base URL, not "/jobs", even though AllJobsPage is being displayed. (Screenshot: https://cloud.githubusercontent.com/assets/4736016/7501079/a8507ad6-f46c-11e4-9bed-62abea170f4c.png)
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #5947 from sarutak/fix-link-in-timeline and squashes the following commits: aaf40e1 [Kousuke Saruta] Added Copyright for vis.js; 01bee7b [Kousuke Saruta] Fixed timeline-view.js in order to get correct href
(cherry picked from commit 12b95abc7047a8f2fd25a3c8dbb9904eb305eba6) Signed-off-by: Sean Owen <sowen@cloudera.com>
* [SPARK-7438] [SPARK CORE] Fixed validation of relativeSD in countApproxDistinct (Vinod K C, 2015-05-09; 4 files, -6/+6)
Author: Vinod K C <vinod.kc@huawei.com>
Closes #5974 from vinodkc/fix_countApproxDistinct_Validation and squashes the following commits: 3a3d59c [Vinod K C] Reverted removal of validation relativeSD<0.000017; 799976e [Vinod K C] Removed testcase to assert IAE when relativeSD>3.7; 8ddbfae [Vinod K C] Remove blank line; b1b00a3 [Vinod K C] Removed relativeSD validation from python API, RDD.scala will do validation; 122d378 [Vinod K C] Fixed validation of relativeSD in countApproxDistinct
(cherry picked from commit dda6d9f4045fa2d1265abffa9d7dbdc967448417) Signed-off-by: Sean Owen <sowen@cloudera.com>
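For orientation, the public API whose argument is being validated; a sketch assuming a SparkContext `sc`:
```scala
// countApproxDistinct is HyperLogLog-based; relativeSD is the target
// relative accuracy, and smaller values cost more memory. Out-of-range
// values are what the fixed validation now rejects.
val rdd = sc.parallelize(1 to 100000)
println(rdd.countApproxDistinct(relativeSD = 0.05))
```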
* [SPARK-7498] [ML] removed varargs annotation from Params.setDefaults (Joseph K. Bradley, 2015-05-08; 2 files, -2/+2)
In SPARK-7429 and PR https://github.com/apache/spark/pull/5960, I added the varargs annotation to Params.setDefault, which takes a variable number of ParamPairs. It worked locally and on Jenkins for me. However, mengxr reported issues compiling on his machine, so I'm reverting the change introduced in https://github.com/apache/spark/pull/5960 by removing varargs.
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #6021 from jkbradley/revert-varargs and squashes the following commits: 098ed39 [Joseph K. Bradley] removed varargs annotation from Params.setDefaults taking multiple ParamPairs
(cherry picked from commit 29926238418223b0888d418d163feebf0217b35e) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-7262] [ML] Binary LogisticRegression with L1/L2 (elastic net) using OWLQN in new ML package (DB Tsai, 2015-05-08; 5 files, -40/+821)
1) Handle scaling and addBias internally.
2) L1/L2 elastic net using the OWLQN optimizer.
Author: DB Tsai <dbt@netflix.com>
Closes #5967 from dbtsai/lor and squashes the following commits: fa029bb [DB Tsai] made the bound smaller; 0806002 [DB Tsai] better initial intercept and more test; 5c31824 [DB Tsai] fix import; c387e25 [DB Tsai] Merge branch 'master' into lor; c84e931 [DB Tsai] Made MultiClassSummarizer private; f98e711 [DB Tsai] address feedback; a784321 [DB Tsai] fix style; 8ec65d2 [DB Tsai] remove new line; f3f8c88 [DB Tsai] add more tests and they match R which is good. fix a bug; 34705bc [DB Tsai] first commit
(cherry picked from commit 86ef4cfd436867d88bdc211f76d6ea668d474558) Signed-off-by: Xiangrui Meng <meng@databricks.com>
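How the new elastic-net knobs look from user code; a sketch assuming a DataFrame `training` with "label" and "features" columns:
```scala
import org.apache.spark.ml.classification.LogisticRegression

// elasticNetParam = 0.0 is pure L2 (ridge), 1.0 is pure L1 (lasso),
// and intermediate values mix the two penalties.
val lr = new LogisticRegression()
  .setMaxIter(100)
  .setRegParam(0.1)         // overall regularization strength
  .setElasticNetParam(0.5)  // L1/L2 mixing parameter
val model = lr.fit(training)
```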
* [SPARK-7375] [SQL] Avoid row copying in exchange when sort.serializeMapOutputs takes effect (Josh Rosen, 2015-05-08; 1 file, -56/+100)
This patch refactors the SQL `Exchange` operator's logic for determining whether map outputs need to be copied before being shuffled. As part of this change, we'll now avoid unnecessary copies in cases where sort-based shuffle operates on serialized map outputs (as in #4450 / SPARK-4550).
This patch also includes a change to copy the input to the RangePartitioner partition-bounds calculation, which is necessary because this calculation buffers mutable Java objects.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #5948 from JoshRosen/SPARK-7375 and squashes the following commits: f305ff3 [Josh Rosen] Reduce scope of some variables in Exchange; 899e1d7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-7375; 6a6bfce [Josh Rosen] Fix issue related to RangePartitioning; ad006a4 [Josh Rosen] [SPARK-7375] Avoid defensive copying in exchange operator when sort.serializeMapOutputs takes effect.
(cherry picked from commit cde5483884068b0ae1470b9b9b3ee54ab944ab12) Signed-off-by: Yin Huai <yhuai@databricks.com>
* [SPARK-7231] [SPARKR] Changes to make SparkR DataFrame dplyr friendly. (Shivaram Venkataraman, 2015-05-08; 8 files, -29/+249)
Changes include:
1. Rename sortDF to arrange.
2. Add new aliases `group_by`, `sample_frac`, and `summarize`.
3. Add more user-friendly column addition (mutate) and rename.
4. Support mean as an alias for avg in Scala, and also support n_distinct and n as in dplyr.
Using these changes we can pretty much run the examples described in http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html with the same syntax. The only thing missing in SparkR is auto-resolving column names when used in an expression, i.e. making something like `select(flights, delay)` work as it does in dplyr; right now we need `select(flights, flights$delay)` or `select(flights, "delay")`. But this is a complicated change and I'll file a new issue for it.
cc sun-rui rxin
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6005 from shivaram/sparkr-df-api and squashes the following commits: 5e0716a [Shivaram Venkataraman] Fix some roxygen bugs; 1254953 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into sparkr-df-api; 0521149 [Shivaram Venkataraman] Changes to make SparkR DataFrame dplyr friendly.
(cherry picked from commit 0a901dd3a1eb3fd459d45b771ce4ad2cfef2a944) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
* [SPARK-7451] [YARN] Preemption of executors is counted as failure causing Spark job to fail (Ashwin Shankar, 2015-05-08; 1 file, -1/+3)
Added a check to handle the container exit status for the preemption scenario: log an INFO message in such cases and move on. andrewor14
Author: Ashwin Shankar <ashankar@netflix.com>
Closes #5993 from ashwinshankar77/SPARK-7451 and squashes the following commits: 90900cf [Ashwin Shankar] Fix log info message; cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
(cherry picked from commit b6c797b08cbd08d7aab59ad0106af0f5f41ef186) Signed-off-by: Sandy Ryza <sandy@cloudera.com>
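An illustrative sketch of the check described; the surrounding allocator code is an assumption, though `ContainerExitStatus.PREEMPTED` is the standard YARN constant for this case:
```scala
import org.apache.hadoop.yarn.api.records.ContainerExitStatus

// Preemption is routine scheduler behavior, not an application fault,
// so it should not count toward the executor-failure limit.
def recordContainerExit(exitStatus: Int, onFailure: () => Unit): Unit = {
  if (exitStatus == ContainerExitStatus.PREEMPTED) {
    println("INFO: container preempted by the YARN scheduler; not counted as a failure")
  } else if (exitStatus != ContainerExitStatus.SUCCESS) {
    onFailure() // genuine failures still increment the failure count
  }
}
```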
* [SPARK-7488] [ML] Feature Parity in PySpark for ml.recommendation (Burak Yavuz, 2015-05-08; 4 files, -4/+318)
Adds a Python API for `ALS` under `ml.recommendation` in PySpark. Also adds seed as a settable parameter in the Scala implementation of ALS.
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #6015 from brkyvz/ml-rec and squashes the following commits: be6e931 [Burak Yavuz] addressed comments; eaed879 [Burak Yavuz] readd numFeatures; 0bd66b1 [Burak Yavuz] fixed seed; 7f6d964 [Burak Yavuz] merged master; 52e2bda [Burak Yavuz] added ALS
(cherry picked from commit 84bf931f36edf1f319c9116f7f326959a6118991) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-7237] Clean function in several RDD methods (tedyu, 2015-05-08; 2 files, -10/+41)
Author: tedyu <yuzhihong@gmail.com>
Closes #5959 from ted-yu/master and squashes the following commits: f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex; 56d7c92 [tedyu] Consolidate import of Random; f6014c0 [tedyu] Remove cleaning in RDD#filterWith; 36feb6c [tedyu] Try to get correct syntax; 55d01eb [tedyu] Try to get correct syntax; c2786df [tedyu] Correct syntax; d92bfcf [tedyu] Correct syntax in test; 164d3e4 [tedyu] Correct variable name; 8b50d93 [tedyu] Address Andrew's review comments; 0c8d47e [tedyu] Add test for mapWith(); 6846e40 [tedyu] Add test for flatMapWith(); 6c124a9 [tedyu] Clean function in several RDD methods
(cherry picked from commit 54e6fa0563ffa8788ec2fd1b8740445ef3c2ce5a) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-7469] [SQL] DAG visualization: show SQL query operators (Andrew Or, 2015-05-08; 28 files, -50/+71)
The DAG visualization currently displays only low-level Spark primitives (e.g. `map`, `reduceByKey`, `filter` etc.). For SQL, these aren't particularly useful. Instead, we should display higher-level physical operators (e.g. `Filter`, `Exchange`, `ShuffleHashJoin`). cc marmbrus
Before: https://issues.apache.org/jira/secure/attachment/12731586/before.png
After (pay attention to the words): https://issues.apache.org/jira/secure/attachment/12731587/after.png
Author: Andrew Or <andrew@databricks.com>
Closes #5999 from andrewor14/dag-viz-sql and squashes the following commits: 0db23a4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-sql; 1e211db [Andrew Or] Update comment; 0d49fd6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-sql; ffd237a [Andrew Or] Fix style; 202dac1 [Andrew Or] Make ignoreParent false by default; e61b1ab [Andrew Or] Visualize SQL operators, not low-level Spark primitives; 569034a [Andrew Or] Add a flag to ignore parent settings and scopes
(cherry picked from commit bd61f07039064833108070e19b752d4c46045766) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-6955] Perform port retries at NettyBlockTransferService level (Aaron Davidson, 2015-05-08; 4 files, -39/+102)
Currently we're doing port retries at the TransportServer level, but this is not specified by the TransportContext API, and it has other further-reaching impacts, like causing undesirable behavior for the YARN and standalone shuffle services.
Author: Aaron Davidson <aaron@databricks.com>
Closes #5575 from aarondav/port-bind and squashes the following commits: 3c2d6ed [Aaron Davidson] Oops, never do it.; a5d9432 [Aaron Davidson] Remove shouldHostShuffleServiceIfEnabled; e901eb2 [Aaron Davidson] fix local-cluster mode for ExternalShuffleServiceSuite; 59e5e38 [Aaron Davidson] [SPARK-6955] Perform port retries at NettyBlockTransferService level
(cherry picked from commit ffdc40ce7a799f2564f57b958d0f32f1d1636488) Signed-off-by: Andrew Or <andrew@databricks.com>
* updated ec2 instance types (Brendan Collins, 2015-05-08; 1 file, -23/+47)
I needed to run some d2 instances, so I updated spark_ec2.py accordingly.
Author: Brendan Collins <bcollins@blueraster.com>
Closes #6014 from brendancol/ec2-instance-types-update and squashes the following commits: d7b4191 [Brendan Collins] Merge branch 'ec2-instance-types-update' of github.com:brendancol/spark into ec2-instance-types-update; 6366c45 [Brendan Collins] added back cc1.4xlarge; fc2931f [Brendan Collins] updated ec2 instance types; 80c2aa6 [Brendan Collins] vertically aligned whitespace; 85c6236 [Brendan Collins] vertically aligned whitespace; 1657c26 [Brendan Collins] updated ec2 instance types
(cherry picked from commit 1c78f6866ebbcfb41d9875bfa3c0b9fa23b188bf) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
* [SPARK-5913] [MLLIB] Python API for ChiSqSelector (Yanbo Liang, 2015-05-08; 2 files, -2/+67)
Add a Python API for mllib.feature.ChiSqSelector: https://issues.apache.org/jira/browse/SPARK-5913
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #5939 from yanboliang/spark-5913 and squashes the following commits: cdaac99 [Yanbo Liang] Python API for ChiSqSelector
(cherry picked from commit 35c9599b94de759204ed33cdd46d8ee108bccd86) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
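The new Python API wraps the existing MLlib implementation; the equivalent Scala usage, as a sketch (assumes an `RDD[LabeledPoint]` named `data` with categorical features):
```scala
import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.regression.LabeledPoint

// Pick the 50 features with the highest chi-squared test statistic
// against the label, then project each point onto them.
val selector = new ChiSqSelector(numTopFeatures = 50)
val model = selector.fit(data)
val reduced = data.map(lp => LabeledPoint(lp.label, model.transform(lp.features)))
```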
* [SPARK-4699] [SQL] Make caseSensitive configurable in spark sql analyzer (Jacky Li, 2015-05-08; 15 files, -70/+127)
Based on #3558.
Author: Jacky Li <jacky.likun@huawei.com>
Author: wangfei <wangfei1@huawei.com>
Author: scwf <wangfei1@huawei.com>
Closes #5806 from scwf/case and squashes the following commits: cd51712 [wangfei] fix compile; d4b724f [wangfei] address michael's comment; af512c7 [wangfei] fix conflicts; 4ef1be7 [wangfei] fix conflicts; 269cf21 [scwf] fix conflicts; b73df6c [scwf] style issue; 9e11752 [scwf] improve SimpleCatalystConf; b35529e [scwf] minor style; a3f7659 [scwf] remove unsed imports; 2a56515 [scwf] fix conflicts; 6db4bf5 [scwf] also fix for HiveContext; 7fc4a98 [scwf] fix test case; d5a9933 [wangfei] fix style; eee75ba [wangfei] fix EmptyConf; 6ef31cf [wangfei] revert pom changes; 5d7c456 [wangfei] set CASE_SENSITIVE false in TestHive; 966e719 [wangfei] set CASE_SENSITIVE false in hivecontext; fd30e25 [wangfei] added override; 69b3b70 [wangfei] fix AnalysisSuite; 5472b08 [wangfei] fix compile issue; 56034ca [wangfei] fix conflicts and improve for catalystconf; 664d1e9 [Jacky Li] Merge branch 'master' of https://github.com/apache/spark into case; 12eca9a [Jacky Li] solve conflict with master; 39e369c [Jacky Li] fix confilct after DataFrame PR; dee56e9 [Jacky Li] fix test case failure; 05b09a3 [Jacky Li] fix conflict base on the latest master branch; 73c16b1 [Jacky Li] fix bug in sql/hive; 9bf4cc7 [Jacky Li] fix bug in catalyst; 005c56d [Jacky Li] make SQLContext caseSensitivity configurable; 6332e0f [Jacky Li] fix bug; fcbf0d9 [Jacky Li] fix scalastyle check; e7bca31 [Jacky Li] make caseSensitive configuration in Analyzer and Catalog; 91b1b96 [Jacky Li] make caseSensitive configurable in Analyzer; f57f15c [Jacky Li] add testcase; 578d167 [Jacky Li] make caseSensitive configurable
(cherry picked from commit 6dad76e5eba3c2925bfc9d142f31f7c2dc649886) Signed-off-by: Michael Armbrust <michael@databricks.com>
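The knob in action; `spark.sql.caseSensitive` is the conf key associated with this change, though treat the exact key as an assumption (assumes a `sqlContext` with a registered table `people` that has a lower-case `name` column):
```scala
sqlContext.setConf("spark.sql.caseSensitive", "false")
// With case-insensitive analysis, NAME resolves to the `name` column:
sqlContext.sql("SELECT NAME FROM people").show()
```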
* [SPARK-7390] [SQL] Only merge other CovarianceCounter when its count is greater than zero (Liang-Chi Hsieh, 2015-05-08; 1 file, -10/+12)
JIRA: https://issues.apache.org/jira/browse/SPARK-7390
Also fix a minor typo.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #5931 from viirya/fix_covariancecounter and squashes the following commits: 352eda6 [Liang-Chi Hsieh] Only merge other CovarianceCounter when its count is greater than zero.
(cherry picked from commit 90527f560462cc2d693176bd961b02767e460e06) Signed-off-by: Xiangrui Meng <meng@databricks.com>
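Why the zero-count guard matters: merging two empty counters computes 0/0 and poisons the running statistics with NaN. A self-contained sketch of the merge (CovarianceCounter itself is internal to Spark SQL, so the shape below is an assumption):
```scala
// Running co-moment statistics for covariance, merged across partitions.
case class CovCounter(var count: Long, var xAvg: Double, var yAvg: Double, var ck: Double) {
  def merge(other: CovCounter): this.type = {
    if (other.count > 0) { // the fix: skip empty counters (0/0 would yield NaN)
      val total = count + other.count
      val dx = xAvg - other.xAvg
      val dy = yAvg - other.yAvg
      // Standard pairwise co-moment update: Ck += Ck_b + dx*dy*n_a*n_b/total
      ck += other.ck + dx * dy * count / total * other.count
      xAvg = (xAvg * count + other.xAvg * other.count) / total
      yAvg = (yAvg * count + other.yAvg * other.count) / total
      count = total
    }
    this
  }
}
```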
* [SPARK-7378] [CORE] Handle deep links to unloaded apps. (Marcelo Vanzin, 2015-05-08; 1 file, -19/+29)
The code was treating deep links as if they were attempt IDs, so for example if you tried to load "/history/app1/jobs" directly, that would fail because the code would treat "jobs" as an attempt ID.
This change modifies the code to try both cases - first without an attempt ID, then with it - so that deep links are handled correctly. This assumes that the links in the Spark UI do not clash with the attempt-ID namespace, though, which is the case for YARN at least, which is the only backend that currently publishes attempt IDs.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #5922 from vanzin/SPARK-7378 and squashes the following commits: 96f648b [Marcelo Vanzin] Fix comparison.; ed3bcd4 [Marcelo Vanzin] Merge branch 'master' into SPARK-7378; 23483e4 [Marcelo Vanzin] Fat fingers.; b728f08 [Marcelo Vanzin] [SPARK-7378] [core] Handle deep links to unloaded apps.
(cherry picked from commit 5467c34c3d6538e053957b5513df218f1f5bae6b) Signed-off-by: Andrew Or <andrew@databricks.com>
* [MINOR] [CORE] Allow History Server to read kerberos opts from config file. (Marcelo Vanzin, 2015-05-08; 1 file, -1/+1)
The order of the initialization code was wrong.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #5998 from vanzin/hs-conf-fix and squashes the following commits: 00b6b6b [Marcelo Vanzin] [minor] [core] Allow History Server to read kerberos opts from config file.
(cherry picked from commit 9042f8f3784f10f695cba6b80c054695b1c152c5) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-7466] DAG visualization: fix orphan nodes (Andrew Or, 2015-05-08; 1 file, -1/+1)
Simple fix: we were comparing an option with `null`.
Before: https://issues.apache.org/jira/secure/attachment/12731383/before.png
After: https://issues.apache.org/jira/secure/attachment/12731384/after.png
Author: Andrew Or <andrew@databricks.com>
Closes #6002 from andrewor14/dag-viz-orphan-nodes and squashes the following commits: a1468dc [Andrew Or] Fix null check
(cherry picked from commit 3b0c5e71f156516fd8bbbeda70e69b487b0c1418) Signed-off-by: Andrew Or <andrew@databricks.com>
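For readers outside the codebase, the class of bug fixed here in miniature (names are illustrative):
```scala
val parent: Option[String] = None

// Wrong: this compares the Option reference itself against null, and an
// Option is never null, so the branch is always taken.
if (parent != null) { /* node wrongly treated as having a parent */ }

// Right: test whether a value is actually present.
if (parent.isDefined) { /* taken only when there is a parent */ }
```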
* [MINOR] Defeat early garbage collection of test suite variable (Tim Ellison, 2015-05-08; 1 file, -0/+1)
The JVM is free to collect references to variables that no longer participate in a computation. This simple patch adds an operation on the variable `rdd` to ensure it is not collected early during the test suite's explicit calls to GC. ref: http://bugs.java.com/view_bug.do?bug_id=6721588
Author: Tim Ellison <t.p.ellison@gmail.com>
Closes #6010 from tellison/master and squashes the following commits: 77d1c8f [Tim Ellison] Defeat early garbage collection of test suite variable by aggressive JVMs
(cherry picked from commit 31da40dfeeeab69ee7974992328e3f67046ad3da) Signed-off-by: Andrew Or <andrew@databricks.com>
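A sketch of the technique; the specific operation the patch adds is not shown in this log, so the final line is an assumption about its shape:
```scala
// Assumes a SparkContext `sc`.
val rdd = sc.parallelize(1 to 10)
System.gc()               // the suite triggers explicit collections here
// ... assertions that expect rdd's state to still be tracked ...
assert(rdd.count() == 10) // a late use keeps `rdd` strongly reachable throughout
```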
* [SPARK-7489] [SPARK SHELL] Spark shell crashes when compiled with scala 2.11 (vinodkc, 2015-05-08; 1 file, -1/+1)
The Spark shell crashes when compiled with Scala 2.11 and SPARK_PREPEND_CLASSES=true. There is a similar resolved JIRA issue, SPARK-7470, and a PR https://github.com/apache/spark/pull/5997, which handled the same issue only for Scala 2.10.
Author: vinodkc <vinod.kc.in@gmail.com>
Closes #6013 from vinodkc/fix_sqlcontext_exception_scala_2.11 and squashes the following commits: 119061c [vinodkc] Spark shell crashes when compiled with scala 2.11
(cherry picked from commit 4e7360e12dc71c2391764e3596a7971b4d9d7bfc) Signed-off-by: Andrew Or <andrew@databricks.com>
* [WEBUI] Remove debug feature for vis.js (Kousuke Saruta, 2015-05-08; 3 files, -3/+0)
`vis.min.js` refers to `vis.map`, which in turn refers to `vis.js`; this is used to debug vis.js, but the debug feature is not needed for Spark itself. This issue is really minor, so I didn't file it in JIRA. /CC andrewor14
Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Closes #5994 from sarutak/remove-debug-feature-for-vis and squashes the following commits: 8be038f [Kousuke Saruta] Remove vis.map entry from .rat-exclude; 7404945 [Kousuke Saruta] Removed debug feature for vis.js
(cherry picked from commit c45c09b015f6f1111fcf9e3c3109a253bbd1d259) Signed-off-by: Andrew Or <andrew@databricks.com>
* [MINOR] Ignore python/lib/pyspark.zip (zsxwing, 2015-05-08; 1 file, -0/+1)
Add `python/lib/pyspark.zip` to `.gitignore`. After merging #5580, `python/lib/pyspark.zip` will be generated when building Spark.
Author: zsxwing <zsxwing@gmail.com>
Closes #6017 from zsxwing/gitignore and squashes the following commits: 39b10c4 [zsxwing] Ignore python/lib/pyspark.zip
(cherry picked from commit dc71e47f047604e3a9972fc386a462d03bff72cf) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-7490] [CORE] [Minor] MapOutputTracker.deserializeMapStatuses: close input streams (Evan Jones, 2015-05-08; 1 file, -1/+5)
GZIPInputStream allocates native memory that is not freed until close() or when the finalizer runs. It is best to close() these streams explicitly. stephenh made the same change for serializeMapStatuses in commit b0d884f0; this is the same change for deserialize.
(I ran the unit test suite! It seems to have passed. I did not file a JIRA since this seems "trivial", and the guidelines suggest one is not required for trivial changes.)
Author: Evan Jones <ejones@twitter.com>
Closes #5982 from evanj/master and squashes the following commits: 0d76e85 [Evan Jones] [CORE] MapOutputTracker.deserializeMapStatuses: close input streams
(cherry picked from commit 25889d8d97094325f10fbf52f3b36412f212eeb2) Signed-off-by: Sean Owen <sowen@cloudera.com>
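The pattern being applied, sketched outside MapOutputTracker (the helper name is illustrative):
```scala
import java.io.{ByteArrayInputStream, ObjectInputStream}
import java.util.zip.GZIPInputStream

// Close deterministically so the GZIP stream's native zlib buffer is freed
// now, not whenever the finalizer happens to run.
def deserializeCompressed[T](bytes: Array[Byte]): T = {
  val in = new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(bytes)))
  try {
    in.readObject().asInstanceOf[T]
  } finally {
    in.close() // also closes the wrapped GZIPInputStream
  }
}
```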
* [SPARK-6627] Finished rename to ShuffleBlockResolver (Kay Ousterhout, 2015-05-08; 16 files, -95/+94)
The previous cleanup commit for SPARK-6627 renamed ShuffleBlockManager to ShuffleBlockResolver, but didn't rename the associated subclasses and variables; this commit does that. I'm unsure whether it's OK to rename ExternalShuffleBlockManager, since that's technically a public class? cc pwendell
Author: Kay Ousterhout <kayousterhout@gmail.com>
Closes #5764 from kayousterhout/SPARK-6627 and squashes the following commits: 43add1e [Kay Ousterhout] Spacing fix; 96080bf [Kay Ousterhout] Test fixes; d8a5d36 [Kay Ousterhout] [SPARK-6627] Finished rename to ShuffleBlockResolver
(cherry picked from commit 4b3bb0e43ca7e1a27308516608419487b6a844e6) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
* [SPARK-7133] [SQL] Implement struct, array, and map field accessor (Wenchen Fan, 2015-05-08; 16 files, -191/+327)
It's the first step: generalize UnresolvedGetField to support all of map, struct, and array.
TODO: add `apply` in Scala and `__getitem__` in Python, and unify the `getItem` and `getField` methods into one single API (or should we keep them for compatibility?).
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes #5744 from cloud-fan/generalize and squashes the following commits: 715c589 [Wenchen Fan] address comments; 7ea5b31 [Wenchen Fan] fix python test; 4f0833a [Wenchen Fan] add python test; f515d69 [Wenchen Fan] add apply method and test cases; 8df6199 [Wenchen Fan] fix python test; 239730c [Wenchen Fan] fix test compile; 2a70526 [Wenchen Fan] use _bin_op in dataframe.py; 6bf72bc [Wenchen Fan] address comments; 3f880c3 [Wenchen Fan] add java doc; ab35ab5 [Wenchen Fan] fix python test; b5961a9 [Wenchen Fan] fix style; c9d85f5 [Wenchen Fan] generalize UnresolvedGetField to support all map, struct, and array
(cherry picked from commit 2d05f325dc3c70349bd17ed399897f22d967c687) Signed-off-by: Michael Armbrust <michael@databricks.com>
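What the generalized accessor supports at the DataFrame level; a sketch assuming a DataFrame `df` with columns `s struct<a:int>`, `arr array<int>`, and `m map<string,int>`:
```scala
import org.apache.spark.sql.functions.col

df.select(
  col("s").getField("a"),  // struct field access
  col("arr").getItem(0),   // array element access
  col("m").getItem("key")  // map value access
).show()
```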