Commit message  Author  Age  Files  Lines
* Revert "Preparing Spark release v1.2.1-rc2"Patrick Wendell2015-02-0229-29/+29
| | | | This reverts commit b77f87673d1f9f03d4c83cf583158227c551359b.
* Revert "Preparing development version 1.2.2-SNAPSHOT"Patrick Wendell2015-02-0229-29/+29
| | | | This reverts commit 0a16abadc59082b7d3a24d7f3625236658632813.
* Revert "[SPARK-5195][sql]Update HiveMetastoreCatalog.scala(override the ↵Patrick Wendell2015-02-022-15/+0
| | | | | | MetastoreRelation's sameresult method only compare databasename and table name)" This reverts commit 54864403c4f132d9c1380c015122a849dd44dff8.
* [SPARK-5195][sql]Update HiveMetastoreCatalog.scala(override the ↵seayi2015-02-022-0/+15
    MetastoreRelation's sameresult method only compare databasename and table name)

    Override MetastoreRelation's sameResult method so that it compares only the database name and the table name. Previously, after

        cache table t1;
        select count(*) from t1;

    the query read data from memory, but the query below did not and read from HDFS instead:

        select count(*) from t1 t;

    Cached data is keyed by logical plan and matched with sameResult, so the logical plan of a table with an alias is not the same as the logical plan of the same table without one. The fix is to make sameResult compare only the database name and the table name.

    Author: seayi <405078363@qq.com>
    Author: Michael Armbrust <michael@databricks.com>

    Closes #3898 from seayi/branch-1.2 and squashes the following commits:

    8f0c7d2 [seayi] Update CachedTableSuite.scala
    a277120 [seayi] Update HiveMetastoreCatalog.scala
    8d910aa [seayi] Update HiveMetastoreCatalog.scala
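The comparison described in this commit message can be illustrated with a simplified model. This is only a sketch under stated assumptions: `TableRelation` and `CacheLookup` are hypothetical stand-ins, not Spark's `MetastoreRelation` or its cache manager, and the cache is reduced to a plain list lookup.

```scala
// Hypothetical, simplified stand-in for MetastoreRelation (not Spark's class).
final case class TableRelation(
    databaseName: String,
    tableName: String,
    alias: Option[String]) {

  // Two relations yield the same result when they point at the same table,
  // regardless of the alias used in the query.
  def sameResult(other: TableRelation): Boolean =
    databaseName == other.databaseName && tableName == other.tableName
}

object CacheLookup {
  // Cached plans are matched with sameResult rather than with ==,
  // so "t1" and "t1 t" hit the same cache entry.
  def isCached(cached: Seq[TableRelation], plan: TableRelation): Boolean =
    cached.exists(_.sameResult(plan))
}
```

With plain case-class equality the aliased and unaliased relations differ, which is the cache miss the message describes; matching through `sameResult` removes the alias from the comparison.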
* Disabling Utils.chmod700 for WindowsMartin Weindel2015-02-021-1/+3
    This patch makes Spark 1.2.1rc2 work again on Windows. Without it you get the following log output on creating a Spark context:

        INFO org.apache.spark.SparkEnv:59 - Registering BlockManagerMaster
        ERROR org.apache.spark.util.Utils:75 - Failed to create local root dir in .... Ignoring this directory.
        ERROR org.apache.spark.storage.DiskBlockManager:75 - Failed to create any local dir.

    Author: Martin Weindel <martin.weindel@gmail.com>
    Author: mweindel <m.weindel@usu-software.de>

    Closes #4299 from MartinWeindel/branch-1.2 and squashes the following commits:

    535cb7f [Martin Weindel] fixed last commit
    f17072e [Martin Weindel] moved condition to caller to avoid confusion on chmod700() return value
    4de5e91 [Martin Weindel] reverted to unix line ends
    fe2740b [mweindel] moved comment
    ac4749c [mweindel] fixed chmod700 for Windows
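The squashed commit "moved condition to caller" suggests the OS check sits outside `chmod700()`. A minimal sketch of that shape follows; `LocalDirs`, `isWindows` and `prepare` are hypothetical names chosen for illustration, and the permission calls are the standard `java.io.File` ones rather than a copy of Spark's `Utils`.

```scala
import java.io.File
import java.util.Locale

object LocalDirs {
  // Hypothetical helper: detect Windows from the os.name property value.
  def isWindows(osName: String): Boolean =
    osName.toLowerCase(Locale.ROOT).startsWith("windows")

  // Restrict a directory to its owner. Returns true only if every call succeeded,
  // so the return value keeps a single meaning.
  def chmod700(dir: File): Boolean =
    dir.setReadable(false, false) && dir.setReadable(true, true) &&
      dir.setWritable(false, false) && dir.setWritable(true, true) &&
      dir.setExecutable(false, false) && dir.setExecutable(true, true)

  // The caller decides whether permissions are restricted at all:
  // on Windows the chmod step is skipped instead of failing the directory setup.
  def prepare(dir: File, osName: String = System.getProperty("os.name")): Boolean =
    dir.mkdirs() && (isWindows(osName) || chmod700(dir))
}
```

Keeping the platform test in `prepare` means a `false` from `chmod700` always signals a real permission failure, never "not applicable on this OS".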
* [Docs] Fix Building Spark link textNicholas Chammas2015-02-021-1/+1
| | | | | | | | | | | Author: Nicholas Chammas <nicholas.chammas@gmail.com> Closes #4312 from nchammas/patch-2 and squashes the following commits: 9d943aa [Nicholas Chammas] [Docs] Fix Building Spark link text (cherry picked from commit 3f941b68a2336aa7876aeda99865e7c19b53bc5c) Signed-off-by: Andrew Or <andrew@databricks.com>
* Preparing development version 1.2.2-SNAPSHOTPatrick Wendell2015-01-2829-29/+29
|
* Preparing Spark release v1.2.1-rc2Patrick Wendell2015-01-2829-29/+29
|
* Revert "Preparing Spark release v1.2.1-rc1"Patrick Wendell2015-01-2729-29/+29
| | | | This reverts commit 3e2d7d310b76c293b9ac787f204e6880f508f6ec.
* Revert "Preparing development version 1.2.2-SNAPSHOT"Patrick Wendell2015-01-2729-29/+29
| | | | This reverts commit f53a4319ba5f0843c077e64ae5a41e2fac835a5b.
* [MLlib] fix python example of ALS in guideDavies Liu2015-01-271-6/+5
| | | | | | | | | | | | | fix python example of ALS in guide, use Rating instead of np.array. Author: Davies Liu <davies@databricks.com> Closes #4226 from davies/fix_als_guide and squashes the following commits: 1433d76 [Davies Liu] fix python example of als in guide (cherry picked from commit fdaad4eb0388cfe43b5b6600927eb7b9182646f9) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* SPARK-5308 [BUILD] MD5 / SHA1 hash format doesn't match standard Maven outputSean Owen2015-01-271-2/+8
| | | | | | | | | | | | | | Here's one way to make the hashes match what Maven's plugins would create. It takes a little extra footwork since OS X doesn't have the same command line tools. An alternative is just to make Maven output these of course - would that be better? I ask in case there is a reason I'm missing, like, we need to hash files that Maven doesn't build. Author: Sean Owen <sowen@cloudera.com> Closes #4161 from srowen/SPARK-5308 and squashes the following commits: 70d09d0 [Sean Owen] Use $(...) syntax e25eff8 [Sean Owen] Generate MD5, SHA1 hashes in a format like Maven's plugin (cherry picked from commit ff356e2a21e31998cda3062e560a276a3bfaa7ab) Signed-off-by: Patrick Wendell <patrick@databricks.com>
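The actual change here is in a shell script, which this log does not show. As an illustration only, the sketch below produces a digest in what is assumed to be the Maven-style format: the bare lower-case hex string, with no file name or padding around it. `MavenStyleHash` is a hypothetical name; `MessageDigest` is the standard JDK API.

```scala
import java.security.MessageDigest

object MavenStyleHash {
  // Lower-case hex digest only; assumed to match what Maven's plugins write.
  def hexDigest(algorithm: String, bytes: Array[Byte]): String =
    MessageDigest.getInstance(algorithm)
      .digest(bytes)
      .map(b => f"${b & 0xff}%02x")
      .mkString

  def md5(bytes: Array[Byte]): String = hexDigest("MD5", bytes)

  def sha1(bytes: Array[Byte]): String = hexDigest("SHA-1", bytes)
}
```

For example, `MavenStyleHash.md5("hello".getBytes("UTF-8"))` gives `5d41402abc4b2a76b9719d911017c592`.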
* Preparing development version 1.2.2-SNAPSHOTPatrick Wendell2015-01-2729-29/+29
|
* Preparing Spark release v1.2.1-rc1Patrick Wendell2015-01-2729-29/+29
|
* Revert "Preparing Spark release v1.2.1-rc1"Patrick Wendell2015-01-2629-29/+29
| | | | This reverts commit e87eb2b42f137c22194cfbca2abf06fecdf943da.
* Revert "Preparing development version 1.2.2-SNAPSHOT"Patrick Wendell2015-01-2629-29/+29
| | | | This reverts commit adfed7086f10fa8db4eeac7996c84cf98f625e9a.
* Preparing development version 1.2.2-SNAPSHOTUbuntu2015-01-2729-29/+29
|
* Preparing Spark release v1.2.1-rc1Ubuntu2015-01-2729-29/+29
|
* Updating versions for Spark 1.2.1Patrick Wendell2015-01-263-4/+5
|
* SPARK-4147 [CORE] Reduce log4j dependencySean Owen2015-01-261-9/+11
| | | | | | | | | | | | | Defer use of log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to another logging framework. The only change is to push one half of the check in the original `if` condition inside. This is a trivial change, may or may not actually solve a problem, but I think it's all that makes sense to do for SPARK-4147. Author: Sean Owen <sowen@cloudera.com> Closes #4190 from srowen/SPARK-4147 and squashes the following commits: 4e99942 [Sean Owen] Defer use of log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to another logging framework. (cherry picked from commit 54e7b456dd56c9e52132154e699abca87563465b) Signed-off-by: Patrick Wendell <patrick@databricks.com>
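"Pushing one half of the check inside" can be sketched without a log4j dependency by passing the log4j-specific probe as a function. Everything below is an assumption made for illustration: `LoggingInit` and its parameters are hypothetical, and the factory class name is the one slf4j's log4j 1.2 binding is believed to report.

```scala
object LoggingInit {
  // Assumed name of the slf4j factory class for the log4j 1.2 binding.
  val Log4j12Factory = "org.slf4j.impl.Log4jLoggerFactory"

  // Returns true if default configuration was applied.
  // log4jInitialized stands in for the call that touches log4j classes;
  // it is only invoked once the binding is known to be log4j 1.2.
  def initializeIfNeeded(
      binderClass: String,
      log4jInitialized: () => Boolean,
      configure: () => Unit): Boolean = {
    val usingLog4j12 = Log4j12Factory == binderClass
    if (usingLog4j12) {
      if (!log4jInitialized()) {
        configure()
        true
      } else {
        false
      }
    } else {
      false
    }
  }
}
```

With a single combined `if`, both halves of the condition sit in one expression that references log4j; nesting them keeps the log4j-touching half unreachable when another backend is bound.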
* [SPARK-5355] use j.u.c.ConcurrentHashMap instead of TrieMapDavies Liu2015-01-263-21/+23
| | | | | | | | | | | | | | | | | | | | j.u.c.ConcurrentHashMap is more battle tested. cc rxin JoshRosen pwendell Author: Davies Liu <davies@databricks.com> Closes #4208 from davies/safe-conf and squashes the following commits: c2182dc [Davies Liu] address comments, fix tests 3a1d821 [Davies Liu] fix test da14ced [Davies Liu] Merge branch 'master' of github.com:apache/spark into safe-conf ae4d305 [Davies Liu] change to j.u.c.ConcurrentMap f8fa1cf [Davies Liu] change to TrieMap a1d769a [Davies Liu] make SparkConf thread-safe (cherry picked from commit 142093179a4c40bdd90744191034de7b94a963ff) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
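A minimal sketch of a settings holder backed by `java.util.concurrent.ConcurrentHashMap`, as this commit describes. `SafeConf` is a hypothetical class, not `SparkConf`; only the choice of map type comes from the commit message.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for a thread-safe configuration object.
final class SafeConf {
  private val settings = new ConcurrentHashMap[String, String]()

  // ConcurrentHashMap rejects null keys and values, so callers must pass non-null strings.
  def set(key: String, value: String): SafeConf = {
    settings.put(key, value)
    this
  }

  def get(key: String): Option[String] = Option(settings.get(key))

  // Iteration over a ConcurrentHashMap is weakly consistent: it never throws
  // ConcurrentModificationException while other threads write.
  def getAll: Vector[(String, String)] = {
    val it = settings.entrySet().iterator()
    val out = Vector.newBuilder[(String, String)]
    while (it.hasNext) {
      val e = it.next()
      out += ((e.getKey, e.getValue))
    }
    out.result()
  }
}
```

Usage: `new SafeConf().set("spark.master", "local").get("spark.master")` returns `Some("local")`.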
* SPARK-4430 [STREAMING] [TEST] Apache RAT Checks fail spuriously on test filesSean Owen2015-01-251-7/+2
| | | | | | | | | | | | | Another trivial one. The RAT failure was due to temp files from `FailureSuite` not being cleaned up. This just makes the cleanup more reliable by using the standard temp dir mechanism. Author: Sean Owen <sowen@cloudera.com> Closes #4189 from srowen/SPARK-4430 and squashes the following commits: 9ea63ff [Sean Owen] Properly acquire a temp directory to ensure it is cleaned up at shutdown, which helps avoid a RAT check failure (cherry picked from commit 0528b85cf96f9c9c074b5fbb5b9c5dd8071c0bc7) Signed-off-by: Andrew Or <andrew@databricks.com>
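The idea of acquiring a temp directory that is cleaned up at shutdown can be sketched with JDK calls only. `TempDirs` is a hypothetical helper and is not Spark's own temp dir utility; the shutdown hook is one assumed way to get the cleanup the message refers to.

```scala
import java.io.File
import java.nio.file.Files

object TempDirs {
  // Delete a file or a whole directory tree.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      // listFiles() may return null on I/O error, hence the Option wrapper.
      Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    }
    f.delete()
    ()
  }

  // Create a temp directory and register it for deletion when the JVM exits,
  // so stray files do not linger in the source tree for later checks to find.
  def createTempDir(prefix: String): File = {
    val dir = Files.createTempDirectory(prefix).toFile
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      override def run(): Unit = deleteRecursively(dir)
    }))
    dir
  }
}
```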
* Revert "[SPARK-5344][WebUI] HistoryServer cannot recognize that inprogress ↵Andrew Or2015-01-251-3/+1
| | | | | | file was renamed to completed file" This reverts commit 8f55beeb51e6ea72e63af3f276497f61dd24d09b.
* [SPARK-5344][WebUI] HistoryServer cannot recognize that inprogress file was ↵Kousuke Saruta2015-01-251-1/+3
| | | | | | | | | | | | | | | | | | | renamed to completed file `FsHistoryProvider` tries to update application status but if `checkForLogs` is called before `.inprogress` file is renamed to completed file, the file is not recognized as completed. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4132 from sarutak/SPARK-5344 and squashes the following commits: 9658008 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5344 d2c72b6 [Kousuke Saruta] Fixed update issue of FsHistoryProvider (cherry picked from commit 8f5c827b01026bf45fc774ed7387f11a941abea8) Signed-off-by: Andrew Or <andrew@databricks.com> Conflicts: core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
* SPARK-4506 [DOCS] Addendum: Update more docs to reflect that standalone ↵Sean Owen2015-01-251-2/+2
| | | | | | | | | | | | | | | works in cluster mode This is a trivial addendum to SPARK-4506, which was already resolved. noted by Asim Jalis in SPARK-4506. Author: Sean Owen <sowen@cloudera.com> Closes #4160 from srowen/SPARK-4506 and squashes the following commits: 5f5f7df [Sean Owen] Update more docs to reflect that standalone works in cluster mode (cherry picked from commit 9f6435763d173d2abf82d16b5878983fa8bf3419) Signed-off-by: Andrew Or <andrew@databricks.com>
* SPARK-5382: Use SPARK_CONF_DIR in spark-class and spark-submit, spark-su...Jacek Lewandowski2015-01-252-2/+9
| | | | | | | | | | ...bmit2.cmd if it is defined Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4177 from jacek-lewandowski/SPARK-5382-1.2 and squashes the following commits: 41cef25 [Jacek Lewandowski] SPARK-5382: Use SPARK_CONF_DIR in spark-class and spark-submit, spark-submit2.cmd if it is defined
* SPARK-5382: Use SPARK_CONF_DIR in spark-class if it is definedJacek Lewandowski2015-01-251-2/+3
| | | | | | | | Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4179 from jacek-lewandowski/SPARK-5382-1.3 and squashes the following commits: 55d7791 [Jacek Lewandowski] SPARK-5382: Use SPARK_CONF_DIR in spark-class if it is defined
* SPARK-3852 [DOCS] Document spark.driver.extra* configsSean Owen2015-01-251-0/+21
| | | | | | | | | | | | | As per the JIRA. I copied the `spark.executor.extra*` text, but removed info that appears to be specific to the `executor` config and not `driver`. Author: Sean Owen <sowen@cloudera.com> Closes #4185 from srowen/SPARK-3852 and squashes the following commits: f60a8a1 [Sean Owen] Document spark.driver.extra* configs (cherry picked from commit c586b45dd25b50be7f195df2ce91b307e1ed71a9) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-5402] log executor ID at executor-construction timeRyan Williams2015-01-251-5/+8
| | | | | | | | | | | | | | | | also rename "slaveHostname" to "executorHostname" Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4195 from ryan-williams/exec and squashes the following commits: e60a7bb [Ryan Williams] log executor ID at executor-construction time (cherry picked from commit aea25482c370fbcf712a464501605bc16ee4ed5d) Signed-off-by: Andrew Or <andrew@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/executor/Executor.scala
* [SPARK-5401] set executor ID before creating MetricsSystemRyan Williams2015-01-252-2/+6
| | | | | | | | Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4194 from ryan-williams/metrics and squashes the following commits: 7c5a33f [Ryan Williams] set executor ID before creating MetricsSystem
* [SPARK-5058] Part 2. Typos and broken URLJongyoul Lee2015-01-231-1/+1
| | | | | | | | | | | | | - Also fixed java link Author: Jongyoul Lee <jongyoul@gmail.com> Closes #4172 from jongyoul/SPARK-FIXDOC and squashes the following commits: 6be03e5 [Jongyoul Lee] [SPARK-5058] Part 2. Typos and broken URL - Also fixed java link (cherry picked from commit 09e09c548e7722fca1cdc89bd37de2cee58f4ce9) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-5351][GraphX] Do not use Partitioner.defaultPartitioner as a ↵Takeshi Yamamuro2015-01-232-2/+22
    partitioner of EdgeRDDImp...

    If the value of 'spark.default.parallelism' does not match the number of partitions in EdgePartition(EdgeRDDImpl), the following error occurs in ReplicatedVertexView.scala:72:

        object GraphTest extends Logging {
          def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): VertexRDD[Int] = {
            graph.aggregateMessages(
              ctx => {
                ctx.sendToSrc(1)
                ctx.sendToDst(2)
              },
              _ + _)
          }
        }

        val g = GraphLoader.edgeListFile(sc, "graph.txt")
        val rdd = GraphTest.run(g)

        java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
            at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:57)
            at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:206)
            at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
            at scala.Option.getOrElse(Option.scala:120)
            at org.apache.spark.rdd.RDD.partitions(RDD.scala:204)
            at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
            at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:206)
            at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
            at scala.Option.getOrElse(Option.scala:120)
            at org.apache.spark.rdd.RDD.partitions(RDD.scala:204)
            at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
            at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
            at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:193)
            at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:191)
            ...

    Author: Takeshi Yamamuro <linguin.m.s@gmail.com>

    Closes #4136 from maropu/EdgePartitionBugFix and squashes the following commits:

    0cd8942 [Ankur Dave] Use more concise getOrElse
    aad4a2c [Ankur Dave] Add unit test for non-default number of edge partitions
    0a2f32b [Takeshi Yamamuro] Do not use Partitioner.defaultPartitioner as a partitioner of EdgeRDDImpl

    (cherry picked from commit e224dbb011789297cd6c6ba095f702c042869ed6)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
* [SPARK-5063] More helpful error messages for several invalid operationsJosh Rosen2015-01-236-14/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds more helpful error messages for invalid programs that define nested RDDs, broadcast RDDs, perform actions inside of transformations (e.g. calling `count()` from inside of `map()`), and call certain methods on stopped SparkContexts. Currently, these invalid programs lead to confusing NullPointerExceptions at runtime and have been a major source of questions on the mailing list and StackOverflow. In a few cases, I chose to log warnings instead of throwing exceptions in order to avoid any chance that this patch breaks programs that worked "by accident" in earlier Spark releases (e.g. programs that define nested RDDs but never run any jobs with them). In SparkContext, the new `assertNotStopped()` method is used to check whether methods are being invoked on a stopped SparkContext. In some cases, user programs will not crash in spite of calling methods on stopped SparkContexts, so I've only added `assertNotStopped()` calls to methods that always throw exceptions when called on stopped contexts (e.g. by dereferencing a null `dagScheduler` pointer). Author: Josh Rosen <joshrosen@databricks.com> Closes #3884 from JoshRosen/SPARK-5063 and squashes the following commits: a38774b [Josh Rosen] Fix spelling typo a943e00 [Josh Rosen] Convert two exceptions into warnings in order to avoid breaking user programs in some edge-cases. 2d0d7f7 [Josh Rosen] Fix test to reflect 1.2.1 compatibility 3f0ea0c [Josh Rosen] Revert two unintentional formatting changes 8e5da69 [Josh Rosen] Remove assertNotStopped() calls for methods that were sometimes safe to call on stopped SC's in Spark 1.2 8cff41a [Josh Rosen] IllegalStateException fix 6ef68d0 [Josh Rosen] Fix Python line length issues. 9f6a0b8 [Josh Rosen] Add improved error messages to PySpark. 
13afd0f [Josh Rosen] SparkException -> IllegalStateException 8d404f3 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5063 b39e041 [Josh Rosen] Fix BroadcastSuite test which broadcasted an RDD 99cc09f [Josh Rosen] Guard against calling methods on stopped SparkContexts. 34833e8 [Josh Rosen] Add more descriptive error message. 57cc8a1 [Josh Rosen] Add error message when directly broadcasting RDD. 15b2e6b [Josh Rosen] [SPARK-5063] Useful error messages for nested RDDs and actions inside of transformations (cherry picked from commit cef1f092a628ac20709857b4388bb10e0b5143b0) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
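The `assertNotStopped()` guard described in SPARK-5063 can be sketched on a toy class. `MiniContext` and its `parallelize` method are hypothetical and the error text is invented for illustration; only the guard's name and the use of `IllegalStateException` come from the commit message.

```scala
// Hypothetical miniature of a context object with a stop flag.
final class MiniContext {
  @volatile private var stopped = false

  def stop(): Unit = { stopped = true }

  // Fail fast with a descriptive error instead of a later NullPointerException.
  private def assertNotStopped(): Unit =
    if (stopped) {
      throw new IllegalStateException("Cannot call methods on a stopped context")
    }

  // Only methods that could never work on a stopped context carry the guard.
  def parallelize(data: Seq[Int]): Seq[Int] = {
    assertNotStopped()
    data
  }
}
```

The guard is deliberately applied per method rather than globally, mirroring the message's point that some calls were harmless on a stopped context and should keep working.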
* [SPARK-5233][Streaming] Fix error replaying of WAL introduced bugjerryshao2015-01-224-20/+32
    Because `BlockAllocationEvent` is missing from WAL recovery, dangling events get mixed into the new batch, which leads to wrong results. Details can be seen in [SPARK-5233](https://issues.apache.org/jira/browse/SPARK-5233).

    Author: jerryshao <saisai.shao@intel.com>

    Closes #4032 from jerryshao/SPARK-5233 and squashes the following commits:

    f0b0c0b [jerryshao] Further address the comments
    a237c75 [jerryshao] Address the comments
    e356258 [jerryshao] Fix bug in unit test
    558bdc3 [jerryshao] Correctly replay the WAL log when recovering from failure

    (cherry picked from commit 3c3fa632e6ba45ce536065aa1145698385301fb2)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [HOTFIX] Fixed compilation error due to missing SparkContext._ implicit ↵Tathagata Das2015-01-221-0/+1
| | | | conversions.
* [SPARK-5147][Streaming] Delete the received data WAL log periodicallyTathagata Das2015-01-219-50/+172
    This is a refactored fix based on jerryshao's PR #4037. It enables deletion of old WAL files containing the received block data.

    Improvements over #4037:
    - Respects the rememberDuration of all receiver streams. In #4037, if there were two receiver streams with different remember durations, deletion would have been based on the shortest remember duration, thus deleting data prematurely for the receiver stream with the longer remember duration.
    - Adds a unit test for creation of the receiver WAL, automatic deletion, and respecting of the remember duration.

    jerryshao I am going to merge this ASAP to make it into 1.2.1. Thanks for the initial draft of this PR. Made my job much easier.

    Author: Tathagata Das <tathagata.das1565@gmail.com>
    Author: jerryshao <saisai.shao@intel.com>

    Closes #4149 from tdas/SPARK-5147 and squashes the following commits:

    730798b [Tathagata Das] Added comments.
    c4cf067 [Tathagata Das] Minor fixes
    2579b27 [Tathagata Das] Refactored the fix to make sure that the cleanup respects the remember duration of all the receiver streams
    2736fd1 [jerryshao] Delete the old WAL log periodically

    (cherry picked from commit 3027f06b4127ab23a43c5ce8cebf721e3b6766e5)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
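The rule that cleanup must respect every receiver stream's remember duration amounts to using the longest duration when computing the deletion threshold. The sketch below shows that arithmetic only; `WalCleanup`, its method names and the millisecond units are assumptions, not the streaming code itself.

```scala
object WalCleanup {
  // All times and durations are in milliseconds (an assumption of this sketch).
  // The threshold uses the LONGEST remember duration, so no stream loses data early.
  def cleanupThreshold(batchTimeMs: Long, rememberDurationsMs: Seq[Long]): Option[Long] =
    if (rememberDurationsMs.isEmpty) None
    else Some(batchTimeMs - rememberDurationsMs.max)

  // Hypothetical selection of log files whose data ends before the threshold.
  def filesToDelete(logFileEndTimesMs: Map[String, Long], thresholdMs: Long): Set[String] =
    logFileEndTimesMs.collect { case (name, end) if end < thresholdMs => name }.toSet
}
```

With a batch time of 100000 ms and remember durations of 10000 ms and 60000 ms, the threshold is 40000 ms; using the shortest duration instead would give 90000 ms and delete data the second stream still needs.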
* [SPARK-5355] make SparkConf thread-safeDavies Liu2015-01-211-2/+3
    SparkConf is not thread-safe, but it is accessed by many threads. getAll() could return only part of the configs if another thread is accessing it.

    This PR changes SparkConf.settings to a thread-safe TrieMap.

    Author: Davies Liu <davies@databricks.com>

    Closes #4143 from davies/safe-conf and squashes the following commits:

    f8fa1cf [Davies Liu] change to TrieMap
    a1d769a [Davies Liu] make SparkConf thread-safe

    (cherry picked from commit 9bad062268676aaa66dcbddd1e0ab7f2d7742425)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
* Make sure only owner can read / write to directories created for the job.Marcelo Vanzin2015-01-216-54/+68
| | | | | | | Whenever a directory is created by the utility method, immediately restrict its permissions so that only the owner has access to its contents. Signed-off-by: Josh Rosen <joshrosen@databricks.com>
* [SPARK-5006][Deploy]spark.port.maxRetries doesn't workWangTaoTheTonic2015-01-2114-32/+34
    https://issues.apache.org/jira/browse/SPARK-5006

    I think the issue was introduced in https://github.com/apache/spark/pull/1777.

    I have not dug into the Mesos backend yet; it may need the same logic as well.

    Author: WangTaoTheTonic <barneystinson@aliyun.com>
    Author: WangTao <barneystinson@aliyun.com>

    Closes #3841 from WangTaoTheTonic/SPARK-5006 and squashes the following commits:

    8cdf96d [WangTao] indent thing
    2d86d65 [WangTaoTheTonic] fix line length
    7cdfd98 [WangTaoTheTonic] fit for new HttpServer constructor
    61a370d [WangTaoTheTonic] some minor fixes
    bc6e1ec [WangTaoTheTonic] rebase
    67bcb46 [WangTaoTheTonic] put conf at 3rd position, modify suite class, add comments
    f450cd1 [WangTaoTheTonic] startServiceOnPort will use a SparkConf arg
    29b751b [WangTaoTheTonic] rebase as ExecutorRunnableUtil changed to ExecutorRunnable
    396c226 [WangTaoTheTonic] make the grammar more like scala
    191face [WangTaoTheTonic] invalid value name
    62ec336 [WangTaoTheTonic] spark.port.maxRetries doesn't work

    Conflicts:
        external/mqtt/src/test/scala/org/apache/spark/streaming/mqtt/MQTTStreamSuite.scala
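The squashed commit "startServiceOnPort will use a SparkConf arg" suggests the retry limit is read from the configuration handed to the helper. A hedged sketch of that pattern follows: `PortRetry` is hypothetical, a plain `Map` stands in for SparkConf, the default of 16 is an assumption, and the real helper's signature and port wrap-around handling are not reproduced.

```scala
import java.net.BindException

object PortRetry {
  // Read the retry limit from the supplied settings; 16 is an assumed default.
  def portMaxRetries(conf: Map[String, String]): Int =
    conf.get("spark.port.maxRetries").map(_.toInt).getOrElse(16)

  // Try startPort, startPort + 1, ... up to maxRetries extra attempts.
  // Port 0 means "any free port" and is never incremented.
  def startServiceOnPort[T](startPort: Int, conf: Map[String, String])(
      start: Int => T): (T, Int) = {
    val maxRetries = portMaxRetries(conf)

    @annotation.tailrec
    def attempt(offset: Int): (T, Int) = {
      val port = if (startPort == 0) 0 else startPort + offset
      val result: Either[BindException, T] =
        try Right(start(port)) catch { case e: BindException => Left(e) }
      result match {
        case Right(service) => (service, port)
        case Left(e) if offset >= maxRetries => throw e
        case Left(_) => attempt(offset + 1)
      }
    }

    attempt(0)
  }
}
```

Because the limit comes from the `conf` argument rather than from a fresh default configuration, a user-supplied `spark.port.maxRetries` takes effect for the service being started.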
* [SPARK-5064][GraphX] Add numEdges upperbound validation for R-MAT graph ↵Kenji Kikushima2015-01-212-0/+16
    generator to prevent infinite loop

    I looked into GraphGenerators#chooseCell and found that chooseCell can't generate more edges than pow(2, (2 * (log2(numVertices)-1))) when making a power-law graph. (Ex. numVertices:4 upperbound:4, numVertices:8 upperbound:16, numVertices:16 upperbound:64)
    If more edges than the upper bound are requested, rmatGraph falls into an infinite loop. So, how about adding an argument validation?

    Author: Kenji Kikushima <kikushima.kenji@lab.ntt.co.jp>

    Closes #3950 from kj-ki/SPARK-5064 and squashes the following commits:

    4ee18c7 [Ankur Dave] Reword error message and add unit test
    d760bc7 [Kenji Kikushima] Add numEdges upperbound validation for R-MAT graph generator to prevent infinite loop.

    (cherry picked from commit 3ee3ab592eee831d759c940eb68231817ad6d083)
    Signed-off-by: Ankur Dave <ankurdave@gmail.com>
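The upper bound quoted in this message, pow(2, 2 * (log2(numVertices) - 1)), can be checked with a short sketch. `RmatBounds` is a hypothetical helper; rounding `numVertices` up to the next power of two is an assumption of the sketch, not something the message states.

```scala
object RmatBounds {
  // ceil(log2(n)) for n >= 2, i.e. log2 after rounding n up to a power of two.
  def roundedLog2(numVertices: Int): Int = {
    require(numVertices >= 2, "numVertices must be at least 2")
    32 - Integer.numberOfLeadingZeros(numVertices - 1)
  }

  // 2^(2 * (log2(numVertices) - 1)), computed with a shift to stay in integer arithmetic.
  def maxEdges(numVertices: Int): Long =
    1L << (2 * (roundedLog2(numVertices) - 1))

  // Reject requests that could never be satisfied instead of looping forever.
  def validate(numVertices: Int, numEdges: Long): Unit =
    require(
      numEdges <= maxEdges(numVertices),
      s"numEdges must be <= ${maxEdges(numVertices)} but was $numEdges")
}
```

This reproduces the three examples from the message: 4 vertices give a bound of 4, 8 give 16, and 16 give 64.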
* [SPARK-4161]Spark shell class path is not correctly set if ↵GuoQiang Li2015-01-211-0/+7
| | | | | | | | | | | | "spark.driver.extraClassPath" is set in defaults.conf Author: GuoQiang Li <witgo@qq.com> Closes #3050 from witgo/SPARK-4161 and squashes the following commits: abb6fa4 [GuoQiang Li] move usejavacp opt to spark-shell 89e39e7 [GuoQiang Li] review commit c2a6f04 [GuoQiang Li] Spark shell class path is not correctly set if "spark.driver.extraClassPath" is set in defaults.conf
* [SPARK-4569] Rename 'externalSorting' in AggregatorIlya Ganelin2015-01-211-4/+6
| | | | | | | | | | | | | Hi all - I've renamed the unhelpfully named variable and added a comment clarifying what's actually happening. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3666 from ilganeli/SPARK-4569B and squashes the following commits: 1810394 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator e2d2092 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator d7cefec [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator 5b3f39c [Ilya Ganelin] [SPARK-4569] Rename in Aggregator
* [SPARK-4759] Fix driver hanging from coalescing partitionsAndrew Or2015-01-212-16/+22
| | | | | | | | | | | | | | | | | | The driver hangs sometimes when we coalesce RDD partitions. See JIRA for more details and reproduction. This is because our use of empty string as default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `""` (empty string). The intended semantics here, however, is that the partition does not have a preferred location, and the TSM should schedule the corresponding task accordingly. Author: Andrew Or <andrew@databricks.com> Closes #3633 from andrewor14/coalesce-preferred-loc and squashes the following commits: e520d6b [Andrew Or] Oops 3ebf8bd [Andrew Or] A few comments f370a4e [Andrew Or] Fix tests 2f7dfb6 [Andrew Or] Avoid using empty string as default preferred location (cherry picked from commit 4f93d0cabe5d1fc7c0fd0a33d992fd85df1fecb4) Signed-off-by: Andrew Or <andrew@databricks.com>
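The difference between an empty string and "no preferred location" can be shown with a small model. `CoalescedPartition` below is a hypothetical case class, not Spark's `CoalescedRDDPartition`, and using `Option[String]` is one assumed way to express the intended semantics.

```scala
// Hypothetical model of a partition with an optional preferred host.
final case class CoalescedPartition(index: Int, preferredLocation: Option[String]) {
  // An absent location becomes an empty list, so a scheduler sees no preference
  // instead of a preference for the host named "".
  def preferredLocations: Seq[String] = preferredLocation.toSeq
}

object PreferredLocationDemo {
  // What an empty-string default looks like to a scheduler: one (bogus) host.
  val emptyStringDefault: Seq[String] = Seq("")

  val noPreference: Seq[String] = CoalescedPartition(0, None).preferredLocations
}
```

`Seq("")` is non-empty, so a scheduler would wait for a host called `""`; the `Option`-based version yields an empty sequence and lets the task run anywhere.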
* [HOTFIX] Update pom.xml to pull MapR's Hadoop version 2.4.1.Kannan Rajah2015-01-201-3/+3
| | | | | | | | | | | Author: Kannan Rajah <rkannan82@gmail.com> Closes #4108 from rkannan82/master and squashes the following commits: eca095b [Kannan Rajah] Update pom.xml to pull MapR's Hadoop version 2.4.1. (cherry picked from commit ec5b0f2cef4b30047c7f88bdc00d10b6aa308124) Signed-off-by: Patrick Wendell <patrick@databricks.com>
* [SPARK-5275] [Streaming] include python source codeDavies Liu2015-01-201-0/+8
| | | | | | | | | | | | | | | | Include the python source code into assembly jar. cc mengxr pwendell Author: Davies Liu <davies@databricks.com> Closes #4128 from davies/build_streaming2 and squashes the following commits: 546af4c [Davies Liu] fix indent 48859b2 [Davies Liu] include python source code (cherry picked from commit bad6c5721167153d7ed834b49f87bf2980c6ed67) Signed-off-by: Patrick Wendell <patrick@databricks.com>
* [SPARK-4959][SQL] Attributes are case sensitive when using a select query ↵Cheng Hao2015-01-202-6/+17
| | | | | | | | | | | | from a projection(Backport to Spark-1.2) This is a follow up of #3796 , which can not be merged back to Spark-1.2. Manually merge it. Author: Cheng Hao <hao.cheng@intel.com> Closes #4013 from chenghao-intel/spark_4959_backport and squashes the following commits: 1f6c93d [Cheng Hao] backport to Spark-1.2
* SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840...Jacek Lewandowski2015-01-201-1/+1
| | | | | | | | | | | | | ... by Piotr Kolaczkowski) Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4113 from jacek-lewandowski/SPARK-4660-master and squashes the following commits: a5e84ca [Jacek Lewandowski] SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840 by Piotr Kolaczkowski) (cherry picked from commit c93a57f0d6dc32b127aa68dbe4092ab0b22a9667) Signed-off-by: Patrick Wendell <patrick@databricks.com>
* [SPARK-4803] [streaming] Remove duplicate RegisterReceiver messageIlayaperumal Gopinathan2015-01-202-9/+2
    - The ReceiverTracker receives `RegisterReceiver` messages twice:
      1) When the actor at `ReceiverSupervisorImpl`'s preStart is invoked
      2) After the receiver is started at the executor `onReceiverStart()` at `ReceiverSupervisorImpl`

    Although the RegisterReceiver message uses the same streamId and the receiverInfo gets updated every time the message is processed at the `ReceiverTracker`, it makes sense to register the receiver only after the receiver is started.

    Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io>

    Closes #3648 from ilayaperumalg/RTActor-remove-prestart and squashes the following commits:

    868efab [Ilayaperumal Gopinathan] Increase receiverInfo collector timeout to 2 secs
    3118e5e [Ilayaperumal Gopinathan] Fix StreamingListenerSuite's startedReceiverStreamIds size
    634abde [Ilayaperumal Gopinathan] Remove duplicate RegisterReceiver message

    (cherry picked from commit 4afad9c7702239f6d5b1b49dc48ee08580964e17)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* [SPARK-4504][Examples] fix run-example failure if multiple assembly jars existVenkata Ramana Gollamudi2015-01-192-18/+36
| | | | | | | | | | | | | | | | | | Fix run-example script to fail fast with useful error message if multiple example assembly JARs are present. Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com> Closes #3377 from gvramana/run-example_fails and squashes the following commits: fa7f481 [Venkata Ramana Gollamudi] Fixed review comments, avoiding ls output scanning. 6aa1ab7 [Venkata Ramana Gollamudi] Fix run-examples script error during multiple jars (cherry picked from commit 74de94ea6db96a04b278c6106264313504d7b8f3) Signed-off-by: Josh Rosen <joshrosen@databricks.com> Conflicts: bin/compute-classpath.sh
* [SPARK-5282][mllib]: RowMatrix easily gets int overflow in the memory size ↵Yuhao Yang2015-01-191-2/+2
| | | | | | | | | | | | | | | | | | warning JIRA: https://issues.apache.org/jira/browse/SPARK-5282 fix the possible int overflow in the memory computation warning Author: Yuhao Yang <hhbyyh@gmail.com> Closes #4069 from hhbyyh/addscStop and squashes the following commits: e54e5c8 [Yuhao Yang] change to MB based number 7afac23 [Yuhao Yang] 5282: fix int overflow in the warning (cherry picked from commit 4432568aac1d4a44fa1a7c3469f095eb7a6ce945) Signed-off-by: Xiangrui Meng <meng@databricks.com>
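The overflow and the "MB based number" fix can be illustrated with plain arithmetic. This is a sketch under assumptions: `GramianMemory` is a hypothetical helper, the estimate is taken to be 8 bytes per entry of a cols x cols matrix, and the exact expression used in RowMatrix is not shown in this log.

```scala
object GramianMemory {
  // Naive byte count in Int arithmetic: wraps around once cols * cols * 8
  // exceeds Int.MaxValue (2147483647).
  def bytesAsInt(cols: Int): Int = cols * cols * 8

  // MB-based estimate in Long arithmetic:
  // 8 bytes per entry / 1,000,000 bytes per MB = 1 / 125000.
  def megabytes(cols: Int): Long = cols.toLong * cols / 125000
}
```

For 20000 columns the true size is 3.2e9 bytes, which does not fit in an Int and wraps to a negative number, while the MB-based form reports 3200.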