* SPARK-1965 [WEBUI] Spark UI throws NPE on trying to load the app page for non-existent app (Sean Owen, 2015-02-28; 1 file, -1/+10)
  Don't throw NPE if appId is unknown. kayousterhout is this a decent enough band-aid for avoiding a full-blown NPE? It should just render empty content instead.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4777 from srowen/SPARK-1965 and squashes the following commits:
  7e16590 [Sean Owen] Update app not found message
  cb878d6 [Sean Owen] Return basic "not found" page for unknown appId
  d8270da [Sean Owen] Don't throw NPE if appId is unknown
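  A hypothetical sketch of the band-aid's shape (registry and names invented, not Spark's actual UI code): resolve the appId safely and fall back to a basic "not found" page instead of letting a null flow into rendering.

```scala
// Hypothetical sketch (invented names): look the app up safely and fall back
// to a "not found" page instead of dereferencing a missing entry.
val apps = Map("app-20150228-0001" -> "My App") // stand-in for the app registry

def renderAppPage(appId: String): String = apps.get(appId) match {
  case Some(name) => s"<h1>$name</h1>"                // normal rendering
  case None       => s"Application $appId not found." // basic content, no NPE
}
```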
* SPARK-5983 [WEBUI] Don't respond to HTTP TRACE in HTTP-based UIs (Sean Owen, 2015-02-28; 2 files, -0/+12)
  Disallow TRACE HTTP method in servlets
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4765 from srowen/SPARK-5983 and squashes the following commits:
  421b25b [Sean Owen] Disallow TRACE HTTP method in servlets
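  A minimal sketch of the idea, not Spark's actual servlet code: intercept dispatch in a javax.servlet handler and answer TRACE with 405 Method Not Allowed while letting everything else through.

```scala
import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

// Sketch: reject HTTP TRACE before normal dispatch; all other methods proceed.
class NoTraceServlet extends HttpServlet {
  override protected def service(req: HttpServletRequest, res: HttpServletResponse): Unit = {
    if ("TRACE".equalsIgnoreCase(req.getMethod)) {
      res.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED)
    } else {
      super.service(req, res)
    }
  }
}
```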
* SPARK-6063 MLlib doesn't pass mvn scalastyle check due to UTF chars in LDAModel.scala (Michael Griffiths, 2015-02-28; 1 file, -1/+1)
  Remove unicode characters from MLlib file.
  Author: Michael Griffiths <msjgriffiths@gmail.com>
  Author: Griffiths, Michael (NYC-RPM) <michael.griffiths@reprisemedia.com>
  Closes #4815 from msjgriffiths/SPARK-6063 and squashes the following commits:
  bcd7de1 [Griffiths, Michael (NYC-RPM)] Change \u201D quote marks around 'theta' to standard single apostrophe (\x27)
  38eb535 [Michael Griffiths] Merge pull request #2 from apache/master
  b08e865 [Michael Griffiths] Merge pull request #1 from apache/master
* [SPARK-5775] [SQL] BugFix: GenericRow cannot be cast to SpecificMutableRow when nested data and partitioned table (Cheng Lian, 2015-02-28; 3 files, -24/+217)
  This PR adapts anselmevignon's #4697 to master and branch-1.3. Please refer to the PR description of #4697 for details.
  Author: Cheng Lian <lian@databricks.com>
  Author: Cheng Lian <liancheng@users.noreply.github.com>
  Author: Yin Huai <yhuai@databricks.com>
  Closes #4792 from liancheng/spark-5775 and squashes the following commits:
  538f506 [Cheng Lian] Addresses comments
  cee55cf [Cheng Lian] Merge pull request #4 from yhuai/spark-5775-yin
  b0b74fb [Yin Huai] Remove runtime pattern matching.
  ca6e038 [Cheng Lian] Fixes SPARK-5775
* MAINTENANCE: Automated closing of pull requests. (Patrick Wendell, 2015-02-27; 0 files, -0/+0)
  This commit exists to close the following pull requests on Github:
  Closes #1128 (close requested by 'srowen')
  Closes #3425 (close requested by 'srowen')
  Closes #4770 (close requested by 'srowen')
  Closes #2813 (close requested by 'srowen')
* [SPARK-5979][SPARK-6032] Smaller safer --packages fix (Burak Yavuz, 2015-02-27; 2 files, -18/+51)
  pwendell tdas This is the safer part of PR #4754:
  - SPARK-5979: All dependencies with the groupId `org.apache.spark` passed through `--packages` were being excluded from the dependency tree on the assumption that they would be in the assembly jar. This is not the case, therefore the exclusion rules had to be defined more explicitly.
  - SPARK-6032: Ivy prints a whole lot of logs while retrieving dependencies. These were printed to `System.out`. Moved the logging to `System.err`.
  Author: Burak Yavuz <brkyvz@gmail.com>
  Closes #4802 from brkyvz/simple-streaming-fix and squashes the following commits:
  e0f38cb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into simple-streaming-fix
  bad921c [Burak Yavuz] [SPARK-5979][SPARK-6032] Smaller safer fix
* [SPARK-6070] [yarn] Remove unneeded classes from shuffle service jar. (Marcelo Vanzin, 2015-02-27; 1 file, -1/+2)
  These may conflict with the classes already in the NM. We shouldn't be repackaging them.
  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #4820 from vanzin/SPARK-6070 and squashes the following commits:
  871b566 [Marcelo Vanzin] The "d'oh how didn't I think of it before" solution.
  3cba946 [Marcelo Vanzin] Use profile instead, so that dependencies don't need to be explicitly listed.
  7a18a1b [Marcelo Vanzin] [SPARK-6070] [yarn] Remove unneeded classes from shuffle service jar.
* [SPARK-6055] [PySpark] fix incorrect __eq__ of DataType (Davies Liu, 2015-02-27; 4 files, -137/+86)
  The `__eq__` of DataType is not correct, so the class cache is not used correctly (a created class cannot be found by its dataType), and it will create lots of classes (saved in `_cached_cls`) that are never released. Also, all instances of the same DataType have the same hash code, so many objects land in a dict under the same hash, degenerating into hash collisions; accessing this dict becomes very slow (depending on the implementation of CPython). This PR also improves the performance of inferSchema (avoiding unnecessary converters of objects). cc pwendell JoshRosen
  Author: Davies Liu <davies@databricks.com>
  Closes #4808 from davies/leak and squashes the following commits:
  6a322a4 [Davies Liu] tests refactor
  3da44fc [Davies Liu] fix __eq__ of Singleton
  534ac90 [Davies Liu] add more checks
  46999dc [Davies Liu] fix tests
  d9ae973 [Davies Liu] fix memory leak in sql
* [SPARK-5751] [SQL] Sets SPARK_HOME as SPARK_PID_DIR when running Thrift server test suites (Cheng Lian, 2015-02-28; 1 file, -2/+11)
  This is a follow-up of #4720. By default, `spark-daemon.sh` writes PID files under `/tmp`, which makes it impossible to start multiple server instances simultaneously. This PR sets `SPARK_PID_DIR` to the Spark home directory to work around this problem. Many thanks to chenghao-intel for pointing out this issue!
  Author: Cheng Lian <lian@databricks.com>
  Closes #4758 from liancheng/thriftserver-pid-dir and squashes the following commits:
  252fa0f [Cheng Lian] Uses temporary directory as Thrift server PID directory
  1b3d1e3 [Cheng Lian] Sets SPARK_HOME as SPARK_PID_DIR when running Thrift server test suites
* [Streaming][Minor] Remove useless type signature of Java Kafka direct stream API (Saisai Shao, 2015-02-27; 1 file, -1/+1)
  cc tdas.
  Author: Saisai Shao <saisai.shao@intel.com>
  Closes #4817 from jerryshao/signature-minor-fix and squashes the following commits:
  eebfaac [Saisai Shao] Remove useless type parameter
* [SPARK-4587] [mllib] [docs] Fixed save,load calls in ML guide examples (Joseph K. Bradley, 2015-02-27; 5 files, -40/+60)
  Should pass the spark context to save/load. CC: mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #4816 from jkbradley/ml-io-doc-fix and squashes the following commits:
  83d369d [Joseph K. Bradley] added comment to save,load parts of ML guide examples
  2841170 [Joseph K. Bradley] Fixed save,load calls in ML guide examples
* [SPARK-6059][Yarn] Add volatile to ApplicationMaster's reporterThread and allocator (zsxwing, 2015-02-27; 1 file, -2/+2)
  `ApplicationMaster.reporterThread` and `ApplicationMaster.allocator` are accessed in multiple threads, so they should be marked as `volatile`.
  Author: zsxwing <zsxwing@gmail.com>
  Closes #4814 from zsxwing/SPARK-6059 and squashes the following commits:
  17d9386 [zsxwing] Add volatile to ApplicationMaster's reporterThread and allocator
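  A minimal sketch of the principle (field names echo the commit, class shape invented): a field written by one thread and read by another must be `@volatile`, or the JVM memory model does not guarantee the write ever becomes visible to the reader.

```scala
// Sketch: without @volatile, a write from the reporter thread may never become
// visible to the main AM thread under the JVM memory model.
class AppMasterSketch {
  @volatile private var reporterThread: Thread = _
  @volatile private var allocatorReady: Boolean = false

  def start(): Unit = {
    reporterThread = new Thread(new Runnable {
      def run(): Unit = { allocatorReady = true } // written here, read elsewhere
    })
    reporterThread.start()
  }
}
```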
* [SPARK-6058][Yarn] Log the user class exception in ApplicationMaster (zsxwing, 2015-02-27; 1 file, -2/+1)
  Because ApplicationMaster doesn't set SparkUncaughtExceptionHandler, the exception in the user class won't be logged. This PR added a `logError` for it.
  Author: zsxwing <zsxwing@gmail.com>
  Closes #4813 from zsxwing/SPARK-6058 and squashes the following commits:
  806c932 [zsxwing] Log the user class exception
* [SPARK-6036][CORE] avoid race condition between eventlogListener and akka actor system (Zhang, Liye, 2015-02-26; 1 file, -3/+3)
  For a detailed description, please refer to [SPARK-6036](https://issues.apache.org/jira/browse/SPARK-6036).
  Author: Zhang, Liye <liye.zhang@intel.com>
  Closes #4785 from liyezhang556520/EventLogInProcess and squashes the following commits:
  8b0b0a6 [Zhang, Liye] stop listener after DAGScheduler
  79b15b3 [Zhang, Liye] SPARK-6036 avoid race condition between eventlogListener and akka actor system
* fix spark-6033, clarify the spark.worker.cleanup behavior in standalone mode (许鹏, 2015-02-26; 1 file, -2/+1)
  JIRA case SPARK-6033: https://issues.apache.org/jira/browse/SPARK-6033
  In standalone deploy mode, the cleanup will only remove the stopped application's directories. The original description of the cleanup behavior was incorrect.
  Author: 许鹏 <peng.xu@fraudmetrix.cn>
  Closes #4803 from hseagle/spark-6033 and squashes the following commits:
  927a6a0 [许鹏] fix the incorrect description about the spark.worker.cleanup in standalone mode
* [SPARK-6046] Privatize SparkConf.translateConfKey (Andrew Or, 2015-02-26; 2 files, -3/+3)
  The warning for deprecated configs is actually issued when the configs are set, not when they are read. As a result we don't need to explicitly call `translateConfKey` outside of `SparkConf` just to print the warning again in vain.
  Author: Andrew Or <andrew@databricks.com>
  Closes #4797 from andrewor14/warn-deprecated-config and squashes the following commits:
  8fb43e6 [Andrew Or] Privatize SparkConf.translateConfKey
* SPARK-2168 [Spark core] Use relative URIs for the app links in the History Server. (Lukasz Jastrzebski, 2015-02-26; 1 file, -0/+56)
  As agreed in PR #1160, this adds a test to verify that the history server generates relative links to applications.
  Author: Lukasz Jastrzebski <lukasz.jastrzebski@gmail.com>
  Closes #4778 from elyast/master and squashes the following commits:
  0c07fab [Lukasz Jastrzebski] Incorporating comments for SPARK-2168
  6d7866d [Lukasz Jastrzebski] Adjusting test for SPARK-2168 for master branch
  d6f4fbe [Lukasz Jastrzebski] Added test for SPARK-2168
* [SPARK-5495][UI] Add app and driver kill function in master web UI (jerryshao, 2015-02-26; 3 files, -5/+58)
  Add application kill function in master web UI for standalone mode. Details can be seen in [SPARK-5495](https://issues.apache.org/jira/browse/SPARK-5495). The snapshot of UI shows as below:
  ![snapshot](https://dl.dropboxusercontent.com/u/19230832/master_ui.png)
  Please help to review, thanks a lot.
  Author: jerryshao <saisai.shao@intel.com>
  Closes #4288 from jerryshao/SPARK-5495 and squashes the following commits:
  fa3e486 [jerryshao] Add some conditions
  9a7be93 [jerryshao] Add kill Driver function
  a239776 [jerryshao] Change the code format
  ff5195d [jerryshao] Add app kill function in master web UI
* [SPARK-5771][UI][hotfix] Change Requested Cores into * if default cores is not set (jerryshao, 2015-02-26; 1 file, -1/+1)
  cc andrewor14, srowen.
  Author: jerryshao <saisai.shao@intel.com>
  Closes #4800 from jerryshao/SPARK-5771 and squashes the following commits:
  a2483c2 [jerryshao] Change the UI of Requested Cores into * if default cores is not set
* [SPARK-6024][SQL] When a data source table has too many columns, its schema cannot be stored in metastore. (Yin Huai, 2015-02-26; 3 files, -6/+54)
  JIRA: https://issues.apache.org/jira/browse/SPARK-6024
  Author: Yin Huai <yhuai@databricks.com>
  Closes #4795 from yhuai/wideSchema and squashes the following commits:
  4882e6f [Yin Huai] Address comments.
  73e71b4 [Yin Huai] Address comments.
  143927a [Yin Huai] Simplify code.
  cc1d472 [Yin Huai] Make the schema wider.
  12bacae [Yin Huai] If the JSON string of a schema is too large, split it before storing it in metastore.
  e9b4f70 [Yin Huai] Failed test.
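  A hypothetical sketch of the approach named in the squash list (threshold and property layout invented): break the schema's JSON string into fixed-size pieces, each small enough for a metastore property value, and concatenate them in order on read.

```scala
// Sketch: split an over-long schema JSON into chunks that fit the metastore's
// per-value length limit, and restore it by concatenation.
val threshold = 4000 // assumed limit, for illustration only

def splitSchemaJson(json: String): Seq[String] = json.grouped(threshold).toSeq

val schemaJson = "{\"type\":\"struct\",\"fields\":[...]}" * 500 // pretend it's huge
val parts = splitSchemaJson(schemaJson) // store each part as its own property
val restored = parts.mkString           // reading back in order restores the schema
assert(restored == schemaJson)
```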
* [SPARK-6037][SQL] Avoiding duplicate Parquet schema merging (Liang-Chi Hsieh, 2015-02-27; 1 file, -16/+7)
  `FilteringParquetRowInputFormat` manually merges Parquet schemas before computing splits. However, this is duplicated work, because the schemas are already merged in `ParquetRelation2`. We don't need to re-merge them in the `InputFormat`.
  Author: Liang-Chi Hsieh <viirya@gmail.com>
  Closes #4786 from viirya/dup_parquet_schemas_merge and squashes the following commits:
  ef78a5a [Liang-Chi Hsieh] Avoiding duplicate Parquet schema merging.
* [SPARK-5529][CORE]Add expireDeadHosts in HeartbeatReceiver (Hong Shen, 2015-02-26; 7 files, -49/+79)
  If a BlockManager has not sent a heartbeat for more than 120s, BlockManagerMasterActor will remove it. But CoarseGrainedSchedulerBackend can only remove an executor after a DisassociatedEvent. We should expire dead hosts in HeartbeatReceiver.
  Author: Hong Shen <hongshen@tencent.com>
  Closes #4363 from shenh062326/my_change3 and squashes the following commits:
  2c9a46a [Hong Shen] Change some code style.
  1a042ff [Hong Shen] Change some code style.
  2dc456e [Hong Shen] Change some code style.
  d221493 [Hong Shen] Fix test failed
  7448ac6 [Hong Shen] A minor change in sparkContext and heartbeatReceiver
  b904aed [Hong Shen] Fix failed test
  52725af [Hong Shen] Remove assert in SparkContext.killExecutors
  5bedcb8 [Hong Shen] Remove assert in SparkContext.killExecutors
  a858fb5 [Hong Shen] A minor change in HeartbeatReceiver
  3e221d9 [Hong Shen] A minor change in HeartbeatReceiver
  6bab7aa [Hong Shen] Change a code style.
  07952f3 [Hong Shen] Change configs name and code style.
  ce9257e [Hong Shen] Fix test failed
  bccd515 [Hong Shen] Fix test failed
  8e77408 [Hong Shen] Fix test failed
  c1dfda1 [Hong Shen] Fix test failed
  e197e20 [Hong Shen] Fix test failed
  fb5df97 [Hong Shen] Remove ExpireDeadHosts in BlockManagerMessages
  b5c0441 [Hong Shen] Remove expireDeadHosts in BlockManagerMasterActor
  c922cb0 [Hong Shen] Add expireDeadHosts in HeartbeatReceiver
* SPARK-4579 [WEBUI] Scheduling Delay appears negative (Sean Owen, 2015-02-26; 1 file, -6/+7)
  Ensure scheduler delay handles the unfinished task case, and ensure delay is never negative even due to rounding.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4796 from srowen/SPARK-4579 and squashes the following commits:
  ad6713c [Sean Owen] Ensure scheduler delay handles unfinished task case, and ensure delay is never negative even due to rounding
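  A tiny sketch of the guard (field names and numbers illustrative): scheduler delay is derived by subtracting the accounted-for phases from the total duration, so millisecond rounding can push it below zero unless it is clamped.

```scala
// Sketch: scheduler delay = total duration minus the accounted-for phases,
// clamped so rounding error can never yield a negative value.
val duration = 1000L        // ms; illustrative numbers
val runTime = 980L
val deserializeTime = 15L
val serializeTime = 10L
val schedulerDelay = math.max(0L, duration - runTime - deserializeTime - serializeTime)
// 1000 - 980 - 15 - 10 = -5 due to rounding; the clamp reports 0 instead
```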
* SPARK-6045 RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset (tedyu, 2015-02-26; 1 file, -0/+1)
  Author: tedyu <yuzhihong@gmail.com>
  Closes #4794 from tedyu/master and squashes the following commits:
  2632a57 [tedyu] SPARK-6045 RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset
  2d8d4b1 [tedyu] SPARK-6045 RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset
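  A minimal sketch of the pattern (using java.io.Writer as a stand-in for Hadoop's RecordWriter): only close what was actually created, so a failure before the writer exists doesn't stack an NPE on top of the real error.

```scala
import java.io.{StringWriter, Writer}

// Sketch: if creating the writer throws, `writer` stays null; the finally
// block must check before closing, or the NPE masks the original failure.
var writer: Writer = null
try {
  writer = new StringWriter()
  writer.write("record")
} finally {
  if (writer != null) {
    writer.close()
  }
}
```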
* [SPARK-5951][YARN] Remove unreachable driver memory properties in yarn client mode (mohit.goyal, 2015-02-26; 1 file, -6/+0)
  Remove unreachable driver memory properties in yarn client mode.
  Author: mohit.goyal <mohit.goyal@guavus.com>
  Closes #4730 from zuxqoj/master and squashes the following commits:
  977dc96 [mohit.goyal] remove not rechable deprecated variables in yarn client mode
* Add a note for context termination for History server on Yarn (moussa taifi, 2015-02-26; 1 file, -0/+2)
  The history server on Yarn only shows completed jobs. This adds a note about the explicit context termination needed at the end of a Spark job, which is a best practice anyway. Related to SPARK-2972 and SPARK-3458.
  Author: moussa taifi <moutai10@gmail.com>
  Closes #4721 from moutai/add-history-server-note-for-closing-the-spark-context and squashes the following commits:
  9f5b6c3 [moussa taifi] Fix upper case typo for YARN
  3ad3db4 [moussa taifi] Add context termination for History server on Yarn
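  A sketch of the practice this note documents (app name invented): stop the context explicitly so YARN's history server sees the job as completed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: without the explicit stop(), the YARN history server may treat the
// application as incomplete and not show it.
val sc = new SparkContext(new SparkConf().setAppName("history-server-example"))
try {
  sc.parallelize(1 to 100).count()
} finally {
  sc.stop() // explicit context termination at the end of the job
}
```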
* SPARK-4300 [CORE] Race condition during SparkWorker shutdown (Sean Owen, 2015-02-26; 1 file, -2/+1)
  Close the appender saving stdout/stderr before destroying the process, to avoid an exception on reading a closed input stream. (This also removes a redundant `waitFor()`, although it was harmless.) CC tdas since I think you wrote this method.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4787 from srowen/SPARK-4300 and squashes the following commits:
  e0cdabf [Sean Owen] Close appender saving stdout/stderr before destroying process to avoid exception on reading closed input stream
* [SPARK-6018] [YARN] NoSuchMethodError in Spark app is swallowed by YARN AM (Cheolsoo Park, 2015-02-26; 1 file, -3/+3)
  Author: Cheolsoo Park <cheolsoop@netflix.com>
  Closes #4773 from piaozhexiu/SPARK-6018 and squashes the following commits:
  2a919d5 [Cheolsoo Park] Rename e with cause to avoid duplicate names
  1e71d2d [Cheolsoo Park] Replace placeholder with throwable
  eb5750d [Cheolsoo Park] NoSuchMethodError in Spark app is swallowed by YARN AM
* [SPARK-6027][SPARK-5546] Fixed --jar and --packages not working for KafkaUtils and improved error message (Tathagata Das, 2015-02-26; 2 files, -16/+55)
  The problem with SPARK-6027, in short, is that JARs like the kafka-assembly.jar do not work in Python, as the added JAR is not visible to the classloader used by Py4J. Py4J uses Class.forName(), which does not use the system classloader, but the JARs are only visible in the thread's context classloader. So this fix uses the context classloader to create the KafkaUtils dstream object. This works for both cases where the Kafka libraries are added with --jars spark-streaming-kafka-assembly.jar or with --packages spark-streaming-kafka. Also improves the error message. davies
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #4779 from tdas/kafka-python-fix and squashes the following commits:
  fb16b04 [Tathagata Das] Removed import
  c1fdf35 [Tathagata Das] Fixed long line and improved documentation
  7b88be8 [Tathagata Das] Fixed --jar not working for KafkaUtils and improved error message
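  A minimal sketch of the classloader point (class name illustrative, not necessarily the one Spark loads): the three-argument `Class.forName` resolves through the thread's context classloader, which is the loader that sees JARs added with --jars.

```scala
// Sketch: the default Class.forName(name) uses the caller's classloader, which
// does not see dynamically added JARs; passing the context classloader does.
val className = "org.apache.spark.streaming.kafka.KafkaUtils" // illustrative name
val loader = Thread.currentThread().getContextClassLoader
val cls = Class.forName(className, true, loader) // initialize = true
```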
* [SPARK-3562]Periodic cleanup event logs (xukun 00228947, 2015-02-26; 3 files, -35/+110)
  Author: xukun 00228947 <xukun.xu@huawei.com>
  Closes #4214 from viper-kun/cleaneventlog and squashes the following commits:
  7a5b9c5 [xukun 00228947] fix issue
  31674ee [xukun 00228947] fix issue
  6e3d06b [xukun 00228947] fix issue
  373f3b9 [xukun 00228947] fix issue
  71782b5 [xukun 00228947] fix issue
  5b45035 [xukun 00228947] fix issue
  70c28d6 [xukun 00228947] fix issues
  adcfe86 [xukun 00228947] Periodic cleanup event logs
* Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs. (Li Zhihui, 2015-02-26; 1 file, -1/+1)
  The configuration is not supported in mesos mode now. See https://github.com/apache/spark/pull/1462
  Author: Li Zhihui <zhihui.li@intel.com>
  Closes #4781 from li-zhihui/fixdocconf and squashes the following commits:
  63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs.
* SPARK-4704 [CORE] SparkSubmitDriverBootstrap doesn't flush output (Sean Owen, 2015-02-26; 1 file, -2/+2)
  Join on output threads to make sure any lingering output from the process reaches stdout and stderr before exiting. CC andrewor14, since I believe he created this section of code.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4788 from srowen/SPARK-4704 and squashes the following commits:
  ad7114e [Sean Owen] Join on output threads to make sure any lingering output from process reaches stdout, stderr before exiting
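  A sketch of the race being closed (command invented): `waitFor()` alone doesn't guarantee the copier thread has drained the child's stdout; join it before exiting, or buffered output can be lost.

```scala
import scala.io.Source

// Sketch: join the thread that copies the child's output so nothing buffered
// is dropped when the JVM exits right after waitFor().
val process = new ProcessBuilder("echo", "hello").start()
val copier = new Thread(new Runnable {
  def run(): Unit =
    Source.fromInputStream(process.getInputStream).getLines().foreach(println)
})
copier.start()
val exitCode = process.waitFor()
copier.join() // lingering output reaches stdout before we exit
```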
* [SPARK-5363] Fix bug in PythonRDD: remove() inside iterator is not safe (Davies Liu, 2015-02-26; 1 file, -7/+6)
  Removing elements from a mutable HashSet while iterating over it can cause the iteration to incorrectly skip over entries that were not removed. If this happened, PythonRDD would write fewer broadcast variables than the Python worker was expecting to read, which would cause the Python worker to hang indefinitely.
  Author: Davies Liu <davies@databricks.com>
  Closes #4776 from davies/fix_hang and squashes the following commits:
  a4384a5 [Davies Liu] fix bug: remvoe() inside iterator is not safe
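  A minimal sketch of the bug class (toy data): mutating a scala mutable HashSet inside its own iteration may silently skip elements; snapshot what to remove first, then mutate.

```scala
import scala.collection.mutable

val set = mutable.HashSet(1 to 100: _*)

// Unsafe: removing during iteration may skip entries (here, that meant writing
// fewer broadcasts than the Python worker expected):
//   set.foreach(x => if (x % 2 == 0) set.remove(x))

// Safe: decide what to remove first, then mutate.
val toRemove = set.filter(_ % 2 == 0).toList
toRemove.foreach(set.remove)
assert(set.forall(_ % 2 == 1))
```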
* [SPARK-6004][MLlib] Pick the best model when training GradientBoostedTrees with validation (Liang-Chi Hsieh, 2015-02-26; 1 file, -3/+9)
  Since the validation error does not change monotonically, in practice it is better to pick the best model seen while training GradientBoostedTrees with validation, instead of stopping early.
  Author: Liang-Chi Hsieh <viirya@gmail.com>
  Closes #4763 from viirya/gbt_record_model and squashes the following commits:
  452e049 [Liang-Chi Hsieh] Address comment.
  ea2fae2 [Liang-Chi Hsieh] Pick the best model when training GradientBoostedTrees with validation.
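  A tiny sketch of the selection rule (error values invented): because validation error is non-monotonic, track the error per boosting round and keep the round with the minimum, rather than stopping at the first uptick.

```scala
// Sketch: validation error dips, rises, then dips lower; early stopping at the
// first rise (round 2) would miss the best model at round 3.
val validationErrors = Array(0.31, 0.27, 0.29, 0.25, 0.26) // one per boosting round
val bestRound = validationErrors.zipWithIndex.minBy(_._1)._2
// keep the ensemble truncated to bestRound + 1 trees
println(s"best round: $bestRound, error: ${validationErrors(bestRound)}")
```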
* [SPARK-6007][SQL] Add numRows param in DataFrame.show() (Jacky Li, 2015-02-26; 4 files, -6/+27)
  It is useful to let the user decide the number of rows to show in DataFrame.show.
  Author: Jacky Li <jacky.likun@huawei.com>
  Closes #4767 from jackylk/show and squashes the following commits:
  a0e0f4b [Jacky Li] fix testcase
  7cdbe91 [Jacky Li] modify according to comment
  bb54537 [Jacky Li] for Java compatibility
  d7acc18 [Jacky Li] modify according to comments
  981be52 [Jacky Li] add numRows param in DataFrame.show()
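  A usage sketch against the 1.3-era DataFrame API (data invented; assumes an existing SparkContext `sc`):

```scala
import org.apache.spark.sql.SQLContext

// Sketch: show() keeps its default row count; show(n) prints only the first n rows.
val sqlContext = new SQLContext(sc) // assumes an existing SparkContext `sc`
val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"), (3, "c"))).toDF("id", "name")
df.show(2) // the new numRows parameter: prints only the first 2 rows
df.show()  // unchanged default behavior
```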
* [SPARK-5801] [core] Avoid creating nested directories. (Marcelo Vanzin, 2015-02-26; 4 files, -5/+32)
  Cache the value of the local root dirs to use for storing local data, so that the same directories are reused. Also, to avoid an extra level of nesting, use a different env variable to propagate the local dirs from the Worker to the executors. And make the executor directory use a different name.
  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #4747 from vanzin/SPARK-5801 and squashes the following commits:
  e0114e1 [Marcelo Vanzin] Update unit test.
  18ee0a7 [Marcelo Vanzin] [SPARK-5801] [core] Avoid creating nested directories.
* [SPARK-6016][SQL] Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true (Yin Huai, 2015-02-27; 3 files, -42/+42)
  Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.
  Author: Yin Huai <yhuai@databricks.com>
  Closes #4775 from yhuai/parquetFooterCache and squashes the following commits:
  78787b1 [Yin Huai] Remove footerCache in FilteringParquetRowInputFormat.
  dff6fba [Yin Huai] Failed unit test.
* [SPARK-6023][SQL] ParquetConversions fails to replace the destination MetastoreRelation of an InsertIntoTable node to ParquetRelation2 (Yin Huai, 2015-02-26; 2 files, -7/+152)
  JIRA: https://issues.apache.org/jira/browse/SPARK-6023
  Author: Yin Huai <yhuai@databricks.com>
  Closes #4782 from yhuai/parquetInsertInto and squashes the following commits:
  ae7e806 [Yin Huai] Convert MetastoreRelation in InsertIntoTable and InsertIntoHiveTable.
  ba543cd [Yin Huai] More tests.
  50b6d0f [Yin Huai] Update error messages.
  346780c [Yin Huai] Failed test.
* [SPARK-5914] to run spark-submit requiring only user perm on windows (Judy Nash, 2015-02-26; 1 file, -0/+6)
  Because Windows by default does not grant read permission on jars except to administrators, spark-submit would fail with a "ClassNotFound" exception if a user runs the slave service with only user permission. This fix adds read permission for the owner of the jar (which would be the slave service account on Windows).
  Author: Judy Nash <judynash@microsoft.com>
  Closes #4742 from judynash/SPARK-5914 and squashes the following commits:
  e288e56 [Judy Nash] Fix spacing and refactor code
  1de3c0e [Judy Nash] [SPARK-5914] Enable spark-submit to run requiring only user permission on windows
* [SPARK-5976][MLLIB] Add partitioner to factors returned by ALS (Xiangrui Meng, 2015-02-25; 2 files, -23/+64)
  The model trained by ALS requires partitioning information to do a quick lookup of a user/item factor when making recommendations for individual requests. In the new implementation, we didn't set partitioners on the factors returned by ALS, which would cause a performance regression. srowen coderxiang
  Author: Xiangrui Meng <meng@databricks.com>
  Closes #4748 from mengxr/SPARK-5976 and squashes the following commits:
  9373a09 [Xiangrui Meng] add partitioner to factors returned by ALS
  260f183 [Xiangrui Meng] add a test for partitioner
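  A sketch of why the partitioner matters (types simplified, not MLlib's actual code): with an explicit partitioner on the (id, factor) pairs, a lookup of one user's factor can route to a single partition instead of scanning all of them.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// Sketch: attach a HashPartitioner so per-id lookup hits the owning partition.
def withPartitioner(factors: RDD[(Int, Array[Float])], parts: Int): RDD[(Int, Array[Float])] =
  factors.partitionBy(new HashPartitioner(parts))

// factors.lookup(userId) then scans one partition rather than all of them.
```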
* [SPARK-5974] [SPARK-5980] [mllib] [python] [docs] Update ML guide with save/load, Python GBT (Joseph K. Bradley, 2015-02-25; 9 files, -97/+296)
  * Add GradientBoostedTrees Python examples to ML guide (I ran these in the pyspark shell, and they worked.)
  * Add save/load to examples in ML guide
  * Added note to Python docs about predict/transform not working within RDD actions/transformations in some cases (see SPARK-5981)
  CC: mengxr
  Author: Joseph K. Bradley <joseph@databricks.com>
  Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:
  c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
  bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide. Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
  6d81c3e [Joseph K. Bradley] completed python GBT examples
  9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
  c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide. Added GBT examples to ML guide
* [SPARK-1182][Docs] Sort the configuration parameters in configuration.md (Brennon York, 2015-02-25; 2 files, -512/+496)
  Sorts all configuration options present on the `configuration.md` page to ease readability.
  Author: Brennon York <brennon.york@capitalone.com>
  Closes #3863 from brennonyork/SPARK-1182 and squashes the following commits:
  5696f21 [Brennon York] fixed merge conflict with port comments
  81a7b10 [Brennon York] capitalized A in Allocation
  e240486 [Brennon York] moved all spark.mesos properties into the running-on-mesos doc
  7de5f75 [Brennon York] moved serialization from application to compression and serialization section
  a16fec0 [Brennon York] moved shuffle settings from network to shuffle
  f8fa286 [Brennon York] sorted encryption category
  1023f15 [Brennon York] moved initialExecutors
  e9d62aa [Brennon York] fixed akka.heartbeat.interval
  25e6f6f [Brennon York] moved spark.executer.user*
  4625ade [Brennon York] added spark.executor.extra* items
  4ee5648 [Brennon York] fixed merge conflicts
  1b49234 [Brennon York] sorting mishap
  2b5758b [Brennon York] sorting mishap
  6fbdf42 [Brennon York] sorting mishap
  55dc6f8 [Brennon York] sorted security
  ec34294 [Brennon York] sorted dynamic allocation
  2a7c4a3 [Brennon York] sorted scheduling
  aa9acdc [Brennon York] sorted networking
  a4380b8 [Brennon York] sorted execution behavior
  27f3919 [Brennon York] sorted compression and serialization
  80a5bbb [Brennon York] sorted spark ui
  3f32e5b [Brennon York] sorted shuffle behavior
  6c51b38 [Brennon York] sorted runtime environment
  efe9d6f [Brennon York] sorted application properties
* [SPARK-5926] [SQL] make DataFrame.explain leverage queryExecution.logical (Yanbo Liang, 2015-02-25; 1 file, -1/+1)
  DataFrame.explain returns the wrong result when the query is a DDL command. For example, the following two queries should print out the same execution plan, but they do not:
  sql("create table tb as select * from src where key > 490").explain(true)
  sql("explain extended create table tb as select * from src where key > 490")
  This is because DataFrame.explain leverages logicalPlan, which has already been forced to execute; we should use the unexecuted plan queryExecution.logical.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #4707 from yanboliang/spark-5926 and squashes the following commits:
  fa6db63 [Yanbo Liang] logicalPlan is not lazy
  0e40a1b [Yanbo Liang] make DataFrame.explain leverage queryExecution.logical
* [SPARK-5999][SQL] Remove duplicate Literal matching block (Liang-Chi Hsieh, 2015-02-25; 2 files, -19/+3)
  Author: Liang-Chi Hsieh <viirya@gmail.com>
  Closes #4760 from viirya/dup_literal and squashes the following commits:
  06e7516 [Liang-Chi Hsieh] Remove duplicate Literal matching block.
* [SPARK-6010] [SQL] Merging compatible Parquet schemas before computing splits (Cheng Lian, 2015-02-25; 3 files, -1/+45)
  `ReadContext.init` calls `InitContext.getMergedKeyValueMetadata`, which doesn't know how to merge conflicting user-defined key-value metadata and throws an exception. In our case, when dealing with different but compatible schemas, we have different Spark SQL schema JSON strings in different Parquet part-files, which causes this problem. Reading similar Parquet files generated by Hive doesn't suffer from this issue. In this PR, we manually merge the schemas before passing them to `ReadContext` to avoid the exception.
  Author: Cheng Lian <lian@databricks.com>
  Closes #4768 from liancheng/spark-6010 and squashes the following commits:
  9002f0a [Cheng Lian] Fixes SPARK-6010
* [SPARK-5944] [PySpark] fix version in Python API docs (Davies Liu, 2015-02-25; 4 files, -5/+9)
  Use RELEASE_VERSION when building the Python API docs.
  Author: Davies Liu <davies@databricks.com>
  Closes #4731 from davies/api_version and squashes the following commits:
  c9744c9 [Davies Liu] Update create-release.sh
  08cbc3f [Davies Liu] fix python docs
* [SPARK-5982] Remove incorrect Local Read Time Metric (Kay Ousterhout, 2015-02-25; 5 files, -16/+0)
  This metric is incomplete, because the files are memory mapped, so much of the read from disk occurs later, as tasks actually read the file's data. This should be merged into 1.3, so that we never expose this incorrect metric to users. CC pwendell ksakellis sryza
  Author: Kay Ousterhout <kayousterhout@gmail.com>
  Closes #4749 from kayousterhout/SPARK-5982 and squashes the following commits:
  9737b5e [Kay Ousterhout] More fixes
  a1eb300 [Kay Ousterhout] Removed one more use of local read time
  cf13497 [Kay Ousterhout] [SPARK-5982] Remove incorrectwq Local Read Time Metric
* [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharing (Brennon York, 2015-02-25; 1 file, -3/+9)
  Fixes the issue whereby when VertexRDDs are `diff`ed, `innerJoin`ed, or `leftJoin`ed and have different partition sizes, they fail under the `zipPartitions` method. This fix tests whether the partitions are equal and, if not, repartitions the other to match the partition size of the calling VertexRDD.
  Author: Brennon York <brennon.york@capitalone.com>
  Closes #4705 from brennonyork/SPARK-1955 and squashes the following commits:
  0882590 [Brennon York] updated to properly handle differently-partitioned vertexRDDs
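  A sketch of the guard's shape (generic RDDs instead of VertexRDD, which the real fix repartitions with its own machinery): check partition counts before `zipPartitions`, which requires the same number of partitions on both sides.

```scala
import org.apache.spark.rdd.RDD

// Sketch: zipPartitions demands equal partition counts; align `b` to `a` first.
def alignPartitions[A](a: RDD[A], b: RDD[A]): RDD[A] =
  if (a.partitions.length == b.partitions.length) b
  else b.repartition(a.partitions.length)
```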
* [SPARK-5970][core] Register directory created in getOrCreateLocalRootDirs for automatic deletion. (Milan Straka, 2015-02-25; 1 file, -1/+1)
  As documented in createDirectory, the result of createDirectory is not registered for automatic removal. Currently there are 4 directories left in `/tmp` after just running `pyspark`.
  Author: Milan Straka <fox@ucw.cz>
  Closes #4759 from foxik/remove-tmp-dirs and squashes the following commits:
  280450d [Milan Straka] Use createTempDir in getOrCreateLocalRootDirs...
* SPARK-5930 [DOCS] Documented default of spark.shuffle.io.retryWait is confusing (Sean Owen, 2015-02-25; 1 file, -1/+3)
  Clarify the default maximum wait in the spark.shuffle.io.retryWait docs. CC andrewor14
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4769 from srowen/SPARK-5930 and squashes the following commits:
  ae2792b [Sean Owen] Clarify default max wait in spark.shuffle.io.retryWait docs
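  A worked example of the clarified wording, assuming (as an assumption, not confirmed by this log) the 1.x defaults of spark.shuffle.io.maxRetries = 3 and spark.shuffle.io.retryWait = 5s:

```scala
// Sketch: retryWait is the pause between retries, not the total; the maximum
// total wait is maxRetries * retryWait.
val maxRetries = 3    // assumed default of spark.shuffle.io.maxRetries
val retryWaitSec = 5  // assumed default of spark.shuffle.io.retryWait
val maxTotalWaitSec = maxRetries * retryWaitSec // 15 seconds with these defaults
```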