| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Should pass spark context to save/load
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4816 from jkbradley/ml-io-doc-fix and squashes the following commits:
83d369d [Joseph K. Bradley] added comment to save,load parts of ML guide examples
2841170 [Joseph K. Bradley] Fixed save,load calls in ML guide examples
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
jira case spark-6033 https://issues.apache.org/jira/browse/SPARK-6033
In standalone deploy mode, the cleanup will only remove the stopped application's directories.
The original description about the cleanup behavior is incorrect.
Author: 许鹏 <peng.xu@fraudmetrix.cn>
Closes #4803 from hseagle/spark-6033 and squashes the following commits:
927a6a0 [许鹏] fix the incorrect description about the spark.worker.cleanup in standalone mode
|
|
|
|
|
|
|
|
|
|
|
|
| |
The history server on Yarn only shows completed jobs. This adds a note concerning the needed explicit context termination at the end of a spark job which is a best practice anyway.
Related to SPARK-2972 and SPARK-3458
Author: moussa taifi <moutai10@gmail.com>
Closes #4721 from moutai/add-history-server-note-for-closing-the-spark-context and squashes the following commits:
9f5b6c3 [moussa taifi] Fix upper case typo for YARN
3ad3db4 [moussa taifi] Add context termination for History server on Yarn
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: xukun 00228947 <xukun.xu@huawei.com>
Closes #4214 from viper-kun/cleaneventlog and squashes the following commits:
7a5b9c5 [xukun 00228947] fix issue
31674ee [xukun 00228947] fix issue
6e3d06b [xukun 00228947] fix issue
373f3b9 [xukun 00228947] fix issue
71782b5 [xukun 00228947] fix issue
5b45035 [xukun 00228947] fix issue
70c28d6 [xukun 00228947] fix issues
adcfe86 [xukun 00228947] Periodic cleanup event logs
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spark.scheduler.minRegisteredResourcesRatio on docs.
The configuration is not supported in mesos mode now.
See https://github.com/apache/spark/pull/1462
Author: Li Zhihui <zhihui.li@intel.com>
Closes #4781 from li-zhihui/fixdocconf and squashes the following commits:
63e7a44 [Li Zhihui] Modify default value description for spark.scheduler.minRegisteredResourcesRatio on docs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
save/load, Python GBT
* Add GradientBoostedTrees Python examples to ML guide
* I ran these in the pyspark shell, and they worked.
* Add save/load to examples in ML guide
* Added note to python docs about predict,transform not working within RDD actions,transformations in some cases (See SPARK-5981)
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4750 from jkbradley/SPARK-5974 and squashes the following commits:
c410e38 [Joseph K. Bradley] Added note to LabeledPoint about attributes
bcae18b [Joseph K. Bradley] Added import of models for save/load examples in ml guide. Fixed line length for tree.py, feature.py (but not other ML Pyspark files yet).
6d81c3e [Joseph K. Bradley] completed python GBT examples
9903309 [Joseph K. Bradley] Added note to python docs about predict,transform not working within RDD actions,transformations in some cases
c7dfad8 [Joseph K. Bradley] Added model save/load to ML guide. Added GBT examples to ML guide
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sorts all configuration options present on the `configuration.md` page to ease readability.
Author: Brennon York <brennon.york@capitalone.com>
Closes #3863 from brennonyork/SPARK-1182 and squashes the following commits:
5696f21 [Brennon York] fixed merge conflict with port comments
81a7b10 [Brennon York] capitalized A in Allocation
e240486 [Brennon York] moved all spark.mesos properties into the running-on-mesos doc
7de5f75 [Brennon York] moved serialization from application to compression and serialization section
a16fec0 [Brennon York] moved shuffle settings from network to shuffle
f8fa286 [Brennon York] sorted encryption category
1023f15 [Brennon York] moved initialExecutors
e9d62aa [Brennon York] fixed akka.heartbeat.interval
25e6f6f [Brennon York] moved spark.executer.user*
4625ade [Brennon York] added spark.executor.extra* items
4ee5648 [Brennon York] fixed merge conflicts
1b49234 [Brennon York] sorting mishap
2b5758b [Brennon York] sorting mishap
6fbdf42 [Brennon York] sorting mishap
55dc6f8 [Brennon York] sorted security
ec34294 [Brennon York] sorted dynamic allocation
2a7c4a3 [Brennon York] sorted scheduling
aa9acdc [Brennon York] sorted networking
a4380b8 [Brennon York] sorted execution behavior
27f3919 [Brennon York] sorted compression and serialization
80a5bbb [Brennon York] sorted spark ui
3f32e5b [Brennon York] sorted shuffle behavior
6c51b38 [Brennon York] sorted runtime environment
efe9d6f [Brennon York] sorted application properties
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clarify default max wait in spark.shuffle.io.retryWait docs
CC andrewor14
Author: Sean Owen <sowen@cloudera.com>
Closes #4769 from srowen/SPARK-5930 and squashes the following commits:
ae2792b [Sean Owen] Clarify default max wait in spark.shuffle.io.retryWait docs
|
|
|
|
|
|
|
|
|
|
| |
Corrected 3 Typos in the GraphX programming guide. I hope this is the correct way to contribute.
Author: Benedikt Linse <benedikt.linse@gmail.com>
Closes #4766 from 1123/master and squashes the following commits:
8a63812 [Benedikt Linse] fixing 3 typos in the graphx programming guide
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
select empty should NOT be the same as select. make sure selectExpr is behaving the same.
join param documentation
link to source doesn't work in jekyll generated file
cross reference of columns (i.e. enabling linking)
show(): move df example before df.show()
move tests in SQLContext out of docstring otherwise doc is too long
Column.desc and .asc doesn't have any documentation
in documentation, sort functions.*)
Author: Davies Liu <davies@databricks.com>
Closes #4756 from davies/df_docs and squashes the following commits:
f30502c [Davies Liu] fix doc
32f0d46 [Davies Liu] fix DataFrame docs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One can early stop if the decrease in error rate is lesser than a certain tol or if the error increases if the training data is overfit.
This introduces a new method runWithValidation which takes in a pair of RDD's , one for the training data and the other for the validation.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #4677 from MechCoder/spark-5436 and squashes the following commits:
1bb21d4 [MechCoder] Combine regression and classification tests into a single one
e4d799b [MechCoder] Addresses indentation and doc comments
b48a70f [MechCoder] COSMIT
b928a19 [MechCoder] Move validation while training section under usage tips
fad9b6e [MechCoder] Made the following changes 1. Add section to documentation 2. Return corresponding to bestValidationError 3. Allow negative tolerance.
55e5c3b [MechCoder] One liner for prevValidateError
3e74372 [MechCoder] TST: Add test for classification
77549a9 [MechCoder] [SPARK-5436] Validate GradientBoostedTrees using runWithValidation
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add Slf4jSink to Spark Metrics using Coda Hale's SlfjReporter.
This sends metrics to log4j, allowing spark users to reuse log4j pipeline for metrics collection.
Reviewed existing unit tests and didn't see any sink-related tests. Please advise on if tests should be added.
Author: Judy <judynash@microsoft.com>
Author: judynash <judynash@microsoft.com>
Closes #4644 from judynash/master and squashes the following commits:
57ef214 [judynash] doc clarification and indent fixes
a751a66 [Judy] Spark-5708: Add Slf4jSink to Spark Metrics
|
|
|
|
|
|
|
|
|
|
| |
Variance is calculated on labels/responses.
Author: Xiangrui Meng <meng@databricks.com>
Closes #4740 from mengxr/patch-1 and squashes the following commits:
673317b [Xiangrui Meng] [MLLIB] Change x_i to y_i in Variance's user guide
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Removed SVD code from examples.
* Corrected Java API doc link.
* Updated variable names: `AtransposeA` -> `ata`.
* Minor changes.
brkyvz
Author: Xiangrui Meng <meng@databricks.com>
Closes #4737 from mengxr/update-block-matrix-user-guide and squashes the following commits:
70f53ac [Xiangrui Meng] update block matrix user guide
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes:
* typo in Scala example
* Removed comment "usually applied on sparse data" since that is debatable
* small edits to text for clarity
CC: avulanov I noticed a typo post-hoc and ended up making a few small edits. Do the changes look OK?
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4732 from jkbradley/chisqselector-docs and squashes the following commits:
9656a3b [Joseph K. Bradley] added Java example for ChiSqSelector to guide
3f3f9f4 [Joseph K. Bradley] small fixes to ChiSqSelector docs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added description of ChiSqSelector and few words about feature selection in general. I could add a code example, however it would not look reasonable in the absence of feature discretizer or a dataset in the `data` folder that has redundant features.
Author: Alexander Ulanov <nashb@yandex.ru>
Closes #4709 from avulanov/SPARK-5912 and squashes the following commits:
19a8a4e [Alexander Ulanov] Addressing reviewers comments @jkbradley
58d9e4d [Alexander Ulanov] Addressing reviewers comments @jkbradley
eb6b9fe [Alexander Ulanov] Typo
2921a1d [Alexander Ulanov] ChiSqSelector example of use
c845350 [Alexander Ulanov] ChiSqSelector docs
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-5724
In AkkaUtil, we set several failure detector related the parameters as following
```
al akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String])
.withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString(
s"""
|akka.daemonic = on
|akka.loggers = [""akka.event.slf4j.Slf4jLogger""]
|akka.stdout-loglevel = "ERROR"
|akka.jvm-exit-on-fatal-error = off
|akka.remote.require-cookie = "$requireCookie"
|akka.remote.secure-cookie = "$secureCookie"
|akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s
|akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s
|akka.remote.transport-failure-detector.threshold = $akkaFailureDetector
|akka.actor.provider = "akka.remote.RemoteActorRefProvider"
|akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport"
|akka.remote.netty.tcp.hostname = "$host"
|akka.remote.netty.tcp.port = $port
|akka.remote.netty.tcp.tcp-nodelay = on
|akka.remote.netty.tcp.connection-timeout = $akkaTimeout s
|akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B
|akka.remote.netty.tcp.execution-pool-size = $akkaThreads
|akka.actor.default-dispatcher.throughput = $akkaBatchSize
|akka.log-config-on-start = $logAkkaConfig
|akka.remote.log-remote-lifecycle-events = $lifecycleEvents
|akka.log-dead-letters = $lifecycleEvents
|akka.log-dead-letters-during-shutdown = $lifecycleEvents
""".stripMargin))
```
Actually, we do not have any parameter naming "akka.remote.transport-failure-detector.threshold"
see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html
what we have is "akka.remote.watch-failure-detector.threshold"
Author: CodingCat <zhunansjtu@gmail.com>
Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits:
bafe56e [CodingCat] fix the grammar in configuration doc
338296e [CodingCat] remove failure-detector related info
8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils
|
|
|
|
|
|
|
|
|
|
|
|
| |
MapReduce API
This looks like a simple typo ```SparkContext.newHadoopRDD``` instead of ```SparkContext.newAPIHadoopRDD``` as in actual http://spark.apache.org/docs/1.2.1/api/scala/index.html#org.apache.spark.SparkContext
Author: Alexander <abezzubov@nflabs.com>
Closes #4718 from bzz/hadoop-InputFormats-doc-fix and squashes the following commits:
680a4c4 [Alexander] Fix typo in docs on custom Hadoop InputFormats
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.
For SPARK-5892:
* Fix Python docs
* Various other cleanups
BTW, I accidentally merged this with master. If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
CC: mengxr (ML), davies (Python docs)
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:
f191bb0 [Joseph K. Bradley] small cleanups
e786efa [Joseph K. Bradley] small doc corrections
6b1ab4a [Joseph K. Bradley] fixed python lint test
946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example. Changed spark.ml Java examples to use DataFrames API instead of sql()
da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python
b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
34b067f [Joseph K. Bradley] small doc correction
da16aef [Joseph K. Bradley] Fixed python mllib docs
8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
b05a80d [Joseph K. Bradley] organize imports. doc cleanups
e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the previous version, PIC stores clustering assignments as an `RDD[(Long, Int)]`. This is mapped to `RDD<Tuple2<Object, Object>>` in Java and hence Java users have to cast types manually. We should either create a new method called `javaAssignments` that returns `JavaRDD[(java.lang.Long, java.lang.Int)]` or wrap the result pair in a class. I chose the latter approach in this PR. Now assignments are stored as an `RDD[Assignment]`, where `Assignment` is a class with `id` and `cluster`.
Similarly, in FPGrowth, the frequent itemsets are stored as an `RDD[(Array[Item], Long)]`, which is mapped to `RDD<Tuple2<Object, Object>>`. Though we provide a "Java-friendly" method `javaFreqItemsets` that returns `JavaRDD[(Array[Item], java.lang.Long)]`. It doesn't really work because `Array[Item]` is mapped to `Object` in Java. So in this PR I created a class `FreqItemset` to wrap the results. It has `items` and `freq`, as well as a `javaItems` method that returns `List<Item>` in Java.
I'm not certain that the names I chose are proper: `Assignment`/`id`/`cluster` and `FreqItemset`/`items`/`freq`. Please let me know if there are better suggestions.
CC: jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #4695 from mengxr/SPARK-5900 and squashes the following commits:
865b5ca [Xiangrui Meng] make Assignment serializable
cffa96e [Xiangrui Meng] fix test
9c0e590 [Xiangrui Meng] remove unused Tuple2
1b9db3d [Xiangrui Meng] make PIC and FPGrowth Java-friendly
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...) will not work
I've updated documentation to reflect true behavior of this setting in client vs. cluster mode.
Author: Ilya Ganelin <ilya.ganelin@capitalone.com>
Closes #4665 from ilganeli/SPARK-5570 and squashes the following commits:
5d1c8dd [Ilya Ganelin] Added example configuration code
a51700a [Ilya Ganelin] Getting rid of extra spaces
85f7a08 [Ilya Ganelin] Reworded note
5889d43 [Ilya Ganelin] Formatting adjustment
f149ba1 [Ilya Ganelin] Minor updates
1fec7a5 [Ilya Ganelin] Updated to add clarification for other driver properties
db47595 [Ilya Ganelin] Slight formatting update
c899564 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5570
17b751d [Ilya Ganelin] Updated documentation for driver-memory to reflect its true behavior in client vs cluster mode
|
|
|
|
|
|
|
|
|
|
|
|
| |
Updated PIC user guide to reflect API changes and added a simple Java example. The API is still not very Java-friendly. I created SPARK-5990 for this issue.
Author: Xiangrui Meng <meng@databricks.com>
Closes #4680 from mengxr/SPARK-5897 and squashes the following commits:
847d216 [Xiangrui Meng] apache header
87719a2 [Xiangrui Meng] remove PIC image
2dd921f [Xiangrui Meng] update PIC user guide and add a Java example
|
|
|
|
|
|
|
|
|
|
| |
Docs for BlockMatrix. mengxr
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #4664 from brkyvz/SPARK-5507PR and squashes the following commits:
4db30b0 [Burak Yavuz] [SPARK-5507] Added documentation for BlockMatrix
|
|
|
|
|
|
|
|
|
|
| |
The API is still not very Java-friendly because `Array[Item]` in `freqItemsets` is recognized as `Object` in Java. We might want to define a case class to wrap the return pair to make it Java friendly.
Author: Xiangrui Meng <meng@databricks.com>
Closes #4661 from mengxr/SPARK-5519 and squashes the following commits:
58ccc25 [Xiangrui Meng] add user guide with example code for fp-growth
|
|
|
|
|
|
|
|
|
|
| |
numClassesForClassification has been renamed to numClasses.
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #4672 from MechCoder/minor-doc and squashes the following commits:
d2ddb7f [MechCoder] Minor doc fix in GBT classification example
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Packages support
Documentation for maven coordinates + Spark Package support. Added pyspark tests for `--packages`
Author: Burak Yavuz <brkyvz@gmail.com>
Author: Davies Liu <davies@databricks.com>
Closes #4662 from brkyvz/SPARK-5811 and squashes the following commits:
56ccccd [Burak Yavuz] fixed broken test
64cb8ee [Burak Yavuz] passed pep8 on local
c07b81e [Burak Yavuz] fixed pep8
a8bd6b7 [Burak Yavuz] submit PR
4ef4046 [Burak Yavuz] ready for PR
8fb02e5 [Burak Yavuz] merged master
25c9b9f [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into python-jar
560d13b [Burak Yavuz] before PR
17d3f76 [Davies Liu] support .jar as python package
a3eb717 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5811
c60156d [Burak Yavuz] [SPARK-5811] Added documentation for maven coordinates
|
|
|
|
|
|
|
|
|
| |
Author: CodingCat <zhunansjtu@gmail.com>
Closes #4656 from CodingCat/fix_typo and squashes the following commits:
b41d15c [CodingCat] recover
689fe46 [CodingCat] fix typo
|
|
|
|
|
|
|
|
| |
Author: Patrick Wendell <patrick@databricks.com>
Closes #4638 from pwendell/SPARK-5850 and squashes the following commits:
386126f [Patrick Wendell] SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream.
|
|
|
|
| |
That patch had a line break in the middle of a {{ }} expression, which is not allowed.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
User guide for isotonic regression added to docs/mllib-regression.md including code examples for Scala and Java.
Author: martinzapletal <zapletal-martin@email.cz>
Closes #4536 from zapletal-martin/SPARK-5502 and squashes the following commits:
67fe773 [martinzapletal] SPARK-5502 reworded model prediction rules to use more general language rather than the code/implementation specific terms
80bd4c3 [martinzapletal] SPARK-5502 created docs page for isotonic regression, added links to the page, updated data and examples
7d8136e [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
504b5c3 [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, Spark Streaming Programming Guide after updateStateByKey explanation links to file stateful_network_wordcount.py and note "For the complete Scala code ..." for any language tab selected. This is an incoherence.
I've changed the guide and link its pertinent example file. JavaStatefulNetworkWordCount.java example was not created so I added to the commit.
Author: gasparms <gmunoz@stratio.com>
Closes #4589 from gasparms/feature/streaming-guide and squashes the following commits:
7f37f89 [gasparms] More style changes
ec202b0 [gasparms] Follow spark style guide
f527328 [gasparms] Improve example to look like scala example
4d8785c [gasparms] Remove throw exception
e92e6b8 [gasparms] Fix incoherence
92db405 [gasparms] Fix Streaming Programming Guide. Change files according the selected language
|
|
|
|
|
|
|
|
|
|
| |
Put example code close to the algorithm description.
Author: Xiangrui Meng <meng@databricks.com>
Closes #4598 from mengxr/SPARK-5806 and squashes the following commits:
a137872 [Xiangrui Meng] re-organize sections in mllib-clustering.md
|
|
|
|
|
|
|
|
|
|
| |
Fixes SPARK-5805 : Fix the type error in the final example given in MLlib - Clustering documentation.
Author: Emre Sevinç <emre.sevinc@gmail.com>
Closes #4596 from emres/SPARK-5805 and squashes the following commits:
1029f66 [Emre Sevinç] SPARK-5805 Fixed the type error in documentation.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Updated examples using the new api and added DataFrame concept
Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com>
Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:
82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
8d5376a [Antonio Navarro Perez] fixed typo
8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
|
|
|
|
|
|
|
|
|
|
| |
(for master / 1.4 only)
Author: Sean Owen <sowen@cloudera.com>
Closes #4526 from srowen/SPARK-5727.2 and squashes the following commits:
83ba49c [Sean Owen] Remove Debian packaging
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looking at the code, I believe this remark about `take(n)` computing partitions on the driver is no longer correct. Apologies if I'm wrong.
This came up in http://stackoverflow.com/q/28436559/3318517.
Author: Daniel Darabos <darabos.daniel@gmail.com>
Closes #4533 from darabos/patch-2 and squashes the following commits:
cc80f3a [Daniel Darabos] Remove outdated remark about take(n).
|
|
|
|
|
|
|
|
|
|
|
| |
This just adds a deprecation message. It's intended for backporting to branch 1.3 but can go in master too, to be followed by another PR that removes it for 1.4.
Author: Sean Owen <sowen@cloudera.com>
Closes #4516 from srowen/SPARK-5727.1 and squashes the following commits:
d48989f [Sean Owen] Refer to Spark 1.4
6c1c8b3 [Sean Owen] Deprecate Debian packaging
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deprecate inferSchema() and applySchema(), use createDataFrame() instead, which could take an optional `schema` to create an DataFrame from an RDD. The `schema` could be StructType or list of names of columns.
Author: Davies Liu <davies@databricks.com>
Closes #4498 from davies/create and squashes the following commits:
08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Yarn's config option `spark.yarn.user.classpath.first` does not work the same way as
`spark.files.userClassPathFirst`; Yarn's version is a lot more dangerous, in that it
modifies the system classpath, instead of restricting the changes to the user's class
loader. So this change implements the behavior of the latter for Yarn, and deprecates
the more dangerous choice.
To be able to achieve feature-parity, I also implemented the option for drivers (the existing
option only applies to executors). So now there are two options, each controlling whether
to apply userClassPathFirst to the driver or executors. The old option was deprecated, and
aliased to the new one (`spark.executor.userClassPathFirst`).
The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it
was also doing some things that ended up causing JVM errors depending on how things
were being called.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #3233 from vanzin/SPARK-2996 and squashes the following commits:
9cf9cf1 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1499e2 [Marcelo Vanzin] Remove SPARK_HOME propagation.
fa7df88 [Marcelo Vanzin] Remove 'test.resource' file, create it dynamically.
a8c69f1 [Marcelo Vanzin] Review feedback.
cabf962 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a1b8d7e [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3f768e3 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
2ce3c7a [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6d6be [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
70d4044 [Marcelo Vanzin] Fix pyspark/yarn-cluster test.
0fe7777 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
0e6ef19 [Marcelo Vanzin] Move class loaders around and make names more meaninful.
fe970a7 [Marcelo Vanzin] Review feedback.
25d4fed [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
3cb6498 [Marcelo Vanzin] Call the right loadClass() method on the parent.
fbb8ab5 [Marcelo Vanzin] Add locking in loadClass() to avoid deadlocks.
2e6c4b7 [Marcelo Vanzin] Mention new setting in documentation.
b6497f9 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
a10f379 [Marcelo Vanzin] Some feedback.
3730151 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
f513871 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
44010b6 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
7b57cba [Marcelo Vanzin] Remove now outdated message.
5304d64 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
35949c8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
54e1a98 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996
d1273b2 [Marcelo Vanzin] Add test file to rat exclude.
fa1aafa [Marcelo Vanzin] Remove write check on user jars.
89d8072 [Marcelo Vanzin] Cleanups.
a963ea3 [Marcelo Vanzin] Implement spark.driver.userClassPathFirst for standalone cluster mode.
50afa5f [Marcelo Vanzin] Fix Yarn executor command line.
7d14397 [Marcelo Vanzin] Register user jars in executor up front.
7f8603c [Marcelo Vanzin] Fix yarn-cluster mode without userClassPathFirst.
20373f5 [Marcelo Vanzin] Fix ClientBaseSuite.
55c88fa [Marcelo Vanzin] Run all Yarn integration tests via spark-submit.
0b64d92 [Marcelo Vanzin] Add deprecation warning to yarn option.
4a84d87 [Marcelo Vanzin] Fix the child-first class loader.
d0394b8 [Marcelo Vanzin] Add "deprecated configs" to SparkConf.
46d8cf2 [Marcelo Vanzin] Update doc with new option, change name to "userClassPathFirst".
a314f2d [Marcelo Vanzin] Enable driver class path isolation in SparkSubmit.
91f7e54 [Marcelo Vanzin] [yarn] Enable executor class path isolation.
a853e74 [Marcelo Vanzin] Re-work CoarseGrainedExecutorBackend command line arguments.
89522ef [Marcelo Vanzin] Add class path isolation support for Yarn cluster mode.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the LDA user guide from jkbradley with Java and Scala code example.
Author: Xiangrui Meng <meng@databricks.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4465 from mengxr/lda-guide and squashes the following commits:
6dcb7d1 [Xiangrui Meng] update java example in the user guide
76169ff [Xiangrui Meng] update java example
36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into lda-guide
c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example (which is in the guide and probably should be removed).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I am the author of netlib-java and I found this documentation to be out of date. Some main points:
1. Breeze has not depended on jBLAS for some time
2. netlib-java provides a pure JVM implementation as the fallback (the original docs did not appear to be aware of this, claiming that gfortran was necessary)
3. The licensing issue is not just about LGPL: optimised natives have proprietary licenses. Building with the LGPL flag turned on really doesn't help you get past this.
4. I really think it's best to direct people to my detailed setup guide instead of trying to compress it into one sentence. It is different for each architecture, each OS, and for each backend.
I hope this helps to clear things up :smile:
Author: Sam Halliday <sam.halliday@Gmail.com>
Author: Sam Halliday <sam.halliday@gmail.com>
Closes #4448 from fommil/patch-1 and squashes the following commits:
18cda11 [Sam Halliday] remove link to skillsmatters at request of @mengxr
a35e4a9 [Sam Halliday] reword netlib-java/breeze docs
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://issues.apache.org/jira/browse/SPARK-2945
spark.executor.instances works. As this JIRA recommended, we should add docs for this common config.
Author: WangTaoTheTonic <wangtao111@huawei.com>
Closes #4350 from WangTaoTheTonic/SPARK-2945 and squashes the following commits:
4c3913a [WangTaoTheTonic] not compatible with dynamic allocation
5fa9c46 [WangTaoTheTonic] add doc for spark.executor.instances
|
|
|
|
|
|
|
|
|
|
| |
A recent patch #4051 made the initial number default to 0. With this change, any Spark application using dynamic allocation's default settings will ramp up very slowly. Since we never request more executors than needed to saturate the pending tasks, it is safe to ramp up quickly. The current default of 60 may be too slow.
Author: Andrew Or <andrew@databricks.com>
Closes #4409 from andrewor14/dynamic-allocation-interval and squashes the following commits:
d3cc485 [Andrew Or] Lower request interval
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GaussianMixture
Simple description and code samples (and sample data) for GaussianMixture
Author: Travis Galoppo <tjg2107@columbia.edu>
Closes #4401 from tgaloppo/spark-5013 and squashes the following commits:
c9ff9a5 [Travis Galoppo] Fixed link in mllib-clustering.md Added Gaussian mixture and power iteration as available clustering techniques in mllib-guide
2368690 [Travis Galoppo] Minor fixes
3eb41fa [Travis Galoppo] [SPARK-5013] Added documentation and sample data file for GaussianMixture
|
|
|
|
|
|
|
|
|
|
|
| |
Change spark-version from 1.1.0 to 1.2.0 in the example for spark-ec2/Launch Cluster.
Author: Miguel Peralvo <miguel.peralvo@gmail.com>
Closes #4300 from MiguelPeralvo/patch-1 and squashes the following commits:
38adf0b [Miguel Peralvo] Update ec2-scripts.md
1850869 [Miguel Peralvo] Update ec2-scripts.md
|
|
|
|
|
|
|
|
|
|
| |
Trivial fix.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #4400 from adrian-wang/docdate and squashes the following commits:
31bbe40 [Daoyuan Wang] doc fix for date
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add meta description tags on some of the most important doc pages
- Shorten the titles of some pages to have more relevant keywords; for
example there's no reason to have "Spark SQL Programming Guide - Spark
1.2.0 documentation", we can just say "Spark SQL - Spark 1.2.0
documentation".
Author: Matei Zaharia <matei@databricks.com>
Closes #4381 from mateiz/docs-seo and squashes the following commits:
4940563 [Matei Zaharia] [SPARK-5608] Improve SEO of Spark documentation pages
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when creating SparkContext
This patch introduces a new configuration option, `spark.extraListeners`, that allows SparkListeners to be specified in SparkConf and registered before the SparkContext is initialized. From the configuration documentation:
> A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called. If no valid constructor can be found, the SparkContext creation will fail with an exception.
This motivation for this patch is to allow monitoring code to be easily injected into existing Spark programs without having to modify those programs' code.
Author: Josh Rosen <joshrosen@databricks.com>
Closes #4111 from JoshRosen/SPARK-5190-register-sparklistener-in-sc-constructor and squashes the following commits:
8370839 [Josh Rosen] Two minor fixes after merging with master
6e0122c [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5190-register-sparklistener-in-sc-constructor
1a5b9a0 [Josh Rosen] Remove SPARK_EXTRA_LISTENERS environment variable.
2daff9b [Josh Rosen] Add a couple of explanatory comments for SPARK_EXTRA_LISTENERS.
b9973da [Josh Rosen] Add test to ensure that conf and env var settings are merged, not overriden.
d6f3113 [Josh Rosen] Use getConstructors() instead of try-catch to find right constructor.
d0d276d [Josh Rosen] Move code into setupAndStartListenerBus() method
b22b379 [Josh Rosen] Instantiate SparkListeners from classes listed in configurations.
9c0d8f1 [Josh Rosen] Revert "[SPARK-5190] Allow SparkListeners to be registered before SparkContext starts."
217ecc0 [Josh Rosen] Revert "Add addSparkListener to JavaSparkContext"
25988f3 [Josh Rosen] Add addSparkListener to JavaSparkContext
163ba19 [Josh Rosen] [SPARK-5190] Allow SparkListeners to be registered before SparkContext starts.
|
|
|
|
|
|
|
|
|
|
| |
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes #3820 from adrian-wang/parquettimestamp and squashes the following commits:
b1e2a0d [Daoyuan Wang] fix for nanos
4dadef1 [Daoyuan Wang] fix wrong read
93f438d [Daoyuan Wang] parquet timestamp support
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes several minor formatting issues in the [Continuous Compilation] [1] section.
[1]: http://spark.apache.org/docs/latest/building-spark.html#continuous-compilation
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4316)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes #4316 from liancheng/fix-build-instruction-docs and squashes the following commits:
0a92e01 [Cheng Lian] Fixes several formatting issues
|