| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Speculation does not work well with direct output committers: there are multiple corner cases that may cause data corruption and/or data loss.
Please see this [PR comment] [1] for more details.
[1]: https://github.com/apache/spark/pull/8191#issuecomment-131598385
Author: Cheng Lian <lian@databricks.com>
Closes #8317 from liancheng/spark-9899/speculation-hates-direct-output-committer.
(cherry picked from commit f3ff4c41d2e32bd0f2419d1c9c68fcd0c2593e41)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Conflicts:
sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
|
|
|
|
|
|
|
|
|
|
|
| |
We should round the result of decimal multiplication/division to the expected precision/scale, and also check for overflow.
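The idea of rounding to a target scale and then checking the precision can be sketched as follows. This is a minimal illustration built on `java.math.BigDecimal`, not the code from this patch: the object name `DecimalRounding`, its methods, and the choice of `HALF_UP` rounding are all assumptions made for the example.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Hypothetical helper, not Spark's Decimal implementation.
object DecimalRounding {
  /**
   * Rounds `value` to `scale` fractional digits (HALF_UP is an assumption)
   * and returns None when the result needs more than `precision` digits,
   * i.e. when it overflows the target type.
   */
  def toPrecision(value: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
    val rounded = value.setScale(scale, RoundingMode.HALF_UP)
    if (rounded.precision() > precision) None else Some(rounded)
  }

  /** Multiplies exactly, then fits the product into (precision, scale). */
  def multiply(a: JBigDecimal, b: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] =
    toPrecision(a.multiply(b), precision, scale)

  /** Divides directly at the target scale; `b` must be non-zero. */
  def divide(a: JBigDecimal, b: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] =
    toPrecision(a.divide(b, scale, RoundingMode.HALF_UP), precision, scale)
}
```

Returning `None` on overflow is one possible convention; a caller could equally map it to a SQL NULL or raise an error.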
Author: Davies Liu <davies@databricks.com>
Closes #8287 from davies/decimal_division.
(cherry picked from commit 1f4c4fe6dfd8cc52b5fddfd67a31a77edbb1a036)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`DictionaryEncoding` uses Scala runtime reflection to avoid boxing costs while building the dictionary array. However, this code path may hit [SI-6240] [1] and throw an exception.
[1]: https://issues.scala-lang.org/browse/SI-6240
Author: Cheng Lian <lian@databricks.com>
Closes #8306 from liancheng/spark-9627/in-memory-cache-scala-reflection.
(cherry picked from commit 21bdbe9fe69be47be562de24216a469e5ee64c7b)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DataFrame.withColumn in Python should be consistent with the Scala one (replacing the existing column that has the same name).
cc marmbrus
Author: Davies Liu <davies@databricks.com>
Closes #8300 from davies/with_column.
(cherry picked from commit 08887369c890e0dd87eb8b34e8c32bb03307bf24)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spark.shuffle.reduceLocality.enabled by default.
https://issues.apache.org/jira/browse/SPARK-10087
In some cases, when spark.shuffle.reduceLocality.enabled is enabled, we schedule all reducers on the same executor even though the cluster has plenty of resources. Changing spark.shuffle.reduceLocality.enabled to false resolves the problem.
Comments of https://github.com/apache/spark/pull/8280 provide more details of the symptom of this issue.
This PR changes the default setting of `spark.shuffle.reduceLocality.enabled` to `false` for branch 1.5.
Author: Yin Huai <yhuai@databricks.com>
Closes #8296 from yhuai/setNumPartitionsCorrectly-branch1.5.
|
|
|
|
|
|
|
|
|
| |
Author: Davies Liu <davies@databricks.com>
Closes #8305 from davies/format_number.
(cherry picked from commit e05da5cb5ea253e6372f648fc8203204f2a8df8d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This continues the work from #8256. I removed `since` tags from private/protected/local methods/variables (see https://github.com/apache/spark/commit/72fdeb64630470f6f46cf3eed8ffbfe83a7c4659). MechCoder
Closes #8256
Author: Xiangrui Meng <meng@databricks.com>
Author: Xiaoqing Wang <spark445@126.com>
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes #8288 from mengxr/SPARK-8918.
(cherry picked from commit 5b62bef8cbf73f910513ef3b1f557aa94b384854)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
### JIRA
[[SPARK-10106] Add `ifelse` Column function to SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10106)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8303 from yu-iskw/SPARK-10106.
(cherry picked from commit d898c33f774b9a3db2fb6aa8f0cb2c2ac6004b58)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, users of evaluator (`CrossValidator` and `TrainValidationSplit`) would only maximize the metric in evaluator, leading to a hacky solution which negated metrics to be minimized and caused erroneous negative values to be reported to the user.
This PR adds an `isLargerBetter` attribute to the `Evaluator` base class, instructing users of `Evaluator` on whether the chosen metric should be maximized or minimized.
CC jkbradley
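A minimal sketch of how a caller could honor such a flag when picking the best of several candidate metrics. The trait and helper below are simplified, hypothetical stand-ins written for this example; only the name `isLargerBetter` comes from the message above.

```scala
// Hypothetical, simplified model; not Spark's actual Evaluator class.
object EvaluatorSketch {
  trait Evaluator {
    /** Computes a metric from (prediction, label) pairs. */
    def evaluate(predictions: Seq[(Double, Double)]): Double
    /** True if a larger metric is better; defaults to maximizing. */
    def isLargerBetter: Boolean = true
  }

  /** Root mean squared error: an example of a metric to be minimized. */
  object Rmse extends Evaluator {
    def evaluate(predictions: Seq[(Double, Double)]): Double =
      math.sqrt(predictions.map { case (p, l) => (p - l) * (p - l) }.sum / predictions.size)
    override def isLargerBetter: Boolean = false
  }

  /** Returns the index of the best metric, honoring the evaluator's direction. */
  def bestIndex(metrics: Seq[Double], evaluator: Evaluator): Int = {
    require(metrics.nonEmpty, "at least one metric is required")
    val best = if (evaluator.isLargerBetter) metrics.max else metrics.min
    metrics.indexOf(best)
  }
}
```

With this shape, the metric is reported unchanged and only the selection step depends on the direction, so no negation is needed.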
Author: Feynman Liang <fliang@databricks.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #8290 from feynmanliang/SPARK-10097.
(cherry picked from commit 28a98464ea65aa7b35e24fca5ddaa60c2e5d53ee)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
complicated
I added lots of Column functions to SparkR. I also added `rand(seed: Int)` and `randn(seed: Int)` in Scala, since we need such APIs for the R integer type.
### JIRA
[[SPARK-9856] Add expression functions into SparkR whose params are complicated - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9856)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8264 from yu-iskw/SPARK-9856-3.
(cherry picked from commit 2fcb9cb9552dac1d78dcca5d4d5032b4fa6c985c)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Add a Python example to the mllib FP-growth user guide.
2. Correct mistakes in the Scala and Java examples.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #8279 from yanboliang/spark-10084.
(cherry picked from commit 802b5b8791fc2c892810981b2479a04175aa3dcd)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
New user guide section ml-decision-tree.md, including code examples.
I have run all examples, including the Java ones.
CC: manishamde yanboliang mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #8244 from jkbradley/ml-dt-docs.
(cherry picked from commit 39e4ebd521defdb68a0787bcd3bde6bc855f5198)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add warnings according to SPARK-8949 in `SparkContext`
- warnings in scaladoc
- log warnings when preferred locations feature is used through `SparkContext`'s constructor
However, I didn't find any reference to this feature in the documentation. Please point me to one if you know of any.
Author: Han JU <ju.han.felix@gmail.com>
Closes #7874 from darkjh/SPARK-8949.
(cherry picked from commit 3d16a545007922ee6fa36e5f5c3959406cb46484)
Signed-off-by: Sean Owen <sowen@cloudera.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
`StringIndexer` writes the indexed label to a new column, so a subsequent estimator in the pipeline must read that new column if it wants to use the string-indexed label.
I think it is better to make this explicit in the documentation.
Author: lewuathe <lewuathe@me.com>
Closes #8205 from Lewuathe/SPARK-9977.
(cherry picked from commit ba2a07e2b6c5a39597b64041cd5bf342ef9631f5)
Signed-off-by: Sean Owen <sowen@cloudera.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Fix typo in ntile function.
Author: Moussa Taifi <moutai10@gmail.com>
Closes #8261 from moutai/patch-2.
(cherry picked from commit 865a3df3d578c0442c97d749c81f554b560da406)
Signed-off-by: Sean Owen <sowen@cloudera.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`Lists.newArrayList` -> `Arrays.asList`
CC jkbradley feynmanliang
Is anybody interested in replacing usages of `Lists.newArrayList` in the examples / source code too? This method isn't useful in Java 7 and beyond.
Author: Sean Owen <sowen@cloudera.com>
Closes #8272 from srowen/SPARK-10070.
(cherry picked from commit f141efeafb42b14b5fcfd9aa8c5275162042349f)
Signed-off-by: Sean Owen <sowen@cloudera.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Link was broken because it included tick marks.
Author: Bill Chambers <wchambers@ischool.berkeley.edu>
Closes #8302 from anabranch/patch-1.
(cherry picked from commit b23c4d3ffc36e47c057360c611d8ab1a13877699)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spark.streaming.backpressure.{enable-->enabled} and fixed deprecated annotations
Small changes
- Renamed conf spark.streaming.backpressure.{enable --> enabled}
- Changed Java `@Deprecated` annotations to Scala `@deprecated` annotations, which carry more information.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #8299 from tdas/SPARK-9967.
(cherry picked from commit bc9a0e03235865d2ec33372f6400dec8c770778a)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
accesses cacheLocs
In Scala, `Seq.fill` always seems to return a List. Accessing a list by index is an O(N) operation. Thus, the following code will be really slow (~10 seconds on my machine):
```scala
val numItems = 100000
val s = Seq.fill(numItems)(1)
for (i <- 0 until numItems) s(i)
```
It turns out that we had a loop like this in DAGScheduler code, although it's a little tricky to spot. In `getPreferredLocsInternal`, there's a call to `getCacheLocs(rdd)(partition)`. The `getCacheLocs` call returns a Seq. If this Seq is a List and the RDD contains many partitions, then indexing into this list will cost O(partitions). Thus, when we loop over our tasks to compute their individual preferred locations we implicitly perform an N^2 loop, reducing scheduling throughput.
This patch fixes this by replacing `Seq` with `Array`.
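The shape of the fix can be sketched as follows. The names here are hypothetical stand-ins and not the actual DAGScheduler code; the point is only that `Array` gives constant-time `apply(i)`, so a loop over all partitions stays linear.

```scala
// Hypothetical stand-in for per-partition cache locations.
object CacheLocsSketch {
  /** Builds one (empty) location list per partition, stored in an Array. */
  def buildLocs(numPartitions: Int): Array[Seq[String]] =
    Array.fill(numPartitions)(Seq.empty[String])

  /** Indexes every partition once; each locs(i) is O(1) on an Array. */
  def countUncached(locs: Array[Seq[String]]): Int = {
    var uncached = 0
    var i = 0
    while (i < locs.length) {
      if (locs(i).isEmpty) uncached += 1
      i += 1
    }
    uncached
  }
}
```

When the public type has to stay a `Seq`, converting once with `.toIndexedSeq` before the loop is another way to get cheap indexed access.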
Author: Josh Rosen <joshrosen@databricks.com>
Closes #8178 from JoshRosen/dagscheduler-perf.
(cherry picked from commit 010b03ed52f35fd4d426d522f8a9927ddc579209)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
| |
SPARK-9436 simplifies the Pregel code. graphx-programming-guide needs to be updated accordingly, since it still lists the old Pregel code.
Author: Alexander Ulanov <nashb@yandex.ru>
Closes #7831 from avulanov/SPARK-9508-pregel-doc2.
(cherry picked from commit 1c843e284818004f16c3f1101c33b510f80722e3)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
| |
cc JoshRosen
Author: Davies Liu <davies@databricks.com>
Closes #8245 from davies/python_doc.
(cherry picked from commit de3223872a217c5224ba7136604f6b7753b29108)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UDFs on complex types
This is kind of a weird case, but given a sufficiently complex query plan (in this case a TungstenProject with an Exchange underneath), we could hit NPEs on the executors because of when we were calling transformAllExpressions.
In general we should ensure that all transformations occur on the driver and not on the executors. Some reasons for avoiding executor-side transformations include:
* (this case) Some operator constructors require state such as access to the Spark/SQL conf, so doing a makeCopy on the executor can fail.
* (unrelated reason for avoiding executor-side transformations) ExprIds are calculated using an atomic integer, so you can violate their uniqueness constraint by constructing them anywhere other than the driver.
This subsumes #8285.
Author: Reynold Xin <rxin@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes #8295 from rxin/SPARK-10096.
(cherry picked from commit 1ff0580eda90f9247a5233809667f5cebaea290e)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In UnsafeRow, we used a private field of BigInteger for better performance, but it didn't actually contribute much (3% in one benchmark) to end-to-end runtime, and it makes the code non-portable (it may fail on other JVM implementations).
So we should use the public API instead.
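As an illustration of what relying only on the public API can look like, the sketch below round-trips a decimal through bytes using `BigInteger.toByteArray` and the `BigInteger(byte[])` constructor. The object and method names are hypothetical, and this is not the UnsafeRow code itself.

```scala
import java.math.{BigDecimal => JBigDecimal, BigInteger}

// Hypothetical helper; shows only public java.math calls.
object PortableDecimalSketch {
  /** Encodes the unscaled value as big-endian two's-complement bytes. */
  def toBytes(d: JBigDecimal): Array[Byte] = d.unscaledValue().toByteArray()

  /** Rebuilds the decimal from the bytes and the separately stored scale. */
  def fromBytes(bytes: Array[Byte], scale: Int): JBigDecimal =
    new JBigDecimal(new BigInteger(bytes), scale)
}
```

Because both calls are part of the documented `java.math` API, this approach does not depend on the internal layout of `BigInteger` in any particular JVM.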
cc rxin
Author: Davies Liu <davies@databricks.com>
Closes #8286 from davies/portable_decimal.
(cherry picked from commit 270ee677750a1f2adaf24b5816857194e61782ff)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Add `when` and `otherwise` as `Column` methods
- Add `When` as an expression function
- Add `%otherwise%` infix as an alias of `otherwise`
Since R doesn't support a feature like method chaining, the `otherwise(when(condition, value), value)` style is a little annoying to me. If `%otherwise%` looks strange to shivaram, I can remove it. What do you think?
### JIRA
[[SPARK-10075] Add `when` expression function in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10075)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8266 from yu-iskw/SPARK-10075.
(cherry picked from commit bf32c1f7f47dd907d787469f979c5859e02ce5e6)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HiveSparkSubmitSuite and HiveThriftServer2 test suites
The Scala process API has a known bug ([SI-8768] [1]), which may be the reason why several test suites that fork sub-processes are flaky.
This PR replaces the Scala process API with the Java process API in `CliSuite`, `HiveSparkSubmitSuite`, and the `HiveThriftServer2`-related test suites to see whether it fixes these flaky tests.
[1]: https://issues.scala-lang.org/browse/SI-8768
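For readers unfamiliar with the Java process API, here is a minimal sketch of forking a sub-process with `java.lang.ProcessBuilder` from Scala. It is not taken from the test suites named above: the object name, the choice of running `java -version` from the current JVM's `java.home`, and the timeout handling are all assumptions made for the example.

```scala
import java.io.File
import java.util.concurrent.TimeUnit

// Hypothetical example; the suites above launch different commands.
object JavaProcessSketch {
  /**
   * Runs `java -version` using the running JVM's own launcher and returns
   * the exit code, or None if the process did not finish within the timeout.
   */
  def runJavaVersion(timeoutSeconds: Long): Option[Int] = {
    val javaBin =
      new File(new File(System.getProperty("java.home"), "bin"), "java").getPath
    // inheritIO() forwards the child's output to this process, so the child
    // can never block on a full, unread pipe.
    val process = new ProcessBuilder(javaBin, "-version").inheritIO().start()
    if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
      Some(process.exitValue())
    } else {
      process.destroyForcibly() // give up on a hung child
      None
    }
  }
}
```

A test that needs the child's output would redirect it to a file or read the streams on separate threads instead of inheriting them.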
Author: Cheng Lian <lian@databricks.com>
Closes #8168 from liancheng/spark-9939/use-java-process-api.
(cherry picked from commit a5b5b936596ceb45f5f5b68bf1d6368534fb9470)
Signed-off-by: Cheng Lian <lian@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
before setting trackerState to Started
Test failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/3305/testReport/junit/org.apache.spark.streaming/StreamingContextSuite/stop_gracefully/
There is a race condition: setting `trackerState` to `Started` could happen after `startReceiver` is called. In that case `startReceiver` won't start the receivers, because it uses `!isTrackerStarted` to check whether ReceiverTracker is stopping or stopped, even though `trackerState` is actually `Initialized` and will be changed to `Started` soon.
Therefore, we should use `isTrackerStopping || isTrackerStopped`.
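The difference between the two checks can be sketched with a small state model. The enumeration and the two predicates below are hypothetical and simplified; only the state names `Initialized` and `Started` and the stopping/stopped distinction come from the description above.

```scala
// Hypothetical model of the tracker lifecycle, for illustration only.
object TrackerStateSketch {
  object TrackerState extends Enumeration {
    val Initialized, Started, Stopping, Stopped = Value
  }
  import TrackerState._

  /** Old check: refuses to start receivers in every state except Started. */
  def refusesOld(state: TrackerState.Value): Boolean = state != Started

  /** New check: refuses only when the tracker is stopping or stopped. */
  def refusesNew(state: TrackerState.Value): Boolean =
    state == Stopping || state == Stopped
}
```

The two predicates disagree only for `Initialized`, which is exactly the window in which the race occurs.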
Author: zsxwing <zsxwing@gmail.com>
Closes #8294 from zsxwing/SPARK-9504.
(cherry picked from commit 90273eff9604439a5a5853077e232d34555c67d7)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
generate blocks fills up to capacity
Generated blocks are inserted into an ArrayBlockingQueue, and another thread pulls blocks from the ArrayBlockingQueue and pushes them into BlockManager. If that queue fills up to capacity (the default is 10 blocks), the insert into the queue (done in the function updateCurrentBuffer) blocks inside a synchronized block. However, the thread that pulls blocks from the queue uses the same lock to check the current state (active or stopped) while pulling from the queue. Since the block-generating thread is blocked (as the queue is full) while holding the lock, the thread that is supposed to drain the queue gets blocked as well. Ergo, deadlock.
Solution: Moved blocking call to ArrayBlockingQueue outside the synchronized to prevent deadlock.
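A minimal sketch of the pattern, assuming a much simplified generator: the buffer swap happens under the lock, while the potentially blocking `put` happens after the lock is released. The class and its members are hypothetical; only `ArrayBlockingQueue` and the name `updateCurrentBuffer` come from the description above.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified generator; not the actual BlockGenerator.
class BlockGeneratorSketch(capacity: Int) {
  private val queue = new ArrayBlockingQueue[Seq[Int]](capacity)
  private var buffer = new ArrayBuffer[Int]
  private var stopped = false

  def add(item: Int): Unit = synchronized { buffer += item }

  /** Swaps the buffer under the lock, then enqueues outside it. */
  def updateCurrentBuffer(): Unit = {
    val block: Option[Seq[Int]] = synchronized {
      if (buffer.isEmpty) None
      else {
        val full = buffer.toList
        buffer = new ArrayBuffer[Int]
        Some(full)
      }
    }
    // put() blocks while the queue is full, so it must not hold the lock
    // that the draining thread needs for its state check.
    block.foreach(b => queue.put(b))
  }

  /** Drains one block; holds the lock only for the short state check. */
  def pollBlock(): Option[Seq[Int]] = {
    val active = synchronized { !stopped }
    if (active) Option(queue.poll()) else None
  }

  def stop(): Unit = synchronized { stopped = true }
}
```

With the `put` outside the synchronized section, a full queue can still stall the producer, but the consumer is always able to take the lock, drain a block, and unblock it.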
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #8257 from tdas/SPARK-10072.
(cherry picked from commit 1aeae05bb20f01ab7ccaa62fe905a63e020074b5)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
```
R/functions.R:74:1: style: lines should not be more than 100 characters.
jc <- callJStatic("org.apache.spark.sql.functions", "lit", ifelse(class(x) == "Column", xjc, x))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8297 from yu-iskw/minor-lint-r.
(cherry picked from commit b4b35f133aecaf84f04e8e444b660a33c6b7894a)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is against master, but we need to apply it to 1.5 branch as well.
cc shivaram and rxin
Author: Hossein <hossein@databricks.com>
Closes #8291 from falaki/SparkRVersion1.5.
(cherry picked from commit 04e0fea79b9acfa3a3cb81dbacb08f9d287b42c3)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
|
|
|
|
|
|
|
|
|
|
|
| |
mengxr jkbradley
Author: Feynman Liang <fliang@databricks.com>
Closes #8184 from feynmanliang/SPARK-9889-DCT-docs.
(cherry picked from commit badf7fa650f9801c70515907fcc26b58d7ec3143)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FailureSuite
Failures in streaming.FailureSuite can leak the StreamingContext and SparkContext, which causes all subsequent tests to fail.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #8289 from tdas/SPARK-10098.
(cherry picked from commit 9108eff74a2815986fd067b273c2a344b6315405)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there is no test case for `Params#arrayLengthGt`.
Author: lewuathe <lewuathe@me.com>
Closes #8223 from Lewuathe/SPARK-10012.
(cherry picked from commit c635a16f64c939182196b46725ef2d00ed107cca)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Added since tags to mllib.tree
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #7380 from BryanCutler/sinceTag-mllibTree-8924.
(cherry picked from commit 1dbffba37a84c62202befd3911d25888f958191d)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8282 from vanzin/SPARK-10088.
(cherry picked from commit 492ac1facbc79ee251d45cff315598ec9935a0e2)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
|
|
|
|
|
|
|
|
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8283 from vanzin/SPARK-10089.
(cherry picked from commit fa41e0242f075843beff7dc600d1a6bac004bdc7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out that inner classes of inner objects are referenced directly, so moving them breaks binary compatibility.
Author: Michael Armbrust <michael@databricks.com>
Closes #8281 from marmbrus/binaryCompat.
(cherry picked from commit 80cb25b228e821a80256546a2f03f73a45cf7645)
Signed-off-by: Michael Armbrust <michael@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spark-streaming-XXX-assembly jars
Removed contents already included in Spark assembly jar from spark-streaming-XXX-assembly jars.
Author: zsxwing <zsxwing@gmail.com>
Closes #8069 from zsxwing/SPARK-9574.
(cherry picked from commit bf1d6614dcb8f5974e62e406d9c0f8aac52556d3)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
See https://issues.apache.org/jira/browse/SPARK-10085
Author: Piotr Migdal <pmigdal@gmail.com>
Closes #8284 from stared/spark-10085.
(cherry picked from commit 8bae9015b7e7b4528ca2bc5180771cb95d2aac13)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Add Python example for mllib LDAModel user guide
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #8227 from yanboliang/spark-10032.
(cherry picked from commit 747c2ba8006d5b86f3be8dfa9ace639042a35628)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
user guide
Add Python examples for mllib IsotonicRegression user guide
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #8225 from yanboliang/spark-10029.
(cherry picked from commit f4fa61effe34dae2f0eab0bef57b2dee220cf92f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Updates FPM user guide to include Association Rules.
Author: Feynman Liang <fliang@databricks.com>
Closes #8207 from feynmanliang/SPARK-9900-arules.
(cherry picked from commit f5ea3912900ccdf23e2eb419a342bfe3c0c0b61b)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CountVectorizerModel
jira: https://issues.apache.org/jira/browse/SPARK-9028
Add an estimator for CountVectorizerModel. The estimator will extract a vocabulary from document collections according to the term frequency.
I changed the meaning of minCount to be a filter across the corpus. This aligns with Word2Vec and the similar parameter in scikit-learn.
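The general idea of fitting a vocabulary by term frequency can be sketched in plain Scala. This is an illustration only and not the estimator added by this PR: the object, the `vocabSize` parameter, the tie-breaking rule, and the reading of `minCount` as a minimum total count across the corpus are all assumptions.

```scala
// Hypothetical sketch; not Spark's CountVectorizer.
object VocabularySketch {
  /**
   * Builds a vocabulary from tokenized documents. Terms whose total count
   * across the corpus is below `minCount` are dropped (an assumed reading
   * of the parameter); the rest are ordered by descending frequency.
   */
  def fit(docs: Seq[Seq[String]], minCount: Int, vocabSize: Int): Array[String] = {
    val counts = docs.flatten.groupBy(identity).map { case (term, occ) => (term, occ.size) }
    counts.toSeq
      .filter { case (_, count) => count >= minCount }
      .sortBy { case (term, count) => (-count, term) } // ties broken by term
      .take(vocabSize)
      .map(_._1)
      .toArray
  }

  /** Maps one document to term counts aligned with the vocabulary order. */
  def transform(doc: Seq[String], vocab: Array[String]): Array[Int] =
    vocab.map(term => doc.count(_ == term))
}
```

Splitting the work into `fit` and `transform` mirrors the estimator/model split described above: the vocabulary is learned once and then applied to any document.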
Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #7388 from hhbyyh/cvEstimator.
(cherry picked from commit 354f4582b637fa25d3892ec2b12869db50ed83c9)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
parameters functions
### JIRA
[[SPARK-10007] Update `NAMESPACE` file in SparkR for simple parameters functions - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10007)
Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8277 from yu-iskw/SPARK-10007.
(cherry picked from commit 1968276af0f681fe51328b7dd795bd21724a5441)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Parquet hard coded a JUL logger which always writes to stdout. This PR redirects it via SLF4j JUL bridge handler, so that we can control Parquet logs via `log4j.properties`.
This solution is inspired by https://github.com/Parquet/parquet-mr/issues/390#issuecomment-46064909.
Author: Cheng Lian <lian@databricks.com>
Closes #8196 from liancheng/spark-8118/redirect-parquet-jul.
(cherry picked from commit 5723d26d7e677b89383de3fcf2c9a821b68a65b7)
Signed-off-by: Cheng Lian <lian@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This might be a typo that was there from the start, or a leftover from a rename.
The method that accesses the index file is now called `getBlockData` (not `getBlockLocation`, as the comments indicate).
Author: CodingCat <zhunansjtu@gmail.com>
Closes #8238 from CodingCat/minor_1.
(cherry picked from commit c34e9ff0eac2032283b959fe63b47cc30f28d21c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Fix the issue that `layers` and `weights` should be public variables of `MultilayerPerceptronClassificationModel`. Currently, users cannot get `layers` and `weights` from a `MultilayerPerceptronClassificationModel`.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #8263 from yanboliang/mlp-public.
(cherry picked from commit dd0614fd618ad28cb77aecfbd49bb319b98fdba0)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
binary in ArrayData
In Java, the type of an array of arrays is slightly different from the type of an array of other elements.
cc cloud-fan
Author: Davies Liu <davies@databricks.com>
Closes #8250 from davies/array_binary.
(cherry picked from commit 5af3838d2e59ed83766f85634e26918baa53819f)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
|
|
|
|
|
|
|
|
| |
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8265 from yu-iskw/minor-translate-comment.
(cherry picked from commit a0910315dae88b033e38a1de07f39ca21f6552ad)
Signed-off-by: Reynold Xin <rxin@databricks.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This PR adds a short description of `ml.feature` package with code example. The Java package doc will come in a separate PR. jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #8260 from mengxr/SPARK-7808.
(cherry picked from commit e290029a356222bddf4da1be0525a221a5a1630b)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
|
|
|
|
|
|
|
|
| |
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #8251 from vanzin/SPARK-10059.
(cherry picked from commit ee093c8b927e8d488aeb76115c7fb0de96af7720)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
|