Commit message | Author | Age | Files | Lines
* [SPARK-11608][MLLIB][DOC] Added migration guide for MLlib 1.6 | Joseph K. Bradley | 2015-12-16 | 2 | -15/+42
  No known breaking changes, but some deprecations and changes of behavior. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #10235 from jkbradley/mllib-guide-update-1.6.
* [SPARK-12361][PYSPARK][TESTS] Should set PYSPARK_DRIVER_PYTHON before Python tests | Jeff Zhang | 2015-12-16 | 1 | -1/+2
  Although this patch still doesn't solve the issue of why the return code is 0 (see the JIRA description), it resolves the Python version mismatch. Author: Jeff Zhang <zjffdu@apache.org> Closes #10322 from zjffdu/SPARK-12361.
* [SPARK-12309][ML] Use sqlContext from MLlibTestSparkContext for spark.ml test suites | Yanbo Liang | 2015-12-16 | 5 | -11/+5
  Use ```sqlContext``` from ```MLlibTestSparkContext``` rather than creating a new one for spark.ml test suites. I checked thoroughly and found four test cases that need to be updated. cc mengxr jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #10279 from yanboliang/spark-12309.
* [SPARK-9694][ML] Add random seed Param to Scala CrossValidator | Yanbo Liang | 2015-12-16 | 2 | -3/+16
  Add random seed Param to Scala CrossValidator. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9108 from yanboliang/spark-9694.
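A seed parameter matters because the folds are drawn at random: fixing it makes a cross-validation run repeatable. The sketch below is a plain-Python stand-in, not Spark's fold sampling. It assumes rows are identified by index and shuffles them with a private `random.Random(seed)`; the function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_rows, num_folds, seed):
    """Split row indices 0..n_rows-1 into num_folds (training, validation) pairs.

    The same seed always yields the same folds, which is the point of
    exposing a seed parameter on a cross-validator.
    """
    rng = random.Random(seed)  # private RNG: reproducible, leaves global state alone
    indices = list(range(n_rows))
    rng.shuffle(indices)
    splits = []
    for k in range(num_folds):
        # Every num_folds-th shuffled index is held out for validation in fold k.
        validation = sorted(indices[k::num_folds])
        held_out = set(validation)
        training = [i for i in range(n_rows) if i not in held_out]
        splits.append((training, validation))
    return splits
```

Each row lands in exactly one validation fold, and rerunning with the same seed reproduces the split exactly.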
* [SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for bisecting k-means | Yu ISHIKAWA | 2015-12-16 | 4 | -0/+165
  This PR includes only example code, in order to finish it quickly. I'll send another PR for the docs soon. Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9952 from yu-iskw/SPARK-6518.
* [SPARK-12345][MESOS] Filter SPARK_HOME when submitting Spark jobs with Mesos cluster mode. | Timothy Chen | 2015-12-16 | 2 | -2/+7
  SPARK_HOME is now causing problems with Mesos cluster mode, because the spark-submit script was recently changed so that SPARK_HOME, if defined, takes precedence when locating spark-class scripts. We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead. Author: Timothy Chen <tnachen@gmail.com> Closes #10332 from tnachen/scheduler_ui.
* [SPARK-12215][ML][DOC] User guide section for KMeans in spark.ml | Yu ISHIKAWA | 2015-12-16 | 3 | -28/+100
  cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #10244 from yu-iskw/SPARK-12215.
* [SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR | Yanbo Liang | 2015-12-16 | 4 | -56/+119
  Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310.
* [SPARK-12318][SPARKR] Save mode in SparkR should be error by default | Jeff Zhang | 2015-12-16 | 2 | -6/+13
  shivaram Please help review. Author: Jeff Zhang <zjffdu@apache.org> Closes #10290 from zjffdu/SPARK-12318.
* [SPARK-8745] [SQL] remove GenerateProjection | Davies Liu | 2015-12-16 | 8 | -319/+11
  cc rxin Author: Davies Liu <davies@databricks.com> Closes #10316 from davies/remove_generate_projection.
* [SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation | Timothy Hunter | 2015-12-16 | 4 | -34/+149
  This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes to the original author Titan-C (mentioned in the NOTICE). Note that I am not a CSS expert, so I can only address comments to some extent. Default view: ![screen shot 2015-12-14 at 12 46 39 pm](https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png) When collapsed manually by the user: ![screen shot 2015-12-14 at 12 54 02 pm](https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png) Disappears when the column is too narrow: ![screen shot 2015-12-14 at 12 47 22 pm](https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png) Can still be opened by the user if necessary: ![screen shot 2015-12-14 at 12 51 15 pm](https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png) Author: Timothy Hunter <timhunter@databricks.com> Closes #10297 from thunterdb/12324.
* Revert "[SPARK-12105] [SQL] add convenient show functions" | Reynold Xin | 2015-12-16 | 1 | -16/+9
  This reverts commit 31b391019ff6eb5a483f4b3e62fd082de7ff8416.
* Revert "[HOTFIX] Compile error from commit 31b3910" | Reynold Xin | 2015-12-16 | 1 | -1/+1
  This reverts commit 840bd2e008da5b22bfa73c587ea2c57666fffc60.
* Style fix for the previous 3 JDBC filter push down commits. | Reynold Xin | 2015-12-15 | 1 | -9/+8
* [SPARK-12315][SQL] isnotnull operator not pushed down for JDBC datasource. | hyukjinkwon | 2015-12-15 | 2 | -0/+3
  https://issues.apache.org/jira/browse/SPARK-12315 The `IsNotNull` filter is not being pushed down for the JDBC datasource. It appears to be standard SQL according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it. In this PR, I simply added a case for the `IsNotNull` filter to produce a proper filter string. Author: hyukjinkwon <gurwls223@gmail.com> This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com> Closes #10287 from HyukjinKwon/SPARK-12315.
* [SPARK-12314][SQL] isnull operator not pushed down for JDBC datasource. | hyukjinkwon | 2015-12-15 | 2 | -0/+2
  https://issues.apache.org/jira/browse/SPARK-12314 The `IsNull` filter is not being pushed down for the JDBC datasource. It appears to be standard SQL according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support it. In this PR, I simply added a case for the `IsNull` filter to produce a proper filter string. Author: hyukjinkwon <gurwls223@gmail.com> This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com> Closes #10286 from HyukjinKwon/SPARK-12314.
* [SPARK-12249][SQL] JDBC non-equality comparison operator not pushed down. | hyukjinkwon | 2015-12-15 | 2 | -0/+3
  https://issues.apache.org/jira/browse/SPARK-12249 Currently the `!=` operator is not pushed down correctly. I simply added a case for it. Author: hyukjinkwon <gurwls223@gmail.com> Closes #10233 from HyukjinKwon/SPARK-12249.
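The three JDBC push-down commits above share one idea: each supported filter is translated into a SQL fragment that the source appends to its WHERE clause, and any filter without a case is left for Spark to evaluate after the scan. A minimal Python sketch of that translation follows. The filter classes and the functions `compile_filter` and `where_clause` are hypothetical stand-ins, not Spark's `org.apache.spark.sql.sources` API. Modelling `!=` as `Not(EqualTo(...))` is an assumption of the sketch, and it emits `<>`, the SQL-92 spelling of not-equal.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for data source filter objects (not Spark's classes).
@dataclass(frozen=True)
class EqualTo:
    attr: str
    value: Any


@dataclass(frozen=True)
class Not:
    child: Any


@dataclass(frozen=True)
class IsNull:
    attr: str


@dataclass(frozen=True)
class IsNotNull:
    attr: str


@dataclass(frozen=True)
class StartsWith:  # a filter this sketch deliberately does not translate
    attr: str
    value: str


def compile_value(value):
    """Render a literal: strings are quoted with embedded quotes doubled; numbers use str()."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def compile_filter(f) -> Optional[str]:
    """Return a WHERE-clause fragment, or None if the filter cannot be pushed down."""
    if isinstance(f, EqualTo):
        return f"{f.attr} = {compile_value(f.value)}"
    if isinstance(f, Not) and isinstance(f.child, EqualTo):
        return f"{f.child.attr} <> {compile_value(f.child.value)}"
    if isinstance(f, IsNull):
        return f"{f.attr} IS NULL"
    if isinstance(f, IsNotNull):
        return f"{f.attr} IS NOT NULL"
    return None  # unsupported: Spark must still filter these rows itself


def where_clause(filters):
    """AND together every filter that could be compiled; skip the rest."""
    parts = [c for c in map(compile_filter, filters) if c is not None]
    return "WHERE " + " AND ".join(parts) if parts else ""
```

Returning `None` for unsupported filters, instead of raising, mirrors the behavior the commits rely on: a missing case is not an error, it only means the filter runs on the Spark side. That is also why SPARK-12236 notes that tests passed even when nothing was pushed down.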
* [SPARK-12304][STREAMING] Make Spark Streaming web UI display more friendly Receiver graphs | proflin | 2015-12-15 | 1 | -1/+7
  Currently, the Spark Streaming web UI uses the same maxY when displaying 'Input Rate Times & Histograms' and 'Per-Receiver Times & Histograms'. This may lead to unfriendly graphs: once there are tens of receivers or more, every 'Per-Receiver Times' line almost hits the ground. This issue proposes calculating a new maxY, separate from the original one, that is shared among all the 'Per-Receiver Times & Histograms' graphs. Before: ![before-5](https://cloud.githubusercontent.com/assets/15843379/11761362/d790c356-a0fa-11e5-860e-4b834603de1d.png) After: ![after-5](https://cloud.githubusercontent.com/assets/15843379/11761361/cfabf692-a0fa-11e5-97d0-4ad124aaca2a.png) Author: proflin <proflin.me@gmail.com> Closes #10318 from proflin/SPARK-12304.
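A small sketch of the scaling rule described in the SPARK-12304 entry, assuming rates are plain lists of numbers. The input-rate graph keeps its own maxY, while all per-receiver graphs share a second maxY computed from receiver data only. The function name and data shapes are illustrative, not the streaming UI's actual code.

```python
def graph_max_y(input_rates, per_receiver_rates):
    """Return (input_max_y, receiver_max_y) for the two groups of graphs.

    input_rates: overall input rates (records/sec), one per batch.
    per_receiver_rates: dict mapping receiver name -> list of rates per batch.
    """
    input_max_y = max(input_rates, default=0)
    # One maxY shared by all per-receiver graphs, taken from receiver data only,
    # so no receiver's line is flattened by the much larger total input rate.
    receiver_max_y = max(
        (rate for rates in per_receiver_rates.values() for rate in rates),
        default=0,
    )
    return input_max_y, receiver_max_y
```

Sharing one value across all receivers (rather than scaling each graph to its own maximum) keeps the per-receiver graphs comparable with one another.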
* [SPARK-4117][YARN] Spark on Yarn handle AM being told command from RM | Devaraj K | 2015-12-15 | 1 | -1/+8
  Spark on YARN: handle the AM being told a command from the RM. When the RM throws ApplicationAttemptNotFoundException for an allocate invocation, make the ApplicationMaster finish immediately without any retries. Author: Devaraj K <devaraj@apache.org> Closes #10129 from devaraj-kavali/SPARK-4117.
* [SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability | Wenchen Fan | 2015-12-15 | 2 | -21/+27
  Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8645 from cloud-fan/test.
* [SPARK-12062][CORE] Change Master to async rebuild UI when application completes | Bryan Cutler | 2015-12-15 | 2 | -29/+52
  This change builds the event history of completed apps asynchronously, so that the RPC thread is not blocked and new workers can still register or be removed even if the event log history is very large and takes a long time to rebuild. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
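The pattern in SPARK-12062 can be sketched in Python with a single-thread executor: the message handler submits the slow event-log replay and returns at once, so other messages (here, worker registration) are handled in the meantime. The `Master` class and its methods below are hypothetical stand-ins for illustration only; the actual change is in Spark's Scala `Master`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Master:
    """Hypothetical sketch: message handlers hand slow UI rebuilds to a background pool."""

    def __init__(self):
        self._rebuild_pool = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()  # guards the two fields below, held only briefly
        self.app_uis = {}              # app_id -> rebuilt UI (here just a summary string)
        self.workers = set()

    def _rebuild_ui(self, app_id, event_log):
        # Stand-in for replaying a possibly very large event log.
        ui = f"{app_id}: {len(event_log)} events"
        with self._lock:
            self.app_uis[app_id] = ui
        return ui

    def on_application_finished(self, app_id, event_log):
        # Returns a Future immediately; the replay runs on the pool thread.
        return self._rebuild_pool.submit(self._rebuild_ui, app_id, event_log)

    def on_register_worker(self, worker_id):
        # Does not wait for any in-progress rebuild to finish.
        with self._lock:
            self.workers.add(worker_id)

    def stop(self):
        self._rebuild_pool.shutdown(wait=True)
```

Usage: `m.on_application_finished("app-1", log)` returns a future right away, and `m.on_register_worker("worker-1")` can run before that future completes.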
* [SPARK-9886][CORE] Fix to use ShutdownHookManager in ExternalBlockStore.scala | Naveen | 2015-12-15 | 1 | -11/+5
  Author: Naveen <naveenminchu@gmail.com> Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
* [SPARK-10123][DEPLOY] Support specifying deploy mode from configuration | jerryshao | 2015-12-15 | 5 | -7/+64
  Please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #10195 from jerryshao/SPARK-10123.
* [SPARK-9026][SPARK-4514] Modifications to JobWaiter, FutureAction, and AsyncRDDActions to support non-blocking operation | Richard W. Eggert II | 2015-12-15 | 7 | -158/+251
  These changes rework the implementations of `SimpleFutureAction`, `ComplexFutureAction`, `JobWaiter`, and `AsyncRDDActions` such that asynchronous callbacks on the generated `Futures` NEVER block waiting for a job to complete. A small amount of mutex synchronization is necessary to protect the internal fields that manage cancellation, but these locks are only held very briefly and in practice should almost never cause any blocking to occur. The existing blocking APIs of these classes are retained, but they simply delegate to the underlying non-blocking API and `Await` the results with indefinite timeouts. Associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-9026 Also fixes: https://issues.apache.org/jira/browse/SPARK-4514 This pull request contains all my own original work, which I release to the Spark project under its open source license. Author: Richard W. Eggert II <richard.eggert@gmail.com> Closes #9264 from reggert/fix-futureaction.
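The structure SPARK-9026 describes can be sketched in Python with a simplified future that a scheduler completes by calling `_complete`. Callbacks are registered without waiting, a lock guards the shared fields only momentarily, and the blocking `result()` is built on top of the non-blocking `on_complete` by waiting on an event (indefinitely when no timeout is given). `JobFuture` and its methods are hypothetical; Spark's `FutureAction` classes are Scala and differ in detail.

```python
import threading


class JobFuture:
    """Hypothetical sketch: callbacks never block; the blocking API waits on top of them."""

    def __init__(self):
        self._lock = threading.Lock()  # held only briefly, to guard the fields below
        self._done = threading.Event()
        self._result = None
        self._callbacks = []

    def on_complete(self, callback):
        """Register a callback without blocking; run it now if the job already finished."""
        with self._lock:
            if not self._done.is_set():
                self._callbacks.append(callback)
                return
        callback(self._result)

    def _complete(self, result):
        """Called by the (simulated) scheduler when the job finishes."""
        with self._lock:
            self._result = result
            self._done.set()
            callbacks, self._callbacks = self._callbacks, []
        for cb in callbacks:  # run callbacks outside the lock
            cb(result)

    def result(self, timeout=None):
        """Blocking API: delegate to on_complete and wait (forever if timeout is None)."""
        finished = threading.Event()
        self.on_complete(lambda _: finished.set())
        if not finished.wait(timeout):
            raise TimeoutError("job did not finish in time")
        return self._result
```

Because the callback list is swapped out under the lock and invoked after releasing it, a slow callback never holds up other threads registering callbacks.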
* [SPARK-9516][UI] Improvement of Thread Dump Page | CodingCat | 2015-12-15 | 4 | -43/+118
  https://issues.apache.org/jira/browse/SPARK-9516
  - [x] new look of Thread Dump Page
  - [x] click column title to sort
  - [x] grep
  - [x] search as you type
  squito JoshRosen It's ready for review now. Author: CodingCat <zhunansjtu@gmail.com> Closes #7910 from CodingCat/SPARK-9516.
* [SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode. | Timothy Chen | 2015-12-15 | 2 | -6/+35
  Adds more documentation about submitting jobs with Mesos cluster mode. Author: Timothy Chen <tnachen@gmail.com> Closes #10086 from tnachen/mesos_supervise_docs.
* [SPARK-12130] Replace shuffleManagerClass with shortShuffleMgrNames in ExternalShuffleBlockResolver | Lianhui Wang | 2015-12-15 | 8 | -12/+18
  Replacing shuffleManagerClassName with shortShuffleMgrName reduces the time spent on string comparison; the sort comparison is also moved to the front. cc JoshRosen andrewor14 Author: Lianhui Wang <lianhuiwang09@gmail.com> Closes #10131 from lianhuiwang/spark-12130.
* [SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling setConf | tedyu | 2015-12-15 | 1 | -2/+2
  This is a continuation of SPARK-12056, where the change is applied to SqlNewHadoopRDD.scala. andrewor14 FYI Author: tedyu <yuzhihong@gmail.com> Closes #10164 from tedyu/master.
* [HOTFIX] Compile error from commit 31b3910 | Andrew Or | 2015-12-15 | 1 | -1/+1
* [SPARK-12105] [SQL] add convenient show functions | Jean-Baptiste Onofré | 2015-12-15 | 1 | -9/+16
  Author: Jean-Baptiste Onofré <jbonofre@apache.org> Closes #10130 from jbonofre/SPARK-12105.
* [SPARK-12236][SQL] JDBC filter tests all pass if filters are not really pushed down | hyukjinkwon | 2015-12-15 | 3 | -21/+19
  https://issues.apache.org/jira/browse/SPARK-12236 Currently JDBC filters are not tested properly: all the tests pass even if the filters are not pushed down, because of Spark-side filtering. In this PR, first, I corrected the tests to properly check the pushed-down filters by removing Spark-side filtering. Also, `!=` was being tested, but it is actually not pushed down, so I removed those tests. Lastly, I moved the `stripSparkFilter()` function to `SQLTestUtils`, as this function will be shared by all tests for pushed-down filters. It would also be shared with the ORC datasource, as the filters for that are also not being tested properly. Author: hyukjinkwon <gurwls223@gmail.com> Closes #10221 from HyukjinKwon/SPARK-12236.
* [SPARK-12271][SQL] Improve error message when Dataset.as[ ] has incompatible schemas. | Nong Li | 2015-12-15 | 4 | -7/+18
  Author: Nong Li <nong@databricks.com> Closes #10260 from nongli/spark-11271.
* [MINOR][ML] Rename weights to coefficients for examples/DeveloperApiExample | Yanbo Liang | 2015-12-15 | 2 | -19/+19
  Rename ```weights``` to ```coefficients``` for examples/DeveloperApiExample. cc mengxr jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #10280 from yanboliang/spark-coefficients.
* [STREAMING][MINOR] Fix typo in function name of StateImpl | jerryshao | 2015-12-15 | 3 | -3/+3
  cc tdas zsxwing, please review. Thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #10305 from jerryshao/fix-typo-state-impl.
* [SPARK-12332][TRIVIAL][TEST] Fix minor typo in ResetSystemProperties | Holden Karau | 2015-12-15 | 1 | -1/+1
  Fix a minor typo (unbalanced bracket) in ResetSystemProperties. Author: Holden Karau <holden@us.ibm.com> Closes #10303 from holdenk/SPARK-12332-trivial-typo-in-ResetSystemProperties-comment.
* [SPARK-12288] [SQL] Support UnsafeRow in Coalesce/Except/Intersect. | gatorsmile | 2015-12-14 | 2 | -1/+46
  Support UnsafeRow for Coalesce/Except/Intersect. Could you review whether my code changes are OK? davies Thank you! Author: gatorsmile <gatorsmile@gmail.com> Closes #10285 from gatorsmile/unsafeSupportCIE.
* [SPARK-12188][SQL][FOLLOW-UP] Code refactoring and comment correction in Dataset APIs | gatorsmile | 2015-12-14 | 1 | -1/+1
  marmbrus This PR is to address your comment. Thanks for your review! Author: gatorsmile <gatorsmile@gmail.com> Closes #10214 from gatorsmile/followup12188.
* [SPARK-12274][SQL] WrapOption should not have type constraint for child | Wenchen Fan | 2015-12-14 | 1 | -4/+1
  I think it was a mistake, and we did not catch it until https://github.com/apache/spark/pull/10260, which began checking whether `fromRowExpression` is resolved. Author: Wenchen Fan <wenchen@databricks.com> Closes #10263 from cloud-fan/encoder.
* [SPARK-12327] Disable commented code lintr temporarily | Shivaram Venkataraman | 2015-12-14 | 1 | -1/+1
  cc yhuai felixcheung shaneknapp Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #10300 from shivaram/comment-lintr-disable.
* [SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark | Liang-Chi Hsieh | 2015-12-14 | 3 | -34/+67
  JIRA: https://issues.apache.org/jira/browse/SPARK-12016 We should not directly use Word2VecModel in pyspark. We need to wrap it in a Word2VecModelWrapper when loading it in pyspark. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #10100 from viirya/fix-load-py-wordvecmodel.
* [MINOR][DOC] Fix broken word2vec link | BenFradet | 2015-12-14 | 1 | -1/+1
  Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193, where a broken link was left as is. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10282 from BenFradet/SPARK-12199.
* [SPARK-12275][SQL] No plan for BroadcastHint in some condition | yucai | 2015-12-13 | 2 | -1/+8
  When SparkStrategies.BasicOperators's "case BroadcastHint(child) => apply(child)" is hit, it only recursively invokes BasicOperators.apply with this "child". This gives many strategies no chance to process the plan, which probably leads to the "No plan" issue, so we use planLater to go through all strategies. https://issues.apache.org/jira/browse/SPARK-12275 Author: yucai <yucai.yu@intel.com> Closes #10265 from yucai/broadcast_hint.
* [SPARK-12213][SQL] use multiple partitions for single distinct query | Davies Liu | 2015-12-13 | 10 | -990/+422
  Currently, we may generate one of two plans for a query with a single distinct aggregate, depending on spark.sql.specializeSingleDistinctAggPlanning: one works better on low-cardinality columns, the other (the default) on high-cardinality columns. This PR changes this to generate a single plan (three aggregations and two exchanges) that works well in both cases, so we can safely remove the flag `spark.sql.specializeSingleDistinctAggPlanning` (introduced in 1.6). A query like `SELECT COUNT(DISTINCT a) FROM table` will be planned as:
  ```
  AGG-4 (count distinct)
  Shuffle to a single reducer
  Partial-AGG-3 (count distinct, no grouping)
  Partial-AGG-2 (grouping on a)
  Shuffle by a
  Partial-AGG-1 (grouping on a)
  ```
  This PR also includes a large refactoring of aggregation (removing 500+ lines of code). cc yhuai nongli marmbrus Author: Davies Liu <davies@databricks.com> Closes #10228 from davies/single_distinct.
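The SPARK-12213 plan for `SELECT COUNT(DISTINCT a) FROM table` can be simulated in plain Python to see why it stays correct with many partitions. After the shuffle by `a`, each distinct value lives on exactly one reducer, so the per-reducer counts can simply be summed. Partitions are modelled as lists, and `hash(a) % num_reducers` stands in for a hash partitioner. The function is a hypothetical illustration, not Spark's aggregation code.

```python
from collections import defaultdict


def count_distinct(partitions, num_reducers=4):
    """Simulate COUNT(DISTINCT a) using the aggregate/exchange pattern of SPARK-12213."""
    # Partial-AGG-1: within each input partition, group on `a` (local de-duplication).
    local_groups = [set(part) for part in partitions]

    # Shuffle by `a`: every copy of a value is sent to the same reducer.
    shuffled = defaultdict(list)
    for groups in local_groups:
        for a in groups:
            shuffled[hash(a) % num_reducers].append(a)

    # Partial-AGG-2: group on `a` again; each value now appears on one reducer only.
    # Partial-AGG-3: count those groups per reducer (no grouping key).
    partial_counts = [len(set(values)) for values in shuffled.values()]

    # Shuffle to a single reducer, then AGG-4: merge the partial counts.
    return sum(partial_counts)
```

Work scales with the number of reducers in the middle stages; only one small count per reducer reaches the final single-reducer step.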
* [SPARK-12281][CORE] Fix a race condition when reporting ExecutorState in the shutdown hook | Shixiong Zhu | 2015-12-13 | 3 | -3/+9
  1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
  2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.

  This should fix the potential exceptions when exiting a local cluster:
  ```
  java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
      at scala.Predef$.assert(Predef.scala:179)
      at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
      at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
      at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
      at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
      at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

  java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
      at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
      at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
      at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
      at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
      at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
      at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
      at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
      at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
      at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
  ```
  Author: Shixiong Zhu <shixiong@databricks.com> Closes #10269 from zsxwing/executor-state.
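A minimal Python model of the second fix in SPARK-12281. It assumes the master enforces only the rule visible in the first trace above: a reported state must differ from the current one. `MasterRecord` and `ToyExecutorRunner` are hypothetical simplifications for illustration, not Spark's `Master` or `ExecutorRunner`.

```python
from enum import Enum


class ExecutorState(Enum):
    LAUNCHING = "LAUNCHING"
    RUNNING = "RUNNING"
    FAILED = "FAILED"


class IllegalTransition(Exception):
    pass


class MasterRecord:
    """Master-side view of one executor; rejects a 'transition' to the same state."""

    def __init__(self, executor_id):
        self.executor_id = executor_id
        self.state = ExecutorState.LAUNCHING

    def report(self, new_state):
        if new_state == self.state:
            raise IllegalTransition(
                f"executor {self.executor_id} state transfer from "
                f"{self.state.name} to {new_state.name} is illegal")
        self.state = new_state


class ToyExecutorRunner:
    def __init__(self, master_record):
        self.master_record = master_record
        self.state = ExecutorState.RUNNING
        master_record.report(self.state)

    def shutdown_hook(self):
        # A still-RUNNING executor is marked FAILED before it is reported, so the
        # master never receives a RUNNING -> RUNNING update; later calls are no-ops.
        if self.state == ExecutorState.RUNNING:
            self.state = ExecutorState.FAILED
            self.master_record.report(self.state)
```

Without the FAILED step, a shutdown hook that re-reported the current state would reproduce the "RUNNING to RUNNING is illegal" error.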
* [SPARK-12267][CORE] Store the remote RpcEnv address to send the correct disconnection message | Shixiong Zhu | 2015-12-12 | 4 | -1/+65
  Author: Shixiong Zhu <shixiong@databricks.com> Closes #10261 from zsxwing/SPARK-12267.
* [SPARK-12199][DOC] Follow-up: Refine example code in ml-features.md | Xusen Yin | 2015-12-12 | 4 | -15/+15
  https://issues.apache.org/jira/browse/SPARK-12199 Follow-up PR of SPARK-11551. Fix some errors in ml-features.md. mengxr Author: Xusen Yin <yinxusen@gmail.com> Closes #10193 from yinxusen/SPARK-12199.
* [SPARK-11193] Use Java ConcurrentHashMap instead of SynchronizedMap trait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver | Jean-Baptiste Onofré | 2015-12-12 | 1 | -8/+8
  Author: Jean-Baptiste Onofré <jbonofre@apache.org> Closes #10203 from jbonofre/SPARK-11193.
* [SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases | gatorsmile | 2015-12-11 | 2 | -6/+15
  The existing sample functions are missing the `seed` parameter, even though the corresponding function interface in `generics` has it. Thus, although a caller can pass a 'seed', its value is never used. This can cause SparkR unit tests to fail. For example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull Author: gatorsmile <gatorsmile@gmail.com> Closes #10160 from gatorsmile/sampleR.
* [SPARK-12298][SQL] Fix infinite loop in DataFrame.sortWithinPartitions | Ankur Dave | 2015-12-11 | 2 | -3/+3
  Modifies the String overload to call the Column overload and ensures this is called in a test. Author: Ankur Dave <ankurdave@gmail.com> Closes #10271 from ankurdave/SPARK-12298.
* [SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to dataframe_example.py | Yanbo Liang | 2015-12-11 | 2 | -26/+38
  Since ```Dataset``` has a new meaning in Spark 1.6, we should rename it to avoid confusion. #9873 finished the work for the Scala example; here we focus on the Python one. Move dataset_example.py to ```examples/ml``` and rename it to ```dataframe_example.py```. Also fix minor issues missed in #9873. cc mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9957 from yanboliang/SPARK-11978.