path: root/docs
Commit message | Author | Age | Files | Lines
* [SPARK-16505][YARN] Optionally propagate error during shuffle service startup. | Marcelo Vanzin | 2016-07-14 | 2 | -12/+32
  This prevents the NM from starting when something is wrong, which would lead to later errors which are confusing and harder to debug.
  Added a unit test to verify startup fails if something is wrong.
  Author: Marcelo Vanzin <vanzin@cloudera.com>
  Closes #14162 from vanzin/SPARK-16505.
* [SPARKR][DOCS][MINOR] R programming guide to include csv data source example | Felix Cheung | 2016-07-13 | 1 | -9/+18
  ## What changes were proposed in this pull request?
  Minor documentation update for code example, code style, and missed reference to "sparkR.init"
  ## How was this patch tested?
  manual
  shivaram
  Author: Felix Cheung <felixcheung_m@hotmail.com>
  Closes #14178 from felixcheung/rcsvprogrammingguide.
* [SPARK-16114][SQL] updated structured streaming guide | James Thomas | 2016-07-13 | 1 | -26/+23
  ## What changes were proposed in this pull request?
  Updated structured streaming programming guide with new windowed example.
  ## How was this patch tested?
  Docs
  Author: James Thomas <jamesjoethomas@gmail.com>
  Closes #14183 from jjthomas/ss_docs_update.
* [SPARK-16438] Add Asynchronous Actions documentation | sandy | 2016-07-13 | 1 | -0/+3
  ## What changes were proposed in this pull request?
  Add Asynchronous Actions documentation inside action of programming guide
  ## How was this patch tested?
  check the documentation indentation and formatting with md preview.
  Author: sandy <phalodi@gmail.com>
  Closes #14104 from phalodi/SPARK-16438.
* [SPARK-16303][DOCS][EXAMPLES] Updated SQL programming guide and examples | aokolnychyi | 2016-07-13 | 1 | -537/+35
  - Hard-coded Spark SQL sample snippets were moved into source files under examples sub-project.
  - Removed the inconsistency between Scala and Java Spark SQL examples
  - Scala and Java Spark SQL examples were updated
  The work is still in progress. All involved examples were tested manually. An additional round of testing will be done after the code review.
  ![image](https://cloud.githubusercontent.com/assets/6235869/16710314/51851606-462a-11e6-9fbe-0818daef65e4.png)
  Author: aokolnychyi <okolnychyyanton@gmail.com>
  Closes #14119 from aokolnychyi/spark_16303.
* [SPARK-15752][SQL] Optimize metadata only query that has an aggregate whose children are deterministic project or filter operators. | Lianhui Wang | 2016-07-12 | 1 | -0/+12
  ## What changes were proposed in this pull request?
  When a query uses only metadata (for example, the partition key), it can return results based on the metadata without scanning files. Hive did this in HIVE-1003.
  ## How was this patch tested?
  add unit tests
  Author: Lianhui Wang <lianhuiwang09@gmail.com>
  Author: Wenchen Fan <wenchen@databricks.com>
  Author: Lianhui Wang <lianhuiwang@users.noreply.github.com>
  Closes #13494 from lianhuiwang/metadata-only.
* [MINOR][STREAMING][DOCS] Minor changes on kinesis integration | Xin Ren | 2016-07-11 | 1 | -13/+13
  ## What changes were proposed in this pull request?
  Some minor changes for documentation page "Spark Streaming + Kinesis Integration". Moved "streaming-kinesis-arch.png" before the bullet list, not in between the bullets.
  ## How was this patch tested?
  Tested manually, on my local machine.
  Author: Xin Ren <iamshrek@126.com>
  Closes #14097 from keypointt/kinesisDoc.
* [SPARKR][DOC] SparkR ML user guides update for 2.0 | Yanbo Liang | 2016-07-11 | 1 | -18/+25
  ## What changes were proposed in this pull request?
  * Update the SparkR ML section to make it consistent with the SparkR API docs.
  * Since #13972 adds labelling support for the `include_example` Jekyll plugin, we can split the single `ml.R` example file into multiple line blocks with different labels, and include them under different algorithms/models in the generated HTML page.
  ## How was this patch tested?
  Only docs update, manually check the generated docs.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #14011 from yanboliang/r-user-guide-update.
* [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT | Reynold Xin | 2016-07-11 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number.
  ## How was this patch tested?
  N/A
  Author: Reynold Xin <rxin@databricks.com>
  Closes #14130 from rxin/SPARK-16477.
* [SPARK-16381][SQL][SPARKR] Update SQL examples and programming guide for R language binding | Xin Ren | 2016-07-11 | 1 | -142/+13
  https://issues.apache.org/jira/browse/SPARK-16381
  ## What changes were proposed in this pull request?
  Update SQL examples and programming guide for R language binding.
  Here I just follow example https://github.com/apache/spark/compare/master...liancheng:example-snippet-extraction, created a separate R file to store all the example code.
  ## How was this patch tested?
  Manual test on my local machine. Screenshot as below:
  ![screen shot 2016-07-06 at 4 52 25 pm](https://cloud.githubusercontent.com/assets/3925641/16638180/13925a58-439a-11e6-8d57-8451a63dcae9.png)
  Author: Xin Ren <iamshrek@126.com>
  Closes #14082 from keypointt/SPARK-16381.
* [SPARK-11857][MESOS] Deprecate fine grained | Michael Gummelt | 2016-07-08 | 1 | -2/+7
  ## What changes were proposed in this pull request?
  Documentation changes to indicate that fine-grained mode is now deprecated. No code changes were made, and all fine-grained mode instructions were left in place. We can remove all of that once the deprecation cycle completes (Does Spark have a standard deprecation cycle? One major version?)
  Blocked on https://github.com/apache/spark/pull/14059
  ## How was this patch tested?
  Viewed in Github
  Author: Michael Gummelt <mgummelt@mesosphere.io>
  Closes #14078 from mgummelt/deprecate-fine-grained.
* [MESOS] expand coarse-grained mode docs | Michael Gummelt | 2016-07-06 | 1 | -26/+51
  ## What changes were proposed in this pull request?
  docs
  ## How was this patch tested?
  viewed the docs in github
  Author: Michael Gummelt <mgummelt@mesosphere.io>
  Closes #14059 from mgummelt/coarse-grained.
* [DOC][SQL] update out-of-date code snippets using SQLContext in all documents. | WeichenXu | 2016-07-06 | 2 | -20/+23
  ## What changes were proposed in this pull request?
  I searched the whole documents directory for uses of SQLContext and updated the following places:
  - docs/configuration.md, sparkR code snippets.
  - docs/streaming-programming-guide.md, several code examples.
  ## How was this patch tested?
  N/A
  Author: WeichenXu <WeichenXu123@outlook.com>
  Closes #14025 from WeichenXu123/WIP_SQLContext_update.
* [MINOR][DOCS] Remove unused images; crush PNGs that could use it for good measure | Sean Owen | 2016-07-04 | 25 | -0/+0
  ## What changes were proposed in this pull request?
  Coincidentally, I discovered that a couple images were unused in `docs/`, and then searched and found more, and then realized some PNGs were pretty big and could be crushed, and before I knew it, had done the same for the ASF site (not committed yet).
  No functional change at all, just less superfluous image data.
  ## How was this patch tested?
  `jekyll serve`
  Author: Sean Owen <sowen@cloudera.com>
  Closes #14029 from srowen/RemoveCompressImages.
* [SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming guide example snippets from source files instead of hard code them | WeichenXu | 2016-07-02 | 1 | -127/+6
  ## What changes were proposed in this pull request?
  I extract 6 example programs from GraphX programming guide and replace them with `include_example` label.
  The 6 example programs are:
  - AggregateMessagesExample.scala
  - SSSPExample.scala
  - TriangleCountingExample.scala
  - ConnectedComponentsExample.scala
  - ComprehensiveExample.scala
  - PageRankExample.scala
  All the example code can run using `bin/run-example graphx.EXAMPLE_NAME`
  ## How was this patch tested?
  Manual.
  Author: WeichenXu <WeichenXu123@outlook.com>
  Closes #14015 from WeichenXu123/graphx_example_plugin.
* [GRAPHX][EXAMPLES] move graphx test data directory and update graphx document | WeichenXu | 2016-07-02 | 1 | -9/+9
  ## What changes were proposed in this pull request?
  Two test data files used by the graphx examples lived in the "graphx/data" directory.
  I moved them into the "data/" directory, because the "graphx" directory is used for code files and the other test data files (such as the mllib and streaming test data) are all in "data/".
  I also updated the graphx document wherever it references the moved data files.
  ## How was this patch tested?
  N/A
  Author: WeichenXu <WeichenXu123@outlook.com>
  Closes #14010 from WeichenXu123/move_graphx_data_dir.
* [SPARK-15643][DOC][ML] Add breaking changes to ML migration guide | Nick Pentreath | 2016-06-30 | 1 | -3/+101
  This PR adds the breaking changes from [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) to the migration guide.
  ## How was this patch tested?
  Built docs locally.
  Author: Nick Pentreath <nickp@za.ibm.com>
  Closes #13924 from MLnick/SPARK-15643-migration-guide.
* [SPARK-16256][DOCS] Fix window operation diagram | Tathagata Das | 2016-06-30 | 4 | -1/+1
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #14001 from tdas/SPARK-16256-2.
* [SPARK-16256][DOCS] Minor fixes on the Structured Streaming Programming Guide | Tathagata Das | 2016-06-29 | 1 | -21/+23
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #13978 from tdas/SPARK-16256-1.
* [SPARK-16294][SQL] Labelling support for the include_example Jekyll plugin | Cheng Lian | 2016-06-29 | 2 | -41/+25
  ## What changes were proposed in this pull request?
  This PR adds labelling support for the `include_example` Jekyll plugin, so that we may split a single source file into multiple line blocks with different labels, and include them in multiple code snippets in the generated HTML page.
  ## How was this patch tested?
  Manually tested.
  <img width="923" alt="screenshot at jun 29 19-53-21" src="https://cloud.githubusercontent.com/assets/230655/16451099/66a76db2-3e33-11e6-84fb-63104c2f0688.png">
  Author: Cheng Lian <lian@databricks.com>
  Closes #13972 from liancheng/include-example-with-labels.
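The labelling idea in the commit above (one source file, several labelled line blocks) can be illustrated with a small sketch. This is not the Jekyll plugin itself, which is written in Ruby; it is a Python approximation, and the marker syntax `$example on:<label>$` / `$example off:<label>$`, the sample source, and the helper name `extract_labeled` are all assumptions made for illustration.

```python
import re

# Assumed marker syntax (illustrative only): "$example on:<label>$" opens a
# labelled block and "$example off:<label>$" closes it.
ON = re.compile(r"\$example on:(\w+)\$")
OFF = re.compile(r"\$example off:(\w+)\$")


def extract_labeled(source: str, label: str) -> str:
    """Return the lines enclosed by the on/off markers for `label`."""
    kept = []
    active = False
    for line in source.splitlines():
        on = ON.search(line)
        off = OFF.search(line)
        if on and on.group(1) == label:
            active = True          # start collecting after the opening marker
        elif off and off.group(1) == label:
            active = False         # stop collecting at the closing marker
        elif active and not on and not off:
            kept.append(line)      # keep only non-marker lines inside the block
    return "\n".join(kept)


# Hypothetical example source containing two labelled blocks.
SAMPLE = """\
// $example on:init$
val spark = createSession()
// $example off:init$
val unrelated = 1
// $example on:query$
spark.sql("SELECT 1").show()
// $example off:query$
"""

if __name__ == "__main__":
    print(extract_labeled(SAMPLE, "init"))
    print(extract_labeled(SAMPLE, "query"))
```

Each label selects only its own block, so a single example file can feed several snippets on one generated page.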
* [SPARK-16256][SQL][STREAMING] Added Structured Streaming Programming Guide | Tathagata Das | 2016-06-29 | 8 | -0/+1157
  Title defines all.
  Author: Tathagata Das <tathagata.das1565@gmail.com>
  Closes #13945 from tdas/SPARK-16256.
* [SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn | jerryshao | 2016-06-29 | 1 | -0/+24
  ## What changes were proposed in this pull request?
  Yarn has supported rolling log aggregation since 2.6. Previously, logs were only aggregated to HDFS after the application finished, which is quite painful for long-running applications like Spark Streaming and the thriftserver. Out-of-disk problems can also occur when a log file grows too large. So this PR proposes adding support for rolling log aggregation for Spark on yarn.
  One limitation is that log4j must be switched to a file appender. Spark itself uses the console appender by default, in which case the file will not be created again once it is removed after aggregation. But I think many production users will already have changed their log4j configuration from the default one, so this is not a big problem.
  ## How was this patch tested?
  Manually verified with Hadoop 2.7.1.
  Author: jerryshao <sshao@hortonworks.com>
  Closes #13712 from jerryshao/SPARK-15990.
* [SPARK-15643][DOC][ML] Update spark.ml and spark.mllib migration guide from 1.6 to 2.0 | Yanbo Liang | 2016-06-28 | 2 | -19/+68
  ## What changes were proposed in this pull request?
  Update `spark.ml` and `spark.mllib` migration guide from 1.6 to 2.0.
  ## How was this patch tested?
  Docs update, no tests.
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #13378 from yanboliang/spark-13448.
* [SPARK-15863][SQL][DOC][FOLLOW-UP] Update SQL programming guide. | Yin Huai | 2016-06-27 | 1 | -18/+16
  ## What changes were proposed in this pull request?
  This PR makes several updates to SQL programming guide.
  Author: Yin Huai <yhuai@databricks.com>
  Closes #13938 from yhuai/doc.
* [SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer and CountVectorizer | GayathriMurali | 2016-06-24 | 1 | -12/+17
  ## What changes were proposed in this pull request?
  Made changes to HashingTF, QuantileVectorizer and CountVectorizer
  Author: GayathriMurali <gayathri.m@intel.com>
  Closes #13745 from GayathriMurali/SPARK-15997.
* [SPARK-13723][YARN] Change behavior of --num-executors with dynamic allocation. | Ryan Blue | 2016-06-23 | 2 | -1/+4
  ## What changes were proposed in this pull request?
  This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors.
  This change was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend using it while we can change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected behavior for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.
  ## How was this patch tested?
  This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.
  Author: Ryan Blue <blue@apache.org>
  Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.
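A minimal sketch of the behavior described in the commit above, written in Python rather than Spark's Scala. It assumes that the initial executor count is the largest of the configured minimum, the configured initial value, and `--num-executors`; the commit message only says that the value is used "for the initial number of executors", so the exact combination rule and the function name `initial_executor_count` are assumptions.

```python
from typing import Optional


def initial_executor_count(min_executors: int = 0,
                           dyn_initial: Optional[int] = None,
                           num_executors: Optional[int] = None) -> int:
    """Pick the starting executor count when dynamic allocation is enabled.

    Assumption: the largest configured value wins, so --num-executors acts as
    a starting point instead of switching dynamic allocation off.
    """
    candidates = [min_executors]
    if dyn_initial is not None:
        candidates.append(dyn_initial)    # spark.dynamicAllocation.initialExecutors
    if num_executors is not None:
        candidates.append(num_executors)  # --num-executors / spark.executor.instances
    return max(candidates)


if __name__ == "__main__":
    # Only --num-executors is set: it seeds the initial allocation.
    print(initial_executor_count(num_executors=10))
    # All three are set: the largest value is used.
    print(initial_executor_count(min_executors=2, dyn_initial=5, num_executors=3))
```

Under this reading, dynamic allocation stays enabled and can still scale the executor count up or down after startup.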
* [SPARK-16088][SPARKR] update setJobGroup, cancelJobGroup, clearJobGroup | Felix Cheung | 2016-06-23 | 1 | -0/+2
  ## What changes were proposed in this pull request?
  Updated setJobGroup, cancelJobGroup, clearJobGroup to not require sc/SparkContext as parameter.
  Also updated roxygen2 doc and R programming guide on deprecations.
  ## How was this patch tested?
  unit tests
  Author: Felix Cheung <felixcheung_m@hotmail.com>
  Closes #13838 from felixcheung/rjobgroup.
* [SPARK-15672][R][DOC] R programming guide update | Kai Jiang | 2016-06-22 | 1 | -0/+77
  ## What changes were proposed in this pull request?
  Guide for
  - UDFs with dapply, dapplyCollect
  - spark.lapply for running parallel R functions
  ## How was this patch tested?
  build locally
  <img width="654" alt="screen shot 2016-06-14 at 03 12 56" src="https://cloud.githubusercontent.com/assets/3419881/16039344/12a3b6a0-31de-11e6-8d77-fe23308075c0.png">
  Author: Kai Jiang <jiangkai@gmail.com>
  Closes #13660 from vectorijk/spark-15672-R-guide-update.
* [SQL][DOC] SQL programming guide add deprecated methods in 2.0.0 | Felix Cheung | 2016-06-22 | 1 | -1/+5
  ## What changes were proposed in this pull request?
  Doc changes
  ## How was this patch tested?
  manual
  liancheng
  Author: Felix Cheung <felixcheung_m@hotmail.com>
  Closes #13827 from felixcheung/sqldocdeprecate.
* [SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and binarizer | Yuhao Yang | 2016-06-21 | 1 | -6/+10
  ## What changes were proposed in this pull request?
  jira: https://issues.apache.org/jira/browse/SPARK-16045
  2.0 Audit: Update document for StopWordsRemover and Binarizer.
  ## How was this patch tested?
  manual review for doc
  Author: Yuhao Yang <hhbyyh@gmail.com>
  Author: Yuhao Yang <yuhao.yang@intel.com>
  Closes #13375 from hhbyyh/stopdoc.
* [SPARK-15894][SQL][DOC] Update docs for controlling #partitions | Takeshi YAMAMURO | 2016-06-21 | 1 | -0/+17
  ## What changes were proposed in this pull request?
  Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes` in Other Configuration Options.
  ## How was this patch tested?
  N/A
  Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
  Closes #13797 from maropu/SPARK-15894-2.
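The commit above only names the two parameters. As a rough illustration of how they might interact when file-based sources are split into partitions, here is a Python sketch. The formula (cap the split size at `maxPartitionBytes`, but never go below `openCostInBytes`, with the per-core share computed from file sizes padded by the open cost), the default values of 128 MB and 4 MB, and the function name `max_split_bytes` are assumptions based on my understanding of Spark 2.0, not something stated in this log.

```python
MB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MB,   # spark.sql.files.maxPartitionBytes (assumed default)
                    open_cost_in_bytes=4 * MB):     # spark.sql.files.openCostInBytes (assumed default)
    """Estimate the target split size for a set of input files (sizes in bytes)."""
    # Every file is padded by the estimated cost of opening it.
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    # Never exceed the configured maximum, never drop below the open cost.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


if __name__ == "__main__":
    # Many small files: the split size follows the per-core share.
    print(max_split_bytes([1 * MB] * 100, default_parallelism=8) / MB)
    # A few large files: the split size is capped at maxPartitionBytes.
    print(max_split_bytes([1024 * MB] * 10, default_parallelism=8) / MB)
```

Raising `openCostInBytes` makes small files look more expensive, which pushes them into fewer, larger partitions.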
* [SPARK-15863][SQL][DOC][SPARKR] sql programming guide updates to include sparkSession in R | Felix Cheung | 2016-06-21 | 2 | -19/+17
  ## What changes were proposed in this pull request?
  Update doc as per discussion in PR #13592
  ## How was this patch tested?
  manual
  shivaram liancheng
  Author: Felix Cheung <felixcheung_m@hotmail.com>
  Closes #13799 from felixcheung/rsqlprogrammingguide.
* [SPARK-16025][CORE] Document OFF_HEAP storage level in 2.0 | Eric Liang | 2016-06-20 | 1 | -0/+5
  This has changed from 1.6, and now stores memory off-heap using spark's off-heap support instead of in tachyon.
  Author: Eric Liang <ekl@databricks.com>
  Closes #13744 from ericl/spark-16025.
* [SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0 | Cheng Lian | 2016-06-20 | 1 | -288/+317
  ## What changes were proposed in this pull request?
  Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete.
  We may also want to add more examples for Scala/Java Dataset typed transformations.
  ## How was this patch tested?
  N/A
  Author: Cheng Lian <lian@databricks.com>
  Closes #13592 from liancheng/sql-programming-guide-2.0.
* [SPARK-15159][SPARKR] SparkSession roxygen2 doc, programming guide, example updates | Felix Cheung | 2016-06-20 | 1 | -51/+48
  ## What changes were proposed in this pull request?
  roxygen2 doc, programming guide, example updates
  ## How was this patch tested?
  manual checks
  shivaram
  Author: Felix Cheung <felixcheung_m@hotmail.com>
  Closes #13751 from felixcheung/rsparksessiondoc.
* [SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of refernece | wm624@hotmail.com | 2016-06-19 | 1 | -4/+0
  ## What changes were proposed in this pull request?
  In the 2.0 document, the line "A full example that produces the experiment described in the PIC paper can be found under examples/." is redundant.
  There is already "Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala" in the Spark repo.".
  We should remove the first line, which is consistent with other documents.
  ## How was this patch tested?
  Manual test
  Author: wm624@hotmail.com <wm624@hotmail.com>
  Closes #13755 from wangmiao1981/doc.
* [SPARK-15129][R][DOC] R API changes in ML | GayathriMurali | 2016-06-17 | 1 | -58/+19
  ## What changes were proposed in this pull request?
  Make user guide changes to SparkR documentation for all changes that happened in 2.0 to Machine Learning APIs
  Author: GayathriMurali <gayathri.m@intel.com>
  Closes #13285 from GayathriMurali/SPARK-15129.
* [SPARK-15966][DOC] Add closing tag to fix rendering issue for Spark monitoring | Dhruve Ashar | 2016-06-16 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  Adds the missing closing tag for spark.ui.view.acls.groups
  ## How was this patch tested?
  I built the docs locally and verified the change in the browser.
  **Before:**
  ![image](https://cloud.githubusercontent.com/assets/7732317/16135005/49fc0724-33e6-11e6-9390-98711593fa5b.png)
  **After:**
  ![image](https://cloud.githubusercontent.com/assets/7732317/16135021/62b5c4a8-33e6-11e6-8118-b22fda5c66eb.png)
  Author: Dhruve Ashar <dhruveashar@gmail.com>
  Closes #13719 from dhruve/doc/SPARK-15966.
* [SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic regression | WeichenXu | 2016-06-16 | 1 | -0/+70
  ## What changes were proposed in this pull request?
  add ml doc for ml isotonic regression
  add scala example for ml isotonic regression
  add java example for ml isotonic regression
  add python example for ml isotonic regression
  modify scala example for mllib isotonic regression
  modify java example for mllib isotonic regression
  modify python example for mllib isotonic regression
  add data/mllib/sample_isotonic_regression_libsvm_data.txt
  delete data/mllib/sample_isotonic_regression_data.txt
  ## How was this patch tested?
  N/A
  Author: WeichenXu <WeichenXu123@outlook.com>
  Closes #13381 from WeichenXu123/add_isotonic_regression_doc.
* [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config | Sean Owen | 2016-06-16 | 2 | -4/+21
  ## What changes were proposed in this pull request?
  Reduce `spark.memory.fraction` default to 0.6 in order to make it fit within default JVM old generation size (2/3 heap). See JIRA discussion. This means a full cache doesn't spill into the new gen.
  CC andrewor14
  ## How was this patch tested?
  Jenkins tests.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #13618 from srowen/SPARK-15796.
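The arithmetic behind the commit above can be checked with a short Python sketch. The 0.6 default and the 2/3-heap old generation come from the commit message; the 300 MB reserved region, the earlier default of 0.75, and the 4 GB example heap are assumptions added here for illustration.

```python
RESERVED_MB = 300  # assumed fixed reservation taken off the heap before the fraction applies


def unified_region_mb(heap_mb: float, memory_fraction: float) -> float:
    """Size of the region governed by spark.memory.fraction."""
    return (heap_mb - RESERVED_MB) * memory_fraction


def old_gen_mb(heap_mb: float, new_ratio: int = 2) -> float:
    """Old generation size; a new ratio of 2 gives old gen = 2/3 of the heap."""
    return heap_mb * new_ratio / (new_ratio + 1)


def fits_in_old_gen(heap_mb: float, memory_fraction: float) -> bool:
    """True if a completely full unified region still fits in the old gen."""
    return unified_region_mb(heap_mb, memory_fraction) <= old_gen_mb(heap_mb)


if __name__ == "__main__":
    heap = 4096  # hypothetical 4 GB executor heap
    for fraction in (0.75, 0.6):
        print(fraction, round(unified_region_mb(heap, fraction), 1),
              round(old_gen_mb(heap), 1), fits_in_old_gen(heap, fraction))
```

With these assumed numbers, a fraction of 0.75 overruns the old generation on a 4 GB heap while 0.6 leaves headroom, which matches the motivation given in the commit.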
* [SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT POINTS.] | Nirman Narang | 2016-06-15 | 1 | -0/+19
  Updated the SparkStreaming Doc with some important points.
  Author: Nirman Narang <narang@us.ibm.com>
  Closes #11114 from nirmannarang/SPARK-7848.
* [DOCUMENTATION] fixed typos in python programming guide | Mortada Mehyar | 2016-06-14 | 1 | -3/+3
  ## What changes were proposed in this pull request?
  minor typo
  ## How was this patch tested?
  minor typo in the doc, should be self explanatory
  Author: Mortada Mehyar <mortada.mehyar@gmail.com>
  Closes #13639 from mortada/typo.
* [SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API | Sean Owen | 2016-06-12 | 2 | -6/+6
  ## What changes were proposed in this pull request?
  - Deprecate old Java accumulator API; should use Scala now
  - Update Java tests and examples
  - Don't bother testing old accumulator API in Java 8 (too)
  - (fix a misspelling too)
  ## How was this patch tested?
  Jenkins tests
  Author: Sean Owen <sowen@cloudera.com>
  Closes #13606 from srowen/SPARK-15086.
* [SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP | bomeng | 2016-06-12 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala.
  ## How was this patch tested?
  Manually verified.
  Author: bomeng <bmeng@us.ibm.com>
  Closes #13543 from bomeng/SPARK-15806.
* [SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc | bomeng | 2016-06-12 | 1 | -9/+0
  ## What changes were proposed in this pull request?
  As with `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the documentation for `SPARK_WORKER_INSTANCES` to discourage users from using it. If it is actually used, SparkConf will show a warning message as before.
  ## How was this patch tested?
  Manually tested.
  Author: bomeng <bmeng@us.ibm.com>
  Closes #13533 from bomeng/SPARK-15781.
* [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents | Dongjoon Hyun | 2016-06-11 | 8 | -29/+25
  ## What changes were proposed in this pull request?
  This issue fixes all broken links on Spark 2.0 preview MLLib documents. Also, this contains some editorial change.
  **Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md
  **Fix malformed section header and scala coding style**
  * mllib-linear-methods.md
  **Replace indirect forward links with direct one**
  * ml-classification-regression.md
  ## How was this patch tested?
  Manual tests (with `cd docs; jekyll build`.)
  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #13608 from dongjoon-hyun/SPARK-15883.
* [SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache" | Sean Owen | 2016-06-11 | 5 | -0/+0
  ## What changes were proposed in this pull request?
  Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old unreferenced logo files.
  ## How was this patch tested?
  Manual check of generated HTML site and Spark UI. I searched for references to the deleted files to make sure they were not used.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #13609 from srowen/SPARK-15879.
* [DOCUMENTATION] fixed groupby aggregation example for pyspark | Mortada Mehyar | 2016-06-10 | 1 | -1/+1
  ## What changes were proposed in this pull request?
  fixing documentation for the groupby/agg example in python
  ## How was this patch tested?
  the existing example in the documentation does not contain valid syntax (missing parenthesis) and is not using `Column` in the expression for `agg()`
  after the fix here's how I tested it:
  ```
  In [1]: from pyspark.sql import Row
  In [2]: import pyspark.sql.functions as func
  In [3]: %cpaste
  Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
  :records = [{'age': 19, 'department': 1, 'expense': 100},
  : {'age': 20, 'department': 1, 'expense': 200},
  : {'age': 21, 'department': 2, 'expense': 300},
  : {'age': 22, 'department': 2, 'expense': 300},
  : {'age': 23, 'department': 3, 'expense': 300}]
  :--
  In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records])
  In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show()
  +----------+----------+--------+------------+
  |department|department|max(age)|sum(expense)|
  +----------+----------+--------+------------+
  |         1|         1|      20|         300|
  |         2|         2|      22|         600|
  |         3|         3|      23|         300|
  +----------+----------+--------+------------+
  ```
  Author: Mortada Mehyar <mortada.mehyar@gmail.com>
  Closes #13587 from mortada/groupby_agg_doc_fix.
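The table in the session above can be cross-checked without a Spark installation. The following plain-Python sketch reuses the `records` from the commit message and computes the same per-department maximum age and expense total; the helper name `aggregate` is hypothetical and this is not PySpark code.

```python
from collections import defaultdict

# Same input rows as in the commit message above.
records = [{'age': 19, 'department': 1, 'expense': 100},
           {'age': 20, 'department': 1, 'expense': 200},
           {'age': 21, 'department': 2, 'expense': 300},
           {'age': 22, 'department': 2, 'expense': 300},
           {'age': 23, 'department': 3, 'expense': 300}]


def aggregate(rows):
    """Group rows by department and return {department: (max age, total expense)}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row['department']].append(row)
    return {dept: (max(r['age'] for r in members),
                   sum(r['expense'] for r in members))
            for dept, members in sorted(grouped.items())}


if __name__ == "__main__":
    for dept, (max_age, total) in aggregate(records).items():
        print(dept, max_age, total)
```

The values agree with the `max(age)` and `sum(expense)` columns shown by `show()` in the session.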
* [DOCUMENTATION] Fixed target JAR path | prabs | 2016-06-08 | 1 | -2/+2
  ## What changes were proposed in this pull request?
  The Scala version mentioned in the sbt configuration file is 2.11, so the path of the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar`
  ## How was this patch tested?
  n/a
  Author: prabs <prabsmails@gmail.com>
  Author: Prabeesh K <prabsmails@gmail.com>
  Closes #13554 from prabeesh/master.
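To show why the Scala version changes the JAR path in the commit above, here is a Python sketch of how sbt's default packaging path is put together. The project name "Simple Project", the full Scala version "2.11.7", the helper name `sbt_jar_path`, and the simplified name normalization are assumptions for illustration; only the resulting path appears in the commit message.

```python
import re


def sbt_jar_path(name: str, version: str, scala_version: str) -> str:
    """Approximate sbt's default JAR location for a cross-built project."""
    # Simplified normalization: lowercase, runs of other characters become "-".
    normalized = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    # The Scala binary version keeps only major.minor, e.g. "2.11.7" -> "2.11".
    binary = ".".join(scala_version.split(".")[:2])
    return f"target/scala-{binary}/{normalized}_{binary}-{version}.jar"


if __name__ == "__main__":
    # Hypothetical build settings: name "Simple Project", version "1.0".
    print(sbt_jar_path("Simple Project", "1.0", "2.11.7"))
    # The same project built against Scala 2.10 lands in a different directory.
    print(sbt_jar_path("Simple Project", "1.0", "2.10.5"))
```

Because the binary version appears in both the directory and the artifact name, documentation that hard-codes the path has to be updated whenever the build's Scala version changes.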
* [SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression behavior difference | Yanbo Liang | 2016-06-07 | 1 | -0/+6
  ## What changes were proposed in this pull request?
  When fitting `LinearRegressionModel` (with the "l-bfgs" solver) and `LogisticRegressionModel` w/o intercept on a dataset with a constant nonzero column, spark.ml produces the same model as R glmnet but a different one from LIBSVM.
  When fitting `AFTSurvivalRegressionModel` w/o intercept on a dataset with a constant nonzero column, spark.ml produces a different model compared with R survival::survreg.
  We should output a warning message and clarify this condition in the documentation.
  ## How was this patch tested?
  Document change, no unit test.
  cc mengxr
  Author: Yanbo Liang <ybliang8@gmail.com>
  Closes #12731 from yanboliang/spark-13590.