path: root/docs/building-spark.md
Commit message (Author, Age, Files, Lines)
* [SPARK-15333][DOCS] Reorganize building-spark.md; rationalize vs wiki (Sean Owen, 2016-05-17, 1 file, -139/+156)
  ## What changes were proposed in this pull request?
  See JIRA for the motivation. The changes are almost entirely movement of text and edits to sections. Minor changes to text include:
  - Copying in / merging text from the "Useful Developer Tools" wiki, in areas of
    - Docker
    - R
    - Running one test
  - standardizing on ./build/mvn not mvn, and likewise for ./build/sbt
  - correcting some typos
  - standardizing code block formatting
  No text has been removed from this doc; text has been imported from the https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools wiki
  ## How was this patch tested?
  Jekyll doc build and inspection of resulting HTML in browser.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #13124 from srowen/SPARK-15333.
* [SPARK-13382][DOCS][PYSPARK] Update pyspark testing notes in build docs (Holden Karau, 2016-05-10, 1 file, -0/+15)
  ## What changes were proposed in this pull request?
  The current build documents don't specify that for PySpark tests we need to include Hive in the assembly; otherwise the ORC tests fail.
  ## How was this patch tested?
  Manually built the docs locally. Ran the provided build command followed by the PySpark SQL tests.
  ![pyspark2](https://cloud.githubusercontent.com/assets/59893/13190008/8829cde4-d70f-11e5-8ff5-a88b7894d2ad.png)
  Author: Holden Karau <holden@us.ibm.com>
  Closes #11278 from holdenk/SPARK-13382-update-pyspark-testing-notes-r2.
* [SPARK-14738][BUILD] Separate docker integration tests from main build (Luciano Resende, 2016-05-06, 1 file, -0/+12)
  ## What changes were proposed in this pull request?
  - Create a maven profile for executing the docker integration tests using maven
  - Remove docker integration tests from main sbt build
  - Update documentation on how to run docker integration tests from sbt
  ## How was this patch tested?
  Manual test of the docker integration tests as in:
  `mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 compile test`
  ## Other comments
  Note that the DB2 Docker Tests are still disabled as there is a kernel version issue on the AMPLab Jenkins slaves and we would need to get them on the right level before enabling those tests. They do run ok locally with the updates from PR #12348
  Author: Luciano Resende <lresende@apache.org>
  Closes #12508 from lresende/docker.
* [SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly (Mark Grover, 2016-04-14, 1 file, -1/+1)
  ## What changes were proposed in this pull request?
  Removing references to assembly jar in documentation.
  Adding an additional (previously undocumented) usage of spark-submit to run examples.
  ## How was this patch tested?
  Ran spark-submit usage to ensure formatting was fine. Ran examples using SparkSubmit.
  Author: Mark Grover <mark@apache.org>
  Closes #12365 from markgrover/spark-14601.
* [SPARK-14424][BUILD][DOCS] Update the build docs to switch from assembly to package and add a no… (Holden Karau, 2016-04-06, 1 file, -10/+3)
  ## What changes were proposed in this pull request?
  Change our build docs & shell scripts so that developers are aware of the change from "assembly" to "package".
  ## How was this patch tested?
  Manually ran ./bin/spark-shell after ./build/sbt assembly and verified error message printed, ran new suggested build target and verified ./bin/spark-shell runs after this.
  Author: Holden Karau <holden@pigscanfly.ca>
  Author: Holden Karau <holden@us.ibm.com>
  Closes #12197 from holdenk/SPARK-1424-spark-class-broken-fix-build-docs.
* [SPARK-14281][TESTS] Fix java8-tests and simplify their build (Josh Rosen, 2016-03-31, 1 file, -4/+4)
  This patch fixes a compilation / build break in Spark's `java8-tests` and refactors their POM to simplify the build. See individual commit messages for more details.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #12073 from JoshRosen/fix-java8-tests.
* [MINOR][DOCS] Update build descriptions and commands (Dongjoon Hyun, 2016-03-18, 1 file, -4/+7)
  ## What changes were proposed in this pull request?
  This PR updates Scala and Hadoop versions in the build description and commands in `Building Spark` documents.
  ## How was this patch tested?
  N/A
  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #11838 from dongjoon-hyun/fix_doc_building_spark.
* [MINOR][DOC] Fix supported hive version in doc (Dongjoon Hyun, 2016-03-10, 1 file, -2/+2)
  ## What changes were proposed in this pull request?
  Today, Spark 1.6.1 and updated docs are released. Unfortunately, there is obsolete hive version information on docs: [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines.
  ```
  -By default Spark will build with Hive 0.13.1 bindings.
  +By default Spark will build with Hive 1.2.1 bindings.
  -# Apache Hadoop 2.4.X with Hive 13 support
  +# Apache Hadoop 2.4.X with Hive 1.2.1 support
  ```
  `sql/README.md` file also describe
  ## How was this patch tested?
  Manual.
  (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
  Author: Dongjoon Hyun <dongjoon@apache.org>
  Closes #11639 from dongjoon-hyun/fix_doc_hive_version.
* [SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs (Sean Owen, 2016-03-07, 1 file, -3/+3)
  ## What changes were proposed in this pull request?
  Move many top-level files into dev/ or another appropriate directory. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`.
  I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope.
  ## How was this patch tested?
  `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #11522 from srowen/SPARK-13596.
* [SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x (Sean Owen, 2016-02-17, 1 file, -1/+1)
  Phase 1: update plugin versions, test dependencies, some example and third-party versions
  Author: Sean Owen <sowen@cloudera.com>
  Closes #11206 from srowen/SPARK-13324.
* [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version (Josh Rosen, 2016-01-30, 1 file, -6/+4)
  This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
  The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
  After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #10608 from JoshRosen/SPARK-6363.
* [SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile (Josh Rosen, 2016-01-15, 1 file, -1/+2)
  This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version.
  /cc rxin srowen
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
* [SPARK-11807] Remove support for Hadoop < 2.2 (Reynold Xin, 2015-12-21, 1 file, -14/+4)
  i.e. Hadoop 1 and Hadoop 2.0
  Author: Reynold Xin <rxin@databricks.com>
  Closes #10404 from rxin/SPARK-11807.
* [SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes (Josh Rosen, 2015-11-11, 1 file, -0/+4)
  This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8.
  In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml.
  http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes.
  I also added a new regression test in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code that was compiled targeting Java 8.
  Author: Josh Rosen <joshrosen@databricks.com>
  Closes #9512 from JoshRosen/SPARK-6152.
* [SPARK-11318] Include hive profile in make-distribution.sh command (tedyu, 2015-10-29, 1 file, -1/+1)
  Author: tedyu <yuzhihong@gmail.com>
  Closes #9281 from tedyu/master.
* [SPARK-11056] Improve documentation of SBT build. (Kay Ousterhout, 2015-10-12, 1 file, -0/+5)
  This commit improves the documentation around building Spark to (1) recommend using SBT interactive mode to avoid the overhead of launching SBT and (2) refer to the wiki page that documents using SPARK_PREPEND_CLASSES to avoid creating the assembly jar for each compile.
  cc srowen
  Author: Kay Ousterhout <kayousterhout@gmail.com>
  Closes #9068 from kayousterhout/SPARK-11056.
* [SPARK-10883] Add a note about how to build Spark sub-modules (reactor) (Jean-Baptiste Onofré, 2015-10-08, 1 file, -0/+11)
  Author: Jean-Baptiste Onofré <jbonofre@apache.org>
  Closes #8993 from jbonofre/SPARK-10883-2.
* Docs small fixes (Jacek Laskowski, 2015-09-08, 1 file, -12/+11)
  Author: Jacek Laskowski <jacek@japila.pl>
  Closes #8629 from jaceklaskowski/docs-fixes.
* [SPARK-9521] [DOCS] Addendum. Require Maven 3.3.3+ in the build (Sean Owen, 2015-08-04, 1 file, -1/+2)
  Follow on for #7852: Building Spark doc needs to refer to new Maven requirement too
  Author: Sean Owen <sowen@cloudera.com>
  Closes #7905 from srowen/SPARK-9521.2 and squashes the following commits:
  73285df [Sean Owen] Follow on for #7852: Building Spark doc needs to refer to new Maven requirement too
* [SPARK-8401] [BUILD] Scala version switching build enhancements (Michael Allman, 2015-07-21, 1 file, -1/+1)
  These commits address a few minor issues in the Scala cross-version support in the build:
  1. Correct two missing `${scala.binary.version}` pom file substitutions.
  2. Don't update `scala.binary.version` in parent POM. This property is set through profiles.
  3. Update the source of the generated scaladocs in `docs/_plugins/copy_api_dirs.rb`.
  4. Factor common code out of `dev/change-version-to-*.sh` and add some validation. We also test `sed` to see if it's GNU sed and try `gsed` as an alternative if not. This prevents the script from running with a non-GNU sed.
  This is my original work and I license this work to the Spark project under the Apache License.
  Author: Michael Allman <michael@videoamp.com>
  Closes #6832 from mallman/scala-versions and squashes the following commits:
  cde2f17 [Michael Allman] Delete dev/change-version-to-*.sh, replacing them with single dev/change-scala-version.sh script that takes a version as argument
  02296f2 [Michael Allman] Make the scala version change scripts cross-platform by restricting ourselves to POSIX sed syntax instead of looking for GNU sed
  ad9b40a [Michael Allman] Factor change-scala-version.sh out of change-version-to-*.sh, adding command line argument validation and testing for GNU sed
  bdd20bf [Michael Allman] Update source of scaladocs when changing Scala version
  475088e [Michael Allman] Replace jackson-module-scala_2.10 with jackson-module-scala_${scala.binary.version}
* [SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+ (Sean Owen, 2015-06-07, 1 file, -5/+1)
  Update build to use Java 7, and remove some comments and special-case support for Java 6.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #6265 from srowen/SPARK-7733 and squashes the following commits:
  59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6
* [DOCS] [MINOR] Update for the Hadoop versions table with hadoop-2.6 (Taka Shinagawa, 2015-05-30, 1 file, -0/+1)
  Updated the doc for the hadoop-2.6 profile, which is new to Spark 1.4
  Author: Taka Shinagawa <taka.epsilon@gmail.com>
  Closes #6450 from mrt/docfix2 and squashes the following commits:
  db1c43b [Taka Shinagawa] Updated the hadoop versions for hadoop-2.6 profile
  323710e [Taka Shinagawa] The hadoop-2.6 profile is added to the Hadoop versions table
* [SPARK-7890] [DOCS] Document that Spark 2.11 now supports Kafka (Sean Owen, 2015-05-30, 1 file, -3/+1)
  Remove caveat about Kafka / JDBC not being supported for Scala 2.11
  Author: Sean Owen <sowen@cloudera.com>
  Closes #6470 from srowen/SPARK-7890 and squashes the following commits:
  4652634 [Sean Owen] One more rewording
  7b7f3c8 [Sean Owen] Restore note about JDBC component
  126744d [Sean Owen] Remove caveat about Kafka / JDBC not being supported for Scala 2.11
* [DOCS] Fixing broken "IDE setup" link in the Building Spark documentation. (Mike Dusenberry, 2015-05-28, 1 file, -1/+1)
  The location of the IDE setup information has changed, so this just updates the link on the Building Spark page.
  Author: Mike Dusenberry <dusenberrymw@gmail.com>
  Closes #6467 from dusenberrymw/Fix_Broken_Link_On_Building_Spark_Doc and squashes the following commits:
  75c533a [Mike Dusenberry] Fixing broken "IDE setup" link in the Building Spark documentation by pointing to new location.
* [SPARK-7850][BUILD] Hive 0.12.0 profile in POM should be removed (Cheolsoo Park, 2015-05-27, 1 file, -5/+1)
  I grep'ed hive-0.12.0 in the source code and removed all the profiles and doc references.
  Author: Cheolsoo Park <cheolsoop@netflix.com>
  Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits:
  fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile
  82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code
  f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
* [SPARK-4556] [BUILD] binary distribution assembly can't run in local mode (Sean Owen, 2015-05-16, 1 file, -0/+10)
  Add note on building a runnable distribution with make-distribution.sh
  Author: Sean Owen <sowen@cloudera.com>
  Closes #6186 from srowen/SPARK-4556 and squashes the following commits:
  4002966 [Sean Owen] Add pointer to --help flag
  9fa7883 [Sean Owen] Add note on building a runnable distribution with make-distribution.sh
* [SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions (FavioVazquez, 2015-05-14, 1 file, -5/+6)
  Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons.
  Changes proposed by vanzin resulting from previous pull-request https://github.com/apache/spark/pull/5783 that did not fix the problem correctly.
  Please let me know if this is the correct way of doing this; the comments of vanzin are in the pull-request mentioned.
  Author: FavioVazquez <favio.vazquezp@gmail.com>
  Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:
  11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh
  379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior
  3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
  31bdafa [FavioVazquez] - Added missing instances of -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation
  cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies
  83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml
  93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM
  668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM
  fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml
  0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in the create-release.sh now that the default hadoop version is the 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in the hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0
  a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml
  199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default. Added comment in the yarn/pom.xml to specify that.
  88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that it is a no-op in the make-distribution.sh file
  70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles
  287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now it is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc.
  1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation
  6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff.
  7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
  660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons
  ec91ce3 [FavioVazquez] - Updated protobuf-java version of com.google.protobuf dependency to fix blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for 2.5.0-cdh5.3.3 version)
* [SPARK-7302] [DOCS] SPARK building documentation still mentions building for yarn 0.23 (Sean Owen, 2015-05-03, 1 file, -4/+0)
  Remove references to Hadoop 0.23
  CC tgravescs Is this what you had in mind? Basically all refs to 0.23? We don't support YARN 0.23, but also don't support Hadoop 0.23 anymore AFAICT. There are no builds or releases for it.
  In fact, on a related note, refs to CDH3 (Hadoop 0.20.2) should be removed as this certainly isn't supported either.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #5863 from srowen/SPARK-7302 and squashes the following commits:
  42f5d1e [Sean Owen] Remove CDH3 (Hadoop 0.20.2) refs too
  dad02e3 [Sean Owen] Remove references to Hadoop 0.23
* [docs] [SPARK-4820] Spark build encounters "File name too long" on some encrypted filesystems (Theodore Vasiloudis, 2015-03-17, 1 file, -0/+12)
  Added a note instructing users how to build Spark in an encrypted file system.
  Author: Theodore Vasiloudis <tvas@sics.se>
  Closes #5041 from thvasilo/patch-2 and squashes the following commits:
  09d890b [Theodore Vasiloudis] Workaround for building in an encrypted filesystem
* SPARK-1911 [DOCS] Warn users if their assembly jars are not built with Java 6 (Sean Owen, 2015-03-03, 1 file, -0/+4)
  Add warning about building with Java 7+ and running the JAR on early Java 6.
  CC andrewor14
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4874 from srowen/SPARK-1911 and squashes the following commits:
  79fa2f6 [Sean Owen] Add warning about building with Java 7+ and running the JAR on early Java 6.
* SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream (Patrick Wendell, 2015-02-16, 1 file, -3/+3)
  Author: Patrick Wendell <patrick@databricks.com>
  Closes #4638 from pwendell/SPARK-5850 and squashes the following commits:
  386126f [Patrick Wendell] SPARK-5850: Remove experimental label for Scala 2.11 and FlumePollingStream.
* SPARK-5727 [BUILD] Remove Debian packaging (Sean Owen, 2015-02-12, 1 file, -10/+0)
  (for master / 1.4 only)
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4526 from srowen/SPARK-5727.2 and squashes the following commits:
  83ba49c [Sean Owen] Remove Debian packaging
* SPARK-5727 [BUILD] Deprecate Debian packaging (Sean Owen, 2015-02-11, 1 file, -0/+2)
  This just adds a deprecation message. It's intended for backporting to branch 1.3 but can go in master too, to be followed by another PR that removes it for 1.4.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #4516 from srowen/SPARK-5727.1 and squashes the following commits:
  d48989f [Sean Owen] Refer to Spark 1.4
  6c1c8b3 [Sean Owen] Deprecate Debian packaging
* [Doc] Minor: Fixes several formatting issues (Cheng Lian, 2015-02-02, 1 file, -1/+4)
  Fixes several minor formatting issues in the [Continuous Compilation] [1] section.
  [1]: http://spark.apache.org/docs/latest/building-spark.html#continuous-compilation
  Author: Cheng Lian <lian@databricks.com>
  Closes #4316 from liancheng/fix-build-instruction-docs and squashes the following commits:
  0a92e01 [Cheng Lian] Fixes several formatting issues
* SPARK-5136 [DOCS] Improve documentation around setting up Spark IntelliJ project (Sean Owen, 2015-01-09, 1 file, -2/+3)
  This PR simply points to the IntelliJ wiki page instead of also including IntelliJ notes in the docs. The intent, however, is to also update the wiki page with updated tips. This is the text I propose for the IntelliJ section on the wiki. I realize it omits some of the existing instructions on the wiki, about enabling Hive, but I think those are actually optional.

  ------

  IntelliJ supports both Maven- and SBT-based projects. It is recommended, however, to import Spark as a Maven project. Choose "Import Project..." from the File menu, and select the `pom.xml` file in the Spark root directory.
  It is fine to leave all settings at their default values in the Maven import wizard, with two caveats. First, it is usually useful to enable "Import Maven projects automatically", since changes to the project structure will then automatically update the IntelliJ project.
  Second, note the step that prompts you to choose active Maven build profiles. As documented above, some build configurations require specific profiles to be enabled. The same profiles that are enabled with `-P[profile name]` above may be enabled on this screen. For example, if developing for Hadoop 2.4 with YARN support, enable profiles `yarn` and `hadoop-2.4`.
  These selections can be changed later by accessing the "Maven Projects" tool window from the View menu, and expanding the Profiles section.
  "Rebuild Project" can fail the first time the project is compiled, because generated source files are not automatically generated. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
  Compilation may fail with an error like "scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar". If so, go to Preferences > Build, Execution, Deployment > Scala Compiler and clear the "Additional compiler options" field. It will then work, although the option will come back when the project is reimported.
  Author: Sean Owen <sowen@cloudera.com>
  Closes #3952 from srowen/SPARK-5136 and squashes the following commits:
  f3baa66 [Sean Owen] Point to new IJ / Eclipse wiki link
  016b7df [Sean Owen] Point to IntelliJ wiki page instead of also including IntelliJ notes in the docs
* [SPARK-4501][Core] - Create build/mvn to automatically download maven/zinc/scalac (Brennon York, 2014-12-27, 1 file, -14/+31)
  Creates a top level directory script (as `build/mvn`) to automatically download zinc and the specific version of scala used to easily build spark. This will also download and install maven if the user doesn't already have it, and all packages are hosted under the `build/` directory. Tested on both Linux and OS X, and both work. All commands pass through to the maven binary so it acts exactly as a traditional maven call would.
  Author: Brennon York <brennon.york@capitalone.com>
  Closes #3707 from brennonyork/SPARK-4501 and squashes the following commits:
  0e5a0e4 [Brennon York] minor incorrect doc verbiage (with -> this)
  9b79e38 [Brennon York] fixed merge conflicts with dev/run-tests, properly quoted args in sbt/sbt, fixed bug where relative paths would fail if passed in from build/mvn
  d2d41b6 [Brennon York] added blurb about leveraging zinc with build/mvn
  b979c58 [Brennon York] updated the merge conflict
  c5634de [Brennon York] updated documentation to overview build/mvn, updated all points where sbt/sbt was referenced with build/sbt
  b8437ba [Brennon York] set progress bars for curl and wget when not run on jenkins, no progress bar when run on jenkins, moved sbt script to build/sbt, wrote stub and warning under sbt/sbt which calls build/sbt, modified build/sbt to use the correct directory, fixed bug in build/sbt-launch-lib.bash to correctly pull the sbt version
  be11317 [Brennon York] added switch to silence download progress only if AMPLAB_JENKINS is set
  28d0a99 [Brennon York] updated to remove the python dependency, uses grep instead
  7e785a6 [Brennon York] added silent and quiet flags to curl and wget respectively, added single echo output to denote start of a download if download is needed
  14a5da0 [Brennon York] removed unnecessary zinc output on startup
  1af4a94 [Brennon York] fixed bug with uppercase vs lowercase variable
  3e8b9b3 [Brennon York] updated to properly only restart zinc if it was freshly installed
  a680d12 [Brennon York] Added comments to functions and tested various mvn calls
  bb8cc9d [Brennon York] removed package files
  ef017e6 [Brennon York] removed OS complexities, setup generic install_app call, removed extra file complexities, removed help, removed forced install (defaults now), removed double-dash from cli
  07bf018 [Brennon York] Updated to specifically handle pulling down the correct scala version
  f914dea [Brennon York] Beginning final portions of localized scala home
  69c4e44 [Brennon York] working linux and osx installers for purely local mvn build
  4a1609c [Brennon York] finalizing working linux install for maven to local ./build/apache-maven folder
  cbfcc68 [Brennon York] Changed the default sbt/sbt to build/sbt and added a build/mvn which will automatically download, install, and execute maven with zinc for easier build capability
* [SPARK-4953][Doc] Fix the description of building Spark with YARN (Kousuke Saruta, 2014-12-25, 1 file, -1/+1)
  In the section "Specifying the Hadoop Version" in building-spark.md, there is a description of building with YARN with Hadoop 0.23. Spark 1.3.0 will not support Hadoop 0.23, so we should fix the description.
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #3787 from sarutak/SPARK-4953 and squashes the following commits:
  ee9c355 [Kousuke Saruta] Removed description related to a specific vendor
  9ab0c24 [Kousuke Saruta] Fix the description about building SPARK with YARN
* [SPARK-4668] Fix some documentation typos. (Ryan Williams, 2014-12-15, 1 file, -1/+15)
  Author: Ryan Williams <ryan.blake.williams@gmail.com>
  Closes #3523 from ryan-williams/tweaks and squashes the following commits:
  d2eddaa [Ryan Williams] code review feedback
  ce27fc1 [Ryan Williams] CoGroupedRDD comment nit
  c6cfad9 [Ryan Williams] remove unnecessary if statement
  b74ea35 [Ryan Williams] comment fix
  b0221f0 [Ryan Williams] fix a gendered pronoun
  c71ffed [Ryan Williams] use names on a few boolean parameters
  89954aa [Ryan Williams] clarify some comments in {Security,Shuffle}Manager
  e465dac [Ryan Williams] Saved building-spark.md with Dillinger.io
  83e8358 [Ryan Williams] fix pom.xml typo
  dc4662b [Ryan Williams] typo fixes in tuning.md, configuration.md
* SPARK-4338. [YARN] Ditch yarn-alpha. (Sandy Ryza, 2014-12-09, 1 file, -23/+2)
  Sorry if this is a little premature with 1.2 still not out the door, but it will make other work like SPARK-4136 and SPARK-2089 a lot easier.
  Author: Sandy Ryza <sandy@cloudera.com>
  Closes #3215 from sryza/sandy-spark-4338 and squashes the following commits:
  1c5ac08 [Sandy Ryza] Update building Spark docs and remove unnecessary newline
  9c1421c [Sandy Ryza] SPARK-4338. Ditch yarn-alpha.
* [DOCS][BUILD] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'. (Takuya UESHIN, 2014-11-30, 1 file, -0/+1)
  To build with Scala 2.11, we have to execute `change-version-to-2.11.sh` before running Maven; otherwise inter-module dependencies are broken.
  Author: Takuya UESHIN <ueshin@happy-camper.st>
  Closes #3361 from ueshin/docs/building-spark_2.11 and squashes the following commits:
  1d29126 [Takuya UESHIN] Add instruction to use change-version-to-2.11.sh in 'Building for Scala 2.11'.
* [DOC][Build] Wrong cmd for build spark with apache hadoop 2.4.X and hive 12 (wangfei, 2014-11-24, 1 file, -1/+1)
  Author: wangfei <wangfei1@huawei.com>
  Closes #3335 from scwf/patch-10 and squashes the following commits:
  d343113 [wangfei] add '-Phive'
  60d595e [wangfei] [DOC] Wrong cmd for build spark with apache hadoop 2.4.X and Hive 12 support
* SPARK-4457. Document how to build for Hadoop versions greater than 2.4 (Sandy Ryza, 2014-11-24, 1 file, -2/+5)
  Author: Sandy Ryza <sandy@cloudera.com>
  Closes #3322 from sryza/sandy-spark-4457 and squashes the following commits:
  5e72b77 [Sandy Ryza] Feedback
  0cf05c1 [Sandy Ryza] Caveat
  be8084b [Sandy Ryza] SPARK-4457. Document how to build for Hadoop versions greater than 2.4
* SPARK-4375. no longer require -Pscala-2.10 (Sandy Ryza, 2014-11-14, 1 file, -2/+2)
  It seems like the winds might have moved away from this approach, but wanted to post the PR anyway because I got it working and to show what it would look like.
  Author: Sandy Ryza <sandy@cloudera.com>
  Closes #3239 from sryza/sandy-spark-4375 and squashes the following commits:
  0ffbe95 [Sandy Ryza] Enable -Dscala-2.11 in sbt
  cd42d94 [Sandy Ryza] Update doc
  f6644c3 [Sandy Ryza] SPARK-4375 take 2
* Support cross building for Scala 2.11 (Prashant Sharma, 2014-11-11, 1 file, -11/+20 lines)
  Let's give this another go using a version of Hive that shades its JLine dependency.
  Author: Prashant Sharma <prashant.s@imaginea.com>
  Author: Patrick Wendell <pwendell@gmail.com>
  Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits:
    e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script.
    f65d17d [Patrick Wendell] Fixing build issue due to merge conflict
    a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state.
    7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant
    583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver
    3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests."
    935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily."
    925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily.
    2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future.
    8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven.
    5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs.
    2121071 [Patrick Wendell] Migrating version detection to PySpark
    b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests.
    1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11
    f5cad4e [Patrick Wendell] Add Scala 2.11 docs
    210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline"
    48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles.
    e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only"
    67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check
    8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only
    e22b104 [Patrick Wendell] Small fix in pom file
    ec402ab [Patrick Wendell] Various fixes
    0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline
    4eaec65 [Prashant Sharma] Changed scripts to ignore target.
    5167bea [Prashant Sharma] small correction
    a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins.
    80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests.
    034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt.
    d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11.
    6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10
    e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted.
    937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION
    cb059b0 [Prashant Sharma] Code review
    0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.
* [SPARK-4177][Doc]update build doc since JDBC/CLI support hive 13 now (wangfei, 2014-11-02, 1 file, -10/+7 lines)
  Author: wangfei <wangfei1@huawei.com>
  Closes #3042 from scwf/patch-9 and squashes the following commits:
    3784ed1 [wangfei] remove 'TODO'
    1891553 [wangfei] update build doc since JDBC/CLI support hive 13
* [SPARK-4032] Deprecate YARN alpha support in Spark 1.2 (Prashant Sharma, 2014-10-27, 1 file, -1/+3 lines)
  Author: Prashant Sharma <prashant.s@imaginea.com>
  Closes #2878 from ScrapCodes/SPARK-4032/deprecate-yarn-alpha and squashes the following commits:
    17e9857 [Prashant Sharma] added deperecated comment to Client and ExecutorRunnable.
    3a34b1e [Prashant Sharma] Updated docs...
    4608dea [Prashant Sharma] [SPARK-4032] Deprecate YARN alpha support in Spark 1.2
* [SPARK-2706][SQL] Enable Spark to support Hive 0.13 (Zhan Zhang, 2014-10-24, 1 file, -9/+17 lines)
  Given that a lot of users are trying to use Hive 0.13 in Spark, and given the API-level incompatibility between hive-0.12 and hive-0.13, I want to propose the following approach, which has no or minimal impact on the existing hive-0.12 support but makes it possible to jumpstart development of hive-0.13 and future version support.
  Approach: Introduce a "hive-version" property, and manipulate the pom.xml files to support different Hive versions at compile time through a shim layer, e.g., hive-0.12.0 and hive-0.13.1. More specifically:
  1. For each Hive version, there is a very light layer of shim code to handle API differences, sitting in sql/hive/hive-version, e.g., sql/hive/v0.12.0 or sql/hive/v0.13.1.
  2. Add a new profile, hive-default, active by default, which picks up all existing configuration and the hive-0.12.0 shim (v0.12.0) if no hive.version is specified.
  3. If the user specifies a different version (currently only 0.13.1, via -Dhive.version=0.13.1), the hive-versions profile is activated, which picks up the version-specific shim layer and configuration, mainly the Hive jars and the version shim, e.g., v0.13.1.
  4. With this approach, nothing changes in the current hive-0.12 support.
  No change by default: sbt/sbt -Phive
  For example: sbt/sbt -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
  To enable hive-0.13: sbt/sbt -Dhive.version=0.13.1
  For example: sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly
  Note that with hive-0.13, hive-thriftserver is not enabled; that should be fixed in another JIRA. We don't need -Phive together with -Dhive.version when building (we should probably use -Phive -Dhive.version=xxx instead once the thrift server is also supported in hive-0.13.1).
  Author: Zhan Zhang <zhazhan@gmail.com>
  Author: zhzhan <zhazhan@gmail.com>
  Author: Patrick Wendell <pwendell@gmail.com>
  Closes #2241 from zhzhan/spark-2706 and squashes the following commits:
    3ece905 [Zhan Zhang] minor fix
    410b668 [Zhan Zhang] solve review comments
    cbb4691 [Zhan Zhang] change run-test for new options
    0d4d2ed [Zhan Zhang] rebase
    497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    8fad1cf [Zhan Zhang] change the pom file and make hive-0.13.1 as the default
    ab028d1 [Zhan Zhang] rebase
    4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    4cb1b93 [zhzhan] Merge pull request #1 from pwendell/pr-2241
    b0478c0 [Patrick Wendell] Changes to simplify the build of SPARK-2706
    2b50502 [Zhan Zhang] rebase
    a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    cb22863 [Zhan Zhang] correct the typo
    20f6cf7 [Zhan Zhang] solve compatability issue
    f7912a9 [Zhan Zhang] rebase and solve review feedback
    301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    10c3565 [Zhan Zhang] address review comments
    6bc9204 [Zhan Zhang] rebase and remove temparory repo
    d3aa3f2 [Zhan Zhang] Merge branch 'master' into spark-2706
    cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    3ced0d7 [Zhan Zhang] rebase
    d9b981d [Zhan Zhang] rebase and fix error due to rollback
    adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    3dd50e8 [Zhan Zhang] solve conflicts and remove unnecessary implicts
    d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    dc7bdb3 [Zhan Zhang] solve conflicts
    7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    d7c3e1e [Zhan Zhang] Merge branch 'master' into spark-2706
    68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    d48bd18 [Zhan Zhang] address review comments
    3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    57ea52e [Zhan Zhang] Merge branch 'master' into spark-2706
    2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    9412d24 [Zhan Zhang] address review comments
    f4af934 [Zhan Zhang] rebase
    1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    128b60b [Zhan Zhang] ignore 0.12.0 test cases for the time being
    af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    5f5619f [Zhan Zhang] restructure the directory and different hive version support
    05d3683 [Zhan Zhang] solve conflicts
    e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    94b4fdc [Zhan Zhang] Spark-2706: hive-0.13.1 support on spark
    87ebf3b [Zhan Zhang] Merge branch 'master' into spark-2706
    921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    f896b2a [Zhan Zhang] Merge branch 'master' into spark-2706
    789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark
    f6a8a40 [Zhan Zhang] revert
    ba14f28 [Zhan Zhang] test
    dbedff3 [Zhan Zhang] Merge remote-tracking branch 'upstream/master'
    70964fe [Zhan Zhang] revert
    fe0f379 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark
    70ffd93 [Zhan Zhang] revert
    42585ec [Zhan Zhang] test
    7d5fce2 [Zhan Zhang] test
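The Hive-version selection described in SPARK-2706 can be illustrated with a hypothetical helper, `hive_build_cmd`, that only prints the sbt invocation for a given Hive version. The two commands it emits are the examples quoted in the PR description; the helper itself is not part of Spark's build, and it reflects the description's 0.12-by-default behavior (the squashed commit 8fad1cf later made hive-0.13.1 the default).

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of Spark): print the sbt command that the
# SPARK-2706 description gives for building against a particular Hive version.
hive_build_cmd() {
  # Profiles and Hadoop version shared by both examples in the description.
  base="-Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly"
  if [ -z "$1" ]; then
    # No hive.version given: -Phive selects the default hive-0.12.0 shim.
    echo "sbt/sbt -Phive $base"
  else
    # Explicit hive.version (only 0.13.1 at the time) activates the
    # version-specific shim; per the description, -Phive is not needed here.
    echo "sbt/sbt -Dhive.version=$1 $base"
  fi
}

hive_build_cmd          # default Hive 0.12 build
hive_build_cmd 0.13.1   # Hive 0.13.1 build
```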
* [SPARK-3765][Doc] Add test information to sbt build docs (scwf, 2014-10-05, 1 file, -0/+15 lines)
  Add testing with sbt to the doc `building-spark.md`.
  Author: scwf <wangfei1@huawei.com>
  Closes #2629 from scwf/sbt-doc and squashes the following commits:
    fd9cf29 [scwf] add testing with sbt to docs
* [SPARK-3763] The example of building with sbt should be "sbt assembly" instead of "sbt compile" (Kousuke Saruta, 2014-10-03, 1 file, -1/+1 lines)
  In building-spark.md, there are some examples of building an assembled package with Maven, but the sbt example only covers compiling.
  Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  Closes #2627 from sarutak/SPARK-3763 and squashes the following commits:
    fadb990 [Kousuke Saruta] Modified the example to build with sbt in building-spark.md
* SPARK-3069 [DOCS] Build instructions in README are outdated (Sean Owen, 2014-09-16, 1 file, -0/+180 lines)
  Here's my crack at Bertrand's suggestion. The GitHub `README.md` contains outdated build info. It should just point to the current online docs, and reflect that Maven is now the primary build.
  (Incidentally, the stanza at the end about contributions of original work should go in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark too. It won't hurt to be crystal clear about the agreement to license, given that ICLAs are not required of anyone here.)
  Author: Sean Owen <sowen@cloudera.com>
  Closes #2014 from srowen/SPARK-3069 and squashes the following commits:
    501507e [Sean Owen] Note that Zinc is for Maven builds too
    db2bd97 [Sean Owen] sbt -> sbt/sbt and add note about zinc
    be82027 [Sean Owen] Fix additional occurrences of building-with-maven -> building-spark
    91c921f [Sean Owen] Move building-with-maven to building-spark and create a redirect. Update doc links to building-spark.html Add jekyll-redirect-from plugin and make associated config changes (including fixing pygments deprecation). Add example of SBT to README.md
    999544e [Sean Owen] Change "Building Spark with Maven" title to "Building Spark"; reinstate tl;dr info about dev/run-tests in README.md; add brief note about building with SBT
    c18d140 [Sean Owen] Optionally, remove the copy of contributing text from main README.md
    8e83934 [Sean Owen] Add CONTRIBUTING.md to trigger notice on new pull request page
    b1c04a1 [Sean Owen] Refer to current online documentation for building, and remove slightly outdated copy in README.md