spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-15	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0-rc7	Patrick Wendell	2014-05-15	1	-2/+2
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc6"	Patrick Wendell	2014-05-14	1	-2/+2
\| \| \| \|	This reverts commit 54133abdce0246f6643a1112a5204afb2c4caa82.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-14	1	-2/+2
\| \| \| \|	This reverts commit e480bcfbd269ae1d7a6a92cfb50466cf192fe1fb.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-14	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0-rc6	Patrick Wendell	2014-05-14	1	-2/+2
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc5"	Patrick Wendell	2014-05-14	1	-2/+2
\| \| \| \|	This reverts commit 18f062303303824139998e8fc8f4158217b0dbc3.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-14	1	-2/+2
\| \| \| \|	This reverts commit d08e9604fc9958b7c768e91715c8152db2ed6fd0.
*	Fix dep exclusion: avro-ipc, not avro, depends on netty.	Marcelo Vanzin	2014-05-14	1	-6/+4
\| \| \| \| \| \| \| \|	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #763 from vanzin/netty-dep-hell and squashes the following commits: dfb6ce2 [Marcelo Vanzin] Fix dep exclusion: avro-ipc, not avro, depends on netty.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-13	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0-rc5	Patrick Wendell	2014-05-13	1	-2/+2
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc4"	Patrick Wendell	2014-05-12	1	-2/+2
\| \| \| \|	This reverts commit 3d0a44833ab50360bf9feccc861cb5e8c44a4866.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-05-12	1	-2/+2
\| \| \| \|	This reverts commit 9772d85c6f3893d42044f4bab0e16f8b6287613a.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-13	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0-rc4	Patrick Wendell	2014-05-13	1	-2/+2
\|
*	SPARK-1802. (Addendum) Audit dependency graph when Spark is built with -Pyarn	Sean Owen	2014-05-12	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Following on a few more items from SPARK-1802 -- The first commit touches up a few similar problems remaining with the YARN profile. I think this is worth cherry-picking. The second commit is more of the same for hadoop-client, although the fix is a little more complex. It may or may not be worth bothering with. Author: Sean Owen <sowen@cloudera.com> Closes #746 from srowen/SPARK-1802.2 and squashes the following commits: 52aeb41 [Sean Owen] Add more commons-logging, servlet excludes to avoid conflicts in assembly when building for YARN (cherry picked from commit 4b31f4ec7efab8eabf956284a99bfd96a58b79f7) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Rollback versions for 1.0.0-rc4	Patrick Wendell	2014-05-12	1	-1/+1
\|
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-05-12	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0-rc4	Patrick Wendell	2014-05-12	1	-4/+3
\|
*	SPARK-1802. Audit dependency graph when Spark is built with -Phive	Sean Owen	2014-05-12	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This initial commit resolves the conflicts in the Hive profiles as noted in https://issues.apache.org/jira/browse/SPARK-1802 . Most of the fix was to note that Hive drags in Avro, and so if the hive module depends on Spark's version of the `avro-*` dependencies, it will pull in our exclusions as needed too. But I found we need to copy some exclusions between the two Avro dependencies to get this right. And then had to squash some commons-logging intrusions. This turned up another annoying find, that `hive-exec` is basically an "assembly" artifact that _also_ packages all of its transitive dependencies. This means the final assembly shows lots of collisions between itself and its dependencies, and even other project dependencies. I have a TODO to examine whether that is going to be a deal-breaker or not. In the meantime I'm going to tack on a second commit to this PR that will also fix some similar, last collisions in the YARN profile. Author: Sean Owen <sowen@cloudera.com> Closes #744 from srowen/SPARK-1802 and squashes the following commits: a856604 [Sean Owen] Resolve JAR version conflicts specific to Hive profile (cherry picked from commit 8586bf564fe010dfc19ef26874472a6f85e355fb) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1798. Tests should clean up temp files	Sean Owen	2014-05-12	1	-0/+15
\|	Three issues related to temp files that tests generate – these should be touched up for hygiene but are not urgent.
	- Modules have a log4j.properties which directs the unit-test.log output file to a directory like `[module]/target/unit-test.log`. But this ends up creating `[module]/[module]/target/unit-test.log` instead of the former.
	- The `work/` directory is not deleted by "mvn clean", in the parent and in modules. Neither is the `checkpoint/` directory created under the various external modules.
	- Many tests create a temp directory, which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and trying to call `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method.
	_If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._
	Author: Sean Owen <sowen@cloudera.com>
	Closes #732 from srowen/SPARK-1798 and squashes the following commits:
	5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
	b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
	bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
	(cherry picked from commit 7120a2979d0a9f0f54a88b2416be7ca10e74f409)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
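The cleanup pattern described in SPARK-1798 (register the temp directory for deletion when it is created, then also delete it explicitly after the test) can be illustrated outside the JVM. The following is a minimal Python analogue, not Spark's code: `create_temp_dir` and `delete_recursively` are hypothetical names, with `atexit` standing in for `deleteOnExit()` and `shutil.rmtree` standing in for `Utils.deleteRecursively`.

```python
import atexit
import os
import shutil
import tempfile


def create_temp_dir(prefix="spark-test-"):
    """Create a temp directory and register it for removal at interpreter exit.

    Hypothetical helper: the exit hook is a safety net for tests that
    forget to clean up, similar in spirit to File.deleteOnExit().
    """
    path = tempfile.mkdtemp(prefix=prefix)
    # ignore_errors=True so the hook does not fail if the directory
    # was already removed by an explicit cleanup.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path


def delete_recursively(path):
    """Explicit cleanup, e.g. from a test teardown method. Safe to call twice."""
    if os.path.isdir(path):
        shutil.rmtree(path)
```

Calling `delete_recursively` from a teardown method keeps the working tree clean during a long test run, while the exit hook covers tests that abort before teardown.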
*	SPARK-1806: Upgrade Mesos dependency to 0.18.1	Bernardo Gomez Palacio	2014-05-12	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enabled Mesos (0.18.1) dependency with shaded protobuf Why is this needed? Avoids any protobuf version collision between Mesos and any other dependency in Spark e.g. Hadoop HDFS 2.2+ or 1.0.4. Ticket: https://issues.apache.org/jira/browse/SPARK-1806 * Should close https://issues.apache.org/jira/browse/SPARK-1433 Author berngp Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com> Closes #741 from berngp/feature/SPARK-1806 and squashes the following commits: 5d70646 [Bernardo Gomez Palacio] SPARK-1806: Upgrade Mesos dependency to 0.18.1
*	SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure	Sean Owen	2014-05-10	1	-32/+0
\|	TL;DR is there is a bit of JAR hell trouble with Netty, that can be mostly resolved and will resolve a test failure.
	I hit the error described at http://apache-spark-user-list.1001560.n3.nabble.com/SparkContext-startup-time-out-td1753.html while running FlumeStreamingSuite, and have for a short while (is it just me?)
	velvia notes: "I have found a workaround. If you add akka 2.2.4 to your dependencies, then everything works, probably because akka 2.2.4 brings in newer version of Jetty."
	There are at least 3 versions of Netty in play in the build:
	- the new Flume 1.4.0 dependency brings in io.netty:netty:3.4.0.Final, and that is the immediate problem
	- the custom version of akka 2.2.3 depends on io.netty:netty:3.6.6.
	- but, Spark Core directly uses io.netty:netty-all:4.0.17.Final
	The POMs try to exclude other versions of netty, but are excluding org.jboss.netty:netty, when in fact older versions of io.netty:netty (not netty-all) are also an issue.
	The org.jboss.netty:netty excludes are largely unnecessary. I replaced many of them with io.netty:netty exclusions until everything agreed on io.netty:netty-all:4.0.17.Final.
	But this didn't work, since Akka 2.2.3 doesn't work with Netty 4.x. Down-grading to 3.6.6.Final across the board made some Spark code not compile.
	If the build keeps io.netty:netty:3.6.6.Final as well, everything seems to work. Part of the reason seems to be that Netty 3.x used the old `org.jboss.netty` packages. This is less than ideal, but is no worse than the current situation.
	So this PR resolves the issue and improves the JAR hell, even if it leaves the existing theoretical Netty 3-vs-4 conflict:
	- Remove org.jboss.netty excludes where possible, for clarity; they're not needed except with Hadoop artifacts
	- Add io.netty:netty excludes where needed -- except, let akka keep its io.netty:netty
	- Change a bit of test code that actually depended on Netty 3.x, to use 4.x equivalent
	- Update SBT build accordingly
	A better change would be to update Akka far enough such that it agrees on Netty 4.x, but I don't know if that's feasible.
	Author: Sean Owen <sowen@cloudera.com>
	Closes #723 from srowen/SPARK-1789 and squashes the following commits:
	43661b7 [Sean Owen] Update and add Netty excludes to prevent some JAR conflicts that cause test issues
	(cherry picked from commit 2b7bd29eb6ee5baf739eec143044ecfc296b9b1f)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
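The Netty situation in the SPARK-1789 message can be made concrete with a small check over resolved dependency coordinates. This is only a sketch: `find_version_conflicts` is a hypothetical helper, the input is assumed to be plain `groupId:artifactId:version` strings (not the output format of any particular Maven or SBT command), and the coordinates are the ones quoted in the message, with akka's `3.6.6` written as `3.6.6.Final` as it appears later in the same message.

```python
from collections import defaultdict


def find_version_conflicts(coordinates):
    """Return artifacts that appear with more than one version.

    `coordinates` is an iterable of 'groupId:artifactId:version' strings.
    The result maps (groupId, artifactId) to a sorted list of versions.
    """
    versions = defaultdict(set)
    for coord in coordinates:
        group_id, artifact_id, version = coord.split(":")
        versions[(group_id, artifact_id)].add(version)
    return {key: sorted(vs) for key, vs in versions.items() if len(vs) > 1}


# Coordinates quoted in the commit message.
resolved = [
    "io.netty:netty:3.4.0.Final",       # brought in by Flume 1.4.0
    "io.netty:netty:3.6.6.Final",       # brought in by the custom akka 2.2.3
    "io.netty:netty-all:4.0.17.Final",  # used directly by Spark Core
]
conflicts = find_version_conflicts(resolved)
print(conflicts)
```

Only `io.netty:netty` is reported, because `netty-all` is a different artifactId. That matches the message's caveat: a coordinate-level check like this says nothing about the remaining Netty 3-vs-4 overlap.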
*	[SQL] Upgrade parquet library.	Michael Armbrust	2014-05-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I think we are hitting this issue in some perf tests: https://github.com/Parquet/parquet-mr/commit/6aed5288fd4a1398063a5a219b2ae4a9f71b02cf Credit to @aarondav ! Author: Michael Armbrust <michael@databricks.com> Closes #684 from marmbrus/upgradeParquet and squashes the following commits: e10a619 [Michael Armbrust] Upgrade parquet library. (cherry picked from commit 4d6055329846f5e09472e5f844127a5ab5880e15) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1474: Spark on yarn assembly doesn't include AmIpFilter	Thomas Graves	2014-05-06	1	-1/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter in spark on yarn but it is not included in the assembly jar. I tested this on yarn cluster by removing the yarn jars from the classpath and spark runs fine now. Author: Thomas Graves <tgraves@apache.org> Closes #406 from tgravescs/SPARK-1474 and squashes the following commits: 1548bf9 [Thomas Graves] SPARK-1474: Spark on yarn assembly doesn't include org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter (cherry picked from commit 1e829905c791fbf1dfd8e0c1caa62ead7354605e) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1556. jets3t dep doesn't update properly with newer Hadoop versions	Sean Owen	2014-05-05	1	-34/+50
\|	See related discussion at https://github.com/apache/spark/pull/468
	This PR may still overstep what you have in mind, but let me put it on the table to start. Besides fixing the issue, it has one substantive change, and that is to manage Hadoop-specific things only in Hadoop-related profiles. This does _not_ remove `yarn.version`.
	- Moves the YARN and Hadoop profiles together in pom.xml. Sorry that this makes the diff a little hard to grok but the changes are only as follows.
	- Removes `hadoop.major.version`
	- Introduce `hadoop-2.2` and `hadoop-2.3` profiles to control Hadoop-specific changes:
	  - like the protobuf version issue - this was only 'solved' now by enabling YARN for 2.2+, which is really an orthogonal issue
	  - like the jets3t version issue now
	- Hadoop profiles set an appropriate default `hadoop.version`, that can be overridden
	- _(YARN profiles in the parent now only exist to add the sub-module)_
	- Fixes the jets3t dependency issue
	  - and makes it a runtime dependency
	  - and centralizes config of this guy in the parent pom
	- Updates build docs
	- Updates SBT build too
	  - and fixes a regex problem along the way
	Author: Sean Owen <sowen@cloudera.com>
	Closes #629 from srowen/SPARK-1556 and squashes the following commits:
	c3fa967 [Sean Owen] Fix hadoop-2.4 profile typo in doc
	a2105fd [Sean Owen] Add hadoop-2.4 profile and don't set hadoop.version in profiles
	274f4f9 [Sean Owen] Make jets3t a runtime dependency, and bring its exclusion up into parent config
	bbed826 [Sean Owen] Use jets3t 0.9.0 for Hadoop 2.3+ (and correct similar regex issue in SBT build)
	f21f356 [Sean Owen] Build changes to set up for jets3t fix
	(cherry picked from commit 73b0cbcc241cca3d318ff74340e80b02f884acbd)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1693: Most of the tests throw a java.lang.SecurityException when spark built for hadoop 2.3.0 , 2.4.0	witgo	2014-05-04	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Author: witgo <witgo@qq.com> Closes #628 from witgo/SPARK-1693_new and squashes the following commits: e3af968 [witgo] Merge branch 'master' of https://github.com/apache/spark into SPARK-1693_new dc63905 [witgo] SPARK-1693: Most of the tests throw a java.lang.SecurityException when spark built for hadoop 2.3.0 , 2.4.0 (cherry picked from commit d940e4c16aaa7b60daf1229a99bc4d3455c0240d) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1629. Addendum: Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to use lang3 utility in Utils.scala	Sean Owen	2014-05-04	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For consideration. This was proposed in related discussion: https://github.com/apache/spark/pull/569 Author: Sean Owen <sowen@cloudera.com> Closes #635 from srowen/SPARK-1629.2 and squashes the following commits: a442b98 [Sean Owen] Depend on commons lang3 (already used by tachyon) as it's used in ReplSuite, and return to use lang3 utility in Utils.scala (cherry picked from commit f5041579ff573f988b673c2506fa4edc32f5ad84) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	The default version of yarn is equal to the hadoop version	witgo	2014-05-03	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a part of [PR 590](https://github.com/apache/spark/pull/590) Author: witgo <witgo@qq.com> Closes #626 from witgo/yarn_version and squashes the following commits: c390631 [witgo] restore the yarn dependency declarations f8a4ad8 [witgo] revert remove the dependency of avro in yarn-alpha 2df6cf5 [witgo] review commit a1d876a [witgo] review commit 20e7e3e [witgo] review commit c76763b [witgo] The default value of yarn.version is equal to hadoop.version (cherry picked from commit fb0543224bcedb8ae3aab4a7ddcc6111a03378fe) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-04-29	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0-rc3	Patrick Wendell	2014-04-29	1	-3/+3
\|
*	Manual revert of rc2 version changes.	Patrick Wendell	2014-04-28	1	-1/+1
\|
*	Improved build configuration	witgo	2014-04-28	1	-6/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1, Fix SPARK-1441: compile spark core error with hadoop 0.23.x 2, Fix SPARK-1491: maven hadoop-provided profile fails to build 3, Fix org.scala-lang: * ,org.apache.avro:* inconsistent versions dependency 4, A modified on the sql/catalyst/pom.xml,sql/hive/pom.xml,sql/core/pom.xml (Four spaces formatted into two spaces) Author: witgo <witgo@qq.com> Closes #480 from witgo/format_pom and squashes the following commits: 03f652f [witgo] review commit b452680 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom bee920d [witgo] revert fix SPARK-1629: Spark Core missing commons-lang dependence 7382a07 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom 6902c91 [witgo] fix SPARK-1629: Spark Core missing commons-lang dependence 0da4bc3 [witgo] merge master d1718ed [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom e345919 [witgo] add avro dependency to yarn-alpha 77fad08 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom 62d0862 [witgo] Fix org.scala-lang: * inconsistent versions dependency 1a162d7 [witgo] Merge branch 'master' of https://github.com/apache/spark into format_pom 934f24d [witgo] review commit cf46edc [witgo] exclude jruby 06e7328 [witgo] Merge branch 'SparkBuild' into format_pom 99464d2 [witgo] fix maven hadoop-provided profile fails to build 0c6c1fc [witgo] Fix compile spark core error with hadoop 0.23.x 6851bec [witgo] Maintain consistent SparkBuild.scala, pom.xml (cherry picked from commit 030f2c2126d5075576cd6d83a1ee7462c48b953b) Conflicts: sql/catalyst/pom.xml sql/core/pom.xml sql/hive/pom.xml
*	SPARK-1621 Upgrade Chill to 0.3.6	Matei Zaharia	2014-04-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	It registers more Scala classes, including things like Ranges that we had to register manually before. See https://github.com/twitter/chill/releases for Chill's change log. Author: Matei Zaharia <matei@databricks.com> Closes #543 from mateiz/chill-0.3.6 and squashes the following commits: a1dc5e0 [Matei Zaharia] Upgrade Chill to 0.3.6 and remove our special registration of Ranges (cherry picked from commit a24d918c71f6ac4adbe3ae363ef69f4658118938) Signed-off-by: Matei Zaharia <matei@databricks.com>
*	Fix [SPARK-1078]: Remove the Unnecessary lift-json dependency	Sandeep	2014-04-24	1	-16/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the Unnecessary lift-json dependency from pom.xml Author: Sandeep <sandeep@techaddict.me> Closes #536 from techaddict/FIX-SPARK-1078 and squashes the following commits: bd0fd1d [Sandeep] Fix [SPARK-1078]: Replace lift-json with json4s-jackson. Remove the Unnecessary lift-json dependency from pom.xml (cherry picked from commit 095b5182536a43e2ae738be93294ee5215d86581) Signed-off-by: Reynold Xin <rxin@apache.org>
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-04-24	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0-rc2	Patrick Wendell	2014-04-24	1	-2/+2
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0-rc1"	Patrick Wendell	2014-04-21	1	-2/+2
\| \| \| \|	This reverts commit 6cc698fc378256fee9111f66c691ced27f54e973.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-04-21	1	-2/+2
\| \| \| \|	This reverts commit 188f7c3f68e93b3e9347ec02e21f5943874b4741.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-04-21	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0-rc1	Patrick Wendell	2014-04-21	1	-2/+2
\|
*	Revert "[maven-release-plugin] prepare release v1.0.0"	Patrick Wendell	2014-04-21	1	-2/+2
\| \| \| \|	This reverts commit c3c6ea05d9d02a38b97388d583828ca82c5181db.
*	Revert "[maven-release-plugin] prepare for next development iteration"	Patrick Wendell	2014-04-21	1	-2/+2
\| \| \| \|	This reverts commit 0b49305297033b70cbb525bef54b70a14deeb238.
*	[maven-release-plugin] prepare for next development iteration	Patrick Wendell	2014-04-21	1	-2/+2
\|
*	[maven-release-plugin] prepare release v1.0.0	Patrick Wendell	2014-04-21	1	-2/+2
\|
*	[SPARK-1520] remove fastutil from dependencies	Xiangrui Meng	2014-04-18	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A quick fix for https://issues.apache.org/jira/browse/SPARK-1520 By excluding fastutil, we bring the number of files in the assembly jar back under 65536, so Java 7 won't create the assembly jar in zip64 format, which cannot be read by Java 6. With this change, the assembly jar now has about 60000 entries (58000 files), tested with both sbt and maven. Author: Xiangrui Meng <meng@databricks.com> Closes #437 from mengxr/remove-fastutil and squashes the following commits: 00f9beb [Xiangrui Meng] remove fastutil from dependencies (cherry picked from commit aa17f022c59af02b04b977da9017671ef14d664a) Signed-off-by: Reynold Xin <rxin@apache.org>
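The SPARK-1520 message turns on the number of entries in the assembly jar. A rough way to check an archive against that limit can be sketched in Python with the standard `zipfile` module. The function names and the constant are illustrative assumptions, and the check looks only at entry count, not at the size-based reasons an archive may need zip64.

```python
import io
import zipfile

# The classic (non-zip64) end-of-central-directory record stores the
# entry count in a 16-bit field, so 65535 is the largest count it can hold.
MAX_ENTRIES_WITHOUT_ZIP64 = 0xFFFF


def count_entries(jar):
    """Return the number of entries (files and directories) in a jar/zip.

    `jar` may be a path or a seekable binary file-like object.
    """
    with zipfile.ZipFile(jar) as zf:
        return len(zf.infolist())


def exceeds_entry_limit(jar):
    """True if the archive has more entries than the classic format allows."""
    return count_entries(jar) > MAX_ENTRIES_WITHOUT_ZIP64
```

For the assembly described in the message (about 60000 entries), `exceeds_entry_limit` would return `False`, which is the state the fastutil exclusion was meant to restore.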
*	SPARK-1374: PySpark API for SparkSQL	Ahir Reddy	2014-04-15	1	-1/+1
\|	An initial API that exposes SparkSQL functionality in PySpark. A PythonRDD composed of dictionaries, with string keys and primitive values (boolean, float, int, long, string) can be converted into a SchemaRDD that supports sql queries.
	```
	from pyspark.context import SQLContext
	sqlCtx = SQLContext(sc)
	rdd = sc.parallelize([{"field1" : 1, "field2" : "row1"}, {"field1" : 2, "field2": "row2"}, {"field1" : 3, "field2": "row3"}])
	srdd = sqlCtx.applySchema(rdd)
	sqlCtx.registerRDDAsTable(srdd, "table1")
	srdd2 = sqlCtx.sql("SELECT field1 AS f1, field2 as f2 from table1")
	srdd2.collect()
	```
	The last line yields ```[{"f1" : 1, "f2" : "row1"}, {"f1" : 2, "f2": "row2"}, {"f1" : 3, "f2": "row3"}]```
	Author: Ahir Reddy <ahirreddy@gmail.com>
	Author: Michael Armbrust <michael@databricks.com>
	Closes #363 from ahirreddy/pysql and squashes the following commits:
	0294497 [Ahir Reddy] Updated log4j properties to supress Hive Warns
	307d6e0 [Ahir Reddy] Style fix
	6f7b8f6 [Ahir Reddy] Temporary fix MIMA checker. Since we now assemble Spark jar with Hive, we don't want to check the interfaces of all of our hive dependencies
	3ef074a [Ahir Reddy] Updated documentation because classes moved to sql.py
	29245bf [Ahir Reddy] Cache underlying SchemaRDD instead of generating and caching PythonRDD
	f2312c7 [Ahir Reddy] Moved everything into sql.py
	a19afe4 [Ahir Reddy] Doc fixes
	6d658ba [Ahir Reddy] Remove the metastore directory created by the HiveContext tests in SparkSQL
	521ff6d [Ahir Reddy] Trying to get spark to build with hive
	ab95eba [Ahir Reddy] Set SPARK_HIVE=true on jenkins
	ded03e7 [Ahir Reddy] Added doc test for HiveContext
	22de1d4 [Ahir Reddy] Fixed maven pyrolite dependency
	e4da06c [Ahir Reddy] Display message if hive is not built into spark
	227a0be [Michael Armbrust] Update API links. Fix Hive example.
	58e2aa9 [Michael Armbrust] Build Docs for pyspark SQL Api. Minor fixes.
	4285340 [Michael Armbrust] Fix building of Hive API Docs.
	38a92b0 [Michael Armbrust] Add note to future non-python developers about python docs.
	337b201 [Ahir Reddy] Changed com.clearspring.analytics stream version from 2.4.0 to 2.5.1 to match SBT build, and added pyrolite to maven build
	40491c9 [Ahir Reddy] PR Changes + Method Visibility
	1836944 [Michael Armbrust] Fix comments.
	e00980f [Michael Armbrust] First draft of python sql programming guide.
	b0192d3 [Ahir Reddy] Added Long, Double and Boolean as usable types + unit test
	f98a422 [Ahir Reddy] HiveContexts
	79621cf [Ahir Reddy] cleaning up cruft
	b406ba0 [Ahir Reddy] doctest formatting
	20936a5 [Ahir Reddy] Added tests and documentation
	e4d21b4 [Ahir Reddy] Added pyrolite dependency
	79f739d [Ahir Reddy] added more tests
	7515ba0 [Ahir Reddy] added more tests :)
	d26ec5e [Ahir Reddy] added test
	e9f5b8d [Ahir Reddy] adding tests
	906d180 [Ahir Reddy] added todo explaining cost of creating Row object in python
	251f99d [Ahir Reddy] for now only allow dictionaries as input
	09b9980 [Ahir Reddy] made jrdd explicitly lazy
	c608947 [Ahir Reddy] SchemaRDD now has all RDD operations
	725c91e [Ahir Reddy] awesome row objects
	55d1c76 [Ahir Reddy] return row objects
	4fe1319 [Ahir Reddy] output dictionaries correctly
	be079de [Ahir Reddy] returning dictionaries works
	cd5f79f [Ahir Reddy] Switched to using Scala SQLContext
	e948bd9 [Ahir Reddy] yippie
	4886052 [Ahir Reddy] even better
	c0fb1c6 [Ahir Reddy] more working
	043ca85 [Ahir Reddy] working
	5496f9f [Ahir Reddy] doesn't crash
	b8b904b [Ahir Reddy] Added schema rdd class
	67ba875 [Ahir Reddy] java to python, and python to java
	bcc0f23 [Ahir Reddy] Java to python
	ab6025d [Ahir Reddy] compiling
	(cherry picked from commit c99bcb7feaa761c5826f2e1d844d0502a3b79538)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1488. Resolve scalac feature warnings during build	Sean Owen	2014-04-14	1	-0/+1
\|	For your consideration: scalac currently notes a number of feature warnings during compilation:
	```
	[warn] there were 65 feature warning(s); re-run with -feature for details
	```
	Warnings are like:
	```
	[warn] /Users/srowen/Documents/spark/core/src/main/scala/org/apache/spark/SparkContext.scala:1261: implicit conversion method rddToPairRDDFunctions should be enabled
	[warn] by making the implicit value scala.language.implicitConversions visible.
	[warn] This can be achieved by adding the import clause 'import scala.language.implicitConversions'
	[warn] or by setting the compiler option -language:implicitConversions.
	[warn] See the Scala docs for value scala.language.implicitConversions for a discussion
	[warn] why the feature should be explicitly enabled.
	[warn] implicit def rddToPairRDDFunctions[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)]) =
	[warn] ^
	```
	scalac is suggesting that it's just best practice to explicitly enable certain language features by importing them where used. This PR simply adds the imports it suggests (and squashes one other Java warning along the way). This leaves just deprecation warnings in the build.
	Author: Sean Owen <sowen@cloudera.com>
	Closes #404 from srowen/SPARK-1488 and squashes the following commits:
	8598980 [Sean Owen] Quiet scalac warnings about language features by explicitly importing language features.
	39bc831 [Sean Owen] Enable -feature in scalac to emit language feature warnings
	(cherry picked from commit 0247b5c5467ca1b0d03ba929a78fa4d805582d84)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	SPARK-1057 (alternative) Remove fastutil	Sean Owen	2014-04-11	1	-5/+0
\|	(This is for discussion at this point -- I'm not suggesting this should be committed.)
	This is what removing fastutil looks like. Much of it is straightforward, like using `java.io` buffered stream classes, and Guava for murmurhash3.
	Uses of the `FastByteArrayOutputStream` were a little trickier. In only one case though do I think the change to use `java.io` actually entails an extra array copy.
	The rest is using `OpenHashMap` and `OpenHashSet`. These are now written in terms of more scala-like operations.
	`OpenHashMap` is where I made three non-trivial changes to make it work, and they need review:
	- It is no longer private
	- The key must be a `ClassTag`
	- Unless a lot of other code changes, the key type can't enforce being a supertype of `Null`
	It all works and tests pass, and I think there is reason to believe it's OK from a speed perspective. But what about those last changes?
	Author: Sean Owen <sowen@cloudera.com>
	Closes #266 from srowen/SPARK-1057-alternate and squashes the following commits:
	2601129 [Sean Owen] Fix Map return type error not previously caught
	ec65502 [Sean Owen] Updates from matei's review
	00bc81e [Sean Owen] Remove use of fastutil and replace with use of java.io, spark.util and Guava classes
	(cherry picked from commit 165e06a74c3d75e6b7341c120943add8b035b96a)
	Signed-off-by: Patrick Wendell <pwendell@gmail.com>
*	Revert "SPARK-1433: Upgrade Mesos dependency to 0.17.0"	Patrick Wendell	2014-04-10	1	-3/+3
\| \| \| \|	This reverts commit 12c077d5aa0b76a808a55db625c9677a52bd43f9.