aboutsummaryrefslogtreecommitdiff
path: root/project/SparkBuild.scala
diff options
context:
space:
mode:
authorYin Huai <huai@cse.ohio-state.edu>2014-06-17 19:14:59 -0700
committerReynold Xin <rxin@apache.org>2014-06-17 19:14:59 -0700
commitd2f4f30b12f99358953e2781957468e2cfe3c916 (patch)
tree405b949a2968dba2c73874bd2fefc9d10206e731 /project/SparkBuild.scala
parentb2ebf429e24566c29850c570f8d76943151ad78c (diff)
downloadspark-d2f4f30b12f99358953e2781957468e2cfe3c916.tar.gz
spark-d2f4f30b12f99358953e2781957468e2cfe3c916.tar.bz2
spark-d2f4f30b12f99358953e2781957468e2cfe3c916.zip
[SPARK-2060][SQL] Querying JSON Datasets with SQL and DSL in Spark SQL
JIRA: https://issues.apache.org/jira/browse/SPARK-2060 Programming guide: http://yhuai.github.io/site/sql-programming-guide.html Scala doc of SQLContext: http://yhuai.github.io/site/api/scala/index.html#org.apache.spark.sql.SQLContext Author: Yin Huai <huai@cse.ohio-state.edu> Closes #999 from yhuai/newJson and squashes the following commits: 227e89e [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson ce8eedd [Yin Huai] rxin's comments. bc9ac51 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson 94ffdaa [Yin Huai] Remove "get" from method names. ce31c81 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson e2773a6 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson 79ea9ba [Yin Huai] Fix typos. 5428451 [Yin Huai] Newline 1f908ce [Yin Huai] Remove extra line. d7a005c [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson 7ea750e [Yin Huai] marmbrus's comments. 6a5f5ef [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson 83013fb [Yin Huai] Update Java Example. e7a6c19 [Yin Huai] SchemaRDD.javaToPython should convert a field with the StructType to a Map. 6d20b85 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson 4fbddf0 [Yin Huai] Programming guide. 9df8c5a [Yin Huai] Python API. 7027634 [Yin Huai] Java API. cff84cc [Yin Huai] Use a SchemaRDD for a JSON dataset. d0bd412 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson ab810b0 [Yin Huai] Make JsonRDD private. 6df0891 [Yin Huai] Apache header. 8347f2e [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson 66f9e76 [Yin Huai] Update docs and use the entire dataset to infer the schema. 8ffed79 [Yin Huai] Update the example. a5a4b52 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson 4325475 [Yin Huai] If a sampled dataset is used for schema inferring, update the schema of the JsonTable after first execution. 65b87f0 [Yin Huai] Fix sampling... 8846af5 [Yin Huai] API doc. 52a2275 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson 0387523 [Yin Huai] Address PR comments. 666b957 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson a2313a6 [Yin Huai] Address PR comments. f3ce176 [Yin Huai] After type conflict resolution, if a NullType is found, StringType is used. 0576406 [Yin Huai] Add Apache license header. af91b23 [Yin Huai] Merge remote-tracking branch 'upstream/master' into newJson f45583b [Yin Huai] Infer the schema of a JSON dataset (a text file with one JSON object per line or a RDD[String] with one JSON object per string) and returns a SchemaRDD. f31065f [Yin Huai] A query plan or a SchemaRDD can print out its schema.
Diffstat (limited to 'project/SparkBuild.scala')
-rw-r--r--project/SparkBuild.scala22
1 files changed, 18 insertions, 4 deletions
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 2d60a44f04..7bb39dc771 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -76,7 +76,7 @@ object SparkBuild extends Build {
lazy val catalyst = Project("catalyst", file("sql/catalyst"), settings = catalystSettings) dependsOn(core)
- lazy val sql = Project("sql", file("sql/core"), settings = sqlCoreSettings) dependsOn(core, catalyst)
+ lazy val sql = Project("sql", file("sql/core"), settings = sqlCoreSettings) dependsOn(core) dependsOn(catalyst % "compile->compile;test->test")
lazy val hive = Project("hive", file("sql/hive"), settings = hiveSettings) dependsOn(sql)
@@ -501,9 +501,23 @@ object SparkBuild extends Build {
def sqlCoreSettings = sharedSettings ++ Seq(
name := "spark-sql",
libraryDependencies ++= Seq(
- "com.twitter" % "parquet-column" % parquetVersion,
- "com.twitter" % "parquet-hadoop" % parquetVersion
- )
+ "com.twitter" % "parquet-column" % parquetVersion,
+ "com.twitter" % "parquet-hadoop" % parquetVersion,
+ "com.fasterxml.jackson.core" % "jackson-databind" % "2.3.0" // json4s-jackson 3.2.6 requires jackson-databind 2.3.0.
+ ),
+ initialCommands in console :=
+ """
+ |import org.apache.spark.sql.catalyst.analysis._
+ |import org.apache.spark.sql.catalyst.dsl._
+ |import org.apache.spark.sql.catalyst.errors._
+ |import org.apache.spark.sql.catalyst.expressions._
+ |import org.apache.spark.sql.catalyst.plans.logical._
+ |import org.apache.spark.sql.catalyst.rules._
+ |import org.apache.spark.sql.catalyst.types._
+ |import org.apache.spark.sql.catalyst.util._
+ |import org.apache.spark.sql.execution
+ |import org.apache.spark.sql.test.TestSQLContext._
+ |import org.apache.spark.sql.parquet.ParquetTestData""".stripMargin
)
// Since we don't include hive in the main assembly this project also acts as an alternative