Commit message | Author | Age | Files | Lines
* [SPARK-5806] re-organize sections in mllib-clustering.md | Xiangrui Meng | 2015-02-13 | 2 | -87/+77
| | | | | | | | | | | | | Put example code close to the algorithm description. Author: Xiangrui Meng <meng@databricks.com> Closes #4598 from mengxr/SPARK-5806 and squashes the following commits: a137872 [Xiangrui Meng] re-organize sections in mllib-clustering.md (cherry picked from commit cc56c8729a76af85aa6eb5d2f99787cca5e5b38f) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-5789][SQL]Throw a better error message if JsonRDD.parseJson encounters unrecoverable parsing errors. | Yin Huai | 2015-02-13 | 1 | -0/+4
| | | | | | | | | | | | | | Author: Yin Huai <yhuai@databricks.com> Closes #4582 from yhuai/jsonErrorMessage and squashes the following commits: 152dbd4 [Yin Huai] Update error message. 1466256 [Yin Huai] Throw a better error message when a JSON object in the input dataset spans multiple records (lines for files or strings for an RDD of strings). (cherry picked from commit 2e0c084528409e1c565e6945521a33c0835ebbee) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-5642] [SQL] Apply column pruning on unused aggregation fields | Daoyuan Wang | 2015-02-13 | 2 | -2/+44
| | | | | | | | | | | | | | | | | select k from (select key k, max(value) v from src group by k) t Author: Daoyuan Wang <daoyuan.wang@intel.com> Author: Michael Armbrust <michael@databricks.com> Closes #4415 from adrian-wang/groupprune and squashes the following commits: 5d2d8a3 [Daoyuan Wang] address Michael's comments 61f8ef7 [Daoyuan Wang] add a unit test 80ddcc6 [Daoyuan Wang] keep project b69d385 [Daoyuan Wang] add a prune rule for grouping set (cherry picked from commit 2cbb3e433ae334d5c318f05b987af314c854fbcc) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [HOTFIX] Fix build break in MesosSchedulerBackendSuite | Andrew Or | 2015-02-13 | 1 | -1/+2
|
* SPARK-5805 Fixed the type error in documentation. | Emre Sevinç | 2015-02-13 | 1 | -31/+31
| | | | | | | | | | | | | Fixes SPARK-5805 : Fix the type error in the final example given in MLlib - Clustering documentation. Author: Emre Sevinç <emre.sevinc@gmail.com> Closes #4596 from emres/SPARK-5805 and squashes the following commits: 1029f66 [Emre Sevinç] SPARK-5805 Fixed the type error in documentation. (cherry picked from commit 9f31db061019414a964aac432e946eac61f8307c) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-5735] Replace uses of EasyMock with Mockito | Josh Rosen | 2015-02-13 | 6 | -251/+207
| | | | | | | | | | | | | | | | | | | | | | This patch replaces all uses of EasyMock with Mockito. There are two motivations for this: 1. We should use a single mocking framework in our tests in order to keep things consistent. 2. EasyMock may be responsible for non-deterministic unit test failures due to its Objenesis dependency (see SPARK-5626 for more details). Most of these changes are fairly mechanical translations of EasyMock code to Mockito, although I made a small change that strengthens the assertions in one test in KinesisReceiverSuite. Author: Josh Rosen <joshrosen@databricks.com> Closes #4578 from JoshRosen/SPARK-5735-remove-easymock and squashes the following commits: 0ab192b [Josh Rosen] Import sorting plus two minor changes to more closely match old semantics. 977565b [Josh Rosen] Remove EasyMock from build. fae1d8f [Josh Rosen] Remove EasyMock usage in KinesisReceiverSuite. 7cca486 [Josh Rosen] Remove EasyMock usage in MesosSchedulerBackendSuite fc5e94d [Josh Rosen] Remove EasyMock in CacheManagerSuite (cherry picked from commit 077eec2d9dba197f51004ee4a322d0fa71424ea0) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-5783] Better eventlog-parsing error messages | Ryan Williams | 2015-02-13 | 4 | -7/+11
| | | | | | | | | | | Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4573 from ryan-williams/history and squashes the following commits: a8647ec [Ryan Williams] fix test calls to .replay() 98aa3fe [Ryan Williams] include filename in history-parsing error message 8deecf0 [Ryan Williams] add line number to history-parsing error message b668b52 [Ryan Williams] add log info line to history-eventlog parsing
* [SPARK-5503][MLLIB] Example code for Power Iteration Clustering | sboeschhuawei | 2015-02-13 | 1 | -0/+160
| | | | | | | | | | | | | | | | | | | | Author: sboeschhuawei <stephen.boesch@huawei.com> Closes #4495 from javadba/picexamples and squashes the following commits: 3c84b14 [sboeschhuawei] PIC Examples updates from Xiangrui's comments round 5 2878675 [sboeschhuawei] Fourth round with xiangrui on PICExample d7ac350 [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3 d7f0cba [sboeschhuawei] Updates to PICExample from Xiangrui's comments round 3 cef28f4 [sboeschhuawei] Further updates to PICExample from Xiangrui's comments f7ff43d [sboeschhuawei] Update to PICExample from Xiangrui's comments efeec45 [sboeschhuawei] Update to PICExample from Xiangrui's comments 03e8de4 [sboeschhuawei] Added PICExample c509130 [sboeschhuawei] placeholder for pic examples 5864d4a [sboeschhuawei] placeholder for pic examples (cherry picked from commit e1a1ff8108463ca79299ec0eb555a0c8db9dffa0) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-5732][CORE]:Add an option to print the spark version in spark script. | uncleGen | 2015-02-13 | 2 | -3/+19
| | | | | | | | | | | | | | | | | Naturally, we may need an option to print the Spark version in the spark scripts. This is pretty common in scripting tools. ![9](https://cloud.githubusercontent.com/assets/7402327/6183331/cab1b74e-b38e-11e4-9daa-e26e6015cff3.JPG) Author: uncleGen <hustyugm@gmail.com> Author: genmao.ygm <genmao.ygm@alibaba-inc.com> Closes #4522 from uncleGen/master-clean-150211 and squashes the following commits: 9f2127c [genmao.ygm] revert the behavior of "-v" 015ddee [uncleGen] minor changes 463f02c [uncleGen] minor changes (cherry picked from commit c0ccd2564182695ea5771524840bf1a99d5aa842) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-4832][Deploy]some other processes might take the daemon pid | WangTaoTheTonic | 2015-02-13 | 1 | -9/+11
| | | | | | | | | | | | | | | | | | | Some other process might be using the pid saved in the pid file. In that case we should ignore it and launch the daemon. JIRA is down for maintenance. I will file an issue once it returns. Author: WangTaoTheTonic <barneystinson@aliyun.com> Author: WangTaoTheTonic <wangtao111@huawei.com> Closes #3683 from WangTaoTheTonic/otherproc and squashes the following commits: daa86a1 [WangTaoTheTonic] some bash style fix 8befee7 [WangTaoTheTonic] handle the mistake scenario cf4ecc6 [WangTaoTheTonic] remove redundant condition f36cfb4 [WangTaoTheTonic] some other processes might take the pid (cherry picked from commit 1768bd51438670c493ca3ca02988aee3ae31e87e) Signed-off-by: Sean Owen <sowen@cloudera.com>
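The stale-pid scenario in SPARK-4832 can be sketched as follows. This is a minimal Python illustration of the decision only, not the shell change made by the patch; the function name `should_launch` and the injected `command_of` lookup are hypothetical, and the process table holds made-up values.

```python
from typing import Callable, Optional


def should_launch(pid_text: Optional[str],
                  command_of: Callable[[int], Optional[str]],
                  expected_marker: str) -> bool:
    """Decide whether a daemon should be started.

    pid_text        -- contents of the pid file, or None if the file is absent
    command_of      -- hypothetical lookup returning the command line of a
                       live pid, or None if no such process exists
    expected_marker -- substring that identifies our daemon's command line
    """
    if pid_text is None:
        return True                      # no pid file: nothing is running
    try:
        pid = int(pid_text.strip())
    except ValueError:
        return True                      # unreadable pid file: treat as stale
    command = command_of(pid)
    if command is None:
        return True                      # pid is not alive: stale pid file
    # The pid is alive. Refuse to launch only if it really is our daemon;
    # if another process has taken the pid, ignore it and launch.
    return expected_marker not in command


# Illustrative process table (made-up values).
table = {4242: "java -cp ... org.apache.spark.deploy.master.Master",
         5151: "vim notes.txt"}
print(should_launch("4242", table.get, "org.apache.spark.deploy"))  # False
print(should_launch("5151", table.get, "org.apache.spark.deploy"))  # True
```

Passing the lookup in as a parameter keeps the sketch independent of any particular way of inspecting processes.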
* [SQL] Fix docs of SQLContext.tables | Yin Huai | 2015-02-12 | 1 | -6/+6
| | | | | | | | | | | Author: Yin Huai <yhuai@databricks.com> Closes #4579 from yhuai/tablesDoc and squashes the following commits: 7f8964c [Yin Huai] Fix doc. (cherry picked from commit 2aea892ebd4d6c802defeef35ef7ebfe42c06eba) Signed-off-by: Cheng Lian <lian@databricks.com>
* [SPARK-3365][SQL]Wrong schema generated for List type | tianyi | 2015-02-12 | 2 | -15/+20
| This PR fixes SPARK-3365. The cause is that Spark generated the wrong schema for the type `List` in `ScalaReflection.scala`. For example, the generated schema for type `Seq[String]` is:
| ```
| {"name":"x","type":{"type":"array","elementType":"string","containsNull":true},"nullable":true,"metadata":{}}
| ```
| the generated schema for type `List[String]` is:
| ```
| {"name":"x","type":{"type":"struct","fields":[]},"nullable":true,"metadata":{}}
| ```
| Author: tianyi <tianyi.asiainfo@gmail.com>
| Closes #4581 from tianyi/SPARK-3365 and squashes the following commits:
| a097e86 [tianyi] change the order of resolution in ScalaReflection.scala
| (cherry picked from commit 1c8633f3fe9d814c83384e339b958740c250c00c)
| Signed-off-by: Cheng Lian <lian@databricks.com>
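Why the order of resolution matters can be shown with a small sketch. This is Python, not the Scala code in `ScalaReflection.scala`; the classes `Product`, `Seq`, and `ListOfStrings` and both functions are hypothetical stand-ins. It assumes that a list value satisfies both the sequence check and the struct-like check, so whichever check runs first decides the schema.

```python
class Product:            # stand-in for struct-like (case-class-like) types
    pass


class Seq:                # stand-in for sequence types
    pass


class ListOfStrings(Seq, Product):   # assumed: a list is both a Seq and a Product
    pass


def schema_product_first(tpe: type) -> dict:
    """Checks the struct case first, so a list is mistaken for a struct."""
    if issubclass(tpe, Product):
        return {"type": "struct", "fields": []}
    if issubclass(tpe, Seq):
        return {"type": "array", "elementType": "string", "containsNull": True}
    raise TypeError(f"unsupported type: {tpe.__name__}")


def schema_seq_first(tpe: type) -> dict:
    """Checks the sequence case first, so a list maps to an array."""
    if issubclass(tpe, Seq):
        return {"type": "array", "elementType": "string", "containsNull": True}
    if issubclass(tpe, Product):
        return {"type": "struct", "fields": []}
    raise TypeError(f"unsupported type: {tpe.__name__}")


print(schema_product_first(ListOfStrings)["type"])  # struct
print(schema_seq_first(ListOfStrings)["type"])      # array
```

The two functions differ only in the order of their checks, which mirrors the one-line summary of the squashed commit.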
* [SPARK-3299][SQL]Public API in SQLContext to list tables | Yin Huai | 2015-02-12 | 6 | -0/+265
| | | | | | | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-3299 Author: Yin Huai <yhuai@databricks.com> Closes #4547 from yhuai/tables and squashes the following commits: 6c8f92e [Yin Huai] Add tableNames. acbb281 [Yin Huai] Update Python test. 7793dcb [Yin Huai] Fix scala test. 572870d [Yin Huai] Address comments. aba2e88 [Yin Huai] Format. 12c86df [Yin Huai] Add tables() to SQLContext to return a DataFrame containing existing tables. (cherry picked from commit 1d0596a16e1d3add2631f5d8169aeec2876a1362) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SQL] Move SaveMode to SQL package. | Yin Huai | 2015-02-12 | 12 | -14/+9
| | | | | | | | | | | Author: Yin Huai <yhuai@databricks.com> Closes #4542 from yhuai/moveSaveMode and squashes the following commits: 65a4425 [Yin Huai] Move SaveMode to sql package. (cherry picked from commit c025a468826e9b9f62032e207daa9d42d9dba3ca) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-5335] Fix deletion of security groups within a VPC | Vladimir Grigor | 2015-02-12 | 1 | -3/+4
| | | | | | | | | | | | | | | | | | | Please see https://issues.apache.org/jira/browse/SPARK-5335. The fix itself is in commit e58a8b01a8bedcbfbbc6d04b1c1489255865cf87. The two earlier commits fix another VPC-related bug that is waiting to be merged. I should have created the former bug fix in its own branch; then this fix would not include the former fixes. :( This code is released under the project's license. Author: Vladimir Grigor <vladimir@kiosked.com> Author: Vladimir Grigor <vladimir@voukka.com> Closes #4122 from voukka/SPARK-5335_delete_sg_vpc and squashes the following commits: 090dca9 [Vladimir Grigor] fixes as per review: removed printing of group_id and added comment 730ec05 [Vladimir Grigor] fix for SPARK-5335: Destroying cluster in VPC with "--delete-groups" fails to remove security groups (cherry picked from commit ada993e954e2825c0fe13326fc23b0e1a567cd55) Signed-off-by: Sean Owen <sowen@cloudera.com>
* [SPARK-5755] [SQL] remove unnecessary Add | Daoyuan Wang | 2015-02-12 | 1 | -1/+1
| explain extended select +key from src;
|
| before:
| == Parsed Logical Plan ==
| 'Project [(0 + 'key) AS _c0#8]
|  'UnresolvedRelation [src], None
| == Analyzed Logical Plan ==
| Project [(0 + key#10) AS _c0#8]
|  MetastoreRelation test, src, None
| == Optimized Logical Plan ==
| Project [(0 + key#10) AS _c0#8]
|  MetastoreRelation test, src, None
| == Physical Plan ==
| Project [(0 + key#10) AS _c0#8]
|  HiveTableScan [key#10], (MetastoreRelation test, src, None), None
|
| after this patch:
| == Parsed Logical Plan ==
| 'Project ['key]
|  'UnresolvedRelation [src], None
| == Analyzed Logical Plan ==
| Project [key#10]
|  MetastoreRelation test, src, None
| == Optimized Logical Plan ==
| Project [key#10]
|  MetastoreRelation test, src, None
| == Physical Plan ==
| HiveTableScan [key#10], (MetastoreRelation test, src, None), None
|
| Author: Daoyuan Wang <daoyuan.wang@intel.com>
| Closes #4551 from adrian-wang/positive and squashes the following commits:
| 0821ae4 [Daoyuan Wang] remove unnecessary Add
| (cherry picked from commit d5fc51491808630d0328a5937dbf349e00de361f)
| Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-5573][SQL] Add explode to dataframes | Michael Armbrust | 2015-02-12 | 5 | -2/+119
| | | | | | | | | | | | | | | | | Author: Michael Armbrust <michael@databricks.com> Closes #4546 from marmbrus/explode and squashes the following commits: eefd33a [Michael Armbrust] whitespace a8d496c [Michael Armbrust] Merge remote-tracking branch 'apache/master' into explode 4af740e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explode dc86a5c [Michael Armbrust] simple version d633d01 [Michael Armbrust] add scala specific 950707a [Michael Armbrust] fix comments ba8854c [Michael Armbrust] [SPARK-5573][SQL] Add explode to dataframes (cherry picked from commit ee04a8b19be8330bfc48f470ef365622162c915f) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-5758][SQL] Use LongType as the default type for integers in JSON schema inference. | Yin Huai | 2015-02-12 | 3 | -13/+17
| | | | | | | | | | | | | Author: Yin Huai <yhuai@databricks.com> Closes #4544 from yhuai/jsonUseLongTypeByDefault and squashes the following commits: 6e2ffc2 [Yin Huai] Use LongType as the default type for integers in JSON schema inference. (cherry picked from commit c352ffbdb9112714c176a747edff6115e9369e58) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SPARK-5780] [PySpark] Mute the logging during unit tests | Davies Liu | 2015-02-12 | 1 | -1/+1
| | | | | | | | | | | | | | | There is a lot of logging coming from the driver and workers. It is noisy and alarming, and it contains many exceptions, so people are confused about whether the tests are failing or not. This PR mutes the logging during tests and only shows it if a test fails. Author: Davies Liu <davies@databricks.com> Closes #4572 from davies/mute and squashes the following commits: 1e9069c [Davies Liu] mute the logging during python tests (cherry picked from commit 0bf031582588723dd5a4ca42e6f9f36bc2da1a0b) Signed-off-by: Andrew Or <andrew@databricks.com>
* SPARK-5747: Fix wordsplitting bugs in make-distribution.sh | David Y. Ross | 2015-02-12 | 1 | -10/+10
| | | | | | | | | | | | | The `$MVN` command variable may contain spaces, so every reference to it must be wrapped in quotes. Author: David Y. Ross <dyross@gmail.com> Closes #4540 from dyross/dyr-fix-make-distribution2 and squashes the following commits: 5a41596 [David Y. Ross] SPARK-5747: Fix wordsplitting bugs in make-distribution.sh (cherry picked from commit 26c816e7388eaa336a59183029f86548f1cc279c) Signed-off-by: Andrew Or <andrew@databricks.com>
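The effect of word splitting on a command path that contains a space can be illustrated outside the shell. The sketch below uses Python's `shlex` module, which follows POSIX-style splitting rules, as a stand-in for what the shell does with an unquoted versus a quoted variable; the path `/opt/my tools/mvn` is a made-up example and this is not the code in `make-distribution.sh`.

```python
import shlex

mvn = "/opt/my tools/mvn"   # hypothetical Maven path containing a space

# Analogous to an unquoted reference: the path is split at the space, so the
# command that would be run is "/opt/my" with "tools/mvn" as an argument.
unquoted = shlex.split(f"{mvn} --version")

# Analogous to a quoted reference: the path stays a single word.
quoted = shlex.split(f"{shlex.quote(mvn)} --version")

print(unquoted)  # ['/opt/my', 'tools/mvn', '--version']
print(quoted)    # ['/opt/my tools/mvn', '--version']
```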
* [SPARK-5759][Yarn]ExecutorRunnable should catch YarnException while NMClient start contain... | lianhuiwang | 2015-02-12 | 1 | -2/+8
| | | | | | | | | | | | | | | | | | Sometimes, for various reasons, an exception is thrown while NMClient starts containers. For example, if spark_shuffle is not configured on some machines, it throws: java.lang.Error: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist. Because YarnAllocator uses a ThreadPoolExecutor to start containers, we cannot tell which container or hostname threw the exception. I think we should catch YarnException in ExecutorRunnable when starting a container, so that if an exception occurs we know the container id or hostname of the failed container. Author: lianhuiwang <lianhuiwang09@gmail.com> Closes #4554 from lianhuiwang/SPARK-5759 and squashes the following commits: caf5a99 [lianhuiwang] use SparkException to wrap exception c02140f [lianhuiwang] ExecutorRunnable should catch YarnException while NMClient start container (cherry picked from commit 947b8bd82ec0f4c45910e6d781df4661f56e4587) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-5760][SPARK-5761] Fix standalone rest protocol corner cases + revamp tests | Andrew Or | 2015-02-12 | 3 | -239/+589
| | | | | | | | | | | | | | | | | | | | | The changes are summarized in the commit message. Test or test-related code accounts for 90% of the lines changed. Author: Andrew Or <andrew@databricks.com> Closes #4557 from andrewor14/rest-tests and squashes the following commits: b4dc980 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest-tests b55e40f [Andrew Or] Add test for unknown fields cc96993 [Andrew Or] private[spark] -> private[rest] 578cf45 [Andrew Or] Clean up test code a little d82d971 [Andrew Or] v1 -> serverVersion ea48f65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest-tests 00999a8 [Andrew Or] Revamp tests + fix a few corner cases (cherry picked from commit 1d5663e92cdaaa3dabfa58fdd7aede7e4fa4ec63) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-5762] Fix shuffle write time for sort-based shuffle | Kay Ousterhout | 2015-02-12 | 1 | -0/+3
| | | | | | | | | | | | | | | | | mateiz, was excluding the time to write this final file from the shuffle write time intentional? Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #4559 from kayousterhout/SPARK-5762 and squashes the following commits: 5c6f3d9 [Kay Ousterhout] Use foreach 94e4237 [Kay Ousterhout] Removed open time metrics added inadvertently ace156c [Kay Ousterhout] Moved metrics to finally block d773276 [Kay Ousterhout] Use nano time 5a59906 [Kay Ousterhout] [SPARK-5762] Fix shuffle write time for sort-based shuffle (cherry picked from commit 47c73d410ab533c3196184d2b6004081e79daeaa) Signed-off-by: Andrew Or <andrew@databricks.com>
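The squashed commits mention using nano time and moving the metrics update into a finally block. A minimal sketch of that general pattern follows, in Python rather than the Scala of the patch; `ShuffleWriteMetrics`, `write_final_file`, and `write_fn` are hypothetical names chosen for illustration.

```python
import time


class ShuffleWriteMetrics:
    """Hypothetical holder for an accumulated write time in nanoseconds."""

    def __init__(self) -> None:
        self.write_time_ns = 0


def write_final_file(write_fn, metrics: ShuffleWriteMetrics) -> None:
    """Run write_fn and add its duration to the metrics, even if it raises."""
    start = time.perf_counter_ns()       # monotonic, nanosecond resolution
    try:
        write_fn()
    finally:
        # Recorded in `finally` so a failed write is still accounted for.
        metrics.write_time_ns += time.perf_counter_ns() - start


metrics = ShuffleWriteMetrics()
write_final_file(lambda: time.sleep(0.01), metrics)
print(metrics.write_time_ns)             # elapsed nanoseconds for the write
```

Accumulating with `+=` lets the same metrics object collect the time of several writes.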
* [SPARK-5765][Examples]Fixed word split problem in run-example and compute-classpath | Venkata Ramana Gollamudi | 2015-02-12 | 2 | -4/+4
| | | | | | | | | | | | | | | Author: Venkata Ramana G <ramana.gollamudihuawei.com> Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com> Closes #4561 from gvramana/word_split and squashes the following commits: 285c8d4 [Venkata Ramana Gollamudi] Fixed word split problem in run-example and compute-classpath (cherry picked from commit 629d0143eeb3c153dac9c65e7b556723c6b4bfc7) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SPARK-5645] Added local read bytes/time to task metrics | Kay Ousterhout | 2015-02-12 | 13 | -28/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ksakellis I stumbled on your JIRA for this yesterday; I know it's assigned to you but I'd already done this for my own uses a while ago so thought I could help save you the work of doing it! Hopefully this doesn't duplicate any work you've already done. Here's a screenshot of what the UI looks like: ![image](https://cloud.githubusercontent.com/assets/1108612/6135352/c03e7276-b11c-11e4-8f11-c6aefe1f35b9.png) Based on a discussion with pwendell, I put the data read remotely in as an additional metric rather than showing it in brackets as you'd suggested, Kostas. The assumption here is that the average user doesn't care about the differentiation between local / remote data, so it's better not to pollute the UI. I also added data about the local read time, which I've found very helpful for debugging, but I didn't put it in the UI because I think it's probably something not a ton of people will need to use. With this change, the total read time and total write time shown in the UI will be equal, fixing a long-term source of user confusion: ![image](https://cloud.githubusercontent.com/assets/1108612/6135399/25f14490-b11d-11e4-8086-20be5f4002e6.png) Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #4510 from kayousterhout/SPARK-5645 and squashes the following commits: 4a0182c [Kay Ousterhout] oops 5f5da1b [Kay Ousterhout] Small style fix 5da04cf [Kay Ousterhout] Addressed more comments from Kostas ba05149 [Kay Ousterhout] Remove parens a9dc685 [Kay Ousterhout] Kostas comment, test fix 33d2e2d [Kay Ousterhout] Merge remote-tracking branch 'upstream/master' into SPARK-5645 347e2cd [Kay Ousterhout] [SPARK-5645] Added local read bytes/time to task metrics (cherry picked from commit 893d6fd7049daf3c4d01eb6a960801cd064d5f73) Signed-off-by: Andrew Or <andrew@databricks.com>
* [SQL] Improve error messages | Michael Armbrust | 2015-02-12 | 14 | -103/+164
| | | | | | | | | | | | | | | | | | | | Author: Michael Armbrust <michael@databricks.com> Author: wangfei <wangfei1@huawei.com> Closes #4558 from marmbrus/errorMessages and squashes the following commits: 5e5ab50 [Michael Armbrust] Merge pull request #15 from scwf/errorMessages fa38881 [wangfei] fix for grouping__id f279a71 [wangfei] make right references for ScriptTransformation d29fbde [Michael Armbrust] extra case 1a797b4 [Michael Armbrust] comments d4e9015 [Michael Armbrust] add comment af9e668 [Michael Armbrust] no braces 34eb3a4 [Michael Armbrust] more work 6197cd5 [Michael Armbrust] [SQL] Better error messages for analysis failures (cherry picked from commit aa4ca8b873fd83e64e5faea6f7febcc830e30b02) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SQL][DOCS] Update sql documentation | Antonio Navarro Perez | 2015-02-12 | 1 | -84/+84
| | | | | | | | | | | | | | | Updated examples using the new api and added DataFrame concept Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com> Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits: 82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext. 8d5376a [Antonio Navarro Perez] fixed typo 8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation (cherry picked from commit 6a1be026cf37e4c8bf39133dfb4a73f7caedcc26) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-5757][MLLIB] replace SQL JSON usage in model import/export by json4s | Xiangrui Meng | 2015-02-12 | 15 | -127/+92
| | | | | | | | | | | | | This PR detaches MLlib model import/export code from SQL's JSON support, and hence unblocks #4544 . yhuai Author: Xiangrui Meng <meng@databricks.com> Closes #4555 from mengxr/SPARK-5757 and squashes the following commits: b0415e8 [Xiangrui Meng] replace SQL JSON usage by json4s (cherry picked from commit 99bd5006650bb15ec5465ffee1ebaca81354a3df) Signed-off-by: Xiangrui Meng <meng@databricks.com>
* [SPARK-5655] Don't chmod700 application files if running in YARN | Andrew Rowson | 2015-02-12 | 1 | -8/+3
| | | | | | | | | | | | | | | | [Was previously PR4507] As per SPARK-5655, recently committed code chmod 700s all application files created on the local fs by a spark executor. This is both unnecessary and broken on YARN, where files created in the nodemanager's working directory are already owned by the user running the job and the 'yarn' group. Group read permission is also needed for the auxiliary shuffle service to be able to read the files, as this is running as the 'yarn' user. Author: Andrew Rowson <github@growse.com> Closes #4509 from growse/master and squashes the following commits: 7ca993c [Andrew Rowson] Moved chmod700 functionality into Utils.getOrCreateLocalRootDirs f57ce6b [Andrew Rowson] [SPARK-5655] Don't chmod700 application files if running in a YARN container (cherry picked from commit 466b1f671b21f575d28f9c103f51765790914fe3) Signed-off-by: Sean Owen <sowen@cloudera.com>
* [SQL] Make dataframe more tolerant of being serialized | Michael Armbrust | 2015-02-11 | 4 | -4/+15
| | | | | | | | | | | | | | Eases use in the spark-shell. Author: Michael Armbrust <michael@databricks.com> Closes #4545 from marmbrus/serialization and squashes the following commits: 04748e6 [Michael Armbrust] @scala.annotation.varargs b36e219 [Michael Armbrust] moreFixes (cherry picked from commit a38e23c30fb5d12f8f46a119d91a0620036e6800) Signed-off-by: Michael Armbrust <michael@databricks.com>
* [SQL] Two DataFrame fixes. | Reynold Xin | 2015-02-11 | 5 | -57/+119
| | | | | | | | | | | | | | - Removed DataFrame.apply for projection & filtering since they are extremely confusing. - Added implicits for RDD[Int], RDD[Long], and RDD[String] Author: Reynold Xin <rxin@databricks.com> Closes #4543 from rxin/df-cleanup and squashes the following commits: 81ec915 [Reynold Xin] [SQL] More DataFrame fixes. (cherry picked from commit d931b01dcaaf009dcf68dcfe83428bd7f9e857cc) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-3688][SQL] More inline comments for LogicalPlan. | Reynold Xin | 2015-02-11 | 5 | -42/+115
| | | | | | | | | | | | | | As a follow-up to https://github.com/apache/spark/pull/4524 Author: Reynold Xin <rxin@databricks.com> Closes #4539 from rxin/SPARK-3688 and squashes the following commits: 5ac56c7 [Reynold Xin] exists da8eea4 [Reynold Xin] [SPARK-3688][SQL] More inline comments for LogicalPlan. (cherry picked from commit fa6bdc6e819f9338248b952ec578bcd791ddbf6d) Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-3688][SQL]LogicalPlan can't resolve column correctly | tianyi | 2015-02-11 | 7 | -18/+42
| This PR fixes the resolution problem described in https://issues.apache.org/jira/browse/SPARK-3688
| ```
| CREATE TABLE t1(x INT);
| CREATE TABLE t2(a STRUCT<x: INT>, k INT);
| SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k;
| ```
| Author: tianyi <tianyi.asiainfo@gmail.com>
| Closes #4524 from tianyi/SPARK-3688 and squashes the following commits:
| 237a256 [tianyi] resolve a name with table.column pattern first.
| (cherry picked from commit 44b2311d946981c8251cb7807d70c8e99db5bbed)
| Signed-off-by: Michael Armbrust <michael@databricks.com>
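In the query above, `a.x` is ambiguous: it could mean column `x` of the relation aliased `a` (table `t1`), or field `x` of the struct column `a` in `t2`. The squashed commit says a name is resolved with the table.column pattern first. A minimal Python sketch of that rule follows; the data layout and the `resolve` function are hypothetical and are not Spark's resolver.

```python
from typing import Dict, List, Optional, Tuple

# alias -> list of (column name, struct field names or None for non-structs)
Relations = Dict[str, List[Tuple[str, Optional[List[str]]]]]


def resolve(name: str, relations: Relations) -> Optional[str]:
    """Resolve a two-part name, trying alias.column before column.field."""
    first, second = name.split(".", 1)

    # 1. table.column: `first` is a relation alias, `second` one of its columns.
    for column, _fields in relations.get(first, []):
        if column == second:
            return f"column {second} of relation {first}"

    # 2. column.field: `first` is a struct column, `second` one of its fields.
    for alias, columns in relations.items():
        for column, fields in columns:
            if column == first and fields and second in fields:
                return f"field {second} of column {alias}.{first}"

    return None


# t1 is aliased as a and t2 as b, matching the query in the commit message.
relations: Relations = {
    "a": [("x", None)],
    "b": [("a", ["x"]), ("k", None)],
}
print(resolve("a.x", relations))  # column x of relation a
print(resolve("b.k", relations))  # column k of relation b
```

With only `t2` in scope, the same name would fall through to the second step and resolve to the struct field.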
* [SPARK-5454] More robust handling of self joins | Michael Armbrust | 2015-02-11 | 7 | -30/+40
| | | | | | | | | | | | | | | | Also I fix a bunch of bad output in test cases. Author: Michael Armbrust <michael@databricks.com> Closes #4520 from marmbrus/selfJoin and squashes the following commits: 4f4a85c [Michael Armbrust] comments 49c8e26 [Michael Armbrust] fix tests 6fc38de [Michael Armbrust] fix style 55d64b3 [Michael Armbrust] fix dataframe selfjoins (cherry picked from commit a60d2b70adff3a8fb3bdfac226b1d86fdb443da4) Signed-off-by: Michael Armbrust <michael@databricks.com>
* Remove outdated remark about take(n). | Daniel Darabos | 2015-02-11 | 1 | -1/+1
| | | | | | | | | | | | | | | Looking at the code, I believe this remark about `take(n)` computing partitions on the driver is no longer correct. Apologies if I'm wrong. This came up in http://stackoverflow.com/q/28436559/3318517. Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #4533 from darabos/patch-2 and squashes the following commits: cc80f3a [Daniel Darabos] Remove outdated remark about take(n). (cherry picked from commit 03bf704bf442ac7dd960795295b51957ce972491) Signed-off-by: Sean Owen <sowen@cloudera.com>
* [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Python DataFrame API remaining tasks | Davies Liu | 2015-02-11 | 5 | -50/+155
| 1. DataFrame.renameColumn
| 2. DataFrame.show() and _repr_
| 3. Use simpleString() rather than jsonValue in DataFrame.dtypes
| 4. createDataFrame from local Python data, including pandas.DataFrame
| Author: Davies Liu <davies@databricks.com>
| Closes #4528 from davies/df3 and squashes the following commits:
| 014acea [Davies Liu] fix typo
| 6ba526e [Davies Liu] fix tests
| 46f5f95 [Davies Liu] address comments
| 6cbc154 [Davies Liu] dataframe.show() and improve dtypes
| 6f94f25 [Davies Liu] create DataFrame from local Python data
| (cherry picked from commit b694eb9c2fefeaa33891d3e61f9bea369bc09984)
| Signed-off-by: Reynold Xin <rxin@databricks.com>
* [SPARK-5733] Error Link in Pagination of HistoryPage when showing Incomplete Applications | guliangliang | 2015-02-11 | 1 | -4/+7
| | | | | | | | | | | | | | | | | The pagination links of HistoryPage are wrong when showing incomplete applications. If "2" is clicked on the page "http://history-server:18080/?page=1&showIncomplete=true", it goes to "http://history-server:18080/?page=2" instead of "http://history-server:18080/?page=2&showIncomplete=true". Author: guliangliang <guliangliang@qiyi.com> Closes #4523 from marsishandsome/Spark5733 and squashes the following commits: 9d7b593 [guliangliang] [SPARK-5733] Error Link in Pagination of HistoryPage when showing Incomplete Applications (cherry picked from commit 1ac099e3e00ddb01af8e6e3a84c70f8363f04b5c) Signed-off-by: Sean Owen <sowen@cloudera.com>
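The pagination behaviour described for SPARK-5733 amounts to carrying the `showIncomplete` flag into every page link. A minimal Python sketch of that idea follows; `page_link` is a hypothetical helper, not the history server's code, and the host name is the one used in the commit message.

```python
from urllib.parse import urlencode


def page_link(base_url: str, page: int, show_incomplete: bool) -> str:
    """Build a pagination link that keeps the showIncomplete flag."""
    params = {"page": page}
    if show_incomplete:
        # Dropping this parameter is what sends the user back to the
        # completed-applications view when they click another page.
        params["showIncomplete"] = "true"
    return f"{base_url}/?{urlencode(params)}"


print(page_link("http://history-server:18080", 2, True))
# http://history-server:18080/?page=2&showIncomplete=true
print(page_link("http://history-server:18080", 2, False))
# http://history-server:18080/?page=2
```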
* SPARK-5727 [BUILD] Deprecate Debian packaging | Sean Owen | 2015-02-11 | 2 | -0/+20
| | | | | | | | | | | | | | This just adds a deprecation message. It's intended for backporting to branch 1.3 but can go in master too, to be followed by another PR that removes it for 1.4. Author: Sean Owen <sowen@cloudera.com> Closes #4516 from srowen/SPARK-5727.1 and squashes the following commits: d48989f [Sean Owen] Refer to Spark 1.4 6c1c8b3 [Sean Owen] Deprecate Debian packaging (cherry picked from commit bd0d6e0cc3a329c4a1c08451a6d8a9281a422958) Signed-off-by: Sean Owen <sowen@cloudera.com>
* SPARK-5728 [STREAMING] MQTTStreamSuite leaves behind ActiveMQ database files | Sean Owen | 2015-02-11 | 1 | -0/+1
| | | | | | | | | | | | | Use temp dir for ActiveMQ database Author: Sean Owen <sowen@cloudera.com> Closes #4517 from srowen/SPARK-5728 and squashes the following commits: 1d3aeb8 [Sean Owen] Use temp dir for ActiveMQ database (cherry picked from commit da89720bf4023392436e75b6ed5e10ed8588a132) Signed-off-by: Sean Owen <sowen@cloudera.com>
* [SPARK-4964] [Streaming] refactor createRDD to take leaders via map instead of array | cody koeninger | 2015-02-11 | 4 | -66/+287
| | | | | | | | | | | | | | | | | Author: cody koeninger <cody@koeninger.org> Closes #4511 from koeninger/kafkaRdd-leader-to-broker and squashes the following commits: f7151d4 [cody koeninger] [SPARK-4964] test refactoring 6f8680b [cody koeninger] [SPARK-4964] add test of the scala api for KafkaUtils.createRDD f81e016 [cody koeninger] [SPARK-4964] leave KafkaStreamSuite host and port as private 5173f3f [cody koeninger] [SPARK-4964] test the Java variations of createRDD e9cece4 [cody koeninger] [SPARK-4964] pass leaders as a map to ensure 1 leader per TopicPartition (cherry picked from commit 658687b25491047f30ee8558733d11e5a0572070) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
* Preparing development version 1.3.1-SNAPSHOT | Patrick Wendell | 2015-02-11 | 28 | -28/+28
|
* Preparing Spark release v1.3.0-snapshot1 | Patrick Wendell | 2015-02-11 | 28 | -28/+28
|
* Revert "Preparing Spark release v1.3.0-snapshot1" | Patrick Wendell | 2015-02-10 | 28 | -28/+28
| | | | This reverts commit 53068f56f40bf03b7fc52e5980fb7e205903fc8b.
* Revert "Preparing development version 1.3.1-SNAPSHOT" | Patrick Wendell | 2015-02-10 | 28 | -28/+28
| | | | This reverts commit ba12b793f1f4f432e71439e2a7ebacce74d9c472.
* HOTFIX: Adding Junit to Hive tests for Maven build | Patrick Wendell | 2015-02-10 | 1 | -0/+5
|
* Preparing development version 1.3.1-SNAPSHOT | Patrick Wendell | 2015-02-11 | 28 | -28/+28
|
* Preparing Spark release v1.3.0-snapshot1 | Patrick Wendell | 2015-02-11 | 28 | -28/+28
|
* HOTFIX: Java 6 compilation error in Spark SQL | Patrick Wendell | 2015-02-10 | 2 | -2/+2
|
* Revert "Preparing Spark release v1.3.0-snapshot1" | Patrick Wendell | 2015-02-10 | 28 | -28/+28
| | | | This reverts commit c2e4001030cfb881ff33d448fc0aeaf4f05dad0f.
* Revert "Preparing development version 1.3.1-SNAPSHOT" | Patrick Wendell | 2015-02-10 | 28 | -28/+28
| | | | This reverts commit db80d0fe21daa3202ff217cbefb999ce77c5aa9e.