aboutsummaryrefslogtreecommitdiff
path: root/external/flume
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-17807][CORE] split test-tags into test-JARRyan Williams2016-12-211-0/+12
| | | | | | | | | | Remove spark-tag's compile-scope dependency (and, indirectly, spark-core's compile-scope transitive-dependency) on scalatest by splitting test-oriented tags into spark-tags' test JAR. Alternative to #16303. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #16311 from ryan-williams/tt.
* [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOTReynold Xin2016-12-021-1/+1
| | | | | | | | | | | | ## What changes were proposed in this pull request? This patch bumps master branch version to 2.2.0-SNAPSHOT. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #16126 from rxin/SPARK-18695.
* [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genjavadoc compatibility ↵hyukjinkwon2016-11-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in Java API documentation ## What changes were proposed in this pull request? This PR make `sbt unidoc` complete with Java 8. This PR roughly includes several fixes as below: - Fix unrecognisable class and method links in javadoc by changing it from `[[..]]` to `` `...` `` ```diff - * A column that will be computed based on the data in a [[DataFrame]]. + * A column that will be computed based on the data in a `DataFrame`. ``` - Fix throws annotations so that they are recognisable in javadoc - Fix URL links to `<a href="http..."></a>`. ```diff - * [[http://en.wikipedia.org/wiki/Decision_tree_learning Decision tree]] model for regression. + * <a href="http://en.wikipedia.org/wiki/Decision_tree_learning"> + * Decision tree (Wikipedia)</a> model for regression. ``` ```diff - * see http://en.wikipedia.org/wiki/Receiver_operating_characteristic + * see <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic"> + * Receiver operating characteristic (Wikipedia)</a> ``` - Fix < to > to - `greater than`/`greater than or equal to` or `less than`/`less than or equal to` where applicable. - Wrap it with `{{{...}}}` to print them in javadoc or use `{code ...}` or `{literal ..}`. Please refer https://github.com/apache/spark/pull/16013#discussion_r89665558 - Fix `</p>` complaint ## How was this patch tested? Manually tested by `jekyll build` with Java 7 and 8 ``` java version "1.7.0_80" Java(TM) SE Runtime Environment (build 1.7.0_80-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode) ``` ``` java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) ``` Author: hyukjinkwon <gurwls223@gmail.com> Closes #16013 from HyukjinKwon/SPARK-3359-errors-more.
* [SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant ↵Xin Ren2016-07-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | definition and inherited from the parent https://issues.apache.org/jira/browse/SPARK-16535 ## What changes were proposed in this pull request? When I scan through the pom.xml of sub projects, I found this warning as below and attached screenshot ``` Definition of groupId is redundant, because it's inherited from the parent ``` ![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png) I've tried to remove some of the lines with groupId definition, and the build on my local machine is still ok. ``` <groupId>org.apache.spark</groupId> ``` As I just find now `<maven.version>3.3.9</maven.version>` is being used in Spark 2.x, and Maven-3 supports versionless parent elements: Maven 3 will remove the need to specify the parent version in sub modules. THIS is great (in Maven 3.1). ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762 ## How was this patch tested? I've tested by re-building the project, and build succeeded. Author: Xin Ren <iamshrek@126.com> Closes #14189 from keypointt/SPARK-16535.
* [SPARK-16477] Bump master version to 2.1.0-SNAPSHOTReynold Xin2016-07-111-1/+1
| | | | | | | | | | | | ## What changes were proposed in this pull request? After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #14130 from rxin/SPARK-16477.
* [MINOR] Fix Typos 'an -> a'Zheng RuiFeng2016-06-061-1/+1
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? `an -> a` Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one. ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #13515 from zhengruifeng/an_a.
* [SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into ↵Sean Owen2016-05-171-1/+1
| | | | | | | | | | | | | | | | | | spark-tags ## What changes were proposed in this pull request? (See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was the apparently problem last time.) Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags` ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #13074 from srowen/SPARK-15290.
* [SPARK-14936][BUILD][TESTS] FlumePollingStreamSuite is slowXin Ren2016-05-102-8/+22
| | | | | | | | | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-14936 ## What changes were proposed in this pull request? FlumePollingStreamSuite contains two tests which run for a minute each. This seems excessively slow and we should speed it up if possible. In this PR, instead of creating `StreamingContext` directly from `conf`, here an underlying `SparkContext` is created before all and it is used to create each`StreamingContext`. Running time is reduced by avoiding multiple `SparkContext` creations and destroys. ## How was this patch tested? Tested on my local machine running `testOnly *.FlumePollingStreamSuite` Author: Xin Ren <iamshrek@126.com> Closes #12845 from keypointt/SPARK-14936.
* Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in ↵Yin Huai2016-04-281-1/+1
| | | | | | spark-mllib-local" This reverts commit dae538a4d7c36191c1feb02ba87ffc624ab960dc.
* [SPARK-14613][ML] Add @Since into the matrix and vector classes in ↵Pravin Gadakh2016-04-281-1/+1
| | | | | | | | | | | | | | | | spark-mllib-local ## What changes were proposed in this pull request? This PR adds `since` tag into the matrix and vector classes in spark-mllib-local. ## How was this patch tested? Scala-style checks passed. Author: Pravin Gadakh <prgadakh@in.ibm.com> Closes #12416 from pravingadakh/SPARK-14613.
* [SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix ↵Dongjoon Hyun2016-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | lint-java errors ## What changes were proposed in this pull request? Spark uses `NewLineAtEofChecker` rule in Scala by ScalaStyle. And, most Java code also comply with the rule. This PR aims to enforce the same rule `NewlineAtEndOfFile` by CheckStyle explicitly. Also, this fixes lint-java errors since SPARK-14465. The followings are the items. - Adds a new line at the end of the files (19 files) - Fixes 25 lint-java errors (12 RedundantModifier, 6 **ArrayTypeStyle**, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder) ## How was this patch tested? After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins dose not run lint-java.) ```bash $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12632 from dongjoon-hyun/SPARK-14868.
* [MINOR][SQL] Remove extra anonymous closure within functional transformationshyukjinkwon2016-04-142-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR removes extra anonymous closure within functional transformations. For example, ```scala .map(item => { ... }) ``` which can be just simply as below: ```scala .map { item => ... } ``` ## How was this patch tested? Related unit tests and `sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12382 from HyukjinKwon/minor-extra-closers.
* [SPARK-14134][CORE] Change the package name used for shading classes.Marcelo Vanzin2016-04-061-1/+1
| | | | | | | | | | | | | | | The current package name uses a dash, which is a little weird but seemed to work. That is, until a new test tried to mock a class that references one of those shaded types, and then things started failing. Most changes are just noise to fix the logging configs. For reference, SPARK-8815 also raised this issue, although at the time it did not cause any issues in Spark, so it was not addressed. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11941 from vanzin/SPARK-14134.
* [SPARK-14444][BUILD] Add a new scalastyle `NoScalaDoc` to prevent ↵Dongjoon Hyun2016-04-061-2/+4
| | | | | | | | | | | | | | | | | | | | | ScalaDoc-style multiline comments ## What changes were proposed in this pull request? According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Indentation), this PR adds a new scalastyle rule to prevent the followings. ``` /** In Spark, we don't use the ScalaDoc style so this * is not correct. */ ``` ## How was this patch tested? Pass the Jenkins tests (including `lint-scala`). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12221 from dongjoon-hyun/SPARK-14444.
* [SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static ↵Dongjoon Hyun2016-04-031-5/+6
| | | | | | | | | | | | | | | | | | | | | analysis results ## What changes were proposed in this pull request? This PR contains the following 5 types of maintenance fix over 59 files (+94 lines, -93 lines). - Fix typos(exception/log strings, testcase name, comments) in 44 lines. - Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011) - Use diamond operators in 40 lines. (New codes after SPARK-13702) - Fix redundant semicolon in 5 lines. - Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala. ## How was this patch tested? Manual and pass the Jenkins tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12139 from dongjoon-hyun/SPARK-14355.
* [MINOR][DOCS] Use multi-line JavaDoc comments in Scala code.Dongjoon Hyun2016-04-021-7/+8
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR aims to fix all Scala-Style multiline comments into Java-Style multiline comments in Scala codes. (All comment-only changes over 77 files: +786 lines, −747 lines) ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.
* [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to SparkShixiong Zhu2016-03-2517-0/+1758
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR moves flume back to Spark as per the discussion in the dev mail-list. ## How was this patch tested? Existing Jenkins tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11895 from zsxwing/move-flume-back.
* [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, ↵Shixiong Zhu2016-03-1417-1756/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | streaming-zeromq, streaming-akka, streaming-twitter to Spark packages ## What changes were proposed in this pull request? Currently there are a few sub-projects, each for integrating with different external sources for Streaming. Now that we have better ability to include external libraries (spark packages) and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages - streaming-flume - streaming-akka - streaming-mqtt - streaming-zeromq - streaming-twitter They are just some ancillary packages and considering the overhead of maintenance, running tests and PR failures, it's better to maintain them out of Spark. In addition, these projects can have their different release cycles and we can release them faster. I have already copied these projects to https://github.com/spark-packages ## How was this patch tested? Jenkins tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11672 from zsxwing/remove-external-pkg.
* [MINOR][DOCS] Fix more typos in comments/strings.Dongjoon Hyun2016-03-141-3/+3
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes 135 typos over 107 files: * 121 typos in comments * 11 typos in testcase name * 3 typos in log messages ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11689 from dongjoon-hyun/fix_more_typos.
* [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> ↵Sean Owen2016-03-132-4/+5
| | | | | | | | | | | | | | | | | | | | byte[] conversions (and remaining Coverity items) ## What changes were proposed in this pull request? - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8 - Same for `InputStreamReader` and `OutputStreamWriter` constructors - Standardizes on UTF-8 everywhere - Standardizes specifying the encoding with `StandardCharsets.UTF-8`, not the Guava constant or "UTF-8" (which means handling `UnuspportedEncodingException`) - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c ) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11657 from srowen/SPARK-13823.
* [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle ruleDongjoon Hyun2016-03-031-1/+0
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time. This issue aims remove unused imports from Java/Scala code and add `UnusedImports` checkstyle rule to help developers. ## How was this patch tested? ``` ./dev/lint-java ./build/sbt compile ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11438 from dongjoon-hyun/SPARK-13583.
* [SPARK-13165][STREAMING] Replace deprecated synchronizedBuffer in streamingHolden Karau2016-02-093-17/+17
| | | | | | | | | | | | Building with Scala 2.11 results in the warning trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative - we already use ConcurrentLinkedQueue elsewhere so lets replace it. Some notes about how behaviour is different for reviewers: The Seq from a SynchronizedBuffer that was implicitly converted would continue to receive updates - however when we do the same conversion explicitly on the ConcurrentLinkedQueue this isn't the case. Hence changing some of the (internal & test) APIs to pass an Iterable. toSeq is safe to use if there are no more updates. Author: Holden Karau <holden@us.ibm.com> Author: tedyu <yuzhihong@gmail.com> Closes #11067 from holdenk/SPARK-13165-replace-deprecated-synchronizedBuffer-in-streaming.
* [SPARK-6363][BUILD] Make Scala 2.11 the default Scala versionJosh Rosen2016-01-301-2/+2
| | | | | | | | | | | | This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds). The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance). After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break. Author: Josh Rosen <joshrosen@databricks.com> Closes #10608 from JoshRosen/SPARK-6363.
* [SPARK-12692][BUILD][STREAMING] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-111-9/+9
| | | | | | | | | | | before "," or ":") Fix the style violation (space before , and :). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10685 from sarutak/SPARK-12692-followup-streaming.
* [SPARK-3873][TESTS] Import ordering fixes.Marcelo Vanzin2016-01-052-5/+5
| | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10582 from vanzin/SPARK-3873-tests.
* [SPARK-3873][STREAMING] Import order fixes for streaming.Marcelo Vanzin2015-12-316-19/+16
| | | | | | | | Also included a few miscelaneous other modules that had very few violations. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10532 from vanzin/SPARK-3873-streaming.
* Bump master version to 2.0.0-SNAPSHOT.Reynold Xin2015-12-191-1/+1
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.
* [SPARK-12084][CORE] Fix codes that uses ByteBuffer.array incorrectlyShixiong Zhu2015-12-043-7/+7
| | | | | | | | | | `ByteBuffer` doesn't guarantee all contents in `ByteBuffer.array` are valid. E.g, a ByteBuffer returned by `ByteBuffer.slice`. We should not use the whole content of `ByteBuffer` unless we know that's correct. This patch fixed all places that use `ByteBuffer.array` incorrectly. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10083 from zsxwing/bytebuffer-array.
* [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot ↵Shixiong Zhu2015-11-134-8/+10
| | | | | | | | | | report failures This PR just checks the test results and returns 1 if the test fails, so that `run-tests.py` can mark it fail. Author: Shixiong Zhu <shixiong@databricks.com> Closes #9669 from zsxwing/streaming-python-tests.
* [SPARK-11361][STREAMING] Show scopes of RDD operations inside ↵Tathagata Das2015-11-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DStream.foreachRDD and DStream.transform in DAG viz Currently, when a DStream sets the scope for RDD generated by it, that scope is not allowed to be overridden by the RDD operations. So in case of `DStream.foreachRDD`, all the RDDs generated inside the foreachRDD get the same scope - `foreachRDD <time>`, as set by the `ForeachDStream`. So it is hard to debug generated RDDs in the RDD DAG viz in the Spark UI. This patch allows the RDD operations inside `DStream.transform` and `DStream.foreachRDD` to append their own scopes to the earlier DStream scope. I have also slightly tweaked how callsites are set such that the short callsite reflects the RDD operation name and line number. This tweak is necessary as callsites are not managed through scopes (which support nesting and overriding) and I didnt want to add another local property to control nesting and overriding of callsites. ## Before: ![image](https://cloud.githubusercontent.com/assets/663212/10808548/fa71c0c4-7da9-11e5-9af0-5737793a146f.png) ## After: ![image](https://cloud.githubusercontent.com/assets/663212/10808659/37bc45b6-7dab-11e5-8041-c20be6a9bc26.png) The code that was used to generate this is: ``` val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER) val words = lines.flatMap(_.split(" ")) val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _) wordCounts.foreachRDD { rdd => val temp = rdd.map { _ -> 1 }.reduceByKey( _ + _) val temp2 = temp.map { _ -> 1}.reduceByKey(_ + _) val count = temp2.count println(count) } ``` Note - The inner scopes of the RDD operations map/reduceByKey inside foreachRDD is visible - The short callsites of stages refers to the line number of the RDD ops rather than the same line number of foreachRDD in all three cases. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #9315 from tdas/SPARK-11361.
* [SPARK-11019] [STREAMING] [FLUME] Gracefully shutdown Flume receiver th…Hari Shreedharan2015-10-081-2/+6
| | | | | | | | | | …reads. Wait for a minute for the receiver threads to shutdown before interrupting them. Author: Hari Shreedharan <hshreedharan@apache.org> Closes #9041 from harishreedharan/flume-graceful-shutdown.
* [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py.Marcelo Vanzin2015-10-071-8/+2
| | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8775 from vanzin/SPARK-10300.
* Revert "[SPARK-10300] [BUILD] [TESTS] Add support for test tags in ↵Marcelo Vanzin2015-09-151-0/+10
| | | | | | run-tests.py." This reverts commit 8abef21dac1a6538c4e4e0140323b83d804d602b.
* [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py.Marcelo Vanzin2015-09-151-10/+0
| | | | | | | | | | | | | | | This change does two things: - tag a few tests and adds the mechanism in the build to be able to disable those tags, both in maven and sbt, for both junit and scalatest suites. - add some logic to run-tests.py to disable some tags depending on what files have changed; that's used to disable expensive tests when a module hasn't explicitly been changed, to speed up testing for changes that don't directly affect those modules. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8437 from vanzin/test-tags.
* Update version to 1.6.0-SNAPSHOT.Reynold Xin2015-09-151-1/+1
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #8350 from rxin/1.6.
* [SPARK-10227] fatal warnings with sbt on Scala 2.11Luc Bourlier2015-09-092-2/+2
| | | | | | | | | | | The bulk of the changes are on `transient` annotation on class parameter. Often the compiler doesn't generate a field for this parameters, so the the transient annotation would be unnecessary. But if the class parameter are used in methods, then fields are created. So it is safer to keep the annotations. The remainder are some potential bugs, and deprecated syntax. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #8433 from skyluc/issue/sbt-2.11.
* [SPARK-9613] [CORE] Ban use of JavaConversions and migrate all existing uses ↵Sean Owen2015-08-259-35/+29
| | | | | | | | | | | | to JavaConverters Replace `JavaConversions` implicits with `JavaConverters` Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I see, yet. Author: Sean Owen <sowen@cloudera.com> Closes #8033 from srowen/SPARK-9613.
* [SPARK-9791] [PACKAGE] Change private class to private class to prevent ↵Tathagata Das2015-08-241-1/+1
| | | | | | | | | | | | unnecessary classes from showing up in the docs In addition, some random cleanup of import ordering Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8387 from tdas/SPARK-9791 and squashes the following commits: 67f3ee9 [Tathagata Das] Change private class to private[package] class to prevent them from showing up in the docs
* [SPARK-8378] [STREAMING] Add the Python API for Flumezsxwing2015-07-015-224/+456
| | | | | | | | | | | | | | | | | | | | | | | Author: zsxwing <zsxwing@gmail.com> Closes #6830 from zsxwing/flume-python and squashes the following commits: 78dfdac [zsxwing] Fix the compile error in the test code f1bf3c0 [zsxwing] Address TD's comments 0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly e93736b [zsxwing] Fix the test case for determine_modules_to_test 9d5821e [zsxwing] Fix pyspark_core dependencies f9ee681 [zsxwing] Merge branch 'master' into flume-python 7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py b96b0de [zsxwing] Merge branch 'master' into flume-python ce85e83 [zsxwing] Fix incompatible issues for Python 3 01cbb3d [zsxwing] Add import sys 152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3 14ba0ff [zsxwing] Add flume-assembly for sbt building b8d5551 [zsxwing] Merge branch 'master' into flume-python 4762c34 [zsxwing] Fix the doc 0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API 9f33873 [zsxwing] Add the Python API for Flume
* [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0Patrick Wendell2015-06-031-1/+1
| | | | | | | | | | | | | Author: Patrick Wendell <patrick@databricks.com> Closes #6328 from pwendell/spark-1.5-update and squashes the following commits: 2f42d02 [Patrick Wendell] A few more excludes 4bebcf0 [Patrick Wendell] Update to RC4 61aaf46 [Patrick Wendell] Using new release candidate 55f1610 [Patrick Wendell] Another exclude 04b4f04 [Patrick Wendell] More issues with transient 1.4 changes 36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0
* [SPARK-3850] Trim trailing spaces for examples/streaming/yarn.Reynold Xin2015-05-313-7/+7
| | | | | | | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #6530 from rxin/trim-whitespace-1 and squashes the following commits: 7b7b3a0 [Reynold Xin] Reset again. dc14597 [Reynold Xin] Reset scalastyle. cd556c4 [Reynold Xin] YARN, Kinesis, Flume. 4223fe1 [Reynold Xin] [SPARK-3850] Trim trailing spaces for examples/streaming.
* [SPARK-7558] Demarcate tests in unit-tests.logAndrew Or2015-05-293-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now `unit-tests.log` are not of much value because we can't tell where the test boundaries are easily. This patch adds log statements before and after each test to outline the test boundaries, e.g.: ``` ===== TEST OUTPUT FOR o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' ===== 15/05/27 12:36:39.596 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO SparkContext: Starting job: count at KryoSerializerSuite.scala:230 15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Got job 3 (count at KryoSerializerSuite.scala:230) with 4 output partitions (allowLocal=false) 15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Final stage: ResultStage 3(count at KryoSerializerSuite.scala:230) 15/05/27 12:36:39.596 dag-scheduler-event-loop INFO DAGScheduler: Parents of final stage: List() 15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Missing parents: List() 15/05/27 12:36:39.597 dag-scheduler-event-loop INFO DAGScheduler: Submitting ResultStage 3 (ParallelCollectionRDD[5] at parallelize at KryoSerializerSuite.scala:230), which has no missing parents ... 15/05/27 12:36:39.624 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO DAGScheduler: Job 3 finished: count at KryoSerializerSuite.scala:230, took 0.028563 s 15/05/27 12:36:39.625 pool-1-thread-1-ScalaTest-running-KryoSerializerSuite INFO KryoSerializerSuite: ***** FINISHED o.a.s.serializer.KryoSerializerSuite: 'kryo with parallelize for primitive arrays' ***** ... ``` Author: Andrew Or <andrew@databricks.com> Closes #6441 from andrewor14/demarcate-tests and squashes the following commits: 879b060 [Andrew Or] Fix compile after rebase d622af7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests 017c8ba [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests 7790b6c [Andrew Or] Fix tests after logical merge conflict c7460c0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests c43ffc4 [Andrew Or] Fix tests? 8882581 [Andrew Or] Fix tests ee22cda [Andrew Or] Fix log message fa9450e [Andrew Or] Merge branch 'master' of github.com:apache/spark into demarcate-tests 12d1e1b [Andrew Or] Various whitespace changes (minor) 69cbb24 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite bbce12e [Andrew Or] Fix manual things that cannot be covered through automation da0b12f [Andrew Or] Add core tests as dependencies in all modules f7d29ce [Andrew Or] Introduce base abstract class for all test suites
* [SPARK-7929] Turn whitespace checker on for more token types.Reynold Xin2015-05-281-1/+1
| | | | | | | | | | | | | | | | | | | This is the last batch of changes to complete SPARK-7929. Previous related PRs: https://github.com/apache/spark/pull/6480 https://github.com/apache/spark/pull/6478 https://github.com/apache/spark/pull/6477 https://github.com/apache/spark/pull/6476 https://github.com/apache/spark/pull/6475 https://github.com/apache/spark/pull/6474 https://github.com/apache/spark/pull/6473 Author: Reynold Xin <rxin@databricks.com> Closes #6487 from rxin/whitespace-lint and squashes the following commits: b33d43d [Reynold Xin] [SPARK-7929] Turn whitespace checker on for more token types.
* [SPARK-7356] [STREAMING] Fix flakey tests in FlumePollingStreamSuite using ↵Hari Shreedharan2015-05-131-59/+51
| | | | | | | | | | | | | SparkSink's batch CountDownLatch. This is meant to make the FlumePollingStreamSuite deterministic. Now we basically count the number of batches that have been completed - and then verify the results rather than sleeping for random periods of time. Author: Hari Shreedharan <hshreedharan@apache.org> Closes #5918 from harishreedharan/flume-test-fix and squashes the following commits: 93f24f3 [Hari Shreedharan] Add an eventually block to ensure that all received data is processed. Refactor the dstream creation and remove redundant code. 1108804 [Hari Shreedharan] [SPARK-7356][STREAMING] Fix flakey tests in FlumePollingStreamSuite using SparkSink's batch CountDownLatch.
* [HOTFIX] [TEST] Ignoring flaky testsTathagata Das2015-05-051-2/+2
| | | | | | | | | | | | | | | | | org.apache.spark.DriverSuite.driver should exit after finishing without cleanup (SPARK-530) https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2267/ org.apache.spark.deploy.SparkSubmitSuite.includes jars passed in through --jars https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2271/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/testReport/ org.apache.spark.streaming.flume.FlumePollingStreamSuite.flume polling test https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2269/ Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #5901 from tdas/ignore-flaky-tests and squashes the following commits: 9cd8667 [Tathagata Das] Ignoring tests.
* [SPARK-6758]block the right jetty package in logWangTaoTheTonic2015-04-091-1/+1
| | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-6758 I am not sure if it is ok to block them in test resources too (as we shade jetty in assembly?). Author: WangTaoTheTonic <wangtao111@huawei.com> Closes #5406 from WangTaoTheTonic/SPARK-6758 and squashes the following commits: e09605b [WangTaoTheTonic] block the right jetty package
* [SPARK-6765] Fix test code style for streaming.Reynold Xin2015-04-082-16/+17
| | | | | | | | | | So we can turn style checker on for test code. Author: Reynold Xin <rxin@databricks.com> Closes #5409 from rxin/test-style-streaming and squashes the following commits: 7aea69b [Reynold Xin] [SPARK-6765] Fix test code style for streaming.
* [SPARK-6428] Turn on explicit type checking for public methods.Reynold Xin2015-04-031-7/+5
| | | | | | | | | | | | | | This builds on my earlier pull requests and turns on the explicit type checking in scalastyle. Author: Reynold Xin <rxin@databricks.com> Closes #5342 from rxin/SPARK-6428 and squashes the following commits: 7b531ab [Reynold Xin] import ordering 2d9a8a5 [Reynold Xin] jl e668b1c [Reynold Xin] override 9b9e119 [Reynold Xin] Parenthesis. 82e0cf5 [Reynold Xin] [SPARK-6428] Turn on explicit type checking for public methods.
* [SPARK-5559] [Streaming] [Test] Remove oppotunity we met flakiness when ↵Kousuke Saruta2015-03-241-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | running FlumeStreamSuite When we run FlumeStreamSuite on Jenkins, sometimes we get error like as follows. sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 52 times over 10.094849836 seconds. Last failure message: Error connecting to localhost/127.0.0.1:23456. at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438) at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478) at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307) at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478) at org.apache.spark.streaming.flume.FlumeStreamSuite.writeAndVerify(FlumeStreamSuite.scala:116) at org.apache.spark.streaming.flume.FlumeStreamSuite.org$apache$spark$streaming$flume$FlumeStreamSuite$$testFlumeStream(FlumeStreamSuite.scala:74) at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply$mcV$sp(FlumeStreamSuite.scala:66) at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66) at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) This error is caused by check-then-act logic when it find free-port . /** Find a free port */ private def findFreePort(): Int = { Utils.startServiceOnPort(23456, (trialPort: Int) => { val socket = new ServerSocket(trialPort) socket.close() (null, trialPort) }, conf)._2 } Removing the check-then-act is not easy but we can reduce the chance of having the error by choosing random value for initial port instead of 23456. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4337 from sarutak/SPARK-5559 and squashes the following commits: 16f109f [Kousuke Saruta] Added `require` to Utils#startServiceOnPort c39d8b6 [Kousuke Saruta] Merge branch 'SPARK-5559' of github.com:sarutak/spark into SPARK-5559 1610ba2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559 33357e3 [Kousuke Saruta] Changed "findFreePort" method in MQTTStreamSuite and FlumeStreamSuite so that it can choose valid random port a9029fe [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559 9489ef9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559 8212e42 [Kousuke Saruta] Modified default port used in FlumeStreamSuite from 23456 to random value
* [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.Marcelo Vanzin2015-03-201-1/+1
| | | | | | | | | | | | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #5056 from vanzin/SPARK-6371 and squashes the following commits: 63220df [Marcelo Vanzin] Merge branch 'master' into SPARK-6371 6506f75 [Marcelo Vanzin] Use more fine-grained exclusion. 178ba71 [Marcelo Vanzin] Oops. 75b2375 [Marcelo Vanzin] Exclude VertexRDD in MiMA. a45a62c [Marcelo Vanzin] Work around MIMA warning. 1d8a670 [Marcelo Vanzin] Re-group jetty exclusion. 0e8e909 [Marcelo Vanzin] Ignore ml, don't ignore graphx. cef4603 [Marcelo Vanzin] Indentation. 296cf82 [Marcelo Vanzin] [SPARK-6371] [build] Update version to 1.4.0-SNAPSHOT.