| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
| |
code examples.
|
|\
| |
| |
| |
| | |
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala
|
| | |
|
| |
| |
| |
| |
| |
| | |
This programatically sets the log level to WARN by default for streaming
tests. If the user has already specified a log4j.properties file,
the user's file will take precedence over this default.
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc.
At a high level, these are the following changes.
1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of root and streaming projects/modules.
2. To avail the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._` . For Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`.
3. Each external project has its own scala and java unit tests. Note the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in the SparkBuild.scala . In the streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see comment inside the pom.xml for more information).
4. Jars of the external projects have been added to examples project but not to the assembly project.
5. In some files, imports have been rearrange to conform to the Spark coding guidelines.
|
| | |
| | |
| | |
| | | |
for creating XYZ streams.
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Conflicts:
examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
|
| | | |
| | | |
| | | |
| | | | |
package. Also fixed packages of Flume and MQTT tests.
|
| | | |
| | | |
| | | |
| | | | |
their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects.
|
| | | | |
|
| |\ \ \
| | | |/
| | |/| |
|
| | | | |
|
| | |/ |
|
| |/
| |
| |
| | |
encapsulation and in some cases performance
|
| | |
|
|/
|
|
| |
JavaStreamingContext.getOrCreate.
|
|\ |
|
| | |
|
|/ |
|
| |
|
|\
| |
| | |
Refactor SGD options into a new class.
|
| |
| |
| |
| | |
Also remove java-specific constructor for LabeledPoint.
|
| |\
| | |
| | |
| | |
| | | |
Conflicts:
mllib/src/main/scala/spark/mllib/util/MLUtils.scala
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This change adds Java examples and unit tests for all GLM algorithms
to make sure the MLLib interface works from Java. Changes include
- Introduce LabeledPoint and avoid using Doubles in train arguments
- Rename train to run in class methods
- Make the optimizer a member variable of GLM to make sure the builder
pattern works
|
| | | |
|
|\ \ \
| | | |
| | | | |
Java fixes, tests and examples for ALS, KMeans
|
| | | |
| | | |
| | | |
| | | |
| | | | |
The scala constructor works for native type java types. Modify examples
to match this.
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
- Changes ALS to accept RDD[Rating] instead of (Int, Int, Double) making it
easier to call from Java
- Renames class methods from `train` to `run` to enable static methods to be
called from Java.
- Add unit tests which check if both static / class methods can be called.
- Also add examples which port the main() function in ALS, KMeans to the
examples project.
Couple of minor changes to existing code:
- Add a toJavaRDD method in RDD to convert scala RDD to java RDD easily
- Workaround a bug where using double[] from Java leads to class cast exception in
KMeans init
|
|/ / |
|
| | |
|
| | |
|
| | |
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
networkStream as a way to create streams from arbitrary network receiver.
|
|
|
|
| |
and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down.
|
|
|
|
|
|
|
| |
- Added a StorageLevels class for easy access to StorageLevel constants
in Java
- Added doc comments on Function classes in Java
- Updated Accumulator and HadoopWriter docs slightly
|
| |
|
| |
|
|
|
|
|
|
| |
- Add override keywords.
- Cache RDDs and counts in TC example.
- Clean up JavaRDDLike's abstract methods.
|