| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
| |
remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
|
|\
| |
| |
| | |
Rename VertexID -> VertexId in GraphX
|
| | |
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fixed the flaky tests by making SparkConf not serializable
SparkConf was being serialized with CoGroupedRDD and Aggregator, which somehow caused OptionalJavaException while being deserialized as part of a ShuffleMapTask. SparkConf should not even be serializable (according to conversation with Matei). This change fixes that.
@mateiz @pwendell
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | | |
in log4j.properties of external modules.
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fixed SVDPlusPlusSuite in Maven build.
This should go into 0.9.0 also.
|
| | |/ /
| |/| | |
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Additional edits for clarity in the graphx programming guide.
Added an overview of the Graph and GraphOps functions and fixed numerous typos.
|
| | | | | |
|
|\ \ \ \ \
| |_|/ / /
|/| | | /
| | |_|/
| |/| | |
Describe caching and uncaching in GraphX programming guide
|
|/ / / |
|
|\ \ \
| | | |
| | | |
| | | | |
Don't clone records for text files
|
| | | | |
|
| | | | |
|
|\ \ \ \
| | | | |
| | | | |
| | | | | |
Add GraphX dependency to examples/pom.xml
|
| | |/ /
| |/| | |
|
|\ \ \ \
| | | | |
| | | | |
| | | | | |
Deprecate rather than remove old combineValuesByKey function
|
| | | | | |
|
| |/ / / |
|
|\ \ \ \
| |/ / /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
API doc update & make Broadcast public
In #413 Broadcast was mistakenly made private[spark]. I changed it to public again. Also exposing id in public given the R frontend requires that.
Copied some of the documentation from the programming guide to API Doc for Broadcast and Accumulator.
This should be cherry picked into branch-0.9 as well for 0.9.0 release.
|
| | | | |
|
| | | |
| | | |
| | | |
| | | | |
java.io.Serializable for Broadcast and Accumulator.
|
| | | | |
|
| | | | |
|
| | | | |
|
|/ / /
| | |
| | |
| | |
| | | |
Note that previously Broadcast class was accidentally marked as private[spark]. It needs to be public
for broadcast variables to work. Also exposing the broadcast varaible id.
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | | |
Improving the graphx-programming-guide
This PR will track a few minor improvements to the content and formatting of the graphx-programming-guide.
|
|/ / / |
|
|\ \ \
| | | |
| | | |
| | | | |
Add missing header files
|
|/ / / |
|
|\ \ \
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Removed unnecessary DStream operations and updated docs
Removed StreamingContext.registerInputStream and registerOutputStream - they were useless. InputDStream has been made to register itself, and just registering a DStream as output stream cause RDD objects to be created but the RDDs will not be computed at all.. Also made DStream.register() private[streaming] for the same reasons.
Updated docs, specially added package documentation for streaming package.
Also, changed NetworkWordCount's input storage level to use MEMORY_ONLY, replication on the local machine causes warning messages (as replication fails) which is scary for a new user trying out his/her first example.
|
| | | |
|
| |\ \
| | | |
| | | |
| | | |
| | | | |
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
|
| | | |
| | | |
| | | |
| | | | |
were useless as InputDStream has been made to register itself. Also made DStream.register() private[streaming] - not useful to expose the confusing function. Updated a lot of documentation.
|
| |\ \ \ |
|
|\ \ \ \ \
| |_|_|/ /
|/| | | |
| | | | | |
Enable compression by default for spills
|
|/ / / / |
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add Naive Bayes to Python MLlib, and some API fixes
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
|
| | | | | |
|
| | | | | |
|
| | | | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This helps in case the exception happened while serializing a record to
be sent to Java, leaving the stream to Java in an inconsistent state
where PythonRDD won't be able to read the error.
|
| | | | |
| | | | |
| | | | |
| | | | | |
Also fixes mains of a few other algorithms to print the final model
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We've used camel case in other Spark methods so it felt reasonable to
keep using it here and make the code match Scala/Java as much as
possible. Note that parameter names matter in Python because it allows
passing optional parameters by name.
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
GraphX: Unifying Graphs and Tables
GraphX extends Spark's distributed fault-tolerant collections API and interactive console with a new graph API which leverages recent advances in graph systems (e.g., [GraphLab](http://graphlab.org)) to enable users to easily and interactively build, transform, and reason about graph structured data at scale. See http://amplab.github.io/graphx/.
Thanks to @jegonzal, @rxin, @ankurdave, @dcrankshaw, @jianpingjwang, @amatsukawa, @kellrott, and @adamnovak.
Tasks left:
- [x] Graph-level uncache
- [x] Uncache previous iterations in Pregel
- [x] ~~Uncache previous iterations in GraphLab~~ (postponed to post-release)
- [x] - Describe GC issue with GraphLab
- [ ] Write `docs/graphx-programming-guide.md`
- [x] - Mention future Bagel support in docs
- [ ] - Section on caching/uncaching in docs: As with Spark, cache something that is used more than once. In an iterative algorithm, try to cache and force (i.e., materialize) something every iteration, then uncache the cached things that depended on the newly materialized RDD but that won't be referenced again.
- [x] Undo modifications to core collections and instead copy them to org.apache.spark.graphx
- [x] Make Graph serializable to work around capture in Spark shell
- [x] Rename graph -> graphx in package name and subproject
- [x] Remove standalone PageRank
- [x] ~~Fix amplab/graphx#52 by checking `iter.hasNext`~~
|
| | | | | | |
|
| | | | | | |
|
| | | | | | |
|