| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add Naive Bayes to Python MLlib, and some API fixes
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
|
| |
| |
| |
| | |
Also fixes mains of a few other algorithms to print the final model
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other
algorithms better and in particular make it easier to call from Java
(added builder pattern, removed default value in train method)
- Updated Python MLlib functions to not require a SparkContext; we can
get that from the RDD the user gives
- Added a toString method in LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before
they could only be run individually through each file)
|
|\| |
|
| | |
|
|/
|
|
|
|
| |
-) Only change simple return statements at the end of method
-) Ignore the complex if-else check
-) Ignore the ones inside synchronized
|
|\ |
|
| | |
|
| | |
|
| |\
| | |
| | |
| | |
| | |
| | | |
standard Naive Bayes classifier
Has implemented the standard Naive Bayes classifier. This is an updated version of #288, which is closed because of misoperations.
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* Arguments renamed according to Ameet's suggestion
* Using DoubleMatrix instead of Array[Double] in computation
* Removed arguments C (kinds of label) and D (dimension of feature vector) from NaiveBayes.train()
* Replaced reduceByKey with foldByKey to avoid modifying original input data
|
| | | |
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | | |
Also refactored some heavy FP code to improve readability and reduce memory footprint.
|
| | |
| | |
| | |
| | |
| | | |
* Minimized shuffle output with mapPartitions.
* Reduced RDD actions from 3 to 1.
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
|/ / |
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| |/ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
|
|/ |
|
| |
|
| |
|
|
|
|
| |
build/dep-resolution atleast a bit faster.
|
|\
| |
| |
| |
| |
| | |
Conflicts:
core/src/main/scala/org/apache/spark/rdd/RDD.scala
project/SparkBuild.scala
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
To run unit test, start SBT console and type:
compile
test-only org.apache.spark.util.XORShiftRandomSuite
To run benchmark, type:
project core
console
Once the Scala console starts, type:
org.apache.spark.util.XORShiftRandom.benchmark(100000000)
|
|\| |
|
| |\ |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|