Commit message | Author | Age | Files | Lines
...
| * | | | | Minor changes after auditing diff from earlier versionPatrick Wendell2014-01-233-7/+1
| | | | | |
| * | | | | Response to Matei's reviewPatrick Wendell2014-01-232-21/+22
| | | | | |
| * | | | | Remove Hadoop object cloning and warn users making Hadoop RDD's.Patrick Wendell2014-01-235-221/+134
| | | | | | The code introduced in #359 used Hadoop's WritableUtils.clone() to duplicate objects when reading from Hadoop files. Some users have reported exceptions when cloning data in various file formats, including Avro and another custom format. This patch removes that functionality to ensure stability for the 0.9 release. Instead, it puts a clear warning in the documentation that copying may be necessary for Hadoop data sets.
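The hazard behind this change (Hadoop's record-object reuse) is easy to reproduce outside Hadoop. A minimal Python sketch, not Spark or Hadoop code, of why retaining records from a reader that reuses one mutable object requires an explicit copy:

```python
import copy

class ReusingReader:
    """Mimics a Hadoop RecordReader: yields the SAME mutable object,
    overwriting its contents on every iteration."""
    def __init__(self, values):
        self.values = values
        self.record = {"value": None}  # one shared, reused record object

    def __iter__(self):
        for v in self.values:
            self.record["value"] = v   # overwrite in place, as Hadoop does
            yield self.record

# Without copying, every retained reference points at the same object,
# which ends up holding only the last record's contents.
no_copy = list(ReusingReader([1, 2, 3]))
assert [r["value"] for r in no_copy] == [3, 3, 3]

# Copying each record before retaining it preserves the distinct values.
copied = [copy.deepcopy(r) for r in ReusingReader([1, 2, 3])]
assert [r["value"] for r in copied] == [1, 2, 3]
```

This mirrors the situation the documentation warning describes: a copy is only needed when records are retained beyond a single iteration (e.g. when caching or collecting).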
* | | | | | Merge pull request #501 from JoshRosen/cartesian-rdd-fixesPatrick Wendell2014-01-233-22/+56
|\ \ \ \ \ \
| | |_|/ / /
| |/| | | | Fix two bugs in PySpark cartesian(): SPARK-978 and SPARK-1034
| | | | | | This pull request fixes two bugs in PySpark's `cartesian()` method:
| | | | | | - [SPARK-978](https://spark-project.atlassian.net/browse/SPARK-978): PySpark's cartesian method throws ClassCastException exception
| | | | | | - [SPARK-1034](https://spark-project.atlassian.net/browse/SPARK-1034): Py4JException on PySpark Cartesian Result
| | | | | | The JIRAs have more details describing the fixes.
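For reference, the semantics being fixed: `cartesian()` pairs every element of one RDD with every element of another. A plain-Python sketch of the expected result (illustrative only, using `itertools.product` on lists rather than Spark RDDs):

```python
from itertools import product

# Stand-ins for the elements of two RDDs.
rdd_a = [1, 2]
rdd_b = ["x", "y"]

# cartesian() should yield every (a, b) pair.
pairs = list(product(rdd_a, rdd_b))
assert pairs == [(1, "x"), (1, "y"), (2, "x"), (2, "y")]
assert len(pairs) == len(rdd_a) * len(rdd_b)
```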
| * | | | | Fix SPARK-978: ClassCastException in PySpark cartesian.Josh Rosen2014-01-232-20/+48
| | | | | |
| * | | | | Fix SPARK-1034: Py4JException on PySpark Cartesian ResultJosh Rosen2014-01-232-2/+8
| | | | | |
* | | | | | Merge pull request #406 from eklavya/masterJosh Rosen2014-01-231-1/+39
|\ \ \ \ \ \
| |/ / / / /
|/| | | | | Extending Java API coverage
| | | | | | Hi, I have added three new methods to JavaRDD. Please review and merge.
| * | | | | fixed ClassTag in mapPartitionseklavya2014-01-231-8/+9
| | | | | |
| * | | | | Modifications as suggested in PR feedback-Saurabh Rawat2014-01-142-8/+23
| | | | | | - more variants of mapPartitions added to JavaRDDLike
| | | | | | - move setGenerator to JavaRDDLike
| | | | | | - clean up
| * | | | | Modifications as suggested in PR feedback-Saurabh Rawat2014-01-132-17/+16
| | | | | | - mapPartitions, foreachPartition moved to JavaRDDLike
| | | | | | - call scala rdd's setGenerator instead of setting directly in JavaRDD
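The difference between `map` and the `mapPartitions`/`foreachPartition` variants added here is that the latter invoke the user function once per partition, passing an iterator over that partition's elements. A language-neutral sketch of the semantics in Python (partitions simulated as nested lists; not the JavaRDD API itself):

```python
def map_per_element(partitions, f):
    """map: f is called once per element."""
    return [[f(x) for x in part] for part in partitions]

def map_partitions(partitions, f):
    """mapPartitions: f is called once per partition and receives an
    iterator, so per-partition setup (e.g. opening a connection) can
    run once per partition instead of once per element."""
    return [list(f(iter(part))) for part in partitions]

parts = [[1, 2, 3], [4, 5]]
assert map_per_element(parts, lambda x: x * 10) == [[10, 20, 30], [40, 50]]

def double_all(it):
    # one-time per-partition setup could go here
    for x in it:
        yield x * 2

assert map_partitions(parts, double_all) == [[2, 4, 6], [8, 10]]
```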
| * | | | | Remove default param from mapPartitionseklavya2014-01-131-1/+1
| | | | | |
| * | | | | Remove classtag from mapPartitions.eklavya2014-01-131-1/+1
| | | | | |
| * | | | | Added foreachPartition method to JavaRDD.eklavya2014-01-131-1/+8
| | | | | |
| * | | | | Added mapPartitions method to JavaRDD.eklavya2014-01-131-1/+12
| | | | | |
| * | | | | Added setter method setGenerator to JavaRDD.eklavya2014-01-131-0/+5
| | | | | |
* | | | | | Merge pull request #499 from jianpingjwang/dev1Reynold Xin2014-01-233-37/+40
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | | Replace commons-math with jblas in SVDPlusPlus
| * | | | | Add jblas dependencyJianping J Wang2014-01-231-1/+1
| | | | | |
| * | | | | Add jblas dependencyJianping J Wang2014-01-231-4/+3
| | | | | |
| * | | | | Replace commons-math with jblasJianping J Wang2014-01-231-32/+36
| | | | | |
* | | | | | Merge pull request #496 from pwendell/masterPatrick Wendell2014-01-221-1/+1
|\ \ \ \ \ \
| | |_|/ / /
| |/| | | | Fix bug in worker clean-up in UI
| | | | | | Introduced in d5a96fec (/cc @aarondav). This should be picked into 0.8 and 0.9 as well. The bug causes old (zombie) workers on a node to not disappear immediately from the UI when a new one registers.
| * | | | | Fix bug in worker clean-up in UIPatrick Wendell2014-01-221-1/+1
| | | | | | Introduced in d5a96fec. This should be picked into 0.8 and 0.9 as well.
* | | | | | Merge pull request #447 from CodingCat/SPARK-1027Patrick Wendell2014-01-228-27/+37
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | | fix for SPARK-1027 (https://spark-project.atlassian.net/browse/SPARK-1027)
| | | | | | FIXES:
| | | | | | 1. change sparkHome from String to Option[String] in ApplicationDesc
| | | | | | 2. remove the sparkHome parameter from the LaunchExecutor message
| | | | | | 3. adjust the files involved
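The motivation for `Option[String]` over a raw `String` is that an absent spark home becomes an explicit case rather than an empty-string sentinel. A rough Python analogue of the same design choice (function and parameter names are hypothetical, not Spark's actual fields):

```python
from typing import Optional

def resolve_spark_home(app_home: Optional[str], worker_default: str) -> str:
    """With an Optional-style value, 'not set' is unambiguous; with a raw
    string, "" could mean a typo, a real path, or a missing value."""
    return app_home if app_home is not None else worker_default

# The application's setting wins when present; otherwise fall back.
assert resolve_spark_home("/opt/spark", "/usr/lib/spark") == "/opt/spark"
assert resolve_spark_home(None, "/usr/lib/spark") == "/usr/lib/spark"
```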
| * | | | | refactor sparkHome to valCodingCat2014-01-221-3/+4
| | | | | | clean code
| * | | | | fix for SPARK-1027CodingCat2014-01-208-17/+18
| | | | | | change TestClient & Worker to Some("xxx")
| | | | | | kill manager if it is started
| | | | | | remove unnecessary .get when fetching "SPARK_HOME" values
| * | | | | executor creation failed should not make the worker restartCodingCat2014-01-201-12/+20
| | | | | |
* | | | | | Merge pull request #495 from srowen/GraphXCommonsMathDependencyPatrick Wendell2014-01-223-2/+10
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix graphx Commons Math dependency `graphx` depends on Commons Math (2.x) in `SVDPlusPlus.scala`. However the module doesn't declare this dependency. It happens to work because it is included by Hadoop artifacts. But, I can tell you this isn't true as of a month or so ago. Building versus recent Hadoop would fail. (That's how we noticed.) The simple fix is to declare the dependency, as it should be. But it's also worth noting that `commons-math` is the old-ish 2.x line, while `commons-math3` is where newer 3.x releases are. Drop-in replacement, but different artifact and package name. Changing this only usage to `commons-math3` works, tests pass, and isn't surprising that it does, so is probably also worth changing. (A comment in some test code also references `commons-math3`, FWIW.) It does raise another question though: `mllib` looks like it uses the `jblas` `DoubleMatrix` for general purpose vector/matrix stuff. Should `graphx` really use Commons Math for this? Beyond the tiny scope here but worth asking.
| * | | | | | Also add graphx commons-math3 dependeny in sbt buildSean Owen2014-01-221-1/+4
| | | | | | |
| * | | | | | Depend on Commons Math explicitly instead of accidentally getting it from Hadoop (which stops working in 2.2.x) and also use the newer commons-math3Sean Owen2014-01-222-1/+6
| | |_|_|/ /
| |/| | | |
* | | | | | Merge pull request #492 from skicavs/masterPatrick Wendell2014-01-221-2/+2
|\ \ \ \ \ \ \
| | | | | | | fixed job name and usage information for the JavaSparkPi example
| * | | | | | fixed job name and usage information for the JavaSparkPi exampleKevin Mader2014-01-221-2/+2
| | | | | | |
* | | | | | | Merge pull request #478 from sryza/sandy-spark-1033Patrick Wendell2014-01-222-4/+4
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPARK-1033. Ask for cores in Yarn container requests Tested on a pseudo-distributed cluster against the Fair Scheduler and observed a worker taking more than a single core.
| * | | | | | | Incorporate Tom's comments - update doc and code to reflect that core requests may not always be honoredSandy Ryza2014-01-212-3/+2
| | | | | | | |
| * | | | | | | SPARK-1033. Ask for cores in Yarn container requestsSandy Ryza2014-01-202-5/+6
| | |_|/ / / / | |/| | | | |
* | | | | | | Merge pull request #493 from kayousterhout/double_addMatei Zaharia2014-01-221-1/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed bug where task set managers are added to queue twice @mateiz can you verify that this is a bug and wasn't intentional? (https://github.com/apache/incubator-spark/commit/90a04dab8d9a2a9a372cea7cdf46cc0fd0f2f76c#diff-7fa4f84a961750c374f2120ca70e96edR551) This bug leads to a small performance hit because task set managers will get offered each rejected resource offer twice, but doesn't lead to any incorrect functionality. Thanks to @hdc1112 for pointing this out.
| * | | | | | | Fixed bug where task set managers are added to queue twiceKay Ousterhout2014-01-221-1/+1
| | |_|/ / / /
| |/| | | | | This bug leads to a small performance hit because task set managers will get offered each rejected resource offer twice, but doesn't lead to any incorrect functionality.
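The performance effect of the double add can be sketched with a plain list standing in for the scheduler's queue (illustrative only, not the TaskSchedulerImpl code): every queue entry is polled on each resource offer, so a duplicated manager is polled twice per offer.

```python
# Buggy path: the same task set manager is enqueued twice.
queue = []
queue.append("tsm-1")
queue.append("tsm-1")

# One offer round polls every entry, so the duplicate is polled twice
# (wasted work, though each poll still behaves correctly).
polled = [m for m in queue]
assert polled.count("tsm-1") == 2

# Fixed path: guard against re-adding a manager already in the queue.
queue = []
for m in ["tsm-1", "tsm-1"]:
    if m not in queue:
        queue.append(m)
assert queue.count("tsm-1") == 1
```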
* | | | | | | Merge pull request #315 from rezazadeh/sparsesvdMatei Zaharia2014-01-227-0/+543
|\ \ \ \ \ \ \
| |/ / / / / /
|/| | | | | | Sparse SVD
# Singular Value Decomposition
Given an *m x n* matrix *A*, compute matrices *U, S, V* such that *A = U * S * V^T*. There is no restriction on m, but we require n^2 doubles to fit in memory; further, n should be less than m. The decomposition is computed by first computing *A^T A = V S^2 V^T*, computing the SVD locally on that matrix (since n x n is small), from which we recover S and V. We then compute U via easy matrix multiplication as *U = A * V * S^-1*. Only singular vectors associated with the largest k singular values are returned. If there are k such values, the dimensions of the return will be:
- *S* is *k x k* and diagonal, holding the singular values on the diagonal.
- *U* is *m x k* and satisfies *U^T U = eye(k)*.
- *V* is *n x k* and satisfies *V^T V = eye(k)*.
All input and output is expected in sparse matrix format, 0-indexed as tuples of the form ((i,j),value), all in RDDs.
# Testing
Tests included. They test:
- The decomposition promise (A = USV^T)
- For small matrices, output is compared to that of jblas
- A rank-1 matrix test
- A full-rank matrix test
- A middle-rank matrix forced via k
# Example Usage
    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.SVD
    import org.apache.spark.mllib.linalg.SparseMatrix
    import org.apache.spark.mllib.linalg.MatrixEntry

    // Load and parse the data file
    val data = sc.textFile("mllib/data/als/test.data").map { line =>
      val parts = line.split(',')
      MatrixEntry(parts(0).toInt, parts(1).toInt, parts(2).toDouble)
    }
    val m = 4
    val n = 4

    // recover the top 1 singular vector
    val decomposed = SVD.sparseSVD(SparseMatrix(data, m, n), 1)
    println("singular values = " + decomposed.S.data.toArray.mkString)
# Documentation
Added to docs/mllib-guide.md
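The Gram-matrix approach the commit message describes (compute A^T A locally, eigendecompose it to recover S and V, then U = A V S^-1) can be illustrated end to end on a tiny dense matrix. A pure-Python sketch of the linear algebra only, not the sparse RDD implementation; n = 2 is chosen so the n x n eigenproblem has a closed form:

```python
import math

# A is m x n with m = 3, n = 2; we want A = U S V^T via G = A^T A = V S^2 V^T.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
m, n = 3, 2

# Gram matrix G = A^T A (n x n, small enough to handle locally).
G = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
     for i in range(n)]

# Closed-form eigenvalues of the symmetric 2x2 matrix G.
tr = G[0][0] + G[1][1]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
disc = math.sqrt(tr * tr - 4 * det)
eigvals = [(tr + disc) / 2, (tr - disc) / 2]

# Singular values of A are the square roots of G's eigenvalues.
S = [math.sqrt(lam) for lam in eigvals]

def eigvec(lam):
    # Unit eigenvector of G for eigenvalue lam: solve (G - lam*I) v = 0.
    # The form (b, lam - a) is valid when the off-diagonal b is nonzero.
    v = [G[0][1], lam - G[0][0]]
    norm = math.hypot(v[0], v[1])
    return [v[0] / norm, v[1] / norm]

V = [eigvec(lam) for lam in eigvals]   # V[k] is the k-th right singular vector

# U = A V S^-1, one column (left singular vector) at a time.
U = [[sum(A[r][c] * V[k][c] for c in range(n)) / S[k] for r in range(m)]
     for k in range(n)]

# Check the decomposition promise: A[i][j] = sum_k S[k] * U[k][i] * V[k][j].
for i in range(m):
    for j in range(n):
        rebuilt = sum(S[k] * U[k][i] * V[k][j] for k in range(n))
        assert abs(rebuilt - A[i][j]) < 1e-9
```

The real implementation keeps A as a sparse RDD and only materializes the n x n Gram matrix on the driver, which is why the method requires n^2 doubles to fit in memory while m can be arbitrarily large.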
| * | | | | | rename to MatrixSVDReza Zadeh2014-01-171-2/+2
| | | | | | |
| * | | | | | rename to MatrixSVDReza Zadeh2014-01-172-4/+4
| | | | | | |
| * | | | | | Merge remote-tracking branch 'upstream/master' into sparsesvdReza Zadeh2014-01-17146-1799/+2613
| |\ \ \ \ \ \
| * | | | | | | make example 0-indexedReza Zadeh2014-01-171-1/+1
| | | | | | | |
| * | | | | | | 0index docsReza Zadeh2014-01-171-1/+1
| | | | | | | |
| * | | | | | | prettifyReza Zadeh2014-01-171-2/+2
| | | | | | | |
| * | | | | | | add rename computeSVDReza Zadeh2014-01-171-1/+1
| | | | | | | |
| * | | | | | | replace this.type with SVDReza Zadeh2014-01-171-1/+1
| | | | | | | |
| * | | | | | | use 0-indexingReza Zadeh2014-01-175-14/+14
| | | | | | | |
| * | | | | | | changes from PRReza Zadeh2014-01-172-3/+4
| | | | | | | |
| * | | | | | | Merge remote-tracking branch 'upstream/master' into sparsesvdReza Zadeh2014-01-13252-903/+8912
| |\ \ \ \ \ \ \
| * \ \ \ \ \ \ \ Merge remote-tracking branch 'upstream/master' into sparsesvdReza Zadeh2014-01-1173-455/+2096
| |\ \ \ \ \ \ \ \
| * | | | | | | | | add dimension parameters to exampleReza Zadeh2014-01-101-5/+5
| | | | | | | | | |
| * | | | | | | | | Merge remote-tracking branch 'upstream/master' into sparsesvdReza Zadeh2014-01-09304-3293/+6140
| |\ \ \ \ \ \ \ \ \
| | | | | | | | | | | Conflicts: docs/mllib-guide.md