aboutsummaryrefslogtreecommitdiff
path: root/LICENSE
Commit message (Collapse)AuthorAgeFilesLines
* SPARK-3926 [CORE] Reopened: result of JavaRDD collectAsMap() is not serializableSean Owen2014-12-081-1/+2
| | | | | | | | | | | My original 'fix' didn't fix at all. Now, there's a unit test to check whether it works. Of the two options to really fix it -- copy the `Map` to a `java.util.HashMap`, or copy and modify Scala's implementation in `Wrappers.MapWrapper`, I went with the latter. Author: Sean Owen <sowen@cloudera.com> Closes #3587 from srowen/SPARK-3926 and squashes the following commits: 8586bb9 [Sean Owen] Remove unneeded no-arg constructor, and add additional note about copied code in LICENSE 7bb0e66 [Sean Owen] Make SerializableMapWrapper actually serialize, and add unit test
* [SPARK-4242] [Core] Add SASL to external shuffle serviceAaron Davidson2014-11-051-1/+20
| | | | | | | | | | | | | | Does three things: (1) Adds SASL to ExternalShuffleClient, (2) puts SecurityManager in BlockManager's constructor, and (3) adds unit test. Author: Aaron Davidson <aaron@databricks.com> Closes #3108 from aarondav/sasl-client and squashes the following commits: 48b622d [Aaron Davidson] Screw it, let's just get LimitedInputStream 3543b70 [Aaron Davidson] Back out of pom change due to unknown test issue? b58518a [Aaron Davidson] ByteStreams.limit() not available :( cbe451a [Aaron Davidson] Address comments 2bf2908 [Aaron Davidson] [SPARK-4242] [Core] Add SASL to external shuffle service
* SPARK-4022 [CORE] [MLLIB] Replace colt dependency (LGPL) with commons-mathSean Owen2014-10-271-12/+0
| | | | | | | | | | | | | | | This change replaces usages of colt with commons-math3 equivalents, and makes some minor necessary adjustments to related code and tests to match. Author: Sean Owen <sowen@cloudera.com> Closes #2928 from srowen/SPARK-4022 and squashes the following commits: 61a232f [Sean Owen] Fix failure due to different sampling in JavaAPISuite.sample() 16d66b8 [Sean Owen] Simplify seeding with call to reseedRandomGenerator a1a78e0 [Sean Owen] Use Well19937c 31c7641 [Sean Owen] Fix Python Poisson test by choosing a different seed; about 88% of seeds should work but 1 didn't, it seems 5c9c67f [Sean Owen] Additional test fixes from review d8f88e0 [Sean Owen] Replace colt with commons-math3. Some tests do not pass yet.
* [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()Davies Liu2014-08-261-0/+283
| | | | | | | | | | | | | | | | | Using external sort to support sort large datasets in reduce stage. Author: Davies Liu <davies.liu@gmail.com> Closes #1978 from davies/sort and squashes the following commits: bbcd9ba [Davies Liu] check spilled bytes in tests b125d2f [Davies Liu] add test for external sort in rdd eae0176 [Davies Liu] choose different disks from different processes and instances 1f075ed [Davies Liu] Merge branch 'master' into sort eb53ca6 [Davies Liu] Merge branch 'master' into sort 644abaf [Davies Liu] add license in LICENSE 19f7873 [Davies Liu] improve tests 55602ee [Davies Liu] use external sort in sortBy() and sortByKey()
* SPARK-2414 [BUILD] Add LICENSE entry for jquerySean Owen2014-08-021-0/+1
| | | | | | | | | | The JIRA concerned removing jquery, and this does not remove jquery. While it is distributed by Spark it should have an accompanying line in LICENSE, very technically, as per http://www.apache.org/dev/licensing-howto.html Author: Sean Owen <srowen@gmail.com> Closes #1748 from srowen/SPARK-2414 and squashes the following commits: 2fdb03c [Sean Owen] Add LICENSE entry for jquery
* [SPARK-2305] [PySpark] Update Py4J to version 0.8.2.1Josh Rosen2014-07-291-2/+2
| | | | | | | | Author: Josh Rosen <joshrosen@apache.org> Closes #1626 from JoshRosen/SPARK-2305 and squashes the following commits: 03fb283 [Josh Rosen] Update Py4J to version 0.8.2.1.
* SPARK-2047: Introduce an in-mem Sorter, and use it to reduce mem usageAaron Davidson2014-07-221-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ### Why and what? Currently, the AppendOnlyMap performs an "in-place" sort by converting its array of [key, value, key, value] pairs into a an array of [(key, value), (key, value)] pairs. However, this causes us to allocate many Tuple2 objects, which come at a nontrivial overhead. This patch adds a Sorter API, intended for in memory sorts, which simply ports the Android Timsort implementation (available under Apache v2) and abstracts the interface in a way which introduces no more than 1 virtual function invocation of overhead at each abstraction point. Please compare our port of the Android Timsort sort with the original implementation: http://www.diffchecker.com/wiwrykcl ### Memory implications An AppendOnlyMap contains N kv pairs, which results in roughly 2N elements within its underlying array. Each of these elements is 4 bytes wide in a [compressed OOPS](https://wikis.oracle.com/display/HotSpotInternals/CompressedOops) system, which is the default. Today's approach immediately allocates N Tuple2 objects, which take up 24N bytes in total (exposed via YourKit), and undergoes a Java sort. The Java 6 version immediately copies the entire array (4N bytes here), while the Java 7 version has a worst-case allocation of half the array (2N bytes). This results in a worst-case sorting overhead of 24N + 2N = 26N bytes (for Java 7). The Sorter does not require allocating any tuples, but since it uses Timsort, it may copy up to half the entire array in the worst case. This results in a worst-case sorting overhead of 4N bytes. Thus, we have reduced the worst-case overhead of the sort by roughly 22 bytes times the number of elements. ### Performance implications As the destructiveSortedIterator is used for spilling in an ExternalAppendOnlyMap, the purpose of this patch is to provide stability by reducing memory usage rather than improve performance. However, because it implements Timsort, it also brings a substantial performance boost over our prior implementation. Here are the results of a microbenchmark that sorted 25 million, randomly distributed (Float, Int) pairs. The Java Arrays.sort() tests were run **only on the keys**, and thus moved less data. Our current implementation is called "Tuple-sort using Arrays.sort()" while the new implementation is "KV-array using Sorter". <table> <tr><th>Test</th><th>First run (JDK6)</th><th>Average of 10 (JDK6)</th><th>First run (JDK7)</th><th>Average of 10 (JDK7)</th></tr> <tr><td>primitive Arrays.sort()</td><td>3216 ms</td><td>1190 ms</td><td>2724 ms</td><td>131 ms (!!)</td></tr> <tr><td>Arrays.sort()</td><td>18564 ms</td><td>2006 ms</td><td>13201 ms</td><td>878 ms</td></tr> <tr><td>Tuple-sort using Arrays.sort()</td><td>31813 ms</td><td>3550 ms</td><td>20990 ms</td><td>1919 ms</td></tr> <tr><td><b>KV-array using Sorter</b></td><td></td><td></td><td><b>15020 ms</b></td><td><b>834 ms</b></td></tr> </table> The results show that this Sorter performs exactly as expected (after the first run) -- it is as fast as the Java 7 Arrays.sort() (which shares the same algorithm), but is significantly faster than the Tuple-sort on Java 6 or 7. In short, this patch should significantly improve performance for users running either Java 6 or 7. Author: Aaron Davidson <aaron@databricks.com> Closes #1502 from aarondav/sort and squashes the following commits: 652d936 [Aaron Davidson] Update license, move Sorter to java src a7b5b1c [Aaron Davidson] fix licenses 5c0efaf [Aaron Davidson] Update tmpLength ec395c8 [Aaron Davidson] Ignore benchmark (again) and fix docs 034bf10 [Aaron Davidson] Change to Apache v2 Timsort b97296c [Aaron Davidson] Don't try to run benchmark on Jenkins + private[spark] 6307338 [Aaron Davidson] SPARK-2047: Introduce an in-mem Sorter, and use it to reduce mem usage
* SPARK-1827. LICENSE and NOTICE files need a refresh to contain transitive ↵Sean Owen2014-05-141-0/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dependency info LICENSE and NOTICE policy is explained here: http://www.apache.org/dev/licensing-howto.html http://www.apache.org/legal/3party.html This leads to the following changes. First, this change enables two extensions to maven-shade-plugin in assembly/ that will try to include and merge all NOTICE and LICENSE files. This can't hurt. This generates a consolidated NOTICE file that I manually added to NOTICE. Next, a list of all dependencies and their licenses was generated: `mvn ... license:aggregate-add-third-party` to create: `target/generated-sources/license/THIRD-PARTY.txt` Each dependency is listed with one or more licenses. Determine the most-compatible license for each if there is more than one. For "unknown" license dependencies, I manually evaluateD their license. Many are actually Apache projects or components of projects covered already. The only non-trivial one was Colt, which has its own (compatible) license. I ignored Apache-licensed and public domain dependencies as these require no further action (beyond NOTICE above). BSD and MIT licenses (permissive Category A licenses) are evidently supposed to be mentioned in LICENSE, so I added a section without output from the THIRD-PARTY.txt file appropriately. Everything else, Category B licenses, are evidently mentioned in NOTICE (?) Same there. LICENSE contained some license statements for source code that is redistributed. I left this as I think that is the right place to put it. Author: Sean Owen <sowen@cloudera.com> Closes #770 from srowen/SPARK-1827 and squashes the following commits: a764504 [Sean Owen] Add LICENSE and NOTICE info for all transitive dependencies as of 1.0
* Merge the old sbt-launch-lib.bash with the new sbt-launcher jar downloading ↵Michael Armbrust2014-03-021-0/+32
| | | | | | | | | | | | logic. This allows developers to pass options (such as -D) to sbt. I also modified the SparkBuild to ensure spark specific properties are propagated to forked test JVMs. Author: Michael Armbrust <michael@databricks.com> Closes #14 from marmbrus/sbtScripts and squashes the following commits: c008b18 [Michael Armbrust] Merge the old sbt-launch-lib.bash with the new sbt-launcher jar downloading logic.
* Updated LICENSE with third-party licensesMatei Zaharia2013-09-021-0/+196
|
* Add Apache license headers and LICENSE and NOTICE filesMatei Zaharia2013-07-161-27/+202
|
* Added BSD licenseMatei Zaharia2010-12-071-0/+27