author | Sandy Ryza <sandy@cloudera.com> | 2015-02-05 10:15:55 -0800 |
---|---|---|
committer | Josh Rosen <joshrosen@databricks.com> | 2015-02-05 10:16:17 -0800 |
commit | c22ccc07c50d4aaead98918dbb7c98dd520cdc6a (patch) | |
tree | a729d63480694ec2dac9e1b427f819545f5c6108 /docs/index.md | |
parent | 40746749a6670f43aece69fe1482e92fa87decf5 (diff) | |
SPARK-4687. Add a recursive option to the addFile API
This adds a recursive option to the addFile API to satisfy Hive's needs. It only allows specifying HDFS directories, which are copied down to every executor.
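To make the restriction described above concrete, here is a minimal Python sketch of the acceptance rule, taking the commit message literally. The helper name `check_add_file` and its parameters are hypothetical, and the assumption that a directory passed without the recursive flag is rejected is not stated in the message; this is not Spark's implementation.

```python
from urllib.parse import urlparse


def check_add_file(uri: str, is_dir: bool, recursive: bool) -> None:
    """Raise ValueError if the request is not allowed (hypothetical helper)."""
    if not is_dir:
        # Plain files are accepted regardless of the recursive flag.
        return
    if not recursive:
        # Assumption: a directory must be requested explicitly.
        raise ValueError(f"{uri} is a directory; pass recursive=True")
    if urlparse(uri).scheme != "hdfs":
        # Per the commit message, only HDFS directories are allowed.
        raise ValueError(f"{uri}: only HDFS directories are supported")
```

Under these assumptions, `check_add_file("hdfs://namenode/data/dir", is_dir=True, recursive=True)` returns quietly, while the same call with a `file://` URI raises `ValueError`.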
There are a couple of outstanding questions:
* Should we allow specifying local dirs as well? The best way to do this would probably be to archive them. The drawback is that it would require a fair bit of code that I don't know of any current use cases for.
* The addFile implementation has a caching component that I don't entirely understand. What events are we caching between? AFAICT it's users calling addFile on the same file in the same app at different times. Do we want/need to add something similar for addDirectory?
* The addFile implementation checks whether an added file already exists and has the same contents. I imagine we want the same behavior for directories, so I plan to add this unless people think otherwise.
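The "already exists and has the same contents" check from the last question, extended to directories, could look roughly like the following Python sketch. The function `add_path` is hypothetical and uses only the standard library; it illustrates the behavior under discussion and is not the code in this patch.

```python
import filecmp
import shutil
from pathlib import Path


def add_path(src: Path, dest: Path) -> bool:
    """Copy src to dest; return True if anything was copied.

    Hypothetical helper, not Spark code.
    """
    if src.is_dir():
        dest.mkdir(parents=True, exist_ok=True)
        copied = False
        for child in sorted(src.iterdir()):
            # Visit every child, then fold its result into the flag.
            changed = add_path(child, dest / child.name)
            copied = copied or changed
        return copied
    if dest.exists():
        # shallow=False forces a byte-by-byte comparison.
        if filecmp.cmp(src, dest, shallow=False):
            return False  # identical copy already present, nothing to do
        raise FileExistsError(f"{dest} exists with different contents")
    shutil.copyfile(src, dest)
    return True
```

Calling `add_path` twice on an unchanged source tree copies on the first call and is a no-op on the second; if a source file changes in between, the second call raises instead of silently overwriting.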
I plan to add some tests if people are OK with the approach.
Author: Sandy Ryza <sandy@cloudera.com>
Closes #3670 from sryza/sandy-spark-4687 and squashes the following commits:
f9fc77f [Sandy Ryza] Josh's comments
70cd24d [Sandy Ryza] Add another test
13da824 [Sandy Ryza] Revert executor changes
38bf94d [Sandy Ryza] Marcelo's comments
ca83849 [Sandy Ryza] Add addFile test
1941be3 [Sandy Ryza] Fix test and avoid HTTP server in local mode
31f15a9 [Sandy Ryza] Use cache recursively and fix some compile errors
0239c3d [Sandy Ryza] Change addDirectory to addFile with recursive
46fe70a [Sandy Ryza] SPARK-4687. Add a addDirectory API
(cherry picked from commit c4b1108c3f9658adebbdf8508d325528c3206f16)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Diffstat (limited to 'docs/index.md')
0 files changed, 0 insertions, 0 deletions