| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|\
| |
| |
| |
| |
| |
| |
| | |
SPARK-1008: Logging improvments
1. Adds a default log4j file that gets loaded if users haven't specified a log4j file.
2. Isolates use of the tools assembly jar. I found this produced SLF4J warnings
after building with SBT (and I've seen similar warnings on the mailing list).
|
| |\
| |/
|/|
| |
| | |
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
|
|\ \
| | |
| | |
| | | |
restore core/pom.xml file modification
|
|/ / |
|
|\ \
| | |
| | |
| | |
| | |
| | | |
Approximate distinct count
Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count distinct number of elements and distinct number of values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| |\ \ |
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | | |
|
| | | |
| | | |
| | | |
| | | | |
support.
|
| | | |
| | | |
| | | |
| | | | |
number of unique values for each key in the RDD.
|
| | | |
| | | |
| | | |
| | | | |
and returns the (approximate) number of distinct elements in the RDD.
|
| | | | |
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final
the changes are listed at https://github.com/netty/netty/wiki/New-and-noteworthy
|
| | | | |
| | | | |
| | | | |
| | | | | |
Also clean up a bit.
|
| | | | | |
|
| | | | | |
|
| | | | | |
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Bug fixes for file input stream and checkpointing
- Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files when a background thread it deleting fails caused errors, etc.)
- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS compatible store requiring special configuration.
- Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten.
- Fixed bug where setting checkpoint directory as a relative local path caused the checkpointing to fail.
|
| | | | | | |
|
| | | | | | |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
0 partitions).
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
(FileNotFound exception is still caught and ignored).
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
due to failures due FileNotFound exceptions.
|
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Reynold's comments on PR 289.
|
| |\ \ \ \ \
| | | |_|/ /
| | |/| | | |
|
| | | | | | |
|
| | | | | | |
|
| |\ \ \ \ \ |
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
correctly.
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
FileSystem objects. Refactored JobGenerator to use actor so that all updating of DStream's metadata is single threaded.
|
| |\ \ \ \ \ \ |
|
| |\ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Conflicts:
core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
- Made file stream more robust to transient failures.
- Changed Spark.setCheckpointDir API to not have the second
'useExisting' parameter. Spark will always create a unique directory
for checkpointing underneath the directory provide to the funtion.
- Fixed bug wrt local relative paths as checkpoint directory.
- Made DStream and RDD checkpointing use
SparkContext.hadoopConfiguration, so that more HDFS compatible
filesystems are supported for checkpointing.
|
|\ \ \ \ \ \ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Changed naming of StageCompleted event to be consistent
The rest of the SparkListener events are named with "SparkListener"
as the prefix of the name; this commit renames the StageCompleted
event to SparkListenerStageCompleted for consistency.
|
| | | | | | | | | | |
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The rest of the SparkListener events are named with "SparkListener"
as the prefix of the name; this commit renames the StageCompleted
event to SparkListenerStageCompleted for consistency.
|
| | | | | | | | | | |
|
| | | | | | | | | | |
|
| | | | | | | | | | |
|