author Sital Kedia <skedia@fb.com> 2016-05-10 15:28:35 +0100
committer Sean Owen <sowen@cloudera.com> 2016-05-10 15:28:35 +0100
commit a019e6efb71e4dce51ca91e41c3d293cf3a6ccb8
tree f1d3996e25faac01f7b78ac5ff526a589597913a /project
parent 570647267055cbe33291232b375e08fa1f5d8e7a
[SPARK-14542][CORE] PipeRDD should allow configurable buffer size for…
## What changes were proposed in this pull request?

Currently PipedRDD internally uses PrintWriter to write data to the stdin of the piped process, which by default uses a BufferedWriter of buffer size 8k. In our experiment, we have seen that 8k buffer size is too small and the job spends significant amount of CPU time in system calls to copy the data. We should have a way to configure the buffer size for the writer.

## How was this patch tested?

Ran PipedRDDSuite tests.

Author: Sital Kedia <skedia@fb.com>

Closes #12309 from sitalkedia/bufferedPipedRDD.
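
For illustration only, here is a minimal sketch of the idea described above: wrapping the child process's stdin stream in a `BufferedWriter` whose size is passed in, rather than relying on the default that `PrintWriter` picks. The object name `BufferedPipeWriterSketch`, the helper `writerFor`, and the 64 KiB figure are assumptions made for this example. This is not the code from the patch, and the UTF-8 charset is likewise an assumption.

```scala
import java.io.{BufferedWriter, ByteArrayOutputStream, OutputStream, OutputStreamWriter, PrintWriter}
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration; not taken from the Spark patch.
object BufferedPipeWriterSketch {

  // Build a PrintWriter over `out` whose BufferedWriter holds `bufferSize`
  // characters. BufferedWriter rejects sizes <= 0 with IllegalArgumentException.
  def writerFor(out: OutputStream, bufferSize: Int): PrintWriter =
    new PrintWriter(
      new BufferedWriter(
        new OutputStreamWriter(out, StandardCharsets.UTF_8),
        bufferSize))

  def main(args: Array[String]): Unit = {
    // Stand-in for a piped process's stdin; a real caller would pass
    // something like process.getOutputStream instead.
    val sink = new ByteArrayOutputStream()
    val writer = writerFor(sink, 64 * 1024) // 64 KiB is an arbitrary example value
    try {
      Seq("first record", "second record").foreach { rec =>
        writer.print(rec)
        writer.print('\n')
      }
    } finally {
      writer.close() // close() flushes any buffered characters
    }
    print(sink.toString("UTF-8"))
  }
}
```

The bytes written are the same for any buffer size. What changes is how much data the writer accumulates before handing it to the underlying stream, so a suitable value would have to be found by measurement on the actual workload.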
Diffstat (limited to 'project')
-rw-r--r-- project/MimaExcludes.scala | 4
1 files changed, 4 insertions, 0 deletions
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index a5d57e1b01..b0d862d006 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -686,6 +686,10 @@ object MimaExcludes {
         ProblemFilters.exclude[IncompatibleMethTypeProblem](
           "org.apache.spark.sql.DataFrameReader.this")
       ) ++ Seq(
+        // SPARK-14542 configurable buffer size for pipe RDD
+        ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.pipe"),
+        ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.api.java.JavaRDDLike.pipe")
+      ) ++ Seq(
         // [SPARK-4452][Core]Shuffle data structures can starve others on the same thread for memory
         ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.util.collection.Spillable")
       ) ++ Seq(