From b0f2fb5b9729b38744bf784f2072f5ee52314f87 Mon Sep 17 00:00:00 2001
From: Dongjoon Hyun
Date: Mon, 20 Jun 2016 13:41:03 -0700
Subject: [SPARK-16053][R] Add `spark_partition_id` in SparkR

## What changes were proposed in this pull request?

This PR adds `spark_partition_id` virtual column function in SparkR for API parity.

The following is just an example to illustrate a SparkR usage on a partitioned parquet table created by `spark.range(10).write.mode("overwrite").parquet("/tmp/t1")`.

```r
> collect(select(read.parquet('/tmp/t1'), c('id', spark_partition_id())))
   id SPARK_PARTITION_ID()
1   3                    0
2   4                    0
3   8                    1
4   9                    1
5   0                    2
6   1                    3
7   2                    4
8   5                    5
9   6                    6
10  7                    7
```

## How was this patch tested?

Pass the Jenkins tests (including new testcase).

Author: Dongjoon Hyun

Closes #13768 from dongjoon-hyun/SPARK-16053.
---
 R/pkg/NAMESPACE | 1 +
 1 file changed, 1 insertion(+)

(limited to 'R/pkg/NAMESPACE')

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index aaeab665a4..45663f4c2c 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -260,6 +260,7 @@ exportMethods("%in%",
               "skewness",
               "sort_array",
               "soundex",
+              "spark_partition_id",
               "stddev",
               "stddev_pop",
               "stddev_samp",
-- 
cgit v1.2.3