author: Tarek Auel <tarek.auel@googlemail.com> 2015-07-04 01:10:52 -0700
committer: Reynold Xin <rxin@databricks.com> 2015-07-04 01:10:52 -0700
commit: 6b3574e68704d58ba41efe0ea4fe928cc166afcd
tree: c8dc9f32d4081d94063df0d7cf6665d99e797641 /python
parent: f35b0c3436898f22860d2c6c1d12f3a661005201
[SPARK-8270][SQL] levenshtein distance
Jira: https://issues.apache.org/jira/browse/SPARK-8270

Info: I cannot build the latest master; it gets stuck during the build process: `[INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml`

Author: Tarek Auel <tarek.auel@googlemail.com>

Closes #7214 from tarekauel/SPARK-8270 and squashes the following commits:

ab348b9 [Tarek Auel] Merge branch 'master' into SPARK-8270
a2ad318 [Tarek Auel] [SPARK-8270] changed order of fields
d91b12c [Tarek Auel] [SPARK-8270] python fix
adbd075 [Tarek Auel] [SPARK-8270] fixed typo
23185c9 [Tarek Auel] [SPARK-8270] levenshtein distance
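
For reference, the value in the doctest added by this patch (`'kitten'` vs. `'sitting'` gives 3) can be checked with a plain-Python sketch of the standard dynamic-programming Levenshtein algorithm. This is not the code Spark runs: the patch delegates to the JVM-side `functions.levenshtein`, and the name `levenshtein_distance` below is hypothetical, used only for illustration.

```python
def levenshtein_distance(left: str, right: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `left` into `right`.

    Illustrative only; not part of the Spark patch.
    """
    # previous[j] is the distance between the already processed prefix
    # of `left` and the prefix right[:j]. For the empty prefix of `left`
    # that is simply j insertions.
    previous = list(range(len(right) + 1))
    for i, left_char in enumerate(left, start=1):
        # Turning left[:i] into the empty string takes i deletions.
        current = [i]
        for j, right_char in enumerate(right, start=1):
            substitution_cost = 0 if left_char == right_char else 1
            current.append(min(
                previous[j] + 1,                      # delete left_char
                current[j - 1] + 1,                   # insert right_char
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))  # 3
```

The sketch keeps only two rows of the distance table, so memory use grows with the length of `right` rather than with the product of both lengths.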
Diffstat (limited to 'python')
-rw-r--r-- python/pyspark/sql/functions.py | 14
1 file changed, 14 insertions, 0 deletions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 69e563ef36..49dd0332af 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -325,6 +325,20 @@ def explode(col):
 @ignore_unicode_prefix
 @since(1.5)
+def levenshtein(left, right):
+    """Computes the Levenshtein distance of the two given strings.
+
+    >>> df0 = sqlContext.createDataFrame([('kitten', 'sitting',)], ['l', 'r'])
+    >>> df0.select(levenshtein('l', 'r').alias('d')).collect()
+    [Row(d=3)]
+    """
+    sc = SparkContext._active_spark_context
+    jc = sc._jvm.functions.levenshtein(_to_java_column(left), _to_java_column(right))
+    return Column(jc)
+
+
+@ignore_unicode_prefix
+@since(1.5)
 def md5(col):
     """Calculates the MD5 digest and returns the value as a 32 character hex string.