author | Tarek Auel <tarek.auel@gmail.com> | 2015-06-29 11:57:19 -0700 |
---|---|---|
committer | Davies Liu <davies@databricks.com> | 2015-06-29 11:57:19 -0700 |
commit | a5c2961caaafd751f11bdd406bb6885443d7572e (patch) | |
tree | 8cdb6288d459f82e155e4510baa0a2523a76b6ad /python | |
parent | 3664ee25f0a67de5ba76e9487a55a55216ae589f (diff) | |
[SPARK-8235] [SQL] misc function sha / sha1
Jira: https://issues.apache.org/jira/browse/SPARK-8235
I added the support for sha1. If I understood rxin correctly, sha and sha1 should execute the same algorithm, shouldn't they?
Please take a close look at the Python part. It is adapted from #6934.
Author: Tarek Auel <tarek.auel@gmail.com>
Author: Tarek Auel <tarek.auel@googlemail.com>
Closes #6963 from tarekauel/SPARK-8235 and squashes the following commits:
f064563 [Tarek Auel] change to shaHex
7ce3cdc [Tarek Auel] rely on automatic cast
a1251d6 [Tarek Auel] Merge remote-tracking branch 'upstream/master' into SPARK-8235
68eb043 [Tarek Auel] added docstring
be5aff1 [Tarek Auel] improved error message
7336c96 [Tarek Auel] added type check
cf23a80 [Tarek Auel] simplified example
ebf75ef [Tarek Auel] [SPARK-8301] updated the python documentation. Removed sha in python and scala
6d6ff0d [Tarek Auel] [SPARK-8233] added docstring
ea191a9 [Tarek Auel] [SPARK-8233] fixed signature of python function. Added expected type to misc
e3fd7c3 [Tarek Auel] SPARK[8235] added sha to the list of __all__
e5dad4e [Tarek Auel] SPARK[8235] sha / sha1
Diffstat (limited to 'python')
-rw-r--r-- | python/pyspark/sql/functions.py | 14 |
1 files changed, 14 insertions, 0 deletions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 7d3d036161..45ecd826bd 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -42,6 +42,7 @@ __all__ = [
     'monotonicallyIncreasingId',
     'rand',
     'randn',
+    'sha1',
     'sha2',
     'sparkPartitionId',
     'struct',
@@ -382,6 +383,19 @@ def sha2(col, numBits):
     return Column(jc)
 
 
+@ignore_unicode_prefix
+@since(1.5)
+def sha1(col):
+    """Returns the hex string result of SHA-1.
+
+    >>> sqlContext.createDataFrame([('ABC',)], ['a']).select(sha1('a').alias('hash')).collect()
+    [Row(hash=u'3c01bdbb26f358bab27f267924aa2c9a03fcfdb8')]
+    """
+    sc = SparkContext._active_spark_context
+    jc = sc._jvm.functions.sha1(_to_java_column(col))
+    return Column(jc)
+
+
 @since(1.4)
 def sparkPartitionId():
     """A column for partition ID of the Spark task.
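
Reviewer note: the doctest value in the new sha1 docstring can be cross-checked without a Spark session. The sketch below is not part of the patch. It uses only Python's standard hashlib, and the helper name sha1_hex is hypothetical. It assumes the column's string value is hashed as its UTF-8 bytes, which is what the "rely on automatic cast" commit suggests but which this diff does not show directly.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the lowercase hex SHA-1 digest of a string.

    Hypothetical helper for cross-checking the doctest; assumes the
    string is hashed as its UTF-8 encoded bytes.
    """
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Same input as the doctest in the patch: the single value 'ABC'.
    print(sha1_hex("ABC"))
```

If the printed digest matches the hash in the docstring, the Python wrapper and the JVM-side function agree on both the algorithm and the hex encoding for this input.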