From d642b273544bb77ef7f584326aa2d214649ac61b Mon Sep 17 00:00:00 2001
From: Daoyuan Wang <daoyuan.wang@intel.com>
Date: Mon, 23 May 2016 23:29:15 -0700
Subject: [SPARK-15397][SQL] fix string udf locate as hive

## What changes were proposed in this pull request?

in hive, `locate("aa", "aaa", 0)` would yield 0, `locate("aa", "aaa", 1)` would yield 1 and `locate("aa", "aaa", 2)` would yield 2, while in Spark, `locate("aa", "aaa", 0)` would yield 1,  `locate("aa", "aaa", 1)` would yield 2 and  `locate("aa", "aaa", 2)` would yield 0. This results from the different understanding of the third parameter in udf `locate`. It means the starting index and starts from 1, so when we use 0, the return would always be 0.

## How was this patch tested?

tested with modified `StringExpressionsSuite` and `StringFunctionsSuite`

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #13186 from adrian-wang/locate.
---
 python/pyspark/sql/functions.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'python')

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 1f15eec645..64b8bc442d 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1359,7 +1359,7 @@ def levenshtein(left, right):
 
 
 @since(1.5)
-def locate(substr, str, pos=0):
+def locate(substr, str, pos=1):
     """
     Locate the position of the first occurrence of substr in a string column, after position pos.
 
-- 
cgit v1.2.3