author Rocky Madden <git@rockymadden.com> 2013-12-29 15:12:48 -0700
committer Rocky Madden <git@rockymadden.com> 2013-12-29 15:12:48 -0700
commit 84863d35f7e04bd54818d8efeba1f1d67b3097a4
tree 006184847228733b8098ce4ec19a70a6d23fa709 /readme.md
parent 00761f46a32839a67e8ccfd5c702979099a6dd05
Major revamp to match new codebase.
Diffstat (limited to 'readme.md')
-rwxr-xr-x readme.md 60
1 files changed, 41 insertions, 19 deletions
diff --git a/readme.md b/readme.md
index 3375603..325eb02 100755
--- a/readme.md
+++ b/readme.md
@@ -63,13 +63,14 @@ __Maven:__
## Similarity package
Useful for approximate string matching and measurement of string distance. Most metrics calculate the similarity of two strings as a double with a value between 0 and 1. A value of 0 being completely different and a value of 1 being completely similar.
+---
__Dice / Sorensen Metric:__
```scala
DiceSorensenMetric(1).compare("night", "nacht") // 0.6
DiceSorensenMetric(1).compare("context", "contact") // 0.7142857142857143
```
-<sup>(Note you must specify the size of the n-gram you wish to use.)</sup>
+<sup>Note you must specify the size of the n-gram you wish to use.</sup>
---
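For readers who want to see where the Dice / Sorensen values above come from, here is a minimal standalone sketch. It does not use the library; `DiceSketch`, `ngrams`, and `dice` are hypothetical names, and the `Option` return for inputs shorter than the n-gram size is an assumption about sensible behaviour rather than a description of stringmetric itself.

```scala
object DiceSketch {
  // Split a string into overlapping n-grams; empty when the string is shorter than n.
  def ngrams(s: String, n: Int): Seq[String] =
    if (n <= 0 || s.length < n) Seq.empty else s.sliding(n).toSeq

  // Dice coefficient: 2 * |shared n-grams| / (|n-grams of a| + |n-grams of b|).
  // Seq.intersect is a multiset intersection, so a repeated n-gram only
  // counts as shared as many times as it appears on both sides.
  def dice(n: Int)(a: String, b: String): Option[Double] = {
    val (ga, gb) = (ngrams(a, n), ngrams(b, n))
    if (ga.isEmpty || gb.isEmpty) None
    else Some(2.0 * ga.intersect(gb).size / (ga.size + gb.size))
  }
}
```

With `n = 1`, "night" and "nacht" share three of five unigrams each, so the sketch yields 2 * 3 / 10 = 0.6, in line with the first example.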
@@ -78,7 +79,7 @@ __Hamming Metric:__
HammingMetric.compare("toned", "roses") // 3
HammingMetric.compare("1011101", "1001001") // 2
```
-<sup>(Note the exception of integers, rather than doubles, being returned.)</sup>
+<sup>Note the exception of integers, rather than doubles, being returned.</sup>
---
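The integer results for the Hamming examples above are counts of differing positions. A minimal sketch in plain Scala follows; `HammingSketch` and `hamming` are hypothetical names, and returning `None` for strings of unequal length is an assumption, not the library's documented behaviour.

```scala
object HammingSketch {
  // Hamming distance is only defined for strings of equal length,
  // so unequal lengths yield None in this sketch.
  def hamming(a: String, b: String): Option[Int] =
    if (a.length != b.length) None
    else Some(a.zip(b).count { case (x, y) => x != y })
}
```

For "toned" and "roses" the first, third, and fifth characters differ, giving 3.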
@@ -88,7 +89,7 @@ __Jaccard Metric:__
JaccardMetric(1).compare("night", "nacht") // 0.3
JaccardMetric(1).compare("context", "contact") // 0.35714285714285715
```
-<sup>(Note you must specify the size of the n-gram you wish to use.)</sup>
+<sup>Note you must specify the size of the n-gram you wish to use.</sup>
---
@@ -116,7 +117,7 @@ __Levenshtein Metric:__
LevenshteinMetric.compare("sitting", "kitten") // 3
LevenshteinMetric.compare("cake", "drake") // 2
```
-<sup>(Note the exception of integers, rather than doubles, being returned.)</sup>
+<sup>Note the exception of integers, rather than doubles, being returned.</sup>
---
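The Levenshtein results above count single-character insertions, deletions, and substitutions. The following is a minimal sketch of the textbook dynamic-programming formulation, independent of the library; `LevenshteinSketch` and `levenshtein` are hypothetical names.

```scala
object LevenshteinSketch {
  def levenshtein(a: String, b: String): Int = {
    // prev(j) holds the distance between the first i-1 chars of a
    // and the first j chars of b; row 0 is the cost of j insertions.
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i // deleting the first i chars of a
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(prev(j) + 1, curr(j - 1) + 1), // delete, insert
          prev(j - 1) + cost                      // substitute or match
        )
      }
      prev = curr
    }
    prev(b.length)
  }
}
```

Only two rows are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.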
@@ -127,7 +128,7 @@ NGramMetric(1).compare("night", "nacht") // 0.6
NGramMetric(2).compare("night", "nacht") // 0.25
NGramMetric(2).compare("context", "contact") // 0.5
```
-<sup>(Note you must specify the size of the n-gram you wish to use.)</sup>
+<sup>Note you must specify the size of the n-gram you wish to use.</sup>
---
@@ -136,7 +137,7 @@ __Overlap Metric:__
OverlapMetric(1).compare("night", "nacht") // 0.6
OverlapMetric(1).compare("context", "contact") // 0.7142857142857143
```
-<sup>(Note you must specify the size of the n-gram you wish to use.)</sup>
+<sup>Note you must specify the size of the n-gram you wish to use.</sup>
---
@@ -154,13 +155,15 @@ WeightedLevenshteinMetric(10, 0.1, 1).compare("book", "back") // 2
WeightedLevenshteinMetric(10, 0.1, 1).compare("hosp", "hospital") // 0.4
WeightedLevenshteinMetric(10, 0.1, 1).compare("hospital", "hosp") // 40
```
-<sup>(Note you must specify the weight of each operation. Delete, insert, and then substitute. Note that while a double is returned, it can be outside the range of 0 to 1, based upon the weights used.)</sup>
+<sup>Note you must specify the weight of each operation. Delete, insert, and then substitute. Note that while a double is returned, it can be outside the range of 0 to 1, based upon the weights used.</sup>
---
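To illustrate why the weighted Levenshtein results above are asymmetric, here is a minimal sketch with one weight per operation, in the delete, insert, substitute order given in the note. It is a standalone approximation, not the library's code; `WeightedLevenshteinSketch` and `weighted` are hypothetical names.

```scala
object WeightedLevenshteinSketch {
  // Cost of turning `a` into `b`, with a separate weight per operation.
  def weighted(delete: Double, insert: Double, substitute: Double)
              (a: String, b: String): Double = {
    // prev(j): cost of turning the first i-1 chars of a into the first j chars of b.
    var prev = Array.tabulate(b.length + 1)(j => j * insert)
    for (i <- 1 to a.length) {
      val curr = new Array[Double](b.length + 1)
      curr(0) = i * delete
      for (j <- 1 to b.length) {
        val sub = if (a(i - 1) == b(j - 1)) 0.0 else substitute
        curr(j) = math.min(
          math.min(prev(j) + delete, curr(j - 1) + insert),
          prev(j - 1) + sub
        )
      }
      prev = curr
    }
    prev(b.length)
  }
}
```

With weights `(10, 0.1, 1)`, turning "hosp" into "hospital" needs four cheap insertions (about 0.4), while the reverse needs four expensive deletions (40), which is the asymmetry the examples show.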
## Phonetic package
Useful for indexing by word pronunciation and performing sounds-like comparisons. All metrics return a boolean value indicating if the two strings sound the same, per the algorithm used. All metrics have an algorithm counterpart which provide the means to perform indexing by word pronunciation.
+---
+
__Metaphone Metric:__
```scala
MetaphoneMetric.compare("merci", "mercy") // true
@@ -241,45 +244,64 @@ SoundexAlgorithm.compute("lukasiewicz") // l222
---
## Decorating
-It is possible to decorate algorithms and metrics with additional functionality. The most common decorations are filters, which are useful for filtering strings prior to evaluation (e.g. ignore case, ignore non-alpha, ignore spaces).
+It is possible to decorate algorithms and metrics with additional functionality. This is provided by rich wrapping via implicits, and [StringAlgorithmDecorator](https://github.com/rockymadden/stringmetric/blob/master/core/source/main/scala/com/rockymadden/stringmetric/Algorithm.scala)/[StringMetricDecorator](https://github.com/rockymadden/stringmetric/blob/master/core/source/main/scala/com/rockymadden/stringmetric/Metric.scala). A handful of pre-built transforms are located in the [transform module](https://github.com/rockymadden/stringmetric/blob/master/core/source/main/scala/com/rockymadden/stringmetric/Transform.scala).
+
+---
+
+Non-decorated usage:
+```scala
+MetaphoneAlgorithm.compute("abc123")
+MetaphoneMetric.compare("abc123", "abc456")
+```
+
+---
-Basic examples with no filtering:
+Single filter, so that we only examine alphabetical characters:
```scala
-JaroWinklerMetric.compare("string1", "string2")
-JaroWinklerMetric().compare("string1", "string2")
-(new JaroWinklerMetric).compare("string1", "string2")
+(MetaphoneAlgorithm withTransform StringTransform.filterAlpha).compute("abc123")
+(MetaphoneMetric withTransform StringTransform.filterAlpha).compare("abc123", "abc456")
```
---
-Basic example with single filter:
+Functionally composed filter, so that we only examine uppercase characters:
```scala
-(new JaroWinklerMetric with IgnoreAsciiLetterCaseFilter).compare("string1", "string2")
+val composedTransform = (StringTransform.filterAlpha andThen StringTransform.filterUpperCase)
+
+(MetaphoneAlgorithm withTransform composedTransform).compute("abc123")
+(MetaphoneMetric withTransform composedTransform).compare("abc123", "abc456")
```
---
-Basic example with stacked filters. Filters are applied in reverse order:
+Make your own:
```scala
-(new JaroWinklerMetric with IgnoreAsciiLetterCaseFilter with AsciiLetterOnlyFilter).compare("string1", "string2")
+// StringTransform is a type alias for (Array[Char] => Array[Char])
+val myTransform: StringTransform = (ca) => ca.filter(_ == 'x')
+
+(MetaphoneAlgorithm withTransform myTransform).compute("abc123")
+(MetaphoneMetric withTransform myTransform).compare("abc123", "abc456")
```
+---
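As a rough illustration of what "rich wrapping via implicits" can look like, the sketch below adds a `withTransform` method to a toy metric through an implicit class. Everything except the `Array[Char] => Array[Char]` shape of `StringTransform` is an assumption: `TransformSketch`, `SimpleMetric`, `RichMetric`, `EqualityMetric`, `keepAlpha`, and `keepUpper` are hypothetical names, and the library's real decorators may be structured differently.

```scala
object TransformSketch {
  // Same shape as the alias described above: Array[Char] => Array[Char].
  type StringTransform = Array[Char] => Array[Char]

  // Hypothetical stand-ins for pre-built transforms.
  val keepAlpha: StringTransform = _.filter(_.isLetter)
  val keepUpper: StringTransform = _.filter(_.isUpper)

  // Hypothetical minimal metric interface, used only in this sketch.
  trait SimpleMetric[A] {
    def compare(a: Array[Char], b: Array[Char]): A
  }

  // Rich wrapper: any SimpleMetric gains withTransform through this implicit class.
  implicit class RichMetric[A](val underlying: SimpleMetric[A]) {
    def withTransform(t: StringTransform): SimpleMetric[A] = new SimpleMetric[A] {
      // Apply the transform to both inputs, then delegate to the wrapped metric.
      def compare(a: Array[Char], b: Array[Char]): A = underlying.compare(t(a), t(b))
    }
  }

  // Toy metric: true when both inputs hold the same characters in the same order.
  object EqualityMetric extends SimpleMetric[Boolean] {
    def compare(a: Array[Char], b: Array[Char]): Boolean = a.sameElements(b)
  }
}
```

After `import TransformSketch._`, `(EqualityMetric withTransform keepAlpha).compare("abc123".toCharArray, "abc456".toCharArray)` compares only the letters of each input. Because a transform is an ordinary function, `keepAlpha andThen keepUpper` composes two of them with no extra machinery.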
+
## Convenience objects
Convenience objects are available to make interactions with the library easier.
__StringAlgorithm:__
```scala
-// Easy access to compute methods.
StringAlgorithm.computeWithMetaphone("string")
```
+<sup>Located in the [algorithm module](https://github.com/rockymadden/stringmetric/blob/master/core/source/main/scala/com/rockymadden/stringmetric/Algorithm.scala).</sup>
---
__StringMetric:__
```scala
-// Easy access to compare methods.
-StringMetric.compareWithJaroWinkler("string1", "string2")
+StringMetric.compareWithJaccard(1)("abc123", "abc456")
+StringMetric.compareWithJaroWinkler("abc123", "abc456")
```
+<sup>Located in the [metric module](https://github.com/rockymadden/stringmetric/blob/master/core/source/main/scala/com/rockymadden/stringmetric/Metric.scala).</sup>
---