From 51b619fc868e67e388f7d8078a895a0c4f46a256 Mon Sep 17 00:00:00 2001
From: Rocky Madden
Date: Wed, 23 Jan 2013 13:39:07 -0700
Subject: Refactor of examples. Dice/Sorensen done.

---
 readme.md | 79 ++++++++-------------------------------------------------------
 1 file changed, 9 insertions(+), 70 deletions(-)

diff --git a/readme.md b/readme.md
index 1baf2ac..057bd32 100755
--- a/readme.md
+++ b/readme.md
@@ -1,5 +1,5 @@
 #stringmetric [![Build Status](https://travis-ci.org/rockymadden/stringmetric.png?branch=master)](http://travis-ci.org/rockymadden/stringmetric)
-String metrics and phonetic algorithms implemented in Scala. The library provides facilities to perform approximate string matching, measurement of string similarity/distance, and indexing by word pronunciation. In addition to the core library, each metric and algorithm has a command line interface. Both subprojects are heavily unit tested and performant (verified via microbenchmark suites).
+String metrics and phonetic algorithms implemented in Scala. The library provides facilities to perform approximate string matching, measurement of string similarity/distance, indexing by word pronunciation, and sounds-like comparisions. In addition to the core library, each metric and algorithm has a command line interface. Heavy emphasis is placed on unit testing and performance (verified via microbenchmark suites).
 
 ## Metrics and Algorithms
 * __[Dice / Sorensen](http://en.wikipedia.org/wiki/Dice%27s_coefficient)__ (Similarity metric)
@@ -16,80 +16,19 @@ String metrics and phonetic algorithms implemented in Scala. The library provide
 * __[Soundex](http://en.wikipedia.org/wiki/Soundex)__ (Phonetic metric and algorithm)
 * __Weighted Levenshtein__ (Similarity metric)
 
-## Using the Core
-Basic example with no filtering:
-```scala
-val distance = JaroWinklerMetric.compare("string1", "string2")
-
-if (distance >= 0.9) println("It's likely you're a match!")
-```
-
-Basic example with single filter:
-```scala
-val distance = JaroWinklerMetric.compare("string1", "string2")
-  (new StringFilterDelegate with AsciiLetterCaseStringFilter)
-
-if (distance >= 0.9) println("It's likely you're a match!")
-```
+## Similarity package
+Useful for approximate string matching and measurement of string distance. Most metrics calculate the similarity of two strings as a double with a value between 0 and 1. A value of 0 being completely different and a value of 1 being completely similar.
 
-Basic example with stacked filter. Filters are applied in reverse order:
+Dice / Sorensen Metric:
 ```scala
-val distance = JaroWinklerMetric.compare("string1", "string2")
-  (new StringFilterDelegate with AsciiLetterCaseStringFilter with AsciiLetterOnlyStringFilter)
-
-if (distance >= 0.9) println("It's likely you're a match!")
-```
-
-You can also use the StringMetric, StringAlgorithm, and StringFilter convenience objects:
-```scala
-if (StringMetric.compareWithJaroWinkler("string1", "string2") >= 0.9)
-  println("It's likely you're a match!")
-
-if (StringMetric.compareWithJaroWinkler("string1", "string2")(StringFilter.asciiLetterCase) >= 0.9)
-  println("It's likely you're a match!")
-```
-
-## Using the CLI
-The help option prints command syntax and usage:
-```shell
-$ metaphoneMetric --help
-Compares two strings to determine if they are phonetically similarly, per the Metaphone algorithm.
-
-Syntax:
-  metaphoneMetric [Options] string1 string2...
-
-Options:
-  -h, --help
-    Outputs description, syntax, and options.
-```
-
-```shell
-$ jaroWinklerMetric --help
-Compares two strings to calculate the Jaro-Winkler distance.
-
-Syntax:
-  jaroWinklerMetric [Options] string1 string2...
-
-Options:
-  -h, --help
-    Outputs description, syntax, and options.
-```
-
-Compare "dog" to "dawg":
-```shell
-$ metaphoneMetric dog dawg
-true
-```
-
-```shell
-$ jaroWinklerMetric dog dawg
-0.75
+println(DiceSorensenMetric.compare("night", "nacht"))
+println(DiceSorensenMetric.compare("context", "contact")
 ```
 
-Get the phonetic representation of "dog" using the Metaphone phonetic algorithm:
+Outputs:
 ```shell
-$ metaphoneAlgorithm dog
-tk
+0.6
+0.7142857142857143
 ```
 
 ## Testing
-- 
cgit v1.2.3