# stringmetric
A collection of string metrics and phonetic algorithms implemented in Scala. Every phonetic string metric has a standalone algorithm counterpart, which returns the phonetic representation of the argument passed rather than evaluating whether two arguments sound the same. __Each metric and algorithm has a CLI.__
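
To make the distinction between a phonetic algorithm and a phonetic metric concrete, here is a minimal self-contained sketch using Soundex. It does not use this library: `SoundexSketch`, `encode`, and `compare` are hypothetical names chosen for illustration, and the library's own implementation and return types may differ.

```scala
object SoundexSketch {
  // Standard Soundex digit groups; vowels, h, w and y carry no digit.
  private val codes: Map[Char, Char] =
    Seq("bfpv" -> '1', "cgjkqsxz" -> '2', "dt" -> '3', "l" -> '4', "mn" -> '5', "r" -> '6')
      .flatMap { case (letters, digit) => letters.map(c => c -> digit) }
      .toMap

  // "Algorithm": one argument in, its phonetic representation out.
  def encode(s: String): Option[String] = {
    val letters = s.toLowerCase.filter(c => c >= 'a' && c <= 'z')
    if (letters.isEmpty) None
    else {
      val sb = new StringBuilder
      sb.append(letters.head.toUpper)
      var last: Option[Char] = codes.get(letters.head)
      for (c <- letters.tail if sb.length < 4) {
        val code = codes.get(c)
        // Emit a digit only when it differs from the previous letter's digit.
        if (code.isDefined && code != last) sb.append(code.get)
        // h and w do not separate equal digits; vowels and y do.
        if (c != 'h' && c != 'w') last = code
      }
      Some(sb.toString.padTo(4, '0'))
    }
  }

  // "Metric": two arguments in, whether they share a phonetic representation out.
  def compare(a: String, b: String): Option[Boolean] =
    for (x <- encode(a); y <- encode(b)) yield x == y
}
```

In this sketch `SoundexSketch.encode("Robert")` yields `Some("R163")`, while `SoundexSketch.compare("Robert", "Rupert")` yields `Some(true)` because both names encode to the same value.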

## Metrics and Phonetic Algorithms
* __Dice / Sorensen__
	* API: org.hashtree.stringmetric.similarity.DiceSorensenMetric
	* CLI: diceSorensenMetric
* __Hamming__
	* API: org.hashtree.stringmetric.similarity.HammingMetric
	* CLI: hammingMetric
* __Jaro__
	* API: org.hashtree.stringmetric.similarity.JaroMetric
	* CLI: jaroMetric
* __Jaro-Winkler__
	* API: org.hashtree.stringmetric.similarity.JaroWinklerMetric
	* CLI: jaroWinklerMetric
* __Levenshtein__
	* API: org.hashtree.stringmetric.similarity.LevenshteinMetric
	* CLI: levenshteinMetric
* __Metaphone__
	* API: org.hashtree.stringmetric.phonetic.MetaphoneMetric and org.hashtree.stringmetric.phonetic.MetaphoneAlgorithm
	* CLI: metaphoneMetric and metaphoneAlgorithm
* __N-Gram__
	* API: org.hashtree.stringmetric.similarity.NGramMetric and org.hashtree.stringmetric.similarity.NGramAlgorithm
	* CLI: nGramMetric and nGramAlgorithm
* __NYSIIS__
	* API: org.hashtree.stringmetric.phonetic.NysiisMetric and org.hashtree.stringmetric.phonetic.NysiisAlgorithm
	* CLI: nysiisMetric and nysiisAlgorithm
* __Refined Soundex__
	* API: org.hashtree.stringmetric.phonetic.RefinedSoundexMetric and org.hashtree.stringmetric.phonetic.RefinedSoundexAlgorithm
	* CLI: refinedSoundexMetric and refinedSoundexAlgorithm
* __Soundex__
	* API: org.hashtree.stringmetric.phonetic.SoundexMetric and org.hashtree.stringmetric.phonetic.SoundexAlgorithm
	* CLI: soundexMetric and soundexAlgorithm
* __Weighted Levenshtein__
	* API: org.hashtree.stringmetric.similarity.WeightedLevenshteinMetric
	* CLI: weightedLevenshteinMetric
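
As a rough illustration of what a similarity metric computes, the sketch below implements the Dice / Sorensen coefficient over character bigrams. It is self-contained and does not call this library: `DiceSorensenSketch` and its `compare` signature are assumptions made for illustration, and the library may treat edge cases such as whitespace or very short strings differently.

```scala
object DiceSorensenSketch {
  // Adjacent character pairs, e.g. "night" gives ni, ig, gh, ht.
  private def bigrams(s: String): List[String] =
    s.sliding(2).filter(_.length == 2).toList

  // 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|), counting duplicates.
  // Returns None when either argument is too short to form a bigram.
  def compare(a: String, b: String): Option[Double] = {
    val x = bigrams(a)
    val y = bigrams(b)
    if (x.isEmpty || y.isEmpty) None
    else {
      val shared = x.intersect(y).size // List.intersect is a multiset intersection
      Some(2.0 * shared / (x.size + y.size))
    }
  }
}
```

The result lies between 0.0 (no bigrams in common) and 1.0 (identical bigram multisets), which is the kind of score a threshold comparison is applied to.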

## Filters
Filters, which can optionally be applied, clean up arguments prior to evaluation. Filtering rules can be composed via trait stacking.

* __Ensure only ASCII control characters matter__
	* API: org.hashtree.stringmetric.filter.AsciiControlOnlyStringFilter
* __Ensure ASCII controls do not matter__
	* API: org.hashtree.stringmetric.filter.AsciiControlStringFilter
* __Ensure ASCII letter case-sensitivity does not matter__
	* API: org.hashtree.stringmetric.filter.AsciiLetterCaseStringFilter
* __Ensure only ASCII letters and numbers matter__
	* API: org.hashtree.stringmetric.filter.AsciiLetterNumberOnlyStringFilter
* __Ensure ASCII letters and numbers do not matter__
	* API: org.hashtree.stringmetric.filter.AsciiLetterNumberStringFilter
* __Ensure only ASCII letters matter__
	* API: org.hashtree.stringmetric.filter.AsciiLetterOnlyStringFilter
* __Ensure ASCII letters do not matter__
	* API: org.hashtree.stringmetric.filter.AsciiLetterStringFilter
* __Ensure only ASCII numbers matter__
	* API: org.hashtree.stringmetric.filter.AsciiNumberOnlyStringFilter
* __Ensure ASCII numbers do not matter__
	* API: org.hashtree.stringmetric.filter.AsciiNumberStringFilter
* __Ensure ASCII spaces do not matter__
	* API: org.hashtree.stringmetric.filter.AsciiSpaceStringFilter
* __Ensure only ASCII symbols matter__
	* API: org.hashtree.stringmetric.filter.AsciiSymbolOnlyStringFilter
* __Ensure ASCII symbols do not matter__
	* API: org.hashtree.stringmetric.filter.AsciiSymbolStringFilter
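
The trait stacking mentioned at the top of this section can be sketched with Scala's stackable trait pattern. The code below is self-contained and does not use this library: `FilterStackingSketch`, `Filter`, `PassThrough`, `IgnoreCase`, and `LettersOnly` are hypothetical stand-ins, and the `applied` buffer exists only to make the order of application visible.

```scala
import scala.collection.mutable.ListBuffer

object FilterStackingSketch {
  // Hypothetical stand-in for a filter; not the library's API.
  trait Filter {
    // Records which filters ran, in order, purely for demonstration.
    val applied: ListBuffer[String] = ListBuffer.empty[String]
    def filter(s: String): String
  }

  // Base of the stack: returns its input unchanged.
  class PassThrough extends Filter {
    def filter(s: String): String = s
  }

  // Stackable trait: lowercases, then delegates to the next filter in the stack.
  trait IgnoreCase extends Filter {
    abstract override def filter(s: String): String = {
      applied += "IgnoreCase"
      super.filter(s.toLowerCase)
    }
  }

  // Stackable trait: keeps only ASCII letters, then delegates.
  trait LettersOnly extends Filter {
    abstract override def filter(s: String): String = {
      applied += "LettersOnly"
      super.filter(s.filter(c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
    }
  }

  // Traits are linearized right to left, so LettersOnly runs before IgnoreCase.
  def stacked(): Filter = new PassThrough with IgnoreCase with LettersOnly
}
```

With this sketch, `FilterStackingSketch.stacked().filter("Hello, World 42!")` returns `"helloworld"`, and the trait mixed in last is the one applied first.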

## Building the API
    gradle :stringmetric-core:jar

## Building the CLI
    gradle :stringmetric-cli:tar

## Using the API
The simplest unfiltered example uses the StringMetric convenience object.

    import org.hashtree.stringmetric.StringMetric

    if (StringMetric.compareJaroWinkler("string1", "string2") >= 0.9)
        println("It's likely you're a match!")

The simplest single-filter example uses the StringMetric and StringFilter convenience objects.

    import org.hashtree.stringmetric.{ StringFilter, StringMetric }

    if (StringMetric.compareJaroWinkler("string1", "string2")(StringFilter.asciiLetterCase) >= 0.9)
        println("It's likely you're a match!")

Basic example with no filtering.

    import org.hashtree.stringmetric.similarity.JaroWinklerMetric

    // Compare the two strings directly with the Jaro-Winkler metric.
    val distance = JaroWinklerMetric.compare("string1", "string2")

    if (distance >= 0.9) println("It's likely you're a match!")

Basic example with a single filter.

    import org.hashtree.stringmetric.similarity.{ JaroWinklerMetric, StringFilterDelegate }
    import org.hashtree.stringmetric.filter.AsciiLetterCaseStringFilter

    // The opening parenthesis stays on the first line so the filter is parsed as a second argument list.
    val distance = JaroWinklerMetric.compare("string1", "string2")(
        new StringFilterDelegate with AsciiLetterCaseStringFilter
    )

    if (distance >= 0.9) println("It's likely you're a match!")

Basic example with stacked filters. Filters are applied in reverse order, so the last trait mixed in runs first.

    import org.hashtree.stringmetric.similarity.{ JaroWinklerMetric, StringFilterDelegate }
    import org.hashtree.stringmetric.filter.{ AsciiLetterCaseStringFilter, AsciiLetterOnlyStringFilter }

    // The opening parenthesis stays on the first line so the filter is parsed as a second argument list.
    val distance = JaroWinklerMetric.compare("string1", "string2")(
        new StringFilterDelegate with AsciiLetterCaseStringFilter with AsciiLetterOnlyStringFilter
    )

    if (distance >= 0.9) println("It's likely you're a match!")
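
The examples above pass the filter in a second argument list and omit that list when no filtering is wanted. One way to build such a signature in Scala is an implicit parameter that falls back to a pass-through filter. The sketch below shows that pattern with a simple Hamming distance. Every name in it (`CurriedFilterSketch`, `Filter`, `identityFilter`, `ignoreCase`, `hamming`) is hypothetical, and it makes no claim about how this library is implemented.

```scala
object CurriedFilterSketch {
  // Hypothetical filter type; not the library's API.
  trait Filter { def apply(s: String): String }

  object Filter {
    // Found through the companion's implicit scope when the caller passes no filter.
    implicit val identityFilter: Filter = new Filter {
      def apply(s: String): String = s
    }
    // An explicit alternative that makes the comparison case-insensitive.
    val ignoreCase: Filter = new Filter {
      def apply(s: String): String = s.toLowerCase
    }
  }

  // Hamming distance: number of positions at which the filtered strings differ.
  // Returns None for empty or unequal-length input, where the distance is undefined.
  def hamming(a: String, b: String)(implicit f: Filter): Option[Int] = {
    val x = f(a)
    val y = f(b)
    if (x.isEmpty || x.length != y.length) None
    else Some(x.zip(y).count { case (c, d) => c != d })
  }
}
```

Calling `CurriedFilterSketch.hamming("ABC", "abd")` compares the raw strings and yields `Some(3)`, while `CurriedFilterSketch.hamming("ABC", "abd")(CurriedFilterSketch.Filter.ignoreCase)` yields `Some(1)`.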

## Using the CLI
Uncompress the built tar and make sure you have permission to execute the commands. Then run the metric or algorithm of choice from the command line:

The help option prints command syntax and usage.

    jaroWinklerMetric --help
    metaphoneMetric --help
    metaphoneAlgorithm --help

Compare "abc" to "xyz" using the Jaro-Winkler metric.

    jaroWinklerMetric abc xyz

Compare "abc" to "xyz" using the Metaphone metric.

    metaphoneMetric abc xyz

Get the phonetic representation of "abc" using the Metaphone phonetic algorithm.

    metaphoneAlgorithm abc

## Requirements
* Scala 2.9.2
* Gradle 1.0 or above

## Versioning
Semantic Versioning 2.0.0

## License
Apache License, Version 2.0