Class JaroWinkler
java.lang.Object
info.debatty.java.stringsimilarity.JaroWinkler
- All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable
@Immutable
public class JaroWinkler
extends Object
implements NormalizedStringSimilarity, NormalizedStringDistance
The Jaro-Winkler distance metric is designed for, and best suited to, comparing
short strings such as person names and detecting typos; it is (roughly) a
variation of Damerau-Levenshtein, in which substituting two close characters
counts less than substituting two characters that are far from each other.
Jaro-Winkler was developed in the area of record linkage (duplicate
detection) (Winkler, 1990). It returns a value in the interval [0.0, 1.0].
The distance is computed as 1 - Jaro-Winkler similarity.
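The description above states the metric only in words. The following is a minimal, self-contained sketch of the standard Jaro and Jaro-Winkler computation, not the library's own source. The class name JaroWinklerSketch is hypothetical, and three details are assumptions taken from Winkler's usual formulation: a prefix scaling factor p = 0.1, a common prefix capped at four characters, and a bonus that is added only when the Jaro similarity is strictly greater than the threshold.

```java
/**
 * Hypothetical sketch of Jaro-Winkler similarity (not the library's code).
 * Assumptions: p = 0.1, prefix capped at 4, bonus only when jaro > threshold.
 */
final class JaroWinklerSketch {
    private static final double SCALING = 0.1; // assumed Winkler prefix scale p
    private static final int MAX_PREFIX = 4;   // assumed common-prefix cap
    private final double threshold;

    JaroWinklerSketch(double threshold) {
        this.threshold = threshold;
    }

    /** Plain Jaro similarity in [0, 1]. */
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        int len1 = s1.length();
        int len2 = s2.length();
        // Two equal characters "match" only if they are at most `window` positions apart.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int start = Math.max(0, i - window);
            int end = Math.min(len2 - 1, i + window);
            for (int j = start; j <= end; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk both sets of matched characters in order and count mismatches.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (matched1[i]) {
                while (!matched2[k]) {
                    k++;
                }
                if (s1.charAt(i) != s2.charAt(k)) {
                    outOfOrder++;
                }
                k++;
            }
        }
        double m = matches;
        double t = outOfOrder / 2.0; // number of transpositions
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    /** Jaro-Winkler similarity; throws NullPointerException on null input. */
    public double similarity(String s1, String s2) {
        if (s1 == null || s2 == null) {
            throw new NullPointerException("s1 and s2 must not be null");
        }
        double j = jaro(s1, s2);
        if (j <= threshold) {
            return j; // not above the threshold: no Winkler bonus
        }
        int prefix = 0;
        int limit = Math.min(MAX_PREFIX, Math.min(s1.length(), s2.length()));
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * SCALING * (1.0 - j);
    }

    /** Distance defined as 1 - similarity. */
    public double distance(String s1, String s2) {
        return 1.0 - similarity(s1, s2);
    }
}
```

For example, new JaroWinklerSketch(0.7).similarity("MARTHA", "MARHTA") evaluates to about 0.9611 (a Jaro similarity of 17/18 plus the bonus for the three-character prefix "MAR"), so distance returns about 0.0389.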
Field Summary
Fields (modifier and type, field):
- private static final double DEFAULT_THRESHOLD
- private static final double JW_COEF
- private static final int THREE
- private final double threshold
Constructor Summary
Constructors (constructor, description):
- JaroWinkler(): Instantiate with the default threshold (0.7).
- JaroWinkler(double threshold): Instantiate with the given threshold to determine when the Winkler bonus should be used.
Method Summary
Methods (modifier and type, method, description):
- final double distance(String s1, String s2): Return 1 - similarity.
- final double getThreshold(): Returns the current value of the threshold used for adding the Winkler bonus.
- private int[] matches
- final double similarity(String s1, String s2): Compute Jaro-Winkler similarity.
Field Details
DEFAULT_THRESHOLD
private static final double DEFAULT_THRESHOLD
THREE
private static final int THREE
JW_COEF
private static final double JW_COEF
threshold
private final double threshold
Constructor Details
JaroWinkler
public JaroWinkler()
Instantiate with the default threshold (0.7).
JaroWinkler
public JaroWinkler(double threshold)
Instantiate with the given threshold to determine when the Winkler bonus should be used. Set threshold to a negative value to get the Jaro distance.
- Parameters:
threshold - the threshold used to decide whether the Winkler bonus is added
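The threshold only decides whether the prefix bonus is added to an already computed Jaro similarity. The following is a minimal sketch of that gate with a hypothetical helper, WinklerBonus.apply. It assumes, following Winkler's boost-threshold rule, that the bonus is added only when the Jaro similarity is strictly greater than the threshold and that the common prefix is capped at four characters; the library's exact comparison may differ from this sketch.

```java
/** Hypothetical helper showing how a threshold gates the Winkler bonus. */
final class WinklerBonus {
    private WinklerBonus() {
    }

    /**
     * Adds the Winkler prefix bonus to a precomputed Jaro similarity.
     * Assumptions: bonus only when jaro > threshold; prefix capped at 4;
     * p is the prefix scaling factor (0.1 in Winkler's formulation).
     */
    static double apply(double jaro, int commonPrefix, double p, double threshold) {
        if (jaro <= threshold) {
            return jaro; // gate closed: plain Jaro similarity is returned
        }
        int l = Math.min(commonPrefix, 4);
        // Moves the score toward 1 in proportion to the shared prefix length.
        return jaro + l * p * (1.0 - jaro);
    }
}
```

With MARTHA and MARHTA (Jaro similarity 17/18, common prefix 3), apply(17.0 / 18, 3, 0.1, 0.7) gives about 0.9611, while a threshold of 0.95 returns the Jaro value unchanged because 17/18 ≈ 0.9444 does not exceed it.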
Method Details
getThreshold
public final double getThreshold()
Returns the current value of the threshold used for adding the Winkler bonus. The default value is 0.7.
- Returns:
the current value of the threshold
similarity
public final double similarity(String s1, String s2)
Compute Jaro-Winkler similarity.
- Specified by:
similarity in interface StringSimilarity
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
The Jaro-Winkler similarity in the range [0, 1].
- Throws:
NullPointerException - if s1 or s2 is null.
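For reference, the standard Jaro and Jaro-Winkler definitions are sketched below in Winkler's usual formulation; the library's exact constants (for example the value behind JW_COEF) are not stated on this page and may differ. Here m is the number of matching characters (equal characters at most ⌊max(|s1|, |s2|)/2⌋ - 1 positions apart), t is half the number of matched characters that appear in a different order, ℓ is the length of the common prefix (at most 4), p is the prefix scaling factor (0.1 in Winkler's formulation), and θ is the threshold; the strict comparison with θ is an assumption.

```latex
\mathrm{sim}_j =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\qquad
\mathrm{sim}_w =
\begin{cases}
\mathrm{sim}_j + \ell\, p\, (1 - \mathrm{sim}_j) & \text{if } \mathrm{sim}_j > \theta,\\[4pt]
\mathrm{sim}_j & \text{otherwise,}
\end{cases}
\qquad
d_w = 1 - \mathrm{sim}_w
```

Because ℓp ≤ 4 · 0.1 < 1 and 1 - sim_j ≥ 0, the bonus never lifts sim_w above 1, which is consistent with the [0, 1] range. For example, MARTHA and MARHTA have m = 6 and t = 1, so sim_j = (1 + 1 + 5/6)/3 = 17/18 ≈ 0.9444; with ℓ = 3 and p = 0.1, sim_w = 17/18 + 0.3 · 1/18 ≈ 0.9611.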
distance
public final double distance(String s1, String s2)
Return 1 - similarity.
- Specified by:
distance in interface StringDistance
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
1 - similarity.
- Throws:
NullPointerException - if s1 or s2 is null.
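The distance is documented as 1 - similarity, with the same NullPointerException contract. As a minimal illustration of that relationship (not the library's code), the hypothetical interface NormalizedSimilaritySketch derives the distance with a Java 8 default method, and the hypothetical ExactMatch implementation stands in for a real similarity measure.

```java
import java.util.Objects;

/** Hypothetical interface: a normalized similarity yields a distance as 1 - similarity. */
interface NormalizedSimilaritySketch {
    /** Similarity in [0, 1]; implementations throw NullPointerException on null input. */
    double similarity(String s1, String s2);

    /** Distance in [0, 1], derived from the similarity; null checks happen in similarity. */
    default double distance(String s1, String s2) {
        return 1.0 - similarity(s1, s2);
    }
}

/** Hypothetical toy implementation: 1.0 for equal strings, 0.0 otherwise. */
final class ExactMatch implements NormalizedSimilaritySketch {
    @Override
    public double similarity(String s1, String s2) {
        Objects.requireNonNull(s1, "s1 must not be null");
        Objects.requireNonNull(s2, "s2 must not be null");
        return s1.equals(s2) ? 1.0 : 0.0;
    }
}
```

A default method keeps the 1 - similarity rule in one place for every implementation. The class documented here instead declares distance as a final method, which prevents subclasses from overriding it and breaking the relationship.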
matches