Class SorensenDice
java.lang.Object
info.debatty.java.stringsimilarity.ShingleBased
info.debatty.java.stringsimilarity.SorensenDice
- All Implemented Interfaces:
NormalizedStringDistance,NormalizedStringSimilarity,StringDistance,StringSimilarity,Serializable
@Immutable
public class SorensenDice
extends ShingleBased
implements NormalizedStringDistance, NormalizedStringSimilarity
Similar to the Jaccard index, but the similarity is computed as 2 * |V1
inter V2| / (|V1| + |V2|). Distance is computed as 1 - similarity.
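The formula can be illustrated with a minimal, self-contained sketch. It does not use this class or reproduce its implementation: the names DiceSketch, shingles and similarity are hypothetical, the shingle sets are built with plain substring calls, and any input preprocessing done by the library is ignored.

```java
import java.util.HashSet;
import java.util.Set;

public class DiceSketch {

    // Collect the distinct k-character substrings (k-shingles) of s.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 2 * |V1 inter V2| / (|V1| + |V2|), where V1 and V2 are shingle sets.
    // Assumption: at least one string has length >= k; otherwise both sets
    // are empty and this sketch returns NaN (0.0 / 0).
    static double similarity(String s1, String s2, int k) {
        Set<String> v1 = shingles(s1, k);
        Set<String> v2 = shingles(s2, k);
        Set<String> inter = new HashSet<>(v1);
        inter.retainAll(v2); // keep only the shingles present in both sets
        return 2.0 * inter.size() / (v1.size() + v2.size());
    }

    public static void main(String[] args) {
        // "ABCDE"  -> {AB, BC, CD, DE}      (4 shingles)
        // "ABCDFG" -> {AB, BC, CD, DF, FG}  (5 shingles), 3 shared
        double sim = similarity("ABCDE", "ABCDFG", 2);
        System.out.println("similarity = " + sim);         // 2 * 3 / (4 + 5)
        System.out.println("distance   = " + (1.0 - sim)); // 1 - similarity
    }
}
```

With k = 2 the two strings share 3 of their 4 and 5 shingles, so the similarity is 2 * 3 / 9 = 2/3 and the distance is 1/3.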
Constructor Summary
Constructors
Constructor - Description
SorensenDice() - Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index.
SorensenDice(int k) - Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index.
-
Method Summary
Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
getK, getProfile
-
Constructor Details
-
SorensenDice
public SorensenDice(int k)
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality.
- Parameters:
k - the shingle size (the number of characters in each shingle).
-
SorensenDice
public SorensenDice()
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality. Default k is 3.
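The triangle-inequality warning can be checked with a small counterexample. The sketch below is self-contained and does not call this class: TriangleCheck, shingles and distance are hypothetical names, and k = 1 is chosen only to keep the shingle sets tiny.

```java
import java.util.HashSet;
import java.util.Set;

public class TriangleCheck {

    // Collect the distinct k-character substrings (k-shingles) of s.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 1 - 2 * |A inter B| / (|A| + |B|) on the shingle sets of s1 and s2.
    // Assumption: at least one string has length >= k.
    static double distance(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return 1.0 - 2.0 * inter.size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        int k = 1;
        // {a} and {b} share nothing: distance 1.
        double direct = distance("a", "b", k);
        // Going through "ab" = {a, b}: each leg is 1 - 2 * 1 / 3 = 1/3.
        double detour = distance("a", "ab", k) + distance("ab", "b", k);
        // The triangle inequality would require direct <= detour.
        System.out.println(direct > detour); // prints "true"
    }
}
```

Here d("a", "b") = 1 while d("a", "ab") + d("ab", "b") = 2/3, so the direct distance exceeds the detour. Algorithms that assume a true metric, such as some nearest-neighbor index structures, should therefore not rely on this distance.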
-
-
Method Details
-
similarity
Similarity is computed as 2 * |A inter B| / (|A| + |B|).
- Specified by:
similarity in interface StringSimilarity
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
- The computed Sorensen-Dice similarity.
- Throws:
NullPointerException - if s1 or s2 is null.
-
distance
Returns 1 - similarity.
- Specified by:
distance in interface StringDistance
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
- 1.0 - the computed similarity
- Throws:
NullPointerException - if s1 or s2 is null.
-