Package org.languagetool.language
Class LanguageIdentifier
- java.lang.Object
-
- org.languagetool.language.LanguageIdentifier
-
public class LanguageIdentifier extends java.lang.Object
Identify the language of a text. Note that some languages might never be detected because they are close to another language. Language variants like en-US or en-GB are not detected; the result will be en for those. By default, only the first 1000 characters of a text are considered. Email signatures that use \n-- \n as a delimiter are ignored.
- Since:
- 2.9
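The preprocessing described above (dropping an email signature and limiting the input length) can be sketched with the standard library alone. This is a hypothetical illustration, not LanguageTool's code: the class name DetectionInputSketch, the method prepare, and the order of the two steps (signature first, then truncation) are assumptions.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of LanguageTool. */
final class DetectionInputSketch {

  // Assumed delimiter: a line that consists of exactly "-- " (dash, dash, space).
  private static final Pattern SIGNATURE_DELIMITER = Pattern.compile("\n-- \n");

  /**
   * Drops everything from the first signature delimiter onward,
   * then keeps at most maxLength characters of what remains.
   */
  static String prepare(String text, int maxLength) {
    // A limit of 2 splits at the first delimiter only; parts[0] is the text before it.
    String[] parts = SIGNATURE_DELIMITER.split(text, 2);
    String body = parts[0];
    return body.length() > maxLength ? body.substring(0, maxLength) : body;
  }

  public static void main(String[] args) {
    String mail = "Hello, how are you?\n-- \nJane Doe\nExample Corp";
    System.out.println(prepare(mail, 1000));
  }
}
```

Text without a delimiter passes through unchanged apart from the length limit.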
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class Description
(package private) class LanguageIdentifier.RemoveEMailSignatureFilter
-
Field Summary
Fields
Modifier and Type Field Description
private static int CONSIDER_ONLY_PREFERRED_THRESHOLD
private static java.util.List<java.lang.String> externalLangCodes
private boolean fasttextEnabled
private java.io.BufferedReader fasttextIn
private java.io.BufferedWriter fasttextOut
private java.lang.Process fasttextProcess
private static java.util.List<java.lang.String> ignoreLangCodes
private static int K_HIGHEST_SCORES
private com.optimaize.langdetect.LanguageDetector languageDetector
private static org.slf4j.Logger logger
private int maxLength
private static double MINIMAL_CONFIDENCE
private static int SHORT_ALGO_THRESHOLD
private static java.util.regex.Pattern SIGNATURE
private com.optimaize.langdetect.text.TextObjectFactory textObjectFactory
private static float THRESHOLD
-
Constructor Summary
Constructors
Constructor Description
LanguageIdentifier()
LanguageIdentifier(int maxLength)
-
Method Summary
Methods
Modifier and Type Method Description
private boolean canLanguageBeDetected(java.lang.String langCode, java.util.List<java.lang.String> additionalLanguageCodes)
@Nullable Language detectLanguage(java.lang.String text)
@Nullable DetectedLanguage detectLanguage(java.lang.String text, java.util.List<java.lang.String> noopLangsTmp, java.util.List<java.lang.String> preferredLangsTmp)
private java.util.Map.Entry<java.lang.String,java.lang.Double> detectLanguageCode(java.lang.String text)
(package private) @Nullable DetectedLanguage detectLanguageWithDetails(java.lang.String text)
void enableFasttext(java.io.File fasttextBinary, java.io.File fasttextModel)
private java.util.Map.Entry<java.lang.String,java.lang.Double> getHighestScoringResult(java.util.Map<java.lang.String,java.lang.Double> probs)
private static java.util.List<java.lang.String> getLanguageCodes()
private java.util.List<com.optimaize.langdetect.profiles.LanguageProfile> loadProfiles(java.util.List<java.lang.String> langCodes)
private java.util.Map<java.lang.String,java.lang.Double> runFasttext(java.lang.String text, java.util.List<java.lang.String> additionalLanguageCodes)
private void startFasttext(java.io.File modelPath, java.io.File binaryPath)
-
-
-
Field Detail
-
logger
private static final org.slf4j.Logger logger
-
MINIMAL_CONFIDENCE
private static final double MINIMAL_CONFIDENCE
- See Also:
- Constant Field Values
-
K_HIGHEST_SCORES
private static final int K_HIGHEST_SCORES
- See Also:
- Constant Field Values
-
SHORT_ALGO_THRESHOLD
private static final int SHORT_ALGO_THRESHOLD
- See Also:
- Constant Field Values
-
CONSIDER_ONLY_PREFERRED_THRESHOLD
private static final int CONSIDER_ONLY_PREFERRED_THRESHOLD
- See Also:
- Constant Field Values
-
SIGNATURE
private static final java.util.regex.Pattern SIGNATURE
-
ignoreLangCodes
private static final java.util.List<java.lang.String> ignoreLangCodes
-
externalLangCodes
private static final java.util.List<java.lang.String> externalLangCodes
-
THRESHOLD
private static final float THRESHOLD
- See Also:
- Constant Field Values
-
languageDetector
private final com.optimaize.langdetect.LanguageDetector languageDetector
-
textObjectFactory
private final com.optimaize.langdetect.text.TextObjectFactory textObjectFactory
-
maxLength
private final int maxLength
-
fasttextEnabled
private boolean fasttextEnabled
-
fasttextProcess
private java.lang.Process fasttextProcess
-
fasttextIn
private java.io.BufferedReader fasttextIn
-
fasttextOut
private java.io.BufferedWriter fasttextOut
-
-
Constructor Detail
-
LanguageIdentifier
public LanguageIdentifier()
-
LanguageIdentifier
public LanguageIdentifier(int maxLength)
- Parameters:
maxLength - the maximum number of characters that will be considered - can help with performance. Don't use values below 100, as this would decrease accuracy.
- Throws:
java.lang.IllegalArgumentException - if maxLength is less than 10
- Since:
- 4.2
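The documented contract for maxLength (an IllegalArgumentException for values below 10) can be illustrated with a small stand-in class. MaxLengthSketch and its exception message are hypothetical; only the threshold of 10 and the exception type come from the documentation above.

```java
/** Hypothetical stand-in for illustration; not the LanguageTool class. */
final class MaxLengthSketch {

  private final int maxLength;

  MaxLengthSketch(int maxLength) {
    // Documented contract: values below 10 are rejected.
    if (maxLength < 10) {
      throw new IllegalArgumentException("maxLength must be >= 10: " + maxLength);
    }
    this.maxLength = maxLength;
  }

  int getMaxLength() {
    return maxLength;
  }

  public static void main(String[] args) {
    System.out.println(new MaxLengthSketch(1000).getMaxLength());
    try {
      new MaxLengthSketch(9);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Values from 10 to 99 are accepted by this check even though the documentation advises against them for accuracy reasons.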
-
-
Method Detail
-
enableFasttext
public void enableFasttext(java.io.File fasttextBinary, java.io.File fasttextModel)
-
getLanguageCodes
private static java.util.List<java.lang.String> getLanguageCodes()
-
loadProfiles
private java.util.List<com.optimaize.langdetect.profiles.LanguageProfile> loadProfiles(java.util.List<java.lang.String> langCodes) throws java.io.IOException
- Throws:
java.io.IOException
-
detectLanguage
@Nullable public Language detectLanguage(java.lang.String text)
- Returns:
- language or null if language could not be identified
-
detectLanguageWithDetails
@Nullable @Experimental DetectedLanguage detectLanguageWithDetails(java.lang.String text)
- Returns:
- language or null if language could not be identified
-
detectLanguage
@Nullable public DetectedLanguage detectLanguage(java.lang.String text, java.util.List<java.lang.String> noopLangsTmp, java.util.List<java.lang.String> preferredLangsTmp)
- Parameters:
noopLangsTmp - list of codes that are detected but will lead to the NoopLanguage that has no rules
- Returns:
- language or null if language could not be identified
- Since:
- 4.4 (new parameter noopLangs, changed return type to DetectedLanguage)
-
canLanguageBeDetected
private boolean canLanguageBeDetected(java.lang.String langCode, java.util.List<java.lang.String> additionalLanguageCodes)
-
startFasttext
private void startFasttext(java.io.File modelPath, java.io.File binaryPath) throws java.io.IOException
- Throws:
java.io.IOException
-
getHighestScoringResult
private java.util.Map.Entry<java.lang.String,java.lang.Double> getHighestScoringResult(java.util.Map<java.lang.String,java.lang.Double> probs)
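The signature suggests that this method picks the best entry from a map of language codes to scores. A minimal sketch of that selection is shown here; it is not the library's implementation. The class name HighestScoreSketch, the null result for an empty map, and the tie-breaking behaviour (the first entry in iteration order wins) are assumptions.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of LanguageTool. */
final class HighestScoreSketch {

  /** Returns the entry with the largest score, or null if the map is empty (assumed behaviour). */
  static Map.Entry<String, Double> highest(Map<String, Double> probs) {
    Map.Entry<String, Double> best = null;
    for (Map.Entry<String, Double> entry : probs.entrySet()) {
      // Strict comparison: on a tie, the entry seen first is kept.
      if (best == null || entry.getValue() > best.getValue()) {
        best = entry;
      }
    }
    // Copy the entry so the result does not depend on the backing map.
    return best == null ? null : new AbstractMap.SimpleImmutableEntry<>(best);
  }

  public static void main(String[] args) {
    Map<String, Double> probs = new LinkedHashMap<>();
    probs.put("de", 0.2);
    probs.put("en", 0.7);
    probs.put("fr", 0.1);
    Map.Entry<String, Double> best = highest(probs);
    System.out.println(best.getKey() + " " + best.getValue());
  }
}
```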
-
runFasttext
private java.util.Map<java.lang.String,java.lang.Double> runFasttext(java.lang.String text, java.util.List<java.lang.String> additionalLanguageCodes) throws java.io.IOException
- Throws:
java.io.IOException
-
detectLanguageCode
@Nullable private java.util.Map.Entry<java.lang.String,java.lang.Double> detectLanguageCode(java.lang.String text)
- Returns:
- language or null if language could not be identified
-
-