Package org.languagetool
Class AnalyzedSentence
java.lang.Object
    org.languagetool.AnalyzedSentence
public final class AnalyzedSentence
extends java.lang.Object
A sentence that has been tokenized and analyzed.
-
-
Field Summary
Fields
Modifier and Type | Field
private java.util.Set<java.lang.String> | lemmaSet
private AnalyzedTokenReadings[] | nonBlankPreDisambigTokens
private AnalyzedTokenReadings[] | nonBlankTokens
private AnalyzedTokenReadings[] | preDisambigTokens
private AnalyzedTokenReadings[] | tokens
private java.util.Set<java.lang.String> | tokenSet
private int[] | whPositions
-
Constructor Summary
Constructors
Modifier | Constructor | Description
public | AnalyzedSentence(AnalyzedTokenReadings[] tokens) | Creates an AnalyzedSentence from the given AnalyzedTokenReadings.
private | AnalyzedSentence(AnalyzedTokenReadings[] tokens, int[] mapping, AnalyzedTokenReadings[] nonBlankTokens, AnalyzedTokenReadings[] nonBlankPreDisambigTokens) |
public | AnalyzedSentence(AnalyzedTokenReadings[] tokens, AnalyzedTokenReadings[] preDisambigTokens) |
-
Method Summary
Methods (all are instance methods and concrete methods)
Modifier and Type | Method | Description
AnalyzedSentence | copy(AnalyzedSentence sentence) | Copies the given AnalyzedSentence and returns the copy.
boolean | equals(java.lang.Object o) |
java.lang.String | getAnnotations() | Get the disambiguator actions log.
java.util.Set<java.lang.String> | getLemmaSet() | Get the lowercase lemmas of this sentence in a set.
private java.util.Set<java.lang.String> | getLemmaSet(AnalyzedTokenReadings[] tokens) |
private @NotNull java.util.List<AnalyzedTokenReadings> | getNonBlankReadings(AnalyzedTokenReadings[] tokens, int whCounter, int nonWhCounter, int[] mapping) |
int | getOriginalPosition(int nonWhPosition) | Get the position of a non-whitespace token in the original sentence, i.e. the sentence including whitespace tokens.
AnalyzedTokenReadings[] | getPreDisambigTokens() |
AnalyzedTokenReadings[] | getPreDisambigTokensWithoutWhitespace() |
java.lang.String | getText() | Return the original text.
AnalyzedTokenReadings[] | getTokens() | Returns the AnalyzedTokenReadings of the analyzed text.
java.util.Set<java.lang.String> | getTokenSet() | Get the lowercase tokens of this sentence in a set.
private java.util.Set<java.lang.String> | getTokenSet(AnalyzedTokenReadings[] tokens) |
AnalyzedTokenReadings[] | getTokensWithoutWhitespace() | Returns the AnalyzedTokenReadings of the analyzed text, with whitespace tokens removed but with the artificial SENT_START token included.
int | hashCode() |
boolean | hasParagraphEndMark(Language lang) | Returns true if the sentence ends with a paragraph break.
java.lang.String | toShortString(java.lang.String readingDelimiter) | Return a string representation without chunk information.
java.lang.String | toString() |
java.lang.String | toString(java.lang.String readingDelimiter) | Return a string representation with chunk information.
private java.lang.String | toString(java.lang.String readingDelimiter, boolean includeChunks) |
(package private) java.lang.String | toTextString() | Return a string representation without any analysis information, just the original text.
-
-
-
Field Detail
-
tokens
private final AnalyzedTokenReadings[] tokens
-
preDisambigTokens
private final AnalyzedTokenReadings[] preDisambigTokens
-
nonBlankTokens
private final AnalyzedTokenReadings[] nonBlankTokens
-
nonBlankPreDisambigTokens
private final AnalyzedTokenReadings[] nonBlankPreDisambigTokens
-
whPositions
private final int[] whPositions
-
tokenSet
private final java.util.Set<java.lang.String> tokenSet
-
lemmaSet
private final java.util.Set<java.lang.String> lemmaSet
-
-
Constructor Detail
-
AnalyzedSentence
public AnalyzedSentence(AnalyzedTokenReadings[] tokens)
Creates an AnalyzedSentence from the given AnalyzedTokenReadings. Whitespace is also a token.
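To make "whitespace is also a token" concrete, here is a minimal sketch that splits a sentence into word, punctuation and whitespace tokens and prepends a stand-in for the artificial SENT_START token. It works on plain strings rather than AnalyzedTokenReadings; the class name WhitespaceTokenDemo, the regular expression, and the empty-string SENT_START placeholder are hypothetical and are not LanguageTool's actual tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT LanguageTool's tokenizer: it only illustrates a token
// list in which runs of whitespace are kept as tokens of their own.
class WhitespaceTokenDemo {
  // Stand-in for the artificial SENT_START token; assumed here to carry empty text.
  static final String SENT_START = "";

  // A run of letters/digits, a run of whitespace, or any other single character.
  private static final Pattern TOKEN =
      Pattern.compile("[\\p{L}\\p{N}]+|\\s+|.", Pattern.DOTALL);

  static List<String> tokenize(String sentence) {
    List<String> tokens = new ArrayList<>();
    tokens.add(SENT_START);
    Matcher m = TOKEN.matcher(sentence);
    while (m.find()) {
      tokens.add(m.group());
    }
    return tokens;
  }

  public static void main(String[] args) {
    // Brackets make the empty SENT_START stand-in and the space token visible.
    for (String token : tokenize("Hello, world!")) {
      System.out.println("[" + token + "]");
    }
  }
}
```

For "Hello, world!" the sketch yields six tokens: the empty SENT_START stand-in, "Hello", ",", " ", "world" and "!". Because the space is kept, concatenating the tokens gives back the input exactly.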
-
AnalyzedSentence
public AnalyzedSentence(AnalyzedTokenReadings[] tokens, AnalyzedTokenReadings[] preDisambigTokens)
-
AnalyzedSentence
private AnalyzedSentence(AnalyzedTokenReadings[] tokens, int[] mapping, AnalyzedTokenReadings[] nonBlankTokens, AnalyzedTokenReadings[] nonBlankPreDisambigTokens)
-
-
Method Detail
-
getNonBlankReadings
@NotNull private java.util.List<AnalyzedTokenReadings> getNonBlankReadings(AnalyzedTokenReadings[] tokens, int whCounter, int nonWhCounter, int[] mapping)
-
getTokenSet
private java.util.Set<java.lang.String> getTokenSet(AnalyzedTokenReadings[] tokens)
-
getLemmaSet
private java.util.Set<java.lang.String> getLemmaSet(AnalyzedTokenReadings[] tokens)
-
copy
public AnalyzedSentence copy(AnalyzedSentence sentence)
Copies the given AnalyzedSentence and returns the copy. This is useful, for example, for performing local immunization.
- Parameters:
sentence - AnalyzedSentence to be copied
- Returns:
- a new object which is a copy
- Since:
- 2.5
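Why a copy helps with local immunization can be sketched with a toy model: if every token object is copied, marking tokens in the copy leaves the original sentence untouched. The Token class, its immunized flag and the copy() helper are hypothetical stand-ins, not LanguageTool's AnalyzedTokenReadings; the sketch assumes only that "immunized" means a per-token mark that should not leak back into the original.

```java
// Hypothetical sketch of a per-token (deep) copy; not LanguageTool code.
class CopyDemo {
  // Stand-in for a token reading with a hypothetical "immunized" mark.
  static final class Token {
    final String text;
    boolean immunized; // hypothetical flag: "no rule should flag this token"

    Token(String text) {
      this.text = text;
    }

    // Copy constructor: a new object carrying the same state.
    Token(Token other) {
      this.text = other.text;
      this.immunized = other.immunized;
    }
  }

  // Copies every token, so changes to the copy do not affect the original array.
  static Token[] copy(Token[] sentence) {
    Token[] result = new Token[sentence.length];
    for (int i = 0; i < sentence.length; i++) {
      result[i] = new Token(sentence[i]);
    }
    return result;
  }

  public static void main(String[] args) {
    Token[] original = {new Token("New"), new Token(" "), new Token("York")};
    Token[] local = copy(original);
    local[2].immunized = true; // immunize only in the local copy
    System.out.println(original[2].immunized); // prints false
    System.out.println(local[2].immunized);    // prints true
  }
}
```

A shallow copy (copying only the array) would share the Token objects, so the mark would show up in the original as well; copying each element avoids that.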
-
getTokens
public AnalyzedTokenReadings[] getTokens()
Returns the AnalyzedTokenReadings of the analyzed text. Whitespace is also a token.
-
getPreDisambigTokens
@Experimental public AnalyzedTokenReadings[] getPreDisambigTokens()
- Since:
- 4.5
-
getTokensWithoutWhitespace
public AnalyzedTokenReadings[] getTokensWithoutWhitespace()
Returns the AnalyzedTokenReadings of the analyzed text, with whitespace tokens removed but with the artificial SENT_START token included.
-
getPreDisambigTokensWithoutWhitespace
@Experimental public AnalyzedTokenReadings[] getPreDisambigTokensWithoutWhitespace()
- Since:
- 4.5
-
getOriginalPosition
public int getOriginalPosition(int nonWhPosition)
Get the position of a non-whitespace token in the original sentence, i.e. the sentence including whitespace tokens.
- Parameters:
nonWhPosition - position of a non-whitespace token
- Returns:
- position in the original sentence.
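The relationship between getTokensWithoutWhitespace() and getOriginalPosition(int) can be modeled with a filtered array plus an index mapping. This is a simplified sketch on plain strings: the array name whPositions is borrowed from the field list above, but what that field actually stores in the library is an assumption here, as is the rule that a whitespace token is a non-empty, all-whitespace string (which keeps the empty SENT_START stand-in).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not LanguageTool code: a whitespace-free view of the tokens
// plus a mapping from each non-whitespace index back to the full token array.
class NonWhitespaceMappingDemo {
  final String[] tokens;         // all tokens, whitespace included
  final String[] nonBlankTokens; // whitespace tokens removed, SENT_START stand-in kept
  final int[] whPositions;       // whPositions[i] = index in tokens of nonBlankTokens[i]

  NonWhitespaceMappingDemo(String[] tokens) {
    this.tokens = tokens;
    List<String> kept = new ArrayList<>();
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      String t = tokens[i];
      // Assumption: whitespace token = non-empty and consisting only of whitespace.
      boolean isWhitespace = !t.isEmpty() && t.chars().allMatch(Character::isWhitespace);
      if (!isWhitespace) {
        kept.add(t);
        positions.add(i);
      }
    }
    this.nonBlankTokens = kept.toArray(new String[0]);
    this.whPositions = positions.stream().mapToInt(Integer::intValue).toArray();
  }

  // Index in the whitespace-free view -> index in the full token array.
  int getOriginalPosition(int nonWhPosition) {
    return whPositions[nonWhPosition];
  }

  public static void main(String[] args) {
    NonWhitespaceMappingDemo s = new NonWhitespaceMappingDemo(
        new String[] {"", "Hello", ",", " ", "world", "!"});
    // "world" is token 3 without whitespace but token 4 with whitespace.
    System.out.println(s.getOriginalPosition(3)); // prints 4
  }
}
```

Precomputing the mapping once makes each lookup a constant-time array access, at the cost of one extra int array per sentence.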
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
toShortString
public java.lang.String toShortString(java.lang.String readingDelimiter)
Return a string representation without chunk information.
- Since:
- 2.3
-
getText
public java.lang.String getText()
Return the original text.
- Since:
- 2.7
-
toTextString
java.lang.String toTextString()
Return a string representation without any analysis information, just the original text.
- Since:
- 2.6
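Because whitespace is kept as tokens, the original text can in principle be recovered by concatenating the token texts. The sketch below assumes that is roughly what toTextString() does and that the SENT_START token contributes no text; both are assumptions, and the class TextStringDemo is hypothetical. It also shows what would be lost if only the whitespace-free tokens were joined.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the library's implementation: rebuild plain text by
// concatenating token texts, which works only because whitespace is kept as tokens.
class TextStringDemo {
  static String toTextString(List<String> tokenTexts) {
    StringBuilder sb = new StringBuilder();
    for (String token : tokenTexts) {
      sb.append(token); // the SENT_START stand-in is assumed to contribute ""
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList("", "Hello", ",", " ", "world", "!");
    List<String> nonBlank = Arrays.asList("", "Hello", ",", "world", "!");
    System.out.println(toTextString(all));      // prints: Hello, world!
    System.out.println(toTextString(nonBlank)); // prints: Hello,world!
  }
}
```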
-
toString
public java.lang.String toString(java.lang.String readingDelimiter)
Return a string representation with chunk information.
-
toString
private java.lang.String toString(java.lang.String readingDelimiter, boolean includeChunks)
-
getAnnotations
public java.lang.String getAnnotations()
Get the disambiguator actions log.
-
getTokenSet
public java.util.Set<java.lang.String> getTokenSet()
Get the lowercase tokens of this sentence in a set. Used internally for performance optimization.
- Since:
- 2.4
-
getLemmaSet
public java.util.Set<java.lang.String> getLemmaSet()
Get the lowercase lemmas of this sentence in a set. Used internally for performance optimization.
- Since:
- 2.5
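getTokenSet() and getLemmaSet() are documented only as being used for performance optimization. One plausible use, sketched here under that assumption, is a cheap pre-filter: if a rule needs certain words and any of them is missing from the sentence's lowercase set, the expensive matcher can be skipped. The class TokenSetDemo, mayMatch(), the Locale.ROOT lowercasing and the example lemmas are hypothetical and not taken from LanguageTool.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a lowercase token/lemma set used as a rule pre-filter.
class TokenSetDemo {
  // Builds an unmodifiable set of lowercased values, skipping nulls.
  static Set<String> lowercaseSet(String[] values) {
    Set<String> set = new HashSet<>();
    for (String v : values) {
      if (v != null) {
        set.add(v.toLowerCase(Locale.ROOT));
      }
    }
    return Collections.unmodifiableSet(set);
  }

  // A rule needing all requiredWords cannot match if any of them is absent,
  // so the full matcher only needs to run when this returns true.
  static boolean mayMatch(Set<String> sentenceSet, Set<String> requiredWords) {
    return sentenceSet.containsAll(requiredWords);
  }

  public static void main(String[] args) {
    Set<String> tokenSet = lowercaseSet(new String[] {"The", "cats", "were", "sleeping"});
    Set<String> lemmaSet = lowercaseSet(new String[] {"the", "cat", "be", "sleep"});
    Set<String> required = new HashSet<>(Arrays.asList("cat", "be"));
    System.out.println(mayMatch(tokenSet, required)); // prints false
    System.out.println(mayMatch(lemmaSet, required)); // prints true
  }
}
```

The example shows why a separate lemma set can be useful: inflected forms such as "cats" and "were" miss in the token set but hit in the lemma set.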
-
equals
public boolean equals(java.lang.Object o)
- Overrides:
equals in class java.lang.Object
-
hashCode
public int hashCode()
- Overrides:
hashCode in class java.lang.Object
-
hasParagraphEndMark
public boolean hasParagraphEndMark(Language lang)
Returns true if the sentence ends with a paragraph break.
- Since:
- 4.3
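The method takes a Language, which suggests that what counts as a paragraph break can vary by language. The sketch below assumes this reduces to one switch, whether a single line break already ends a paragraph, and models it as a boolean parameter; the class ParagraphEndDemo and that parameter are hypothetical, and the real check may differ in detail.

```java
// Hypothetical sketch, not LanguageTool code: the Language argument of the real
// method is modeled here by a single boolean (an assumption).
class ParagraphEndDemo {
  static boolean hasParagraphEndMark(String sentenceText, boolean singleLineBreakMarksParagraph) {
    // Treat Windows "\r\n" line endings as a single "\n".
    String text = sentenceText.replace("\r\n", "\n");
    if (singleLineBreakMarksParagraph) {
      return text.endsWith("\n");
    }
    // Otherwise require an empty line, i.e. two consecutive line breaks.
    return text.endsWith("\n\n");
  }

  public static void main(String[] args) {
    System.out.println(hasParagraphEndMark("First paragraph.\n\n", false)); // prints true
    System.out.println(hasParagraphEndMark("Same paragraph.\n", false));   // prints false
    System.out.println(hasParagraphEndMark("Same paragraph.\n", true));    // prints true
  }
}
```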
-
-