Package org.languagetool.tokenizers
Class SRXSentenceTokenizer
- java.lang.Object
  - org.languagetool.tokenizers.SRXSentenceTokenizer
- All Implemented Interfaces:
SentenceTokenizer, Tokenizer
- Direct Known Subclasses:
SimpleSentenceTokenizer
public class SRXSentenceTokenizer extends java.lang.Object implements SentenceTokenizer
Tokenizes text into sentences using segmentation rules from an SRX (Segmentation Rules eXchange) file.
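SRX files describe segmentation as an ordered list of break and no-break rules. Each rule pairs a `beforebreak` pattern with an `afterbreak` pattern, and at each candidate position the first rule whose two patterns both match decides whether to split. The following is a simplified, self-contained illustration of that idea using only java.util.regex. It is not LanguageTool's implementation (which relies on the `net.loomchild.segment` library, as the field types on this page suggest), and the class name `SrxRuleSketch` and the two example rules are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Simplified illustration of SRX-style rule evaluation; not LanguageTool code. */
public class SrxRuleSketch {

    /** One SRX-style rule: break or no-break where beforeBreak ends and afterBreak begins. */
    static final class Rule {
        final boolean isBreak;
        final Pattern beforeBreak; // must match text ending exactly at the candidate position
        final Pattern afterBreak;  // must match text starting exactly at the candidate position

        Rule(boolean isBreak, String beforeBreak, String afterBreak) {
            this.isBreak = isBreak;
            // \z anchors the match to the end of the matcher region, i.e. the candidate position
            this.beforeBreak = Pattern.compile("(?:" + beforeBreak + ")\\z");
            this.afterBreak = Pattern.compile(afterBreak);
        }
    }

    /** Splits text wherever the first matching rule is a break rule. */
    static List<String> segment(String text, List<Rule> rules) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int pos = 1; pos < text.length(); pos++) {
            for (Rule rule : rules) {
                boolean beforeMatches = rule.beforeBreak.matcher(text).region(0, pos).find();
                boolean afterMatches = rule.afterBreak.matcher(text).region(pos, text.length()).lookingAt();
                if (beforeMatches && afterMatches) {
                    if (rule.isBreak) {
                        segments.add(text.substring(start, pos));
                        start = pos;
                    }
                    break; // the first matching rule decides, later rules are ignored
                }
            }
        }
        segments.add(text.substring(start));
        return segments;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            // exception: no break after the abbreviation "Dr."
            new Rule(false, "\\bDr\\.\\s+", "\\p{Lu}"),
            // general rule: break after sentence-final punctuation and whitespace
            new Rule(true, "[.?!]\\s+", "\\p{Lu}"));
        System.out.println(segment("Dr. Smith arrived. He sat down.", rules));
    }
}
```

With these rules, `segment` returns the two strings "Dr. Smith arrived. " and "He sat down.". The no-break exception for "Dr." is listed first, so it wins over the general break rule. Because the whitespace belongs to `beforeBreak`, the segments concatenate back to the original input.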
Field Summary
Fields
- private Language language
- private java.lang.String parCode
- private net.loomchild.segment.srx.SrxDocument srxDocument
-
Constructor Summary
Constructors
- SRXSentenceTokenizer(Language language): Build a sentence tokenizer based on the rules in the segment.srx file that comes with LanguageTool.
- SRXSentenceTokenizer(Language language, java.lang.String srxInClassPath)
-
Method Summary
Methods
- void setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs)
- boolean singleLineBreaksMarksPara()
- java.util.List<java.lang.String> tokenize(java.lang.String text): Tokenize the given string to sentences.
Field Detail
-
srxDocument
private final net.loomchild.segment.srx.SrxDocument srxDocument
-
language
private final Language language
-
parCode
private java.lang.String parCode
Constructor Detail
-
SRXSentenceTokenizer
public SRXSentenceTokenizer(Language language)
Build a sentence tokenizer based on the rules in the segment.srx file that comes with LanguageTool.
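A single SRX document can hold rule sets for many languages. Its `maprules` section maps regular expressions over language codes to named rule sets, and when cascading is enabled every matching map contributes its rules in document order. The following sketch illustrates only that selection step, under those assumptions. The class, the language patterns, and the rule-set names are hypothetical, and exactly which code LanguageTool matches against (for example whether the `parCode` field is appended to the language code) is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Simplified illustration of SRX language-map selection; not LanguageTool code. */
public class SrxLanguageMapSketch {

    /** Simplified SRX languagemap entry: a regex over language codes and the rule set it selects. */
    static final class LanguageMap {
        final Pattern languagePattern;
        final String languageRuleName;

        LanguageMap(String languagePattern, String languageRuleName) {
            this.languagePattern = Pattern.compile(languagePattern);
            this.languageRuleName = languageRuleName;
        }
    }

    /**
     * Returns the names of the rule sets that apply to languageCode. With cascade, every
     * matching map contributes (in document order); without it, only the first match does.
     */
    static List<String> selectRuleSets(List<LanguageMap> maps, String languageCode, boolean cascade) {
        List<String> selected = new ArrayList<>();
        for (LanguageMap map : maps) {
            // this sketch requires the pattern to match the whole language code
            if (map.languagePattern.matcher(languageCode).matches()) {
                selected.add(map.languageRuleName);
                if (!cascade) {
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical maps: language-specific rule sets first, then a catch-all default.
        List<LanguageMap> maps = List.of(
            new LanguageMap("en.*", "English"),
            new LanguageMap("de.*", "German"),
            new LanguageMap(".*", "Default"));
        System.out.println(selectRuleSets(maps, "en", true));  // [English, Default]
        System.out.println(selectRuleSets(maps, "en", false)); // [English]
    }
}
```

Listing a catch-all map last is a common way to let language-specific exceptions run before generic rules when cascading is on.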
-
SRXSentenceTokenizer
public SRXSentenceTokenizer(Language language, java.lang.String srxInClassPath)
- Parameters:
srxInClassPath - the path to an SRX file in the classpath
- Since:
3.2
Method Detail
-
tokenize
public final java.util.List<java.lang.String> tokenize(java.lang.String text)
Description copied from interface: SentenceTokenizer
Tokenize the given string to sentences.
- Specified by:
tokenize in interface SentenceTokenizer
- Specified by:
tokenize in interface Tokenizer
-
singleLineBreaksMarksPara
public final boolean singleLineBreaksMarksPara()
- Specified by:
singleLineBreaksMarksPara in interface SentenceTokenizer
-
setSingleLineBreaksMarksParagraph
public final void setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs)
- Specified by:
setSingleLineBreaksMarksParagraph in interface SentenceTokenizer
- Parameters:
lineBreakParagraphs - if true, single line breaks are assumed to end a paragraph; if false, only two or more consecutive line breaks end a paragraph
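The paragraph setting changes only where a paragraph boundary is assumed. The following sketch shows those two modes on plain strings, assuming `\n` line endings (`\r\n` is normalized first). The class `ParagraphBreakSketch` and its method `splitParagraphs` are hypothetical: this page exposes no such method, and how LanguageTool applies the setting internally (presumably through the SRX rules, given the `parCode` field) is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the two paragraph-break modes; not LanguageTool code. */
public class ParagraphBreakSketch {

    /**
     * Splits text into paragraphs. If singleLineBreaksEndParagraph is true, every line
     * break ends a paragraph; otherwise only two or more consecutive line breaks do.
     */
    static List<String> splitParagraphs(String text, boolean singleLineBreaksEndParagraph) {
        String normalized = text.replace("\r\n", "\n"); // treat Windows line endings as "\n"
        String separator = singleLineBreaksEndParagraph ? "\n+" : "\n{2,}";
        // String.split drops trailing empty strings, so a final newline adds no empty paragraph
        return Arrays.asList(normalized.split(separator));
    }

    public static void main(String[] args) {
        String text = "First line\nsecond line\n\nNew paragraph";
        System.out.println(splitParagraphs(text, false).size()); // 2
        System.out.println(splitParagraphs(text, true).size());  // 3
    }
}
```

With `false`, the first two lines stay in one paragraph because only the blank line separates paragraphs; with `true`, each line becomes its own paragraph.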