Package org.languagetool.tools
Class StringTools
java.lang.Object
org.languagetool.tools.StringTools
Tools for working with strings.
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static enum | | Constants for printing XML rule matches.
Field Summary
Fields
Constructor Summary
Constructors
Method Summary
Modifier and Type | Method | Description
static String | addSpace | Adds spaces before words that are not punctuation.
static @Nullable String | asString |
static void | assureSet | Throws an exception if the given string is null, empty, or only whitespace.
private static @Nullable String | changeFirstCharCase(String str, boolean toUpperCase) | Returns str modified so that its first character is a lowercase or uppercase character, depending on toUpperCase.
static String | escapeForXmlAttribute |
static String | escapeForXmlContent |
static String | escapeHTML(String s) | Escapes these characters: less than (<), greater than (>), quote ("), and ampersand (&).
static String | escapeXML | Calls escapeHTML(String).
static String | filterXML | Simple XML filtering for XML tags.
static boolean | isAllUppercase(String str) | Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean | isCapitalizedWord(String str) |
static boolean | isEmpty | Helper method to replace calls to "".equals().
static boolean | isMixedCase(String str) | Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
static boolean | isNonBreakingWhitespace | Checks if a string is the non-breaking whitespace (U+00A0).
static boolean | isNotAllLowercase(String str) | Returns true if str is not made up entirely of lowercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean | isParagraphEnd(String sentence, boolean singleLineBreaksMarksPara) |
static boolean | isPositiveNumber(char ch) |
static boolean | isWhitespace(String str) | Checks if a string is a whitespace character, including all Unicode whitespace, the non-breaking space (U+00A0), the narrow non-breaking space (U+202F), and the zero width space (U+200B), used in Khmer.
| loadLines | Loads a file, ignoring comments (lines starting with #).
static @Nullable String | lowercaseFirstChar(String str) | Returns str modified so that its first character is a lowercase character.
static String | readerToString(Reader reader) |
static String | readStream(InputStream stream, String encoding) | Reads the text stream using the given encoding.
static boolean | startsWithUppercase | Whether the first character of str is an uppercase character.
static String | streamToString(InputStream is, String charsetName) |
static String | trimSpecialCharacters | Eliminates special (Unicode) characters, e.g. soft hyphens.
static String | trimWhitespace | Filters any whitespace characters.
static @Nullable String | uppercaseFirstChar(String str) | Returns str modified so that its first character is an uppercase character.
static @Nullable String | uppercaseFirstChar(String str, Language language) | Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
Field Details
XML_COMMENT_PATTERN
XML_PATTERN
UPPERCASE_GREEK_LETTERS
LOWERCASE_GREEK_LETTERS
Constructor Details
StringTools
private StringTools()
Method Details
assureSet
Throws an exception if the given string is null, empty, or only whitespace.
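A minimal sketch of the check described above. The page shows neither the signature nor the exception type, so the second parameter varName (used only for the message), the choice of IllegalArgumentException, and the class name AssureSetSketch are all assumptions.

```java
/** Hypothetical stand-in for assureSet; not LanguageTool's source. */
final class AssureSetSketch {

    /** Throws if s is null, empty, or consists only of whitespace. */
    static void assureSet(String s, String varName) {
        // trim() drops leading/trailing chars <= U+0020, so "   " becomes ""
        if (s == null || s.trim().isEmpty()) {
            throw new IllegalArgumentException(varName + " must be set");
        }
    }

    public static void main(String[] args) {
        assureSet("de-DE", "language"); // accepted silently
        try {
            assureSet("   ", "language");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that trim() only treats ASCII control characters and the space as whitespace; on Java 11 or later, String.isBlank() would be the Unicode-aware alternative.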
readStream
Reads the text stream using the given encoding.
Parameters:
stream - the InputStream to be read
encoding - the stream's character encoding, e.g. utf-8, or null to use the system encoding
Returns:
a string with the stream's content, lines separated by \n (note that \n will be added to the last line even if it is not in the stream)
Throws:
IOException
Since:
2.3
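The contract above (null selects the system encoding; every line, including the last, ends with \n) can be sketched with plain java.io. This is an illustration, not LanguageTool's code: closing the stream when done and the class name ReadStreamSketch are assumptions.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of readStream's documented behavior. */
final class ReadStreamSketch {

    static String readStream(InputStream stream, String encoding) throws IOException {
        // null means "use the platform default", as documented
        Charset cs = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(stream, cs))) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips \n, \r or \r\n; re-append \n after every line, even the last
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "eins\r\nzwei".getBytes(StandardCharsets.UTF_8);
        String text = readStream(new ByteArrayInputStream(bytes), "utf-8");
        System.out.println(text.equals("eins\nzwei\n")); // true
    }
}
```

A side effect of going through readLine() is that Windows line endings are normalized to \n.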
isAllUppercase
Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
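One way to read "ignoring characters for which no upper-/lowercase distinction exists" is that only lowercase characters can make the result false. A minimal sketch under that reading (the class name IsAllUppercaseSketch is hypothetical):

```java
/** Hypothetical sketch of an all-uppercase check; not LanguageTool's source. */
final class IsAllUppercaseSketch {

    /** True if no character in str is lowercase; digits, punctuation etc. are ignored. */
    static boolean isAllUppercase(String str) {
        // Iterate by code point so characters outside the BMP are handled correctly
        return str.codePoints().noneMatch(Character::isLowerCase);
    }

    public static void main(String[] args) {
        System.out.println(isAllUppercase("NASA-2")); // true: '-' and '2' have no case
        System.out.println(isAllUppercase("NaSA"));   // false
        System.out.println(isAllUppercase("123"));    // true in this sketch: nothing contradicts it
    }
}
```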
isMixedCase
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
Parameters:
str - input string
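The documented examples (MixedCase and mixedCase count, Mixedcase does not) can be encoded in more than one way. The sketch below uses one rule that fits them, which is an assumption rather than LanguageTool's definition: the string must contain a lowercase letter and also an uppercase letter somewhere after its first character. The class name is hypothetical.

```java
/** Hypothetical sketch of a mixed-case check; not LanguageTool's source. */
final class IsMixedCaseSketch {

    static boolean isMixedCase(String str) {
        boolean hasLower = false;
        boolean hasInnerUpper = false; // uppercase letter at index > 0
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isLowerCase(c)) {
                hasLower = true;
            } else if (i > 0 && Character.isUpperCase(c)) {
                hasInnerUpper = true;
            }
        }
        // "Mixedcase" fails (uppercase only at index 0), "MIXED" fails (no lowercase)
        return hasLower && hasInnerUpper;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"MixedCase", "mixedCase", "Mixedcase", "MIXED", "mixed"}) {
            System.out.println(s + " -> " + isMixedCase(s));
        }
    }
}
```

Running main prints true for the first two inputs and false for the other three.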
isNotAllLowercase
Returns true if str is not made up entirely of lowercase characters (ignoring characters for which no upper-/lowercase distinction exists).
Since:
2.5
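The method name suggests the inverse of an all-lowercase check: true as soon as some character that has case is not lowercase. A minimal sketch under that reading (the reading and the class name IsNotAllLowercaseSketch are assumptions):

```java
/** Hypothetical sketch of isNotAllLowercase; not LanguageTool's source. */
final class IsNotAllLowercaseSketch {

    /** True if some cased character in str is not lowercase; caseless characters are ignored. */
    static boolean isNotAllLowercase(String str) {
        return str.codePoints().anyMatch(cp -> Character.isUpperCase(cp) || Character.isTitleCase(cp));
    }

    public static void main(String[] args) {
        System.out.println(isNotAllLowercase("hello world"));  // false
        System.out.println(isNotAllLowercase("hello World"));  // true
        System.out.println(isNotAllLowercase("\u00FCber-42")); // false: '-' and digits are ignored
    }
}
```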
isCapitalizedWord
Parameters:
str - input string
Returns:
true if the word starts with an uppercase letter and all other letters are lowercase
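The return condition above translates almost directly into a loop. In this sketch, non-letters after the first character (apostrophes, hyphens, digits) are assumed not to affect the result; that assumption and the class name are not from the source.

```java
/** Hypothetical sketch of a capitalized-word check; not LanguageTool's source. */
final class IsCapitalizedWordSketch {

    static boolean isCapitalizedWord(String str) {
        if (str == null || str.isEmpty() || !Character.isUpperCase(str.charAt(0))) {
            return false;
        }
        for (int i = 1; i < str.length(); i++) {
            char c = str.charAt(i);
            // Any further letter must be lowercase; non-letters are skipped
            if (Character.isLetter(c) && !Character.isLowerCase(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isCapitalizedWord("Hello"));       // true
        System.out.println(isCapitalizedWord("HeLLo"));       // false
        System.out.println(isCapitalizedWord("Rock'n'roll")); // true
    }
}
```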
startsWithUppercase
Whether the first character of str is an uppercase character.
uppercaseFirstChar
Returns str modified so that its first character is an uppercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is treated as the first character.
uppercaseFirstChar
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
Parameters:
language - the language; ignored if it is null
Since:
2.7
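The two-argument variant takes a LanguageTool Language object. To stay self-contained, the sketch below substitutes a plain language code string ("nl" for Dutch); that substitution and the class name DutchIjSketch are assumptions. It illustrates only the IJ digraph rule, not the leading-punctuation handling of the one-argument variant.

```java
import java.util.Locale;

/** Hypothetical sketch of the Dutch IJ special case; not LanguageTool's source. */
final class DutchIjSketch {

    /** languageCode stands in for a Language object; null means "no language". */
    static String uppercaseFirstChar(String str, String languageCode) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        // Dutch capitalizes the digraph "ij" as a unit: "IJ", not "Ij"
        if ("nl".equals(languageCode) && str.regionMatches(true, 0, "ij", 0, 2)) {
            return "IJ" + str.substring(2);
        }
        return str.substring(0, 1).toUpperCase(Locale.ROOT) + str.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(uppercaseFirstChar("ijsselmeer", "nl")); // IJsselmeer
        System.out.println(uppercaseFirstChar("ijsselmeer", null)); // Ijsselmeer
    }
}
```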
lowercaseFirstChar
Returns str modified so that its first character is a lowercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is treated as the first character.
changeFirstCharCase
Returns str modified so that its first character is a lowercase or uppercase character, depending on toUpperCase. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is treated as the first character.
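The skip-leading-punctuation behavior shared by uppercaseFirstChar, lowercaseFirstChar and changeFirstCharCase can be sketched as a scan for the first alphabetic code point. Returning the input unchanged when it has no alphabetic character is an assumption, as is the class name.

```java
/** Hypothetical sketch of changeFirstCharCase and its two wrappers; not LanguageTool's source. */
final class FirstCharCaseSketch {

    static String changeFirstCharCase(String str, boolean toUpperCase) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        int pos = 0;
        // Skip leading non-alphabetic code points such as quotes or parentheses
        while (pos < str.length() && !Character.isAlphabetic(str.codePointAt(pos))) {
            pos += Character.charCount(str.codePointAt(pos));
        }
        if (pos >= str.length()) {
            return str; // nothing alphabetic to change
        }
        int cp = str.codePointAt(pos);
        int changed = toUpperCase ? Character.toUpperCase(cp) : Character.toLowerCase(cp);
        return str.substring(0, pos)
                + new String(Character.toChars(changed))
                + str.substring(pos + Character.charCount(cp));
    }

    static String uppercaseFirstChar(String str) {
        return changeFirstCharCase(str, true);
    }

    static String lowercaseFirstChar(String str) {
        return changeFirstCharCase(str, false);
    }

    public static void main(String[] args) {
        System.out.println(uppercaseFirstChar("\"hello\"")); // "Hello"
        System.out.println(lowercaseFirstChar("(Foo)"));     // (foo)
    }
}
```

Working per code point keeps surrogate pairs intact; it also means one-to-many mappings such as German ß are left alone rather than expanded.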
readerToString
Throws:
IOException
streamToString
Throws:
IOException
escapeXML
Calls escapeHTML(String).
escapeForXmlAttribute
Since:
2.9
escapeForXmlContent
Since:
2.9
escapeHTML
Escapes these characters: less than (<), greater than (>), quote ("), and ampersand (&).
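A minimal sketch of escaping exactly these four characters with their standard entity references (&lt;, &gt;, &quot;, &amp;). The class name is hypothetical and this is not LanguageTool's implementation.

```java
/** Hypothetical sketch of four-character HTML escaping; not LanguageTool's source. */
final class EscapeHtmlSketch {

    static String escapeHTML(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // Single pass: each input character is examined once, so the '&'
        // introduced by an entity is never escaped a second time
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '&': sb.append("&amp;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHTML("a < b & \"c\"")); // a &lt; b &amp; &quot;c&quot;
    }
}
```

A chain of String.replace calls would also work, provided "&" is replaced first; the single loop avoids depending on that ordering.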
trimWhitespace
Filters any whitespace characters. Useful for trimming the contents of token elements that cannot possibly contain any spaces, with the exception of a single space in a word (for example, if the language supports numbers formatted with spaces as single tokens, as Catalan does in LanguageTool).
Parameters:
s - String to be filtered.
Returns:
Filtered s.
trimSpecialCharacters
Eliminates special (Unicode) characters, e.g. soft hyphens.
Parameters:
s - String to filter
Returns:
s, with all characters deleted that are not alphanumeric, punctuation, or space
Since:
4.3
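Read literally, the return description keeps letters, digits, punctuation and spaces and drops everything else. A minimal regex sketch of that reading using Unicode general categories follows. It also deletes symbols (such as the euro sign), combining marks, and tabs or newlines, so the real method may well keep more. The pattern and the class name are assumptions.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of trimSpecialCharacters; not LanguageTool's source. */
final class TrimSpecialSketch {

    // Keep letters (L), numbers (N), punctuation (P) and space separators (Zs); drop the rest,
    // e.g. the soft hyphen U+00AD, which is a format character (Cf)
    private static final Pattern NOT_KEPT = Pattern.compile("[^\\p{L}\\p{N}\\p{P}\\p{Zs}]");

    static String trimSpecialCharacters(String s) {
        return NOT_KEPT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trimSpecialCharacters("Silben\u00ADtrennung")); // Silbentrennung
    }
}
```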
addSpace
Adds spaces before words that are not punctuation.
Parameters:
word - Word to add the preceding space to.
language - Language of the word (to check typography conventions). Currently only the French convention is implemented, where a space is omitted only before '.' and ','; all other languages assume that no space should be added before any of , . ; : ! ?
Returns:
String containing a space or an empty string.
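A minimal sketch of the rule described above. To stay self-contained it replaces the Language parameter with a language code string ("fr" for French), and it assumes that only single-character punctuation tokens suppress the space. Both choices, plus the helper join and the class name, are hypothetical.

```java
/** Hypothetical sketch of addSpace; not LanguageTool's source. */
final class AddSpaceSketch {

    /** Returns " " or "" depending on whether word may be preceded by a space. */
    static String addSpace(String word, String languageCode) {
        // Assumption: only single-character punctuation tokens suppress the space
        if (word.length() == 1) {
            String noSpaceBefore = "fr".equals(languageCode) ? ".," : ",.;:!?";
            if (noSpaceBefore.indexOf(word.charAt(0)) >= 0) {
                return "";
            }
        }
        return " ";
    }

    /** Hypothetical helper: joins tokens, consulting addSpace before each token after the first. */
    static String join(String[] tokens, String languageCode) {
        StringBuilder sb = new StringBuilder(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            sb.append(addSpace(tokens[i], languageCode)).append(tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Bonjour", ",", "Marie", "!"};
        System.out.println(join(tokens, "fr")); // Bonjour, Marie !
        System.out.println(join(tokens, "en")); // Bonjour, Marie!
    }
}
```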
isWhitespace
Checks if a string is a whitespace character, including:
- all Unicode whitespace
- the non-breaking space (U+00A0)
- the narrow non-breaking space (U+202F)
- the zero width space (U+200B), used in Khmer
Parameters:
str - String to check
Returns:
true if the string is a whitespace character
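Java's Character.isWhitespace deliberately excludes no-break spaces, so covering the list above takes three checks. A minimal sketch; accepting any non-empty run of such characters (not only a single one) and the class name are assumptions.

```java
/** Hypothetical sketch of the whitespace test listed above; not LanguageTool's source. */
final class IsWhitespaceSketch {

    static boolean isWhitespaceCodePoint(int cp) {
        return Character.isWhitespace(cp)    // space, tab, line breaks, ...; excludes no-break spaces
                || Character.isSpaceChar(cp) // Unicode separators (Zs, Zl, Zp), incl. U+00A0 and U+202F
                || cp == 0x200B;             // zero width space (a format character), used in Khmer
    }

    /** True if str is non-empty and consists only of whitespace code points. */
    static boolean isWhitespace(String str) {
        return !str.isEmpty() && str.codePoints().allMatch(IsWhitespaceSketch::isWhitespaceCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace("\u00A0")); // true
        System.out.println(isWhitespace("x"));      // false
    }
}
```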
isNonBreakingWhitespace
Checks if a string is the non-breaking whitespace (U+00A0).
Since:
2.1
isPositiveNumber
public static boolean isPositiveNumber(char ch)
Parameters:
ch - Character to check
Returns:
true if the character is a positive number (a decimal digit from 1 to 9).
isEmpty
Helper method to replace calls to "".equals().
Parameters:
str - String to check
Returns:
true if the string is empty or null
filterXML
Simple XML filtering for XML tags.
Parameters:
str - XML string to be filtered.
Returns:
Filtered string without XML tags.
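The Field Details list XML_COMMENT_PATTERN and XML_PATTERN without showing their values. A minimal regex-based sketch of tag filtering follows, in which both patterns are assumptions standing in for those fields. Comments are removed first so that a '>' inside a comment cannot end a tag match early. Regexes like these do not handle CDATA sections or '>' inside attribute values.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of filterXML; not LanguageTool's source. */
final class FilterXmlSketch {

    // Assumed stand-ins for XML_COMMENT_PATTERN and XML_PATTERN
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String filterXML(String str) {
        String noComments = COMMENT.matcher(str).replaceAll("");
        return TAG.matcher(noComments).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(filterXML("<p>Hello <b>world</b><!-- a > b --></p>")); // Hello world
    }
}
```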
asString
isParagraphEnd
Since:
4.3
loadLines
Loads a file, ignoring comments (lines starting with #).
Parameters:
path - path in the resource directory
Since:
4.6
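A minimal sketch of comment-skipping line loading. Resolving the path on the classpath via Class.getResourceAsStream, reading as UTF-8, keeping empty lines, the InputStream overload and the class name are all assumptions; the page only states that the path is in the resource directory.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of loadLines; not LanguageTool's source. */
final class LoadLinesSketch {

    /** Assumption: the resource path is resolved on the classpath. */
    static List<String> loadLines(String path) {
        InputStream in = LoadLinesSketch.class.getResourceAsStream(path);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + path);
        }
        return loadLines(in);
    }

    /** Reads UTF-8 lines, dropping comment lines that start with '#'. */
    static List<String> loadLines(InputStream in) {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (!line.startsWith("#")) {
                    result.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = "# word list\nfoo\nbar".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadLines(new ByteArrayInputStream(data))); // [foo, bar]
    }
}
```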