Package org.languagetool.tools
Class StringTools
java.lang.Object
org.languagetool.tools.StringTools
Tools for working with strings.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static enum
Constants for printing XML rule matches.
-
Field Summary
Fields
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type, Method, Description
static String addSpace
Adds spaces before words that are not punctuation.
static @Nullable String asString
static void assureSet
Throws an exception if the given string is null, empty, or only whitespace.
private static @Nullable String changeFirstCharCase(String str, boolean toUpperCase)
Returns str modified so that its first character is now a lowercase or uppercase character, depending on toUpperCase.
static String escapeForXmlAttribute
static String escapeForXmlContent
static String escapeHTML(String s)
Escapes these characters: less than, greater than, quote, ampersand.
static String escapeXML
Calls escapeHTML(String).
static String filterXML
Simple XML filtering for XML tags.
static boolean isAllUppercase(String str)
Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean isCapitalizedWord(String str)
static boolean isEmpty
Helper method to replace calls to "".equals().
static boolean isMixedCase(String str)
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
static boolean isNonBreakingWhitespace
Checks if a string is the non-breaking whitespace (
static boolean isNotAllLowercase(String str)
Returns true if str is not made up of all-lowercase characters, that is, if at least one letter is not lowercase (ignoring characters for which no upper-/lowercase distinction exists).
static boolean isParagraphEnd(String sentence, boolean singleLineBreaksMarksPara)
static boolean isPositiveNumber(char ch)
static boolean isWhitespace(String str)
Checks if a string contains a whitespace, including: all Unicode whitespace, the non-breaking space (U+00A0), the narrow non-breaking space (U+202F), and the zero width space (U+200B), used in Khmer.
loadLines
Loads file, ignoring comments (lines starting with #).
static @Nullable String lowercaseFirstChar(String str)
Returns str modified so that its first character is now a lowercase character.
static String readerToString(Reader reader)
static String readStream(InputStream stream, String encoding)
Read the text stream using the given encoding.
static boolean startsWithUppercase
Whether the first character of str is an uppercase character.
static String streamToString(InputStream is, String charsetName)
static String trimSpecialCharacters
Eliminates special (Unicode) characters, e.g. soft hyphens.
static String trimWhitespace
Filters any whitespace characters.
static @Nullable String uppercaseFirstChar(String str)
Returns str modified so that its first character is now an uppercase character.
static @Nullable String uppercaseFirstChar(String str, Language language)
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
-
Field Details
-
XML_COMMENT_PATTERN
-
XML_PATTERN
-
UPPERCASE_GREEK_LETTERS
-
LOWERCASE_GREEK_LETTERS
-
-
Constructor Details
-
StringTools
private StringTools()
-
-
Method Details
-
assureSet
Throws an exception if the given string is null, empty, or only whitespace.
-
readStream
Read the text stream using the given encoding.
- Parameters:
stream - InputStream the stream to be read
encoding - the stream's character encoding, e.g. utf-8, or null to use the system encoding
- Returns:
- a string with the stream's content, lines separated by \n (note that \n will be added to the last line even if it is not in the stream)
- Throws:
IOException
- Since:
- 2.3
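The contract described for readStream (an explicit or system encoding, lines separated by \n, and a trailing \n after the last line) can be sketched with plain JDK classes. This is a minimal illustration, not LanguageTool's implementation: the class name ReadStreamSketch is hypothetical, and mapping a null encoding to Charset.defaultCharset() is an assumption about what "system encoding" means.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReadStreamSketch {

  // Hypothetical stand-in for readStream(InputStream, String).
  static String readStream(InputStream stream, String encoding) throws IOException {
    // Assumption: a null encoding falls back to the platform default charset.
    Charset charset = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, charset))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Every line gets a trailing \n, including the last one.
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream("one\r\ntwo".getBytes(StandardCharsets.UTF_8));
    System.out.print(readStream(in, "utf-8"));
  }
}
```

Because readLine() strips the original line terminator, Windows-style \r\n input also comes back with plain \n separators in this sketch.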
-
isAllUppercase
Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
-
isMixedCase
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
- Parameters:
str - input str
-
isNotAllLowercase
Returns true if str is not made up of all-lowercase characters, that is, if at least one letter is not lowercase (ignoring characters for which no upper-/lowercase distinction exists).
- Since:
- 2.5
-
isCapitalizedWord
- Parameters:
str - input string
- Returns:
- true if word starts with an uppercase letter and all other letters are lowercase
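The four case predicates (isAllUppercase, isMixedCase, isNotAllLowercase, isCapitalizedWord) fit together as in the following sketch. It is an illustration written against the descriptions on this page, not the library's code. Two assumptions are made: isNotAllLowercase is read by its name (true when at least one letter is not lowercase), and isMixedCase is modeled as "not all uppercase, not a capitalized word, and not all lowercase". The class name CasePredicatesSketch is hypothetical.

```java
final class CasePredicatesSketch {

  // True if no letter is lowercase; digits and punctuation are ignored.
  static boolean isAllUppercase(String str) {
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c) && Character.isLowerCase(c)) {
        return false;
      }
    }
    return true;
  }

  // True if at least one letter is not lowercase (assumed reading of the name).
  static boolean isNotAllLowercase(String str) {
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c) && !Character.isLowerCase(c)) {
        return true;
      }
    }
    return false;
  }

  // True if the first character is uppercase and every later letter is lowercase.
  static boolean isCapitalizedWord(String str) {
    if (str.isEmpty() || !Character.isUpperCase(str.charAt(0))) {
      return false;
    }
    for (int i = 1; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c) && !Character.isLowerCase(c)) {
        return false;
      }
    }
    return true;
  }

  // Assumed composition: mixed case is what remains after ruling out the other shapes.
  static boolean isMixedCase(String str) {
    return !isAllUppercase(str) && !isCapitalizedWord(str) && isNotAllLowercase(str);
  }

  public static void main(String[] args) {
    for (String s : new String[] {"MixedCase", "mixedCase", "Mixedcase", "USA", "word"}) {
      System.out.println(s + " mixed=" + isMixedCase(s) + " capitalized=" + isCapitalizedWord(s));
    }
  }
}
```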
-
startsWithUppercase
Whether the first character of str is an uppercase character.
-
uppercaseFirstChar
Returns str modified so that its first character is now an uppercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is treated as the first character.
-
uppercaseFirstChar
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
- Parameters:
language - the language, will be ignored if it's null
- Since:
- 2.7
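The two behaviours described for uppercaseFirstChar (skipping leading non-alphabetic characters, and capitalizing the Dutch digraph IJ as a unit) can be sketched as follows. This is only an illustration: the real method takes a LanguageTool Language object, whereas this sketch uses a hypothetical language-code string ("nl" for Dutch), and the class name FirstCharCaseSketch is made up.

```java
final class FirstCharCaseSketch {

  // Hypothetical variant: the language is given as a code string, null means "ignore".
  static String uppercaseFirstChar(String str, String languageCode) {
    if (str == null) {
      return null;
    }
    // Skip leading quotes, parentheses and other non-alphabetic characters.
    int pos = 0;
    while (pos < str.length() && !Character.isLetter(str.charAt(pos))) {
      pos++;
    }
    if (pos == str.length()) {
      return str; // no alphabetic character to change
    }
    // Dutch special case: a leading "ij" is capitalized as the digraph "IJ".
    if ("nl".equals(languageCode) && str.regionMatches(true, pos, "ij", 0, 2)) {
      return str.substring(0, pos) + "IJ" + str.substring(pos + 2);
    }
    return str.substring(0, pos)
        + Character.toUpperCase(str.charAt(pos))
        + str.substring(pos + 1);
  }

  public static void main(String[] args) {
    System.out.println(uppercaseFirstChar("ijsselmeer", "nl"));
    System.out.println(uppercaseFirstChar("(hello)", null));
  }
}
```

A lowercaseFirstChar counterpart would follow the same scan and apply Character.toLowerCase at the located position.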
-
lowercaseFirstChar
Returns str modified so that its first character is now a lowercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is treated as the first character.
-
changeFirstCharCase
Returns str modified so that its first character is now a lowercase or uppercase character, depending on toUpperCase. If str starts with non-alphabetic characters, such as quotes or parentheses, the first alphabetic character is treated as the first character.
-
readerToString
- Throws:
IOException
-
streamToString
- Throws:
IOException
-
escapeXML
Calls escapeHTML(String).
-
escapeForXmlAttribute
- Since:
- 2.9
-
escapeForXmlContent
- Since:
- 2.9
-
escapeHTML
Escapes these characters: less than, greater than, quote, ampersand.
-
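For escapeHTML, the four replacements named in its description can be sketched as a single pass over the input. This is an illustration only, not the library's code. It assumes that "quote" means the double quote character and that the standard entities &lt;, &gt;, &quot; and &amp; are the targets; the class name EscapeSketch is hypothetical.

```java
final class EscapeSketch {

  // Replaces <, >, " and & with their entity references; all other characters pass through.
  static String escapeHTML(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '<':
          sb.append("&lt;");
          break;
        case '>':
          sb.append("&gt;");
          break;
        case '"':
          sb.append("&quot;");
          break;
        case '&':
          sb.append("&amp;");
          break;
        default:
          sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeHTML("a < b & \"c\" > d"));
  }
}
```

Handling the ampersand in the same pass as the other characters avoids double-escaping, which a chain of replace() calls can cause if "&" is not replaced first.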
trimWhitespace
Filters any whitespace characters. Useful for trimming the contents of token elements that cannot possibly contain any spaces, with the exception of a single space in a word (for example, if the language supports numbers formatted with spaces as single tokens, as Catalan in LanguageTool).
- Parameters:
s - String to be filtered.
- Returns:
- Filtered s.
-
trimSpecialCharacters
Eliminates special (Unicode) characters, e.g. soft hyphens.
- Parameters:
s - String to filter
- Returns:
- s, with non-(alphanumeric, punctuation, space) characters deleted
- Since:
- 4.3
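One way to express "delete everything that is not alphanumeric, punctuation, or space" is a negated regular-expression character class. The sketch below is an assumption about how such a filter could look, not the pattern LanguageTool uses: it allows Unicode letters (\p{L}), Unicode numbers (\p{N}), ASCII punctuation (\p{Punct}) and whitespace (\s). The class name TrimSpecialSketch is hypothetical.

```java
import java.util.regex.Pattern;

final class TrimSpecialSketch {

  // Assumed allow-list: letters, numbers, ASCII punctuation, whitespace.
  private static final Pattern NOT_ALLOWED =
      Pattern.compile("[^\\p{L}\\p{N}\\p{Punct}\\s]");

  // Removes every character outside the allow-list, e.g. the soft hyphen (U+00AD).
  static String trimSpecialCharacters(String s) {
    return NOT_ALLOWED.matcher(s).replaceAll("");
  }

  public static void main(String[] args) {
    String withSoftHyphen = "co" + (char) 0x00AD + "operate, now!";
    System.out.println(trimSpecialCharacters(withSoftHyphen));
  }
}
```

Note that \p{Punct} covers only ASCII punctuation by default, so typographic quotes or dashes would also be removed by this particular sketch.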
-
addSpace
Adds spaces before words that are not punctuation.
- Parameters:
word - Word to add the preceding space to.
language - Language of the word (to check typography conventions). Currently the French convention of not adding spaces only before '.' and ',' is implemented; other languages assume that no spaces should be added before ,.;:!?
- Returns:
- String containing a space or an empty string.
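The rule described for addSpace amounts to a lookup of the word against a language-dependent set of punctuation marks. The following sketch illustrates that rule under stated assumptions: the language is passed as a hypothetical code string ("fr" for French) instead of a Language object, only single-character words are treated as punctuation, and the class name AddSpaceSketch is made up.

```java
final class AddSpaceSketch {

  // Returns " " if a space should precede the word, "" otherwise.
  static String addSpace(String word, String languageCode) {
    // French: only '.' and ',' suppress the space; other languages use the longer list.
    String noSpaceBefore = "fr".equals(languageCode) ? ".," : ",.;:!?";
    boolean suppress = word.length() == 1 && noSpaceBefore.indexOf(word.charAt(0)) >= 0;
    return suppress ? "" : " ";
  }

  public static void main(String[] args) {
    String[] tokens = {"Bonjour", "!", "Merci", "."};
    StringBuilder sb = new StringBuilder();
    for (String token : tokens) {
      // The first token never needs a leading space.
      sb.append(sb.length() == 0 ? "" : addSpace(token, "fr")).append(token);
    }
    System.out.println(sb);
  }
}
```

With the French code, the exclamation mark keeps its preceding space while the full stop does not, which matches the typographic convention the description refers to.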
-
isWhitespace
Checks if a string contains a whitespace, including:
- all Unicode whitespace
- the non-breaking space (U+00A0)
- the narrow non-breaking space (U+202F)
- the zero width space (U+200B), used in Khmer
- Parameters:
str - String to check
- Returns:
- true if the string is a whitespace character
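The explicit code points in the isWhitespace list matter because Java's Character.isWhitespace(char) deliberately returns false for the non-breaking spaces U+00A0 and U+202F, and the zero width space U+200B is a format character rather than a space. The sketch below shows one way to cover all four cases. It assumes the input is a single-character string, which may be narrower than what the real method accepts; the class name WhitespaceSketch is hypothetical.

```java
final class WhitespaceSketch {

  // True if str is exactly one character and that character counts as whitespace
  // under the extended definition (assumption: longer strings are rejected).
  static boolean isWhitespace(String str) {
    if (str == null || str.length() != 1) {
      return false;
    }
    char c = str.charAt(0);
    return Character.isWhitespace(c)
        || c == 0x00A0  // non-breaking space
        || c == 0x202F  // narrow non-breaking space
        || c == 0x200B; // zero width space
  }

  public static void main(String[] args) {
    System.out.println(isWhitespace(" "));
    System.out.println(isWhitespace(String.valueOf((char) 0x00A0)));
    System.out.println(isWhitespace("a"));
  }
}
```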
-
isNonBreakingWhitespace
Checks if a string is the non-breaking whitespace (
- Since:
- 2.1
-
isPositiveNumber
public static boolean isPositiveNumber(char ch)
- Parameters:
ch - Character to check
- Returns:
- True if the character is a positive number (decimal digit from 1 to 9).
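The "1 to 9" range in the return description means that '0' is not a positive number here, which a plain Character.isDigit check would not capture. A minimal sketch of that range test follows; the class name PositiveNumberSketch is hypothetical and the body is an illustration rather than the library's code.

```java
final class PositiveNumberSketch {

  // True only for the ASCII digits '1' through '9'; '0' is excluded.
  static boolean isPositiveNumber(char ch) {
    return ch >= '1' && ch <= '9';
  }

  public static void main(String[] args) {
    System.out.println(isPositiveNumber('7'));
    System.out.println(isPositiveNumber('0'));
  }
}
```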
-
isEmpty
Helper method to replace calls to "".equals().
- Parameters:
str - String to check
- Returns:
- true if string is empty or null
-
filterXML
Simple XML filtering for XML tags.
- Parameters:
str - XML string to be filtered.
- Returns:
- Filtered string without XML tags.
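A simple tag filter of this kind is typically built from two regular expressions, one for comments and one for tags, which is presumably the role of the XML_COMMENT_PATTERN and XML_PATTERN fields. The sketch below reuses those two names, but the pattern values are assumptions, not the values defined in StringTools, and the class name FilterXmlSketch is hypothetical.

```java
import java.util.regex.Pattern;

final class FilterXmlSketch {

  // Assumed patterns; the actual field values are not shown on this page.
  static final Pattern XML_COMMENT_PATTERN = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
  static final Pattern XML_PATTERN = Pattern.compile("<[^<>]+>");

  // Removes comments first (they may contain tag-like text), then the remaining tags.
  static String filterXML(String str) {
    String withoutComments = XML_COMMENT_PATTERN.matcher(str).replaceAll("");
    return XML_PATTERN.matcher(withoutComments).replaceAll("");
  }

  public static void main(String[] args) {
    System.out.println(filterXML("<p>Hello <b>world</b></p><!-- <i>note</i> -->"));
  }
}
```

This is regex-based stripping, not parsing: it does not decode entities or handle CDATA sections, which is consistent with the description calling the filter "simple".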
-
asString
-
isParagraphEnd
- Since:
- 4.3
-
loadLines
Loads file, ignoring comments (lines starting with #).
- Parameters:
path - path in resource dir
- Since:
- 4.6
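The comment-skipping part of loadLines can be illustrated independently of how the resource is located. The sketch below reads from any Reader and drops lines that start with #. The helper name readLinesSkippingComments and the class name LoadLinesSketch are hypothetical, and resolving path against the resource directory is left out because that mechanism is not described here.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

final class LoadLinesSketch {

  // Hypothetical helper: returns all lines except those starting with '#'.
  static List<String> readLinesSkippingComments(Reader reader) {
    return new BufferedReader(reader).lines()
        .filter(line -> !line.startsWith("#"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String content = "# header comment\nalpha\n#disabled\nbeta\n";
    System.out.println(readLinesSkippingComments(new StringReader(content)));
  }
}
```

In this sketch a # that appears later in a line is kept, since only lines starting with # are described as comments.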
-