Class TextBlock
java.lang.Object
com.kohlschutter.boilerpipe.document.TextBlock
- All Implemented Interfaces:
Cloneable
Describes a block of text.
A block can be an "atomic" text element (i.e., a sequence of text that is not interrupted by any
HTML markup) or a compound of such atomic elements.
-
Field Summary
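Fields
Modifier and Type, Field
(package private) BitSet containedTextElements
private static final BitSet EMPTY_BITSET
static final TextBlock EMPTY_END
static final TextBlock EMPTY_START
(package private) boolean isContent
labels
(package private) float linkDensity
private int numFullTextWords
(package private) int numWords
(package private) int numWordsInAnchorText
(package private) int numWordsInWrappedLines
(package private) int numWrappedLines
(package private) int offsetBlocksEnd
(package private) int offsetBlocksStart
private int tagLevel
private CharSequence text
(package private) float textDensity
-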
Constructor Summary
Constructors -
Method Summary
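Modifier and Type, Method, Description
void addLabel - Adds an arbitrary String label to this TextBlock.
void addLabels - Adds a set of labels to this TextBlock.
void addLabels - Adds a set of labels to this TextBlock.
protected TextBlock clone()
getContainedTextElements - Returns the containedTextElements BitSet, or null.
getLabels - Returns the labels associated with this TextBlock, or null if no such labels exist.
float getLinkDensity()
int getNumWords()
int getNumWordsInAnchorText()
int getOffsetBlocksEnd()
int getOffsetBlocksStart()
int getTagLevel()
getText()
float getTextDensity()
boolean hasLabel - Checks whether this TextBlock has the given label.
private void initDensities()
boolean isContent()
void mergeNext
boolean removeLabel(String label)
boolean setIsContent(boolean isContent)
void setTagLevel(int tagLevel)
toString()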
-
Field Details
-
isContent
boolean isContent -
text
-
labels
-
offsetBlocksStart
int offsetBlocksStart -
offsetBlocksEnd
int offsetBlocksEnd -
numWords
int numWords -
numWordsInAnchorText
int numWordsInAnchorText -
numWordsInWrappedLines
int numWordsInWrappedLines -
numWrappedLines
int numWrappedLines -
textDensity
float textDensity -
linkDensity
float linkDensity -
containedTextElements
BitSet containedTextElements -
numFullTextWords
private int numFullTextWords -
tagLevel
private int tagLevel -
EMPTY_BITSET
-
EMPTY_START
-
EMPTY_END
-
-
Constructor Details
-
TextBlock
-
TextBlock
-
-
Method Details
-
isContent
public boolean isContent() -
setIsContent
public boolean setIsContent(boolean isContent) -
getText
-
getNumWords
public int getNumWords() -
getNumWordsInAnchorText
public int getNumWordsInAnchorText() -
getTextDensity
public float getTextDensity() -
getLinkDensity
public float getLinkDensity() -
mergeNext
-
initDensities
private void initDensities() -
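This page does not say how textDensity and linkDensity are derived from the word counters. The following is a minimal sketch under two assumptions that are not confirmed here: that text density is the average number of words per wrapped line, and that link density is the share of words that sit inside anchor text. The class DensitySketch and its zero-divisor handling are hypothetical and are not the library's code.

```java
// Hypothetical stand-in; the formulas are assumptions, not taken from this page.
final class DensitySketch {

    // Assumed definition: average number of words per wrapped line.
    static float textDensity(int numWordsInWrappedLines, int numWrappedLines) {
        if (numWrappedLines == 0) {
            return 0f; // guard chosen for this sketch only
        }
        return numWordsInWrappedLines / (float) numWrappedLines;
    }

    // Assumed definition: fraction of all words that belong to anchor text.
    static float linkDensity(int numWordsInAnchorText, int numWords) {
        if (numWords == 0) {
            return 0f; // guard chosen for this sketch only
        }
        return numWordsInAnchorText / (float) numWords;
    }

    public static void main(String[] args) {
        System.out.println(textDensity(40, 4)); // prints 10.0
        System.out.println(linkDensity(5, 20)); // prints 0.25
    }
}
```

Under these assumptions, a navigation bar (few words per line, mostly links) would show a low text density and a high link density, while a paragraph of article text would show the opposite.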
getOffsetBlocksStart
public int getOffsetBlocksStart() -
getOffsetBlocksEnd
public int getOffsetBlocksEnd() -
toString
-
addLabel
Adds an arbitrary String label to this TextBlock.
- Parameters:
label - The label
- See Also:
-
hasLabel
Checks whether this TextBlock has the given label.
- Parameters:
label - The label
- Returns:
true if this block is marked by the given label.
-
removeLabel
-
getLabels
Returns the labels associated with this TextBlock, or null if no such labels exist. NOTE: The returned instance is the one used directly in TextBlock. You have full access to the data structure. However, it is recommended to use the label-specific methods in TextBlock whenever possible.
- Returns:
the set of labels, or null if no labels have been added yet.
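The label methods described above can be illustrated with a small stand-in class. LabeledBlock is hypothetical and only mirrors the documented behavior: getLabels returns null until a label has been added, and the returned set is the live instance. The return value of removeLabel (true when a label was actually removed) and the label strings are assumptions of this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in that mirrors the documented label behavior.
final class LabeledBlock {
    private Set<String> labels; // stays null until the first label is added

    void addLabel(String label) {
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.add(label);
    }

    boolean hasLabel(String label) {
        return labels != null && labels.contains(label);
    }

    // Assumption: returns true only if the label was present.
    boolean removeLabel(String label) {
        return labels != null && labels.remove(label);
    }

    // Returns the internal set itself, not a copy; may be null.
    Set<String> getLabels() {
        return labels;
    }

    public static void main(String[] args) {
        LabeledBlock block = new LabeledBlock();
        System.out.println(block.getLabels());                // prints null

        block.addLabel("example-label");
        System.out.println(block.hasLabel("example-label"));  // prints true

        // The returned set is live, so direct changes are visible to hasLabel.
        block.getLabels().add("added-directly");
        System.out.println(block.hasLabel("added-directly")); // prints true

        System.out.println(block.removeLabel("example-label")); // prints true
    }
}
```

Because getLabels may return null, callers that only need a membership test are better served by hasLabel, which is the kind of label-specific method the note above recommends.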
-
addLabels
Adds a set of labels to this TextBlock. null references are silently ignored.
- Parameters:
l - The labels to be added.
-
addLabels
Adds a set of labels to this TextBlock. null references are silently ignored.
- Parameters:
l - The labels to be added.
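The parameter types of the two addLabels overloads are not shown on this page. The sketch below assumes one overload takes a Set of strings and the other takes a String varargs array, and shows one way "null references are silently ignored" could behave: a null argument leaves the block unchanged. BulkLabelSketch is a hypothetical stand-in, not the library's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in; the overload signatures are assumptions.
final class BulkLabelSketch {
    private Set<String> labels; // null until labels are added

    void addLabels(Set<String> l) {
        if (l == null) {
            return; // null reference: silently ignored
        }
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.addAll(l);
    }

    void addLabels(String... l) {
        if (l == null) {
            return; // null reference: silently ignored
        }
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.addAll(Arrays.asList(l));
    }

    Set<String> getLabels() {
        return labels;
    }

    public static void main(String[] args) {
        BulkLabelSketch block = new BulkLabelSketch();

        // Neither call throws, and no label set is created.
        block.addLabels((Set<String>) null);
        block.addLabels((String[]) null);
        System.out.println(block.getLabels()); // prints null

        block.addLabels("a", "b");
        block.addLabels(new HashSet<>(Arrays.asList("b", "c")));
        System.out.println(block.getLabels().size()); // prints 3
    }
}
```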
-
getContainedTextElements
Returns the containedTextElements BitSet, or null.
- Returns:
-
clone
-
getTagLevel
public int getTagLevel() -
setTagLevel
public void setTagLevel(int tagLevel)
-