Class TextBlock

java.lang.Object
com.kohlschutter.boilerpipe.document.TextBlock
All Implemented Interfaces:
Cloneable

public class TextBlock extends Object implements Cloneable
Describes a block of text. A block can be an "atomic" text element (i.e., a sequence of text that is not interrupted by any HTML markup) or a compound of such atomic elements.
  • Field Details

    • isContent

      boolean isContent
    • text

      private CharSequence text
    • labels

      Set<String> labels
    • offsetBlocksStart

      int offsetBlocksStart
    • offsetBlocksEnd

      int offsetBlocksEnd
    • numWords

      int numWords
    • numWordsInAnchorText

      int numWordsInAnchorText
    • numWordsInWrappedLines

      int numWordsInWrappedLines
    • numWrappedLines

      int numWrappedLines
    • textDensity

      float textDensity
    • linkDensity

      float linkDensity
    • containedTextElements

      BitSet containedTextElements
    • numFullTextWords

      private int numFullTextWords
    • tagLevel

      private int tagLevel
    • EMPTY_BITSET

      private static final BitSet EMPTY_BITSET
    • EMPTY_START

      public static final TextBlock EMPTY_START
    • EMPTY_END

      public static final TextBlock EMPTY_END
  • Constructor Details

    • TextBlock

      public TextBlock(String text)
    • TextBlock

      public TextBlock(String text, BitSet containedTextElements, int numWords, int numWordsInAnchorText, int numWordsInWrappedLines, int numWrappedLines, int offsetBlocks)
  • Method Details

    • isContent

      public boolean isContent()
    • setIsContent

      public boolean setIsContent(boolean isContent)
    • getText

      public String getText()
    • getNumWords

      public int getNumWords()
    • getNumWordsInAnchorText

      public int getNumWordsInAnchorText()
    • getTextDensity

      public float getTextDensity()
    • getLinkDensity

      public float getLinkDensity()
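      This page does not say how the two densities are derived. The sketch below shows one plausible reading, based only on the field names: text density as average words per wrapped line, and link density as the share of words inside anchor text. DensitySketch, its method names, the formulas, and the zero-division guards are all assumptions made for illustration, not the documented behavior of TextBlock.

```java
/** Hypothetical helper; the formulas are assumptions inferred from field names, not from this page. */
final class DensitySketch {

    /** Assumed definition: average number of words per wrapped line. */
    static float textDensity(int numWordsInWrappedLines, int numWrappedLines) {
        // Guard against division by zero for blocks without wrapped lines.
        return numWrappedLines == 0 ? 0f : numWordsInWrappedLines / (float) numWrappedLines;
    }

    /** Assumed definition: fraction of the block's words that sit inside anchor text. */
    static float linkDensity(int numWords, int numWordsInAnchorText) {
        // An empty block is treated as having no links.
        return numWords == 0 ? 0f : numWordsInAnchorText / (float) numWords;
    }

    public static void main(String[] args) {
        System.out.println(textDensity(30, 3));  // prints 10.0
        System.out.println(linkDensity(40, 10)); // prints 0.25
    }
}
```

      Under these assumptions, a navigation bar (few words, mostly links) would score a high link density, while a long paragraph would score a high text density.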
    • mergeNext

      public void mergeNext(TextBlock other)
    • initDensities

      private void initDensities()
    • getOffsetBlocksStart

      public int getOffsetBlocksStart()
    • getOffsetBlocksEnd

      public int getOffsetBlocksEnd()
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • addLabel

      public void addLabel(String label)
      Adds an arbitrary String label to this TextBlock.
      Parameters:
      label - The label
    • hasLabel

      public boolean hasLabel(String label)
      Checks whether this TextBlock has the given label.
      Parameters:
      label - The label
      Returns:
      true if this block is marked by the given label.
    • removeLabel

      public boolean removeLabel(String label)
    • getLabels

      public Set<String> getLabels()
      Returns the labels associated with this TextBlock, or null if no labels exist. NOTE: The returned instance is the one used internally by TextBlock, so changes to it affect the block directly. Prefer the label-specific methods of TextBlock whenever possible.
      Returns:
      the set of labels, or null if no labels have been added yet.
    • addLabels

      public void addLabels(Set<String> l)
      Adds a set of labels to this TextBlock. null references are silently ignored.
      Parameters:
      l - The labels to be added.
    • addLabels

      public void addLabels(String... l)
      Adds one or more labels to this TextBlock. null references are silently ignored.
      Parameters:
      l - The labels to be added.
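      The label contract described above (getLabels() may return null until a label is added, and a null argument to addLabels is ignored) can be sketched with a small stand-in class. LabelContractSketch is hypothetical and mirrors only the documented behavior. The lazy allocation of the set is an assumption that would explain the null return; it is not the boilerpipe implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in mirroring the documented label contract; not the boilerpipe class. */
class LabelContractSketch {
    // Assumption: the set is created lazily, so it stays null until the first label arrives.
    private Set<String> labels;

    public void addLabel(String label) {
        if (labels == null) {
            labels = new HashSet<>();
        }
        labels.add(label);
    }

    public void addLabels(Set<String> l) {
        if (l == null) {
            return; // null references are silently ignored
        }
        if (labels == null) {
            labels = new HashSet<>(l);
        } else {
            labels.addAll(l);
        }
    }

    public void addLabels(String... l) {
        if (l == null) {
            return; // null references are silently ignored
        }
        addLabels(new HashSet<>(Arrays.asList(l)));
    }

    public boolean hasLabel(String label) {
        return labels != null && labels.contains(label);
    }

    public boolean removeLabel(String label) {
        return labels != null && labels.remove(label);
    }

    /** Returns the live internal set, or null if no label was added yet. */
    public Set<String> getLabels() {
        return labels;
    }

    public static void main(String[] args) {
        LabelContractSketch block = new LabelContractSketch();
        System.out.println(block.getLabels());          // prints null
        block.addLabels((Set<String>) null);            // ignored; the cast resolves the overload
        block.addLabels("headline", "teaser");          // label names are made up for this demo
        System.out.println(block.hasLabel("headline")); // prints true
        System.out.println(block.removeLabel("teaser")); // prints true
    }
}
```

      Because getLabels() may return null, callers that only need a membership check are better served by hasLabel, which handles the missing set itself. Note that a bare addLabels(null) call does not compile when both overloads exist, hence the cast.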
    • getContainedTextElements

      public BitSet getContainedTextElements()
      Returns the containedTextElements BitSet, or null.
      Returns:
      the containedTextElements BitSet, or null
    • clone

      protected TextBlock clone()
      Overrides:
      clone in class Object
    • getTagLevel

      public int getTagLevel()
    • setTagLevel

      public void setTagLevel(int tagLevel)