Class HTMLHighlighter


  • public final class HTMLHighlighter
    extends java.lang.Object
    Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
    • Field Detail

      • tagWhitelist

        private java.util.Map<java.lang.String, java.util.Set<java.lang.String>> tagWhitelist
      • PAT_TAG_NO_TEXT

        private static final java.util.regex.Pattern PAT_TAG_NO_TEXT
      • PAT_SUPER_TAG

        private static final java.util.regex.Pattern PAT_SUPER_TAG
      • outputHighlightOnly

        private boolean outputHighlightOnly
      • extraStyleSheet

        private java.lang.String extraStyleSheet
      • preHighlight

        private java.lang.String preHighlight
      • postHighlight

        private java.lang.String postHighlight
    • Constructor Detail

      • HTMLHighlighter

        private HTMLHighlighter(boolean extractHTML)
    • Method Detail

      • newHighlightingInstance

        public static HTMLHighlighter newHighlightingInstance()
        Creates a new HTMLHighlighter that is set up to return the full HTML text, with the extracted text portion highlighted.
      • newExtractingInstance

        public static HTMLHighlighter newExtractingInstance()
        Creates a new HTMLHighlighter that is set up to return only the extracted HTML text, including enclosed markup.
      • isOutputHighlightOnly

        public boolean isOutputHighlightOnly()
        Returns true if only the HTML enclosed within highlighted content will be returned, rather than the whole HTML document.
      • setOutputHighlightOnly

        public void setOutputHighlightOnly(boolean outputHighlightOnly)
        Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
      • getExtraStyleSheet

        public java.lang.String getExtraStyleSheet()
        Returns the extra stylesheet definition that will be inserted in the HEAD element. By default, this corresponds to a simple definition that marks text in class "x-boilerpipe-mark1" as inline text with yellow background.
      • setExtraStyleSheet

        public void setExtraStyleSheet(java.lang.String extraStyleSheet)
        Sets the extra stylesheet definition that will be inserted in the HEAD element. To disable, set it to the empty string: ""
        Parameters:
        extraStyleSheet - Plain HTML
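        The following is a minimal sketch of how an extra stylesheet could end up in the HEAD element. It does not use the library: ExtraStyleSheetSketch, EXAMPLE_STYLE and insertIntoHead are hypothetical names, and the stylesheet text is only an illustration of "inline text with yellow background", not necessarily the library's default string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative stand-in only; this is NOT the HTMLHighlighter implementation.
final class ExtraStyleSheetSketch {
    // Hypothetical stylesheet; the library's actual default text may differ.
    static final String EXAMPLE_STYLE =
        "<style type=\"text/css\">"
        + ".x-boilerpipe-mark1 { display: inline; background-color: yellow; }"
        + "</style>";

    // Matches an opening <head> tag (with optional attributes), ignoring case.
    private static final Pattern HEAD_OPEN =
        Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Inserts the stylesheet directly after the first opening <head> tag.
    // An empty stylesheet, or a document without <head>, leaves the HTML unchanged.
    static String insertIntoHead(String html, String extraStyleSheet) {
        if (extraStyleSheet.isEmpty()) {
            return html;
        }
        Matcher m = HEAD_OPEN.matcher(html);
        if (!m.find()) {
            return html;
        }
        return html.substring(0, m.end()) + extraStyleSheet + html.substring(m.end());
    }

    public static void main(String[] args) {
        String html = "<html><head><title>t</title></head><body></body></html>";
        System.out.println(insertIntoHead(html, EXAMPLE_STYLE));
        // Passing "" mirrors the documented way to disable the stylesheet.
        System.out.println(insertIntoHead(html, ""));
    }
}
```

        In this sketch, passing the empty string mirrors the documented way of disabling the extra stylesheet.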
      • getPreHighlight

        public java.lang.String getPreHighlight()
        Returns the string that will be inserted before any highlighted HTML block. By default, this corresponds to <span class="x-boilerpipe-mark1">
      • setPreHighlight

        public void setPreHighlight(java.lang.String preHighlight)
        Sets the string that will be inserted prior to any highlighted HTML block. To disable, set it to the empty string: ""
      • getPostHighlight

        public java.lang.String getPostHighlight()
        Returns the string that will be inserted after any highlighted HTML block. By default, this corresponds to </span>
      • setPostHighlight

        public void setPostHighlight(java.lang.String postHighlight)
        Sets the string that will be inserted after any highlighted HTML block. To disable, set it to the empty string: ""
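        To illustrate how the pre- and post-highlight strings relate to each other, here is a minimal sketch that wraps a content block in both. It is a stand-in that does not call the library: HighlightWrapSketch and wrap are hypothetical names, and only the two default strings are taken from the descriptions above.

```java
// Illustrative stand-in only; this is NOT the HTMLHighlighter implementation.
final class HighlightWrapSketch {
    // Defaults as stated in the getter descriptions above.
    private String preHighlight = "<span class=\"x-boilerpipe-mark1\">";
    private String postHighlight = "</span>";

    void setPreHighlight(String preHighlight) {
        this.preHighlight = preHighlight;
    }

    void setPostHighlight(String postHighlight) {
        this.postHighlight = postHighlight;
    }

    // Wraps one content block in the configured pre/post strings.
    String wrap(String contentHtml) {
        return preHighlight + contentHtml + postHighlight;
    }

    public static void main(String[] args) {
        HighlightWrapSketch sketch = new HighlightWrapSketch();
        System.out.println(sketch.wrap("<p>Main article text</p>"));

        // Custom markers: the two strings should form a matching pair.
        sketch.setPreHighlight("<mark>");
        sketch.setPostHighlight("</mark>");
        System.out.println(sketch.wrap("<p>Main article text</p>"));

        // Empty strings disable the wrapping, as documented for the setters.
        sketch.setPreHighlight("");
        sketch.setPostHighlight("");
        System.out.println(sketch.wrap("<p>Main article text</p>"));
    }
}
```

        Because the two strings are set independently, a caller who changes one would normally change the other as well so that the inserted markup stays balanced.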
      • xmlEncode

        private static java.lang.String xmlEncode(java.lang.String in)
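        This private method is undocumented here. Going only by its name and signature, it presumably escapes a string for safe inclusion in XML or HTML output. The sketch below shows one common way to do that; it is an assumption about the behavior, not the library's actual code, and XmlEncodeSketch is a hypothetical class name.

```java
// Illustrative stand-in only; the real xmlEncode may behave differently.
final class XmlEncodeSketch {
    // Replaces the five XML predefined-entity characters with their entity references.
    // Returning "" for null input is a choice made for this sketch.
    static String xmlEncode(String in) {
        if (in == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(in.length() + 16);
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;");
                    break;
                case '\'':
                    sb.append("&apos;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(xmlEncode("a < b & \"c\""));
    }
}
```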
      • getTagWhitelist

        public java.util.Map<java.lang.String, java.util.Set<java.lang.String>> getTagWhitelist()
      • setTagWhitelist

        public void setTagWhitelist(java.util.Map<java.lang.String, java.util.Set<java.lang.String>> tagWhitelist)
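        The whitelist accessors are undocumented here, so the following only sketches how a value of the declared type could be built and queried. It assumes that keys are tag names and values are the attribute names permitted on that tag. That interpretation, the upper-case key convention, and the names TagWhitelistSketch, exampleWhitelist and isAllowed are all assumptions rather than documented behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in only; the library's whitelist semantics may differ.
final class TagWhitelistSketch {
    // Builds a map of the same type that setTagWhitelist accepts.
    // Assumed meaning: tag name -> attribute names allowed on that tag.
    static Map<String, Set<String>> exampleWhitelist() {
        Map<String, Set<String>> whitelist = new HashMap<>();
        whitelist.put("A", new HashSet<>(Arrays.asList("href")));
        whitelist.put("IMG", new HashSet<>(Arrays.asList("src", "alt")));
        // Tag allowed, but with no attributes.
        whitelist.put("P", Collections.<String>emptySet());
        return whitelist;
    }

    // Returns true if the tag is listed and the attribute is in its allowed set.
    static boolean isAllowed(Map<String, Set<String>> whitelist, String tag, String attribute) {
        Set<String> attributes = whitelist.get(tag.toUpperCase(Locale.ROOT));
        return attributes != null && attributes.contains(attribute.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> whitelist = exampleWhitelist();
        System.out.println(isAllowed(whitelist, "a", "href"));
        System.out.println(isAllowed(whitelist, "img", "onclick"));
        System.out.println(isAllowed(whitelist, "script", "src"));
    }
}
```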