Package nl.siegmann.epublib.search
Class SearchIndex
java.lang.Object
    nl.siegmann.epublib.search.SearchIndex
public class SearchIndex extends java.lang.Object
A search index for searching through a book.
-
Nested Class Summary
Modifier and Type | Class | Description
private static class | SearchIndex.ResourceSearchIndex |
-
Field Summary
Modifier and Type | Field | Description
private Book | book |
private static org.slf4j.Logger | log |
static int | NBSP |
private static java.util.regex.Pattern | REMOVE_ACCENT_PATTERN |
private java.util.List<SearchIndex.ResourceSearchIndex> | resourceSearchIndexes |
private static java.util.regex.Pattern | WHITESPACE_PATTERN |
-
Constructor Summary
Constructor | Description
SearchIndex() |
SearchIndex(Book book) |
-
Method Summary
Modifier and Type | Method | Description
static java.lang.String | cleanText(java.lang.String text) | Turns HTML-encoded text into plain text.
private static SearchIndex.ResourceSearchIndex | createResourceSearchIndex(Resource resource) |
private static java.util.List<SearchIndex.ResourceSearchIndex> | createSearchIndex(Book book) |
SearchResults | doSearch(java.lang.String searchTerm) |
protected static java.util.List<SearchResult> | doSearch(java.lang.String searchTerm, java.lang.String content, Resource resource) |
private static java.util.List<SearchResult> | doSearch(java.lang.String searchTerm, SearchIndex.ResourceSearchIndex resourceSearchIndex) |
Book | getBook() |
static java.lang.String | getSearchContent(java.io.Reader content) |
static java.lang.String | getSearchContent(Resource resource) |
void | initBook(Book book) |
private static boolean | isHtmlWhitespace(int c) | Checks whether the given character is a Java whitespace or a non-breaking space (&nbsp;).
static java.lang.String | unicodeTrim(java.lang.String text) |
-
Field Detail
-
log
private static final org.slf4j.Logger log
-
NBSP
public static int NBSP
-
WHITESPACE_PATTERN
private static final java.util.regex.Pattern WHITESPACE_PATTERN
-
REMOVE_ACCENT_PATTERN
private static final java.util.regex.Pattern REMOVE_ACCENT_PATTERN
-
resourceSearchIndexes
private java.util.List<SearchIndex.ResourceSearchIndex> resourceSearchIndexes
-
book
private Book book
-
Constructor Detail
-
SearchIndex
public SearchIndex()
-
SearchIndex
public SearchIndex(Book book)
-
Method Detail
-
getBook
public Book getBook()
-
createResourceSearchIndex
private static SearchIndex.ResourceSearchIndex createResourceSearchIndex(Resource resource)
-
initBook
public void initBook(Book book)
-
createSearchIndex
private static java.util.List<SearchIndex.ResourceSearchIndex> createSearchIndex(Book book)
-
doSearch
public SearchResults doSearch(java.lang.String searchTerm)
-
getSearchContent
public static java.lang.String getSearchContent(Resource resource)
-
getSearchContent
public static java.lang.String getSearchContent(java.io.Reader content)
-
isHtmlWhitespace
private static boolean isHtmlWhitespace(int c)
Checks whether the given character is a Java whitespace or a non-breaking space (&nbsp;).
Parameters:
c - the character to check
Returns:
whether the given character is a Java whitespace or a non-breaking space (&nbsp;).
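The check described above can be sketched with the standard library alone. This is an illustration, not the library's actual code: the class name `HtmlWhitespaceSketch` is hypothetical, and the value `0x00A0` for `NBSP` is an assumption based on the Unicode code point of the no-break space, since this page does not show the field's value. A separate check is useful because `Character.isWhitespace` explicitly excludes non-breaking spaces.

```java
// Sketch only: a stand-in for the private isHtmlWhitespace check, not epublib's code.
class HtmlWhitespaceSketch {

    // Assumed value: U+00A0 NO-BREAK SPACE, the character the HTML entity &nbsp; denotes.
    static final int NBSP = 0x00A0;

    // True for anything Java treats as whitespace, plus the non-breaking space,
    // which Character.isWhitespace excludes by definition.
    static boolean isHtmlWhitespace(int c) {
        return c == NBSP || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace(NBSP)); // false
        System.out.println(isHtmlWhitespace(NBSP));       // true
        System.out.println(isHtmlWhitespace('\t'));       // true
        System.out.println(isHtmlWhitespace('a'));        // false
    }
}
```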
-
unicodeTrim
public static java.lang.String unicodeTrim(java.lang.String text)
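This page gives no description for `unicodeTrim`. Judging by the name alone, one plausible reading is a trim that also strips non-breaking spaces, which `String.trim()` leaves in place because it only removes characters up to U+0020. The sketch below illustrates that assumed behavior; the class `UnicodeTrimSketch` and its helper are hypothetical and are not taken from the library.

```java
// Sketch only: an assumed interpretation of unicodeTrim, not epublib's code.
class UnicodeTrimSketch {

    // Same idea as the documented isHtmlWhitespace: Java whitespace or U+00A0.
    static boolean isHtmlWhitespace(int c) {
        return c == 0x00A0 || Character.isWhitespace(c);
    }

    // Strips leading and trailing whitespace, including non-breaking spaces.
    static String unicodeTrim(String text) {
        int start = 0;
        int end = text.length();
        while (start < end && isHtmlWhitespace(text.charAt(start))) {
            start++;
        }
        while (end > start && isHtmlWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String padded = "\u00A0 hello world \u00A0\n";
        System.out.println("[" + unicodeTrim(padded) + "]"); // [hello world]
    }
}
```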
-
cleanText
public static java.lang.String cleanText(java.lang.String text)
Turns HTML-encoded text into plain text.
Replaces HTML entity expressions such as &ouml; with the corresponding characters.
Removes accents.
Replaces multiple whitespace characters with a single space.
Parameters:
text - the HTML-encoded text
Returns:
the HTML-encoded text turned into plain text.
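The three steps listed for `cleanText` can be sketched with the JDK alone. Everything below is an assumption made for illustration: the class `CleanTextSketch`, the four-entry entity table, and both regular expressions are hypothetical, and this page does not show how the library itself decodes entities or which patterns `REMOVE_ACCENT_PATTERN` and `WHITESPACE_PATTERN` hold.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch only: illustrates the documented steps, not epublib's implementation.
class CleanTextSketch {

    // Hypothetical, deliberately tiny entity table; a real decoder covers far more.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&ouml;", "\u00F6");
        ENTITIES.put("&eacute;", "\u00E9");
        ENTITIES.put("&nbsp;", "\u00A0");
        ENTITIES.put("&amp;", "&"); // decoded last so "&amp;ouml;" is not decoded twice
    }

    // Assumed patterns: combining marks left behind by NFD, and whitespace runs incl. NBSP.
    private static final Pattern ACCENTS = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    private static final Pattern WHITESPACE = Pattern.compile("[\\s\\u00A0]+");

    static String cleanText(String text) {
        // Step 1: replace entity expressions with the characters they stand for.
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            text = text.replace(e.getKey(), e.getValue());
        }
        // Step 2: split accented letters into base letter + mark, then drop the marks.
        text = Normalizer.normalize(text, Normalizer.Form.NFD);
        text = ACCENTS.matcher(text).replaceAll("");
        // Step 3: collapse every whitespace run into a single space.
        return WHITESPACE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(cleanText("Sch&ouml;n   caf&eacute;&nbsp;\n ok")); // Schon cafe ok
    }
}
```

Normalizing both the indexed content and the search term this way means a query for "cafe" can match text that was originally written as "caf&eacute;".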
-
doSearch
private static java.util.List<SearchResult> doSearch(java.lang.String searchTerm, SearchIndex.ResourceSearchIndex resourceSearchIndex)
-
doSearch
protected static java.util.List<SearchResult> doSearch(java.lang.String searchTerm, java.lang.String content, Resource resource)
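This page does not describe how `doSearch` matches a term against a resource's content or what a `SearchResult` holds. As a rough illustration of the general idea only, the sketch below collects the offset of every occurrence of a term in a string. The class `ContentSearchSketch`, the method `findOccurrences`, and the choice of integer offsets as the result are all hypothetical, and the inputs are assumed to have been normalized beforehand.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a plain substring scan, not epublib's doSearch.
class ContentSearchSketch {

    // Returns the start offset of every (possibly overlapping) occurrence of searchTerm.
    static List<Integer> findOccurrences(String searchTerm, String content) {
        List<Integer> positions = new ArrayList<>();
        if (searchTerm == null || searchTerm.isEmpty() || content == null) {
            return positions; // nothing sensible to search for
        }
        int from = content.indexOf(searchTerm);
        while (from >= 0) {
            positions.add(from);
            // Advance by one character so overlapping matches are also found.
            from = content.indexOf(searchTerm, from + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findOccurrences("an", "banana")); // [1, 3]
    }
}
```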
-