| AddPrecedingLabelsFilter |
Adds the labels of the preceding block to the current block, optionally adding a prefix.
|
| ArticleMetadataFilter |
Tries to find TextBlocks that consist of "article metadata".
|
| BlockProximityFusion |
Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
|
| ContentFusion |
Merges two blocks using some heuristics.
|
| ExpandTitleToContentFilter |
|
| KeepLargestBlockFilter |
Keeps the largest TextBlock only (by the number of words).
|
| LabelFusion |
Fuses adjacent blocks if their labels are equal.
|
| LargeBlockSameTagLevelToContentFilter |
Marks as content all blocks that (1) are on the same tag level as very likely main content (usually the level of the largest block) and (2) have a significant number of words, currently at least 100.
|
| ListAtEndFilter |
Marks nested list-item blocks after the end of the main content.
|
| SimpleBlockFusionProcessor |
Merges two subsequent blocks if their text densities are equal.
|
| TrailingHeadlineToBoilerplateFilter |
|
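
To make a few of the descriptions above concrete, here is a minimal sketch of three of the listed behaviours: fusing adjacent blocks with equal labels, keeping only the largest block by word count, and copying the preceding block's labels with an optional prefix. It is not the library's code or API. The `Block` class, its fields, and the function names are hypothetical stand-ins chosen for illustration, and details such as tie-breaking and how fused text is joined are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a text block; not the library's class."""
    text: str
    labels: set = field(default_factory=set)
    is_content: bool = True

    @property
    def num_words(self):
        return len(self.text.split())


def label_fusion(blocks):
    """Fuse adjacent blocks whose label sets are equal (returns a new list)."""
    fused = []
    for b in blocks:
        if fused and fused[-1].labels == b.labels:
            prev = fused[-1]
            # Assumption: fused text is joined with a newline.
            fused[-1] = Block(prev.text + "\n" + b.text,
                              set(prev.labels),
                              prev.is_content or b.is_content)
        else:
            fused.append(Block(b.text, set(b.labels), b.is_content))
    return fused


def keep_largest_block(blocks):
    """Mark only the block with the most words as content.

    Assumption: on a tie, the first of the largest blocks wins.
    """
    if not blocks:
        return
    largest = max(blocks, key=lambda b: b.num_words)
    for b in blocks:
        b.is_content = b is largest


def add_preceding_labels(blocks, prefix=""):
    """Add the labels of each block's predecessor, optionally prefixed.

    Walks backwards so every block sees its predecessor's original labels.
    """
    for i in range(len(blocks) - 1, 0, -1):
        blocks[i].labels |= {prefix + label for label in blocks[i - 1].labels}


blocks = [
    Block("Home About", {"nav"}),
    Block("one two three four", {"body"}),
    Block("five six", {"body"}),
]
fused = label_fusion(blocks)          # the two "body" blocks become one
keep_largest_block(fused)             # only the fused "body" block stays content
add_preceding_labels(fused, "prev:")  # it also receives the label "prev:nav"
```

Applying the steps in this order is only one possible arrangement; in a real pipeline the order of filters affects the result, since fusion changes which block is the largest.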