Package org.apache.uima.tools.components
Class XmlDetagger
- java.lang.Object
  - org.apache.uima.analysis_component.AnalysisComponent_ImplBase
    - org.apache.uima.analysis_component.Annotator_ImplBase
      - org.apache.uima.analysis_component.CasAnnotator_ImplBase
        - org.apache.uima.tools.components.XmlDetagger
All Implemented Interfaces: AnalysisComponent
public class XmlDetagger extends CasAnnotator_ImplBase
A multi-sofa annotator that does XML detagging. Reads XML data from the input Sofa (named "xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a remote file. The XML is parsed using the JVM's default parser, and the plain-text content is written to a new sofa called "plainTextDocument".
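The detagging step described above can be illustrated with the JDK's SAX API alone. The following is a minimal sketch, not XmlDetagger's actual implementation: the class SaxDetagSketch and its detag method are hypothetical names, and the sketch works on an in-memory string rather than on a CAS sofa. It parses the XML with the JVM's default SAX parser and concatenates all character data.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of UIMA.
final class SaxDetagSketch {

    // Parses the XML string with the JVM's default SAX parser and returns
    // the concatenated character data, with all markup dropped.
    static String detag(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver text in several chunks; append each one.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML input", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(detag("<doc><p>Hello, <b>world</b>!</p></doc>"));
        // prints "Hello, world!"
    }
}
```

In the annotator itself, the input would come from the "xmlDocument" sofa and the result would be written to the "plainTextDocument" sofa; the sketch leaves that CAS handling out.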
Nested Class Summary
- (package private) class XmlDetagger.DetagHandler
Field Summary
- private java.lang.String mXmlTagContainingText
- static java.lang.String PARAM_TEXT_TAG: Name of optional configuration parameter that contains the name of an XML tag that appears in the input file.
- private javax.xml.parsers.SAXParserFactory parserFactory
- private Type sourceDocInfoType
Constructor Summary
- XmlDetagger()
Method Summary
- static AnalysisEngineDescription getDescription(): Parses and returns the descriptor for this Analysis Engine.
- static java.net.URL getDescriptorURL()
- void initialize(UimaContext aContext): Performs any startup tasks required by this component.
- void process(CAS aCAS): Inputs a CAS to the AnalysisComponent.
- void typeSystemInit(TypeSystem aTypeSystem): Informs this annotator that the CAS TypeSystem has changed.
Methods inherited from class org.apache.uima.analysis_component.CasAnnotator_ImplBase
getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next
Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
batchProcessComplete, collectionProcessComplete, destroy, getContext, getLogger, getResultSpecification, reconfigure, setResultSpecification
Field Detail
PARAM_TEXT_TAG
public static final java.lang.String PARAM_TEXT_TAG
Name of optional configuration parameter that contains the name of an XML tag that appears in the input file. Only text that falls within this XML tag will be considered part of the "document" that is added to the CAS by this annotator. If not specified, the entire file will be considered the document.
See Also: Constant Field Values
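The effect of restricting the document to one XML tag can be sketched with a plain SAX handler. This is an illustration under stated assumptions, not the code of XmlDetagger or its DetagHandler: the class TextTagFilterSketch, its method extract, and the sample tag name TEXT are all hypothetical. Passing null for the tag name stands in for the parameter not being specified.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of UIMA.
final class TextTagFilterSketch {

    // Keeps character data only while inside the element named textTag.
    // A null textTag means "keep the text of the entire file".
    private static final class Collector extends DefaultHandler {
        private final String textTag;
        private final StringBuilder out = new StringBuilder();
        private int depthInsideTag = 0; // > 0 while inside textTag

        Collector(String textTag) {
            this.textTag = textTag;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals(textTag)) depthInsideTag++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals(textTag)) depthInsideTag--;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (textTag == null || depthInsideTag > 0) out.append(ch, start, length);
        }
    }

    static String extract(String xml, String textTag) {
        Collector collector = new Collector(textTag);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML input", e);
        }
        return collector.out.toString();
    }

    public static void main(String[] args) {
        String xml = "<doc><title>T</title><TEXT>Hello <b>world</b></TEXT></doc>";
        System.out.println(extract(xml, "TEXT")); // prints "Hello world"
        System.out.println(extract(xml, null));   // prints "THello world"
    }
}
```

Text nested in child elements of the chosen tag is kept, because the depth counter stays above zero until the matching end tag is seen.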
parserFactory
private javax.xml.parsers.SAXParserFactory parserFactory
sourceDocInfoType
private Type sourceDocInfoType
mXmlTagContainingText
private java.lang.String mXmlTagContainingText
Method Detail
initialize
public void initialize(UimaContext aContext) throws ResourceInitializationException
Description copied from interface: AnalysisComponent
Performs any startup tasks required by this component. The framework calls this method only once, just after the AnalysisComponent has been instantiated.
The framework supplies this AnalysisComponent with a reference to the UimaContext that it will use, for example to access configuration settings or resources. This AnalysisComponent should store a reference to its UimaContext for later use.
- Specified by: initialize in interface AnalysisComponent
- Overrides: initialize in class AnalysisComponent_ImplBase
- Parameters: aContext - Provides access to services and resources managed by the framework. This includes configuration parameters, logging, and access to external resources.
- Throws: ResourceInitializationException - if this AnalysisComponent cannot initialize successfully.
typeSystemInit
public void typeSystemInit(TypeSystem aTypeSystem) throws AnalysisEngineProcessException
Description copied from class: CasAnnotator_ImplBase
Informs this annotator that the CAS TypeSystem has changed. The Analysis Engine calls this from PrimitiveAnalysisEngine_impl, which calls CasAnnotator_ImplBase.process, which calls checkTypeSystemChange.
In this method, the Annotator should use the TypeSystem to resolve the names of Types and Features to the actual Type and Feature objects, which can then be used during processing.
- Overrides: typeSystemInit in class CasAnnotator_ImplBase
- Parameters: aTypeSystem - the new type system to use as input to your initialization
- Throws: AnalysisEngineProcessException - if the provided type system is missing types or features required by this annotator
process
public void process(CAS aCAS) throws AnalysisEngineProcessException
Description copied from class: CasAnnotator_ImplBase
Inputs a CAS to the AnalysisComponent. This method should be overridden by subclasses to perform analysis of the CAS.
- Specified by: process in class CasAnnotator_ImplBase
- Parameters: aCAS - A CAS that this AnalysisComponent should process.
- Throws: AnalysisEngineProcessException - if a problem occurs during processing
getDescription
public static AnalysisEngineDescription getDescription() throws InvalidXMLException
Parses and returns the descriptor for this Analysis Engine. The descriptor is stored in the uima-core.jar file and located using the ClassLoader.
- Returns: an object containing all of the information parsed from the descriptor.
- Throws: InvalidXMLException - if the descriptor is invalid or missing
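Locating a file through the class loader usually comes down to a Class.getResource call plus a null check for the "missing" case. The following is a minimal sketch of that pattern only: DescriptorLookupSketch and locate are hypothetical names, the resource name of the actual XmlDetagger descriptor is not given here, and the example looks up a class file from the JDK purely because it is certain to exist.

```java
import java.net.URL;

// Hypothetical illustration class; not part of UIMA.
final class DescriptorLookupSketch {

    // Resolves resourceName relative to the package of the anchor class,
    // using that class's loader. Fails loudly if nothing is found.
    static URL locate(Class<?> anchor, String resourceName) {
        URL url = anchor.getResource(resourceName);
        if (url == null) {
            throw new IllegalStateException("Resource not found: " + resourceName);
        }
        return url;
    }

    public static void main(String[] args) {
        // The exact URL depends on the JDK in use, so it is only printed.
        System.out.println(locate(Object.class, "Object.class"));
    }
}
```

A lookup of this kind returns a URL that an XML parser can then read, which is why a missing resource and an invalid descriptor can both surface as a failure at the same call site.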
getDescriptorURL
public static java.net.URL getDescriptorURL()