Class AbstractRDFParser<T extends AbstractRDFParser<T>>
- Direct Known Subclasses:
JsonLdParser, RDF4JParser
This abstract class keeps the properties in protected fields like
sourceFile using Optional. Some basic checking like
checkIsAbsolute(IRI) is performed.
This class and its subclasses are Cloneable, immutable and
(therefore) thread-safe - each call to option methods like
contentType(String) or source(IRI) will return a cloned,
mutated copy.
By default, parsing is done by the abstract method
parseSynchronusly(), which is executed in a cloned snapshot, so
multiple parse() calls are thread-safe. The default parse()
uses a thread pool in threadGroup, but implementations can override
parse() (e.g. because they have their own threading model or use
asynchronous remote execution).
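The clone-on-change behaviour described above can be sketched in isolation. This is a minimal illustration, not the source of AbstractRDFParser: the class names SketchBuilder and TurtleSketch are hypothetical, and only the self-typed generic parameter, asT() and the idea of returning a modified clone are taken from this page.

```java
import java.util.Optional;

// Hypothetical stand-in for the self-typed, clone-on-change pattern.
abstract class SketchBuilder<T extends SketchBuilder<T>> implements Cloneable {
    private Optional<String> contentType = Optional.empty();

    @SuppressWarnings("unchecked")
    protected T asT() {
        return (T) this;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T clone() {
        try {
            return (T) super.clone();
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable.
            throw new IllegalStateException(e);
        }
    }

    // Option methods never modify "this"; they change and return a copy.
    public T contentType(String contentType) {
        SketchBuilder<T> copy = clone();
        copy.contentType = Optional.ofNullable(contentType);
        return copy.asT();
    }

    public Optional<String> getContentType() {
        return contentType;
    }
}

// Hypothetical concrete subclass; T is bound to the subclass itself.
final class TurtleSketch extends SketchBuilder<TurtleSketch> {
}
```

Because every option call returns a new instance, a partially configured builder can be shared between threads and specialised independently by each of them.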
-
Nested Class Summary
Nested classes/interfaces inherited from interface RDFParser
RDFParser.ParseResult -
Field Summary
Fields
Modifier and Type / Field / Description
private static RDF
private Optional<InputStream>
static final ThreadGroup
private static final ExecutorService
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type / Method / Description
protected T asT()
base(String): Specify a base IRI to use for parsing any relative IRI references.
base(IRI): Specify a base IRI to use for parsing any relative IRI references.
protected void checkBaseRequired(): Check if base is required.
protected void checkContentType(): Subclasses can override this method to check compatibility with the contentType setting.
protected void checkIsAbsolute(IRI iri): Check if an iri is absolute.
protected void checkSource(): Check that one and only one source is present and valid.
protected void checkTarget(): Subclasses can override this method to check the target is valid.
clone()
contentType(String contentType): Specify the content type of the RDF syntax to parse.
contentType(RDFSyntax rdfSyntax): Specify the content type of the RDF syntax to parse.
protected RDF createRDFTermFactory(): Create a new RDF for a parse session.
fileExtension(Path path): Return the file extension of a Path - if any.
getBase(): Get the set base IRI, if present.
getContentType(): Get the set content-type String, if any.
getContentTypeSyntax(): Get the set content-type RDFSyntax, if any.
getRdfTermFactory(): Get the set RDF, if any.
getSourceFile(): Get the set source Path.
getSourceInputStream(): Get the set source InputStream.
getSourceIri(): Get the set source IRI.
getTarget(): Get the target to consume parsed Quads.
getTargetDataset(): Get the target dataset as set by target(Dataset).
getTargetGraph(): Get the target graph as set by target(Graph).
guessRDFSyntax(Path path): Guess RDFSyntax from a local file's extension.
parse(): Parse the specified source.
protected abstract void parseSynchronusly()
protected T prepareForParsing(): Prepare a clone of this RDFParser which has been checked and completed.
rdfTermFactory(RDF rdfTermFactory)
protected void resetSource(): Reset all source* fields to Optional.empty().
protected void resetTarget(): Reset all optional target* fields to Optional.empty().
source(InputStream inputStream): Specify a source InputStream to parse.
source(String): Specify an absolute source IRI to retrieve and parse.
source(Path): Specify a source file Path to parse.
source(IRI): Specify an absolute source IRI to retrieve and parse.
target(Consumer): Specify a consumer for parsed quads.
target(Dataset): Specify a Dataset to add parsed quads to.
target(Graph): Specify a Graph to add parsed triples to.
-
Field Details
-
threadGroup
-
threadpool
-
internalRdfTermFactory
-
rdfTermFactory
-
contentTypeSyntax
-
contentType
-
base
-
sourceInputStream
-
sourceFile
-
sourceIri
-
target
-
targetDataset
-
targetGraph
-
-
Constructor Details
-
AbstractRDFParser
public AbstractRDFParser()
-
-
Method Details
-
getRdfTermFactory
Get the set RDF, if any.
- Returns:
- The RDF to use, or Optional.empty() if it has not been set
-
getContentTypeSyntax
Get the set content-type RDFSyntax, if any.
If this is Optional.isPresent(), then getContentType() contains the value of RDFSyntax.mediaType().
- Returns:
- The RDFSyntax of the content type, or Optional.empty() if it has not been set
-
getContentType
Get the set content-type String, if any.
If this is Optional.isPresent() and is recognized by RDFSyntax.byMediaType(String), then the corresponding RDFSyntax is set on getContentTypeSyntax(), otherwise that is Optional.empty().
- Returns:
- The Content-Type IANA media type, e.g. text/turtle, or Optional.empty() if it has not been set
-
getTarget
Get the target to consume parsed Quads.
From the call to parseSynchronusly() onwards, this will be a non-null value (as a target is a required setting).
- Returns:
- The target consumer of Quads, or null if it has not yet been set.
-
getTargetDataset
Get the target dataset as set by target(Dataset).
The return value is Optional.isPresent() if and only if target(Dataset) has been set, meaning that the implementation may choose to append parsed quads to the Dataset directly instead of relying on the generated getTarget() consumer.
If this value is present, then getTargetGraph() MUST be Optional.empty().
- Returns:
- The target Dataset, or Optional.empty() if another kind of target has been set.
-
getTargetGraph
Get the target graph as set by target(Graph).
The return value is Optional.isPresent() if and only if target(Graph) has been set, meaning that the implementation may choose to append parsed triples to the Graph directly instead of relying on the generated getTarget() consumer.
If this value is present, then getTargetDataset() MUST be Optional.empty().
- Returns:
- The target Graph, or Optional.empty() if another kind of target has been set.
-
getBase
Get the set base IRI, if present.
- Returns:
- The base IRI, or Optional.empty() if it has not been set
-
getSourceInputStream
Get the set source InputStream.
If this is Optional.isPresent(), then getSourceFile() and getSourceIri() are Optional.empty().
- Returns:
- The source InputStream, or Optional.empty() if it has not been set
-
getSourceFile
Get the set source Path.
If this is Optional.isPresent(), then getSourceInputStream() and getSourceIri() are Optional.empty().
- Returns:
- The source Path, or Optional.empty() if it has not been set
-
getSourceIri
Get the set source IRI.
If this is Optional.isPresent(), then getSourceInputStream() and getSourceFile() are Optional.empty().
- Returns:
- The source IRI, or Optional.empty() if it has not been set
-
clone
-
asT
-
rdfTermFactory
Description copied from interface: RDFParser
Specify which RDF to use for generating RDFTerms.
This option may be used together with RDFParser.target(Graph) to override the implementation's default factory and graph.
Warning: Using the same RDF for multiple RDFParser.parse() calls may accidentally merge BlankNodes having the same label, as the parser may use the RDF.createBlankNode(String) method from the parsed blank node labels.
- Specified by:
- rdfTermFactory in interface RDFParser
- Parameters:
- rdfTermFactory - RDF to use for generating RDFTerms.
- Returns:
- An RDFParser that will use the specified rdfTermFactory
-
contentType
Description copied from interface: RDFParser
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The character set of the RDFSyntax is assumed to be StandardCharsets.UTF_8 unless overridden within the document (e.g. <?xml version="1.0" encoding="iso-8859-1"?> in RDFSyntax.RDFXML).
This method will override any contentType set with RDFParser.contentType(String).
- Specified by:
- contentType in interface RDFParser
- Parameters:
- rdfSyntax - An RDFSyntax to parse the source according to, e.g. RDFSyntax.TURTLE.
- Returns:
- An RDFParser that will use the specified content type.
- Throws:
- IllegalArgumentException - If this RDFParser does not support the specified RDFSyntax.
-
contentType
Description copied from interface: RDFParser
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The content type MAY include a charset parameter if the RDF media types permit it; the default charset is StandardCharsets.UTF_8 unless overridden within the document.
This method will override any contentType set with RDFParser.contentType(RDFSyntax).
- Specified by:
- contentType in interface RDFParser
- Parameters:
- contentType - A content-type string, e.g. application/ld+json or text/turtle;charset="UTF-8" as specified by RFC 7231.
- Returns:
- An RDFParser that will use the specified content type.
- Throws:
- IllegalArgumentException - If the contentType has an invalid syntax, or this RDFParser does not support the specified contentType.
-
base
Description copied from interface: RDFParser
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the RDFParser.source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with RDFParser.base(String).
base
Description copied from interface: RDFParser
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the RDFParser.source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with RDFParser.base(IRI).
- Specified by:
- base in interface RDFParser
- Parameters:
- base - An absolute IRI to use as a base.
- Returns:
- An RDFParser that will use the specified base IRI.
- Throws:
- IllegalArgumentException - If the base is not a valid absolute IRI string
-
source
Description copied from interface: RDFParser
Specify a source InputStream to parse.
The source set will not be read before the call to RDFParser.parse().
The InputStream will not be closed after parsing. The InputStream does not need to support InputStream.markSupported().
The parser might not consume the complete stream (e.g. an RDF/XML parser may not read beyond the closing tag of </rdf:Description>).
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) SHOULD be set before calling RDFParser.parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the RDFParser.contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).
The RDFParser.base(IRI) or RDFParser.base(String) MUST be set before calling RDFParser.parse(), unless the RDF syntax does not permit relative IRIs (e.g. RDFSyntax.NTRIPLES).
This method will override any source set with RDFParser.source(IRI), RDFParser.source(Path) or RDFParser.source(String).
source
Description copied from interface: RDFParser
Specify a source file Path to parse.
The source set will not be read before the call to RDFParser.parse().
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) SHOULD be set before calling RDFParser.parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the RDFParser.contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(), otherwise Path.toUri() will be used as the base IRI.
This method will override any source set with RDFParser.source(IRI), RDFParser.source(InputStream) or RDFParser.source(String).
source
Description copied from interface: RDFParser
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to RDFParser.parse().
If this builder does not support the given IRI protocol (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while the RDFParser.parse() should throw an IOException.
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) MAY be set before calling RDFParser.parse(), in which case that type MAY be used for content negotiation (e.g. Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(), otherwise the source IRI will be used as the base IRI.
This method will override any source set with RDFParser.source(Path), RDFParser.source(InputStream) or RDFParser.source(String).
source
Description copied from interface: RDFParser
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to RDFParser.parse().
If this builder does not support the given IRI (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while the RDFParser.parse() should throw an IOException.
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) MAY be set before calling RDFParser.parse(), in which case that type MAY be used for content negotiation (e.g. Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(), otherwise the source IRI will be used as the base IRI.
This method will override any source set with RDFParser.source(Path), RDFParser.source(InputStream) or RDFParser.source(IRI).
- Specified by:
- source in interface RDFParser
- Parameters:
- iri - An IRI to retrieve and parse
- Returns:
- An RDFParser that will use the specified source.
- Throws:
- IllegalArgumentException - If the iri is not a valid absolute IRI string
-
checkIsAbsolute
Check if an iri is absolute.
Used by source(String) and base(String).
- Parameters:
- iri - IRI to check
- Throws:
- IllegalArgumentException - If the IRI is not absolute
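A check of this kind might look like the sketch below. It is an approximation under stated assumptions: it takes a plain String rather than an IRI object, the class name AbsoluteIriCheck is hypothetical, and "absolute" is interpreted as java.net.URI reporting a scheme component, which may differ from the actual implementation.

```java
import java.net.URI;

// Hypothetical helper; works on a String instead of the IRI type.
final class AbsoluteIriCheck {
    // Throws IllegalArgumentException if the string is not an absolute IRI,
    // approximated here by java.net.URI#isAbsolute() (a scheme is present).
    // URI.create itself throws IllegalArgumentException on invalid syntax.
    static void checkIsAbsolute(String iri) {
        if (!URI.create(iri).isAbsolute()) {
            throw new IllegalArgumentException("IRI is not absolute: " + iri);
        }
    }
}
```

With this sketch, "http://example.com/doc.ttl" passes, while a relative reference such as "doc.ttl" is rejected.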
-
checkSource
Check that one and only one source is present and valid.
Used by parse().
Subclasses might override this method, e.g. to support other source combinations, or to check if the sourceIri is resolvable.
- Throws:
- IOException - If a source file can't be read
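The "one and only one source" rule can be illustrated as follows. This is a sketch only: the class SourceCheckSketch and its method are hypothetical, the three Optional parameters stand in for the sourceInputStream, sourceFile and sourceIri fields, and the choice of IllegalStateException is an assumption based on the exception that parse() is documented to throw when no source has been set.

```java
import java.util.Optional;

// Hypothetical helper illustrating the "exactly one source" rule.
final class SourceCheckSketch {
    static void checkExactlyOneSource(Optional<?> inputStream, Optional<?> file, Optional<?> iri) {
        int present = 0;
        if (inputStream.isPresent()) {
            present++;
        }
        if (file.isPresent()) {
            present++;
        }
        if (iri.isPresent()) {
            present++;
        }
        if (present == 0) {
            throw new IllegalStateException("No source has been set");
        }
        if (present > 1) {
            throw new IllegalStateException("More than one source has been set");
        }
    }
}
```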
-
checkBaseRequired
Check if base is required.
- Throws:
- IllegalStateException - if base is required, but not set.
-
resetSource
protected void resetSource()
Reset all source* fields to Optional.empty().
Subclasses should override this and call super.resetSource() if they need to reset any additional source* fields.
resetTarget
protected void resetTarget()
Reset all optional target* fields to Optional.empty().
Note that the consumer set for getTarget() is not reset.
Subclasses should override this and call super.resetTarget() if they need to reset any additional target* fields.
parseSynchronusly
Parse sourceInputStream, sourceFile or sourceIri.
One of the source fields MUST be present, as checked by checkSource().
checkBaseRequired() is called to verify if getBase() is required.
- Throws:
- IOException - If the source could not be read
- RDFParseException - If the source could not be parsed (e.g. a .ttl file was not valid Turtle)
-
prepareForParsing
Prepare a clone of this RDFParser which has been checked and completed.
The returned clone will always have getTarget() and getRdfTermFactory() present.
If the getSourceFile() is present, but the getBase() is not present, the base will be set to the file:/// IRI for the Path's real path (e.g. resolving any symbolic links).
- Returns:
- A completed and checked clone of this RDFParser
- Throws:
- IOException - If the source was not accessible (e.g. a file was not found)
- IllegalStateException - If the parser was not in a compatible setting (e.g. contentType was an invalid string)
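The fallback from a missing base to the file's real path can be sketched with standard java.nio.file calls. The class DefaultBaseSketch and the method effectiveBase are hypothetical, and the base is represented as a String rather than an IRI; only the rule itself (explicit base wins, otherwise the file: IRI of the real path) comes from the description above.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper illustrating the default base for a file source.
final class DefaultBaseSketch {
    // Returns the explicit base if set, otherwise the file: IRI of the real path.
    static String effectiveBase(Optional<String> base, Path sourceFile) throws IOException {
        if (base.isPresent()) {
            return base.get();
        }
        // toRealPath() resolves symbolic links and throws an IOException
        // if the file does not exist.
        return sourceFile.toRealPath().toUri().toString();
    }
}
```

Because toRealPath() touches the file system, a missing file surfaces here as an IOException, in line with the Throws clause above.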
-
checkTarget
protected void checkTarget()
Subclasses can override this method to check the target is valid.
The default implementation throws an IllegalStateException if the target has not been set.
-
checkContentType
Subclasses can override this method to check compatibility with the contentType setting.
- Throws:
- IllegalStateException - if the getContentType() or getContentTypeSyntax() is not compatible or invalid
-
guessRDFSyntax
Guess RDFSyntax from a local file's extension.
This method can be used by subclasses if getContentType() is not present and getSourceFile() is set.
- Parameters:
- path - Path whose extension should be checked
- Returns:
- The RDFSyntax which has a matching RDFSyntax.fileExtension(), otherwise Optional.empty().
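Guessing a syntax from an extension amounts to a table lookup. The sketch below uses a plain map from extension to IANA media type in place of RDFSyntax; the class GuessSyntaxSketch, the method guessMediaType and the case-insensitive matching are assumptions made for illustration, not documented behaviour of this class.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup table standing in for RDFSyntax.fileExtension().
final class GuessSyntaxSketch {
    private static final Map<String, String> MEDIA_TYPE_BY_EXTENSION = new HashMap<>();
    static {
        MEDIA_TYPE_BY_EXTENSION.put(".ttl", "text/turtle");
        MEDIA_TYPE_BY_EXTENSION.put(".nt", "application/n-triples");
        MEDIA_TYPE_BY_EXTENSION.put(".nq", "application/n-quads");
        MEDIA_TYPE_BY_EXTENSION.put(".trig", "application/trig");
        MEDIA_TYPE_BY_EXTENSION.put(".jsonld", "application/ld+json");
        MEDIA_TYPE_BY_EXTENSION.put(".rdf", "application/rdf+xml");
    }

    static Optional<String> guessMediaType(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return Optional.empty();
        }
        String name = fileName.toString();
        int lastDot = name.lastIndexOf('.');
        if (lastDot < 0) {
            return Optional.empty();
        }
        // Assumption: extensions are matched case-insensitively.
        String extension = name.substring(lastDot).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(MEDIA_TYPE_BY_EXTENSION.get(extension));
    }
}
```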
-
fileExtension
Return the file extension of a Path - if any.
The returned file extension includes the leading . (dot).
Note that this only returns the last extension, e.g. the file extension for archive.tar.gz would be .gz
- Parameters:
- path - Path whose filename might contain an extension
- Returns:
- File extension (including the leading .), or Optional.empty() if the path has no extension
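The behaviour described above (last extension only, leading dot included) could be implemented along these lines. The class name FileExtensionSketch is hypothetical, and edge cases such as file names that start or end with a dot are handled here in the simplest possible way, which may not match the actual method.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical re-implementation for illustration only.
final class FileExtensionSketch {
    static Optional<String> fileExtension(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return Optional.empty(); // e.g. a root path has no file name
        }
        String name = fileName.toString();
        int lastDot = name.lastIndexOf('.');
        if (lastDot < 0) {
            return Optional.empty(); // no dot, hence no extension
        }
        // Keep the leading dot, and only the last extension.
        return Optional.of(name.substring(lastDot));
    }
}
```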
-
createRDFTermFactory
Create a new RDF for a parse session.
This is called by parse() to set rdfTermFactory(RDF) if it is Optional.empty().
As parsed blank nodes might be made with RDF.createBlankNode(String), each call to this method SHOULD return a new RDF instance.
- Returns:
- A new RDF
-
parse
Description copied from interface: RDFParser
Parse the specified source.
A source method (e.g. RDFParser.source(InputStream), RDFParser.source(IRI), RDFParser.source(Path), RDFParser.source(String) or an equivalent subclass method) MUST have been called before calling this method, otherwise an IllegalStateException will be thrown.
A target method (e.g. RDFParser.target(Consumer), RDFParser.target(Dataset), RDFParser.target(Graph) or an equivalent subclass method) MUST have been called before calling parse(), otherwise an IllegalStateException will be thrown.
It is undefined if this method is thread-safe, however the RDFParser may be reused (e.g. setting a different source) as soon as the Future has been returned from this method.
The RDFParser SHOULD perform the parsing as an asynchronous operation, and return the Future as soon as preliminary checks (such as validity of the RDFParser.source(IRI) and RDFParser.contentType(RDFSyntax) settings) have finished. The future SHOULD NOT mark Future.isDone() before parsing is complete. A synchronous implementation MAY be blocking on the parse() call and return a Future that is already Future.isDone().
The returned Future contains a RDFParser.ParseResult. Implementations may subclass this interface to provide any parser details, e.g. list of warnings. null is a possible return value if no details are available, but parsing succeeded.
If an exception occurs during parsing (e.g. IOException or org.apache.commons.rdf.simple.experimental.RDFParseException), it should be indicated as the Throwable.getCause() in the ExecutionException thrown on Future.get().
- Specified by:
- parse in interface RDFParser
- Returns:
- A Future that will return the populated Graph when the parsing has finished.
- Throws:
- IOException - If an error occurred while starting to read the source (e.g. file not found, unsupported IRI protocol). Note that IO errors during parsing would instead be the Throwable.getCause() of the ExecutionException thrown on Future.get().
- IllegalStateException - If the builder is in an invalid state, e.g. a source has not been set.
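The way failures during asynchronous parsing reach the caller can be shown with plain java.util.concurrent types. In this sketch the class ParseFutureSketch, the method runAndDescribe and the Callable standing in for the parse job are all hypothetical; only the Future.get() / ExecutionException.getCause() contract is taken from the description above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demonstration of the Future contract described for parse().
final class ParseFutureSketch {
    // Submits the stand-in parse job and reports how it ended.
    static String runAndDescribe(Callable<String> parseJob) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(parseJob);
            try {
                // Blocks until the job has finished.
                return "ok: " + future.get();
            } catch (ExecutionException e) {
                // Failures during parsing surface as the cause of the ExecutionException.
                Throwable cause = e.getCause();
                return "failed: " + cause.getClass().getSimpleName() + ": " + cause.getMessage();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller of the real parse() would apply the same try/catch around Future.get(), in addition to handling the IOException and IllegalStateException that parse() itself may throw before returning the Future.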
-
target
Description copied from interface: RDFParser
Specify a consumer for parsed quads.
The quads will include triples in all named graphs of the parsed source, including any triples in the default graph. When parsing a source format which does not support datasets, all quads delivered to the consumer will be in the default graph (i.e. their Quad.getGraphName() will be Optional.empty()).
It is undefined if any quads are consumed if RDFParser.parse() throws any exceptions. On the other hand, if RDFParser.parse() does not indicate an exception, the implementation SHOULD have produced all parsed quads to the specified consumer.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The consumer is not assumed to be thread safe - only one Consumer.accept(Object) is delivered at a time for a given RDFParser.parse() call.
This method is typically called with a functional consumer, for example:
List<Quad> quads = new ArrayList<>();
parserBuilder.target(quads::add).parse();
target
Description copied from interface: RDFParser
Specify a Dataset to add parsed quads to.
It is undefined if any quads are added to the specified Dataset if RDFParser.parse() throws any exceptions. (However implementations are free to prevent this using transaction mechanisms or similar). On the other hand, if RDFParser.parse() does not indicate an exception, the implementation SHOULD have inserted all parsed quads to the specified dataset.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The default implementation of this method calls RDFParser.target(Consumer) with a Consumer that does Dataset.add(Quad).
target
Description copied from interface: RDFParser
Specify a Graph to add parsed triples to.
If the source supports datasets (e.g. RDFSyntax.supportsDataset() is true for the RDFParser.contentType(RDFSyntax) set), then only quads in the default graph will be added to the Graph as Triples.
It is undefined if any triples are added to the specified Graph if RDFParser.parse() throws any exceptions. (However implementations are free to prevent this using transaction mechanisms or similar). If Future.get() does not indicate an exception, the parser implementation SHOULD have inserted all parsed triples to the specified graph.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The default implementation of this method calls RDFParser.target(Consumer) with a Consumer that does Graph.add(Triple) with Quad.asTriple() if the quad is in the default graph.
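The default-graph filtering described for the Graph target can be sketched without the Commons RDF types. Everything in this block is a simplified stand-in: SimpleQuad is a hypothetical class holding an optional graph name and a triple rendered as a String, and a List<String> plays the role of the Graph.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical, simplified stand-ins for Quad and Graph.
final class GraphTargetSketch {
    static final class SimpleQuad {
        final Optional<String> graphName; // empty means the default graph
        final String triple;

        SimpleQuad(Optional<String> graphName, String triple) {
            this.graphName = graphName;
            this.triple = triple;
        }
    }

    // Builds a quad consumer that keeps only triples from the default graph.
    static Consumer<SimpleQuad> defaultGraphOnly(List<String> graph) {
        return quad -> {
            if (!quad.graphName.isPresent()) {
                graph.add(quad.triple);
            }
        };
    }
}
```

Quads that carry a graph name are silently dropped by this consumer, which mirrors the statement that only quads in the default graph are added to the Graph.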
-