Interface RDFParser
All Known Implementing Classes:
AbstractRDFParser, JsonLdParser, RDF4JParser
Experimental
This interface (and its implementations) should be considered at risk; they might change or be removed in the next minor update of Commons RDF. It may move to the org.apache.commons.rdf.api package when it has stabilized.
Description
This interface follows the Builder pattern, allowing parser settings such as contentType(RDFSyntax) and base(IRI) to be set. A caller MUST call one of the source methods (e.g. source(IRI), source(Path), source(InputStream)) and MUST call one of the target methods (e.g. target(Consumer), target(Dataset), target(Graph)) before calling parse() on the returned RDFParser; however, the methods can be called in any order.
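The source/target requirement can be illustrated with a deliberately tiny, hypothetical builder. MiniParser, its String source and its Consumer<String> target are stand-ins invented for this sketch, not Commons RDF types; the point is only that both settings must be present when parse() is called, whichever order they were set in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BuilderOrderDemo {

    // Hypothetical stand-in for an RDFParser that only tracks whether a
    // source and a target have been set. Not the Commons RDF API.
    static final class MiniParser {
        private final String source;
        private final Consumer<String> target;

        MiniParser() { this(null, null); }

        private MiniParser(String source, Consumer<String> target) {
            this.source = source;
            this.target = target;
        }

        MiniParser source(String source) { return new MiniParser(source, target); }

        MiniParser target(Consumer<String> target) { return new MiniParser(source, target); }

        // Both settings MUST be present; the order they were set in is irrelevant.
        void parse() {
            if (source == null) throw new IllegalStateException("source not set");
            if (target == null) throw new IllegalStateException("target not set");
            target.accept("parsed:" + source);
        }
    }

    static List<String> run(boolean sourceFirst) {
        List<String> out = new ArrayList<>();
        MiniParser p = new MiniParser();
        p = sourceFirst ? p.source("a.ttl").target(out::add)
                        : p.target(out::add).source("a.ttl");
        p.parse();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // [parsed:a.ttl]
        System.out.println(run(false)); // [parsed:a.ttl]
        try {
            new MiniParser().source("a.ttl").parse();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // target not set
        }
    }
}
```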
The call to parse() returns a Future, allowing parse operations to run asynchronously. Callers should call Future.get() to ensure that parsing completed successfully, or catch the exceptions thrown during parsing.
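On the caller side, checking the returned Future typically means calling get() (ideally with a timeout) and unwrapping ExecutionException.getCause(). The sketch below uses a CompletableFuture to simulate a failed parse; failingParse() and check() are hypothetical helpers, not part of the RDFParser API.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParseFutureDemo {

    // Hypothetical stand-in for the Future returned by parse(): it completes
    // exceptionally with an IOException to simulate a failed parse.
    static Future<Object> failingParse() {
        CompletableFuture<Object> f = new CompletableFuture<>();
        f.completeExceptionally(new IOException("unexpected end of file"));
        return f;
    }

    // Waits for the parse to finish and reports the underlying cause, if any.
    static String check(Future<?> parse) {
        try {
            parse.get(30, TimeUnit.SECONDS);
            return "ok";
        } catch (ExecutionException e) {
            // The parser's own exception is the cause of the ExecutionException.
            Throwable cause = e.getCause();
            return "failed: " + cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(check(CompletableFuture.completedFuture(null))); // ok
        System.out.println(check(failingParse())); // failed: IOException: unexpected end of file
    }
}
```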
Calling a setter that has already been called will override the existing value in the returned builder, regardless of the parameter type (e.g. source(IRI) will override a previous source(Path)). Settings can be unset by passing null; note that this may require casting, e.g. contentType( (RDFSyntax) null ) to undo a previous call to contentType(RDFSyntax).
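The cast is needed because a bare null matches both contentType(RDFSyntax) and contentType(String), which Java rejects as an ambiguous call. A minimal sketch with a hypothetical Syntax enum and Builder class (not the real RDFSyntax or RDFParser) shows the pattern:

```java
public class UnsetDemo {

    // Hypothetical syntax enum and builder with two overloads, mirroring
    // contentType(RDFSyntax) and contentType(String). Illustrative only.
    enum Syntax { TURTLE, NTRIPLES }

    static final class Builder {
        private final String contentType; // null means "not set"

        Builder(String contentType) { this.contentType = contentType; }

        Builder contentType(Syntax syntax) {
            return new Builder(syntax == null ? null : syntax.name());
        }

        Builder contentType(String type) { return new Builder(type); }

        String contentType() { return contentType; }
    }

    public static void main(String[] args) {
        Builder b = new Builder(null).contentType(Syntax.TURTLE);
        // b.contentType(null) would not compile: the call is ambiguous
        // between the two overloads, so the null must be cast.
        Builder unset = b.contentType((Syntax) null);
        System.out.println(b.contentType());     // TURTLE
        System.out.println(unset.contentType()); // null
    }
}
```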
It is undefined whether an RDFParser is mutable or thread-safe, so callers should always use the returned, modified RDFParser from the builder methods. The builder may return itself after modification, or a cloned builder with the modified settings applied. Implementations are, however, encouraged to be immutable and thread-safe, and to document this. As an example starting point, see org.apache.commons.rdf.simple.AbstractRDFParser.
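With an immutable implementation, discarding a builder method's return value silently loses the setting. The hypothetical Builder below (not an actual Commons RDF class) returns a modified copy from every setter:

```java
public class ImmutableBuilderDemo {

    // Hypothetical immutable builder: every setter returns a modified copy.
    static final class Builder {
        final String source;
        final String base;

        Builder(String source, String base) {
            this.source = source;
            this.base = base;
        }

        Builder source(String s) { return new Builder(s, base); }

        Builder base(String b) { return new Builder(source, b); }
    }

    public static void main(String[] args) {
        Builder original = new Builder(null, null);

        // Wrong: the returned copy is discarded, so 'original' is unchanged.
        original.source("data.ttl");
        System.out.println(original.source); // null

        // Right: always continue from the returned builder.
        Builder configured = original.source("data.ttl").base("http://example.com/");
        System.out.println(configured.source); // data.ttl

        // An unmodified builder can safely seed several configurations.
        Builder other = original.source("other.ttl");
        System.out.println(other.source + " " + configured.source); // other.ttl data.ttl
    }
}
```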
Example usage:
Graph g1 = rdfTermFactory.createGraph();
new ExampleRDFParserBuilder().source(Paths.get("/tmp/graph.ttl")).contentType(RDFSyntax.TURTLE)
    .target(g1).parse().get(30, TimeUnit.SECONDS); // blocks until parsing finishes or times out
Nested Class Summary
Nested Classes
Modifier and Type | Interface | Description
static interface | RDFParser.ParseResult | The result of parse() indicating parsing completed.
Method Summary
Modifier and Type | Method | Description
RDFParser | base(String base) | Specify a base IRI to use for parsing any relative IRI references.
RDFParser | base(IRI base) | Specify a base IRI to use for parsing any relative IRI references.
RDFParser | contentType(String contentType) | Specify the content type of the RDF syntax to parse.
RDFParser | contentType(RDFSyntax rdfSyntax) | Specify the content type of the RDF syntax to parse.
Future<? extends RDFParser.ParseResult> | parse() | Parse the specified source.
RDFParser | rdfTermFactory(RDF rdfTermFactory) | Specify which RDF to use for generating RDFTerms.
RDFParser | source(InputStream inputStream) | Specify a source InputStream to parse.
RDFParser | source(String iri) | Specify an absolute source IRI to retrieve and parse.
RDFParser | source(Path file) | Specify a source file Path to parse.
RDFParser | source(IRI iri) | Specify an absolute source IRI to retrieve and parse.
RDFParser | target(Consumer<Quad>) | Specify a consumer for parsed quads.
default RDFParser | target(Dataset) | Specify a Dataset to add parsed quads to.
default RDFParser | target(Graph) | Specify a Graph to add parsed triples to.
Method Details
rdfTermFactory
Specify which RDF to use for generating RDFTerms.
This option may be used together with target(Graph) to override the implementation's default factory and graph.
Warning: Using the same RDF for multiple parse() calls may accidentally merge BlankNodes having the same label, as the parser may call the RDF.createBlankNode(String) method with the blank node labels from the parsed document.
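The blank node warning can be reproduced with a hypothetical factory whose createBlankNode(String) returns the same node for the same label on one instance, which is the behaviour the warning is about. Factory and parseDocument are invented for this sketch; whether a given RDF implementation behaves exactly this way is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class BlankNodeScopeDemo {

    // Hypothetical term factory: like a label-based createBlankNode(String),
    // the same label always yields the same node from one factory instance.
    static final class Factory {
        private final Map<String, Object> nodes = new HashMap<>();

        Object createBlankNode(String label) {
            return nodes.computeIfAbsent(label, l -> new Object());
        }
    }

    // Simulates one parse() call that encounters the blank node label "b0".
    static Object parseDocument(Factory factory) {
        return factory.createBlankNode("b0");
    }

    public static void main(String[] args) {
        Factory shared = new Factory();
        Object fromDoc1 = parseDocument(shared);
        Object fromDoc2 = parseDocument(shared);
        // Same factory: unrelated _:b0 nodes from two documents are merged.
        System.out.println(fromDoc1 == fromDoc2); // true

        // A fresh factory per parse keeps the blank nodes distinct.
        System.out.println(parseDocument(new Factory()) == parseDocument(new Factory())); // false
    }
}
```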
contentType
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The character set of the RDFSyntax is assumed to be StandardCharsets.UTF_8 unless overridden within the document (e.g. <?xml version="1.0" encoding="iso-8859-1"?> in RDFSyntax.RDFXML).
This method will override any contentType set with contentType(String).
Parameters:
rdfSyntax - An RDFSyntax to parse the source according to, e.g. RDFSyntax.TURTLE.
Returns:
An RDFParser that will use the specified content type.
Throws:
IllegalArgumentException - If this RDFParser does not support the specified RDFSyntax.
See Also:
contentType(String)
contentType
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The content type MAY include a charset parameter if the RDF media type permits it; the default charset is StandardCharsets.UTF_8 unless overridden within the document.
This method will override any contentType set with contentType(RDFSyntax).
Parameters:
contentType - A content-type string, e.g. application/ld+json or text/turtle;charset="UTF-8" as specified by RFC 7231.
Returns:
An RDFParser that will use the specified content type.
Throws:
IllegalArgumentException - If the contentType has an invalid syntax, or this RDFParser does not support the specified contentType.
See Also:
contentType(RDFSyntax)
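As a rough sketch of how an implementation might split such a content type into a media type and a charset, defaulting to UTF-8, consider the hypothetical helpers charsetOf and mediaTypeOf. They are simplified (they do not handle ';' inside quoted values) and are not the parsing code of any Commons RDF implementation. Charset.forName throws subclasses of IllegalArgumentException for unknown or malformed names, which fits the documented IllegalArgumentException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeDemo {

    // Returns the charset named by a "charset" parameter of a content-type
    // string, or UTF-8 when none is given. Simplified: ignores quoted ';'.
    static Charset charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                String value = kv[1].trim();
                // Strip optional surrounding quotes, as in charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Charset.forName(value);
            }
        }
        return StandardCharsets.UTF_8;
    }

    // The media type without parameters, lower-cased for comparison.
    static String mediaTypeOf(String contentType) {
        return contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(mediaTypeOf("text/turtle;charset=\"UTF-8\"")); // text/turtle
        System.out.println(charsetOf("text/turtle;charset=\"UTF-8\""));   // UTF-8
        System.out.println(charsetOf("application/ld+json"));             // UTF-8
        System.out.println(charsetOf("application/rdf+xml; charset=ISO-8859-1")); // ISO-8859-1
    }
}
```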
target
Specify a Graph to add parsed triples to.
If the source supports datasets (e.g. the contentType(RDFSyntax) set has RDFSyntax.supportsDataset() true), then only quads in the default graph will be added to the Graph as Triples.
It is undefined whether any triples are added to the specified Graph if parse() throws any exceptions. (However, implementations are free to prevent this using transaction mechanisms or similar.) If Future.get() does not indicate an exception, the parser implementation SHOULD have inserted all parsed triples into the specified graph.
Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).
The default implementation of this method calls target(Consumer) with a Consumer that does Graph.add(Triple) with Quad.asTriple() if the quad is in the default graph.
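The default implementation described above can be sketched as a consumer that filters on the graph name. Triple and Quad here are hypothetical Java records standing in for the Commons RDF interfaces, and a Set<Triple> stands in for a Graph; target(Dataset) would follow the same shape with the filter removed.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Consumer;

public class GraphTargetDemo {

    // Hypothetical minimal stand-ins for Commons RDF's Triple and Quad.
    record Triple(String s, String p, String o) {}

    record Quad(Optional<String> graphName, String s, String p, String o) {
        Triple asTriple() { return new Triple(s, p, o); }
    }

    // Sketch of target(Graph) expressed via target(Consumer): keep only
    // default-graph quads and add them to the graph as triples.
    static Consumer<Quad> graphConsumer(Set<Triple> graph) {
        return q -> {
            if (q.graphName().isEmpty()) {
                graph.add(q.asTriple());
            }
        };
    }

    public static void main(String[] args) {
        Set<Triple> graph = new LinkedHashSet<>();
        Consumer<Quad> target = graphConsumer(graph);
        target.accept(new Quad(Optional.empty(), "ex:a", "ex:p", "ex:b"));
        target.accept(new Quad(Optional.of("ex:g1"), "ex:c", "ex:p", "ex:d"));
        System.out.println(graph); // contains only the default-graph triple
    }
}
```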
target
Specify a Dataset to add parsed quads to.
It is undefined whether any quads are added to the specified Dataset if parse() throws any exceptions. (However, implementations are free to prevent this using transaction mechanisms or similar.) On the other hand, if parse() does not indicate an exception, the implementation SHOULD have inserted all parsed quads into the specified dataset.
Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).
The default implementation of this method calls target(Consumer) with a Consumer that does Dataset.add(Quad).
target
Specify a consumer for parsed quads.
The quads will include triples in all named graphs of the parsed source, including any triples in the default graph. When parsing a source format that does not support datasets, all quads delivered to the consumer will be in the default graph (i.e. their Quad.getGraphName() will be Optional.empty()).
It is undefined whether any quads are consumed if parse() throws any exceptions. On the other hand, if parse() does not indicate an exception, the implementation SHOULD have produced all parsed quads to the specified consumer.
Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).
The consumer is not assumed to be thread-safe: only one Consumer.accept(Object) call is delivered at a time for a given parse() call.
This method is typically called with a functional consumer, for example:
List<Quad> quads = new ArrayList<>();
parserBuilder.target(quads::add).parse();
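Because accept() is never called concurrently within one parse() call, a plain ArrayList is enough as a target, provided the caller waits on the Future before reading it (actions of the asynchronous computation happen-before those following Future.get()). parseAsync below is a hypothetical stand-in for an asynchronous parser that delivers items from a single worker thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ConsumerTargetDemo {

    // Hypothetical asynchronous "parser": delivers items to the consumer one
    // at a time from a single background task, then completes the Future.
    static Future<Void> parseAsync(List<String> items, Consumer<String> target, ExecutorService pool) {
        return CompletableFuture.runAsync(() -> items.forEach(target), pool);
    }

    static List<String> collect(List<String> items) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            List<String> out = new ArrayList<>(); // not thread-safe, and need not be
            Future<Void> done = parseAsync(items, out::add, pool);
            done.get(); // wait before reading 'out'; get() also surfaces failures
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("q1", "q2", "q3"))); // [q1, q2, q3]
    }
}
```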
base
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with base(String).
Parameters:
base - An absolute IRI to use as a base.
Returns:
An RDFParser that will use the specified base IRI.
See Also:
base(String)
base
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with base(IRI).
Parameters:
base - An absolute IRI to use as a base.
Returns:
An RDFParser that will use the specified base IRI.
Throws:
IllegalArgumentException - If the base is not a valid absolute IRI string.
See Also:
base(IRI)
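A sketch of what base(String) validation and relative-reference resolution might look like, using java.net.URI as an approximation (it implements RFC 2396 URI syntax, not full RFC 3987 IRIs). checkBase and resolve are hypothetical helpers, not part of the interface.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BaseIriDemo {

    // Hypothetical base(String) validation: reject strings that are not
    // syntactically valid absolute IRIs (approximated with java.net.URI).
    static URI checkBase(String base) {
        try {
            URI uri = new URI(base);
            if (!uri.isAbsolute()) {
                throw new IllegalArgumentException("Base IRI is not absolute: " + base);
            }
            return uri;
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid base IRI: " + base, e);
        }
    }

    // How a relative IRI reference in a document resolves against the base.
    static String resolve(String base, String reference) {
        return checkBase(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://example.com/data/doc.ttl", "other#thing"));
        // http://example.com/data/other#thing
        System.out.println(resolve("http://example.com/data/doc.ttl", "../vocab#p"));
        // http://example.com/vocab#p
        try {
            checkBase("data/doc.ttl");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Base IRI is not absolute: data/doc.ttl
        }
    }
}
```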
source
Specify a source InputStream to parse.
The source set will not be read before the call to parse().
The InputStream will not be closed after parsing. The InputStream does not need to support marking (InputStream.markSupported()).
The parser might not consume the complete stream (e.g. an RDF/XML parser may not read beyond the closing </rdf:RDF> tag).
The contentType(RDFSyntax) or contentType(String) SHOULD be set before calling parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The base(IRI) or base(String) MUST be set before calling parse(), unless the RDF syntax does not permit relative IRIs (e.g. RDFSyntax.NTRIPLES).
This method will override any source set with source(IRI), source(Path) or source(String).
Parameters:
inputStream - An InputStream to consume
Returns:
An RDFParser that will use the specified source.
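Because the parser does not close the InputStream, the caller that opened it remains responsible for closing it, typically with try-with-resources. In the sketch below, readLines is a hypothetical line-based stand-in for a parser that decodes UTF-8 and leaves the stream open, and TrackingStream merely records whether close() was called.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class InputStreamSourceDemo {

    // Records whether close() was called, to show who owns the stream.
    static final class TrackingStream extends FilterInputStream {
        boolean closed;

        TrackingStream(InputStream in) { super(in); }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical line-based "parser": decodes as UTF-8 and, like
    // source(InputStream), leaves the stream open for the caller.
    static List<String> readLines(InputStream in) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines; // reader deliberately not closed: that would close 'in'
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "<http://example.com/s> <http://example.com/p> \"caf\u00e9\" .\n"
                .getBytes(StandardCharsets.UTF_8);
        // The caller opened the stream, so the caller closes it.
        try (TrackingStream in = new TrackingStream(new ByteArrayInputStream(doc))) {
            System.out.println(readLines(in));
            System.out.println("closed by parser? " + in.closed); // false
        }
    }
}
```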
source
Specify a source file Path to parse.
The source set will not be read before the call to parse().
The contentType(RDFSyntax) or contentType(String) SHOULD be set before calling parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The base(IRI) or base(String) MAY be set before calling parse(); otherwise Path.toUri() will be used as the base IRI.
This method will override any source set with source(IRI), source(InputStream) or source(String).
Parameters:
file - A Path for a file to parse
Returns:
An RDFParser that will use the specified source.
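The default base for a Path source can be sketched with Path.toUri(); effectiveBase is a hypothetical helper showing that an explicit base takes precedence over the file URI. The exact URI strings printed depend on the platform and on how java.net.URI renders file URIs.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathBaseDemo {

    // Sketch: an explicit base(...) wins; otherwise the file's own URI,
    // as returned by Path.toUri(), serves as the base IRI.
    static URI effectiveBase(Path file, URI explicitBase) {
        return explicitBase != null ? explicitBase : file.toUri();
    }

    public static void main(String[] args) {
        Path file = Paths.get("/tmp/graph.ttl");
        URI base = effectiveBase(file, null);
        System.out.println(base);                      // e.g. file:///tmp/graph.ttl on Unix-like systems
        System.out.println(base.resolve("other.ttl")); // a file: URI for the sibling other.ttl
        System.out.println(effectiveBase(file, URI.create("http://example.com/"))); // http://example.com/
    }
}
```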
source
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to parse().
If this builder does not support the given IRI protocol (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while parse() should throw an IOException.
The contentType(RDFSyntax) or contentType(String) MAY be set before calling parse(), in which case that type MAY be used for content negotiation (e.g. the Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The base(IRI) or base(String) MAY be set before calling parse(); otherwise the source IRI will be used as the base IRI.
This method will override any source set with source(Path), source(InputStream) or source(String).
Parameters:
iri - An IRI to retrieve and parse
Returns:
An RDFParser that will use the specified source.
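The deferred failure for unsupported protocols might look like the following sketch, where the hypothetical Parser accepts any IRI in source() and only rejects unsupported schemes (assumed here to be anything other than http, https and file) once parse() starts reading.

```java
import java.io.IOException;
import java.net.URI;
import java.util.Set;

public class IriSourceDemo {

    // Hypothetical parser that can only retrieve http(s) and file IRIs.
    static final class Parser {
        private static final Set<String> SUPPORTED = Set.of("http", "https", "file");
        private final URI source;

        private Parser(URI source) { this.source = source; }

        // Setting the source never fails on an unsupported scheme...
        static Parser source(URI iri) { return new Parser(iri); }

        // ...the failure is reported when parse() starts reading.
        void parse() throws IOException {
            if (!SUPPORTED.contains(source.getScheme())) {
                throw new IOException("Unsupported IRI scheme: " + source.getScheme());
            }
            // retrieval and parsing would happen here
        }
    }

    static String tryParse(String iri) {
        Parser p = Parser.source(URI.create(iri)); // succeeds for any absolute IRI
        try {
            p.parse();
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890"));
        // Unsupported IRI scheme: urn
    }
}
```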
source
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to parse().
If this builder does not support the given IRI (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while parse() should throw an IOException.
The contentType(RDFSyntax) or contentType(String) MAY be set before calling parse(), in which case that type MAY be used for content negotiation (e.g. the Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The base(IRI) or base(String) MAY be set before calling parse(); otherwise the source IRI will be used as the base IRI.
This method will override any source set with source(Path), source(InputStream) or source(IRI).
Parameters:
iri - An IRI to retrieve and parse
Returns:
An RDFParser that will use the specified source.
Throws:
IllegalArgumentException - If iri is not a valid absolute IRI string.
parse
Parse the specified source.
A source method (e.g. source(InputStream), source(IRI), source(Path), source(String) or an equivalent subclass method) MUST have been called before calling this method, otherwise an IllegalStateException will be thrown.
A target method (e.g. target(Consumer), target(Dataset), target(Graph) or an equivalent subclass method) MUST have been called before calling parse(), otherwise an IllegalStateException will be thrown.
It is undefined whether this method is thread-safe; however, the RDFParser may be reused (e.g. with a different source) as soon as the Future has been returned from this method.
The RDFParser SHOULD perform the parsing as an asynchronous operation, and return the Future as soon as preliminary checks (such as validity of the source(IRI) and contentType(RDFSyntax) settings) have finished. The Future SHOULD NOT report Future.isDone() as true before parsing is complete. A synchronous implementation MAY block on the parse() call and return a Future that is already Future.isDone().
The returned Future contains an RDFParser.ParseResult. Implementations may subclass this interface to provide any parser details, e.g. a list of warnings. null is a possible return value if no details are available but parsing succeeded.
If an exception occurs during parsing (e.g. IOException or org.apache.commons.rdf.simple.experimental.RDFParseException), it should be indicated as the Throwable.getCause() of the ExecutionException thrown on Future.get().
Returns:
A Future that will return an RDFParser.ParseResult (possibly null) when parsing has finished.
Throws:
IOException - If an error occurred while starting to read the source (e.g. file not found, unsupported IRI protocol). Note that IO errors during parsing would instead be the Throwable.getCause() of the ExecutionException thrown on Future.get().
IllegalStateException - If the builder is in an invalid state, e.g. a source has not been set.
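On the implementation side, one possible structure is to run the preliminary checks synchronously (throwing IllegalStateException) and then hand the actual parse to CompletableFuture.supplyAsync, so that failures surface as the cause of the ExecutionException from Future.get(). The ParseResult interface, the line-based "document" and the '!' failure marker are all hypothetical simplifications, not the Commons RDF implementation.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ParseImplDemo {

    // Hypothetical result type standing in for RDFParser.ParseResult.
    interface ParseResult {}

    // Hypothetical parser over a String "document": one item per line.
    static Future<ParseResult> parse(String source, Consumer<String> target) {
        // Preliminary checks run synchronously, before any Future exists.
        if (source == null) throw new IllegalStateException("source not set");
        if (target == null) throw new IllegalStateException("target not set");
        // A synchronous implementation could instead deliver every line here
        // and return CompletableFuture.completedFuture(null).
        return CompletableFuture.<ParseResult>supplyAsync(() -> {
            for (String line : source.split("\n")) {
                if (line.startsWith("!")) {
                    // Simulated failure; checked exceptions are wrapped to leave the lambda.
                    throw new CompletionException(new IOException("cannot parse: " + line));
                }
                target.accept(line);
            }
            return null; // parsing succeeded, but no ParseResult details are available
        });
    }

    // Caller side: block on the Future and report the outcome.
    static String outcome(String source) {
        StringBuilder seen = new StringBuilder();
        try {
            parse(source, line -> seen.append(line).append(';')).get();
            return "ok " + seen;
        } catch (ExecutionException e) {
            return "failed " + e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome("a\nb"));  // ok a;b;
        System.out.println(outcome("a\n!b")); // failed IOException
        try {
            parse("a", null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // target not set
        }
    }
}
```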