SPARQL is a recursive acronym standing for SPARQL Protocol and RDF Query Language. As the name implies, SPARQL is a general term for both a protocol and a query language.
Most uses of the SPARQL acronym refer to the RDF query language. In this usage, SPARQL is a syntactically-SQL-like language for querying RDF graphs via pattern matching. The language's features include basic conjunctive patterns, value filters, optional patterns, and pattern disjunction.
The SPARQL protocol is a method for remote invocation of SPARQL queries. It specifies a simple interface, which can be supported via HTTP or SOAP, that a client can use to issue SPARQL queries against an endpoint.
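As a rough illustration of the HTTP binding, the following Python sketch builds the URL for a GET request that carries a query in the protocol's `query` parameter (with the optional `default-graph-uri` parameter). The endpoint address and the helper name `build_query_url` are assumptions made for this example, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_url(endpoint, query, default_graph_uri=None):
    """Return a GET URL that submits `query` to a SPARQL protocol endpoint."""
    params = [("query", query)]
    if default_graph_uri is not None:
        # Optional protocol parameter naming the default graph of the dataset.
        params.append(("default-graph-uri", default_graph_uri))
    # urlencode percent-encodes characters such as '?', '{' and '}'.
    return endpoint + "?" + urlencode(params)


ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

request_url = build_query_url(ENDPOINT, QUERY)
# Decode the query string again to check that the query survives the round trip.
decoded = parse_qs(urlsplit(request_url).query)
print(request_url)
```

A client would then issue an ordinary HTTP GET on `request_url` and read the results from the response body.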
Both the SPARQL query language and the SPARQL protocol are products of the W3C's RDF Data Access Working Group. The latest released versions of the Working Group's specifications (excluding intermediate working drafts) can be found here:
There are a variety of SPARQL tutorials and introductions scattered around the Web. Some notable ones include:
The jury is still out on best practices for using SPARQL in comparison to other query languages. Some benefits of SPARQL include:
Some drawbacks are:
The community maintains a list of SPARQL implementations at the W3C ESW Wiki.
Several software packages exist which allow SPARQL queries to generate answers from data sources other than RDF, such as relational databases, LDAP servers, or XML data. The community maintains a list of these tools at the W3C ESW Wiki.
The W3C recently completed an incubator group examining the state of the art in accessing relational databases via SPARQL. As a result of this incubator group, a new Working Group may be established to produce specifications in this area.
There is currently no established, interoperable method for representing or accessing functional descriptions of SPARQL endpoints. (This is not to be confused with the WSDL which describes the SPARQL Protocol itself.)
The Data Access Working Group postponed this topic in 2005, leaving behind a draft "of historical interest only." In the meantime, implementations have devised their own vocabularies and techniques for specifying and advertising the services and datasets supported by a SPARQL endpoint. For example, HP Labs' Joseki allows service descriptions to be specified with an RDF configuration vocabulary. See the SPARQL service description wiki page for more information.
The performance of a SPARQL query against any particular dataset depends not only upon the size of the dataset but also on the nature of the dataset's storage (a relational store, a native triple store, LDAP, etc.), the complexity of the query itself, optimizations in use by the SPARQL engine, the distribution of the data, and other environmental factors. To date, little work has been done in analyzing SPARQL query performance in particular, and the field of SPARQL query optimization is relatively inchoate.
Some analysis has been done on the topic of RDF stores which can handle large datasets. (A large dataset in this context is usually considered one on the order of tens or hundreds of millions of triples.) The W3C ESW wiki contains information on a variety of RDF stores which can scale to large numbers of triples, but does not speak specifically to the performance of SPARQL queries against these stores.
The creators of several SPARQL implementations provide online services where SPARQL queries can be input and executed against either canned datasets or arbitrary datasets (identified by URLs). The community maintains a list of SPARQL endpoints on the W3C ESW Wiki.
On January 15, 2008, the W3C published the three SPARQL specifications as Recommendations. This means that the specifications have been endorsed by the W3C membership and should be considered stable documents.
In February 2009, the W3C re-chartered the SPARQL Working Group (new name) to produce specifications for extensions to the SPARQL language, protocol, and landscape. This new Working Group has an 18-month timeframe.
The in-progress SPARQL Working Group is working on a new version of SPARQL. The group's work can be followed on two public mailing lists:
SPARQL provides the regex() function, which tests whether a literal value matches a regular expression. In its simplest use, it tests whether the value contains a certain substring:
SELECT ?title WHERE { _:book :title ?title . FILTER (regex(?title, "SPARQL")) . }
The regex() function expects its first argument to be either a plain literal without a language tag or else a typed literal with a datatype of xsd:string. Plain literals with a language tag or typed literals of other datatypes will evaluate to a type error which causes the filter to fail. If you wish regex() to match solely based upon a literal's lexical value, use the str() function, which converts typed and plain literals to simple literals—i.e., plain literals without a language tag:
SELECT ?title WHERE { _:book :title ?title . FILTER (regex(str(?title), "SPARQL")) . }
There is no built-in support within SPARQL to query hierarchical structures of an unknown depth (e.g. trees or lists), to query transitive relations, or to query via XPath-like paths. The Data Access Working Group postponed this issue in early 2005.
There are several workarounds to perform these queries using SPARQL:
While SPARQL does support nested graph patterns, it does not directly support subqueries (for example, the FROM clause of a SPARQL query cannot itself contain a CONSTRUCT query which generates the dataset to be queried against). The Data Access Working Group postponed this issue in early 2005. A very limited form of subqueries can be accomplished with SPARQL engines that will perform HTTP GETs upon graphs named in FROM or FROM NAMED clauses by creating a URL consisting of an embedded SPARQL CONSTRUCT query submitted to a SPARQL endpoint and supplying this URL as part of the RDF dataset being queried. In practice, this technique is often inefficient and is subject to possible URL-maximum-length restrictions of the involved software.
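The URL-embedding workaround can be sketched in Python as follows. The endpoint address and both queries are assumptions chosen for illustration; whether a given engine will actually dereference the resulting FROM URL depends on the implementation.

```python
from urllib.parse import urlencode

INNER_ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

# The "subquery": a CONSTRUCT query whose result graph we want to query.
inner_query = (
    "PREFIX dc: <http://purl.org/dc/elements/1.1/> "
    "CONSTRUCT { ?doc dc:title ?title } WHERE { ?doc dc:title ?title }"
)

# Embed the CONSTRUCT query in a GET URL; percent-encoding keeps characters
# such as '<', '>', '{' and '}' out of the URL.
graph_url = INNER_ENDPOINT + "?" + urlencode({"query": inner_query})

# The outer query names that URL in its FROM clause, so an engine that
# fetches FROM graphs over HTTP would run the inner query first.
outer_query = (
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
    "SELECT ?title\n"
    "FROM <" + graph_url + ">\n"
    "WHERE { ?doc dc:title ?title }"
)

print(outer_query)
print("embedded URL length:", len(graph_url))
```

Printing the URL length makes the maximum-length concern visible: even this small CONSTRUCT query produces a URL well over a hundred characters long.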
SPARQL does not support setting variable bindings except via graph pattern matching or via the GRAPH ?g construct. Expressions are not allowed in the SELECT list of a SPARQL query.
A combination of the SPARQL OPTIONAL keyword and the bound(...) filter function can be used to mimic some universally quantified queries. As an example, consider this query which finds the minimum price of every book in the underlying default graph:
PREFIX ex: <http://example.org/>
SELECT ?book ?minprice
WHERE {
  ?book a ex:book ;
        ex:price ?minprice .
  OPTIONAL {
    ?book ex:price ?otherprice .
    FILTER( ?otherprice < ?minprice ) .
  } .
  FILTER ( !bound(?otherprice) ) .
}
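To see why the query above yields minimum prices, it may help to trace the same logic outside SPARQL. The Python sketch below is only an analogy over made-up (book, price) pairs, not how any engine evaluates the query: a row is kept exactly when the search for a cheaper price of the same book comes back empty, which is the role played by OPTIONAL plus !bound(?otherprice).

```python
# Made-up sample data: (book, price) pairs, several prices per book.
prices = [
    ("book1", 10),
    ("book1", 7),
    ("book2", 12),
    ("book2", 15),
    ("book3", 5),
]


def minimum_prices(rows):
    """Keep each (book, price) row for which no cheaper price of that book exists."""
    result = []
    for book, price in rows:
        # Mirrors the OPTIONAL block: look for another price of the same
        # book that is strictly lower than the candidate price.
        cheaper = [other for b, other in rows if b == book and other < price]
        # Mirrors FILTER(!bound(?otherprice)): keep the row only when the
        # optional part found nothing.
        if not cheaper:
            result.append((book, price))
    return result


print(minimum_prices(prices))
```

The row ("book1", 10) is dropped because a lower price (7) exists for the same book, while ("book1", 7) survives because nothing undercuts it.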
When writing SPARQL queries against heterogeneous data sources, one often wants to select a value for a certain purpose without knowing which one of several predicates might be used in the data. The SPARQL OPTIONAL keyword can be used to accomplish this. Suppose we are selecting a human-readable label for a Web page, and we want to use the value of the Dublin Core title predicate (dc:title) if it exists, and otherwise use the value of the rdfs:label predicate. This can be accomplished with the following SPARQL idiom:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?text
WHERE {
  OPTIONAL { <http://example.org/myWebPage> dc:title ?text . }
  OPTIONAL { <http://example.org/myWebPage> rdfs:label ?text . }
}
More information on this idiom is available in this blog post.
It is sometimes desirable to use SPARQL to query triples entailed from subclass, subproperty, range, domain, and other relations which can be represented using RDF Schema.
The SPARQL specification defines the results of queries based on RDF simple entailment. However, the specification does present a general, parametrized definition of graph pattern matching that can be expanded to other entailments beyond RDF simple entailment.
As an alternative, the SPARQL specification acknowledges that queries are often performed against a virtual graph which may not be fully materialized. Because RDFS entailment rules will always lead to a unique, deductive closure of a base graph, a query engine can treat the RDFS deductive closure of a base graph as the virtual graph against which (simple-entailment based) SPARQL queries are executed.
Whether or not a particular query endpoint supports RDFS entailment is implementation defined. Such a property might be advertised as part of the endpoint's functional description.
For more information, see the material on SPARQL extensions for other entailment regimes.
For the most part, the answer to this question is the same as the above answer regarding RDFS entailment. However, OWL-DL axioms do not always result in a unique deductive closure, and as such querying OWL-DL entailments requires instantiating the parametrized SPARQL basic graph pattern definition with values appropriate for OWL-DL entailment. The open-source OWL-DL reasoner, Pellet, will answer SPARQL queries while considering OWL-DL entailments.
For more information, see the material on SPARQL extensions for other entailment regimes.
For the most part, blank nodes in SPARQL queries function exactly as variables which cannot be returned to the user/client. So, the following two queries behave identically:
SELECT ?title WHERE { _:book :hasTitle ?title }
SELECT ?title WHERE { ?book :hasTitle ?title }
However, a SPARQL query may not reuse the same blank node label twice in different basic graph patterns. That is, the following is not a legal SPARQL query:
SELECT ?title WHERE { _:book rdfs:seeAlso ?g . GRAPH ?g { _:book dc:title ?title } }
In most cases, it is best practice to use the [] and [ :prop :obj ] syntaxes for blank nodes and to only use explicit blank node labels for constructs that cannot be expressed otherwise.
Well, it is!
Numbers in RDF data can be represented as plain literals or as typed literals (using XML Schema datatypes such as xsd:int). Numbers written in a SPARQL query with surrounding quotation marks (e.g., "4") will only match plain literals in the dataset. Numbers written without quotation marks (e.g. 4) will only match typed literals. Be sure to use the appropriate form for the data you are querying.
If your data contains numbers as plain literals, then they will be compared as strings, not numbers, and you may have to cast them to typed literals to get the desired results. For example, to check if a number expressed as a plain literal is less than 100, you'd say:
... FILTER (xsd:int(?number) < 100) ...
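The difference between string and numeric comparison is easy to reproduce outside SPARQL. The following Python sketch is only an analogy, using made-up values: it filters the same three lexical forms once as strings and once after converting them to integers, which is roughly what the xsd:int() cast achieves. How a particular SPARQL engine treats uncast plain literals may differ.

```python
# Made-up values, stored the way plain literals are: as strings.
values = ["9", "25", "100"]

# Lexical comparison: strings are compared character by character,
# so "9" and "25" both sort after "100".
as_strings = [v for v in values if v < "100"]

# Numeric comparison after an explicit cast, analogous to xsd:int(?number).
as_numbers = [v for v in values if int(v) < 100]

print(as_strings)
print(as_numbers)
```

The string comparison finds no value below "100", whereas the numeric comparison correctly keeps "9" and "25".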
Many thanks to Richard Cyganiak for contributing this question and answer.
The current, standardized version of SPARQL deals only with retrieving selected data from RDF graphs. There is no equivalent of the SQL INSERT, UPDATE, or DELETE statements. Most RDF-based applications handle new, changing, and stale data directly via the APIs provided by specific RDF storage systems. Alternatively, RDF data can exist virtually (i.e. created on-demand in response to a SPARQL query). Also, there are systems which create RDF data from other forms of markup, such as Wiki markup or the Atom Syndication Format.
However, there is significant active work going on to extend SPARQL to support update operations. See the SPARQL extension wiki page dealing with update for more details.
The SPARQL protocol requires that the results of SPARQL CONSTRUCT and DESCRIBE queries be RDF graphs.
SELECT and ASK queries, on the other hand, usually have their results returned as XML (or sometimes as JSON). However, as part of the group's test suite, the DAWG includes an RDF vocabulary which can be used to represent the results of SELECT or ASK queries in RDF.
The DAWG defines an XML serialization format, the SPARQL Query Results XML Format, which is used to return the results of SPARQL SELECT and ASK queries. The SPARQL protocol requires that the results of SPARQL CONSTRUCT and DESCRIBE queries be RDF graphs, which can be represented using the XML-based RDF/XML syntax.
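As a minimal sketch of how a client might consume the SPARQL Query Results XML Format, the Python code below parses a small SELECT result with the standard library. The sample document and the helper name `parse_select_results` are made up for this example, and the sketch ignores details such as datatypes and language tags.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Made-up sample result for a query such as: SELECT ?page ?title WHERE { ... }
SAMPLE_XML = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="page"/>
    <variable name="title"/>
  </head>
  <results>
    <result>
      <binding name="page"><uri>http://example.org/myWebPage</uri></binding>
      <binding name="title"><literal>My Web Page</literal></binding>
    </result>
  </results>
</sparql>
"""


def parse_select_results(xml_text):
    """Return (variable names, list of {variable: value} rows)."""
    root = ET.fromstring(xml_text.strip())
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            term = binding[0]  # the <uri>, <literal> or <bnode> child element
            row[binding.get("name")] = term.text
        rows.append(row)
    return variables, rows


variables, rows = parse_select_results(SAMPLE_XML)
print(variables)
print(rows)
```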
JSON is a lightweight serialization format which is a subset of JavaScript's literal-object notation. It is a useful format to use for SPARQL results when a Web client is issuing the queries and processing the results. The DAWG has published a Note defining a JSON format that closely follows the SPARQL Query Results XML Format, and which can be used to return the results of SPARQL SELECT and ASK queries.
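The JSON format is correspondingly simple to process. The Python sketch below reads a small, made-up SELECT result in that format; the helper name `parse_json_results` is an assumption for this example, and the code keeps only each term's value, discarding its type information.

```python
import json

# Made-up sample result in the SPARQL JSON results format.
SAMPLE_JSON = """
{
  "head": { "vars": ["page", "title"] },
  "results": {
    "bindings": [
      {
        "page":  { "type": "uri", "value": "http://example.org/myWebPage" },
        "title": { "type": "literal", "value": "My Web Page" }
      }
    ]
  }
}
"""


def parse_json_results(text):
    """Return (variable names, list of {variable: value} rows)."""
    data = json.loads(text)
    variables = data["head"]["vars"]
    rows = [
        # Each binding maps a variable name to a term object; keep its value.
        {name: term["value"] for name, term in binding.items()}
        for binding in data["results"]["bindings"]
    ]
    return variables, rows


json_variables, json_rows = parse_json_results(SAMPLE_JSON)
print(json_variables)
print(json_rows)
```

Variables that are unbound in a given solution are simply absent from that binding object, so callers should use `row.get(name)` rather than indexing when a variable may be optional.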