General SPARQL Discussion

What is SPARQL?

SPARQL is a recursive acronym standing for SPARQL Protocol and RDF Query Language. As the name implies, SPARQL is a general term for both a protocol and a query language.

Most uses of the SPARQL acronym refer to the RDF query language. In this usage, SPARQL is a syntactically-SQL-like language for querying RDF graphs via pattern matching. The language's features include basic conjunctive patterns, value filters, optional patterns, and pattern disjunction.

The SPARQL protocol is a method for remote invocation of SPARQL queries. It specifies a simple interface, which can be supported via HTTP or SOAP, that a client can use to issue SPARQL queries against an endpoint.
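
As a rough illustration of the HTTP side of the protocol, the following Python sketch builds the URL for a GET request that carries a query in the "query" parameter and names a default graph with "default-graph-uri". The endpoint and graph URIs are hypothetical, and the sketch only constructs the URL; it does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(endpoint, query, default_graphs=()):
    """Return a URL that submits `query` to `endpoint` via HTTP GET."""
    # The HTTP binding carries the query text in the "query" parameter.
    params = [("query", query)]
    # Each "default-graph-uri" parameter names a graph for the dataset.
    params += [("default-graph-uri", g) for g in default_graphs]
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and graph URIs, used only for illustration.
query = "SELECT ?title WHERE { ?book <http://example.org/title> ?title }"
url = build_query_url(
    "http://example.org/sparql", query, ["http://example.org/books.rdf"]
)
print(url)

# Decoding the URL recovers the original query text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0] == query)
```

A real client would then fetch this URL with any HTTP library and read the response body.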

Both the SPARQL query language and the SPARQL protocol are products of the W3C's RDF Data Access Working Group. The latest released versions of the Working Group's specifications (excluding intermediate working drafts) can be found here:

How can I learn SPARQL?

There are a variety of SPARQL tutorials and introductions scattered around the Web. Some notable ones include:

What are the benefits/drawbacks of SPARQL vis a vis SQL and XQuery?

The jury is still out on best practices surrounding using SPARQL compared to other query languages. Some benefits of SPARQL include:

  • Queries RDF data. If your data is in RDF, then SPARQL can query it natively.
  • Implicit join syntax. SPARQL queries RDF graphs, which consist of various triples expressing binary relations between resources, by specifying a subgraph with certain resources replaced by variables. Because all relationships are of a fixed size and data lives in a single graph, SPARQL does not require explicit joins that specify the relationship between differently structured data. That is, SPARQL is a query language for pattern matching against RDF graphs, and the queries themselves look and act like RDF. This is one main point made by Oracle's Jim Melton in his analysis of SPARQL vis a vis SQL and XQuery: SQL, XQuery, and SPARQL: What's Wrong With This Picture?.
  • SPARQL has strong support for querying semistructured and ragged data—i.e., data with an unpredictable and unreliable structure. Variables may occur in the predicate position to query unknown relationships, and the OPTIONAL keyword provides support for querying relationships that may or may not occur in the data (a la SQL left joins).
  • SPARQL is often an appropriate query language for querying disparate data sources (not sharing a single native representation) in a single query. Because RDF represents all data as a collection of simple binary relations, most data can be easily mapped to RDF and then queried and joined using SPARQL. Often, these mappings can be performed on the fly, meaning that SPARQL can be used to join heterogeneous data at a higher level than that of the native structure of the data.
  • SPARQL is built to support queries in a networked, web environment. SPARQL introduces the notion of an RDF dataset, which is the pairing of a default graph and zero or more named graphs. As both the default graph and the named graphs are identified by URIs, it is common for SPARQL implementations to retrieve a graph by performing an HTTP GET on the graph's URI. This allows a single query to join information from multiple data sources accessible across different Web sites.
  • Similarly, the SPARQL GRAPH keyword allows data to be queried along with its provenance information. GRAPH can be used to discover the URI of the graph that contains the data that matches the query.

Some drawbacks are:

  • Lack of wide deployment. SPARQL is relatively young, and as such there are not many data stores which can be directly queried with SPARQL (as compared with SQL or XPath).
  • Immaturity. As a young query language, SPARQL lacks the explicit processing model of XQuery or the decades of SQL-optimization research. As with the above point, this is likely to improve as current and new research and implementations contribute to a body of knowledge surrounding SPARQL.
  • Lack of support for transitive/hierarchical queries. While SPARQL is designed to query RDF graphs, SPARQL has no facilities for easily querying transitive relations or hierarchical structures within a graph. There are some workarounds for this, but SPARQL does not approach the power of, for instance, XQuery's axes.

What SPARQL implementations are available?

The community maintains a list of SPARQL implementations at the W3C ESW Wiki.

Can I use SPARQL to query data that's not stored in RDF?

Several software packages exist which allow SPARQL queries to generate answers from data sources other than RDF, such as relational databases, LDAP servers, or XML data. The community maintains a list of these tools at the W3C ESW Wiki.

The W3C recently completed an incubator group examining the state of the art in accessing relational databases via SPARQL. As a result of this incubator group, a new Working Group may be established to produce specifications in this area.

How can I tell what dataset, functions, or extensions a SPARQL endpoint supports?

There is currently no established, interoperable method for representing or accessing functional descriptions of SPARQL endpoints. (This is not to be confused with the WSDL which describes the SPARQL Protocol itself.)

The Data Access Working Group postponed this topic in 2005, leaving behind a draft "of historical interest only." In the meantime, implementations have devised their own vocabularies and techniques for specifying and advertising the services and datasets supported by a SPARQL endpoint. For example, HP Labs' Joseki allows service descriptions to be specified with an RDF configuration vocabulary. See the SPARQL service description wiki page for more information.

Do SPARQL queries perform well against small datasets? Large datasets?

The performance of a SPARQL query against any particular dataset depends not only upon the size of the dataset but also on the nature of the dataset's storage (a relational store, a native triple store, LDAP, etc.), the complexity of the query itself, optimizations in use by the SPARQL engine, the distribution of the data, and other environmental factors. To date, little work has been done in analyzing SPARQL query performance in particular, and the field of SPARQL query optimization is relatively inchoate.

Some analysis has been done on the topic of RDF stores which can handle large datasets. (A large dataset in this context is usually considered one on the order of tens or hundreds of millions of triples.) The W3C ESW wiki contains information on a variety of RDF stores which can scale to large numbers of triples, but does not speak specifically to the performance of SPARQL queries against these stores.

Is there anywhere on the Web where I can try out SPARQL queries?

The creators of several SPARQL implementations provide online services where SPARQL queries can be input and executed against either canned datasets or arbitrary datasets (identified by URLs). The community maintains a list of SPARQL endpoints on the W3C ESW Wiki.

W3C and the Data Access (DAWG) and SPARQL Working Groups

What is the status of the W3C SPARQL specifications?

On January 15, 2008, the W3C published the three SPARQL specifications as Recommendations. This means that the specifications have been endorsed by the W3C membership and should be considered stable documents.

In February 2009, the W3C re-chartered the SPARQL Working Group (new name) to produce specifications for extensions to the SPARQL language, protocol, and landscape. This new Working Group has an 18-month timeframe.

Will there be a SPARQL 2? When will feature X be standardized?

The in-progress SPARQL Working Group is working on a new version of SPARQL. The group's work can be followed on two public mailing lists:

SPARQL Language

Can I use SPARQL to search for substring matches within literal values?

SPARQL provides the regex() function, which can be used to test whether a literal value contains a certain substring:

   PREFIX : <http://example.org/>   # placeholder namespace, assumed for illustration
   SELECT ?title
   WHERE {
     _:book :title ?title .
     FILTER (regex(?title, "SPARQL")) .
   }

Why don't I get any matches when I use regex() to match typed literals or plain literals with language tags?

The regex() function expects its first argument to be either a plain literal without a language tag or else a typed literal with a datatype of xsd:string. Plain literals with a language tag or typed literals of other datatypes will evaluate to a type error which causes the filter to fail. If you wish regex() to match solely based upon a literal's lexical value, use the str() function, which converts typed and plain literals to simple literals—i.e., plain literals without a language tag:

   PREFIX : <http://example.org/>   # placeholder namespace, assumed for illustration
   SELECT ?title
   WHERE {
     _:book :title ?title .
     FILTER (regex(str(?title), "SPARQL")) .
   }

How can I query transitive closures / trees / hierarchies / RDF lists in SPARQL?

There is no built-in support within SPARQL to query hierarchical structures of an unknown depth (e.g. trees or lists), to query transitive relations, or to query via XPath-like paths. The Data Access Working Group postponed this issue in early 2005.

There are several workarounds to perform these queries using SPARQL:

  • Repeated queries. A repeating structure can be queried via a series of queries. If the structure's internal nodes have URIs (or if the SPARQL endpoint supports blank node identifiers which are stable across queries, so-called "told bnodes"), then the same query can be issued repeatedly to explore the structure. Alternatively, an ever-growing query can be created which repeatedly queries the structure from its root to an increasing, fixed depth (until a desired value is found or the end of the structure is reached).
  • Inference. In an environment which supports querying over an inferred graph, inference rules can be used to specify transitive closures or hierarchy membership relations that can then be queried with SPARQL. Jos De Roo has sketched examples of such SPARQL queries using cwm and Euler and a suitable set of N3 rules.
  • Implementation-specific approaches. Several SPARQL implementations provide ways to address this question. In ARQ, for example, there are two approaches:
    1. A filter function named <java:com.hp.hpl.jena.query.function.library.listMember> which takes an RDF list node and a resource as parameters and returns true if the resource is a member of the list.
    2. A special predicate named <http://www.jena.hpl.hp.com/ARQ/list#member> can be used inside the SPARQL query pattern (the WHERE clause) to associate an RDF list head with all the list members.
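
The "ever-growing query" variant of the repeated-queries workaround can be sketched in Python as follows. This is only one possible way to generate such queries: the function name list_member_query and the list URI are made up for illustration, and the sketch assumes the head of the RDF list has a URI. Intermediate list nodes are matched with variables, so they may be blank nodes.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_member_query(list_uri, depth):
    """Build a query for the list member at zero-based position `depth`."""
    lines = ["PREFIX rdf: <%s>" % RDF, "SELECT ?item", "WHERE {"]
    node = "<%s>" % list_uri
    # Follow rdf:rest once per step to reach the requested position.
    for i in range(depth):
        next_node = "?n%d" % (i + 1)
        lines.append("  %s rdf:rest %s ." % (node, next_node))
        node = next_node
    # The member itself hangs off rdf:first at that position.
    lines.append("  %s rdf:first ?item ." % node)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical list URI; a client would issue the queries for depth
# 0, 1, 2, ... until one of them returns no results.
print(list_member_query("http://example.org/list", 2))
```

Each generated query is one triple pattern longer than the previous one, which is why this approach becomes expensive for long lists.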

Can I include subqueries in a SPARQL query?

While SPARQL does support nested graph patterns, it does not directly support subqueries (for example, the FROM clause of a SPARQL query cannot itself contain a CONSTRUCT query which generates the dataset to be queried against). The Data Access Working Group postponed this issue in early 2005. A very limited form of subquery is possible with SPARQL engines that perform HTTP GETs upon the graphs named in FROM or FROM NAMED clauses: create a URL that submits a SPARQL CONSTRUCT query to a SPARQL endpoint, and supply this URL as part of the RDF dataset being queried. In practice, this technique is often inefficient and is subject to the maximum URL length restrictions of the software involved.
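
A minimal Python sketch of the URL-embedding trick described above is shown here. The endpoint URI and the helper name from_clause_for_subquery are assumptions made for illustration, and whether the outer query works depends on the engine being willing to dereference graphs named in FROM.

```python
from urllib.parse import urlencode


def from_clause_for_subquery(endpoint, construct_query):
    """Wrap a CONSTRUCT query in a URL and return a FROM clause naming it."""
    # Percent-encode the inner query so it is safe inside <...>.
    encoded = urlencode({"query": construct_query})
    return "FROM <%s?%s>" % (endpoint, encoded)


# Hypothetical endpoint; the inner query builds the graph to be queried.
inner = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"
outer = "\n".join([
    "SELECT ?s",
    from_clause_for_subquery("http://example.org/sparql", inner),
    "WHERE { ?s ?p ?o }",
])
print(outer)
```

The length of the generated URL grows with the inner query, which is where the URL length restrictions mentioned above come into play.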

Can I bind a variable to a specific value (e.g., the result of a function call)?

SPARQL does not support setting variable bindings except via graph pattern matching or via the GRAPH ?g construct. Expressions are not allowed in the SELECT list of a SPARQL query.

How can I use SPARQL to query maximum/minimum values or other universally quantified criteria?

A combination of the SPARQL OPTIONAL keyword and the bound(...) filter function can be used to mimic some universally quantified queries. As an example, consider this query which finds the minimum price of every book in the underlying default graph:

  PREFIX ex: <http://example.org/>
  SELECT ?book ?minprice
  WHERE {
    ?book a ex:book ; ex:price ?minprice .
    OPTIONAL { 
      ?book ex:price ?otherprice . 
      FILTER( ?otherprice < ?minprice ) .
    } .
    FILTER ( !bound(?otherprice) ) .
  }
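
To see why this pattern yields the minimum, it can help to mimic the same logic outside of SPARQL. The Python sketch below is only an analogy over made-up (book, price) pairs, not a description of how any SPARQL engine evaluates the query: for each pair it looks for a cheaper price of the same book (the OPTIONAL part) and keeps the pair only when none exists (the !bound test).

```python
# Made-up sample data: (book, price) pairs standing in for ex:price triples.
prices = [("book1", 10), ("book1", 7), ("book2", 12)]


def min_prices(rows):
    """Keep each (book, price) pair for which no cheaper price exists."""
    result = []
    for book, price in rows:
        # Plays the role of the OPTIONAL block with its inner FILTER.
        cheaper = [p for b, p in rows if b == book and p < price]
        # Plays the role of FILTER ( !bound(?otherprice) ).
        if not cheaper:
            result.append((book, price))
    return result


print(min_prices(prices))
```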

Can I use SPARQL to select a single value based on an ordered list of predicates which might appear in the data?

When writing SPARQL queries against heterogeneous data sources, one often wants to select a value for a certain purpose without knowing which one of several predicates might be used in the data. The SPARQL OPTIONAL keyword can be used to accomplish this. Suppose we are selecting a human-readable label for a Web page, and we want to use the value of the Dublin Core title predicate (dc:title) if it exists, and otherwise use the value of the rdfs:label predicate. This can be accomplished with this SPARQL idiom:

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>    
  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  SELECT ?text
  WHERE {
    OPTIONAL { <http://example.org/myWebPage> dc:title ?text . }
    OPTIONAL { <http://example.org/myWebPage> rdfs:label ?text . }
  }

More information on this idiom is available in this blog post.

Can I use SPARQL to query RDFS entailments?

It is sometimes desirable to use SPARQL to query triples entailed from subclass, subproperty, range, domain, and other relations which can be represented using RDF Schema.

The SPARQL specification defines the results of queries based on RDF simple entailment. However, the specification does present a general, parametrized definition of graph pattern matching that can be expanded to other entailments beyond RDF simple entailment.

As an alternative, the SPARQL specification acknowledges that queries are often performed against a virtual graph which may not be fully materialized. Because RDFS entailment rules will always lead to a unique, deductive closure of a base graph, a query engine can treat the RDFS deductive closure of a base graph as the virtual graph against which (simple-entailment based) SPARQL queries are executed.

Whether or not a particular query endpoint supports RDFS entailment is implementation defined. Such a property might be advertised as part of the endpoint's functional description.

For more information, see information on SPARQL extensions for other entailment regimes.

Can I use SPARQL to query OWL entailments?

For the most part, the answer to this question is the same as the above answer regarding RDFS entailment. However, OWL-DL axioms do not always result in a unique deductive closure, and as such querying OWL-DL entailments requires instantiating the parametrized SPARQL basic graph pattern definition with values appropriate for OWL-DL entailment. The open-source OWL-DL reasoner, Pellet, will answer SPARQL queries while considering OWL-DL entailments.

For more information, see information on SPARQL extensions for other entailment regimes.

What do blank nodes mean in a SPARQL query?

For the most part, blank nodes in SPARQL queries function exactly as variables which cannot be returned to the user/client. So, the following two queries behave identically:

  SELECT ?title WHERE { _:book :hasTitle ?title }
  SELECT ?title WHERE { ?book :hasTitle ?title }

However, a SPARQL query may not reuse the same blank node label twice in different basic graph patterns. That is, the following is not a legal SPARQL query:

  SELECT ?title WHERE { 
    _:book rdfs:seeAlso ?g .
    GRAPH ?g { _:book dc:title ?title }
  }

In most cases, it is best practice to use the [] and [ :prop :obj ] syntaxes for blank nodes and to only use explicit blank node labels for constructs that cannot be expressed otherwise.

Did you know that the SPARQL WHERE keyword is optional?

Well, it is!

Why don't I get any matches when I search for numbers?

Numbers in RDF data can be represented as plain literals or as typed literals (using XML Schema datatypes such as xsd:int). Numbers written in a SPARQL query with surrounding quotation marks (e.g., "4") will only match plain literals in the dataset. Numbers written without quotation marks will only match typed literals; an unquoted integer such as 4 is shorthand for "4"^^xsd:integer. Be sure to use the appropriate form for the data you are querying.

If your data contains numbers as plain literals, then they will be compared as strings, not numbers, and you may have to cast them to typed literals to get the desired results. For example, to check if a number expressed as a plain literal is less than 100, you'd say:

  ... 
  FILTER (xsd:integer(?number) < 100)
  ...

Many thanks to Richard Cyganiak for contributing this question and answer.

Can I use SPARQL to insert, update, or delete RDF data?

The current, standardized version of SPARQL deals only with retrieving selected data from RDF graphs. There is no equivalent of the SQL INSERT, UPDATE, or DELETE statements. Most RDF-based applications handle new, changing, and stale data directly via the APIs provided by specific RDF storage systems. Alternatively, RDF data can exist virtually (i.e. created on-demand in response to a SPARQL query). Also, there are systems which create RDF data from other forms of markup, such as Wiki markup or the Atom Syndication Format.

However, there is significant active work going on to extend SPARQL to support update operations. See the SPARQL extension wiki page dealing with update for more details.

SPARQL Protocol

Can I receive the results of SPARQL queries as RDF?

The SPARQL protocol requires that the results of SPARQL CONSTRUCT and DESCRIBE queries be RDF graphs.

SELECT and ASK queries, on the other hand, usually have their results returned as XML (or sometimes as JSON). However, as part of the group's test suite, the DAWG includes an RDF vocabulary which can be used to represent the results of SELECT or ASK queries in RDF.

Can I receive the results of SPARQL queries as XML?

The DAWG defines an XML serialization format, the SPARQL Query Results XML Format, which is used to return the results of SPARQL SELECT and ASK queries. The SPARQL protocol requires that the results of SPARQL CONSTRUCT and DESCRIBE queries be RDF graphs, which can be represented using the XML-based RDF/XML syntax.
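
As a small, hedged illustration of consuming this format, the Python sketch below parses a SELECT result document with the standard library. The sample document and its values are made up, and the helper name parse_select_results is hypothetical; only the namespace and element names come from the results format itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Made-up sample response with a single result row.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="book"/>
    <variable name="title"/>
  </head>
  <results>
    <result>
      <binding name="book"><uri>http://example.org/book1</uri></binding>
      <binding name="title"><literal>A Sample Title</literal></binding>
    </result>
  </results>
</sparql>"""


def parse_select_results(xml_text):
    """Return one dict per result row, mapping variable name to value text."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            # The single child is a <uri>, <literal>, or <bnode> element.
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return rows


rows = parse_select_results(SAMPLE)
print(rows)
```

This sketch discards the distinction between URIs, literals, and blank nodes; a fuller client would keep the child element's tag and any datatype or language attributes.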

Can I receive the results of SPARQL queries as JSON?

JSON is a lightweight serialization format which is a subset of JavaScript's literal-object notation. It is a useful format to use for SPARQL results when a Web client is issuing the queries and processing the results. The DAWG has published a Note defining a JSON format that closely follows the SPARQL Query Results XML Format, and which can be used to return the results of SPARQL SELECT and ASK queries.
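
For comparison, here is a minimal Python sketch that reads results in the JSON format using only the standard library. The sample documents are made up and the helper name parse_json_results is hypothetical; the sketch assumes the layout of the DAWG Note, with variable names under "head", rows under "results"/"bindings", and a "boolean" member for ASK results.

```python
import json

# Made-up SELECT response; the second row leaves ?title unbound.
SELECT_SAMPLE = """{
  "head": {"vars": ["book", "title"]},
  "results": {"bindings": [
    {"book": {"type": "uri", "value": "http://example.org/book1"},
     "title": {"type": "literal", "value": "A Sample Title"}},
    {"book": {"type": "uri", "value": "http://example.org/book2"}}
  ]}
}"""

# Made-up ASK response.
ASK_SAMPLE = '{"head": {}, "boolean": true}'


def parse_json_results(text):
    """Return a bool for ASK results, or a list of row dicts for SELECT."""
    doc = json.loads(text)
    if "boolean" in doc:
        return doc["boolean"]
    variables = doc["head"]["vars"]
    # Unbound variables are simply absent from a binding object.
    return [
        {v: b[v]["value"] for v in variables if v in b}
        for b in doc["results"]["bindings"]
    ]


select_rows = parse_json_results(SELECT_SAMPLE)
ask_answer = parse_json_results(ASK_SAMPLE)
print(select_rows)
print(ask_answer)
```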


Other SPARQL Resources


Acknowledgements

The SPARQL FAQ is maintained by Lee Feigenbaum. If you have a question, answer, correction, suggestion, or any other feedback on this FAQ, please contact me at lee AT thefigtrees DOT net or find LeeF on #swig on irc.freenode.net. Many thanks to all of the following for aid in collecting and answering the questions: Richard Cyganiak, Sean Martin, Andy Seaborne, Ben Szekely, Elias Torres, Wing Yung. Thanks also to Sailesh Panchang for help in improving the accessibility of the FAQ.

Change History

The most recent version of the SPARQL FAQ can always be found at http://thefigtrees.net/lee/sw/sparql-faq. The authors maintain a history of published changes here: