Why SPARQL?

By Lee Feigenbaum on January 25, 2008 1:00 AM | 10 Comments

I'm quite pleased to have played a part in helping SPARQL become a W3C Recommendation. As we were putting together the press release that accompanied the publication of the SPARQL Recommendations, Ian Jacobs, Ivan Herman, Tim Berners-Lee, and I put together some comments (in bullet point form) explaining some of the benefits of SPARQL. They do a good job of capturing a lot of what I find appealing about SPARQL, and I wanted to share them with other people. I don't think these are the best examples of SPARQL's value or the most eloquently expressed, but I do think they capture a lot of the essence of SPARQL. (While some of the text is attributable to me, parts are attributable to Ian, Ivan, and Tim.)

SPARQL is to the Semantic Web (and, really, the Web in general) what SQL is to relational databases. (This is effectively Tim's quotation from the press release.)
If we view the Semantic Web as a global collection of databases, SPARQL can make the collection look like one big database. SPARQL enables us to reap the benefits of federation. Examples:
- Federating information from multiple Web sites (mashups)
- Federating information from multiple enterprise databases (e.g. manufacturing and customer orders and shipping systems)
- Federating information between internal and external systems (e.g. for outsourcing, public Web databases (e.g. NCBI), supply-chain partners)
There are many distinct database technologies in use, and it's of course impossible to dictate a single database technology at the scale of the Web. RDF (the Semantic Web data model), though, serves as a standard lingua franca (least common denominator) in which data from disparate database systems can be represented. SPARQL, then, is the query language for that data. As such, SPARQL hides the details of a server's particular data management and structure. This reduces costs and increases the robustness of software that issues queries.
SPARQL saves development time and cost by allowing client applications to work with only the data they're interested in. (This is as opposed to bringing it all down and spending time and money writing software to extract the relevant bits of information.)
- Example: Find US cities' population, area, and mass transit (bus) fare, in order to determine if there is a relationship between population density and public transportation costs.
- Without SPARQL, you might tackle this by writing a first query to pull information from cities' pages on Wikipedia, a second query to retrieve mass transit data from another source, and then code to extract the population and area and bus fare data for each city.
- With SPARQL, this application can be accomplished by writing a single SPARQL query that federates the appropriate data source. The application developer need only write a single query and no additional code.
SPARQL builds on other standards including RDF, XML, HTTP, and WSDL. This allows reuse of existing software tooling and promotes good interoperability with other software systems. Examples:
- SPARQL results are expressed in XML: XSLT can be used to generate friendly query result displays for the Web
- It's easy to issue SPARQL queries, given the abundance of HTTP library support in Perl, Python, PHP, Ruby, etc.
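
As a rough illustration of reusing existing XML tooling, here is a minimal Python sketch that reads a result document in the SPARQL Query Results XML Format using only the standard library. The sample document and its values are made up for illustration; a real endpoint would return its own data.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Made-up sample response, standing in for what an endpoint might return.
SAMPLE_RESULTS = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="book"/>
    <variable name="title"/>
  </head>
  <results>
    <result>
      <binding name="book"><uri>http://example.com/book/1</uri></binding>
      <binding name="title"><literal>An Example Title</literal></binding>
    </result>
  </results>
</sparql>
"""


def parse_results(xml_text):
    """Return a list of {variable: value} dicts, one per result row."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            # Each binding wraps a single child element (<uri>, <literal>, or <bnode>).
            value = binding[0]
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


rows = parse_results(SAMPLE_RESULTS)
```

Each row comes back as a plain dictionary keyed by variable name, so the rest of an application can treat the results like any other tabular data.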

Finally, I scribbled down some of my own thoughts on how SPARQL takes the appealing principles of a Service Oriented Architecture (SOA) one step further:

With SOA, the idea is to move away from tightly-coupled client-server applications in which all of the client code needs to be written specifically for the server code and vice versa. SOA says that if instead we just agree on service interfaces (contracts) then we can develop and maintain services and clients that adhere to these interfaces separately (and therefore more cheaply, scalably, and robustly).
SPARQL takes some of this one step further. For SOA to work, services (people publishing data) still have to define a service, a set of operations that they'll use to let others get at their information. And someone writing a client application against such a service needs to adhere to the operations in the service. If a service has 5 operations that return various bits of related data and a client application wants some data from a few of those operations but doesn't want most of it, the developer still must invoke all 5 operations and then write the logic to extract and join the data relevant for her application. This makes for marginally complex software development (and complex == costly, of course).
With SPARQL, a service-provider/data-publisher simply provides one service: SPARQL. Since it's a query language accessible over a standard protocol (HTTP), SPARQL can be considered a 'universal service'. Instead of the data publisher choosing a limited number of operations to support a priori and client applications being forced to conform to these operations, the client application can ask precisely the questions it wants to retrieve precisely the information it needs. Instead of 5 service invocations + extra logic to extract and join data, the client developer need only author a single SPARQL query. This makes for a simpler application (and, of course, less costly).

As an example, consider an online book merchant. Suppose I want to create a Web site that finds books by my favorite author that are selling for less than $15, including shipping. The merchant supplies three relevant services:

Search. Includes search by author. Returns book identifiers.
Book lookup. Takes a book identifier and returns the title, price, abstract, shipping weight, etc.
Shipping lookup. Takes total order weight, shipping method, and zip code, and returns a shipping cost.

To create my Web site without SPARQL, I'd need to:

Invoke the search service. (Query 1)
Write code to extract the result identifiers and, for each one, invoke the book lookup service. (Code 1, Query 2 (issued multiple times))
Write code to extract the price and, for each book, invoke the shipping lookup service with that book's weight. (Code 2, Query 3 (issued multiple times))
Write code to add each book's price and shipping cost and check if it's less than $15. (Code 3)
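
To make the amount of glue code concrete, here is a small Python sketch of those four steps. Everything in it is hypothetical: the three functions are in-memory stand-ins for the merchant's services, and the catalog, prices, and shipping formula are invented for illustration. A real client would make a network call in each of them.

```python
# Invented sample data standing in for the merchant's catalog.
CATALOG = {
    "b1": {"title": "Cheap Paperback", "author": "My favorite Author",
           "price": 8.0, "weight": 1.0},
    "b2": {"title": "Heavy Hardcover", "author": "My favorite Author",
           "price": 13.0, "weight": 4.0},
    "b3": {"title": "Other Book", "author": "Someone Else",
           "price": 5.0, "weight": 1.0},
}

# Counts how many service invocations the client ends up making.
calls = {"count": 0}


def search_by_author(author):
    """Stand-in for the search service: returns book identifiers."""
    calls["count"] += 1
    return [book_id for book_id, book in CATALOG.items()
            if book["author"] == author]


def lookup_book(book_id):
    """Stand-in for the book lookup service."""
    calls["count"] += 1
    return CATALOG[book_id]


def lookup_shipping(weight, method, zip_code):
    """Stand-in for the shipping lookup service (made-up flat formula)."""
    calls["count"] += 1
    return 2.0 + 1.0 * weight


def affordable_titles(author, limit, method="ground", zip_code="00000"):
    titles = []
    for book_id in search_by_author(author):        # Query 1
        book = lookup_book(book_id)                 # Code 1, Query 2
        shipping = lookup_shipping(book["weight"],  # Code 2, Query 3
                                   method, zip_code)
        if book["price"] + shipping < limit:        # Code 3
            titles.append(book["title"])
    return titles


titles = affordable_titles("My favorite Author", 15)
```

Even with only two matching books, the client makes one search call plus two calls per book, and the number of invocations grows with the size of the search result.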

Now, suppose the book merchant exposed this same data via a SPARQL endpoint. The new approach is:

Use the SPARQL protocol to ask a SPARQL query with all the relevant parameters (Query 1 (issued once))

For the record, the query might look something like:

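PREFIX : <http://example.com/service/sparql/>
SELECT ?book ?title
FROM :inventory
WHERE {
  # Books by the chosen author, with the values needed for the price check
  ?book a :book ;
        :author ?author ;
        :title ?title ;
        :price ?price ;
        :weight ?weight .
  ?author :name "My favorite Author" .
  # :shipping is an extension function defined by the endpoint
  FILTER(?price + :shipping(?weight) < 15)
}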

(This example also illustrates another feature of SPARQL: SPARQL is extensible via the use of new FILTER functions that can allow a query to invoke operations (in this case, a function (:shipping) that gives shipping cost for a particular order weight) defined by the SPARQL endpoint.)
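
For comparison with the multi-step approach, here is a minimal Python sketch of how a client might package the query above as a single SPARQL protocol request (an HTTP GET with the query in a "query" parameter). The endpoint URL is an assumption, borrowed from the example prefix, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint address; the merchant's real URL is not given in the post.
ENDPOINT = "http://example.com/service/sparql/"

QUERY = """PREFIX : <http://example.com/service/sparql/>
SELECT ?book ?title
FROM :inventory
WHERE {
  ?book a :book ; :author ?author ;
        :title ?title ; :price ?price ;
        :weight ?weight .
  ?author :name "My favorite Author" .
  FILTER(?price + :shipping(?weight) < 15)
}
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL Query Results XML Format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


request = build_request(ENDPOINT, QUERY)
# urllib.request.urlopen(request) would issue the query; it is left out
# because example.com does not host a SPARQL endpoint.
```

The whole client side reduces to one request; the joining and filtering that needed three pieces of code before is described in the query text and carried out by the endpoint.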

10 Comments

Kjetil Kjernsmo | January 28, 2008 9:08 AM

And I'm very pleased with the role you played in getting SPARQL into shape! :-)

I also agree that one of SPARQL's strong points is that "With SPARQL, this application can be accomplished by writing a single SPARQL query that federates the appropriate data source."

But can we prove that case by writing that query and see the results returned right now? We have the endpoints, but can we actually write that single query with the endpoints we have right now?

Lee | January 28, 2008 10:07 AM

It's worth checking out, though I actually don't know offhand if the data needed is in the data extracted from wikipedia or elsewhere.

It's now on the TODO list, though if anyone else gets to it first, be my guest :)

Lee

Prateek | January 28, 2008 3:11 PM

Kudos for the work done in shaping up SPARQL!
Just trying to understand your post better: the example about the US cities and their population and the one you gave about the bookshop, do they highlight the same feature of SPARQL (working only on data/operations of interest), or are there some differences? Just trying to understand the benefit I can achieve. If possible, please clarify!

Thanks...

Lee | January 28, 2008 3:16 PM

Prateek,

Good point. The two examples are really getting at the same point. (That SPARQL often allows a single query to replace what would otherwise require multiple queries and code to extract relevant information.)

The reason they're both there is just because the two examples happened to be created at different times in slightly different contexts. :)

Lee

Duncan Hull | January 29, 2008 4:37 AM

Hello. Interesting post. So you say "SPARQL saves development time and cost". Can you actually prove this?

Ian Goldsmid | February 27, 2008 10:13 PM

to Duncan Hull:

Actually I would say that Lee needs to prove absolutely nothing. Because companies using SPARQL - and in particular our product - The Semantic Discovery System are independently proving the extraordinary value of SPARQL. It's almost too obvious to be believable. Here's my explanation:

Suppose you are some bod in a large company wanting to query across multiple distributed, heterogeneous data silos (i.e. a pervasive need). You know there is valuable knowledge embedded/hidden in the data - implicit relationships between data points you want to make explicit. OK so far?

Well now, with SPARQL, instead of having to write SQL, build a data warehouse, or spend ages having programmers write some application - with SPARQL you simply state what columns (i.e. as you find in database tables or spreadsheets) you want to have in your result set - and any filters on those columns (i.e. column values) - and boom! SPARQL automatically knows everything that would have to be painstakingly specified in SQL, i.e. no more complex table joins etc etc... Any records that contain values in all the specified set of columns/filters are presented.

As an end user it is now a total simplicity to express your query in terms of the *shape* of the result you want back (essentially the columns/value filters)...

This is a literal marvel, something that has never been possible before - and if Duncan takes the time to look into this he will see it is already fully proven.

Ian Goldsmid
http://www.Meaning2Go.com

glenn mcdonald | July 26, 2008 10:27 AM

(Seeing this post a few months late, because you linked to it again...)

Seems like the extension approach complicates discovery. If I don't know what this endpoint provides, I now not only need a way to discover the data model, but to identify that "shipping" is a function, and that it takes a parameter. I'd think it would be much cleaner to push the extensibility down out of the query language into the data model. That is, make shipping just another property of book, and let the endpoint decide whether to represent that data statically or generate it dynamically as a function of price...

[Lee: In this particular example, I think you might be right, but I'm not sure. Shipping probably depends on the location of the authenticated user, for instance, and so it seems a bit dodgy to expose it as a property of a book. I can see it presenting challenges to a query cache component as well. In the end though, most extension functions in FILTER clauses can also be expressed as a functional property -- I think choosing between the two is somewhat a matter of personal taste, somewhat a matter of implementation support, and somewhat a matter of some vague notion of "what makes sense".

I do agree that discovery of extension functions needs to be improved. It's one of the topics that falls under the very unresolved topic of SPARQL endpoint description: http://esw.w3.org/topic/SparqlEndpointDescription ]

George Izzard O'Veering | August 8, 2008 8:29 AM

As one who has had a lot of personal experience with SPARQL I can say that it has not saved us development time and effort, relative to doing the same thing with existing technologies. In fact, it's been a bugger (sorry). Advice to companies, large and small: don't pick up technologies until they're well and truly mature, unless you have significant time to explore it in a low-risk way.

It's not the idea of SPARQL which is flawed, although it's probably too ambitious for its own good, judging from the fact that there are too many bugs and not enough documentation. Minor problems appear all the time: things like "why doesn't this search work on this record, when it does on the next?" Hacks and work-arounds are often required.

In short, nice idea, but not quite mature yet. Those with time to play, go ahead. To those needing to get projects out of the door in a known timeframe, extreme caution is indicated. And it is not a good idea to naively accept (or make) claims that standards so new that the paint hasn't dried yet will save development time, money, or the environment.

1) Your case is individual

2) Many of the variables are down to third party implementation details

3) There really aren't that many case studies from which to work, anyway.

Lee | August 8, 2008 8:49 AM

Hi George,

Sorry to hear about your unhappy experiences with SPARQL!

I can't really speak to the details of your particular situation, of course, but I did want to reply a small bit.

As I think anyone would, I'd recommend that any new software engineering effort be scoped in a way that minimizes the risk of previously untried efforts. As SPARQL is quite young, implementation quality will vary widely, and expertise in the language is not nearly as widespread as something like SQL. It's important to choose appropriate projects that have the potential to demonstrate the values of SPARQL (and other Semantic Web technologies) while mitigating the risks.

You observe that SPARQL has "...too many bugs and not enough documentation". I assume you're referring to whichever implementation of SPARQL you are using, on which I cannot comment of course. The SPARQL working group compiled an implementation report of over a dozen highly interoperable implementations, based on a suite of several hundred tests.

As for my particular claims: I stand by them. I have extensive experience using SPARQL as a data-access layer for many years, and given engineers with proper training/skill sets, quality implementations, and well-chosen projects (the same caveats I would include for any technology, new or old!), SPARQL has repeatedly saved us significant development time over traditional approaches.

Finally, regarding case studies. There is a collection of Semantic Web case studies and use cases available at

http://www.w3.org/2001/sw/sweo/public/UseCases/

Lee

Brian Donnelly | November 13, 2008 7:15 AM

Hi George and Lee,

I agree with Lee but fully sympathise with George. My direct practical experience of SPARQL after 20 years+ SQL programming is that there is ONE blindingly clear great productivity advantage which is immediately, transparently obvious to true end users - in essence "no more joins".

In Drug R&D, no scientist will ever write SQL no matter how many pretty wizards you give them. I find however (and I am not easily impressed with new, raw technology) that if you layer SPARQL over, let's say, Oracle, and have a GUI that hides the SPARQL generation process, then the user can merrily click on two or three concepts and be completely unaware that the generated SPARQL then transforms into a 7 or 8 table native Oracle SQL join to bring the result set back.

I think this is the bee's knees and you can prove - trivially - that doing say 2 mouse clicks to do this 7-way join is far more efficient than manually doing the same - even if the user knew what key to join to what - which they may not.

I think the immaturity of SPARQL can indeed drive you mad, but using it as an SQL code generator (would annoy purists but who cares) is clearly a great win.
