Evolving standards, consensus, and the energy to get there
We’re doing something mildly interesting in the recently re-chartered SPARQL Working Group. We’re spending our first couple of months defining what our deliverables will be for the rest of our 18-month chartered lifetime. The charter gives us some suggestions on things to consider (update, aggregates, XML serialization for queries, and more) and some constraints to keep in mind (backwards compatibility), but beyond that it’s up to the group.
So we’ve started by gathering potential features. We solicited features—which can be language extensions, syntactic shortcuts, protocol enhancements, result format modifications, integrations with other technologies like XQuery, OWL, or RIF, new query serializations, and more—both from within the Working Group and from the broader community. Within a week or so, we had identified nearly 40 features, and I expect a few more still to come.
The problem is: all of these features would be helpful. My take on developer-oriented technology standards such as SPARQL is that they ultimately serve the users of the users of the implementations. There’s a pyramid here, wherein a small number of SPARQL implementations will support a larger number of developers creating SPARQL-driven software, which in turn does useful (sometimes amazing) things for a much larger set of end users. So ideally, we’d focus on the features that benefit the largest swaths of those end users.
But of course that’s tough to calculate. So there’s another way we can look at things: the whole pyramid balances precariously on the shoulders of implementers, and, in fact, the specifications are themselves written to be formal guides to producing interoperable implementations. If implementers can’t understand an extension or willfully choose not to add it to their implementations, then there wasn’t much point in standardizing it in the first place. This suggests that implementer guidance should be a prime factor in choosing what our Working Group should focus on. And that’s far more doable since many of the Working Group participants are themselves SPARQL implementers.
Yet implementers’ priorities are not always tied to what’s most useful for SPARQL users and SPARQL users’ users. (This can be for a wide variety of reasons, not the least of which is that feedback on what’s important for the implementers’ users’ users often loses something in the multiple layers of communication that relay it to implementers.) So what about that middle category, SPARQL users/developers? These fine folks have the most direct experience with SPARQL’s capabilities, caveats, and inability to solve different classes of problems as they apply to their users’ business/scientific/social/consumer problems. SPARQL users can and surely will contribute valuable experience about which extensions might make SPARQL easier to learn, easier to use, more powerful, and more productive when building solutions on the Semantic Web technology stack.
The difficulty here is that it’s often very, very hard for SPARQL developers to be selective in what features they’d like to see added to the landscape. SPARQL is their toolbox, and from their perspective (and understandably so), there’s little downside in stuffing as many tools as possible into SPARQL, just in case.
Things get more complicated. I (very) often joke (and will now write down for the first time) that if you get 10 Semantic Web advocates in a room, you’ll probably have 15 or 20 opinions as to what the Semantic Web is and what it’s for. When we zoom in on just the SPARQL corner of the Semantic Web world, things are no different. Some people are using SPARQL to query large knowledge bases. Some people are using SPARQL to answer ontologically informed queries. Some people are using SPARQL to query an emerging Web of linked data. Some people are using SPARQL for business intelligence. Some people are using SPARQL in XML pipelines. Some people are using SPARQL as a de facto rules language. Some people are using SPARQL as a federated query language. And much more. No wonder, then, that the Working Group might have difficulty reaching consensus on a significantly whittled-down list of features to standardize.
Why not do it all? Or, at least, why not come up with a priority list for all of the features and work our way down it one at a time? It’s tempting, given the high quality of the suggestions, but I’m pretty sure it’s not feasible. Different groups of features interact with each other in different ways (aggregates and subqueries, for instance, or update and the protocol), and it’s exactly these interactions that need to be formally written down in a specification. Furthermore, the W3C process requires that as we enter and exit the Candidate Recommendation stage we demonstrate multiple interoperable implementations of our specifications—something that becomes extremely challenging when the language, protocol, etc. are constantly moving targets. Add to that the need to build test cases, gather substantive reviews from inside and outside the Working Group, and (where appropriate) coordinate with other Working Groups. Now consider that Working Group participants are (for the most part) giving no more than 20% of their time to the Working Group. Believe me, 18 months flies by.
So what do I think is reasonable? I think we’ll have done great work if we produce high-quality specifications for maybe three, four, or five new SPARQL features/extensions. That’s it.
(I’m not against prioritizing some others on the chance that my time estimates are way off; that seems prudent to me. And I also recognize that we’ve got some completely orthogonal extensions that can easily be worked on in parallel with one another. So there’s some wiggle room. But I hold a pretty firm conviction that the vast majority of the features that have been suggested are going to end up on the proverbial cutting-room floor.)
Here’s what I (personally) think should go into our decisions about which features to standardize:
- Implementation experience. It’s easy to get in trouble when a Working Group resorts to design-by-committee; I prefer features that already exist in multiple, independent implementations. (They need not be interoperable already, of course: that’s what standards work is for!)
- Enabling value. I’m more interested in working on features that enable capabilities that don’t already exist within SPARQL, compared to those features which are largely about making things easier. I’m also interested in working on those extensions that help substantial communities of SPARQL users (and, as above, their users). But in some cases this criterion may be trumped by…
- Ease of specification. Writing down a formal specification for a new feature takes time and effort, and we’ve only a limited amount of both with which to work. I’m inclined to give preference to those features which are easy to get right in a formal specification (perhaps because a draft specification or formal documentation already exists) compared to those that have many tricky details yet to be worked out.
- Ease/likelihood of implementation. I think this is often overlooked. There are a wide range of SPARQL implementations out there, and—particularly given the emerging cloud of linked data that can easily be fronted by multiple SPARQL implementations—there are a large number of SPARQL users that regularly write queries against different implementations. The SPARQL Working Group can add features until we’re blue in the face, but if many implementations are unable or choose not to support the new features, then interoperability remains nothing but a pipe dream for users.
One potential compromise, of sorts, is to define a standard extensibility mechanism for SPARQL. SPARQL already has one extensibility point in the form of allowing implementations to support arbitrary filter functions. More sophisticated extensibility points might take a variety of forms. At the most general extreme, Eric Prud’hommeaux mentioned to me the possibility of an EXTENSION keyword that would take an identifying URI, arbitrary arguments, and perhaps even arbitrary syntax within curly braces. Less extreme would be a formal service description that allows implementations to explore and converge on non-standard functionality while providing a standard way for users and applications to discover what features a given SPARQL endpoint supports. The first SPARQL Working Group (the DAWG) seems to have been very successful in designing a language that left ample room for implementers to try out new extensions. If our new Working Group can keep that freedom while also providing some structure to encourage convergence on the syntax and semantics of SPARQL extensions, I think we’ll be in great shape for the future evolution of SPARQL.
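To make those options concrete, here’s a rough sketch. The first query below uses SPARQL’s existing extensibility point: a FILTER expression invoking an extension function identified by IRI (ex:relevance is a made-up function; each implementation decides whether and how to support it). The second query is pure speculation, showing one shape the EXTENSION idea might take; the keyword, the example URI, and the brace-enclosed syntax are all invented here for illustration, not anything the Working Group has adopted.

```sparql
PREFIX ex: <http://example.org/fn#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Today's extensibility point: an extension function named by IRI.
# ex:relevance is hypothetical; an implementation that doesn't
# recognize it raises an error for the FILTER, eliminating those
# solutions rather than breaking the whole query.
SELECT ?doc ?title
WHERE {
  ?doc dc:title ?title .
  FILTER (ex:relevance(?title, "semantic web") > 0.5)
}

# A speculative EXTENSION keyword: an identifying URI, arbitrary
# arguments, and implementation-defined syntax inside the braces.
# This is a sketch of the idea, not valid SPARQL today.
SELECT ?doc
WHERE {
  ?doc dc:title ?title .
  EXTENSION <http://example.org/ext/full-text> (?title) {
    match "semantic web"
  }
}
```

A service description could then advertise the URI <http://example.org/ext/full-text>, giving clients a standard way to discover, rather than guess at, an endpoint’s non-standard capabilities.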
There’s one final topic that I’ve alluded to but want to mention explicitly: energy. We’ve got a lot of Working Group members with a variety of perspectives and a large number of potential work items around which we need to reach consensus. And then we need to reach consensus on the syntax and semantics of our work items, as well as on the specification text used to describe them. We need editors and reviewers and test cases and test harnesses and W3C liaisons and community outreach and comment responders. All of this takes energy. The DAWG nearly ground to a premature halt as the standardization process dragged on year after year. We can’t let that happen this time around, so we need to keep the energy up. An enthusiastic Working Group, frequent contributions from the broader community, occasional face-to-face meetings, and visible signs of progress can all help keep our energy from flagging. And, of course, sticking to our 18-month schedule is as important as anything.
What do you think? I’m eager to hear from anyone with suggestions for how the Working Group can best meet its objectives. Do you disagree with some of my underlying assumptions? How about my criteria for choosing features? Do you see any extensibility or evolutionary mechanisms that you think would ease the future growth of SPARQL? Please let me know.