At SemTech a couple of weeks ago, I participated in the lightning talk session: 90 minutes packed with 5-minute talks and moderated with great aplomb by Paul Miller. While most of the speakers presented pithy, informative, and witty slide decks, I opted to go a different route. I've long believed that some of the biggest value in Semantic Web technologies lies in their ability to dramatically shorten the timescales involved in traditional IT projects. To this end, I used my 5-minute slot to give a live demo of using our Anzo software suite to build a solution for running a conference such as SemTech using just Excel and a Web browser.

When I got back to Boston, I made a recording of the same lightning demo for posterity. Please enjoy it here and drop me a note if you have any questions or would like to learn more.

(Best viewed in full screen, 720p.)

Early SPARQL Reviews


The SPARQL Working Group is still working on all of our specifications. None are yet at Last Call, though we feel our designs are quite stable and we're hoping to reach Last Call within a few months. Standard W3C process encourages interested community members to review Working Drafts as they're produced, but especially encourages reviews of Last Call drafts.

While we will of course solicit as widespread a review of our Last Call drafts as possible, I'd like to put out a call now for reviews of our current set of Working Drafts. If you can only do one review, you're probably best off waiting for Last Call; but if you have the inclination and time, it would be great to receive reviews of the current drafts at our comments list, public-rdf-dawg-comments@w3.org. The Working Group has committed to responding formally to all comments received from here on out.

Here is our current set of documents, along with a few explicit areas/issues that the Working Group and editors would love to receive feedback about (of course, all reviews and all feedback are welcome):

SPARQL 1.1 Query

  • Feedback on MINUS and NOT EXISTS, the two new negation constructs in SPARQL 1.1 (section 8)
  • Feedback on the new functions in SPARQL 1.1 (15.4.14 through 15.4.21)
  • Feedback on the aggregates ("set functions") included in SPARQL 1.1 (section 10.2.1)
  • Feedback on property paths (currently in their own document)

SPARQL 1.1 Update

  • Handling of RDF datasets in SPARQL Update (particularly the WITH, USING, and USING NAMED clauses)

SPARQL 1.1 Service Description

  • Discovery mechanism for service descriptions (section 2)
  • Modeling of graphs and RDF datasets (3.2.7 through 3.2.10 and 3.4.11 through 3.4.17)
  • Service description as related to entailment (3.2.5, 3.2.6 and 3.4.3 through 3.4.5)

SPARQL 1.1 Entailment Regimes

  • The mechanisms for restricting solutions in all regimes
  • Are the OWL Direct Semantics too general? E.g., they allow for variables in complex class expressions

SPARQL 1.1 Federation Extensions

  • Should support for SERVICE be mandatory in SPARQL 1.1 Query implementations?
  • Should support for BINDINGS be mandatory in SPARQL 1.1 Query implementations?

SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs

  • Interpretation/translation of HTTP verbs into SPARQL Update statements
  • Handling of indirect graph identification (section 4.2 et al.)
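To make the first bullet a bit more concrete for reviewers, here is a rough Python sketch of what translating HTTP verbs on a graph into SPARQL strings could look like. The helper name `verb_to_sparql`, the example.org URIs, and the particular translations are my own illustrative assumptions rather than text from the Working Draft, which remains the authority on the actual mapping.

```python
def verb_to_sparql(verb, graph_uri, payload=""):
    """Approximate an HTTP verb applied to a named graph as a SPARQL string.

    Hypothetical helper for illustration only. `payload` is assumed to be
    triples already written in a syntax that is valid inside INSERT DATA.
    """
    g = "<%s>" % graph_uri
    insert = "INSERT DATA { GRAPH %s { %s } }" % (g, payload)
    if verb == "GET":
        # read the whole graph back out
        return "CONSTRUCT { ?s ?p ?o } WHERE { GRAPH %s { ?s ?p ?o } }" % g
    if verb == "PUT":
        # replace the graph: drop whatever is there, then insert the payload
        return "DROP SILENT GRAPH %s ; %s" % (g, insert)
    if verb == "POST":
        # merge the payload into the existing graph
        return insert
    if verb == "DELETE":
        return "DROP GRAPH %s" % g
    raise ValueError("unsupported verb: %s" % verb)


example = verb_to_sparql(
    "PUT",
    "http://example.org/graphs/talks",
    "<http://example.org/t1> <http://example.org/day> \"Monday\"",
)
```

Seeing the verbs side by side like this is a handy way to spot the cases worth commenting on, such as what PUT should do when the graph does not exist yet.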

Windows Software I Pay For


After four months of sitting on the installation discs but being too busy to act, I’ve finally said goodbye to Windows XP and upgraded to Win7. The jury’s still out on the new operating system, but in the process of reinstalling all of my software applications I took particular note of a few pieces of software that are not mainstream but that I find indispensable in getting my work done. I pay for all of these (rather than use the free versions) because they make my life better.

  • ScreenHunter. ScreenHunter lets me take screenshots. Pick a hot key (I use F6, the suggested default). When you hit the hot key, ScreenHunter takes a screenshot of either the current window or a rectangular region of the screen of your choosing. This screenshot is copied to the clipboard and saved to a directory of your choosing (one reason ScreenHunter surpasses Windows’s built-in print screen capabilities). I pay ($19.95) for ScreenHunter Plus primarily for its support for multiple hot keys: I assign one to the active window and another to a rectangular region and I’m set.
  • ScreenSteps. At Cambridge Semantics we use ScreenSteps to produce much of our documentation. ScreenSteps has a clunky, sluggish user interface and is not the most robust software in the world, but it does one thing really, really well: ScreenSteps makes it a breeze to create documents that interleave text and annotated screenshots. Thanks to William Hayes for alerting me to ScreenSteps.
  • Pdf995. Pdf995 installs a printer driver that lets you print to a PDF file. It’s trivial to use, creates good looking PDF documents, and requires no setup/configuration. It just works. I use it all the time for producing documents, saving Web pages, archiving receipts, and more. I pay ($9.95) for it to avoid the ads every time I use it.
  • SimpleDiagrams. SimpleDiagrams is an Adobe AIR application that lets me create rough diagrams of processes, organizations, etc. in seconds. It’s purposefully low on functionality: no anchor points, grids, etc. Just a few palettes of common shapes and a blank canvas onto which to drag them. The diagrams look great and are easy to make. (Last week I was sitting around a meeting in which 4 other people were furiously scribbling down the contents of the whiteboard in their notebooks; I used SimpleDiagrams to capture an electronic version of the drawing in a fraction of the time.) I pay ($19.00) for SimpleDiagrams to have access to a few additional symbol libraries, to get rid of the annoying nag windows, and because I think it’s quite cool software.

Does anyone use SPARQL over SOAP?


The SPARQL Working Group would like to know if anyone uses SPARQL over SOAP. Please leave a comment if you do. (We know that several implementations support a SOAP implementation of the SPARQL protocol, but we don’t have much evidence that this part of such implementations is ever used.)

Thanks!

CONSTRUCTing Quads


I promised Danny that I’d write this up, so here’s to making good on promises.

Open Anzo is a quad store. (I’ve written about this before.) All of the services Open Anzo offers—versioning, replication, real-time updates, access control, etc.—are oriented around named graphs. Time and time again we’ve found named graphs to be invaluable in building applications atop an RDF repository.

And while SPARQL took the first steps towards standardizing quads via the named graphs component of the RDF dataset, the CONSTRUCT query result form only returns triples.

For our purposes in Open Anzo, this severely limits the usefulness of CONSTRUCT. We can’t use it to pull out a subset of the server’s data, as any data returned has been stripped of its named graph component. The solution was pretty simple, and is a good example of practicing what I’ve been preaching recently: a key part of the standards process is for implementations to extend the standards.

In this case, we simply extended Glitter’s (Open Anzo’s SPARQL engine) CONSTRUCT templates to support a GRAPH clause, in exactly the same way that SPARQL query patterns support GRAPH clauses. This means that any triple pattern within a CONSTRUCT template will now either output a triple (if it’s outside any GRAPH clause) or a quad (if it’s inside a GRAPH clause).
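If it helps to see the triple-or-quad rule in miniature, here is a small Python sketch of template instantiation. To be clear, this is not Glitter's code: the `instantiate` function, the tuple layout (graph first, then subject, predicate, object), and the `ex:seen` predicate are assumptions I made up for illustration.

```python
def instantiate(template, solutions):
    """Instantiate a CONSTRUCT-style template against query solutions.

    template:  list of (graph, s, p, o) patterns, where graph is None for a
               pattern outside any GRAPH clause. Terms starting with "?" are
               variables; everything else is treated as a constant.
    solutions: list of dicts mapping variable names to values.
    Returns a set of 3-tuples (triples) and 4-tuples (quads, graph first).
    """
    def resolve(term, binding):
        # variables resolve to their bound value, or None if unbound
        return binding.get(term) if term.startswith("?") else term

    out = set()
    for binding in solutions:
        for graph, s, p, o in template:
            triple = tuple(resolve(t, binding) for t in (s, p, o))
            if None in triple:
                continue  # skip patterns that mention an unbound variable
            if graph is None:
                out.add(triple)            # outside GRAPH: plain triple
            else:
                g = resolve(graph, binding)
                if g is not None:
                    out.add((g,) + triple)  # inside GRAPH: quad


    return out


template = [
    ("?g", "?s", "rdf:type", "?o"),   # inside GRAPH ?g { ... }
    (None, "?s", "ex:seen", "true"),  # outside any GRAPH clause
]
solutions = [{"?g": "ex:g1", "?s": "ex:alice", "?o": "ex:Person"}]
result = instantiate(template, solutions)
```

With the one solution above, `result` holds one quad in `ex:g1` and one plain triple, which is exactly the mixed output that a serialization like TriG has to carry.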

Key to making this happen is the fact that both the Open Anzo server and the three client APIs (Java, JavaScript, and .NET) support serializing and deserializing quads to/from the TriG RDF serialization format. TriG’s a very straightforward extension of Turtle, and I’d like to see it used more and more throughout Semantic Web circles.

Anyway, here are a few simple examples of CONSTRUCTing quads in practice:

# fix up typo'ed predicates
CONSTRUCT {
  GRAPH ?g {
    ?s rdf:type ?o
  }
} WHERE {
  GRAPH ?g {
    ?s rdf:typo ?o
  }
}

# copy triples into a new graph
CONSTRUCT {
  GRAPH ex:newGraph {
    ?s ?p ?o
  }
} WHERE {
  ?s ?p ?o
}

# more complicated -- place constructed triples in
# a new “inferred” graph and indicate this fact in
# an Open Anzo metadata graph associated with the
CONSTRUCT {
  GRAPH ex:inferredGraph {
    ?p ex:uncle ?uncle
  }
  GRAPH ?mdg {
    ?mdg anzo:hasInferredGraph true
  }
} WHERE {
  GRAPH ?g {
    ?p ex:parent [ ex:brother ?uncle ] .
  }
  GRAPH ?mdg {
    ?mdg a anzo:metadatagraph ; anzo:namedGraph ?g
  }
}

Of course, combine this with some of the other SPARQL extensions that Glitter supports—subqueries, projected expressions, assignment, and aggregates being my favorites—and you’ve got a powerful way to transform and extract quad-based RDF data.

SPARQLing at SemTech


SemTech 2009 has come and gone, and it was great. I was concerned—as were others—that the state of the economy would depress the turnout and enthusiasm for the show, but it seems that any such effects were at least counterbalanced by a growing interest in semantic technologies. Early reports are that attendance was up about 20% from last year, and at sessions, coffee breaks, and the exhibit hall there seemed to always be more people than I expected. Good stuff.

Eric P. and I gave our SPARQL By Example tutorial to a crowd of about 50 people on Monday. From the feedback I’ve received, it seems that people found the session beneficial, and at least a couple of people remarked on the fact that Eric and I seemed to be having fun. If this whole semantic thing doesn’t work out, at least we can fall back on our ad-hoc comedy routines.

Anyway, I wanted to share a couple of links with everyone. I think they work nicely to supplement other SPARQL tutorials in helping teach SPARQL to newcomers and infrequent practitioners.

  1. SPARQL By Example slides. I’ve probably posted this link before, but the slides have now been updated with some new examples and with a series of exercises that help reinforce each piece of SPARQL that the reader encounters. Thanks to Eric P. for putting together all of the exercises and to Leigh Dodds for the excellent space exploration data set.
  2. SPARQL Cheat Sheet slides. This is a short set of about 10 slides intended to be a concise reference for people learning to write SPARQL queries. It includes things like common prefixes, the structure of queries, how to encode SPARQL into an HTTP URL, and more.
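As a taste of the "encode SPARQL into an HTTP URL" item, here is a minimal Python sketch. The `query` and `default-graph-uri` parameter names come from the SPARQL protocol; the endpoint address and the `sparql_get_url` helper are placeholders I invented for the example, so substitute a real endpoint before trying it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_get_url(endpoint, query, default_graph_uris=()):
    """Build a GET URL that carries a SPARQL query (hypothetical helper)."""
    params = [("query", query)]
    # each default graph gets its own repeated parameter
    params += [("default-graph-uri", g) for g in default_graph_uris]
    # urlencode percent-encodes reserved characters and turns spaces into "+"
    return endpoint + "?" + urlencode(params)


query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(
    "http://example.org/sparql",            # placeholder endpoint
    query,
    default_graph_uris=["http://example.org/data"],
)

# decoding the query string gives back the original query text
decoded = parse_qs(urlsplit(url).query)
```

The round trip through `parse_qs` is a quick sanity check that nothing in the query was mangled by the encoding.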

Enjoy, and, as always, I’d welcome any feedback, suggestions for improvements, or pointers to how/where you’re able to make use of these materials.

We’ll be releasing the first versions of our Anzo products in July. Between now and then I’m going to try to do some blogging showing various parts of the products. But before I begin that, I’ve been thinking a bunch recently about how to characterize our use of Semantic Web technologies, and I wanted to write a bit on that.

Our software views the world of enterprise data in a pretty straightforward way:

  1. Bring together as much data as possible.
  2. Do stuff with the data.
  3. Allow anyone to consume the data however (& whenever) they want.

This is a very simple take on what we do, but it gets to the heart of why we care about semantics: We love semantics because semantics is the “secret sauce” that makes possible each of these three aspects of what we do.

Here’s how:

Bring together as much data as possible

First of all, in most cases we don’t actually physically copy data around. That sort of warehouse approach is appropriate in some cases, but in general we prefer to leave data where it is and bring it together virtually. Our semantic middleware product, the Anzo Data Collaboration Server, provides unified read, write, and query interfaces to whatever data sources we’re able to connect to. We often refer to the unified view of heterogeneous enterprise data as a semantic fabric, but really it’s linked data for the enterprise.

Semantic Web technologies make this approach feasible. RDF is a data standard that is both expressive enough to represent any type of data that’s connected to the server and also flexible enough to handle new data sources incrementally. URIs provide a foundation for minting identifiers that don’t clash unexpectedly as new data sources are brought into the fold. Named graphs give us a simple abstraction upon which we can engineer practical concerns like security, audit trails, offline access, real-time updates, and caching. And, of course, GRDDL gives us a standard way to weave XML source data into the fabric.

Without Semantic Web technologies we’d need to worry about defining a master relational schema up front, or we’d have to constantly figure out how to structurally relate or merge XML documents. And when we’re talking about data that originates not only in one or two big relational databases but also in hundreds or thousands or hundreds of thousands of Excel spreadsheets, the old ways just don’t cut it at all. Semantic Web technologies, on the other hand, provide the agile data foundation we need to bring data together.

But bringing together as much data as possible is not an end in itself. What’s the point of doing this?

Do stuff with the data

This one’s intentionally vague, because there are lots of things that lots of different people want—and need—to do with data, and Anzo is a platform that accommodates many of those things. In general, though, Semantic Web standards again lay the groundwork for the types of things that we want to do with data:

  • Data access. SPARQL gives us a way to query information from multiple data sources at once.
  • Describing data. RDF Schema and OWL are extremely expressive ways to describe (the structure of) data, particularly compared to alternatives like relational DDL or XML Schema. We can (and do) use data descriptions to do things like build user interfaces, generate pick lists (controlled vocabularies), validate data entry, and more.
  • Transform data. There are all kinds of ways in which we need to derive new data from existing data. We might do this via inference (enabled by RDFS and OWL) or via rules (enabled by SPARQL CONSTRUCT queries, by RIF, or by SWRL) or simply via something like SPARQL/Update.
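To illustrate the "transform data" idea, here is a tiny Python sketch of a rule that derives new triples from existing ones, in the spirit of a CONSTRUCT query used as a rule. It is plain Python standing in for SPARQL, and the `infer_uncles` function and the `ex:` terms are hypothetical names chosen for the example.

```python
def infer_uncles(triples):
    """Derive (child, ex:uncle, uncle) from ex:parent and ex:brother triples.

    Mirrors the rule: if ?child ex:parent ?parent and
    ?parent ex:brother ?uncle, then ?child ex:uncle ?uncle.
    """
    # index brothers by the person they are a brother of
    brothers = {}
    for s, p, o in triples:
        if p == "ex:brother":
            brothers.setdefault(s, set()).add(o)

    derived = set()
    for s, p, o in triples:
        if p == "ex:parent":
            for uncle in brothers.get(o, ()):
                derived.add((s, "ex:uncle", uncle))
    return derived


family = [
    ("ex:kid", "ex:parent", "ex:mom"),
    ("ex:mom", "ex:brother", "ex:bob"),
    ("ex:mom", "ex:brother", "ex:jim"),
    ("ex:kid", "ex:parent", "ex:dad"),
]
uncles = infer_uncles(family)
```

The derived triples are kept separate from the input here, which makes it easy to store them in their own graph and throw them away when the source data changes.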

Without Semantic Web technologies, we’d probably end up using a proprietary approach for querying across data sources. We’d have to hardcode all of our user interfaces or else invent or adopt a non-standard way of describing our data beyond what a relational schema gives us. And then we might choose a hodgepodge of rules engines, SQL triggers, and application-specific APIs to handle transforming our data. And this might all work just fine, but we’d have to put in all the time, effort, and money to make all the pieces work together.

To me, that’s the beauty of the much-maligned Semantic Web layer cake. The fact that semantic technologies represent a coherent set of standards (i.e. a set of disparate technologies that have been designed to play nice together) means that I can benefit from all of the “glue” work that’s already been done by the standards community. I don’t need to invent ways to handle different identifier schemes across technologies or how to transform from one data model to another and back again: the standards stack has already done that.

Allow anyone to consume the data however (& whenever) they want

Once we’ve put in place the ability to bring data together and do stuff to that data, the remaining task is to get that information in front of anyone who needs it when they need it. We’ve put in a lot of effort to make bringing data into the fabric and acting on that data easy, and it would be a shame if every time someone needs to consume some information they need to put in a request and wait 6 months for IT to build the right queries, views, and forms for them.

To this end, Anzo on the Web takes the increasingly popular faceted-browsing paradigm and puts it in the hands of non-technical users. Anyone can visually choose the data that they need to see in a grid, a scatter plot, a pie chart, a timeline, etc., and the right view is created immediately. Anyone can also choose which properties of the data should be available as facets, and so filter the data set by whatever attributes he or she wants.

Once again it’s the flexibility of the Semantic Web technology stack that makes this possible for us. RDF makes it trivial for us to create, store, and discover customized lenses with arbitrary properties. RDF also lets us introspect on the data to present visual choices to users when configuring views and adding filters. SPARQL is a great vehicle for building the queries that back faceted browsing.
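For a feel of why a triple-shaped data model suits faceted browsing, here is a minimal Python sketch over in-memory triples. It is not how Anzo on the Web is implemented (that is backed by SPARQL, as noted above); the function names and the sample conference data are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def available_properties(triples):
    """Introspect the data: every property is a candidate facet."""
    return {p for _, p, _ in triples}


def facet_counts(triples, facet_properties):
    """Count how many times each value occurs for each chosen facet."""
    counts = defaultdict(Counter)
    for s, p, o in triples:
        if p in facet_properties:
            counts[p][o] += 1
    return counts


def filter_subjects(triples, selections):
    """Return the subjects that match every selected facet value."""
    matches = {s for s, _, _ in triples}
    for prop, value in selections.items():
        matches &= {s for s, p, o in triples if p == prop and o == value}
    return matches


talks = [
    ("ex:t1", "ex:track", "SPARQL"), ("ex:t1", "ex:day", "Monday"),
    ("ex:t2", "ex:track", "SPARQL"), ("ex:t2", "ex:day", "Tuesday"),
    ("ex:t3", "ex:track", "OWL"),    ("ex:t3", "ex:day", "Tuesday"),
]
facets = facet_counts(talks, available_properties(talks))
tuesday_sparql = filter_subjects(talks, {"ex:track": "SPARQL", "ex:day": "Tuesday"})
```

Nothing in the sketch knows about tracks or days ahead of time; adding a new property to the data makes a new facet available with no code changes.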

In summary

It bears repeating that as with most technology standards, the things that we accomplish with Semantic Web standards could be done with other technology choices. But using a coherent set of standards backed by a thriving community of both research and practice means that:

  1. We don’t have to invent all the glue that ties different technologies together
  2. Any new standards that evolve within this stack immediately give our software new capabilities (see #1)
  3. There’s a wide range of 3rd party software that will easily interoperate with Anzo (other RDF stores, OWL reasoners, etc.)
  4. We can focus on enabling solutions, rather than on the core technology bits. All of the above frees us up to do things like build an easy to use faceted browsing tool, build Anzo for Excel to collect and share spreadsheet data, build security and versioning and real-time updates, and much more.

Again, the semantics is really the secret sauce that makes much of what we do possible, but there’s a lot more innovation and engineering that turns that secret sauce into practical solutions. I’ll have some takes on what this looks like in practice in the coming weeks, and we’d love to show you in person if you’ll be in the Boston, MA area or if you’ll be at SemTech in San Jose, CA.

I’m looking forward to this year’s Semantic Technology Conference in San Jose the week of June 14-18. I saw lots of fantastic sessions at last year’s SemTech and met tons of great people, and I imagine that this year will be even better. My colleagues at Cambridge Semantics and I will be giving a few talks, running the gamut from tutorial to technology survey to project report to our vision of how to build practical semantic solutions:

  • SPARQL By Example tutorial. I’ll be giving this half-day tutorial on Monday afternoon. We’ll use actual SPARQL queries that can be run on the (public) Semantic Web today as a means to learning SPARQL from the ground up.
  • Making Sense of Spreadsheets in Merck Basic Research. Jaime Melendez of Merck and I will be giving this talk bright and early on Tuesday morning. We’ll be reporting on the results of a joint innovation project that we completed last year using our Anzo software to address several challenges facing Merck basic research.
  • Enterprise Scalable Semantic Solutions in Five Days. Mike Cataldo will be talking later Tuesday morning about how Anzo makes use of semantic technologies to help our customers build practical, production-ready solutions in a matter of days.
  • Faceted Browsing Tools. Jordi Albornoz will be talking on Tuesday afternoon about the power and simplicity of faceted browsing and semantic lens technologies. He’ll be comparing and contrasting Exhibit, Fresnel, and our own Anzo on the Web.

I know that people have been saying this for a few years now, but I keep seeing the Semantic Web taking significant steps forward both inside of and outside of corporate firewalls. I fully expect this year’s SemTech to reaffirm this point of view. If you’ll be in San Jose, come by some of our talks and see what I mean. We’ll also have a space in the exhibit hall, so you can come and say hi there as well. See you there!
