Life sciences on the web with SPARQL
I've been meaning to write up my experiences from WWW2006 in Edinburgh since getting back at the end of May, but the arrival of heaps of summer interns and the projects that accompany them (including Queso, a semantic-web-powered web-application framework) seems to have defeated that desire.
At the least, though, I wanted to mention the SPARQL/RDF life-sciences web mashup that I demoed at the Advancements in Semantic Web session of the W3C track on Friday. (And in doing so, follow my own example.) In this demo, we use RDF representations and SPARQL queries to integrate protein data from the NCBI with antibody information from the Alzheimer Research Forum Antibody Database. The presentation that I gave in Edinburgh has some more information on how the demo is put together and what Elias and I learned from our work on it.
How to use the demo
- Navigate to the demo at http://thefigtrees.net/lee/sw/demos/antibodies/.
- Enter a search term to find related proteins. For the purposes of trying out the demo, enter p53 and click Find Proteins.1
- Up to twelve proteins found in the search are rendered on the display, along with each protein's species, description, and NCBI number. Click on a protein to search for antibodies that target that protein. For the purposes of trying out the demo, click on NP_000537.2
- Any antibodies found are displayed in a column on the right half of the page. The information displayed includes the distributor of the antibody, the distributor's catalog number, the immunogen used to generate the antibody, the specificity of the antibody, and the uses for which the antibody is appropriate.
This demo makes use of some of the early work being done by Alan Ruttenberg in conjunction with the BioRDF subgroup of the W3C's Semantic Web Health Care and Life Sciences Interest Group.
Behind the scenes
1 Two SPARQL queries are used to do this initial search. First, we query a service written by Ben Szekely that performs an NCBI Entrez search and returns the LSIDs of the resulting objects in a simple RDF graph. For each of these LSIDs, we use a second of Ben's services, which resolves the metadata for an LSID via a simple HTTP GET. We use the URLs to this service as the graphs for a second SPARQL query, which retrieves the details of the proteins. We take the results of this second SPARQL query as JSON and bind them to a microtemplate to render the protein information.
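To make the second step concrete, here is a minimal Python sketch of how a query with one graph per LSID might be assembled, and how the SPARQL JSON results might be flattened before being handed to a template. It is not the demo's actual code: the resolver URL, the `ex:` vocabulary, and the LSIDs in the usage example are all placeholders invented for illustration.

```python
from urllib.parse import quote

# Hypothetical resolver endpoint; the real service URL is not given in this post.
RESOLVER = "http://example.org/lsid/metadata?lsid="


def graph_url(lsid: str) -> str:
    """URL that would return RDF metadata for one LSID via HTTP GET."""
    return RESOLVER + quote(lsid, safe="")


def protein_query(lsids: list[str]) -> str:
    """Build a SPARQL query whose dataset is one FROM graph per LSID."""
    from_clauses = "\n".join(f"FROM <{graph_url(lsid)}>" for lsid in lsids)
    return (
        "PREFIX ex: <http://example.org/protein#>\n"  # hypothetical vocabulary
        "SELECT ?protein ?species ?description\n"
        f"{from_clauses}\n"
        "WHERE {\n"
        "  ?protein ex:species ?species ;\n"
        "           ex:description ?description .\n"
        "}"
    )


def rows(sparql_json: dict) -> list[dict]:
    """Flatten SPARQL JSON results into plain dicts, ready for a template."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in sparql_json["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Illustrative LSIDs only; real ones would come from the Entrez search step.
    print(protein_query(["urn:lsid:example.org:protein:A",
                         "urn:lsid:example.org:protein:B"]))
```

The point of the design is that the first query's results decide the dataset of the second: each LSID becomes a dereferenceable graph URL, so the SPARQL engine fetches the metadata itself.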
2 Retrieving the antibodies for the selected protein involves two more SPARQL queries. First, we query against a map created by Alan Ruttenberg in order to find AlzForum antibody IDs that correspond to the target protein. We need the results of this query to generate HTTP URLs which search the AlzForum antibody database for the proper antibodies. (If we had a full RDF representation of the antibody database, this query would be unnecessary.) We wrap these search URLs in a service we created that scrapes the HTML from the antibody search results Web page and generates RDF (how I yearn for RDFa adoption), and then use the wrapped URLs as the graphs for a second SPARQL query. This query joins the NCBI data with Alan's mapping and the antibody details to retrieve the information that is rendered for each antibody of the target protein.
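The URL wrapping described above can be sketched in Python as follows. Again, this is an illustration under stated assumptions rather than the demo's real code: the search and scraper endpoints, their `id` and `url` parameters, the mapping graph URL, and the `ex:` predicates are all hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoints; the real search and scraper URLs are not given here.
ANTIBODY_SEARCH = "http://example.org/antibody/search"
SCRAPER = "http://example.org/html2rdf"


def wrapped_graph(antibody_id: str) -> str:
    """Wrap an antibody search URL in the HTML-to-RDF scraper service URL."""
    search_url = ANTIBODY_SEARCH + "?" + urlencode({"id": antibody_id})
    # The whole search URL is percent-encoded into a single query parameter.
    return SCRAPER + "?" + urlencode({"url": search_url})


def antibody_query(protein_uri: str, mapping_graph: str,
                   antibody_ids: list[str]) -> str:
    """Build a query joining the mapping graph with the scraped graphs."""
    graphs = [mapping_graph] + [wrapped_graph(a) for a in antibody_ids]
    from_clauses = "\n".join(f"FROM <{g}>" for g in graphs)
    return (
        "PREFIX ex: <http://example.org/antibody#>\n"  # hypothetical vocabulary
        "SELECT ?distributor ?catalog ?immunogen\n"
        f"{from_clauses}\n"
        "WHERE {\n"
        f"  <{protein_uri}> ex:targetedBy ?antibody .\n"
        "  ?antibody ex:distributor ?distributor ;\n"
        "            ex:catalogNumber ?catalog ;\n"
        "            ex:immunogen ?immunogen .\n"
        "}"
    )


if __name__ == "__main__":
    # Illustrative IDs only; real ones would come from the mapping query.
    print(antibody_query("urn:lsid:example.org:protein:A",
                         "http://example.org/map.rdf", ["42", "43"]))
```

Because every wrapped URL returns RDF, the scraped HTML pages look to the query engine like any other named source, which is what lets a single query join them with the mapping.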