09.21.06

Mash-ups and Mash-ins

Posted in development, semantic web, technology at 1:04 am by wingerz

mash-in

With more and more websites providing access to data via public APIs, mash-ups have become quite popular. The canonical mash-up takes data from one or more sources and plots it on a Google Map; three of our four summer project demos included a map component. MIT Simile’s Timeline provides a similarly draggable interface for time-based data. In most cases, building a mash-up involves creating a new page or set of pages for displaying data.

Lately at work we’ve been tossing around the idea of a mash-in. Rather than creating a new page for displaying data, a mash-in supplements an existing page with additional data. For example, our summer demo involved supplementing corporate directory profiles with bookmarks from an intranet social bookmarking system and a map showing a business address. Profiles normally display several tabs containing various categories of information; even without access to the corporate directory webserver, we were able to add tabs containing our information. This is one way to add new functionality without forcing the user to adapt to a new interface.

We did this by setting up a proxy server to add a few JavaScript includes to the page. The included code effectively adds a tab to the page, grabs an email address from the contents of the page, and populates the tab with the results of an asynchronous data fetch based on the email address. We also could have written a Firefox extension or Greasemonkey script to do the includes instead.
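As a rough sketch of the scraping step in that included code (names here are illustrative, and the real include also builds the tab UI and does the async fetch):

```javascript
// Hypothetical sketch: pull an email address out of a profile page's
// markup so it can key an asynchronous data fetch.
function extractEmail(html) {
  // A deliberately primitive pattern, as described in the post.
  var match = html.match(/[\w.+-]+@[\w-]+(\.[\w-]+)+/);
  return match ? match[0] : null;
}

var profileHtml = '<td>Contact:</td><td>jdoe@example.com</td>';
var email = extractEmail(profileHtml);
// email would then drive an XMLHttpRequest for bookmarks, map data, etc.
console.log(email); // "jdoe@example.com"
```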

Scraping the page for an email address is quite primitive; we envision that one day pages will be written in RDFa, which allows the embedding of RDF triples in XHTML. Rather than matching something that looks like an email address, we can run an RDFa parser to find RDF triples, then get the object of a triple with predicate “hasEmailAddress” or something to that effect. Instead of just getting a single value, it would also be easy to check to see if the page contains RDF describing an event, an address, a book, or something else. We could choose widgets to display based on the content of the page.
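To make the contrast concrete, here is a toy version of the RDFa-based lookup: triples from a parser are represented as plain objects, and the `ex:hasEmailAddress` predicate is hypothetical (the post itself says "or something to that effect"):

```javascript
// Given triples produced by an RDFa parser, return the objects of every
// triple with the given predicate.
function findObjects(triples, predicate) {
  return triples
    .filter(function (t) { return t.predicate === predicate; })
    .map(function (t) { return t.object; });
}

// Illustrative triples; subjects, predicates, and values are made up.
var triples = [
  { subject: '_:profile', predicate: 'ex:hasEmailAddress', object: 'jdoe@example.com' },
  { subject: '_:profile', predicate: 'ex:hasAddress', object: '1 Rogers St, Cambridge, MA' }
];
console.log(findObjects(triples, 'ex:hasEmailAddress')[0]); // "jdoe@example.com"
```

The same lookup, pointed at a different predicate, is what would let us detect events, addresses, or books on a page and pick widgets accordingly.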

It gets even more interesting when you throw Queso into the mix as a backend storage system. Queso can store any sort of structure you’d like, without you having to define the structure beforehand. This makes it easy to store user-specific information. In our demo we used Queso to store employee home addresses (for calculating map directions) and del.icio.us usernames (for fetching non-intranet bookmarks). These were two simple, but illustrative, examples.

It’s difficult to go beyond simple mash-ups and mash-ins without a flexible storage system because either you have to set up your own database to store a particular data structure (like storing gas prices and locations in gasbuddy) or work with siloed data with no simple means of bringing it together (for example, if you wanted to compare a particular user’s usage of tags in both flickr and del.icio.us — never mind that they’re both owned by Yahoo). In both cases you’d be able to get something working, but you’d probably end up doing it with your own database with custom database tables and code. Using Queso, you could store the data that you wanted without having to do this (of course, you would have to become familiar with RDF, RDFa, and SPARQL).

We are continuing work on mash-ins. Although they require more trust from the user (proxy / Greasemonkey / FF extension), we believe that this is offset by adding value to existing applications already familiar to users. And as RDFa becomes more pervasive, we’ll be able to add even more interesting functionality after analyzing the contents of a page.

08.18.06

End of the summer

Posted in personal, semantic web at 9:18 am by wingerz

On Wednesday our interns, Ben, and I drove down to Southbury, CT to demo Queso, our summer project. The final part of our demo showed a Queso-backed TiddlyWiki containing an entry with some RDFa encoding a calendar event. It displayed nicely as rendered HTML, but it was also query-able via SPARQL, so we dumped it (along with some other events) onto a Google Map as well as a Timeline. Mashup heaven!

Yesterday Sean hosted his annual Adtech summer barbecue at his lake house in Webster, MA. The entire day was incredibly pleasant. We hung out and ate on the deck, rested in the hammock, swam in the lake, canoed around a nearby island, went for a boat ride, and were dragged about mercilessly in the Megabird. The water was calm, comfortable, and beautiful.

Today is the interns’ last day. It’s been a fun, productive summer. Will definitely miss having them around, especially since they increased the size of our group by over 50%.

07.19.06

A Queso Example

Posted in development, semantic web, web at 11:17 pm by wingerz

I received an email from a friend who wanted to learn more about my work. He is not that familiar with the Semantic Web, and I tweaked my reply to him because it does a decent job of explaining what Queso does. Pre-reading: Queso introduction, RDFa in Queso.

Core technologies, in order of appearance:

  • RDF – Resource Description Framework – representing data in subject-predicate-object triples, heart and soul of the Semantic Web
  • Atom – most widely known as a feed format, also for publishing content
  • JSON – Javascript Object Notation – representing objects as strings that can be eval’ed in Javascript (there are also libraries for parsing JSON in other languages) – takes the pain out of parsing responses on the client
  • XHTML – HTML as XML, which means that you can’t have unclosed tags (p, I’m looking at you), attribute values must have quotation marks around them, etc.
  • RDFa – a standard for embedding RDF in XHTML
  • SPARQL – SPARQL Protocol and RDF Query Language – I like it even more because 1) it’s a recursive acronym and 2) Elias and Lee are on the working group.
  • Microtemplates – templates for HTML – bind Javascript objects to them to display your data

Our high-level goal this summer is to put a web front-end onto Boca, our RDF store. Most of our group’s work over the past few years has focused on building infrastructure to support the Semantic Web. Our UI work has primarily involved creating libraries for RCP-based applications, but we’ve always known that making things web-accessible is important.

In Queso, we want to store both data and XHTML/Javascript application code. We want web application designers to be able to dump their (potentially structured) data into the system, which stores everything as RDF triples. SPARQL provides an easy, standardized way to query for data in a flexible, precise manner. The data in the system can be easily extracted and repurposed for use in other applications, such as mash-ups. The following is a simple example where I’ve tried to improve readability by leaving out namespaces and base URIs. Don’t follow this as a copy-and-paste tutorial.

There’s a site called mywikimap that shows local gas prices on a Google Map. If you wanted to store data for an application like this using Queso, you could post Atom entries of the following format as XHTML:

<div about="[queso:thisEntry]">
	price: 		<span property="price">2.95</span>
	latitude: 	<span property="latitude">7</span>
	longitude: 	<span property="longitude">9</span>
	time found: 	<span property="time">2006-07-19T10:27:00</span>
</div>

While this will display in a (somewhat ugly) human-readable form, it is also valid RDFa, so Queso extracts RDF triples from it (a semicolon at the end of a line means that the following triple has the same subject):


_:gasEntry <price> "2.95" ;
   <latitude> "7" ;
   <longitude> "9" ;
   <time> "2006-07-19T10:27:00" .

These triples will be stored along with the triples that represent the Atom entry. Now you can query them using SPARQL, with the following query (which restricts the results to entries from July 17 or later located in a certain area):


select ?price ?lat ?lon ?time
where {
  ?entry <price> ?price ;
         <latitude> ?lat ;
         <longitude> ?lon ;
         <time> ?time .
  filter (?time >= "2006-07-17T00:00:00")
  filter (?lat > 6 && ?lat < 8 && ?lon > 8 && ?lon < 10)
}

What’s more, you can get this back as a JSON object that looks something like this (according to this standard):


{ price : "2.95", lat : "7" ... }

Then you can eval it in Javascript (in the real world you’d do some checking on it before eval’ing it):


// Wrap in parentheses so the object literal isn't parsed as a block.
var gasEntry = eval("(" + jsonResult + ")");
var str = "Price: " + gasEntry.price 
  + ", latitude: " + gasEntry.lat 
  + ", longitude: " + gasEntry.lon;

Rather than constructing a string on the client via string concatenation, we can use microtemplates to display the data. An example template (note that the names of the classes match the names of the variables in the SPARQL query above):

<div id="gastemplate">
	price: <span class="price"></span>
	latitude: <span class="lat"></span>
	longitude: <span class="lon"></span>
	time found: <span class="time"></span>
</div>

In Javascript, the data can be bound to the template with the following line:


new IBM.ET4A.Template("gastemplate").bind(entry);

And now the div will show up with data values filled in.
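The IBM.ET4A.Template API above is Queso’s own; as a rough sketch of what “binding” does, here is a toy version that fills empty `<span class="x">` placeholders in a template string with the matching fields of an object:

```javascript
// Toy microtemplate binder: for each <span class="name"></span> in the
// template, insert data[name] between the tags. The real library works
// on DOM nodes, not strings; this only illustrates the idea.
function bindTemplate(templateHtml, data) {
  return templateHtml.replace(
    /(<span class="(\w+)">)(<\/span>)/g,
    function (whole, open, name, close) {
      return open + (data[name] !== undefined ? data[name] : '') + close;
    });
}

var template = 'price: <span class="price"></span>, latitude: <span class="lat"></span>';
console.log(bindTemplate(template, { price: '2.95', lat: '7' }));
// price: <span class="price">2.95</span>, latitude: <span class="lat">7</span>
```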

So what’s so great about all of this? We didn’t have to deal with anything database related, like setting up tables and writing SQL code for inserting data into the system or querying. If we wanted to add additional data (say, the gas station name, the street address, etc.), we could encode additional triples very easily. We also didn’t have to write a web API to expose our data to the world. On the client, we didn’t have to parse any XML or RDF in Javascript. Anyone can use the data, either for its originally intended purpose (display on a Google Map) or something else (trend analysis across time and location or something more interesting).

Now that we’ve got the data set up, we move on to application development. In this (simple) case, one would develop the XHTML and Javascript files (including our libraries for Atom publishing and SPARQL querying) necessary for a web UI for this application and post them to Queso as Atom entries with the appropriate content types. And that’s it for application deployment: pointing your web browser at the appropriate URL gives you the content as XHTML. These files can also contain some RDFa markup so that additional metadata about the application can be stored. This would give us a server-side application registry for free, via a SPARQL query for everything on the system that is of type application. For more complex applications we can set up traditional web frameworks and have them interact with Queso on behalf of a client.

Of course, there are a lot of problems as well. We haven’t looked much at security, though our store has built-in access control. We’re looking into how to make this as scalable as possible, and the thought of having an open SPARQL endpoint is frightening. But for now we hope that this can serve as a sandbox for people to experiment with RDF, Atom, RDFa, and SPARQL.

07.18.06

Atom/XHTML/RDFa in Queso

Posted in development, semantic web, web at 12:03 am by wingerz

Elias posted about Queso, our summer project that mixes the Web2.0 (REST-ful URLs, JSON, some nice Ajax-y UI libraries) and Semantic Web (data stored as RDF, queryable via SPARQL) worlds.

Content is inserted into the system by posting Atom entries. The entries are stored as RDF (converted via Atom OWL by Henry Story), so their content and metadata are accessible via SPARQL queries. Because Atom relies on both an ID and a timestamp to establish the uniqueness of an entry, the ID cannot be used as the resource that serves as the subject of statements describing the particular Atom entry. Instead we use a blank node that has <http://www.w3.org/2005/10/23/Atom#id> and <http://www.w3.org/2005/10/23/Atom#updated> properties. All of the triples for a particular Atom id are stored in a named graph that has this id as its name.

Of course, we might want to store some additional RDF triples in addition to the ones that represent the Atom entry. For this we use RDFa which gives us a means of embedding RDF triples in XHTML. Using RDFa is one way to transition from the current web towards a more semantic web.

If the Atom entry is posted with the content type set to “xhtml”, Queso will run an RDFa parser on the content, extracting any embedded triples and storing them along with the triples for the entry (in the same named graph). For example, posting the following as content type “xhtml” (if you are posting with the ATOMbrowser, leave off the enclosing divs):


<div about="[queso:thisEntry]" class="foaf:Person">
	name: <span property="foaf:name">Wing Yung</span>
	blog: <a rel="foaf:weblog" href="...">blog</a>
	photo: <a rel="foaf:depiction" href="...">photo</a>
</div>

will result in the following triples being added to the store:


_:thisEntry a <http://xmlns.com/foaf/0.1/Person> ;
	<http://xmlns.com/foaf/0.1/weblog> <...> ;
	<http://xmlns.com/foaf/0.1/depiction> <...> ;
	<http://xmlns.com/foaf/0.1/name> "Wing Yung"^^<http://www.w3.org/2001/XMLSchema#string> .

along with other triples describing the author, title, content, summary, etc. of the Atom entry:


_:thisEntry a <http://www.w3.org/2005/10/23/Atom#Entry> ;
	<http://www.w3.org/2005/10/23/Atom#id> <urn:lsid:abdera.watson.ibm.com:entries:397859049> ;
	<http://www.w3.org/2005/10/23/Atom#updated> "2006-08-25T23:22:00.995Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
	...

Note that we use [queso:thisEntry] in the XHTML as a magic resource that gets replaced by the blank node that represents this particular Atom entry.

Once the content has been posted it can be accessed in several ways. The first one is the ATOMbrowser. Navigate to the appropriate collection (“wingerz”) and entry (“RDFa test”). The content and other attributes are viewable in the right-most column. Grabbing the id (urn:lsid:abdera.watson.ibm.com:entries:397859049), we can visit the entry’s content, which is output as XHTML.

Finally, if you go to the SPARQL browser, you can query the store for the triples. Click the “Graphs” tab and add the id (urn:lsid:abdera.watson.ibm.com:entries:397859049). Uncheck the introspection document and click back over to the “Query” tab. Run the default query (which brings back all of the triples in this named graph). For less data try the following query instead:


SELECT ?person ?blog ?pic
WHERE { graph ?g {
    ?person a <http://xmlns.com/foaf/0.1/Person> ;
        <http://xmlns.com/foaf/0.1/weblog> ?blog ;
        <http://xmlns.com/foaf/0.1/depiction> ?pic .
  }
}

And of course, if you hit the SPARQL endpoint, preferably with Lee‘s SPARQL Javascript library, you can request your output in JSON format so that your client never has to parse RDF or Atom.
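That JSON comes back in the standard SPARQL query results format (a head/results/bindings structure); Lee’s library hides this, but as a sketch, unwrapping it by hand looks something like this (the endpoint and the example URL are not real):

```javascript
// Flatten a SPARQL JSON results object into an array of plain objects,
// one per result row, mapping each variable name to its bound value.
function bindingsToObjects(sparqlJson) {
  return sparqlJson.results.bindings.map(function (b) {
    var obj = {};
    for (var name in b) { obj[name] = b[name].value; }
    return obj;
  });
}

// A hand-written response of the shape a SPARQL endpoint would return.
var response = {
  head: { vars: ['person', 'blog'] },
  results: { bindings: [
    { person: { type: 'bnode', value: '_:thisEntry' },
      blog:   { type: 'uri',   value: 'http://example.org/blog' } }
  ] }
};
console.log(bindingsToObjects(response)[0].blog); // "http://example.org/blog"
```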

So what good is this? We hope to serve both data (content as Atom and RDF) and applications (content as Javascript and XHTML) off the server. Application deployment should be trivial. Semantic Web-aware developers can easily create, remove, update, and delete data, speeding up development time. And because everything is accessible through SPARQL, others can use the data or combine data from different applications in interesting ways.

See also:
Ben‘s entry about posting Atom entries using Abdera.
