JavaScript SPARQL Editor Improvement

Posted in development, semantic web at 9:00 pm by wingerz


Last week Danny Ayers put up a JavaScript SPARQL editor. This afternoon I grabbed his code and pasted it into our SPARQL web UI (by Elias Torres and Lee Feigenbaum), tweaking a few things in our code along the way.

You can construct your query on the ‘Query’ tab and set the endpoint and add graphs on the ‘Graphs’ tab.

The code still needs a bit of clean-up and I’d like to tweak a few more things, but I thought I’d throw it out there. If you click on ‘Get Results’ with all of the default settings, you should get some results. Note that it currently only works in Firefox and requires the UniversalBrowserRead privilege to run.
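The UniversalBrowserRead requirement comes from Firefox's same-origin restriction on XMLHttpRequest: a page loaded locally or from another host has to ask for the privilege before querying an arbitrary SPARQL endpoint. A minimal sketch of how a page of that era might request it (the wrapper function here is my own; only the `enablePrivilege` call is the actual Firefox API):

```javascript
// Firefox-only (circa 2006): ask the user to grant the
// UniversalBrowserRead privilege before making cross-site requests.
// Returns true if the grant succeeded, false anywhere else.
function requestUniversalBrowserRead() {
  if (typeof netscape === 'undefined') return false; // not Firefox
  try {
    netscape.security.PrivilegeManager.enablePrivilege('UniversalBrowserRead');
    return true;
  } catch (e) {
    return false; // the user declined the prompt
  }
}
```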

Give it a try: JavaScript SPARQL Editor/Submitter.


Mash-ups and Mash-ins

Posted in development, semantic web, technology at 1:04 am by wingerz


With more and more websites providing access to data via public APIs, mash-ups have become quite popular. The canonical mash-up takes data from one or more sources and plots it on a Google Map; three of our four summer project demos included a map component. MIT Simile’s Timeline provides a similarly draggable interface for time-based data. In most cases, a mash-up involves creating a new page or set of pages for displaying data.

Lately at work we’ve been tossing around the idea of a mash-in. Rather than creating a new page for displaying data, a mash-in supplements an existing page with additional data. For example, our summer demo involved supplementing corporate directory profiles with bookmarks from an intranet social bookmarking system and a map showing a business address. Profiles normally display several tabs containing various categories of information, and without access to the corporate directory webserver we added additional tabs containing our information. This is one way to add new functionality without forcing the user to adapt to a new interface.

We did this by setting up a proxy server to add a few JavaScript includes to the page. The included code effectively adds a tab to the page, grabs an email address from the contents of the page, and populates the tab with the results of an asynchronous data fetch based on the email address. We also could have written a Firefox extension or Greasemonkey script to do the includes instead.
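A minimal sketch of what such an include might do (all names here are hypothetical, not our actual code): pull an email address out of the page text, then use it as the key for an asynchronous fetch that fills in the new tab.

```javascript
// Primitive pattern match over the page text -- good enough for a
// corporate directory profile page.
function extractEmail(pageText) {
  var match = pageText.match(/[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}/);
  return match ? match[0] : null;
}

// fetchBookmarks stands in for an asynchronous XMLHttpRequest wrapper;
// 'bookmarks-tab' is the tab element the include added earlier.
function populateBookmarksTab(pageText, fetchBookmarks) {
  var email = extractEmail(pageText);
  if (!email) return;
  fetchBookmarks(email, function (bookmarks) {
    var tab = document.getElementById('bookmarks-tab');
    tab.innerHTML = bookmarks.join('<br/>');
  });
}
```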

Scraping the page for an email address is quite primitive; we envision that one day pages will be written in RDFa, which allows the embedding of RDF triples in XHTML. Rather than matching something that looks like an email address, we can run an RDFa parser to find RDF triples, then get the object of a triple with predicate “hasEmailAddress” or something to that effect. Instead of just getting a single value, it would also be easy to check to see if the page contains RDF describing an event, an address, a book, or something else. We could choose widgets to display based on the content of the page.
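To illustrate the difference, here is a toy lookup over already-parsed triples (the record shape and the bare "hasEmailAddress" predicate name are illustrative; a real page would use a full predicate URI): once an RDFa parser has done its work, finding the email address is a lookup, not a pattern match.

```javascript
// Triples represented as plain {subject, predicate, object} records.
function objectOf(triples, predicate) {
  for (var i = 0; i < triples.length; i++) {
    if (triples[i].predicate === predicate) return triples[i].object;
  }
  return null;
}

var triples = [
  { subject: '_:profile', predicate: 'hasEmailAddress', object: 'jdoe@example.com' },
  { subject: '_:profile', predicate: 'hasTitle', object: 'Engineer' }
];

objectOf(triples, 'hasEmailAddress'); // 'jdoe@example.com'
```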

It gets even more interesting when you throw Queso into the mix as a backend storage system. Queso can store any sort of structure you’d like, without you having to define that structure beforehand. This makes it easy to store user-specific information. In our demo we used Queso to store employee home addresses (for calculating map directions) and del.icio.us usernames (for fetching non-intranet bookmarks). These were two simple, but illustrative, examples.

It’s difficult to go beyond simple mash-ups and mash-ins without a flexible storage system because either you have to set up your own database to store a particular data structure (like storing gas prices and locations in gasbuddy) or work with siloed data with no simple means of bringing it together (for example, if you wanted to compare a particular user’s usage of tags in both flickr and del.icio.us — never mind that they’re both owned by Yahoo). In both cases you’d be able to get something working, but you’d probably end up doing it with your own database with custom database tables and code. Using Queso, you could store the data that you wanted without having to do this (of course, you would have to become familiar with RDF, RDFa, and SPARQL).

We are continuing work on mash-ins. Although they require more trust from the user (proxy / Greasemonkey / FF extension), we believe that this is offset by adding value to existing applications already familiar to users. And as RDFa becomes more pervasive, we’ll be able to add even more interesting functionality after analyzing the contents of a page.


Virtualization and VMWare Disk Management

Posted in development, virtualization at 12:39 am by wingerz


Virtualization offers a number of nice benefits at the cost of some performance. Lee and I enjoy joking about how running a computer inside a computer is mind-blowing. It’s pretty cool that products like Xen and VMWare work so well.

Our group has been pretty interested in virtualization of late. One of our summer projects involved virtualization. Additionally, we’ve been running our RCP-based research platform demo (which runs on top of our RDF store which runs on top of DB2) off a VMWare image. We configured and started up the components necessary for the demo, then took a snapshot of the system so that whoever is running the demo (usually not one of us) can start things off from the snapshot, do whatever they want, and revert to the beginning state at any point. It’s also nice because we don’t have to worry as much about keeping whatever we’re actively working on demo-able at all times. No matter what’s happened to the code base or the underlying database schema, we can always fire up the demo virtual machine.

One particularly attractive scenario involves running customer applications in virtual machines. If the system goes down, the entire image can be shipped out for problem diagnosis. Various solutions can be attempted, and if one happens to fail miserably, the changes can be discarded in favor of another approach.

Today I was setting up an image for some of our summer demos. The image I was working with had a paltry 4GB of pre-allocated disk space. It turns out that increasing the size of the disk is not something that can be done from the VMWare GUI. Fortunately, I found a helpful blog post. vmware-vdiskmanager is a command-line utility used to resize disks. Not sure why this stuff hasn’t been incorporated into the GUI, but it’s reasonably easy to work with.

To expand the size of the disk:

vmware-vdiskmanager -x 10gb disk.vmdk

Then I installed Partition Magic on the virtual machine and expanded the partition so that it took up the entire disk.

Then I noticed that my virtual disk files were huge – they took up 10GB even though only about 5GB of the disk space was actually being used. I discovered that this was because the existing image that I started with was pre-allocated, which meant that it was not growable, which meant that the disk files could not be compressed.

To change the disk type (to type 1, growable):

vmware-vdiskmanager -r disk.vmdk -t 1 new.vmdk

Then I edited the vmx file to point to the newly created new.vmdk.

Then I defragged the disk to maximize compressibility (this can be done either from the VMWare Tools or from inside the virtual machine) and shrank the disk so that it took up less room on the host machine.

I started out with an image with Windows XP and a few utilities installed. I added DB2 and Eclipse and set up the RDF store. Then I saved the image so that other people in the group can use it as a base to install their demos.


A Queso Example

Posted in development, semantic web, web at 11:17 pm by wingerz

I received an email from a friend who wanted to learn more about my work. He is not that familiar with the Semantic Web, and I tweaked my reply to him because it does a decent job of explaining what Queso does. Pre-reading: Queso introduction, RDFa in Queso.

Core technologies, in order of appearance:

  • RDF – Resource Description Framework – representing data in subject-predicate-object triples, heart and soul of the Semantic Web
  • Atom – most widely known as a feed format, also for publishing content
  • JSON – Javascript Object Notation – representing objects as strings that can be eval’ed in Javascript (there are also libraries for parsing JSON in other languages) – takes the pain out of parsing responses on the client
  • XHTML – HTML as XML, which means that you can’t have unclosed tags (p, I’m looking at you), attribute values must have quotation marks around them, etc.
  • RDFa – a standard for embedding RDF in XHTML
  • SPARQL – SPARQL Protocol and RDF Query Language – I like it even more because 1) it’s a recursive acronym and 2) Elias and Lee are on the working group.
  • Microtemplates – templates for HTML – bind Javascript objects to them to display your data

Our high-level goal this summer is to put a web front-end onto Boca, our RDF store. Most of our group’s work over the past few years has focused on building infrastructure to support the Semantic Web. Our UI work has primarily involved creating libraries for RCP-based applications, but we’ve always known that making things web-accessible is important.

In Queso, we want to store both data and XHTML/Javascript application code. We want web application designers to be able to dump their (potentially structured) data into the system, which stores everything as RDF triples. SPARQL provides an easy, standardized way to query for data in a flexible, precise manner. The data in the system can be easily extracted and repurposed for use in other applications, such as mash-ups. The following is a simple example where I’ve tried to improve readability by leaving out namespaces and base URIs. Don’t follow this as a copy-and-paste tutorial.

There’s a site called mywikimap that shows local gas prices on a Google Map. If you wanted to store data for an application like this using Queso, you could post Atom entries of the following format as XHTML:

<div about="[queso:thisEntry]">
	price: 		<span property="price">2.95</span>
	latitude: 	<span property="latitude">7</span>
	longitude: 	<span property="longitude">9</span>
	time found: 	<span property="time">2006-07-19T10:27:00</span>
</div>

While this will display in a (somewhat ugly) human readable form, it is also valid RDFa, so Queso extracts RDF triples from it (semicolon at the end of the line just means that the following triple has the same subject):

_:gasEntry <price> "2.95" ;
   <latitude> "7" ;
   <longitude> "9" ;
   <time> "2006-07-19T10:27:00" .

These triples will be stored along with the triples that represent the Atom entry. Now you can query them using SPARQL, with the following query (which restricts the results to entries from July 17 or later located in a certain area):

select ?price ?lat ?lon ?time
where {
  ?entry <price> ?price ;
     <latitude> ?lat ;
     <longitude> ?lon ;
     <time> ?time .
  FILTER ( ?time >= "2006-07-17T00:00:00" )
  # plus bounds on ?lat and ?lon for the area of interest
}

What’s more, you can get this back as a JSON object that looks something like this (according to this standard):

{ price : "2.95", lat : "7" ... }

Then you can eval it in Javascript (in the real world you’d do some checking on it before eval’ing it):

// wrap in parentheses so the object literal isn't parsed as a block
var gasEntry = eval('(' + jsonResult + ')');
var str = "Price: " + gasEntry.price 
  + ", latitude: " + gasEntry.lat 
  + ", longitude: " + gasEntry.lon;
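As a sketch of that checking step, here is the kind of sanity check popularized by the early JSON libraries, simplified: strip out string literals, then reject the input if anything other than literal-looking tokens remains before eval’ing. (This is an illustration of the idea, not our actual code.)

```javascript
// Reject anything that doesn't look like pure JSON data before eval'ing.
function parseJson(text) {
  // Remove string literals, then look for suspicious leftovers such as
  // parentheses or assignment operators that would indicate live code.
  var cleaned = text.replace(/"(\\.|[^"\\])*"/g, '');
  if (/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(cleaned)) {
    throw new Error('Suspicious characters in JSON string');
  }
  return eval('(' + text + ')');
}

var gasEntry = parseJson('{ "price": "2.95", "lat": "7" }');
// gasEntry.price is "2.95"
```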

Rather than constructing a string on the client via string concatenation, we can use microtemplates to display the data. An example template (note that the names of the classes match the names of the variables in the SPARQL query above):

<div id="gastemplate">
	price: <span class="price"></span>
	latitude: <span class="lat"></span>
	longitude: <span class="lon"></span>
	time found: <span class="time"></span>
</div>

In Javascript, the data can be bound to the template with the following line:

new IBM.ET4A.Template("gastemplate").bind(entry);

And now the div will show up with data values filled in.
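To make the binding step concrete, here is a toy, string-based approximation of what a bind does (the real IBM.ET4A.Template library operates on DOM nodes; this sketch only shows the idea): each element whose class matches a property name gets that property’s value.

```javascript
// Fill every empty <span class="NAME"></span> in the template with
// the matching property from the data object.
function bindTemplate(templateHtml, data) {
  return templateHtml.replace(
    /(<span class="(\w+)">)(<\/span>)/g,
    function (whole, open, name, close) {
      return open + (data[name] || '') + close;
    }
  );
}

bindTemplate('price: <span class="price"></span>', { price: '2.95' });
// 'price: <span class="price">2.95</span>'
```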

So what’s so great about all of this? We didn’t have to deal with anything database related, like setting up tables and writing SQL code for inserting data into the system or querying. If we wanted to add additional data (say, the gas station name, the street address, etc.), we could encode additional triples very easily. We also didn’t have to write a web API to expose our data to the world. On the client, we didn’t have to parse any XML or RDF in Javascript. Anyone can use the data, either for its originally intended purpose (display on a Google Map) or something else (trend analysis across time and location or something more interesting).

Now that we’ve got the data set up, we move on to application development. In this (simple) case, one would develop the XHTML and Javascript files (including our libraries for Atom publishing and SPARQL querying) necessary for a web UI for this application and upload them to Queso. They would be posted to Queso as Atom entries containing the appropriate content types. And that’s it for application deployment – pointing your web browser to the appropriate URL would give you the content as XHTML. These files can also contain some RDFa markup so that additional metadata about the application can be stored. This would give us a server application registry for free via a SPARQL query for everything on the system that is of type application. For more complex applications we can set up traditional web frameworks and have them interact with Queso on behalf of a client.

Of course, there are a lot of problems as well. We haven’t looked much at security, though our store has built-in access control. We’re looking into how to make this as scalable as possible, and the thought of having an open SPARQL endpoint is frightening. But for now we hope that this can serve as a sandbox for people to experiment with RDF, Atom, RDFa, and SPARQL.


Atom/XHTML/RDFa in Queso

Posted in development, semantic web, web at 12:03 am by wingerz

Elias posted about Queso, our summer project that mixes the Web2.0 (REST-ful URLs, JSON, some nice Ajax-y UI libraries) and Semantic Web (data stored as RDF, queryable via SPARQL) worlds.

Content is inserted into the system by posting Atom entries. The entries are stored as RDF (converted via Atom OWL by Henry Story), so their content and metadata are accessible via SPARQL queries. Because Atom relies on both an ID and a timestamp to establish the uniqueness of an entry, the ID cannot be used as the resource that serves as the subject of statements describing the particular Atom entry. Instead we use a blank node that has <http://www.w3.org/2005/10/23/Atom#id> and <http://www.w3.org/2005/10/23/Atom#updated> properties. All of the triples for a particular Atom id are stored in a named graph that has this id as its name.

Of course, we might want to store some additional RDF triples in addition to the ones that represent the Atom entry. For this we use RDFa which gives us a means of embedding RDF triples in XHTML. Using RDFa is one way to transition from the current web towards a more semantic web.

If the Atom entry is posted with the content type set to “xhtml”, Queso will run an RDFa parser on the content, extracting any embedded triples and storing them along with the triples for the entry (in the same named graph). For example, posting the following as content type “xhtml” (if you are posting with the ATOMbrowser, leave off the enclosing divs):

<div about="[queso:thisEntry]" class="Person">
	<span property="name">Wing Yung</span>
</div>

will result in the following triples being added to the store:

_:thisEntry a <Person> ;
	<name> "Wing Yung" .

along with other triples describing the author, title, content, summary, etc. of the Atom entry:

_:thisEntry a <Entry> ;
	<updated> "2006-08-25T23:22:00.995Z"^^<dateTime> .

Note that we use [queso:thisEntry] in the XHTML as a magic resource that gets replaced by the blank node that represents this particular Atom entry.

Once the content has been posted it can be accessed in several ways. The first is the ATOMbrowser. Navigate to the appropriate collection (“wingerz”) and entry (“RDFa test”). The content and other attributes are viewable in the right-most column. Grabbing the id (urn:lsid:abdera.watson.ibm.com:entries:397859049), we can visit the entry’s content, which is output as XHTML.

Finally, if you go to the SPARQL browser, you can query the store for the triples. Click the “Graphs” tab and add the id (urn:lsid:abdera.watson.ibm.com:entries:397859049). Uncheck the introspection document and click back over to the “Query” tab. Run the default query (which brings back all of the triples in this named graph). For less data try the following query instead:

SELECT ?person ?blog ?pic
WHERE { GRAPH ?g {
    ?person a <Person> ;
       <blog> ?blog ;
       <pic> ?pic .
} }

And of course, if you hit the SPARQL endpoint, preferably with Lee’s SPARQL Javascript library, you can request your output in JSON format so that your client never has to parse RDF or Atom.

So what good is this? We hope to serve both data (content as Atom and RDF) and applications (content as Javascript and XHTML) off the server. Application deployment should be trivial. Semantic Web-aware developers can easily create, remove, update, and delete data, speeding up development time. And because everything is accessible through SPARQL, others can use the data or combine data from different applications in interesting ways.

See also:
Ben’s entry about posting Atom entries using Abdera.


304 or 200?

Posted in development, firefox, web at 10:20 am by wingerz

When you make a request over HTTP, you can get a number of status codes returned (like 200 (OK) and 404 (not found)). One such code is 304, which tells the client that the page requested was not modified, meaning that the client can use the local cached version.

Unfortunately, when you make an XMLHttpRequest in Firefox (allowing you to make an HTTP request without refreshing the page) and the return status set by the web server is 304, the XMLHttpRequest.status property is set to 200. In most cases this is fine because it doesn’t matter whether the page came from a webserver or from the local cache – in either case, the content is available for use.

In our case we want to be able to query our server about a particular entity and only take additional action (running some SPARQL queries) if the status code is 200. When our server returns a 304, we won’t do anything. But since Firefox only gives a 200 in this case, we’ve had to add a custom HTTP header.
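A sketch of the client side of such a workaround (the header name X-Actual-Status is hypothetical, not necessarily the one we used): the server sets a custom header on 304 responses, and the client checks it because Firefox reports xhr.status as 200 even for cached responses.

```javascript
// Decide whether the entity was actually modified, given that Firefox
// rewrites a server 304 into a client-visible 200.
function wasModified(xhr) {
  if (xhr.status !== 200) return false;
  // The custom header survives even when the status code doesn't.
  return xhr.getResponseHeader('X-Actual-Status') !== '304';
}

// Usage with a stand-in for a completed XMLHttpRequest:
var cachedResponse = { status: 200, getResponseHeader: function () { return '304'; } };
wasModified(cachedResponse); // false -> skip the follow-up SPARQL queries
```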

I found some mention of this problem elsewhere. In my opinion, if a webserver returns a 304, that should get bubbled up to the client somehow since it has meaning beyond 200, so I filed a bug report. Sounds like not everyone agrees with me.


Long Eclipse 3.2 classpath in Windows

Posted in development at 12:12 pm by wingerz

A few weeks ago I installed the latest Eclipse 3.2 milestone. I had trouble running some of my programs (“Exception executing command line”) and tracked it down to a very long classpath in a command being passed to Windows to execute (it got cut off a bit past the 1000th char). To fix it:

  • Navigate to the Run configuration of your program.
  • Open the Classpath tab.
  • Navigate to your project under “User Entries”.
  • Click “Edit…”
  • Check “Only include exported entries”



Posted in development, dogooder, web at 8:32 pm by wingerz

I was adding links to feeds on doGooder a few nights ago. I wanted to create some ‘subscribe to this feed’ links for various RSS readers, so I started hunting around for the appropriate icons. I stumbled upon FeedBurner (via several sources, including vitamin).

Point FeedBurner at your feed and point your users to FeedBurner – they’ll see a page like this (all good deeds) or this (approved good deeds). The original feed is in Atom, but FeedBurner converts it to whatever format is required. And it provides all of the nice little icons.

FeedBurner does other stuff as well – the subscription tracking looks quite nice. I’ll be playing around with it more.


Firefox Extension Keysets

Posted in development, firefox at 8:43 pm by wingerz

I helped my friend and colleague, Elias, debug his Google Calendar Quick-Add Firefox extension a few weeks ago. The problem was that on some machines (including mine) the keyboard shortcut was not registered correctly (even though it showed up when we ran KeyConfig). Since KeyConfig itself successfully added a keyboard shortcut, I extracted the code to take a look.

Apparently in some configurations, newly created keysets in the extension’s overlay.xul file are not added to existing global keysets. I changed the code so that the key is attached to an existing keyset rather than to a brand-new one.
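In overlay.xul terms, the change amounts to attaching the `<key>` to the browser’s existing global keyset instead of declaring a new one (the ids, shortcut, and handler below are illustrative, not the extension’s actual code):

```xml
<!-- Before: a brand-new keyset, which some configurations ignore -->
<keyset id="quickAddKeyset">
  <key id="quickAddKey" modifiers="accel shift" key="A" oncommand="quickAdd();"/>
</keyset>

<!-- After: overlay the key onto the browser's existing keyset -->
<keyset id="mainKeyset">
  <key id="quickAddKey" modifiers="accel shift" key="A" oncommand="quickAdd();"/>
</keyset>
```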
