02.06.07

Text indexing and query in Boca

Posted in semantic web, technology at 10:14 am by wingerz


Just about every computer user is familiar with text search. While end users may not be writing custom search queries, they appreciate UIs that let them search more precisely. Occasionally users want to find something very specific by searching only people’s names, or book titles, or paper abstracts instead of all of the indexed text in a system. Clever keyword searching and luck can only get you part of the way there.

Sleuth, Boca’s text indexing component, addresses this problem (in the Boca world). We’ve been using it for quite a while. Similar to LARQ, Boca uses Lucene to index string literals when the feature is enabled. We’ve designated a magic predicate for querying the text index with SPARQL and hooked it into Glitter, our wonderfully-named SPARQL engine. So now we can do SPARQL queries with integrated text queries, like “find me people (not airplane components or animal appendages) where the name matches ‘Wing’”:


PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX boca: 
SELECT ?person ?name
WHERE {
	?person foaf:name ?name .
	?name boca:textmatch "Wing" .
}

This powerful feature allows SPARQL-aware developers to roll their own APIs. It’s easy to whip up a search across all literals for traditional text search behavior. With a little more work, you can craft more sophisticated searches, like one for authors of a paper that mentions a specific search term in the abstract (say, “march madness”).
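As a rough sketch of what such a home-grown API might look like, here is a small Python helper that builds the “authors of papers whose abstract mentions a term” query. It is not part of Boca. The function names are made up for illustration, dc:creator and dcterms:abstract are assumptions about how the papers are modelled, and the Boca namespace is passed in as a parameter because the value below is only a placeholder (the real one is in the Boca documentation).

```python
def sparql_literal(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def abstract_search_query(term: str, boca_ns: str) -> str:
    """Build a query for authors of papers whose abstract matches `term`.

    boca_ns is the namespace URI that holds the textmatch predicate.
    dc:creator and dcterms:abstract are assumed vocabulary, not something
    Boca requires; swap in whatever predicates your data uses.
    """
    return f"""PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX boca: <{boca_ns}>
SELECT DISTINCT ?author
WHERE {{
	?paper dc:creator ?author .
	?paper dcterms:abstract ?abstract .
	?abstract boca:textmatch {sparql_literal(term)} .
}}"""


# Placeholder namespace for illustration only, not the real Boca URI.
print(abstract_search_query("march madness", boca_ns="http://example.org/boca#"))
```

The shape is the same as the name query above: bind the literal you care about to a variable, then apply the magic predicate to that variable. How a multi-word term like “march madness” is interpreted (as a phrase or as separate words) depends on how Sleuth hands the string to Lucene, so check the documentation before relying on either behavior.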

For more details on how to set this up, please see our documentation on Boca text indexing.

1 Comment »

  1. carmen said,

    February 6, 2007 at 9:58 pm

    cool. i blogged about this a few weeks ago – except ive taken out all the stops and put the text index right into the triple-store, in line with TimBL’s “Dictionaries in the Library” note – retaining platform agnosticity at the SPARQL-support level and not requiring an additional text lib.. its plenty fast for me. but then i probably dont work with the amount of data that you do.. just 50 MB or so of stuff in MySQL..
