Why I hate Twitter

Posted in family, personal, technology, web at 3:30 am by wingerz


I’ve never been into Twitter, though I recognize its appeal as a communication medium. It’s quick and convenient, a good place to share a transient thought or to record every freaking thing you’re up to.

My protest is against my wife’s use of Twitter as a substitute for blogging. Some people should not be reduced to expressing their thoughts in 140 characters. Her switch has deprived me of one of the great joys of my life: reading her beautiful prose. Consider the following examples of words that would have gone unwritten and unread:

And Grant’s favorite (which I just spent some time hunting down):

For those of you who don’t know, Jen has started a new blog but thinks that no one is going to read it. Please give her some encouragement.

jcliao, consider yourself called-out.


Looking at game data through Many Eyes

Posted in development, games, semantic web, technology, web at 1:53 pm by wingerz


Yesterday I blogged about creating an Exhibit for a list of the 100 best-selling games of 2006. Exhibit is great for looking at how data items fall into categories, but it’s not as good for visualizing quantities. IBM’s own Many Eyes provides several very nice visualization tools. (Swivel also lets you upload and visualize data, but I’m not that familiar with it, and it looks like someone beat me to it.)

I uploaded my text data and created a few quick visualizations:

  • Review score vs. sales. As people have already remarked, a well-reviewed game won’t necessarily sell that well. Alas.
  • Release month. This recreates one of the charts that appeared in the original Next Generation article. Summer is always kind of quiet, and things get more exciting towards the holidays.
  • Categorization treemap. This is one of my favorite data viewers. Each game is a rectangle whose area is proportional to its sales. You can drag the labels (next to “Treemap organization”) to redraw the treemap. Drag “publisher” all the way to the left to see why EA cranks out annual releases of its sports titles. Drag “genre” over to see the portion of sales that comes from sports titles or licensed games. Dragging “systems” over doesn’t give you a great view of the data, because the original data wasn’t all that clean and Many Eyes doesn’t seem to handle multi-value properties. I’m not sure why it shows a quote about the game by default instead of the title.
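For the curious, here is roughly how a treemap like that can be laid out. This is a minimal sketch using the simple slice-and-dice layout, not necessarily the algorithm Many Eyes uses, and the game records are made up. `group_by` nests the records in a chosen key order (roughly what dragging a label does), and `slice_and_dice` alternates vertical and horizontal cuts so that each game's rectangle gets an area proportional to its sales.

```python
def weight(node):
    """A leaf's weight is its value; a group's weight is the sum of its children."""
    if isinstance(node, dict):
        return sum(weight(child) for child in node.values())
    return node


def group_by(records, keys, value_key="sales"):
    """Nest flat records by `keys` in order (reordering `keys` regroups the treemap)."""
    tree = {}
    for record in records:
        node = tree
        for key in keys:
            node = node.setdefault(record[key], {})
        node[record["label"]] = record[value_key]
    return tree


def slice_and_dice(node, x, y, w, h, depth=0, path=()):
    """Yield (path, x, y, w, h) per leaf; each leaf's area is proportional to its value.

    Alternates vertical and horizontal cuts at each level (assumes positive values).
    """
    if not isinstance(node, dict):
        yield path, x, y, w, h
        return
    total = weight(node)
    offset = 0.0
    for name, child in node.items():
        frac = weight(child) / total
        if depth % 2 == 0:  # cut the width at even depths
            yield from slice_and_dice(child, x + offset * w, y, w * frac, h,
                                      depth + 1, path + (name,))
        else:  # cut the height at odd depths
            yield from slice_and_dice(child, x, y + offset * h, w, h * frac,
                                      depth + 1, path + (name,))
        offset += frac


# Hypothetical records (not the real 2006 sales figures).
games = [
    {"label": "Game 1", "publisher": "Pub A", "genre": "Sports", "sales": 3},
    {"label": "Game 2", "publisher": "Pub A", "genre": "Racing", "sales": 1},
    {"label": "Game 3", "publisher": "Pub B", "genre": "Sports", "sales": 4},
]
layout = list(slice_and_dice(group_by(games, ["publisher"]), 0, 0, 100, 80))
```

Calling `group_by(games, ["genre"])` instead regroups the same games: every rectangle keeps its area and only its position changes.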

My other favorite data viewer (which I wasn’t able to use here) is the Stacked Graph viewer, made popular by the Baby Name Voyager.

One last note: I wasn’t allowed to edit the visualizations after I created them, so keep that in mind as you think of titles and tags for them.

Popular video games of 2006 Exhibit

Posted in development, games, semantic web, technology, web at 12:29 am by wingerz


A few weeks ago I came across an article about the top-selling games of 2006. There’s some analysis, then a list of the top 100 games spread across 10 web pages (starting, of course, with games ranked 100 to 91). Unfortunately, there isn’t a great way to really take a close look at the data. For example, I really wanted to see some Nintendo-specific analysis.

The data was screaming to be let out, so I scraped it and put it into an Exhibit. It was not a quick and easy process. I am quite certain that the HTML was hand-coded: the quotes start with “, ", |, or nothing at all, and some of the other elements are mixed up. The game platforms are not very well specified, so I may need to go through and clean them up later; for this reason, the portable/home console sections are not 100% accurate.

Anyhow, I now have a Perl Data::Dumper file, a tab-delimited text file, and a JSON representation. I’ll probably upload the text file to Many Eyes for kicks.
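The cleanup and export steps look roughly like this in Python. I used Perl, so this is a re-sketch rather than the script I actually ran. The field names and sample rows are hypothetical, and the JSON wrapper assumes Exhibit's layout of an `items` array in which each item has a `label`.

```python
import csv
import io
import json

# Markers that opened (and sometimes closed) the scraped quotes:
# curly double quotes, straight double quotes, and a pipe.
QUOTE_MARKERS = "\u201c\u201d\"|"


def clean_quote(raw: str) -> str:
    """Strip surrounding whitespace and any quote markers from a scraped blurb."""
    return raw.strip().strip(QUOTE_MARKERS).strip()


def to_tsv(records, fields):
    """Serialize records as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_exhibit_json(records):
    """Wrap records in an {"items": [...]} object for Exhibit."""
    return json.dumps({"items": records}, ensure_ascii=False, indent=2)


# Hypothetical sample rows; the real scraped fields may differ.
games = [
    {"rank": 91, "label": "Game A", "publisher": "Publisher X",
     "quote": clean_quote("\u201cA solid sequel.\u201d")},
    {"rank": 92, "label": "Game B", "publisher": "Publisher Y",
     "quote": clean_quote("| Fun with friends")},
]
```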


Applications Running on a Local Webserver

Posted in technology, web at 10:04 am by wingerz


A few months ago Lee downloaded and installed Tracks, a task organizer in the Getting Things Done flavor. It runs as a Ruby on Rails application on Dreamhost. There’s no reason that it has to run on a server somewhere; I’m the only one who accesses the data, and I only access it from one machine (well, besides Lee). It got me thinking more about software that you connect to through a web browser but that can fake an offline mode by running a local server on your desktop.

Yesterday I read about Parakey, which installs a local webserver that syncs with an online server for sharing when the host machine is connected to the Internet. There’s another article on RWW about the convergence of online and desktop applications. The web approach to this has some interesting possibilities. If the local and remote servers are treated as data sources that web clients (HTML/JavaScript pages, Greasemonkey scripts, Firefox extensions) consume, rather than as HTML providers, savvy web developers can adapt, theme, and share the application’s client code. Users can choose to expose the features they use most. They’ll have more flexibility, and they’ll also see the same UI while offline, since those customized files will be hosted locally.
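To make the "data source, not HTML provider" idea concrete, here’s a minimal sketch of a local server that only hands out JSON, using Python’s standard library. The `/tasks` path and the task data are hypothetical, and syncing with a remote server (the part Parakey handles) is left out entirely.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical local data; a real app would read it from disk and sync it
# with the remote server whenever the machine is online.
TASKS = [{"id": 1, "title": "Write blog post", "done": False}]


class DataSourceHandler(BaseHTTPRequestHandler):
    """Serves JSON data only; client-side code decides how to render it."""

    def do_GET(self):
        if self.path != "/tasks":
            self.send_error(404)
            return
        body = json.dumps(TASKS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def start_local_server(port=0):
    """Bind to localhost only (port 0 picks a free port) and serve in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), DataSourceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_tasks(server):
    """What a client would do: fetch the data, not a rendered page."""
    host, port = server.server_address[:2]
    opener = build_opener(ProxyHandler({}))  # talk to localhost directly, ignoring proxies
    with opener.open(f"http://{host}:{port}/tasks") as resp:
        return json.load(resp)
```

Any client (a page script, a Greasemonkey script, an extension) can fetch `/tasks` and render it however its user likes; binding to 127.0.0.1 keeps the server off the network.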


Political blog tracker

Posted in development, technology, web at 4:26 pm by wingerz


Elias was telling me about his final project for a distributed systems class he’s currently taking: he set up a political blog crawler. Check it out. And digg it!

The crawler is based on Nutch and Hadoop. It finds entries from thousands of blogs about candidates in the upcoming elections. He suckered me into writing some Python to transform the output into nicely formatted HTML (with some help from Alister). Contrary to what Elias’s blog post says, I’m not wildly into politics; I was more interested in playing with the data and learning Python. Feeds for states and individual races should hopefully be up by tomorrow morning.
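I won’t reproduce the actual script here, but the transformation was along these lines: group the crawled entries by candidate and write out escaped HTML. This sketch assumes each entry is a dict with hypothetical `candidate`, `blog`, `title`, and `url` keys; the real Nutch/Hadoop output format is different.

```python
from collections import defaultdict
from html import escape


def render_by_candidate(entries):
    """Group crawled entries by candidate and emit a simple HTML page."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["candidate"]].append(entry)

    lines = ["<html><body>"]
    for candidate in sorted(grouped):
        lines.append(f"<h2>{escape(candidate)}</h2>")
        lines.append("<ul>")
        for entry in grouped[candidate]:
            url = escape(entry["url"], quote=True)  # safe inside an attribute
            title = escape(entry["title"])
            blog = escape(entry["blog"])
            lines.append(f'<li><a href="{url}">{title}</a> ({blog})</li>')
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)


# Hypothetical crawler output.
sample = [
    {"candidate": "Candidate B", "blog": "Blog 1",
     "title": "Debate <recap>", "url": "http://example.org/1?a=1&b=2"},
    {"candidate": "Candidate A", "blog": "Blog 2",
     "title": "Rally notes", "url": "http://example.org/2"},
]
page = render_by_candidate(sample)
```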


Firefox 2.0 Release

Posted in firefox, technology, web at 11:48 pm by wingerz


I downloaded Firefox 2.0 earlier today but I guess I wasn’t supposed to. I’ll know better next time. The official release is tomorrow.

A few nice things I’ve noticed so far: 1) spellcheck in textareas (incidentally, both “spellcheck” and “textareas” are marked as misspelled) and 2) integrated feed subscription in the address bar (the orange feed icon appears when you’re on a page with links to feeds; just click it to subscribe), which makes the bookmarklets I mentioned in a previous post unnecessary. I’m sure some of the other new features will be interesting to try out, like JavaScript 1.7 (what about other browsers?) and client-side persistent storage.

Newspapers and the Semantic Web

Posted in semantic web, technology, web at 11:30 pm by wingerz


Recently I read a blog entry by Adrian Holovaty on how newspaper publishers should focus on providing news in a somewhat structured form instead of plain text blobs so that it can be analyzed in bulk, mashed-up, and repurposed. The entire entry screams Semantic Web, as a few of the commenters pointed out. So here’s another Semantic Web daydream, like the ones put forth by Lee (in law) and Elias (in everyday life).

Modeling data: In the article, Holovaty mentions several types of articles that have a specific structure (like wedding announcements, obituaries, etc.). It’s not difficult to imagine how a traditional database would store them. For example, consider two tables: one containing data about articles (date published, title) and one containing data about people (name, contact information, etc.). To link articles to people (a many-to-many relationship), we need to create a third table with two columns: one to hold an article ID and one to hold a person ID. Every row in this table would represent a link between an article and a person. Of course, articles and people can be linked in several ways; the person might be a major character, minor character, editor, writer, or interviewee. We could create additional tables, one per type of relationship, or we could add a third column to our join table that records the type of relationship between the article and the person.

Of course, it gets even more complicated when you realize that each type of article needs its own table, since each has its own set of defining traits. And perhaps you want a table of places (name, street address, latitude, longitude). All of these things need to be linked together.
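Here’s a minimal sketch of that relational design using SQLite from Python’s standard library. The table and column names and the sample rows are my own assumptions; the point is the `article_person` join table with its third `relationship` column, and the three-table join it takes to answer even a simple question.

```python
import sqlite3

SCHEMA = """
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, published TEXT);
CREATE TABLE people   (id INTEGER PRIMARY KEY, name TEXT, contact TEXT);
-- Join table: one row per link between an article and a person,
-- with a third column recording what kind of link it is.
CREATE TABLE article_person (
    article_id   INTEGER REFERENCES articles(id),
    person_id    INTEGER REFERENCES people(id),
    relationship TEXT,
    PRIMARY KEY (article_id, person_id, relationship)
);
"""


def people_by_role(conn, role):
    """Three-table join: (article title, person name) pairs for one kind of link."""
    return conn.execute(
        """
        SELECT a.title, p.name
        FROM articles a
        JOIN article_person ap ON ap.article_id = a.id
        JOIN people p ON p.id = ap.person_id
        WHERE ap.relationship = ?
        ORDER BY a.title, p.name
        """,
        (role,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample data.
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)",
                 [(1, "Council approves budget", "2006-10-01")])
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(1, "Reporter A", None), (2, "Official B", None)])
conn.executemany("INSERT INTO article_person VALUES (?, ?, ?)",
                 [(1, 1, "writer"), (1, 2, "interviewee")])
```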

Our previously mentioned table with three columns (article, person, and relationship) is a shadow of the Semantic Web. This particular table is quite limited because articles and people are identified by integer IDs that are only unique within their own tables. In RDF, the core Semantic Web standard, data is expressed as a set of subject-predicate-object triples. In this example, (subject, predicate, object) = (article, relationship, person). Globally unique, resolvable URIs are used instead of integers to identify entities (called resources, the R in RDF), and they are also used to identify predicates.

Now, adding a new relationship between two resources is easy: just pick a predicate to link them and add the new (s, p, o) triple. Resources can also be linked to literal data like strings and numbers, so any data object can be modeled. There’s no more jumping through data modeling hoops. Because it’s so easy to model and create data, there’s going to be a lot more of it, and it will be more descriptive.
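To show how little ceremony this takes, here’s a sketch that represents triples as plain Python tuples and serializes them as N-Triples (one `<subject> <predicate> <object> .` per line). The `example.org` URIs and predicate names are hypothetical, and the literal escaping is minimal; a real project would use an RDF library.

```python
from dataclasses import dataclass

EX = "http://example.org/news/"  # hypothetical namespace for this sketch


@dataclass(frozen=True)
class URI:
    value: str


def term(t):
    """Render one term in N-Triples syntax: <uri> or a quoted string literal."""
    if isinstance(t, URI):
        return f"<{t.value}>"
    text = str(t).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def to_ntriples(triples):
    """One 'subject predicate object .' line per triple, sorted for stable output."""
    return "\n".join(sorted(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples))


article = URI(EX + "article/42")
person = URI(EX + "person/jane-doe")

triples = {
    (article, URI(EX + "interviewee"), person),
    (person, URI(EX + "name"), "Jane Doe"),  # resource linked to a literal
}
# A new kind of relationship needs no schema change: just add another triple.
triples.add((article, URI(EX + "photographer"), person))
```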

Analyzing the data with SPARQL: Using a traditional system, you’d have to spend some time designing a data access API that probably would not be as expressive as you would like. It would also be quite brittle; changes to your data schema would need to be bubbled up to the access API. Opening up RDF data via a SPARQL endpoint would give users a powerful tool for analyzing news: instead of being limited to restrictive APIs, they could freely explore the data. And because the data is encoded in RDF, following the relationships between different resources is a trivial matter (one that doesn’t involve joining three database tables). Assuming the appropriate triples had been encoded, you could write queries like “Find all articles from 2006 mentioning Microsoft that quote Sam Palmisano” and “Find recaps of Laker games where they won by three or fewer points.” Neither of these queries is easy to do via text search, but both are quite straightforward in SPARQL.
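A SPARQL engine answers the Microsoft query by matching a basic graph pattern, roughly `SELECT ?article WHERE { ?article ex:year "2006" ; ex:mentions ex:Microsoft ; ex:quotes ex:SamPalmisano . }`. The sketch below does the same kind of matching naively over an in-memory list of triples. It is not SPARQL and not how a real engine is optimized, and the `ex:` vocabulary and data are made up; the point-margin comparison in the Lakers query would additionally need something like a SPARQL `FILTER`.

```python
def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns (naive join)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for pattern_term, value in zip(first, triple):
            if is_var(pattern_term):
                # A variable either binds now or must agree with its earlier binding.
                if candidate.setdefault(pattern_term, value) != value:
                    break
            elif pattern_term != value:
                break
        else:  # every term matched, so try the remaining patterns
            yield from match(rest, triples, candidate)


# Hypothetical data.
TRIPLES = [
    ("ex:a1", "ex:year", "2006"),
    ("ex:a1", "ex:mentions", "ex:Microsoft"),
    ("ex:a1", "ex:quotes", "ex:SamPalmisano"),
    ("ex:a2", "ex:year", "2006"),
    ("ex:a2", "ex:mentions", "ex:Microsoft"),
    ("ex:a3", "ex:year", "2005"),
    ("ex:a3", "ex:mentions", "ex:Microsoft"),
    ("ex:a3", "ex:quotes", "ex:SamPalmisano"),
]
QUERY = [
    ("?article", "ex:year", "2006"),
    ("?article", "ex:mentions", "ex:Microsoft"),
    ("?article", "ex:quotes", "ex:SamPalmisano"),
]
results = sorted(b["?article"] for b in match(QUERY, TRIPLES))
```

Only `ex:a1` satisfies all three patterns: `ex:a2` quotes nobody and `ex:a3` is from 2005.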

Getting data into the Semantic Web: One of the problems Holovaty cites is that journalists are resistant to change. Fortunately, research on semantic wikis (like Semantic MediaWiki) should lead to some interesting and intuitive text-based systems for writing prose and entering the relevant RDF triples in a simple manner.

It’s fun to do these thought experiments, and they go a long way towards convincing us that we’re onto something here. A system like this would be relatively easy to maintain and provide a great service for analyzing current events and mashing them up.



Posted in technology, web at 7:03 pm by wingerz


Over the past few weeks several people have asked me about the significance of RSS. In short, it’s one of several standards that will change the way you get content from the Web. Instead of periodically visiting various sites to look for the latest content, you let sites publish feeds that are aggregated by a feed reader, and you simply go to the feed reader to view new content from all of those sites. A feed is structured data that describes (and usually contains) recently posted content: a list of items (called entries), each of which usually has a title, author, publishing time, content, and some other data. RSS is one format for this structured content; Atom is another. Feeds and feed readers save you the trouble of polling sites for new content, and they also ensure that you won’t miss anything on a high-volume site or lose interest in a low-volume one. More and more sites are syndicating their content, but not enough people are taking advantage of it.
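To make "a list of items, each with a title, author, publishing time, and content" concrete, here’s a minimal sketch that pulls entries out of an RSS 2.0 document with Python’s standard XML parser. The sample feed is made up, and Atom (which uses different, namespaced element names) would need its own handling.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed with a single item.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>http://example.org/</link>
    <description>A hypothetical feed</description>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
      <guid>http://example.org/1</guid>
      <author>editor@example.org (Example Author)</author>
      <pubDate>Mon, 06 Nov 2006 19:03:00 GMT</pubDate>
      <description>Hello, feeds.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return one dict per <item>, using RSS 2.0 element names."""
    channel = ET.fromstring(xml_text).find("channel")
    entries = []
    for item in channel.findall("item"):
        link = item.findtext("link", default="")
        entries.append({
            "id": item.findtext("guid") or link,  # fall back to the link if no guid
            "title": item.findtext("title", default=""),
            "link": link,
            "author": item.findtext("author", default=""),
            "published": item.findtext("pubDate", default=""),
            "content": item.findtext("description", default=""),
        })
    return entries


entries = parse_rss(SAMPLE_RSS)
```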

Feed readers regularly request feeds from the sites you have told them about, keeping track of what is new since you last read. You can tell your reader to start tracking a feed by finding the feed’s URL (it usually contains some combination of “feed”, “rss”, “atom”, and “xml”) and giving it to the reader directly. This is annoying because you have to deal with finding, copying, and pasting. Some feeds have reader-specific buttons next to them that let you subscribe using your particular reader, but you can’t always count on them.
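Keeping track of what’s new can be as simple as remembering the IDs of entries you’ve already seen. This sketch assumes each parsed entry has an `id` (for RSS that would typically be the item’s `<guid>`, falling back to its link). Real readers also persist this set between sessions and often use HTTP conditional requests so they don’t re-download unchanged feeds.

```python
def poll_once(entries, seen_ids):
    """Return entries not seen on earlier polls, recording their IDs.

    `entries` is the latest parsed feed (dicts with an "id" key);
    `seen_ids` is the reader's memory for this feed and is updated in place.
    """
    fresh = []
    for entry in entries:
        if entry["id"] not in seen_ids:
            seen_ids.add(entry["id"])
            fresh.append(entry)
    return fresh


seen = set()
first = poll_once([{"id": "post-1"}, {"id": "post-2"}], seen)
# On the next poll, post-2 is still in the feed but has already been shown.
second = poll_once([{"id": "post-3"}, {"id": "post-2"}], seen)
```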


I’ve found that the best way to subscribe to new feeds is with a bookmarklet in my bookmarks toolbar. When I’m on a site that I’d like to subscribe to, I click the bookmarklet. This sends a request to my feed reader’s site that includes the URL of the current page. The reader can then analyze the page to find links to the feeds associated with it (which the site’s developer put there). If you look around on your feed reader’s site you should be able to find something that says “drag this to your bookmark toolbar to create a bookmarklet.” (I’ve also included the Bloglines and Google Reader ones at the bottom of this post.) My own toolbar includes bookmarklets for Google Reader and Bloglines (and two other bookmarklets for posting links to del.icio.us and dogear).

Note that if the site’s developer was lazy and didn’t include links to the feed in the page, you’ll have to find it yourself. Look for “feed”, “rss”, “syndicate”, “atom”, or the pretty orange icon. Once you find it you will have to copy the URL of the feed and paste it into your reader. Of course, it’s also possible that the page doesn’t have any feeds associated with it at all, in which case you’re stuck checking back every so often the old-fashioned way.
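The "analyze the page" step is usually feed autodiscovery: the page’s `<head>` carries `<link rel="alternate">` elements whose `type` is `application/rss+xml` or `application/atom+xml`. Here’s a minimal sketch of that lookup with Python’s built-in HTML parser. It ignores edge cases a real reader would handle, such as feeds advertised only through ordinary `<a>` links, and the sample page is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect absolute URLs of <link rel="alternate"> feed links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def find_feeds(html_text, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    finder.close()
    return finder.feeds


SAMPLE_PAGE = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="/feed/">
<link rel="alternate" type="application/atom+xml" href="http://example.org/atom.xml" />
</head><body>Hello</body></html>"""
```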

Some of the more popular web-based feed readers:

  • Bloglines: What I currently use. It’s simple. The main drawback is a 200-post limit on every feed. Fortunately I don’t go on vacation very often so the feeds don’t have a chance to accumulate.
  • Google Reader: What I’m trying out for a few weeks. Also web-based, doesn’t have the 200-post limit, plus I’ve seen some good reviews for it. My two nitpicky complaints so far deal with subscribing to feeds: 1) when links to multiple feeds are present in a page, the GR bookmarklet doesn’t let me choose which feed I want to subscribe to, and 2) when I subscribe to a feed it doesn’t let me put it in a folder; I can only do that later when I’m organizing my feeds.
  • Netvibes: I’ve never tried it before, but Grant loves it. The UI looks pretty snazzy.

There are also desktop (non-web-based) feed readers, like RSS Bandit, which Lee uses. I’d recommend trying a few of them and seeing which one feels best.

So, that should be enough to get you started. Subscribe to this blog’s feed. Then just surf as you normally would, but subscribe to sites using your bookmarklet. In a few days you’ll be letting the syndicated content come to you. And you’ll probably be addicted to your feed reader. The technology is definitely mature and prevalent enough to be useful to just about everyone. For example, my non-techy wife uses Bloglines to track all of those great celebrity gossip blogs. Isn’t technology great?

For you feed power-users, feel free to jump in with comments about your favorite tools and blogs.

[Bookmarklets (drag to your bookmarks toolbar):
Bloglines | Google Reader]


Trivial Web2.0 Thought of the Day

Posted in technology, web at 5:33 pm by wingerz


If I see another Web2.0 site name that ends in [^aieou]r that isn’t associated with flickr, I’m going to vomit. Today, seeing weekendr pushed me over the edge, and I’m not even going to dignify it with a link.
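For the record, here is the offending pattern as a Python regular expression; the names being tested are just examples.

```python
import re

# A name ending in any character other than a, i, e, o, u, followed by "r".
DROPPED_VOWEL = re.compile(r"[^aieou]r$", re.IGNORECASE)


def is_vowel_dropped(name):
    return bool(DROPPED_VOWEL.search(name))
```

So "weekendr" and "flickr" match, while "twitter" (ends in "er") and "digg" do not.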


Sudoku Slam

Posted in games, web at 10:41 pm by wingerz


My good friend AJ and his buddy Bill have finally announced Sudoku Slam, their ridiculously full-featured Sudoku web application. It takes a lot of the tedium out of solving sudoku puzzles and features a great hint system that will help you to become a better sudoku player as it explains the reasoning behind the hints it provides.

Very slick; a ton of work went into this.
