Elias was telling me about his final project for a distributed systems class that he’s currently taking – he set up a political blog crawler. Check it out. And digg it!
The crawler is based on Nutch and Hadoop. It finds entries from thousands of blogs about candidates in the upcoming elections. He suckered me into writing some Python to transform the output into nicely-formatted HTML (with some help from Alister). Contrary to Elias’s blog post, I’m not wildly into politics, but was more interested in playing with the data and learning Python. Feeds for states and individual races should hopefully be up by tomorrow morning.
At CompUSA in Woburn, I found a few copies of FFXII on the shelves. Underneath was a yellow sign saying “$49.99, price valid from 10/29-11/4, release 10/31.” Knowing that Lee was planning on picking up a copy on Tuesday, I tried to purchase it. Unfortunately, it rang up as over $1000. The cashier went to check with a manager and told me that she couldn’t sell it to me. Oh well.
Jen’s in Chicago this weekend. I haven’t been taking the best care of myself – I stayed up until 3am last night watching Kill Bill Vol. 2 and I’m still up tonight. Between writing code and watching some episodes of The Office, I’ve basically been in front of my laptop for the past 15 hours (with a few breaks). My eyes are quite tired, but it’s been a productive day. The sometimes incremental nature of debugging and coding makes it so easy to envelop yourself in the task at hand and completely lose track of time – it’s a satisfying feeling.
On Saturday Jen and I went to Wilson Farms in Lexington. Jen had heard a bit about it previously, but neither of us really knew what to expect. After being greeted by the ponies, we wandered through an outdoor market where we were drawn to the delightful smell of fresh cider donuts. We sampled some delicious carrot soup. Inside was a gourmet grocery store with a lot of farm-grown produce, baked goods, and prepared foods. We tried not to go too crazy, eventually ending up with a sourdough baguette, fresh spinach pasta (which we cooked and immediately ate with olive oil, tomatoes, basil, and fresh mozzarella when we got home), apple cider, chocolate milk, eggs, a dozen cider donuts, and some potato leek soup (which we ate for dinner). It was a bit too far for me to make it a regular shopping destination, but I’ll definitely go back at some point; the food was very tasty and fresh. I picked up their cookbook too (only $5) so hopefully I’ll get around to making a few recipes out of it in the next few weeks.
This past weekend was Head of the Charles. I’ve never been a huge crew fan, but I was looking for an excuse to get outside to enjoy a crisp autumn Sunday afternoon. Jeff and I went down to the river in the late afternoon to catch the last round of races. Of course, we couldn’t really tell the teams apart and we didn’t know who won, but we did get to see a few boats bump each other as they passed under the Weeks’ Footbridge. I snapped a few photos as well.
With the Wii & PS3 launches less than a month away, it’s easy to get caught up in the hype surrounding the new hardware and games. There have been quite a few blog entries about saving yourself some money by waiting for the first round of price drops on both the consoles and the games. Of course, that isn’t going to be a lot of fun. At the same time, a Wii, extra controller, and two launch games will set you back about $430. Fortunately, nearly all of the great GameCube games can be purchased for $15-20, so you could put your money towards Zelda (it’s the only game guaranteed to be awesome at launch, though Rayman, Monkey Ball, and Red Steel could potentially be good) and 3-4 spectacular GameCube games. So, here are some suggestions.
Mario Kart: My favorite GC multiplayer game, which basically makes it my favorite GC game overall. Grant and I played through all of the courses together and have enjoyed many, many hours of racing against Jess, Cy, and Nick.
Donkey Kong Jungle Beat: One of the most creative, colorful, fun games I’ve played, with its own innovative controller. Use the bongos to navigate DK through beautiful 2D environments.
Legend of Zelda: Four Swords Adventures: A 2D Zelda in the style of A Link to the Past, but with four players. Solving puzzles together is great, but making platforms disappear while your teammates are standing on them is much, much better. It isn’t easy to track down four GBAs, but Jen, Grant, Joe and I found it to be well worth the trouble.
Metroid Prime: Captures the wonder of exploring the Metroid world in 3D. You’ve seen it all before: tight controls, huge bosses, collecting upgrades, but now from a first person perspective.
Luigi’s Mansion: Critics slammed it because it wasn’t a Mario platformer at launch. It’s a bit on the easy and short side, but it’s incredibly charming. Watch out for fake doors!
Resident Evil 4: Creepy, beautiful, amazing. I hate survival horror games in general, but loved every second of this game. It helped that Jen would sleep on the couch while I played until 5am. One of the best games I’ve ever played.
Viewtiful Joe: A 2D beat-em-up that looks like a comic book and oozes energy and style. VJ2 is more of the same, so you really only need to pick up one.
Soul Calibur 2: My favorite fighting game, by far. Great controls and a ridiculous number of moves. The GC version features Link, who can annoy people with his bombs, boomerangs, and arrows.
THPS4: Grant and I were pretty much addicted to THPS3 and THPS4, which we liked a lot better than THUG. A lot of fun goals and a rewarding combo/trick system.
Eternal Darkness: Play as characters scattered across the world and through several hundred years of history united by an intriguing story.
Super Smash Bros. Melee: This would have been higher up if we had adopted this as our multiplayer game of choice over MK.
Star Wars Rogue Leader: Challenging, compelling missions featuring rebel spacecraft.
Pikmin 2: I never really got into Pikmin but Cy and Grant did. Fun RTS-style gameplay featuring cute little colored aliens.
Super Monkey Ball: Control monkeys in balls through a series of courses. This game demands ridiculous manual dexterity, which Matt certainly possesses.
Paper Mario: A fun RPG bursting with creativity and fun graphical effects based on the premise that everything in the world is constructed of paper cutouts.
Ikaruga: Shooter nirvana. The core gameplay element (aside from millions of bullets and enemy ships) is being able to switch the color of your ship (which is invulnerable to bullets of the same color).
Look out for sales and pick up a few of these games to keep you busy until the release of Super Mario Galaxy and Metroid Prime 3.
Overall, it’s been a great run for the GameCube. It may not have been the most popular or most powerful console, but it certainly had its own charm. Writing this brought back a lot of fond memories of living with the guys of 21r and gaming sessions here at home. Like the time I lightning’ed Jess off of Rainbow Road a few weeks ago – being able to see each other’s screens does have its advantages.
[Update: I need to thank John, who left me all of his GameCube games when he went to grad school, and Jen, who buys me new games and doesn't make me feel guilty for playing them. And also my friends, because playing alone just isn't as fun.]
I downloaded Firefox 2.0 earlier today but I guess I wasn’t supposed to. I’ll know better next time. The official release is tomorrow.
Recently I read a blog entry by Adrian Holovaty on how newspaper publishers should focus on providing news in a somewhat structured form instead of plain text blobs so that it can be analyzed in bulk, mashed-up, and repurposed. The entire entry screams Semantic Web, as a few of the commenters pointed out. So here’s another Semantic Web daydream, like the ones put forth by Lee (in law) and Elias (in everyday life).
Modeling data: In the article Holovaty mentions several types of articles that have a specific structure (like wedding announcements, obituaries, etc.). It’s not difficult to imagine the design for a traditional database system for storing this. For example, consider two tables: one containing data about articles (date published, title) and one containing data about people (name, contact information, etc.). To link articles to people (a many-to-many relationship), we need to create a third table with two columns: one to hold an article ID and one to hold a person ID. Every row in this table would represent a link between an article and a person. Of course, articles and people can be linked in several ways; some possibilities include the person as a major character, minor character, editor, writer, or interviewee. We could create additional tables, one per type of relationship, or we could add a third column to our join table and keep track of the relationship between article and person.
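Here is a minimal sketch of that three-table design, using Python's built-in sqlite3 module. The table names, column names, and sample rows are all made up for illustration; a real newspaper schema would certainly have more to it.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, published TEXT);
    CREATE TABLE people   (id INTEGER PRIMARY KEY, name TEXT);
    -- Join table: one row per (article, person, relationship) link.
    CREATE TABLE article_people (
        article_id   INTEGER REFERENCES articles(id),
        person_id    INTEGER REFERENCES people(id),
        relationship TEXT
    );
""")

# Invented sample data.
conn.execute("INSERT INTO articles VALUES (1, 'Smith-Jones wedding', '2006-10-21')")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(1, "Pat Smith"), (2, "Sam Jones")])
conn.executemany("INSERT INTO article_people VALUES (?, ?, ?)",
                 [(1, 1, "major character"), (1, 2, "writer")])


def people_for_article(conn, article_id):
    """Join all three tables to list who is linked to an article, and how."""
    rows = conn.execute("""
        SELECT a.title, p.name, ap.relationship
        FROM article_people AS ap
        JOIN articles AS a ON a.id = ap.article_id
        JOIN people   AS p ON p.id = ap.person_id
        WHERE a.id = ?
        ORDER BY p.id
    """, (article_id,))
    return rows.fetchall()


for row in people_for_article(conn, 1):
    print(row)
```

Note that even this simple lookup already needs a three-way join.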
Of course, it gets even more complicated when you realize that each type of article needs its own table since it has its own set of defining traits. And perhaps you want a table of places (name, street address, latitude, longitude). All of these things need to be linked together.
Our previously mentioned table with three columns (article, person, and relationship) is a shadow of the Semantic Web. This particular table is quite limited because the article and person are identified by an integer id that is only unique to those tables. In RDF, the core Semantic Web standard, data is expressed as a set of subject-predicate-object triples. In this example, (subject, predicate, object) = (article, relationship, person). Globally unique, resolvable URIs are used instead of integers to identify entities (called resources, the R in RDF), and they are also used to identify predicates.
Now, adding a new relationship between two resources is easy – just pick a predicate to link them up and add the new (s, p, o) triple. Resources can also be linked to literal data like strings and numbers so any data object can be modelled. There’s no more jumping through data modeling hoops. Because it’s so easy to model and create data, there’s going to be a lot more of it and it will be more descriptive.
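To make the triple idea concrete, here is a rough sketch that stores (s, p, o) triples as plain Python tuples. It is not a real RDF library, and the http://example.org/news/ namespace and predicate names are hypothetical, but it shows how a new kind of relationship is just one more triple rather than a schema change.

```python
# Hypothetical namespace; real data would use the publisher's own URIs.
EX = "http://example.org/news/"

# Each triple is (subject, predicate, object). Subjects and predicates are
# URIs; objects are either URIs or literals such as strings and numbers.
triples = {
    (EX + "article/1", EX + "majorCharacter", EX + "person/1"),
    (EX + "article/1", EX + "writer", EX + "person/2"),
    (EX + "person/1", EX + "name", "Pat Smith"),  # literal object
}

# A new kind of relationship needs no new table or column: pick a
# predicate and add the triple.
triples.add((EX + "article/1", EX + "interviewee", EX + "person/3"))


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects(triples, EX + "article/1", EX + "interviewee"))
```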
Analyzing the data with SPARQL: Using a traditional system, you’d have to spend some time designing a data access API which probably would not be as expressive as you would like. It would also be quite brittle; changes to your data schema would need to be bubbled up to the access API. Opening up RDF data via a SPARQL endpoint would give users a powerful tool to analyze news – instead of being limited to restrictive APIs they are allowed to freely explore the data. And because the data is encoded in RDF, following the relationships between different resources is a trivial matter (that doesn’t involve joining three database tables). Assuming the appropriate triples had been encoded, you could write the following queries: “Find all articles from 2006 mentioning Microsoft that quote Sam Palmisano” and “Find recaps of Laker games where they won by three or fewer points.” Note that neither of these queries is easy to do via a text search, but both are quite straightforward in SPARQL.
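As a rough illustration, the first query might look something like the SPARQL text below. It is kept as a string and not executed, and the ex: vocabulary is invented for this sketch. The Python underneath does the same pattern match by hand over a few made-up triples, just to show what the query is asking for; a real SPARQL engine would do this work for you.

```python
# What the query might look like in SPARQL (hypothetical vocabulary).
SPARQL_QUERY = """
PREFIX ex: <http://example.org/news/>
SELECT ?article WHERE {
  ?article ex:publishedYear 2006 ;
           ex:mentions ex:Microsoft ;
           ex:quotes ex:SamPalmisano .
}
"""

EX = "http://example.org/news/"

# Invented sample triples: only article/1 satisfies all three conditions.
triples = {
    (EX + "article/1", EX + "publishedYear", 2006),
    (EX + "article/1", EX + "mentions", EX + "Microsoft"),
    (EX + "article/1", EX + "quotes", EX + "SamPalmisano"),
    (EX + "article/2", EX + "publishedYear", 2006),
    (EX + "article/2", EX + "mentions", EX + "Microsoft"),
    (EX + "article/3", EX + "publishedYear", 2005),
    (EX + "article/3", EX + "mentions", EX + "Microsoft"),
    (EX + "article/3", EX + "quotes", EX + "SamPalmisano"),
}


def subjects_matching(triples, required):
    """Return subjects that have every (predicate, object) pair in `required`."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, set()).add((p, o))
    return sorted(s for s, pairs in by_subject.items() if required <= pairs)


matches = subjects_matching(triples, {
    (EX + "publishedYear", 2006),
    (EX + "mentions", EX + "Microsoft"),
    (EX + "quotes", EX + "SamPalmisano"),
})
print(matches)
```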
Getting data into the Semantic Web: One of the problems Holovaty cites is that journalists are resistant to change. Fortunately, research on semantic wikis (like Semantic Mediawiki) should lead to some interesting and intuitive text-based systems for writing prose and entering the relevant RDF triples in a simple manner.
It’s fun to do these thought experiments, and they go a long way towards convincing us that we’re onto something here. A system like this would be relatively easy to maintain and provide a great service for analyzing current events and mashing them up.
On Thursday I brought one of these into work. It was finished off pretty quickly, and some people asked for the recipe. I originally got it from Lee several years ago, who got it from his mom.
Preheat the oven to 350 degrees F.
12 oz. (or more) semisweet chocolate chips
1/4 cp. sugar
1 tsp. cinnamon (liberal)
Combine. Set Aside.
1 cp. sour cream
1 tsp. baking soda
Mix. Set Aside.
2 cps. sifted flour
1/2 tsp. baking powder
Mix. Set Aside.
1/2 cp. margarine, softened (or butter)
1 cp. sugar
1/2 tsp. vanilla
In a mixing bowl, cream the margarine. Gradually beat in the cup of sugar. Add eggs. Add vanilla.
Add flour mixture and sour cream mixture alternately, beginning and ending with the flour mixture. Mix thoroughly to get batter.
Spray a 9 inch square pan.
Spoon half the batter into pan evenly.
Sprinkle half the chocolate chips over the batter, and half the cinnamon/sugar mixture.
Carefully cover with the rest of the batter, and then the rest of the chips and cinnamon/sugar mixture.
Bake for 45 minutes, or until a toothpick inserted into the middle comes out clean (or almost clean – some people prefer a slightly undercooked middle).
Normally I don’t measure out the chocolate chips, sugar, and cinnamon that end up in the middle and on top of the cake. I tend to go easy on sprinkling the sugar because it’s already a very sweet cake. I’ve made this many times over the years, and every time it has resulted in a delicious, moist cake.
Over the past few weeks several people have asked me about the significance of RSS. In short, it’s one of several standards that will change the way that you get content from the Web. Instead of you having to periodically visit various sites to look for the latest content, sites publish feeds that are aggregated by feed readers, and you simply go to the feed reader to view new content from all of the sites. A feed is structured data that describes (and usually contains) recently posted content – it is a list of items (called entries), each of which usually has a title, author, publishing time, content, and some other data. RSS is one format for this structured content; Atom is another. Feeds and feed readers save you the trouble of having to poll sites for new content, and they also ensure that you won’t miss anything on a high-volume site or lose interest in a low-volume site. More and more sites are syndicating their content, but not enough people are taking advantage of this.
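If you're curious what "structured data" means here, the sketch below parses a tiny made-up RSS 2.0 feed with Python's standard library and pulls out the fields a reader would show. The feed content and URLs are invented, and real feeds carry more fields (and Atom uses different element names), so treat this as an illustration only.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 feed with two items.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/</link>
    <description>A made-up feed</description>
    <item>
      <title>Second post</title>
      <link>http://example.org/2</link>
      <pubDate>Tue, 24 Oct 2006 09:00:00 GMT</pubDate>
      <description>Newer content.</description>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.org/1</link>
      <pubDate>Mon, 23 Oct 2006 09:00:00 GMT</pubDate>
      <description>Older content.</description>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return one dict per <item>, in the order the feed lists them."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
            "content": item.findtext("description"),
        }
        for item in root.iter("item")
    ]


for entry in parse_items(SAMPLE_RSS):
    print(entry["published"], "-", entry["title"])
```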
Feed readers frequently request feeds from the sites that you have told them about, keeping track of new content since the last time you read. You can tell your reader to start tracking a feed by finding a feed’s URL (usually something that has some combination of xml) and giving it to the reader directly. This is annoying because you have to deal with finding, cutting, and pasting. Some feeds have reader-specific buttons next to them to allow you to subscribe using your particular reader, but this isn’t always something you can count on.
I’ve found that the best way to subscribe to new feeds is with a bookmarklet in my bookmarks toolbar. When I’m on a site that I’d like to subscribe to, I click the bookmarklet. This sends a request to my feed reader’s site that includes the URL of the current page. The reader can then analyze the page to find links to available feeds associated with the page (which were put there by a developer). If you look around on your feed reader’s site you should be able to find something that says “drag this to your bookmark toolbar to create a bookmarklet.” (I’ve also included the Bloglines and Google Reader ones at the bottom of this post.) Here’s my toolbar, which includes bookmarklets for Google Reader and Bloglines (and two other bookmarklets for posting links to del.icio.us and dogear).
Note that if the site’s developer was lazy and didn’t include links to the feed in the page, you’ll have to find it yourself. Look for “feed”, “rss”, “syndicate”, “atom”, or the pretty orange icon. Once you find it you will have to copy the URL of the feed and paste it into your reader. Of course, it’s also possible that the page doesn’t have any feeds associated with it at all, in which case you’re stuck checking back every so often the old-fashioned way.
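For the curious, here is a sketch of how a reader might find a page's feeds once the bookmarklet hands it a URL. It assumes the page follows the common autodiscovery convention of putting link tags with rel="alternate" and an RSS or Atom type in the page head. I don't know how Bloglines or Google Reader actually implement this, and the class and function names are my own.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        type_ = (a.get("type") or "").lower()
        if "alternate" in rel and type_ in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs against the page's own URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def find_feeds(html, base_url):
    """Return the feed URLs a page advertises, in document order."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


page = (
    '<html><head>'
    '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
    '<link rel="stylesheet" href="/style.css">'
    '</head><body>Hello</body></html>'
)
print(find_feeds(page, "http://example.org/blog/"))
```

If the list comes back empty, you're in the "lazy developer" case above and have to hunt for the feed URL by hand.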
Some of the more popular web-based feed readers:
- Bloglines: What I currently use. It’s simple. The main drawback is a 200-post limit on every feed. Fortunately I don’t go on vacation very often so the feeds don’t have a chance to accumulate.
- Google Reader: What I’m trying out for a few weeks. Also web-based, doesn’t have the 200-post limit, plus I’ve seen some good reviews for it. My two nitpicky complaints so far deal with subscribing to feeds: 1) when links to multiple feeds are present in a page, the GR bookmarklet doesn’t let me choose which feed I want to subscribe to and 2) when I subscribe to a feed it doesn’t let me put it in a folder – I can only do that later when I’m organizing my feeds.
- Netvibes: I’ve never tried it before, but Grant loves it. The UI looks pretty snazzy.
There are also some other non-web-based feed readers, like RSS Bandit, which Lee uses. I’d recommend setting up an account with a few of them and seeing which one feels the best.
So, that should be enough to get you started. Subscribe to this blog’s feed. Then just surf as you normally would, but subscribe to sites using your bookmarklet. In a few days you’ll be letting the syndicated content come to you. And you’ll probably be addicted to your feed reader. The technology is definitely mature and prevalent enough to be useful to just about everyone. For example, my non-techy wife uses Bloglines to track all of those great celebrity gossip blogs. Isn’t technology great?
For you feed power-users, feel free to jump in with comments about your favorite tools and blogs.
[Bookmarklets (drag to your bookmarks toolbar):
Bloglines | Google Reader]