03.24.07

Citizen Schools, Week 4

Posted in community, technology at 8:56 am by wingerz

citizenschools.jpg

At this point we look forward to the surprises that every Monday brings. This past week, we got two brand new students, and a student who had supposedly been dismissed from the program for behavior issues was back in class. We shuffled our pre-arranged seating to accommodate the changes as students were finding their seats. The new students were great – they picked up the material quickly and worked well together.

The class went reasonably well – we covered boolean functions (AND and OR). Students are still struggling with writing functions; it’s a pretty big leap in abstraction that they haven’t quite mastered yet. Fortunately, disruptions were kept to a minimum so that we could focus on teaching the content.

I think I’ve become more comfortable with the idea of not trying to befriend all of the students. I gave out my first strike of the year, and I’ve made a point of not humoring students who are obviously trying to push the behavior envelope.

03.12.07

Slowly (but surely) getting to WOW

Posted in community, technology at 11:23 pm by wingerz

monster.jpg

I’ve always had a lot of respect for teachers, but the last three weeks have boosted that respect to stratospheric heights. We’ve done the first three lessons of our Citizen Schools Scheme curriculum, covering numbers, strings, graphics, and booleans. The students on the whole are very bright and eager to learn, but we face challenges: a non-ideal classroom setup (the school’s “library”/computer lab), frequently disruptive students (hitting each other and calling each other names), and a demanding curriculum full of abstract concepts.

We try to address these challenges from week to week. Our seating strategy, for example, has changed (and hopefully improved). During week 1, we left the chairs more or less as they were in the room, so of course students came in and sat as far apart from each other as possible. For week 2, we laid out the chairs ourselves, two to a computer for partner work, but made the mistake of letting students choose their own seats and partners (leading to a lot of “Well, I don’t want to work with X!”). Today we put students’ notebooks where we wanted them to sit, but one bad pairing led to trouble all day. We’ve now worked out a few rules for arranging students to maximize cooperation and learning while minimizing disruptions, and we’re eager to try them in Week 4.

Behavior problems aside, it’s hard to keep students engaged, especially when it isn’t clear how (+ 1 2) is related to the video games they play at home. To make the link between simple code and games more tangible, today’s lesson included a screenshot from StarFox to introduce simple graphics and a monster from Doom 2 to introduce booleans. After all, who cares about booleans – unless they’re the only thing saving you from this wretched beast. The images definitely captured everyone’s attention, so we’ll be including one with every lesson. I was surprised that one of the students recognized both games, since they were way before his time.

While it has been frustrating at times, we’re always eager to get back into the classroom to try out new techniques. The Citizen Schools staff (including Emmanuel, who co-taught with me while Lee was away this week) have been amazing, spending a lot of time exchanging ideas over the phone and in email. We’ve learned to roll with the punches and to embrace both the successes and the failures (opportunities) in the classroom.

And to think, we have it so easy compared to full-time teachers: we teach only 90 minutes a week, from a great curriculum prepared by someone else, with an in-class Team Leader to help with discipline.

03.01.07

Using Solvent to extract data from structured pages

Posted in development, semantic web, technology at 10:06 am by wingerz

solvent_logo.png

I’ve put together a short tutorial on Solvent, a very nice web page parsing utility. It is still a little rough around the edges, but I wanted to throw it out there and continue working on it since there isn’t a whole lot of existing documentation.

02.19.07

Looking at game data through Many Eyes

Posted in development, games, semantic web, technology, web at 1:53 pm by wingerz

manyeyes.jpg

Yesterday I blogged about creating an Exhibit for a list of the 100 best-selling games of 2006. Exhibit is great for looking at how data items fall into categories, but it’s not as good for visualizing quantities. IBM’s own Many Eyes provides several very nice visualization tools (Swivel allows data upload and visualization as well, but I am not that familiar with it, and it looks like someone beat me to it).

I uploaded my text data and created a few quick visualizations.
Review score vs. sales. As people have already remarked, a well-reviewed game won’t necessarily sell that well. Alas.
Release month. This is a recreation of one of the charts that appeared in the original Next Generation article. Summer is always kind of quiet and things get more exciting towards the holidays.
Categorization treemap. This is one of my favorite data viewers. Each game is a rectangle whose area is proportional to its sales. You can drag the labels (next to “Treemap organization”) to redraw the treemap. Drag “publisher” all the way to the left to see why EA cranks out annual releases of its sports titles. Drag “genre” over to see the share of sales that comes from sports titles and licensed games. Dragging “systems” over doesn’t give a great view of the data because the original data wasn’t all that clean and Many Eyes doesn’t seem to handle multi-value properties. I’m not sure why it shows a quote about each game by default instead of the title.
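For reference, Many Eyes takes the data as plain tab-delimited text with a header row. My upload looked roughly like the following (the column names come from the views above; the sample row is invented purely for illustration):

Title	Publisher	Genre	Systems	Release month	Review score	Sales
Example Game	Example Publisher	Sports	PS2	November	85	500000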

My other favorite data viewer (that I was not able to use) is the Stacked Graph viewer, made popular by the Baby Name Voyager.

One last note: I wasn’t able to edit the visualizations after creating them, so choose your titles and tags carefully before you publish.

Popular video games of 2006 Exhibit

Posted in development, games, semantic web, technology, web at 12:29 am by wingerz

gameexhibit.jpg

A few weeks ago I came across an article about the top-selling games of 2006. There’s some analysis, then a list of the top 100 games spread across 10 web pages (starting, of course, with games ranked 100 to 91). Unfortunately, there isn’t a great way to take a close look at the data. For example, I really wanted to see some Nintendo-specific analysis.

The data was screaming to be let out, so I scraped it and put it into an Exhibit. It was not a quick and easy process. I am quite certain the HTML was hand-coded: the quotes open with a curly quote (“), a raw &quot; entity, a pipe (|), or nothing at all, and some of the other elements are mixed up. The game platforms are not very well specified, so I may need to go through and clean them up later; for this reason the portable/home console sections are not 100% accurate.
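Much of the cleanup amounted to normalizing those quote variants before parsing the rest of each entry. Here is a minimal sketch of the idea in Python (the pattern and the sample string are illustrative; this is not the actual script I used):

import re

# The hand-coded HTML opens quotes with a curly quote, a raw &quot; entity,
# a pipe, or nothing at all; collapse them all into a plain double quote.
QUOTE_VARIANTS = re.compile(r'“|”|&quot;|\|')

def normalize_quotes(text):
    return QUOTE_VARIANTS.sub('"', text)

print(normalize_quotes('|A surprise hit&quot; of 2006'))  # -> "A surprise hit" of 2006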

Anyhow, I now have a Perl Data::Dumper file, a tab-delimited text file, and a JSON representation. I’ll probably upload the text file to Many Eyes for kicks.
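For anyone unfamiliar with the Exhibit format, the JSON is just an items array. A single game looks something like this (the property names are my own schema, and the values are placeholders rather than a real row from the list):

{
  "items": [
    {
      "label": "Example Game",
      "publisher": "Example Publisher",
      "genre": "Sports",
      "systems": ["PS2", "Xbox"],
      "rank": 42
    }
  ]
}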

02.12.07

Becoming a Citizen Teacher

Posted in community, technology at 11:49 pm by wingerz

citizenschools.jpg

Citizen Schools is a nonprofit middle school after-school program that was started in Boston 12 years ago. Since then it has grown to 2,000 students and 24 campuses. The core of the program relies on Citizen Teachers: volunteers who come in to teach a class on just about anything; past classes have included finance, photography, oceanography, astronomy, and quilting. Each class runs for 10 once-a-week sessions that are structured to reach a “Wow!”: a tangible end goal that the students can be proud of assembling. All of this fits into a structure that teaches students important life skills and values.

This semester Lee and I are teaching a class on computer programming. It’s based on an existing curriculum, composed by Emmanuel Schanzer, that teaches Scheme programming, with a computer game as the Wow!. So far we’ve been very impressed by the support for volunteers: we’ve attended two hours of Citizen Schools training (where we learned about the philosophy behind the program), two mornings of curriculum training with Emmanuel (to familiarize ourselves with the material and to discuss teaching strategies), and a meeting with our Citizen Schools campus director. We’ve also been assigned a Team Leader, who is present in all of the classes to handle classroom discipline and help us with our weekly lesson plans.

Today we had our first interaction with students at the school. All of the Citizen Teachers got an opportunity to pitch their classes (called apprenticeships – there is a lot of CS vocabulary) to the students. We delivered our pitch to three different groups of students, getting a great response from each group. After all, who doesn’t like video games? Of course, we tried to highlight other aspects of our apprenticeship. We emphasized the development of problem solving skills by giving the students a situation puzzle. Overall it was successful except for the class where someone blurted out the answer about 30 seconds into the exercise. During the closing ceremonies, “Fred,” the character in our puzzle, was given one of two student shout-outs, much to our delight.

It was a lot of fun to be in the classroom again. The students are in for quite a surprise when they find out that most of the apprenticeship will be spent playing with parentheses, doing lots of math, and paying close attention to detail. Hopefully we can keep them entertained and teach them something while we’re at it. Class starts on February 26.

02.10.07

Preserving EXIF data with exiftool

Posted in photos, technology at 11:57 pm by wingerz

exiftool.jpg

One of my dreams is to be able to do a good job of tracking photo metadata. I’d love to be able to find pictures based on people, places, and things. I built a simple Ruby on Rails application for our wedding photos. It was a bit cumbersome to enter data, but it was oh-so-helpful when finding pictures of Mom to put together an album. If I were to do it over again I’d do it with an RDF backend (maybe using ActiveRDF). Or better yet, Flickr could go a bit further with their machine tag implementation and allow machine tag queries across all of the metadata they have. Maybe then I’d pay for their service.

Anyhow, I haven’t been great about tracking metadata yet, but it’s always on my mind. One important source of metadata is the EXIF data written into the image file by your camera every time you take a picture. It includes vital information like focal length, aperture, shutter speed, and ISO, along with what camera was used, whether a flash fired, what the image sensor size is, and much more. Photographers (especially budding ones like me) love to look at these numbers (especially the first four) because they can learn a lot from them. I’m hoping to do some analysis in the not-so-distant future to see whether the aperture values and focal lengths of my images can tell me which new lens would give me the greatest satisfaction.
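Here’s the kind of analysis I have in mind, sketched in Python on top of the exiftool command line (the -p, -r, and -ext flags are exiftool’s; the script itself is just an illustration):

import subprocess
from collections import Counter

def focal_length_counts(photo_dir):
    # Have exiftool print one focal length per image, recursively.
    output = subprocess.run(
        ["exiftool", "-p", "$FocalLength", "-r", "-ext", "jpg", photo_dir],
        capture_output=True, text=True,
    ).stdout
    return Counter(line.strip() for line in output.splitlines() if line.strip())

# Show the ten focal lengths I reach for most often.
for focal_length, count in focal_length_counts("photos").most_common(10):
    print(focal_length, count)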

In any case, one problem I’ve noticed is that when I crop my photos in Photoshop (by cutting a region out of one image and pasting it into a new one), all of that wonderful EXIF data is gone. It’s not Photoshop’s fault, but it is annoying to lose the data. I poked around on the web for a bit and came across exiftool, a Perl library that has also been wrapped as a Windows executable. It extracts EXIF data from a variety of sources, and even better, you can use it to copy EXIF data from one image to another. Some highlights from one of my images:


ExifTool Version Number         : 6.74
File Name                       : IMG_4807.jpg
Camera Model Name               : Canon EOS DIGITAL REBEL XT
Exposure Program                : Manual
ISO                             : 200
Shutter Speed Value             : 1/80
Aperture Value                  : 5.0
Max Aperture Value              : 3.5
Flash                           : On
Focal Length                    : 60.0mm
Lens                            : 28.0 - 105.0mm

The full file is 277 lines long. Some of the properties repeat, and others aren’t all that interesting to me.

EXIF data can be replaced one file at a time or in a batch, so I can copy data from each original image into its cropped version. The following lines are from exiftool’s help:


    exiftool -tagsfromfile src.crw dst.jpg
         Copy the values of all writable tags from "src.crw" to "dst.jpg",
         writing the information to the preferred groups.

    exiftool -tagsfromfile %d%f.CRW -r -ext JPG dir
         Recursively rewrite all "JPG" images in "dir" with information
         copied from the corresponding "CRW" images in the same directories.

Note that the second call (for batch processing) expects dir to have the same structure as the current directory (which should contain the source images), and the filenames need to match; %d and %f expand to the directory and base filename of the file being rewritten. In the simple case where your source images are all in one directory and your target images are all in another directory (dir), run the following from the directory containing the source images:


> exiftool -tagsfromfile %f.CRW -ext JPG dir
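To make that concrete, with the originals in the current directory and the cropped copies in dir, the layout looks like this (reusing the IMG_4807 name from the listing above), and the EXIF data flows from each CRW into the matching JPG:

./IMG_4807.CRW
./dir/IMG_4807.JPG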

I also came across an online EXIF-to-RDF extractor, which may come in handy some day.

02.07.07

Remote volunteering with Machine Science

Posted in personal, technology at 10:27 pm by wingerz

machinescience.png

Over the past two years I’ve been working with Machine Science, a small nonprofit based in Cambridge. Their focus is providing materials and a curriculum for middle and high schoolers to learn basic engineering skills by building and programming robots. They distribute robot kits, which include a breadboard, a processor, wires, and some fun components (like motors and thermometers). Their website has a series of easy-to-follow lessons, and development is done in C via an online tool that saves files and compiles source code for download onto the processor.

Machine Science is interested in starting an online help group where students can post questions. Currently a single teacher is responsible for taking care of an entire class, which can be quite overwhelming. We’re looking for some extra people to troubleshoot programming problems, and I’m wondering if anyone is interested in participating in the pilot. I’m expecting it to be about a one-hour-per-week commitment.

In any case, please contact me (wingerz at gmail dot com) if you are interested. If things work out it should be a pretty cool opportunity because 1) you’ll be able to make learning how to program a lot less frustrating for a beginner, 2) you can share your experiences and serve as a role model to budding engineers, and 3) you don’t even have to leave your desk.

02.06.07

Text indexing and query in Boca

Posted in semantic web, technology at 10:14 am by wingerz

slrplucene.png

Just about every computer user is familiar with text search. While end users may not be writing custom search queries, they appreciate UIs that let them search with more accuracy and precision. Occasionally users want to find something very specific by searching across people’s names, or book titles, or paper abstracts instead of all of the indexed text in a system. Clever keyword searching and luck will only get you part of the way there.

Sleuth, Boca’s text indexing component, addresses this problem (in the Boca world). We’ve been using it for quite a while. Similar to LARQ, Boca uses Lucene to index string literals when the feature is enabled. We’ve designated a magic predicate for querying the text index with SPARQL and hooked it into Glitter, our wonderfully-named SPARQL engine. So now we can do SPARQL queries with integrated text queries, like “find me people (not airplane components or animal appendages) whose name matches ‘Wing’”:


PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX boca: <http://example.org/boca#>   # placeholder: the real namespace URI was lost in formatting
SELECT ?person ?name
WHERE {
	?person foaf:name ?name .
	?name boca:textmatch "Wing" .
}

This powerful feature lets SPARQL-aware developers roll their own APIs. It’s easy to whip up a search across all literals for traditional text-search behavior. With a little more work, you can craft more sophisticated searches, like one for the authors of papers whose abstracts mention a specific term (say, “march madness”).
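That abstract search might look something like the following sketch. I’m assuming Dublin Core predicates for creators and abstracts here, and the boca prefix is the same placeholder as above:

PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX boca:    <http://example.org/boca#>   # placeholder namespace
SELECT ?author
WHERE {
	?paper dc:creator ?author ;
	       dcterms:abstract ?abstract .
	?abstract boca:textmatch "march madness" .
}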

For more details on how to set this up, please see our documentation on Boca text indexing.

12.17.06

SPARQL for Flickr: Picture the Possibilities

Posted in photos, semantic web, technology at 2:49 am by wingerz

flickr.jpg

I recently purchased my first DSLR camera. It wasn’t an easy decision, and at some point I was looking for sample photos taken by a non-DSLR under a certain condition (wide aperture). I started with the Flickr Camera Finder. There is so much wonderful data on pages like the list of Canon cameras and the individual camera pages. The data can be viewed in several ways, but it all just leaves me wanting more. Sure, for a particular camera I can search for pictures tagged with “food”, but what if I want to find photos with a wide aperture that were taken on November 23, 2005?

They’re sitting on a gold mine of data, but the only way to get at it is through the web API (the advanced search is not very powerful). It’s possible to get at some of the EXIF data (photo metadata), but only if you have the ID of a photo; there’s no way to search across all of the images. And even if they implemented this particular interface, what if I want to search for photos that satisfy these restrictions and were posted by users within three friend-links of me?

If Flickr slapped a SPARQL endpoint on its data, it would open up all sorts of amazing possibilities. Using API keys, they could offer paid access to the data for photo equipment sellers (and free access for web hackers), who could then offer their customers the ability to find pictures taken with particular cameras and lenses, and the people who own them (possibly restricting this set of people to friends or foafs). Of course, Flickr could put together a proprietary web API and do this now, but then they would have to code up every new API method themselves rather than letting data subscribers write their own queries. And SPARQL-able data has the additional benefit of being easier to integrate with other sources.
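To picture it, the query I wished for above (wide-aperture photos taken on November 23, 2005) might look something like this. Every flickr: and exif: predicate here is hypothetical, since no such endpoint or vocabulary exists:

PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX exif:   <http://example.org/exif#>     # hypothetical vocabulary
SELECT ?photo
WHERE {
	?photo exif:fNumber ?aperture ;
	       exif:dateTaken ?date .
	FILTER (?aperture <= 2.8)
	FILTER (?date >= "2005-11-23T00:00:00"^^xsd:dateTime
	     && ?date <  "2005-11-24T00:00:00"^^xsd:dateTime)
}

Adding the friend-link restriction would just mean a few more triple patterns over something like foaf:knows.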
