What’s Going On At Sun

Hoare’s Dictum: Premature optimization is the root of all evil.

— C.A.R. Hoare

I’ve spent the last couple of days wandering through papers about collaborative filtering again, apart from a brief detour trying to load some data into the distributed data store. In particular, Herlocker’s Evaluating Collaborative Filtering Recommender Systems is an excellent summary of the issues surrounding writing a collaborative filter.

One of the things I’ve been realizing is I simply don’t know enough about the field to design a general purpose framework. I’ve looked at the structure of systems like Taste, Cofi and CoFE, but I don’t really have the background other than in a broad systems design sense to evaluate what they’ve done.

The reason I’ve been thinking about Hoare’s Dictum is that the ideal computer program would run instantaneously, use no resources and do everything. Premature optimization is usually discussed only in terms of execution time or resource allocation, but flexibility is an axis too: if I attempt to create a general model for collaborative filtering before I really understand the field, I am optimizing for flexibility before I have the conceptual background to do so.

I’m a programmer. The way I get a conceptual background in something is to write a program. I’m not going to say I’m going to write a throwaway program (since people argue that most throwaway programs never actually get thrown away), but secretly that’s what I want.

Because I’m shooting for a specific program for the purpose of learning rather than a general framework, I can set a specific, demonstrable goal, which is much more manageable from a research perspective. So, what is a good program for a collaborative filter?

I figure a good choice would be something that is already being done by Project Aura so that we can compare their text mining techniques with a collaborative filter. So, what is Project Aura doing? Here’s what I know of from my month (has it been a month?!?) here:

  • Document Similarity Based Recommendations:
    • Tagomendations — Finding artists that are similar to each other based on the tags provided by last.fm. The tags are “cleaned” prior to clustering so that distinctive tags will be more influential.
    • User-Based Recommendations — Instead of using the tags as the criteria for determining if artists are similar, use the listeners who enjoy a particular artist.
  • Aardvark — Generating blog recommendations based on an RSS feed of entries. The RSS feed is generally generated from Google Reader’s shared items.
  • Tastebroker:
  • GUI Visualizations — Populating 3D interfaces with similarities of both blogs and music using dimensionality reductions and color and size to represent some of the characteristics.
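To make the difference between the two document-similarity approaches in the list above concrete, here is a minimal sketch. It is not Project Aura’s code: the toy data, the function names, and the choice of cosine similarity for tags and Jaccard overlap for listeners are all my own assumptions for illustration.

```python
from math import sqrt

# Hypothetical toy data, not real last.fm output.
# Tag counts per artist (the "tagomendations" view).
ARTIST_TAGS = {
    "artist_a": {"rock": 10, "indie": 5},
    "artist_b": {"rock": 8, "indie": 6},
    "artist_c": {"jazz": 9, "piano": 4},
}
# Listener sets per artist (the "user-based" view).
ARTIST_LISTENERS = {
    "artist_a": {"u1", "u2", "u3"},
    "artist_b": {"u2", "u3", "u4"},
    "artist_c": {"u5"},
}


def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Overlap between two listener sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tag_similarity(x, y):
    """Similarity of two artists judged by their tags."""
    return cosine(ARTIST_TAGS[x], ARTIST_TAGS[y])


def listener_similarity(x, y):
    """Similarity of two artists judged by who listens to them."""
    return jaccard(ARTIST_LISTENERS[x], ARTIST_LISTENERS[y])
```

Both functions answer the same question, "how alike are these two artists?", but from different evidence: the first uses only what people say about the music, the second uses only who listens to it.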

It seems like a good initial project is simply to take a last.fm user profile and recommend an artist to them based on the tag space. It’s been done before, but that’s fine for a learning project.
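A rough sketch of what that learning project might look like is below. Everything here is an assumption on my part: the artist names and tags are made up, no last.fm API is called, and I use inverse document frequency as a stand-in for the tag “cleaning” step, since all I know is that distinctive tags are supposed to count for more.

```python
from collections import defaultdict
from math import log, sqrt

# Hypothetical toy data: the tags attached to each artist.
ARTIST_TAGS = {
    "artist_a": {"rock", "indie", "guitar"},
    "artist_b": {"rock", "indie"},
    "artist_c": {"rock", "jazz", "piano"},
    "artist_d": {"rock", "indie", "guitar", "lo-fi"},
}


def idf_weights(artist_tags):
    """Weight each tag by log(N / artists using it); ubiquitous tags get 0."""
    n = len(artist_tags)
    df = defaultdict(int)
    for tags in artist_tags.values():
        for tag in tags:
            df[tag] += 1
    return {tag: log(n / count) for tag, count in df.items()}


def tag_vector(tags, idf):
    """Sparse vector of IDF weights, dropping tags that carry no weight."""
    return {t: idf[t] for t in tags if idf.get(t, 0.0) > 0.0}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(played, artist_tags, top_n=1):
    """Rank artists the user has not played by similarity in tag space."""
    idf = idf_weights(artist_tags)
    # The user's profile is the sum of the tag vectors of played artists.
    profile = defaultdict(float)
    for artist in played:
        for tag, weight in tag_vector(artist_tags[artist], idf).items():
            profile[tag] += weight
    scores = {
        artist: cosine(profile, tag_vector(tags, idf))
        for artist, tags in artist_tags.items()
        if artist not in played
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `recommend(["artist_a"], ARTIST_TAGS)` on this toy data picks `artist_d`, because it shares the rarer “guitar” tag with the profile while “rock”, which every artist has, contributes nothing. Note that this is still content-based matching on tags; the collaborative filter would be the thing to compare it against.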
