June 6th, 2008 — computers, society
June 6th, 2008 — psychology, recommendations
One of the arguments put forth in the Herlocker survey is that if you ask a person to rate a song several times over the course of a few months, they are highly unlikely to give the same answer every time.
They describe this as a “natural variability” in human preference that perhaps represents a hard limit to how effective recommender systems can be.
While I agree that preference is the product of a chaotic system influenced by many variables that are unavailable to the computer, it is also true that much of the variability can be explained by simple, easy-to-capture information. For example, I sometimes listen to dub when I'm writing code because I can ignore it, but it would be completely inappropriate for working out. If the computer knew what types of music I liked while coding, it could do a better job of pulling stuff for that situation.
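To make the coding-versus-workout point concrete, here is a minimal Python sketch, assuming every rating is logged together with a situational context label. The `(context, genre, rating)` tuples, the context names, and both helper functions are hypothetical, not part of any system mentioned here. Keeping a separate average per context lets "dub while coding" and "dub while working out" be scored differently instead of being blurred into one number.

```python
from collections import defaultdict


def context_preferences(ratings):
    """Average rating per (context, genre) pair.

    `ratings` is an iterable of (context, genre, rating) tuples; the
    context labels are hypothetical situational tags such as "coding".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for context, genre, rating in ratings:
        totals[(context, genre)] += rating
        counts[(context, genre)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def best_genre(prefs, context):
    """Return the highest-rated genre within one context, or None."""
    in_context = {g: r for (c, g), r in prefs.items() if c == context}
    if not in_context:
        return None
    return max(in_context, key=in_context.get)


# Toy listening history, invented for illustration.
history = [
    ("coding", "dub", 5), ("coding", "dub", 4), ("coding", "metal", 2),
    ("workout", "dub", 1), ("workout", "metal", 5),
]
prefs = context_preferences(history)
print(best_genre(prefs, "coding"))   # dub
print(best_genre(prefs, "workout"))  # metal
```

Averaged globally, the dub ratings above come out to about 3.3, which hides the fact that they are really a 4.5 in one situation and a 1 in the other.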
François is working on stuff to address the problem by allowing tags to be weighted explicitly. Paul mentioned automatically characterizing tags, so that a "coding music" tag of mine would be recognized as situational rather than as a genre or mood, with the computer figuring out the category of that tag rather than me specifying it.
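I don't know the details of how François's weighting works, so this is only a sketch of the general idea, assuming artists are represented as tag-to-count dictionaries and the user supplies a hypothetical `weights` dictionary for the tags they care about. The weight scales a tag in both vectors before a cosine similarity is taken, so boosting a situational tag like "coding music" can change which artist comes out as the closest match.

```python
import math


def weighted_cosine(a, b, weights, default=1.0):
    """Cosine similarity of two tag->count dicts, where each tag is scaled
    by a user-supplied weight (default 1.0) in both vectors."""
    tags = set(a) | set(b)
    dot = norm_a = norm_b = 0.0
    for tag in tags:
        w = weights.get(tag, default)
        x, y = w * a.get(tag, 0.0), w * b.get(tag, 0.0)
        dot += x * y
        norm_a += x * x
        norm_b += y * y
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / math.sqrt(norm_a * norm_b)


# Toy tag counts, invented for illustration.
seed = {"dub": 10, "coding music": 8}
cand1 = {"dub": 10, "workout": 8}
cand2 = {"ambient": 10, "coding music": 8}

# Unweighted, the shared genre tag wins; boosting the situational tag flips it.
print(weighted_cosine(seed, cand1, {}) > weighted_cosine(seed, cand2, {}))  # True
boost = {"coding music": 3.0}
print(weighted_cosine(seed, cand2, boost) > weighted_cosine(seed, cand1, boost))  # True
```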
Another tack that I think will ultimately be necessary is to model preference as a time series rather than as something static. What I like today is simply not the same thing that I will like tomorrow. The plasticity of the mind is undeniable (though there is certainly a neophile/neophobe continuum along which most people lie).
I would not at all be surprised if there are characteristics common to songs that I have continued to like over the course of years, and other characteristics common to songs that I liked for a bit but that have since fallen out of favor. The changes are not just noise; they are important predictive data.
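One simple way to start treating preference as a time series, sketched here under the assumption of a listening log of `(artist, days_ago)` pairs, is to decay old plays exponentially. The half-lives and the `ratio` cutoff are arbitrary illustrative choices, not tuned values, and the function names are hypothetical. Comparing a short-memory score against a long-memory score gives a crude signal for artists I used to like but have drifted away from, which is the kind of change I suspect is predictive rather than noise.

```python
from collections import defaultdict


def decayed_scores(plays, half_life_days):
    """Sum plays per artist, weighting each play by 0.5 ** (age / half_life),
    so a play half_life_days old counts half as much as one from today."""
    scores = defaultdict(float)
    for artist, days_ago in plays:
        scores[artist] += 0.5 ** (days_ago / half_life_days)
    return dict(scores)


def fading_artists(plays, short=30.0, long=365.0, ratio=0.5):
    """Artists whose short-half-life score is small relative to their
    long-half-life score, i.e. liked before but not lately."""
    recent = decayed_scores(plays, short)
    lasting = decayed_scores(plays, long)
    return sorted(a for a in lasting if recent.get(a, 0.0) < ratio * lasting[a])


# Toy listening log, invented for illustration.
plays = [
    ("dub", 1), ("dub", 3), ("dub", 5),
    ("grunge", 400), ("grunge", 420), ("grunge", 430),
]
print(fading_artists(plays))  # ['grunge']
```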
Maybe I'll tackle that. Right after I manage to write a baby collaborative filter. Who am I to let a complete lack of knowledge prevent me from doing something? ☺
June 6th, 2008 — computers, recommendations
Hoare’s Dictum: Premature optimization is the root of all evil.
— C.A.R. Hoare
I've spent the last couple of days wandering through papers about collaborative filtering again, after getting sidetracked for a bit trying to load some data into the distributed data store. In particular, Herlocker's Evaluating Collaborative Filtering Recommender Systems is an excellent summary of the issues surrounding writing a collaborative filter.
One of the things I've been realizing is that I simply don't know enough about the field to design a general-purpose framework. I've looked at the structure of systems like Taste, Cofi and CoFE, but apart from a broad systems-design sense, I don't really have the background to evaluate what they've done.
The reason I've been thinking about Hoare's Dictum is that the ideal computer program would run instantaneously, use no resources, and do everything, so every real program is a compromise among those axes. Premature optimization is usually discussed only in terms of execution time or resource allocation, but if I try to create a general model for collaborative filtering before I really understand the field, I am optimizing along the axis of flexibility without the conceptual background to do so.
I'm a programmer. The way I get a conceptual background in something is to write a program. I'm not going to say I'm writing a throwaway program (since, as people often point out, most throwaway programs never actually get thrown away), but secretly that's what I want.
Because I'm shooting for a specific program written for the purpose of learning rather than a general framework, I can set a specific, demonstrable goal, which is much more manageable from a research perspective. So, what is a good program for a collaborative filter?
I figure a good choice would be something that Project Aura is already doing, so that we can compare their text mining techniques with a collaborative filter. So, what is Project Aura doing? Here's what I've picked up during my month (has it been a month?!?) here:
- Document-Similarity-Based Recommendations:
  - Tagomendations — Finding artists that are similar to each other based on the tags provided by last.fm. The tags are "cleaned" prior to clustering so that distinctive tags will be more influential.
  - User-Based Recommendations — Instead of using the tags as the criteria for determining whether artists are similar, use the listeners who enjoy a particular artist.
  - Aardvark — Generating blog recommendations based on an RSS feed of entries. The RSS feed is generally generated from Google Reader's shared items.
- Tastebroker:
  - GUI Visualizations — Populating 3D interfaces with the similarities of both blogs and music, using dimensionality reduction, with color and size representing some of the characteristics.
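I don't know exactly how the Tagomendations tag cleaning is done, but a standard way to make distinctive tags more influential is TF-IDF weighting, so the sketch below uses that as a stand-in rather than as a description of Aura's code. It assumes each artist is a dictionary of tag counts (the toy data is invented) and ranks artists by cosine similarity. A tag that every artist carries gets an inverse document frequency of `log(1) = 0` and drops out entirely.

```python
import math


def tfidf(artist_tags):
    """Re-weight each artist's tag counts by inverse document frequency,
    so tags applied to almost every artist (e.g. "rock") count for less."""
    n = len(artist_tags)
    df = {}
    for tags in artist_tags.values():
        for tag in tags:
            df[tag] = df.get(tag, 0) + 1
    return {
        artist: {t: c * math.log(n / df[t]) for t, c in tags.items()}
        for artist, tags in artist_tags.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(artist, vectors):
    """The other artist whose vector is closest to `artist`'s."""
    others = [a for a in vectors if a != artist]
    return max(others, key=lambda a: cosine(vectors[artist], vectors[a]))


# Toy tag counts, invented for illustration.
artists = {
    "A": {"rock": 50, "dub": 5},
    "B": {"rock": 50, "metal": 5},
    "C": {"rock": 10, "dub": 20},
}
print(most_similar("A", artists))         # B (raw counts)
print(most_similar("A", tfidf(artists)))  # C (TF-IDF weighted)
```

With raw counts the shared "rock" tag dominates and A looks closest to B; after weighting, the distinctive "dub" tag decides it and A lands next to C. The same `cosine` function works unchanged if the vectors are keyed by listener instead of by tag, which is roughly the User-Based Recommendations variant.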
It seems like a good initial project is simply to take a last.fm user profile and recommend an artist to them based on the tag space. It's been done before, but that's fine for a learning project.
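A minimal sketch of that learning project, assuming a last.fm profile boils down to a dictionary of artist play counts and each artist has a tag-count vector (the toy data and function names are invented, not pulled from last.fm). The user's taste vector is the play-weighted sum of the tag vectors of artists they've heard, and the recommendation is the unheard artist closest to it by cosine similarity. Strictly speaking this is content-based filtering in tag space rather than a collaborative filter, which makes it a reasonable baseline to compare one against later.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def user_tag_profile(plays, artist_tags):
    """Sum each listened-to artist's tag vector, weighted by play count."""
    profile = defaultdict(float)
    for artist, count in plays.items():
        for tag, weight in artist_tags.get(artist, {}).items():
            profile[tag] += count * weight
    return dict(profile)


def recommend(plays, artist_tags, k=1):
    """Top-k artists the user hasn't played, ranked by cosine similarity
    between each artist's tag vector and the user's profile."""
    profile = user_tag_profile(plays, artist_tags)
    candidates = [a for a in artist_tags if a not in plays]
    candidates.sort(key=lambda a: cosine(profile, artist_tags[a]), reverse=True)
    return candidates[:k]


# Toy data, invented for illustration.
artist_tags = {
    "A": {"dub": 10, "reggae": 6},
    "B": {"dub": 8, "reggae": 8},
    "C": {"metal": 10, "thrash": 7},
    "D": {"dub": 9, "electronic": 3},
}
plays = {"A": 40, "B": 10}
print(recommend(plays, artist_tags))  # ['D']
```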
June 6th, 2008 — professional, tinkering
This last week has been less than stellar for me productivity-wise. The issues have been, to some extent, systemic.
One of my biggest problems has been a simple one of biology. I do pretty well in the morning. Getting settled in, catching up on e-mail, doing some coding… I cruise along until around lunchtime, when I hit a lull and, if I'm lucky, end up in a stupor staring off into the space about two feet behind my monitor. Equally unproductive, but slightly less discreet, is ending up lolled back in my chair, snoring slightly and drooling on myself.
I've tried various methods for combating this phenomenon. It isn't just that I'm worried someone is going to catch me; it's how pointless it is. If I'm going to be productive, I want to be productive. If I'm going to rest, I want to rest. What I'm doing with this half-breed amalgam is the worst of both worlds: being unproductive in a really uncomfortable way.
I thought for a while that it might be the act of eating. Maybe energy needed to run my brain was being redirected to my stomach, so if I reduced the resources going to my stomach, I could keep my brain going stronger. This line of reasoning led to some not terribly successful experiments in boosting my energy levels by not eating.
I did have some luck with the grazing pattern where I make a sandwich and eat it a bite at a time over the course of five or six hours. (A really good diet strategy, fyi. It significantly reduced my overall caloric intake.) Part of the reason I’m here at Sun for the summer is the people I’m around. When I go down to lunch I get to hear interesting people espouse unique ideas, and I think it might seem a bit odd if I just came to lunch and took two bites out of my sandwich in half an hour.
Grazing wasn't a complete solution in any case. The central issue is that thinking is taxing. If I were loading hay all day, I wouldn't try to come up with some magical plan whereby I'm not tired at the end of the day. Because the work of an engineer goes on inside our heads, we are more apt to assume that we can simply overcome its effects through moral resolve. Just because the work isn't visible, however, doesn't mean it is any less real.
The solution I’ve been relying on for the last week has been a tasty one: chocolate covered espresso beans. Coffee might taste like dirty water, but the magical font of Goodness that is chocolate manages to make it delicious. Honestly, I think if I stuck with it for long enough I could condition myself to enjoy the taste of coffee (much like I now enjoy beer, which pretty much every first-time drinker agrees tastes like horse pee).
The problem isn't really solved, though. I do manage to stay conscious through the afternoon, but my focus is sharp only right after a shot of caffeine and sugar, and it drops off again with increasing rapidity. The big problem is that there's no such thing as a free lunch: three nights this week I got home around 6:00 and was asleep by 7:00, only to wake up at 2am unable to get back to sleep. (And being up from 2-5am leaves one with the reactive efficiency of roadkill the following day.)
I suppose I could view making it through the work day as a success and say that it is unprofessional of me to consider my discomfort at home when structuring my schedule, but I’m pretty sure that is the express train to getting your soul sucked out. (Something that would ultimately not only be bad for me, but for Sun as well.)
I figure, though, that my specialty is systemization, and if there is a solution to be had, I can find it. I've got a more formal set of ideas on the process for doing that, but for the sake of brevity I won't go into all that here. I'll just mention that the plan for next week is to work 7am-2pm, go home, probably nap, and then work another hour or so in the evening. I don't have the criteria yet for a more formal evaluation, but I figure I'll at least get a sense of how it leaves me feeling.