Archive for recommendations
Part of the video shows Brother Ali by the southeast wall of the Jefferson Memorial. That inscription reads:
I am not an advocate for frequent changes in laws and constitutions. But laws and institutions must go hand in hand with the progress of the human mind. As that becomes more developed, more enlightened, as new discoveries are made, new truths discovered and manners and opinions change, with the change of circumstances, institutions must advance also to keep pace with the times. We might as well require a man to wear still the coat which fitted him when a boy as civilized society to remain ever under the regimen of their barbarous ancestors.
I’m home again after my time in Boston with Project Aura. As I look back over my summer I am certainly nowhere near where I expected to be at the outset.
One part of the difference is simply that once I took the concrete step of saying “I don’t like robots and am going to actively pursue finding something I really do like,” it shifted how I see the world significantly. That process has largely been internal, but the external experience of working for Sun and being an intern was also significantly different than I expected it to be.
HR sent around an intern survey to ask us about how much we enjoyed our work experience and how we would rate the organization and whatnot. I filled in the blanks and was not looking to make any serious commentary. At the end, however, the survey wouldn’t let me finish without putting something into the blank for “What would you recommend that management do to make this a better place to work?”
The problem is the question is not a simple one to answer, so as is my wont, I gave them likely far more information than they wanted:
I’ve been thinking about the upcoming end of my internship and heading back to school and the direction of my life and my upcoming marriage and my health and my meditation practice and after a bit I think I burned out the clutch on my brain.
Instead of the 800 important things I had on my plate for the evening, I spent the last five hours reformatting chunks of a 600 page compendium of interesting literature.
I’ve been attempting to thieve some background reading on sociology to get ready for the ASA meeting, but it turns out that the thieves have tens of thousands of books, the vast majority of which are science fiction.
After screwing around for a while, I happened upon a reading list on Scribd (which has about as many books as the IRC channel) that is a combination of book lists from Harvard, Oxfam and a couple other places. The rote cleaning and proofing turned out to be much more relaxing than actually dealing with stuff I needed to do.
My hours of cleaning got the XML to validate, but not much more. Of the couple sections I did finish, I thought my many fecund friends might enjoy the selection of children’s literature. (The adventurous might also be interested in the ALA’s 100 most challenged books.)
There’s about 60 suggestions total, so I’ll just include a few as a sample:
Item-to-item similarity is a method popularized by Amazon for computing the similarity of items in its catalog. The reasoning is that item similarities are more static than user similarities, so in situations where computing a similarity requires extensive computation, precomputed item similarities have the advantage of being robust to infrequent updates.
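To make that concrete, here is a minimal sketch of the item-to-item idea (the rating data and names are invented for illustration, and this is not Amazon’s actual implementation): each item is represented by its vector of user ratings, and two items are compared with cosine similarity.

```python
from math import sqrt

# toy data: ratings[user][item] -> rating (all names are made up)
ratings = {
    "alice": {"song_a": 5, "song_b": 3},
    "bob":   {"song_a": 4, "song_b": 2, "song_c": 5},
    "carol": {"song_b": 4, "song_c": 4},
}

def item_vector(item):
    """All ratings for one item, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in common)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sim = cosine(item_vector("song_a"), item_vector("song_b"))
```

Because the item vectors change slowly, a table of these pairwise similarities can be computed offline and only refreshed occasionally.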
I’ve been considering tags and what exactly it means when I “tag” a song. It has a different meaning than rating. I think I have an idea of how to design a collaborative filter using tags, but I lack the vocabulary to really work out the idea.
I think the terms I need exist in the field of semiotics. This post is to define them so I can use them. To be precise, this is a combination of selected definitions with some additional interpretation. Semiotics is large, complex and controversial, and this is in no way authoritative.
Semiotics attempts to define a terminology to take the complex inferences underlying interactions and make them explicit. The primary thesis is that interactions are significantly more complex than they seem at first glance, and as a result semiotic writings frequently end up taking something seemingly simple and describing it in excruciating detail.
The basic building block of semiotics is the “sign:”
One of the arguments put forth in the Herlocker survey is that if you ask a person to rate a song several times over the course of a few months, they are highly unlikely to give the same answer every time.
They describe this as a “natural variability” in human preference that perhaps represents a hard limit to how effective recommender systems can be.
While I do agree that preference is the product of a chaotic system influenced by variables, many of which are unavailable to the computer, it is also true that much of the variability is encompassed by simple, easy-to-capture information. For example, I sometimes listen to dub when I’m writing code, because I can ignore it, but it would be completely inappropriate for working out. If the computer knew the types of music I liked while coding, it could do a better job of pulling stuff for that category.
François is working on stuff to address the problem by allowing an explicit weighting of tags. Paul mentioned automatic characterization of tags such that I might have a “coding music” tag that is recognized as being situational rather than genre or mood, and specifically that the computer will figure out the category of that tag rather than me specifying it.
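A minimal sketch of what explicit tag weighting could look like (all song names, tags, and weights here are invented, not anything from François’s or Paul’s actual work): a song’s score for a situation is the sum of the weights the user attaches to its tags in that context.

```python
# Hypothetical data: which tags each song carries.
song_tags = {
    "dub_track":   {"dub", "reggae", "mellow"},
    "metal_track": {"metal", "aggressive"},
}

# Hypothetical per-situation tag weights; negative means "avoid".
context_weights = {
    "coding":      {"dub": 1.0, "mellow": 0.5, "aggressive": -1.0},
    "working_out": {"aggressive": 1.0, "mellow": -0.5},
}

def score(song, context):
    """Sum the context's weight for every tag the song carries."""
    weights = context_weights[context]
    return sum(weights.get(tag, 0.0) for tag in song_tags[song])

best_for_coding = max(song_tags, key=lambda s: score(s, "coding"))
```

The automatic characterization Paul mentioned would amount to the system learning the `context_weights` table (and recognizing that “coding music” is situational) instead of the user filling it in by hand.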
Another tack that I think will ultimately be necessary is to model preference as a time series rather than something static. What I like today is simply not the same thing that I will like tomorrow. The plasticity of the mind is undeniable (though there is certainly a neophyte / neophobe continuum along which most people lie).
I would not at all be surprised if there are characteristics common to songs that I have continued to like over the course of years and other characteristics common to songs that I liked for bit but have fallen out of favor. The changes are not just noise, they are important predictive data.
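One simple way to treat preference as a time series, sketched below with an invented half-life and toy data, is to weight each historical rating by an exponential decay so that recent taste dominates while old ratings still contribute:

```python
from math import exp, log

HALF_LIFE_DAYS = 90.0          # arbitrary assumption for illustration
DECAY = log(2) / HALF_LIFE_DAYS

def decayed_preference(history, now):
    """history: list of (day, rating) pairs, days counted from some epoch.
    Returns the decay-weighted average rating as seen from day `now`."""
    weighted = [(exp(-DECAY * (now - day)), rating) for day, rating in history]
    total = sum(w for w, _ in weighted)
    return sum(w * r for w, r in weighted) / total if total else 0.0

# A song rated highly a year ago but poorly last week: the recent
# rating should dominate the estimate.
history = [(0, 5), (358, 1)]
pref = decayed_preference(history, now=365)
```

A real model would go further and try to separate genuine drift from noise, which is exactly the distinction being argued for here: the changes are predictive data, not just variance to average away.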
Maybe I’ll tackle that. Right after I manage to write a baby collaborative filter. Who am I to let a complete lack of knowledge prevent me from doing something? ☺
Hoare’s Dictum: Premature optimization is the root of all evil.
— C.A.R. Hoare
I’ve been spending the last couple days wandering through papers about collaborative filtering again, though I got sidetracked for a bit trying to load some data into the distributed data store. In particular, Herlocker’s Evaluating Collaborative Filtering Recommender Systems is an excellent summary of the issues surrounding writing a collaborative filter.
One of the things I’ve been realizing is I simply don’t know enough about the field to design a general purpose framework. I’ve looked at the structure of systems like Taste, Cofi and CoFE, but I don’t really have the background other than in a broad systems design sense to evaluate what they’ve done.
The reason I’ve been thinking about Hoare’s Dictum is that the ideal computer program would run instantaneously, use no resources, and do everything. Premature optimization is frequently discussed only in terms of execution time or resource allocation, but if I attempt to create a general model for collaborative filtering before I really understand the field, I am optimizing along the axis of flexibility before I have the conceptual background to do so.
I’m a programmer. The way I get a conceptual background in something is to write a program. I’m not going to say I’m going to write a throwaway program (since people debate that most throwaway programs aren’t [thrown away]), but secretly that’s what I want.
Because I’m not shooting for a general framework but rather a specific program for the purpose of learning, I can set a specific, demonstrable goal, which is much more manageable from a research perspective. So, what is a good goal for a collaborative filter?
I figure a good choice would be something that is already being done by Project Aura so that we can compare their text mining techniques with a collaborative filter. So, what is Project Aura doing? Here’s what I know of from my month (has it been a month?!?) here:
- Document Similarity Based Recommendations:
- Tagomendations — Finding artists that are similar to each other based on the tags provided by last.fm. The tags are “cleaned” prior to clustering so that distinctive tags will be more influential.
- User-Based Recommendations — Instead of using the tags as the criteria for determining if artists are similar, use the listeners who enjoy a particular artist.
- Aardvark — Generating blog recommendations based on an RSS feed of entries. The RSS feed is generally generated from Google Reader’s shared items.
- GUI Visualizations — Populating 3D interfaces with similarities of both blogs and music using dimensionality reductions and color and size to represent some of the characteristics.
It seems like a good initial project is simply to take a last.fm user profile and recommend an artist based on the tag space. It’s been done before, but that’s fine for a learning project.
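As a first sketch of that project (with invented artists and tag counts, not real last.fm data), each artist becomes a tag-count vector, the user’s profile is the sum of the vectors of artists they listen to, and the recommendation is the unheard artist closest to that profile by cosine similarity:

```python
from collections import Counter
from math import sqrt

# Toy stand-in for last.fm tag data: tag -> count per artist.
artist_tags = {
    "artist_a": Counter(rock=80, indie=40),
    "artist_b": Counter(rock=60, indie=70),
    "artist_c": Counter(jazz=90, fusion=30),
}
user_listens = ["artist_a"]

def cosine(a, b):
    """Cosine similarity between two tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# The user's profile is the combined tag vector of what they listen to.
profile = Counter()
for artist in user_listens:
    profile.update(artist_tags[artist])

candidates = [a for a in artist_tags if a not in user_listens]
recommendation = max(candidates, key=lambda a: cosine(profile, artist_tags[a]))
```

The tag “cleaning” step mentioned above for Tagomendations would slot in before the similarity computation, reweighting distinctive tags so they carry more influence than generic ones.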