Leaving the Ocean Unboiled

I’ve been wandering a bit intellectually. There was a method to this madness. Here was my plan for profit:

  1. Semiotics — Separate items and their meanings. Rather than considering a song a discrete thing that a user has a preference for, think of it as a complex symbol that has meaning for a user.
  2. Memetics — Examine shared cultural myths as philosophies of human nature and argue that the process guiding their specification is the same as the one driving philosophies about the world toward sciences.
  3. Preference as Conditioning — Distinguish between cultural symbols (guys wear pants) and simple symbols (the sun meaning warmth), argue that music communicates both types and that a unified perspective of messages can incorporate both.
  4. Hidden Markov Models — Posit that preference arises from the conditioning of a relatively small number of elements. Attempt to use patterns in the expressed preferences to guess the layout of this hidden network. Introduce the concept of ego as a state maintenance function on a stateless network.
  5. Vector Distance — Come up with some sort of unified way to train a Markov model on cepstral coefficients, tags and lyrics, and use the weights of the nodes as a vector for computing user similarity (or, by examining the networks of the users who like a certain thing, compute a vector representing the messages communicated by a complex sign).

So, kinda out there as an idea. I didn’t really mean for it to come out quite that strangely. The task I was given was, “come up with a collaborative filter.” Recall that my definition of “collaborative filter” is pretty amorphous. I’ve read several papers on collaborative filtering, but none of them defines the term explicitly. The definition I got from Wikipedia was:

Collaborative Filtering — The process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.

I started thinking about tags and lyrics and audio and social statements — the meaning of music — and I ended up with a complex layered model.

There are two main problems with what I have been doing:

  • My understanding of the words “collaborative filter” didn’t really line up with the understanding of the people who told me to make a collaborative filter. They sent me off to get something resembling a teddy bear and I came back with an octopus stuffed in a Venus flytrap. I don’t know that I would say that there was a sense of disapproval as much as one of confusion.
  • I am midway through week five of thirteen. It would likely take me another week, at least, simply to lay the intellectual framework for what could end up being a completely unworkable or conceptually flawed idea. The only way to determine the idea’s viability is to build the system. While the idea may be a good one, it does not fit the time frame of this internship.

So, now I’ve written it down, and maybe I’ll make it back to it at some point in the future. For right now I have some better defined (and achievable) tasks. Specifically, I am to compare two methods of determining artist similarity:

  • Document Similarity — Treat each artist as a document and each user as a term, weighted by the number of times that user has listened to the artist, then compare artists by the cosine similarity of their vectors. Use tf–idf to normalize the influence of users.
  • Item-to-Item Similarity — Use the method described in Linden’s item-to-item collaborative filtering paper to generate similarity. I’m still in the process of figuring out exactly what that means…
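
To make the first method concrete, here is a rough sketch of how it might look in Python. It is only an illustration under my own assumptions: the post does not pin down a tf–idf variant, so this uses the raw listen count as the term frequency and log(N / df) as the inverse document frequency, where N is the number of artists and df is the number of artists a user has listened to. The function names and the sample data are made up.

```python
import math
from collections import defaultdict


def tfidf_vectors(play_counts):
    """Turn {artist: {user: listen_count}} into {artist: {user: tf-idf weight}}.

    Assumption: tf is the raw listen count and idf is log(N / df).
    """
    n_artists = len(play_counts)

    # Document frequency: how many artists each user has listened to.
    df = defaultdict(int)
    for users in play_counts.values():
        for user, count in users.items():
            if count > 0:
                df[user] += 1

    vectors = {}
    for artist, users in play_counts.items():
        vectors[artist] = {
            user: count * math.log(n_artists / df[user])
            for user, count in users.items()
            if count > 0
        }
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b[user] for user, weight in a.items() if user in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up listening data, purely for illustration.
plays = {
    "Artist A": {"u1": 10, "u2": 3, "u3": 1},
    "Artist B": {"u1": 8, "u2": 1, "u3": 2},
    "Artist C": {"u3": 5, "u4": 7},
}
vectors = tfidf_vectors(plays)
print(cosine_similarity(vectors["Artist A"], vectors["Artist B"]))
print(cosine_similarity(vectors["Artist A"], vectors["Artist C"]))
```

In the sample data, user u3 listens to every artist, so the idf term gives that user a weight of zero. That is the normalization the tf–idf step is meant to provide: a user who listens to everything says little about which artists are alike.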

Within a ten-minute span I heard the phrase “don’t try to boil the ocean” from two separate people. The basic idea is that there is an ounce of gold per cubic kilometer of seawater, and were I but able to boil off all the water, I would be a rich man. The tractability and cost/benefit ratio are a problem, however.

That this came up as I was discussing my plans was coincidental, I’m sure. ☺
