Archive for April, 2008

Viral Licenses

Matt reminded me of this and I wanted to mention it.

I mentioned that I think we will be entering an era where the data that a company has collected about me rivals the actual functional characteristics of their software in a consumer’s investment in the product.

From a market perspective the best situation would be if the data were open and accessible to whichever competitor wanted to make the strongest product using it. What, though, is a company’s impetus to give up the competitive advantage of keeping data secret? Calling on companies to be noble enough to support the market is naive, and expecting customers to invest in a company simply because it doesn’t lock them in is unlikely to work given my understanding of the current user’s view of the market. People want stuff to work now and don’t think about the future all that much.

What if you did something like the GPL’s viral licensing? I have some data and you can use it, but only if you agree to give me any changes to that data and any additional data that you collect about this person.

What the structure would look like to support that exchange I’m uncertain. Some APIs and lots of XML I assume, but it’s a potential solution that could work well for the market overall.
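One way to picture the exchange is a shared record that every licensee has to write back into. The sketch below is only an illustration under assumptions: the `SharedProfile` type, the `contribute` function, and the company names are hypothetical, and no real API or wire format (XML or otherwise) is implied.

```python
from dataclasses import dataclass, field


@dataclass
class SharedProfile:
    """Data about one person, shared under a hypothetical share-alike license."""
    person_id: str
    origin: str  # company that first collected the data
    facts: dict = field(default_factory=dict)
    contributors: list = field(default_factory=list)


def contribute(profile, company, new_facts):
    """Return a new profile with a licensee's changes and additions merged back in.

    The "viral" clause is modeled as: whatever a licensee changes or learns
    goes back into the shared record that every other licensee can read.
    """
    merged = {**profile.facts, **new_facts}
    return SharedProfile(
        person_id=profile.person_id,
        origin=profile.origin,
        facts=merged,
        contributors=profile.contributors + [company],
    )
```

For example, `contribute(SharedProfile("user-1", "AcmeRadio", {"likes": "folk"}), "OtherRadio", {"dislikes": "polka"})` yields a record holding both facts, with "OtherRadio" listed as a contributor.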

Thin Servers

Real quick because I ought to be doing my psychology homework…

I’ve been considering social feedback in increasing user loyalty and community involvement in online systems.

I was a Pandora listener for a while. I liked Pandora because it gave me interesting songs, but I like a really broad range of types of music and Pandora seemed too narrow for my tastes.

Jango is what I went to next because the interface was better for adding artists to my profile. It seems to stick to a genre for a while and then switch over and do another. I generally like it and I’ve gotten a couple of new songs from it. It gives some community feedback by having stations associated with individuals. You only stay within one person’s recommendations though. Maybe there are community stations I just haven’t found yet.

Grooveshark seems to be really close, but I’m waiting to actually be approved to use it. I think that they’re dealing with the P2P issues and attempting to avoid the RIAA by being selective in their listenership.

I suspect though that they’re addressing a question that I’ve already had. I have a bunch of songs on my computer at home that I legitimately own. I have playlists there that I would like to have access to (and I would like to be able to create playlists while at work that I could listen to at home). Why, since I own this music, can’t I listen to it?

There’s a Y-Combinator startup that allows streaming of songs. Streaming is legal even to other people so long as only one person is streaming at a time and the group is small.

The idea of thin servers solves four problems:

  1. I own a bunch of songs and I should be allowed to use interfaces that allow queuing and seeking those songs in whatever method I want.
  2. I have an awesome song from a great local band that isn’t in any of the recommender systems.
  3. My band has a great new song that we want other people to potentially have recommended to them.
  4. My huge server is completely swamped by 10,000 people all attempting to listen to music simultaneously.

Recommendation interfaces based around confederations of thin servers increase their potential network issues exponentially, but the state of broadband may well be such that this is a surmountable challenge. Music is nowhere near the bandwidth hog that video is. In exchange for those issues you allow anyone who wants to begin serving music, and you allow individuals access to the greatest source of songs that they are likely to like: their personal collection.



I’ve been on the road the last few days working on various projects and making my way slowly (over a week and a half span) to Boston. Being carless, much of my transport and housing has relied on the generosity of others. As Matt carted me from Cookeville to Bristol, we discussed the problem of “griefers.” Griefers being those inevitable folks who, when presented with a system, will try to break it. Bruce Schneier defends many of their characteristics under the auspices of “hackers”.

We were discussing interesting A/B tests you could run if you had a successful recommender system. He mentioned Joel Spolsky, who writes on building communities in software from time to time. One of the things Joel supposedly does when deleting a post is to delete it for everyone except the original poster. So far as the poster is concerned, nothing has happened.
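The "delete it for everyone except the original poster" trick comes down to a visibility filter. Here is a minimal sketch of that idea, assuming a hypothetical `Post` type with a `hidden` flag; it is not a description of how any real forum software implements it.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    body: str
    hidden: bool = False  # set by a moderator instead of truly deleting


def visible_posts(posts, viewer):
    """Return the posts this viewer should see.

    A hidden post is filtered out for everyone except its own author,
    so the author sees no sign that anything was removed.
    """
    return [p for p in posts if not p.hidden or p.author == viewer]
```

The same filter makes a convenient A/B switch: apply it to one group of moderated posters and use ordinary deletion for the other, then compare how each group behaves afterward.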

The idea of building communities online is one I want to explore and I hadn’t really considered all the axes along which things can vary.

I know that the stuff I write about scientific determinism seems disjoint from the contemplations of recommender systems. The reason I keep coming back to it is that science works by designing systems in terms of fundamental assumptions. If the assumptions are far enough away from the design they can seem disjoint, but I hold they aren’t.

Consider griefers. The traditional way to see them is as a problem to be eliminated. What they actually are is a population that has energy they are willing to devote to the system. As a population there is likely a trend in the motivations.

Building the Matrix

I made a snippy comment the other day to Jenni (my fiancée who’s doing her Ph.D. in public health at Johns Hopkins) about the difficulty of “soft” sciences like history and psychology versus “hard” sciences like physics and chemistry.

In the ensuing amicable discussion, she effectively proved her primary thesis — I was being an ass. ☺

Soft sciences represent the forefront of human knowledge. The models don’t lack precision because the people making them aren’t bright enough to make them. They lack precision because they’re so amazingly complex that the sum total of human knowledge hasn’t given us a precise model.

Consider the process of immunology. Long long ago we ate whatever the hell we wanted if it didn’t smell too funky. After a while people started to recognize a trend that certain things smell ok, but will still kill you. For example trichinosis can affect seemingly healthy pork. So, we come up with a model where God tells us to not eat pigs.

If you follow a religion that still doesn’t dig on swine, I’m cool with that, but most of the Western world has decided God’s ok with it. I blame bacon, how long could we be expected to hold out against that delicious temptation?

Dictator for a Day

I’ve been talking to Matt about our economics situation. He is quite a bit better informed on the issue and the subject of macroeconomics in general than I am.

He put up a post of his actions as dictator of the United States for a day.

Both of us have issues with the distribution of wealth being unequal. Whereas I say take away the ungodly piles of money (who needs to be paid a billion dollars?) to reduce incentives to be greedy, he, reasonably, asks what happens to that money when not concentrated in the hands of a few people.

To some extent there would simply be less money, since much of our problem is that a whole lot of the money we theoretically have currently is fictitious. Bankers created it by managing to sell things at far beyond their actual value. Coming to terms with the reality of how much actual value there is in things is an adjustment that has to be felt somewhere.

I think my favorite of his suggestions has to do with an increase in shareholder rights. It would help dilute the concentration of power in the hands of a few so that even if they were inclined to let their greed get them to do something shady, they can’t.

He also discusses the extent to which the government should do things like bail out Bear Stearns — “privatize profits and publicize losses.” He argues that so long as there’s money to be made someone will come to take their place.

My position is less certain for the same reason that I’m more of a proponent of public policy changes to affect corporate behavior. We are very likely headed into a recession not because of the mortgages or even because of the over leveraging. There has certainly been an over extension beyond real value, but what will make the readjustment painful is the speed at which it is likely to happen.

The reason economists discuss consumer confidence is that business is about making things to be consumed. If people are insecure about needing stuff then businesses get insecure about having bunches of stuff on their shelves. If businesses order less stuff then manufacturing needs fewer people to make stuff and eventually starts laying people off. This cycle slows the economy down in a variety of ways.

Note that this starts only tangentially related to actual money. The economy isn’t simply individual workers fighting for their piece of the pie. Particularly in the financial markets, bad behavior doesn’t just risk the livelihoods of the people making bad decisions. As we are seeing right now, it can put middle class families on the street without a job.

The public has a right to control how corporations are run, not only because corporations only exist because we decided as a society to make them pseudo-people, but because the system is now intertwined to an extent that their actions affect everyone.

Eat the Rich

My dad and I have a long standing argument on what exactly would happen if we had a progressive tax system pitched so steeply it was essentially a salary cap.

I argue the moral position that you have no right to have $3 billion at your disposal while there are people who can’t afford to feed their children. He argues the moral position that a person has the right to their earnings. Our country is predicated on free enterprise and by limiting how much a person can make you restrict the invisible hand of capitalism.

Honestly though, it seems like the invisible hand of capitalism has slowly been throttling our country, particularly in the area of executive compensation. I was reading an article today on similarities between the Enron debacle and Merrill Lynch. Both of them were done with knowledge aforethought by executives to protect their compensation.

The same trends are present in the predatory lending practices and over-leveraging of companies that are currently pulling our economy down. The people doing it weren’t simply confused and thought that these people were going to be able to pay their mortgages or that their companies were going to be able to meet their inflated worth. They expected to make huge amounts of money and took steps to make it happen.

I know some people who are starting work in financial services. I had a friend who worked 12+ hour days six days a week at an unpaid internship. Did she love money management so much that she wanted to do that? No, she liked the field, but would have been more than happy to just do it 40 hours a week. She was investing in a future where she might earn a million dollars a year, and that she is sacrificing her youth to get there will affect how attached she is to making it happen.

Yesterday I was reading Cialdini’s Influence, which discusses how taking an economics class will short-circuit your sense of reciprocity.

Recessions frequently follow periods of expanded growth. The economy takes off for a bit, gets a bit over extended and then ratchets back. This recession is different from past recessions in that though the economy overall did see the expected expansion, middle class buying power actually fell. The group that saw the benefits of this expansion was the top 20% who experienced 9% growth.

The invisible hand of capitalism gives to each according to his worth in the market. Robert Skidelsky in “The Moral Vulnerability of Markets” points out that this theoretically means the average CEO is approximately 50,000% more productive than the average worker.

So, you not only need to believe that a person has the moral right to a huge portion of the available wealth, you also need to believe that this system is working in the face of an increasingly convincing argument that when faced with a choice between the public good and stacks of money, most people are going to choose the cash.

So I say tax the hell out of them. I’m sure that out of the tens of thousands of aspiring young investors we can find at least a couple to work for a paltry $500,000 a year.

Dearest Overlord

Fading are the days when being born the son of a doctor means I’ll be a doctor. Or being born a woman means that I will be a homemaker. The increased freedom is wonderful, but even in this modern world these decisions have consequences and have to be considered in terms of the raw meat that backs behavior: the brain.

From an evolutionary psychology perspective, brains are complex structures with one job: keeping their genetic code in existence. Human brains do this in large part by connecting observations together and recognizing patterns that can be exploited. Maybe I notice that little trees start from around the area where big trees drop nuts. I connect nuts to trees, abstract things out to other types of seeds and, voila, agriculture. This gives me a definite survival advantage over the squirrels that just see trees and nuts.

The problem though is that there’s simply a limit to how much our gray matter can handle. Studies on working memory have fairly consistently limited us to about 7 ± 2 (2.5 bits) of information in our head at a time.

Survivor Bias and Recommendations

I have been discussing the idea of Survivor Bias from Taleb’s The Black Swan.

The basic idea of survivor bias is that we generally abstract the characteristics that make up a set only from the members of that set. The unknown component of the analysis is frequently the extent to which those characteristics were present in elements that didn’t make it into the set.

The example he gave is a researcher who has been tasked with fortifying the planes going out to fight the Nazis in WWII. Simply adding more plating all over will make the planes unreasonably heavy, so he looked at all planes coming back from missions and put plating wherever those planes hadn’t been shot.

It may seem as though you want to take your planes and shore them up in the places they’re getting shot so they’ll be stronger. He realized, though, that he was working with the set of planes that had been shot in places unimportant enough not to take them down.

The basic idea is that your study can’t tell you what caused someone to succeed. It can probably only tell you things that contributed to failure.

And it can only do that if there’s only one factor at play. Imagine that planes shot through one wing have a 5% chance of going down, planes shot through both have a 25% chance of going down and planes shot through both and the tail have a 75% chance of going down. The set of planes you look at are going to have holes in all those places and you might not shore up any one of them. The analysis necessary to see the trends increases exponentially with the number of factors you consider.
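The numbers in that example can be worked through directly. The sketch below uses the three loss probabilities from the paragraph above and adds one assumption that is not in the original: that the same number of planes (100) suffers each damage pattern. The variable and function names are made up for illustration.

```python
# Loss probabilities from the example above, keyed by damage pattern.
loss_probability = {
    "one wing": 0.05,
    "both wings": 0.25,
    "both wings and tail": 0.75,
}

# Assumption (not from the post): the same number of planes suffers each pattern.
PLANES_PER_PATTERN = 100


def expected_survivors(loss_probability, planes_per_pattern):
    """Expected number of planes that make it home for each damage pattern."""
    return {
        pattern: planes_per_pattern * (1 - p)
        for pattern, p in loss_probability.items()
    }


survivors = expected_survivors(loss_probability, PLANES_PER_PATTERN)
total_returned = sum(survivors.values())

# Share of tail damage among all planes hit versus among planes you can inspect.
tail_share_hit = 1 / len(loss_probability)
tail_share_returned = survivors["both wings and tail"] / total_returned

# With n independent hit locations there are 2**n damage patterns to estimate.
patterns_for = {n: 2 ** n for n in (3, 5, 10)}
```

Under that equal-exposure assumption, a third of the planes that were hit took tail damage, but only about 13% of the planes that came home show it. Every returning plane has wing holes, so the wings look like the obvious place to reinforce even though the tail is what brings planes down.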

One of the themes of The Black Swan is that because there are so many factors and our models are so limited there will always be “black swans”: events completely outside of reasonable expectations that change the ways of thinking.

How does this relate to recommendations? Well, I have some info on some songs and the sort of person I think likes them. The cold start problem is what to do when I either have a song I know nothing about or a person I know nothing about. The person is tricky, but with the song I can look at the auditory characteristics of the song and compare them to other songs a person likes. This guess though is going to potentially be skewed because of survivor bias.

People like music for a variety of reasons: the song at my prom, the song I associate with my first infatuation, a song from a particularly awesome party, a song I sang in a choir. These are songs that have to do with the moment rather than the actual musical characteristics of the song. Using those songs to guess the characteristics of what I like musically isn’t going to work.

Pragmatically, this issue encourages a system that asks the question “how likely is someone to dislike this song?” as well as “how likely is someone to like this song?” It also reinforces the drive to incorporate external information sources that can help build a profile of the user that is independent from simply their listening profile.
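Asking both questions could look something like the sketch below. It assumes each listener reaction is recorded as one of three labels ("like", "dislike", "neutral"), and the thresholds are arbitrary placeholders; neither the labels nor the cutoffs come from a real system.

```python
from collections import Counter


def reaction_rates(reactions):
    """Estimate P(like) and P(dislike) from 'like'/'dislike'/'neutral' labels.

    Add-one smoothing keeps a song with very few ratings from looking certain.
    """
    counts = Counter(reactions)
    total = len(reactions) + 3  # one pseudo-count per possible label
    return {
        "like": (counts["like"] + 1) / total,
        "dislike": (counts["dislike"] + 1) / total,
    }


def should_recommend(reactions, min_like=0.5, max_dislike=0.2):
    """Recommend only when liking is probable and disliking is improbable."""
    rates = reaction_rates(reactions)
    return rates["like"] >= min_like and rates["dislike"] <= max_dislike
```

Tracking the two rates separately means a polarizing song (many likes, many dislikes) is treated differently from a song most people merely tolerate, which a single "like" score would hide.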

Blog Recommenders

On the subject of my previous post about economic modeled music recommendations — I think a really good application of this would also be to blogging.

Imagine an app something like Google Reader, but where instead of me manually adding in bunches of feeds by myself, I log in and the program gives me a feed of items I am likely to like.

It’s related to the service that Stumble is doing, but collected in one place and with a more visible data model. Since the barrier to creating blog entries is lower than with music, you’d have a new factor. You’d have your audience, your exemplars and, if the application was popular, the bloggers would start to react as well.

Thinking about it in terms of blogging made me realize an assumption I made about music geeks. I assumed that a music geek would just start to slowly widen their horizons and start to like new genres. That the set of optimal songs for a given genre though would stay the same.

That’s not necessarily the case though. Imagine that as someone expands their musical horizons they begin to recognize good musical form. Peppy but sloppy songs they used to like may fall out of favor. Musicianship doesn’t always correlate to popularity.

Mathematically what this means is that a song isn’t simply a member of a single cluster because depending on the cluster the quality of that song will differ. A song is essentially in every cluster simultaneously to a varying (and frequently very small) amount. It suggests a different method for finding clusters by looking at patterns across axes rather than something like k-centroids that looks at all of them as a whole.
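The "in every cluster simultaneously" idea can be sketched as a soft assignment. The example below is one possible formulation, not the method the post has in mind: it assumes songs and cluster centers are plain numeric feature vectors and turns distances into weights with a softmax. The `sharpness` parameter is a hypothetical knob, and the sketch does not attempt the per-axis pattern search mentioned above.

```python
import math


def soft_membership(song, centroids, sharpness=1.0):
    """Give a song a weight in every cluster instead of assigning it to one.

    Weights are a softmax over negative Euclidean distance, so they sum to 1
    and distant clusters get small but non-zero weights.
    """
    distances = [math.dist(song, c) for c in centroids]
    scores = [math.exp(-sharpness * d) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]
```

A song sitting right on one cluster center still gets a sliver of weight in the others, which leaves room for its perceived quality to differ depending on which cluster a listener is coming from.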

Trading Songs

I’ve been considering a recommender system that attempts to identify individuals that are exemplars of a particular set of musical tastes. The data on clustering suggests that there are definite groupings of preference (which I’m betting map to genres). So, I’m hunting for the prototypical country listener or rap or whatever.

That idea has been done before to some extent. I know I’ve seen something on the idea of creating a sort of composite profile to represent a cluster, but I don’t know if I’ve read anything about attempting to identify an actual person who fits a cluster and then paying particular attention to that individual’s preferences.

In any case, I identify a set of exemplars. How many is a function of the data and how tight the clusters are. With these people I create a system where these exemplars get positive social or emotional or financial feedback for introducing new music to the system. Because their preference patterns are prototypical for the group, my reasoning is that the songs they pick will hit on whatever the key characteristics are and their recommendations will be “better” (as a function of the amount that a random user from the group will like the song).
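Picking an actual person rather than a composite profile could be as simple as finding the user nearest the center of a cluster. The sketch below assumes each user is already assigned to a cluster and described by a numeric preference vector; `find_exemplar` and the data layout are hypothetical, and Euclidean distance is just one reasonable choice of similarity.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length preference vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def find_exemplar(cluster):
    """Return the user whose preferences sit closest to the cluster's centroid.

    `cluster` maps a user name to that user's preference vector.
    """
    center = centroid(list(cluster.values()))
    return min(cluster, key=lambda user: math.dist(cluster[user], center))
```

Returning several of the nearest users instead of one would give the "set of exemplars" described above, with the count tuned to how tight the cluster is.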

Note that a convenient function of this system is that we aren’t trying to model what makes a song good even though our ultimate goal is to pick “good” songs to recommend. We are simply trying to model what makes a set of songs similar in human perception. Even if what actually makes a song good is not captured within the system, if that characteristic correlates to things that we are measuring then we can still get a correct grouping.

It’s a semantic point as much as anything, but it shifts the focus somewhat. It lends credence to the direction that Project Aura is headed because how much someone likes a song isn’t simply a function of the timbral or melodic characteristics. I know several feminists that don’t like rap. Honestly they have some pretty good arguments about misogyny and the objectification of women within that culture. A salient grouping characteristic though between a set of people and whether they like rap or not is going to be if they are feminists. That’s not the only axis, but it’s an example of a grouping that has nothing to do with the music itself that is useful if I am going to recommend a song to someone.
