Where Is the Collaborative Filtering?
Rob Malda recently discussed why Digg, reddit, etc. all stink. He’s bang on the money, and it brings to mind the thing that has been driving me batty about these news sites: what’s going wrong with the collaborative filtering?
In theory, collaborative filtering algorithms should work like this: lots of people label different bits of a dataset based on their tastes. The collaborative filtering algorithm chews through all the labels in the dataset and then predicts how you would label other bits of the dataset based on how the people whose labels most closely resemble yours have labeled them. When it comes to news, this should mean the engine selects news items based on what is interesting to other people who usually find the same news interesting as you do. My experience on reddit is that this somehow means that no matter how many Ron Paul articles I rate as totally uninteresting, it still seems to find new Ron Paul articles that reddit believes I will find completely fascinating.
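To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in C. It is not reddit’s algorithm; it assumes votes are stored as +1 (interesting), -1 (uninteresting) or 0 (unrated), and the names `similarity`, `predict`, `N_USERS` and `N_ITEMS` are hypothetical. Similarity is cosine similarity over the stories both users rated, and a prediction is the similarity-weighted average of the votes of positively similar users.

```c
#include <assert.h>

/* Toy dimensions (hypothetical): 4 users, 5 news stories. */
#define N_USERS 4
#define N_ITEMS 5

/* Votes are +1 (interesting), -1 (uninteresting) or 0 (not rated). */

/* Cosine similarity of two users' votes over the stories both rated.
   For +1/-1 votes this reduces to (agreements - disagreements) / overlap. */
double similarity(const int *a, const int *b) {
    int dot = 0, overlap = 0;
    for (int i = 0; i < N_ITEMS; i++) {
        if (a[i] == 0 || b[i] == 0) continue; /* skip stories not co-rated */
        dot += a[i] * b[i];
        overlap++;
    }
    return overlap > 0 ? (double)dot / overlap : 0.0; /* no overlap, no evidence */
}

/* Predict how `user` would vote on `item` as the similarity-weighted average
   of the votes cast by users with positive similarity to `user`.
   Returns 0.0 when no like-minded user has voted on the item. */
double predict(int votes[N_USERS][N_ITEMS], int user, int item) {
    assert(user >= 0 && user < N_USERS && item >= 0 && item < N_ITEMS);
    double num = 0.0, den = 0.0;
    for (int other = 0; other < N_USERS; other++) {
        if (other == user || votes[other][item] == 0) continue;
        double s = similarity(votes[user], votes[other]);
        if (s <= 0.0) continue; /* users with opposite tastes get no say */
        num += s * votes[other][item];
        den += s;
    }
    return den > 0.0 ? num / den : 0.0;
}
```

With a toy matrix in which users 1 and 2 vote exactly like user 0 on the stories they share, and user 3 votes the opposite way on everything, `predict(votes, 0, 4)` returns -1.0 for story 4 (say, yet another Ron Paul article): both like-minded users rejected it, so it should never reach user 0’s recommended page, however much user 3 likes it.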
Now, I’m well aware that creative people will find ways to game the system, but frankly, proper collaborative filtering should make it really hard to game it so overwhelmingly. If you spam the system much, the people most similar to you start labeling your spam the opposite way from how you labeled it, and very quickly your labels stop influencing their recommendations. The only way back into the game would be to create a new account and quickly label a whole bunch of data to get into a trusted position again. This will get increasingly difficult as a community matures, since there will be more and more labeled data, at least for the older accounts. I’ve been on reddit for ages, faithfully labeling articles almost every day, and the recommended page is still completely useless.
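The spam-resistance argument can be sketched the same way. The function below is a hypothetical per-neighbour weight, not something reddit is known to use: agreement is measured over the stories both users have rated, a neighbour whose votes contradict yours gets zero weight, and the weight is shrunk by `overlap / (overlap + SHRINK)` so that an account with only a few shared labels carries little influence. That shrinkage term is a common heuristic swapped in here as an assumption; `SHRINK` and `neighbour_weight` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define N_ITEMS 8
/* Hypothetical shrinkage constant: a perfectly agreeing neighbour needs
   SHRINK co-rated stories before it earns half of the maximum weight. */
#define SHRINK 5.0

/* Weight that `other`'s votes carry in `me`'s recommendations.
   Votes are +1, -1 or 0 (not rated). */
double neighbour_weight(const int *me, const int *other) {
    assert(me != NULL && other != NULL);
    int dot = 0, overlap = 0;
    for (int i = 0; i < N_ITEMS; i++) {
        if (me[i] == 0 || other[i] == 0) continue; /* only co-rated stories */
        dot += me[i] * other[i];
        overlap++;
    }
    if (overlap == 0) return 0.0;               /* brand-new account: no say */
    double agreement = (double)dot / overlap;   /* in [-1, 1] */
    if (agreement <= 0.0) return 0.0;           /* contradicted spammer: no say */
    return agreement * overlap / (overlap + SHRINK); /* few shared labels: little say */
}
```

With eight stories, a long-time user who agrees with you on all of them gets a weight of 8/13 (about 0.62), an account that agrees on only two gets 2/7 (about 0.29), and a spammer whose votes contradict yours gets 0. As more stories accumulate, the veteran’s weight approaches 1 while a fresh account has to label many stories before catching up, which is why gaming should get harder as the community matures.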
I’ve seen comments from the folks behind reddit that describe their collaborative filtering algorithm as “not working well when there are wide divergences between segments of the community”, which makes me think it isn’t collaborative filtering at all, but rather a machine learning algorithm that tries to predict an overall “interesting” score, with a karma system to boost the weights of redditors’ labels. Reading the FAQ, that seems to be how it all works.
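For contrast, here is a rough sketch of the kind of scheme described above: a single karma-weighted score per story. This is an assumed model for illustration, not reddit’s actual code, and `global_score` and its parameters are hypothetical.

```c
#include <assert.h>

/* Karma-weighted score for one story (hypothetical model).
   votes[v] is +1, -1 or 0 for voter v; karma[v] is that voter's weight. */
double global_score(const int *votes, const double *karma, int n_voters) {
    assert(n_voters >= 0);
    double score = 0.0;
    for (int v = 0; v < n_voters; v++)
        score += karma[v] * votes[v]; /* every vote pushes one shared total */
    return score;
}
```

Note that there is no reader argument: everyone sees the same score. If three users with karma 10 vote a story up and you (karma 5) vote it down, its score is 25 for everyone, including you, which is exactly the failure you would expect “when there are wide divergences between segments of the community”.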
So is that the deal? Is the term just horribly abused, with nobody having bothered to put together a proper collaborative filtering news site, or is there some inherent problem with the concept that I’m missing?
Hell Hath No Fury Like the President of the State Chapter of a SIG Whose Endorsement Is Different From That of a High Profile Senator From Another State
In case you’ve had your head in the sand for the last 24 hours, Ted Kennedy announced his support for Barack Obama for President yesterday in what will likely be the biggest endorsement (aside, of course, from when Chuck Norris moved the earth) of this Presidential campaign, unless the Pope decides to endorse Hillary Clinton or Osama bin Laden endorses Rudy Giuliani.
Despite today’s counter-endorsement from another chunk of the Kennedy clan, the ripples from this one will probably continue at least through Super Tuesday. One of the seemingly more curious responses was this tepid statement from NOW’s President Kim Gandy (doesn’t that last paragraph almost read like Sen. Kennedy is a woman? ;-). That only seems curious, though, if you haven’t read this press release from Marcia Pappas, President of the New York State chapter of NOW, who apparently feels that if you aren’t for Hillary, you are just taking women’s money but ignoring women and children when talking about “…poverty or human needs or America’s future or whatever…”. I’m sure Ms. Gandy called Ms. Pappas to thank her for such a dignified response that in no way plays to negative stereotypes about women.
Shouting "Nuke!" In a Crowded Theatre
Two interesting takes on how to deal with threats. Folks at Purdue University want to equip every cell phone with what is effectively a Geiger counter. Think of it as “security through ubiquity”. Meanwhile, the NYPD wants to carefully restrict the distribution of Geiger counters and related devices for fear of false alarms and the like.
Honestly, I think there is wisdom in both approaches. Probably the sanest thing is to have sensors everywhere and to learn not to overreact to false positives… only people aren’t so good at that last part.
Don't Trust Anyone Under 50
Ugh, I don’t know where to start with this one. Let’s see, we’ve got the Feds and the States going head to head on a security measure. We’ve got a security measure that isn’t much of a security measure for its supposed target (does anyone think the problem was that we didn’t have people’s IDs?). I think the thing that takes the cake, though, is the temporary exemption for folks over 50 (“because they aren’t as likely to be a terrorist, illegal immigrant or con artist”). Yes, the generation that coined the phrase “we don’t trust anybody over 30” now basically doesn’t trust anyone who wasn’t already alive back then. It’s hard to argue with logic that says you are unlikely to find people interested in suicide missions among a group that includes euthanasia activists and people with limited life expectancy.
Is anyone else having visions of pigs marching around saying, “four legs good, two legs better”? At what point do we declare that the baby boomers have not only abandoned the causes of their youth but, in the most profoundly hypocritical manner, rejected them and literally become the forces they were fighting?
Steal This Wi-Fi
It’s always cool when you are doing something that people feel is unconventional, and then you discover that one of the more respected minds out there basically thinks the same way. This was my happy discovery today as I read Bruce Schneier’s Steal This Wi-Fi, having gone through pretty much exactly the same thought process myself. I still find it truly bizarre to think of access to the Internet as a gatekeeper of sorts. The bottom line is that getting onto the Internet is trivially easy, even if your name is Osama bin Laden. The network is just too large and too unregulated. The trick is limiting what unauthorized people can do once they are on the network.
6.001 007'ed
I have been swamped lately, so I haven’t blogged much, but I couldn’t let the passing of MIT’s 6.001 go by without comment. When I was at MIT, this course was generally considered the one that separated the men (or women ;-) from the boys, and of course SICP is the stuff of legend. Anyway, it is clearly the end of an era at MIT, but if you missed it, you can still download the videos. The course is also still available through MIT OpenCourseWare (I’ll find the link later).
As If Programmers Needed Another Reason to Think They Are God....
This is what happens when a programmer starts to dabble in biology. Keep this in mind the next time you meet someone with a bioinformatics background. Even if they aren’t dangerous, they probably think they are, and that’s reason enough to give them a wide berth.
I’m pretty sure one day we’re going to really understand DNA/RNA/etc., and then we’ll find a segment that translates to:
```c
// Worst hack ever!
// This doesn't actually make a rational decision, but instead just does what_I_want_to_do
// without even looking at relevant_factors.
// Returns a rationalization for what_I_want_to_do.
struct rationalization make_decision(struct decision what_I_want_to_do, struct factors* relevant_factors);
```
Now Someone Needs To Make a Language Called Mongoose
Chuck Esterbrook recently shared with the Kernel Panic Programming Study Group a new programming language he’s developing called Cobra. I haven’t had a chance to look at it closely yet, but I like that his goals are so eminently practical. He gets bonus points for adopting Python’s “whitespace is part of the syntax” principle and then one-upping it by insisting that one always use tabs, never spaces, for indenting blocks. That alone is probably going to convince me to use it.
Ruby Going Off The Rails And Into A Deep, Gaping Chasm
Rants can be fun. There is the garden-variety rant where someone just blows off some steam. Then there is the next level up, where you try to turn indignation (real or otherwise) into an art form. However, the truly great rants are built upon white-hot anger that’s been quietly simmering, with new spices of hatred periodically added to taste, all unleashed in a vengeful strike of unfiltered fury so over the top it crosses over into self-parody.
Honestly, I can’t say that reading it is particularly enlightening, but it is quite entertaining in a “damn, that is the most awesome accident I’ve ever seen” kind of way.