<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Xblog: Where Is the Collaborative Filtering?</title>
    <link>http://xblog.xman.org/articles/2008/01/30/where-is-the-collaborative-filtering</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>hey, if it has a capital X in it, it has to be great!</description>
    <item>
      <title>Where Is the Collaborative Filtering?</title>
      <description>&lt;p&gt;Rob Malda &lt;a href="http://bits.blogs.nytimes.com/2008/01/29/slashdot-founder-questions-crowds-wisdom/index.html" title="Slashdot Founder Questions Crowd's Wisdom"&gt;recently discussed why Digg, reddit, etc. all stink&lt;/a&gt;. He&amp;#8217;s bang on the money, but this brings to mind the thing that has been driving me batty about these news sites: what&amp;#8217;s going wrong with the &lt;a href="http://en.wikipedia.org/wiki/Collaborative_filtering" title="Wikipedia: Collaborative filtering"&gt;collaborative filtering&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;In theory, collaborative filtering algorithms should effectively work like this: lots of people label different bits of a dataset based on their tastes. The collaborative filtering algorithm chews through all the labels in the dataset and then predicts how you would label other bits of the dataset based on how those whose labels most closely resemble yours have labeled them. When it comes to news, this &lt;em&gt;should&lt;/em&gt; mean the engine selects news items based on what is interesting to other people who usually find the same news interesting as you do. My experience on reddit is that this somehow means that no matter how many Ron Paul articles I rate as totally uninteresting, it still seems to find new Ron Paul articles which reddit believes I will find completely fascinating.&lt;/p&gt;
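
&lt;p&gt;To make that concrete, here is a minimal sketch of user-based collaborative filtering in Python. Everything in it is an assumption made for illustration: the user names, the story ids, the +1/-1 labels, and the choice of cosine similarity over shared items. It is not a description of how reddit or any other site actually works.&lt;/p&gt;

&lt;pre&gt;
```python
from math import sqrt

# Hypothetical labels: +1 means "interesting", -1 means "uninteresting".
ratings = {
    "me": {"ron-paul-1": -1, "ron-paul-2": -1, "haskell-intro": 1},
    "alice": {
        "ron-paul-1": -1, "ron-paul-2": -1, "haskell-intro": 1,
        "ron-paul-3": -1, "ocaml-tips": 1,
    },
    "bob": {
        "ron-paul-1": 1, "ron-paul-2": 1, "haskell-intro": -1,
        "ron-paul-3": 1, "ocaml-tips": -1,
    },
}


def similarity(a, b):
    """Cosine similarity computed only over the items both users labeled."""
    shared = set(a).intersection(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = sqrt(sum(b[item] ** 2 for item in shared))
    if norm_a * norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of the labels other users gave to item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, labels in ratings.items():
        if other == user or item not in labels:
            continue
        weight = similarity(ratings[user], labels)
        weighted_sum += weight * labels[item]
        total_weight += abs(weight)
    return weighted_sum / total_weight if total_weight else 0.0


for story in ("ron-paul-3", "ocaml-tips"):
    print(story, round(predict("me", story, ratings), 2))
```
&lt;/pre&gt;

&lt;p&gt;In this toy data the prediction for the third Ron Paul story comes out negative and the one for the OCaml story comes out positive, because the user whose labels match mine dominates the weighted average and the user who disagrees with me counts as evidence in the opposite direction.&lt;/p&gt;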

&lt;p&gt;Now, I&amp;#8217;m well aware that creative people will find ways to game the system, but frankly, proper collaborative filtering &lt;em&gt;should&lt;/em&gt; make it really hard to game the system so overwhelmingly. If you spam the system much, other people most similar to you start labeling your spam the opposite to how you have, and very quickly your recommendations don&amp;#8217;t impact them any more. The only way you can get back in the game would be to create a new account and quickly try to label a whole bunch of data to get into a trusted position again. This will get increasingly difficult as a community matures, as there will be more and more labeled data for at least the older accounts. I&amp;#8217;ve been on reddit for ages, faithfully labeling articles most every day, and the recommended page is still completely useless.&lt;/p&gt;
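
&lt;p&gt;The spam argument can be sketched the same way. The Python below is a toy model under loudly stated assumptions: a hypothetical spammer first earns trust by agreeing with a reader on four ordinary stories, then promotes one spam link per round while the reader marks each one uninteresting. The agreement measure and the rule that negative agreement is clamped to zero are illustrative choices, not anyone's production algorithm.&lt;/p&gt;

&lt;pre&gt;
```python
def agreement(a, b):
    """Mean product of +1/-1 labels on shared items; ranges from -1 to 1."""
    shared = set(a).intersection(b)
    if not shared:
        return 0.0
    return sum(a[item] * b[item] for item in shared) / len(shared)


def influence(a, b):
    """Weight one user's labels carry for another; disagreement is ignored."""
    return max(agreement(a, b), 0.0)


# Hypothetical accounts: both like the same four ordinary stories at first.
reader = {"story-%d" % n: 1 for n in range(4)}
spammer = dict(reader)

history = []
for k in range(6):
    # Record how much the spammer's labels count for the reader right now.
    history.append(influence(reader, spammer))
    spam = "spam-%d" % k
    spammer[spam] = 1   # the spammer promotes its own link
    reader[spam] = -1   # the reader marks it uninteresting

print([round(weight, 2) for weight in history])
```
&lt;/pre&gt;

&lt;p&gt;The printed weights start at full trust and fall to zero once the reader has rejected as many spam links as the two accounts had agreements, which is the "very quickly your recommendations don't impact them any more" effect in miniature. Winning that trust back means piling up fresh agreements, and the more labeled history there is, the more of them it takes.&lt;/p&gt;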

&lt;p&gt;I&amp;#8217;ve seen comments from the folks behind reddit that describe their collaborative filtering algorithm as &amp;#8220;not working well when there are wide divergences between segments of the community&amp;#8221;, which makes me think it isn&amp;#8217;t collaborative filtering at all, but rather a machine learning algorithm that is trying to predict an overall &amp;#8220;interesting&amp;#8221; score, with a karma system to boost the weights of redditors&amp;#8217; labels. When you &lt;a href="http://reddit.com/help/faq" title="reddit FAQ"&gt;read the FAQ&lt;/a&gt;, this is how it all seems to work.&lt;/p&gt;

&lt;p&gt;So is that the deal? Is the term just horribly abused, and nobody has bothered to put together a proper collaborative filtering news site, or is there some inherent problem with the concept that I&amp;#8217;m missing?&lt;/p&gt;</description>
      <pubDate>Wed, 30 Jan 2008 21:39:00 -0800</pubDate>
      <guid isPermaLink="false">urn:uuid:0f647906-3f0b-475d-95c5-df61e0f70244</guid>
      <author>Christopher Smith</author>
      <link>http://xblog.xman.org/articles/2008/01/30/where-is-the-collaborative-filtering</link>
      <category>Programming</category>
      <category>reddit</category>
      <category>digg</category>
      <category>slashdot</category>
      <category>collaborative</category>
      <category>filtering</category>
    </item>
    <item>
      <title>"Where Is the Collaborative Filtering?" by mx</title>
      <description>&lt;p&gt;Greed, ego, and hubris.&lt;/p&gt;

&lt;p&gt;More people are gaming these systems, either for profit or to stroke their egos, than seems initially obvious.  The draw to game the system is greater than the power of everyone else filtering it.  This seems counter-intuitive at first, but I suspect that&amp;#8217;s just our in-built optimism, hoping our peers are either smarter or less destructive.&lt;/p&gt;

&lt;p&gt;I would be curious to see how few people it takes to unbalance these systems, and whether they can be filtered out algorithmically.&lt;/p&gt;</description>
      <pubDate>Thu, 31 Jan 2008 08:05:42 -0800</pubDate>
      <guid isPermaLink="false">urn:uuid:9df89ec2-8174-4c7b-bf20-397d56d60621</guid>
      <link>http://xblog.xman.org/articles/2008/01/30/where-is-the-collaborative-filtering#comment-323</link>
    </item>
  </channel>
</rss>
