<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: To make smarter systems, it’s all about the data</title>
	<atom:link href="http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/</link>
	<description></description>
	<lastBuildDate>Tue, 22 May 2012 10:52:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: The Trendrr Blog &#187; Blog Archive &#187; Facts Before Feelings</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-11220</link>
		<dc:creator>The Trendrr Blog &#187; Blog Archive &#187; Facts Before Feelings</dc:creator>
		<pubDate>Tue, 05 Oct 2010 15:33:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-11220</guid>
		<description>[...] Technology is always commoditized at some point.&#160;&#160; As C. Dixon points out in a recent blog post titled: &#160; To make smarter systems, it&#8217;s all about the [...]</description>
		<content:encoded><![CDATA[<p>[...] Technology is always commoditized at some point.&nbsp;&nbsp; As C. Dixon points out in a recent blog post titled: &nbsp; To make smarter systems, it&#8217;s all about the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Why Facebook's Open Graph Will Fall Short In Converting "Social Proof" Sales &#124; rsedmak</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-9762</link>
		<dc:creator>Why Facebook's Open Graph Will Fall Short In Converting "Social Proof" Sales &#124; rsedmak</dc:creator>
		<pubDate>Fri, 23 Jul 2010 14:14:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-9762</guid>
		<description>[...] predict your tastes and influence your future purchasing behavior.  (As Chris Dixon points out in a blog post from last year, systems get smarter not by inventing new algorithms but by creating new sources of data). [...]</description>
		<content:encoded><![CDATA[<p>[...] predict your tastes and influence your future purchasing behavior.  (As Chris Dixon points out in a blog post from last year, systems get smarter not by inventing new algorithms but by creating new sources of data). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: AI should stand for Aggregated Intelligence &#171; A Work in Progress</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-5261</link>
		<dc:creator>AI should stand for Aggregated Intelligence &#171; A Work in Progress</dc:creator>
		<pubDate>Tue, 22 Dec 2009 18:31:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-5261</guid>
		<description>[...] To make smarter systems, it&#8217;s all about the data (cdixon.org) [...]</description>
		<content:encoded><![CDATA[<p>[...] To make smarter systems, it&#8217;s all about the data (cdixon.org) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Most popular posts &#124; Igniting Startups - nPost</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-4920</link>
		<dc:creator>Most popular posts &#124; Igniting Startups - nPost</dc:creator>
		<pubDate>Wed, 02 Dec 2009 15:21:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-4920</guid>
		<description>[...] To make smarter systems, it’s all about the data link [...]</description>
		<content:encoded><![CDATA[<p>[...] To make smarter systems, it’s all about the data link [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Essel</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-5811</link>
		<dc:creator>Mark Essel</dc:creator>
		<pubDate>Mon, 30 Nov 2009 01:42:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-5811</guid>
		<description>I&#039;d like to add an orthogonal viewpoint here. Yes, there&#039;s tremendous value in building databases. But it is in leveraging existing processing techniques that great value explosions will occur. &lt;br&gt;&lt;br&gt;Take for instance the coupling of a dozen software services that all independently produce improved customer purchase action. Combining these independent sources in a novel way can give a multiplicative value add of 1.1^12, or roughly a factor of 3 improvement. Now expand this to hundreds or even thousands of independent techniques or connections within a network and you can reveal massive improvements in quality.&lt;br&gt;&lt;br&gt;A brain with only a couple of nodes is pretty weak. A mind with billions of nodes and hundreds of billions of connections is capable of advanced conscious connections, creativity, and unpredictable value advancement. The potent software applications of the future will exist and thrive by utilizing the network of APIs optimally in the construction of their databases and decision architectures.&lt;br&gt;&lt;br&gt;If your curiosity is piqued but you&#039;re still not convinced, check out some of Kevin Kelly&#039;s swarm and emergence concepts. I really enjoy some of his far-out predictions, and they&#039;re closer than most folks would guess (in the 10-20 year horizon).&lt;br&gt;&lt;br&gt;10th popular post, and I give it a 7/10 only because you didn&#039;t tie in network effects.</description>
		<content:encoded><![CDATA[<p>I&#39;d like to add an orthogonal viewpoint here. Yes, there&#39;s tremendous value in building databases. But it is in leveraging existing processing techniques that great value explosions will occur. </p>
<p>Take for instance the coupling of a dozen software services that all independently produce improved customer purchase action. Combining these independent sources in a novel way can give a multiplicative value add of 1.1^12, or roughly a factor of 3 improvement. Now expand this to hundreds or even thousands of independent techniques or connections within a network and you can reveal massive improvements in quality.</p>
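<p>A quick check of the arithmetic above, assuming, as the comment does, that each of the twelve services independently contributes a 10% lift and that the lifts multiply rather than add. The function names are illustrative only.</p>

```python
# Sketch: compounding independent multiplicative lifts (assumed model).

def combined_lift(per_service_lift: float, n_services: int) -> float:
    """Overall multiplier when n independent lifts compound multiplicatively."""
    return (1.0 + per_service_lift) ** n_services


def additive_lift(per_service_lift: float, n_services: int) -> float:
    """Overall multiplier if the same lifts simply added up instead."""
    return 1.0 + per_service_lift * n_services


if __name__ == "__main__":
    print(f"multiplicative: {combined_lift(0.10, 12):.3f}")  # 1.1^12, about 3.14
    print(f"additive:       {additive_lift(0.10, 12):.3f}")
```

<p>If the same twelve 10% lifts merely added, the total would be 2.2x rather than roughly 3.14x; the compounding only holds if the services really are independent.</p>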
<p>A brain with only a couple of nodes is pretty weak. A mind with billions of nodes and hundreds of billions of connections is capable of advanced conscious connections, creativity, and unpredictable value advancement. The potent software applications of the future will exist and thrive by utilizing the network of APIs optimally in the construction of their databases and decision architectures.</p>
<p>If your curiosity is piqued but you&#39;re still not convinced, check out some of Kevin Kelly&#39;s swarm and emergence concepts. I really enjoy some of his far-out predictions, and they&#39;re closer than most folks would guess (in the 10-20 year horizon).</p>
<p>10th popular post, and I give it a 7/10 only because you didn&#39;t tie in network effects.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Essel</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-4890</link>
		<dc:creator>Mark Essel</dc:creator>
		<pubDate>Sun, 29 Nov 2009 17:42:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-4890</guid>
		<description>I&#039;d like to add an orthogonal viewpoint here. Yes, there&#039;s tremendous value in building databases. But it is in leveraging existing processing techniques that great value explosions will occur. &lt;br&gt;&lt;br&gt;Take for instance the coupling of a dozen software services that all independently produce improved customer purchase action. Combining these independent sources in a novel way can give a multiplicative value add of 1.1^12, or roughly a factor of 3 improvement. Now expand this to hundreds or even thousands of independent techniques or connections within a network and you can reveal massive improvements in quality.&lt;br&gt;&lt;br&gt;A brain with only a couple of nodes is pretty weak. A mind with billions of nodes and hundreds of billions of connections is capable of advanced conscious connections, creativity, and unpredictable value advancement. The potent software applications of the future will exist and thrive by utilizing the network of APIs optimally in the construction of their databases and decision architectures.&lt;br&gt;&lt;br&gt;If your curiosity is piqued but you&#039;re still not convinced, check out some of Kevin Kelly&#039;s swarm and emergence concepts. I really enjoy some of his far-out predictions, and they&#039;re closer than most folks would guess (in the 10-20 year horizon).</description>
		<content:encoded><![CDATA[<p>I&#39;d like to add an orthogonal viewpoint here. Yes, there&#39;s tremendous value in building databases. But it is in leveraging existing processing techniques that great value explosions will occur. </p>
<p>Take for instance the coupling of a dozen software services that all independently produce improved customer purchase action. Combining these independent sources in a novel way can give a multiplicative value add of 1.1^12, or roughly a factor of 3 improvement. Now expand this to hundreds or even thousands of independent techniques or connections within a network and you can reveal massive improvements in quality.</p>
<p>A brain with only a couple of nodes is pretty weak. A mind with billions of nodes and hundreds of billions of connections is capable of advanced conscious connections, creativity, and unpredictable value advancement. The potent software applications of the future will exist and thrive by utilizing the network of APIs optimally in the construction of their databases and decision architectures.</p>
<p>If your curiosity is piqued but you&#39;re still not convinced, check out some of Kevin Kelly&#39;s swarm and emergence concepts. I really enjoy some of his far-out predictions, and they&#39;re closer than most folks would guess (in the 10-20 year horizon).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: xamat</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-4480</link>
		<dc:creator>xamat</dc:creator>
		<pubDate>Tue, 27 Oct 2009 21:26:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-4480</guid>
		<description>Quite a coincidence. I recently gave a talk with the same title. See my blog &lt;a href=&quot;http://technocalifornia.blogspot.com/2009/07/its-all-about-data.html&quot; rel=&quot;nofollow&quot;&gt;http://technocalifornia.blogspot.com/2009/07/it...&lt;/a&gt; or my slides on slideshare &lt;a href=&quot;http://www.slideshare.net/xamat/its-all-about-the-data&quot; rel=&quot;nofollow&quot;&gt;http://www.slideshare.net/xamat/its-all-about-t...&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Quite a coincidence. I recently gave a talk with the same title. See my blog <a href="http://technocalifornia.blogspot.com/2009/07/its-all-about-data.html" rel="nofollow">http://technocalifornia.blogspot.com/2009/07/it&#8230;</a> or my slides on slideshare <a href="http://www.slideshare.net/xamat/its-all-about-the-data" rel="nofollow">http://www.slideshare.net/xamat/its-all-about-t&#8230;</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: INDEX // mb - Against Forecasting: A Case for More Agility in Book Publishing</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-3747</link>
		<dc:creator>INDEX // mb - Against Forecasting: A Case for More Agility in Book Publishing</dc:creator>
		<pubDate>Mon, 05 Oct 2009 04:43:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-3747</guid>
		<description>[...] Chris Dixon, in a great post on commoditized forecasting algorithms points out that studies have shown that naive bayes is as good or better than fancy algorithms in a [...]</description>
		<content:encoded><![CDATA[<p>[...] Chris Dixon, in a great post on commoditized forecasting algorithms points out that studies have shown that naive bayes is as good or better than fancy algorithms in a [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: chris dixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-2024</link>
		<dc:creator>chris dixon</dc:creator>
		<pubDate>Thu, 03 Sep 2009 19:47:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-2024</guid>
		<description>Thanks - looks very interesting - I&#039;ll check it out.</description>
		<content:encoded><![CDATA[<p>Thanks &#8211; looks very interesting &#8211; I&#39;ll check it out.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: terrycojones</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1974</link>
		<dc:creator>terrycojones</dc:creator>
		<pubDate>Wed, 02 Sep 2009 19:25:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1974</guid>
		<description>Actually, most of &lt;a href=&quot;http://blogs.fluidinfo.com/terry/category/representation/&quot; rel=&quot;nofollow&quot;&gt;http://blogs.fluidinfo.com/terry/category/repre...&lt;/a&gt; is relevant to this.</description>
		<content:encoded><![CDATA[<p>Actually, most of <a href="http://blogs.fluidinfo.com/terry/category/representation/" rel="nofollow">http://blogs.fluidinfo.com/terry/category/repre&#8230;</a> is relevant to this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: terrycojones</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1971</link>
		<dc:creator>terrycojones</dc:creator>
		<pubDate>Wed, 02 Sep 2009 19:18:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1971</guid>
		<description>Hi Chris&lt;br&gt;&lt;br&gt;I agree. You might like to check out FluidDB, which is all about using a better data representation to change how we work with information. See &lt;a href=&quot;http://fluidinfo.com&quot; rel=&quot;nofollow&quot;&gt;http://fluidinfo.com&lt;/a&gt; and &lt;a href=&quot;http://blogs.fluidinfo.com/fluidDB&quot; rel=&quot;nofollow&quot;&gt;http://blogs.fuidinfo.com/fluidDB&lt;/a&gt;&lt;br&gt;&lt;br&gt;I&#039;ve also written about this exact subject a few times. A starting link:&lt;br&gt;&lt;a href=&quot;http://blogs.fluidinfo.com/terry/2007/03/19/why-data-information-representation-is-the-key-to-the-coming-semantic-web/&quot; rel=&quot;nofollow&quot;&gt;http://blogs.fluidinfo.com/terry/2007/03/19/why...&lt;/a&gt;&lt;br&gt;&lt;br&gt;And I talk a little about Alex Wright at &lt;a href=&quot;http://blogs.fluidinfo.com/terry/2008/01/04/tagging-in-the-year-3000-bc/&quot; rel=&quot;nofollow&quot;&gt;http://blogs.fluidinfo.com/terry/2008/01/04/tag...&lt;/a&gt;&lt;br&gt;&lt;br&gt;Please feel free to get in touch, I&#039;m terry fluidinfo com and would be happy to go into more depth / hear more from you, etc.</description>
		<content:encoded><![CDATA[<p>Hi Chris</p>
<p>I agree. You might like to check out FluidDB, which is all about using a better data representation to change how we work with information. See <a href="http://fluidinfo.com" rel="nofollow">http://fluidinfo.com</a> and <a href="http://blogs.fluidinfo.com/fluidDB" rel="nofollow">http://blogs.fuidinfo.com/fluidDB</a></p>
<p>I&#39;ve also written about this exact subject a few times. A starting link:<br /><a href="http://blogs.fluidinfo.com/terry/2007/03/19/why-data-information-representation-is-the-key-to-the-coming-semantic-web/" rel="nofollow">http://blogs.fluidinfo.com/terry/2007/03/19/why&#8230;</a></p>
<p>And I talk a little about Alex Wright at <a href="http://blogs.fluidinfo.com/terry/2008/01/04/tagging-in-the-year-3000-bc/" rel="nofollow">http://blogs.fluidinfo.com/terry/2008/01/04/tag&#8230;</a></p>
<p>Please feel free to get in touch, I&#39;m terry fluidinfo com and would be happy to go into more depth / hear more from you, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stalking people to structure conversations. &#124; Taylor Davidson</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1951</link>
		<dc:creator>Stalking people to structure conversations. &#124; Taylor Davidson</dc:creator>
		<pubDate>Wed, 02 Sep 2009 09:53:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1951</guid>
		<description>[...] efficiency and serendipity is where the fun (and the business opportunities) lie in compressing better data. [...]</description>
		<content:encoded><![CDATA[<p>[...] efficiency and serendipity is where the fun (and the business opportunities) lie in compressing better data. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1930</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Tue, 01 Sep 2009 20:41:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1930</guid>
		<description>Actually, I was driving at something different than having humans label the data.  What I was talking about was adjusting the machine learning, so that domain-specific knowledge is built into the learning algorithm.  &lt;br&gt;&lt;br&gt;For example, I had some work a few years ago that used HMMs to label chord information on a set of Beatles tunes.  However, instead of using massive amounts of training data on the HMMs, I used zero training data.  Zero.  Instead, I initialized the HMM using musicologically-sensible initial conditions, and then I adjusted the standard HMM E-M algorithm so that the [B] output probability matrix did NOT get updated; it stayed stable.  &lt;br&gt;&lt;br&gt;All of the intelligence was in the algorithm.  There was no human labeled data.  And the algorithm performed best in class -- better than solutions that had been trained on lots of human-labeled data.&lt;br&gt;&lt;br&gt;So to say we can&#039;t make smarter algorithms, or that breakthroughs will only come via data, simply doesn&#039;t sit right with me.  &lt;br&gt;&lt;br&gt;I think what people tend to mean when they say &quot;it&#039;s all about the data&quot; is that there is no point in coming up with better general purpose machine learning algorithms, i.e. SVMs vs. Gaussian mixture models vs. Markov random fields vs. whatever.  If that&#039;s your main point, then I agree.&lt;br&gt;&lt;br&gt;But we can also create more intelligent, specialized algorithms by building our own smarts into a general purpose ML algorithm, thereby making the algorithm smarter.  And we can do it without the need for massive amounts of data.  Again, imho.&lt;br&gt;&lt;br&gt;So I just don&#039;t buy that</description>
		<content:encoded><![CDATA[<p>Actually, I was driving at something different than having humans label the data.  What I was talking about was adjusting the machine learning, so that domain-specific knowledge is built into the learning algorithm.  </p>
<p>For example, I had some work a few years ago that used HMMs to label chord information on a set of Beatles tunes.  However, instead of using massive amounts of training data on the HMMs, I used zero training data.  Zero.  Instead, I initialized the HMM using musicologically-sensible initial conditions, and then I adjusted the standard HMM E-M algorithm so that the [B] output probability matrix did NOT get updated; it stayed stable.  </p>
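<p>A minimal sketch of the idea described above: one Baum-Welch (EM) iteration for a discrete HMM in which the initial distribution <code>pi</code> and transition matrix <code>A</code> are re-estimated while the emission matrix <code>B</code> is held fixed. The two-state toy model, its probabilities, and the function names are hypothetical; this is not the commenter's chord-labeling code.</p>

```python
# Sketch: Baum-Welch (EM) for a discrete HMM where the emission matrix B is frozen.
# Hypothetical toy model; only the initial distribution pi and transitions A are learned.
import math
from typing import List, Tuple

Matrix = List[List[float]]


def forward_backward(obs: List[int], pi: List[float], A: Matrix, B: Matrix
                     ) -> Tuple[Matrix, Matrix, float]:
    """Scaled forward-backward pass.

    Returns state posteriors gamma[t][i], expected transition counts
    xi_sum[i][j] (summed over t), and the log-likelihood of obs.
    """
    n, T = len(pi), len(obs)
    alpha: Matrix = []
    scale: List[float] = []
    for t in range(T):
        if t == 0:
            a = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            a = [sum(alpha[t - 1][j] * A[j][i] for j in range(n)) * B[i][obs[t]]
                 for i in range(n)]
        c = sum(a)
        alpha.append([x / c for x in a])  # normalized forward variables
        scale.append(c)

    beta: Matrix = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]

    # With this scaling, alpha[t][i] * beta[t][i] is already the posterior.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi_sum = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                xi_sum[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                 * beta[t + 1][j]) / scale[t + 1]
    return gamma, xi_sum, sum(math.log(c) for c in scale)


def em_step_fixed_b(obs: List[int], pi: List[float], A: Matrix, B: Matrix
                    ) -> Tuple[List[float], Matrix, Matrix]:
    """One EM iteration: re-estimate pi and A, leave B untouched."""
    gamma, xi_sum, _ = forward_backward(obs, pi, A, B)
    new_pi = gamma[0][:]
    new_A = []
    for i, row in enumerate(xi_sum):
        total = sum(row)  # expected number of transitions out of state i
        new_A.append([x / total for x in row] if total > 0 else A[i][:])
    return new_pi, new_A, B  # B is deliberately not updated


if __name__ == "__main__":
    pi = [0.5, 0.5]
    A = [[0.7, 0.3], [0.3, 0.7]]
    B = [[0.9, 0.1], [0.2, 0.8]]  # hand-set "domain knowledge" emissions
    obs = [0, 0, 1, 1, 1, 0, 0, 1]
    for _ in range(10):
        pi, A, B = em_step_fixed_b(obs, pi, A, B)
    print("A =", [[round(x, 3) for x in row] for row in A])
    print("log-likelihood =", round(forward_backward(obs, pi, A, B)[2], 4))
```

<p>Because B never changes, whatever prior knowledge is encoded in it keeps constraining what the hidden states mean, and EM only learns how those states follow one another.</p>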
<p>All of the intelligence was in the algorithm.  There was no human labeled data.  And the algorithm performed best in class &#8212; better than solutions that had been trained on lots of human-labeled data.</p>
<p>So to say we can&#39;t make smarter algorithms, or that breakthroughs will only come via data, simply doesn&#39;t sit right with me.  </p>
<p>I think what people tend to mean when they say &#8220;it&#39;s all about the data&#8221; is that there is no point in coming up with better general purpose machine learning algorithms, i.e. SVMs vs. Gaussian mixture models vs. Markov random fields vs. whatever.  If that&#39;s your main point, then I agree.</p>
<p>But we can also create more intelligent, specialized algorithms by building our own smarts into a general purpose ML algorithm, thereby making the algorithm smarter.  And we can do it without the need for massive amounts of data.  Again, imho.</p>
<p>So I just don&#39;t buy that</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: chris dixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1917</link>
		<dc:creator>chris dixon</dc:creator>
		<pubDate>Tue, 01 Sep 2009 16:03:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1917</guid>
		<description>Good points jeremy.  I think this is also where the line between data and algorithms starts to blur.  If you are doing, say, vertical music search, a lot of your &quot;algorithm&quot; will come from ML on music-related corpora, which might include having humans label the data at various points.</description>
		<content:encoded><![CDATA[<p>Good points jeremy.  I think this is also where the line between data and algorithms starts to blur.  If you are doing, say, vertical music search, a lot of your &#8220;algorithm&#8221; will come from ML on music-related corpora, which might include having humans label the data at various points.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1913</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Tue, 01 Sep 2009 12:50:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1913</guid>
		<description>&lt;i&gt;think the reason what you say works is precisely because the domain becomes small enough that you can do all sorts of things to fill in the gaps in the data.&lt;/i&gt;  &lt;br&gt;&lt;br&gt;But isn&#039;t that the point?  At the end of the day, no matter what the reason, you resorted to clever algorithms, rather than large data.  So the thing about data being the only good source of future AI breakthroughs just ain&#039;t true.  Relying on large data isn&#039;t &quot;wrong&quot;.  It&#039;s just not the universal panacea that you make it out to be.&lt;br&gt;&lt;br&gt;The way I see it, there is a small head of large-scale, breadth-loving domains (e.g. web) and/or tasks (e.g. known-item finding, rather than exploratory search) in which large data is very appropriate.  &lt;br&gt;&lt;br&gt;At the same time, there is a long tail of medium and small-scale, depth-loving domains (e.g. content-based music search) and tasks (e.g. exploratory search) in which large data does not give you as much as an intelligently-constructed algorithm.&lt;br&gt;&lt;br&gt;So what if the only reason you can construct those algorithms is because the domain is well-enough constrained?  We know from power-law distributions that the volume (usage, whatever) of tasks and problems in the tail sums up to be equal in magnitude to the head.  &lt;br&gt;&lt;br&gt;So at the most, you can say that large data will help you make AI breakthroughs in *half* of the open problems.  Intelligent algorithms will still be necessary for the other half.  imho.</description>
		<content:encoded><![CDATA[<p><i>think the reason what you say works is precisely because the domain becomes small enough that you can do all sorts of things to fill in the gaps in the data.</i>  </p>
<p>But isn&#39;t that the point?  At the end of the day, no matter what the reason, you resorted to clever algorithms, rather than large data.  So the thing about data being the only good source of future AI breakthroughs just ain&#39;t true.  Relying on large data isn&#39;t &#8220;wrong&#8221;.  It&#39;s just not the universal panacea that you make it out to be.</p>
<p>The way I see it, there is a small head of large-scale, breadth-loving domains (e.g. web) and/or tasks (e.g. known-item finding, rather than exploratory search) in which large data is very appropriate.  </p>
<p>At the same time, there is a long tail of medium and small-scale, depth-loving domains (e.g. content-based music search) and tasks (e.g. exploratory search) in which large data does not give you as much as an intelligently-constructed algorithm.</p>
<p>So what if the only reason you can construct those algorithms is because the domain is well-enough constrained?  We know from power-law distributions that the volume (usage, whatever) of tasks and problems in the tail sums up to be equal in magnitude to the head.  </p>
<p>So at the most, you can say that large data will help you make AI breakthroughs in *half* of the open problems.  Intelligent algorithms will still be necessary for the other half.  imho.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1911</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Tue, 01 Sep 2009 12:40:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1911</guid>
		<description>Sure, MS now has all this Yahoo data.  And Google has plenty of data too.  But what is that data?  It&#039;s known-item, factoid retrieval data.  There is no exploratory search data in there.  There is no recall-oriented search data there.  So the only way the data can be used is to improve known-item oriented searching.  But that in turn feeds back on itself: when Bing gets better at known-item searching, more people will use it for known-item searching, and then the data they collect pigeonholes them further into that one, narrow, Google-like information seeking behavior.&lt;br&gt;&lt;br&gt;So it seems to me that the only way out of the constraints imposed on Bing search is for Bing to come up with cleverer algorithms that do something different and better, despite the gradient the data is pointing it toward.  &lt;br&gt;&lt;br&gt;If one relies on the data alone, one will not solve a very large range of AI problems.  Intelligent algorithms are needed to make those breakthroughs.</description>
		<content:encoded><![CDATA[<p>Sure, MS now has all this Yahoo data.  And Google has plenty of data too.  But what is that data?  It&#39;s known-item, factoid retrieval data.  There is no exploratory search data in there.  There is no recall-oriented search data there.  So the only way the data can be used is to improve known-item oriented searching.  But that in turn feeds back on itself: when Bing gets better at known-item searching, more people will use it for known-item searching, and then the data they collect pigeonholes them further into that one, narrow, Google-like information seeking behavior.</p>
<p>So it seems to me that the only way out of the constraints imposed on Bing search is for Bing to come up with cleverer algorithms that do something different and better, despite the gradient the data is pointing it toward.  </p>
<p>If one relies on the data alone, one will not solve a very large range of AI problems.  Intelligent algorithms are needed to make those breakthroughs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: chris dixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1902</link>
		<dc:creator>chris dixon</dc:creator>
		<pubDate>Tue, 01 Sep 2009 09:30:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1902</guid>
		<description>Yeah, from what I hear in the rumor mill, search engines today use click data, bounce rates etc much more than people suspect.  With such a long tail of key phrases people enter into search engines, they must have almost unlimited appetite for more user data to get statistically meaningful tests.</description>
		<content:encoded><![CDATA[<p>Yeah, from what I hear in the rumor mill, search engines today use click data, bounce rates etc much more than people suspect.  With such a long tail of key phrases people enter into search engines, they must have almost unlimited appetite for more user data to get statistically meaningful tests.</p>
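<p>A rough illustration of that appetite, assuming each per-query experiment is a standard two-sided, two-proportion z-test on click-through rate. The 30% baseline and one-point lift are made-up numbers, not data from any search engine, and the function name is hypothetical.</p>

```python
# Sketch: per-arm sample size for a two-sided, two-proportion z-test on CTR.
# Illustrative numbers only; not drawn from any real search engine.
import math
from statistics import NormalDist


def samples_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Impressions needed in each bucket to detect a CTR change from p1 to p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Detecting a one-point CTR lift (30% -> 31%) on a single query.
    print(samples_per_arm(0.30, 0.31))
```

<p>Under these assumptions it takes on the order of 30,000 impressions in each bucket to detect a one-point change for a single query, a volume a rarely searched key phrase may never accumulate on its own.</p>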
]]></content:encoded>
	</item>
	<item>
		<title>By: chris dixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1901</link>
		<dc:creator>chris dixon</dc:creator>
		<pubDate>Tue, 01 Sep 2009 09:28:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1901</guid>
		<description>Hey roger, thanks!  Glad to see you here.&lt;br&gt;&lt;br&gt;Re domain specific - I agree, but I think the reason what you say works is precisely because the domain becomes small enough that you can do all sorts of things to fill in the gaps in the data.  I think of my last company, SiteAdvisor, as a data company.  The way we got from 80% to near 100% was all sorts of techniques, from hacks to manual processes to integrating other data sets.  We couldn&#039;t have used those techniques in a horizontal setting.</description>
		<content:encoded><![CDATA[<p>Hey roger, thanks!  Glad to see you here.</p>
<p>Re domain specific &#8211; I agree, but I think the reason what you say works is precisely because the domain becomes small enough that you can do all sorts of things to fill in the gaps in the data.  I think of my last company, SiteAdvisor, as a data company.  The way we got from 80% to near 100% was all sorts of techniques, from hacks to manual processes to integrating other data sets.  We couldn&#39;t have used those techniques in a horizontal setting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: michels24</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1882</link>
		<dc:creator>michels24</dc:creator>
		<pubDate>Tue, 01 Sep 2009 03:21:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1882</guid>
		<description>Access to data was one of the things overlooked in the Msft/YHOO search deal.  There was a lot of talk about revenue shares and upfront payments, but people forget that Msft now gets a larger source of data to improve its product.  Without that query stream (i.e. data) they would never be able to build such intelligent tools for spelling correction, query intent, auto-complete, etc.  One (of many) reasons Google has knocked the ball out of the park is access to this data.  Their bucket tests in a week probably provide more insights than the other guys get in a quarter.</description>
		<content:encoded><![CDATA[<p>Access to data was one of the things overlooked in the Msft/YHOO search deal.  There was a lot of talk about revenue shares and upfront payments, but people forget that Msft now gets a larger source of data to improve its product.  Without that query stream (i.e. data) they would never be able to build such intelligent tools for spelling correction, query intent, auto-complete, etc.  One (of many) reasons Google has knocked the ball out of the park is access to this data.  Their bucket tests in a week probably provide more insights than the other guys get in a quarter.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: infoarbitrage</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1877</link>
		<dc:creator>infoarbitrage</dc:creator>
		<pubDate>Tue, 01 Sep 2009 02:23:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1877</guid>
		<description>Chris, first of all, congrats on the blog. It is terrific. And based on the breadth and intelligence of the comments, this has already become a very exciting ecosystem in which to participate.&lt;br&gt;&lt;br&gt;While I agree with the thrust of your post, you&#039;ve taken a very horizontal view of the problem. I do not spend time on Google-scale problems, but on much more targeted, vertical solutions to the &quot;big data&quot; problem. By layering domain specificity onto the problem of semantic analysis, many of the pitfalls of NLP and AI become far more manageable. I&#039;m not saying they&#039;re a panacea, and certainly not when trying to solve problems in real-time, but they can take you a lot farther than when applying them to horizontal data sets. &lt;br&gt;&lt;br&gt;And yes, tagging rich data and creating additional metadata for analysis holds many of the keys to extracting true meaning from unstructured data sets. I could write on this topic for hours. Thanks for the post.&lt;br&gt;&lt;br&gt;Roger</description>
		<content:encoded><![CDATA[<p>Chris, first of all, congrats on the blog. It is terrific. And based on the breadth and intelligence of the comments, this has already become a very exciting ecosystem in which to participate.</p>
<p>While I agree with the thrust of your post, you&#39;ve taken a very horizontal view of the problem. I do not spend time on Google-scale problems, but on much more targeted, vertical solutions to the &#8220;big data&#8221; problem. By layering domain specificity onto the problem of semantic analysis, many of the pitfalls of NLP and AI become far more manageable. I&#39;m not saying they&#39;re a panacea, and certainly not when trying to solve problems in real-time, but they can take you a lot farther than when applying them to horizontal data sets. </p>
<p>And yes, tagging rich data and creating additional metadata for analysis holds many of the keys to extracting true meaning from unstructured data sets. I could write on this topic for hours. Thanks for the post.</p>
<p>Roger</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: aseth</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1847</link>
		<dc:creator>aseth</dc:creator>
		<pubDate>Mon, 31 Aug 2009 20:23:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1847</guid>
		<description>Well said. The big fact about &#039;data&#039; is that if it is not &#039;whole&#039; then it tends to be dangerous (in terms of the predictions that it produces - the predictions on the face of it could look awesome, but have a propensity to be as wacky as not having data at all).&lt;br&gt;&lt;br&gt;I have seen entities make big mistakes in trying to solve major problems with machine learning (and resting on their laurels) without considering that not all of the data that influences the outcomes is being sourced or even thought about.</description>
		<content:encoded><![CDATA[<p>Well said. The big fact about &#39;data&#39; is that if it is not &#39;whole&#39; then it tends to be dangerous (in terms of the predictions that it produces &#8211; the predictions on the face of it could look awesome, but have a propensity to be as wacky as not having data at all).</p>
<p>I have seen entities make big mistakes in trying to solve major problems with machine learning (and resting on their laurels) without considering that not all of the data that influences the outcomes is being sourced or even thought about.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: henchan</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1845</link>
		<dc:creator>henchan</dc:creator>
		<pubDate>Mon, 31 Aug 2009 19:26:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1845</guid>
		<description>To illustrate the incredible subtlety in the interplay of system components, let me paint a metaphor from nature. It is not the only possible mapping between these two domains, but it serves my current purpose.&lt;br&gt;The algorithm is a copying mechanism; data encoding is DNA / RNA; information is the array of working combinations of encoded data; application processes are organisms; applications are species; communication (pub/sub) is natural selection; the ecosystem is the ecosystem. &lt;br&gt;Living organisms are incented to survive and replicate. Likewise the aim of a publisher is to communicate - deeply, broadly and for a long time. SEO happens to be a powerful form of communication at present. Certainly, a breakthrough in this area will need to get established in some existing niche. Long term though it need not be sustained by currently extant forms of communication.</description>
		<content:encoded><![CDATA[<p>To illustrate the incredible subtlety in the interplay of system components, let me paint a metaphor from nature. It is not the only possible mapping between these two domains, but it serves my current purpose.<br />The algorithm is a copying mechanism; data encoding is DNA / RNA; information is the array of working combinations of encoded data; application processes are organisms; applications are species; communication (pub/sub) is natural selection; the ecosystem is the ecosystem. <br />Living organisms are incented to survive and replicate. Likewise the aim of a publisher is to communicate &#8211; deeply, broadly and for a long time. SEO happens to be a powerful form of communication at present. Certainly, a breakthrough in this area will need to get established in some existing niche. Long term though it need not be sustained by currently extant forms of communication.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: pescatello</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1821</link>
		<dc:creator>pescatello</dc:creator>
		<pubDate>Mon, 31 Aug 2009 13:21:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1821</guid>
		<description>Interesting point about the links.  I never really thought about it before.  More and more metadata will begin to appear around the web which means that the systems that &quot;understand&quot; the data will be able to do new and more powerful things.  Similar to how last.fm can know which person is most like me - a &quot;musical neighbor.&quot;  There was never a source/database of listened and liked tracks before, but once you have it you can do things like this.   Very interesting post</description>
		<content:encoded><![CDATA[<p>Interesting point about the links.  I never really thought about it before.  More and more metadata will begin to appear around the web which means that the systems that &#8220;understand&#8221; the data will be able to do new and more powerful things.  Similar to how last.fm can know which person is most like me &#8211; a &#8220;musical neighbor.&#8221;  There was never a source/database of listened and liked tracks before, but once you have it you can do things like this.   Very interesting post</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: calebelston</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1820</link>
		<dc:creator>calebelston</dc:creator>
		<pubDate>Mon, 31 Aug 2009 12:49:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1820</guid>
		<description>Hey Chris, great post. Been thinking about the business implications of focusing on algorithms vs. insight. I am experimenting with something new on my latest post; decided to record an audio companion version for those who prefer to listen. Would love to hear your thoughts on the format and content :)&lt;br&gt;&lt;br&gt;&lt;a href=&quot;http://calebelston.com/are-algorithms-the-magic-bullet-0&quot; rel=&quot;nofollow&quot;&gt;http://calebelston.com/are-algorithms-the-magic...&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Hey Chris, great post. Been thinking about the business implications of focusing on algorithms vs. insight. I am experimenting with something new on my latest post; decided to record an audio companion version for those who prefer to listen. Would love to hear your thoughts on the format and content :)</p>
<p><a href="http://calebelston.com/are-algorithms-the-magic-bullet-0" rel="nofollow">http://calebelston.com/are-algorithms-the-magic&#8230;</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: startupeconomy</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1819</link>
		<dc:creator>startupeconomy</dc:creator>
		<pubDate>Mon, 31 Aug 2009 11:10:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1819</guid>
		<description>Ditto that.</description>
		<content:encoded><![CDATA[<p>Ditto that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cdixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1818</link>
		<dc:creator>cdixon</dc:creator>
		<pubDate>Mon, 31 Aug 2009 11:03:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1818</guid>
		<description>&quot;If I were to aggregate all the world&#039;s information (cost aside) and structure the data somehow,&quot;  One problem is that 99% of the &quot;information&quot; (speaking in the broadest sense) is in people&#039;s heads, out in nature, etc - not in digitally accessible form.</description>
		<content:encoded><![CDATA[<p>&#8220;If I were to aggregate all the world&#39;s information (cost aside) and structure the data somehow,&#8221;  One problem is that 99% of the &#8220;information&#8221; (speaking in the broadest sense) is in people&#39;s heads, out in nature, etc &#8211; not in digitally accessible form.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cdixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1817</link>
		<dc:creator>cdixon</dc:creator>
		<pubDate>Mon, 31 Aug 2009 11:02:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1817</guid>
		<description>I agree it&#039;s a bit of a slippery slope between data and algorithms.  You could create an algorithm that creates a new data source from an existing one.  But I bet that if someone has a breakthrough doing it, the algorithm itself won&#039;t be as interesting as the data source they identified.&lt;br&gt;&lt;br&gt;Re semantic tagging - If publishers were to ubiquitously start doing so, that would qualify as a massive new data source in my way of thinking.  People have been talking about that for years but right now there is no real incentive for publishers to do it.  Maybe if Google made it help your SEO or something people would start to care.</description>
		<content:encoded><![CDATA[<p>I agree it&#39;s a bit of a slippery slope between data and algorithms.  You could create an algorithm that creates a new data source from an existing one.  But I bet that if someone has a breakthrough doing it, the algorithm itself won&#39;t be as interesting as the data source they identified.</p>
<p>Re semantic tagging &#8211; If publishers were to ubiquitously start doing so, that would qualify as a massive new data source in my way of thinking.  People have been talking about that for years but right now there is no real incentive for publishers to do it.  Maybe if Google made it help your SEO or something people would start to care.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: startupeconomy</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1816</link>
		<dc:creator>startupeconomy</dc:creator>
		<pubDate>Mon, 31 Aug 2009 10:35:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1816</guid>
		<description>Right on, Chris.  Because many startups believe in changing the world, there&#039;s a tendency towards believing that there&#039;s gotta be a breakthrough algorithm to structure data in a useful way.  The problem is... there&#039;s really not much to apply the algorithms against.  &lt;br&gt;&lt;br&gt;Starting from Web 1.0, where it was onerous for publishers to generate data sources, Web 2.0 catapulted the rate of data generation.  The big outstanding problem is that user-generated data cannot really be structured due to 1) unstructuredness, 2) lack of cleanliness, 3) lack of relevance, at least for systematically digesting what human beings are really trying to convey, etc.  Much user content is indeed useful, but there isn&#039;t much money floating around to buy it.&lt;br&gt;&lt;br&gt;The semantic web never delivered on its promise because people keep banging their heads against structured vs. unstructured data, when the real problem is that structuring inconsistent, meaningless, and sometimes garbage data doesn&#039;t really create value for the ultimate data consumer.&lt;br&gt;&lt;br&gt;If I were to aggregate all the world&#039;s information (cost aside) and structure the data somehow, would I be able to answer all questions in the universe?  I have doubts. Who knows?</description>
		<content:encoded><![CDATA[<p>Right on, Chris.  Because many startups believe in changing the world, there&#39;s a tendency towards believing that there&#39;s gotta be a breakthrough algorithm to structure data in a useful way.  The problem is&#8230; there&#39;s really not much to apply the algorithms against.  </p>
<p>Starting from Web 1.0, where it was onerous for publishers to generate data sources, Web 2.0 catapulted the rate of data generation.  The big outstanding problem is that user-generated data cannot really be structured due to 1) unstructuredness, 2) lack of cleanliness, 3) lack of relevance, at least for systematically digesting what human beings are really trying to convey, etc.  Much user content is indeed useful, but there isn&#39;t much money floating around to buy it.</p>
<p>The semantic web never delivered on its promise because people keep banging their heads against structured vs. unstructured data, when the real problem is that structuring inconsistent, meaningless, and sometimes garbage data doesn&#39;t really create value for the ultimate data consumer.</p>
<p>If I were to aggregate all the world&#39;s information (cost aside) and structure the data somehow, would I be able to answer all questions in the universe?  I have doubts. Who knows?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: henchan</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1805</link>
		<dc:creator>henchan</dc:creator>
		<pubDate>Mon, 31 Aug 2009 04:41:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1805</guid>
		<description>Isn&#039;t there a false dichotomy in the post? Better algorithms will confer functional benefits while new data sources will increase the range of their coverage. Depth and breadth respectively. One or both approaches may be useful for different use cases. Indeed data and code can be inter-dependent. If the quantity of data increases, say, while its quality (according to some specified requirement) simultaneously deteriorates, net gain could be negative unless the algorithm can be altered to compensate.&lt;br&gt;&lt;br&gt;It is good to hark back to Google in 1998 or to the nascent WWW ten years earlier, and to think about what it would take to make another radical improvement in information management. My view is that the next generational shift will be ubiquitous semantic tagging of public data by the publisher. These tags will be interpreted using consistent, open algorithms but they will be interpreted subjectively by each subscriber, according to private data unevenly distributed across the system. &lt;br&gt;The high cost of creating tags is an empirical observation: true in respect of the Semantic Web and no doubt other systems, but not a universal law. When the requirement for objectivity is dropped, semantic tags with good-enough efficacy can be created at very low marginal cost.</description>
		<content:encoded><![CDATA[<p>Isn&#39;t there a false dichotomy in the post? Better algorithms will confer functional benefits while new data sources will increase the range of their coverage. Depth and breadth respectively. One or both approaches may be useful for different use cases. Indeed data and code can be inter-dependent. If the quantity of data increases, say, while its quality (according to some specified requirement) simultaneously deteriorates, net gain could be negative unless the algorithm can be altered to compensate.</p>
<p>It is good to hark back to Google in 1998 or to the nascent WWW ten years earlier, and to think about what it would take to make another radical improvement in information management. My view is that the next generational shift will be ubiquitous semantic tagging of public data by the publisher. These tags will be interpreted using consistent, open algorithms but they will be interpreted subjectively by each subscriber, according to private data unevenly distributed across the system. <br />The high cost of creating tags is an empirical observation: true in respect of the Semantic Web and no doubt other systems, but not a universal law. When the requirement for objectivity is dropped, semantic tags with good-enough efficacy can be created at very low marginal cost.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cdixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1802</link>
		<dc:creator>cdixon</dc:creator>
		<pubDate>Mon, 31 Aug 2009 01:09:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1802</guid>
		<description>Eran - I was speaking about Google circa 1998.  At the time the insight of including links and anchor text really did make their search engine vastly better.  All search engines use that data today so that advantage is gone.  Probably the biggest advantages in search today come from years of devising &quot;bags of tricks&quot; - lots of little algorithms that collectively yield a better experience.</description>
		<content:encoded><![CDATA[<p>Eran &#8211; I was speaking about Google circa 1998.  At the time the insight of including links and anchor text really did make their search engine vastly better.  All search engines use that data today so that advantage is gone.  Probably the biggest advantages in search today come from years of devising &#8220;bags of tricks&#8221; &#8211; lots of little algorithms that collectively yield a better experience.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Knowtu &#187; links for 2009-08-30</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1798</link>
		<dc:creator>Knowtu &#187; links for 2009-08-30</dc:creator>
		<pubDate>Mon, 31 Aug 2009 01:03:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1798</guid>
		<description>[...] cdixon.org / To make smarter systems, it’s all about the data (tags: ai business) [...]</description>
		<content:encoded><![CDATA[<p>[...] cdixon.org / To make smarter systems, it’s all about the data (tags: ai business) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: theflyingchange</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1801</link>
		<dc:creator>theflyingchange</dc:creator>
		<pubDate>Mon, 31 Aug 2009 01:00:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1801</guid>
		<description>A very clever and useful insight.  Almost a perfect blog post to me.  An archetype for the form.  The Google example is great.  Got me thinking.</description>
		<content:encoded><![CDATA[<p>A very clever and useful insight.  Almost a perfect blog post to me.  An archetype for the form.  The Google example is great.  Got me thinking.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eran Shir </title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1797</link>
		<dc:creator>Eran Shir </dc:creator>
		<pubDate>Sun, 30 Aug 2009 22:50:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1797</guid>
		<description>I think data is like the height of NBA players. It&#039;s very hard to be a pro at 5&#039; tall, but it doesn&#039;t mean the tallest player is the best; in fact it&#039;s seldom the case. Same with data. You need enough of it to make things interesting, but the idea that Google, for example, is dominant because it has more data than numbers 2-10 is absurd. At some point it&#039;s not who has more, it&#039;s what they do with it.</description>
		<content:encoded><![CDATA[<p>I think data is like the height of NBA players. It&#39;s very hard to be a pro at 5&#39; tall, but it doesn&#39;t mean the tallest player is the best; in fact it&#39;s seldom the case. Same with data. You need enough of it to make things interesting, but the idea that Google, for example, is dominant because it has more data than numbers 2-10 is absurd. At some point it&#39;s not who has more, it&#39;s what they do with it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1796</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Sun, 30 Aug 2009 21:03:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1796</guid>
		<description>I have my doubts, Chris. What you say is invariably true for a certain class of problems and tasks (finding popular recommendations on Amazon, finding home pages on Google).  But by biasing your algorithms to large data, you might make other classes of problems even more difficult.  Rather than repeat all the arguments, let me point you to a couple of places where I wrote about it a few months ago:&lt;br&gt;&lt;br&gt;&lt;a href=&quot;http://irgupf.com/2009/04/09/retrievability/&quot; rel=&quot;nofollow&quot;&gt;http://irgupf.com/2009/04/09/retrievability/&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;http://irgupf.com/2009/04/23/retrievability-and-prague-cafes/&quot; rel=&quot;nofollow&quot;&gt;http://irgupf.com/2009/04/23/retrievability-and...&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;http://irgupf.com/2009/04/09/large-data-versus-limited-applicability/&quot; rel=&quot;nofollow&quot;&gt;http://irgupf.com/2009/04/09/large-data-versus-...&lt;/a&gt;&lt;br&gt;&lt;br&gt;In a nutshell, large data allows you to solve certain types of problems well, but may end up making other types of problems much more difficult, if all you have is naive Bayes on top of that data making your inferences.</description>
		<content:encoded><![CDATA[<p>I have my doubts, Chris. What you say is invariably true for a certain class of problems and tasks (finding popular recommendations on Amazon, finding home pages on Google).  But by biasing your algorithms to large data, you might make other classes of problems even more difficult.  Rather than repeat all the arguments, let me point you to a couple of places where I wrote about it a few months ago:</p>
<p><a href="http://irgupf.com/2009/04/09/retrievability/" rel="nofollow">http://irgupf.com/2009/04/09/retrievability/</a><br /><a href="http://irgupf.com/2009/04/23/retrievability-and-prague-cafes/" rel="nofollow">http://irgupf.com/2009/04/23/retrievability-and&#8230;</a><br /><a href="http://irgupf.com/2009/04/09/large-data-versus-limited-applicability/" rel="nofollow">http://irgupf.com/2009/04/09/large-data-versus-&#8230;</a></p>
<p>In a nutshell, large data allows you to solve certain types of problems well, but may end up making other types of problems much more difficult, if all you have is naive Bayes on top of that data making your inferences.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: alvisbrigis</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1794</link>
		<dc:creator>alvisbrigis</dc:creator>
		<pubDate>Sun, 30 Aug 2009 19:31:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1794</guid>
		<description>Right on Chris.  We&#039;re clearly witnessing the increase of volume of data (doubling every 18 months), local data structuring (social graphs, geo graphs, genome graphs, brain graphs, body system graphs, energy graphs, real time robot vision / environment graphs, etc) and combinatorial/macro data structuring (e.g. Twitter + Google Maps mashups), which is clearly adding to the capabilities of what we&#039;ve come to label AI.  &lt;br&gt;&lt;br&gt;AI exists for a purpose, a specific function or task.  Just like a basic lifeform needs relevant environmental information to increase its chances of success, AI functions best with access to the richest, most rapidly computable, system/task-relevant data.  e.g. robots that can navigate the Darpa Road Challenge need maps, real-time road/environment sensors, ability to sense and determine the meaning of signs, etc.  -- Circling back to the original point, the algorithm is just part of the AI - the other part is an environment of structured data.  Intelligence arises from the interplay of the two and depends on the system context.  So we can expect the algorithms that most effectively draw on the best data available to them for given tasks to be most successful - that means ongoing rise of AI-ish bots tailored for / carefully tuned to new data environments increasingly capable of performing more complex tasks.  Clearly there&#039;s an expanding market for these (search being a huge part of that), as Norvig and company have realized.&lt;br&gt;&lt;br&gt;When considering complementary data sources and the drive to increase intelligence in the system, it&#039;s occurred to me that we generally appear to be trending toward the super-structuring of all data (the everything graph), or total system quantification.  By cross-referencing different rich data sets, we can interpolate value, push toward quantification / state closure, generating much value and &quot;intelligence&quot; along the way.  If it becomes understood that this process is making our system smarter, then data may continue to centralize, be drawn together for certain higher uses, thus commoditizing current data structures, algorithms, and combinations thereof.&lt;br&gt;&lt;br&gt;Related articles that explore these thoughts:&lt;br&gt;&lt;a href=&quot;http://www.memebox.com/futureblogger/show/1591-total-systems-quantification-toward-the-everything-graph&quot; rel=&quot;nofollow&quot;&gt;http://www.memebox.com/futureblogger/show/1591-...&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;http://memebox.com/futureblogger/show/1518-intelligence-rising-climbing-the-stairs-of-abstraction&quot; rel=&quot;nofollow&quot;&gt;http://memebox.com/futureblogger/show/1518-inte...&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;http://socialnode.blogspot.com/2009/06/simulation-era.html&quot; rel=&quot;nofollow&quot;&gt;http://socialnode.blogspot.com/2009/06/simulati...&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Right on Chris.  We&#39;re clearly witnessing the increase of volume of data (doubling every 18 months), local data structuring (social graphs, geo graphs, genome graphs, brain graphs, body system graphs, energy graphs, real time robot vision / environment graphs, etc) and combinatorial/macro data structuring (e.g. Twitter + Google Maps mashups), which is clearly adding to the capabilities of what we&#39;ve come to label AI.  </p>
<p>AI exists for a purpose, a specific function or task.  Just like a basic lifeform needs relevant environmental information to increase its chances of success, AI functions best with access to the richest, most rapidly computable, system/task-relevant data.  e.g. robots that can navigate the Darpa Road Challenge need maps, real-time road/environment sensors, ability to sense and determine the meaning of signs, etc.  &#8212; Circling back to the original point, the algorithm is just part of the AI &#8211; the other part is an environment of structured data.  Intelligence arises from the interplay of the two and depends on the system context.  So we can expect the algorithms that most effectively draw on the best data available to them for given tasks to be most successful &#8211; that means ongoing rise of AI-ish bots tailored for / carefully tuned to new data environments increasingly capable of performing more complex tasks.  Clearly there&#39;s an expanding market for these (search being a huge part of that), as Norvig and company have realized.</p>
<p>When considering complementary data sources and the drive to increase intelligence in the system, it&#39;s occurred to me that we generally appear to be trending toward the super-structuring of all data (the everything graph), or total system quantification.  By cross-referencing different rich data sets, we can interpolate value, push toward quantification / state closure, generating much value and &#8220;intelligence&#8221; along the way.  If it becomes understood that this process is making our system smarter, then data may continue to centralize, be drawn together for certain higher uses, thus commoditizing current data structures, algorithms, and combinations thereof.</p>
<p>Related articles that explore these thoughts:<br /><a href="http://www.memebox.com/futureblogger/show/1591-total-systems-quantification-toward-the-everything-graph" rel="nofollow">http://www.memebox.com/futureblogger/show/1591-&#8230;</a><br /><a href="http://memebox.com/futureblogger/show/1518-intelligence-rising-climbing-the-stairs-of-abstraction" rel="nofollow">http://memebox.com/futureblogger/show/1518-inte&#8230;</a><br /><a href="http://socialnode.blogspot.com/2009/06/simulation-era.html" rel="nofollow">http://socialnode.blogspot.com/2009/06/simulati&#8230;</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Linkpost &#124; 8.30.2009 - L&#38;C Tech Talk</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1788</link>
		<dc:creator>Linkpost &#124; 8.30.2009 - L&#38;C Tech Talk</dc:creator>
		<pubDate>Sun, 30 Aug 2009 19:01:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1788</guid>
		<description>[...] To make smarter systems, it&#8217;s all about the data &#8211; Forget about the algorithms. Big breakthroughs can come via better sources of [...]</description>
		<content:encoded><![CDATA[<p>[...] To make smarter systems, it&#8217;s all about the data &#8211; Forget about the algorithms. Big breakthroughs can come via better sources of [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cdixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1789</link>
		<dc:creator>cdixon</dc:creator>
		<pubDate>Sun, 30 Aug 2009 17:02:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1789</guid>
		<description>It&#039;s a good question and one I don&#039;t really know the answer to.</description>
		<content:encoded><![CDATA[<p>It&#39;s a good question and one I don&#39;t really know the answer to.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: shansinha79</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1781</link>
		<dc:creator>shansinha79</dc:creator>
		<pubDate>Sun, 30 Aug 2009 15:25:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1781</guid>
		<description>Chris, totally agree with you on this.&lt;br&gt;&lt;br&gt;There&#039;s only one thing that really bothers me about the Netflix competition (which I imagine is not unique to it)... what was it about the 10% threshold that let so many people and teams reach 8 and 9% relatively quickly, while the last year or two of the competition was dominated by incremental progress toward the 10% mark?&lt;br&gt;&lt;br&gt;Do you think the results would have been different if they had set the threshold at 15%?  Do you think the Netflix team had some unique insight that there was a magic boundary around 10%...</description>
		<content:encoded><![CDATA[<p>Chris, totally agree with you on this.</p>
<p>There&#39;s only one thing that really bothers me about the Netflix competition (which I imagine is not unique to it)&#8230; what was it about the 10% threshold that let so many people and teams reach 8 and 9% relatively quickly, while the last year or two of the competition was dominated by incremental progress toward the 10% mark?</p>
<p>Do you think the results would have been different if they had set the threshold at 15%?  Do you think the Netflix team had some unique insight that there was a magic boundary around 10%&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Blogs I Read: Chris Dixon (cdixon.org) &#124; The Noisy Channel</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1777</link>
		<dc:creator>Blogs I Read: Chris Dixon (cdixon.org) &#124; The Noisy Channel</dc:creator>
		<pubDate>Sun, 30 Aug 2009 15:16:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1777</guid>
		<description>[...] To make smarter systems, it’s all about the data [...]</description>
		<content:encoded><![CDATA[<p>[...] To make smarter systems, it’s all about the data [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Turian</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1780</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Sun, 30 Aug 2009 14:53:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1780</guid>
		<description>It is not clear how structuring data can be monetized, in many applications. Many of the consumers of your structured data will not initially be sure that they can monetize and pay for whatever they do with your information. So if you charge them up-front, you lose many potential customers. But if you delay charging them, you might fail to monetize your offering. How do you solve this conundrum?</description>
		<content:encoded><![CDATA[<p>It is not clear how structuring data can be monetized, in many applications. Many of the consumers of your structured data will not initially be sure that they can monetize and pay for whatever they do with your information. So if you charge them up-front, you lose many potential customers. But if you delay charging them, you might fail to monetize your offering. How do you solve this conundrum?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cdixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1779</link>
		<dc:creator>cdixon</dc:creator>
		<pubDate>Sun, 30 Aug 2009 14:52:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1779</guid>
		<description>That&#039;s a fair point.  I&#039;m coming from the perspective of the technology startup world, where you often see companies attacking the same problems and sometimes claiming they&#039;ve made a purely algorithmic breakthrough.  I&#039;m skeptical of that.</description>
		<content:encoded><![CDATA[<p>That&#39;s a fair point.  I&#39;m coming from the perspective of the technology startup world, where you often see companies attacking the same problems and sometimes claiming they&#39;ve made a purely algorithmic breakthrough.  I&#39;m skeptical of that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Turian</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1778</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Sun, 30 Aug 2009 14:49:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1778</guid>
		<description>In addition to UI (which is the user experience with the ML), there is also, I think, the business interface with the ML.&lt;br&gt;&lt;br&gt;Improved solutions to existing problems might not have nearly as large an impact as figuring out &quot;new&quot; problems that can be solved using ML.</description>
		<content:encoded><![CDATA[<p>In addition to UI (which is the user experience with the ML), there is also, I think, the business interface with the ML.</p>
<p>Improved solutions to existing problems might not have nearly as large an impact as figuring out &#8220;new&#8221; problems that can be solved using ML.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cdixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1776</link>
		<dc:creator>cdixon</dc:creator>
		<pubDate>Sun, 30 Aug 2009 13:13:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1776</guid>
		<description>fascinating, thanks.  I will definitely read it.  I should mention I&#039;m generally highly influenced in my views by my good friend Michael Kearns, who I know is friends with Fernando among others, so much of what I&#039;m saying might be indirectly coming from these authors.</description>
		<content:encoded><![CDATA[<p>fascinating, thanks.  I will definitely read it.  I should mention I&#39;m generally highly influenced in my views by my good friend Michael Kearns, who I know is friends with Fernando among others, so much of what I&#39;m saying might be indirectly coming from these authors.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jebboreon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1774</link>
		<dc:creator>jebboreon</dc:creator>
		<pubDate>Sun, 30 Aug 2009 13:06:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1774</guid>
		<description>&quot;What I think this view misses (but I suspect the companies covered in the article understand) is that significant AI breakthroughs come from identifying or creating new sources of data, not inventing new algorithms.&quot;&lt;br&gt;&lt;br&gt;Hi Chris, the companies are indeed very aware of the centrality of data.  You and your readers may want to check out this recent article by three of Google&#039;s research scientists:&lt;br&gt;&lt;br&gt;&quot;The Unreasonable Effectiveness of Data&quot;&lt;br&gt;&lt;a href=&quot;http://googleresearch.blogspot.com/2009/03/unreasonable-effectiveness-of-data.html&quot; rel=&quot;nofollow&quot;&gt;http://googleresearch.blogspot.com/2009/03/unre...&lt;/a&gt;&lt;br&gt;&lt;a href=&quot;http://www.computer.org/portal/cms_docs_intelligent/intelligent/homepage/2009/x2exp.pdf&quot; rel=&quot;nofollow&quot;&gt;http://www.computer.org/portal/cms_docs_intelli...&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>&#8220;What I think this view misses (but I suspect the companies covered in the article understand) is that significant AI breakthroughs come from identifying or creating new sources of data, not inventing new algorithms.&#8221;</p>
<p>Hi Chris, the companies are indeed very aware of the centrality of data.  You and your readers may want to check out this recent article by three of Google&#39;s research scientists:</p>
<p>&#8220;The Unreasonable Effectiveness of Data&#8221;<br /><a href="http://googleresearch.blogspot.com/2009/03/unreasonable-effectiveness-of-data.html" rel="nofollow">http://googleresearch.blogspot.com/2009/03/unre&#8230;</a><br /><a href="http://www.computer.org/portal/cms_docs_intelligent/intelligent/homepage/2009/x2exp.pdf" rel="nofollow">http://www.computer.org/portal/cms_docs_intelli&#8230;</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cdixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1773</link>
		<dc:creator>cdixon</dc:creator>
		<pubDate>Sun, 30 Aug 2009 13:01:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1773</guid>
		<description>Patrick - I agree that the lack of structure on the web is an impediment, and I&#039;m skeptical that new algorithms by themselves will fix this.  We either need more structure or complementary data sources to help structure it.</description>
		<content:encoded><![CDATA[<p>Patrick &#8211; I agree that the lack of structure on the web is an impediment, and I&#39;m skeptical that new algorithms by themselves will fix this.  We either need more structure or complementary data sources to help structure it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: aarondelcohen</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1772</link>
		<dc:creator>aarondelcohen</dc:creator>
		<pubDate>Sun, 30 Aug 2009 13:00:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1772</guid>
		<description>Yes.  But I like Chris&#039;s corollary to BW&#039;s Axiom:&lt;br&gt;&lt;br&gt;1. Axiom: Structure Unstructured Data&lt;br&gt;2. Corollary: Collect new pools of data to create competitive advantage (Chris, maybe you can improve this)&lt;br&gt;&lt;br&gt;Chris, good move to get Disqus rolling.</description>
		<content:encoded><![CDATA[<p>Yes.  But I like Chris&#39;s corollary to BW&#39;s Axiom:</p>
<p>1. Axiom: Structure Unstructured Data<br />2. Corollary: Collect new pools of data to create competitive advantage (Chris, maybe you can improve this)</p>
<p>Chris, good move to get Disqus rolling.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cdixon</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1771</link>
		<dc:creator>cdixon</dc:creator>
		<pubDate>Sun, 30 Aug 2009 12:58:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1771</guid>
		<description>Joseph - I agree UI is important too.  Especially for creating a feedback loop to gather more data :)</description>
		<content:encoded><![CDATA[<p>Joseph &#8211; I agree UI is important too.  Especially for creating a feedback loop to gather more data <img src='http://cdixon.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph Turian</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1770</link>
		<dc:creator>Joseph Turian</dc:creator>
		<pubDate>Sun, 30 Aug 2009 12:26:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1770</guid>
		<description>Actually, I don&#039;t think the main impediment to smarter systems is more data. The real problem is the terrible interface.&lt;br&gt;&lt;br&gt;Right now, the interface for using machine learning is to go talk to a scientist. BAAD interface.&lt;br&gt;Business people are not sure what technologies are applicable to their problems, or how machine learning can empower them. Users are confused by the interface to the artificial intelligence, which is either daunting or oversimplified.&lt;br&gt;&lt;br&gt;This is likely to remain the case for the foreseeable future. Until natural language processing improves enough that natural language communication becomes a viable interface, the real gains will be seen by companies that can creatively and naturally present machine learning to everyday users.&lt;br&gt;&lt;br&gt;For example, look at the WSJ story &quot;My TiVo Thinks I&#039;m Gay&quot; (&lt;a href=&quot;http://online.wsj.com/article_email/SB1038261936872356908.html&quot; rel=&quot;nofollow&quot;&gt;http://online.wsj.com/article_email/SB103826193...&lt;/a&gt;)&lt;br&gt;The problem is that the workings of the system are opaque.&lt;br&gt;This example underscores the importance of having an interface that communicates with the user and attempts to convey its interpretation of the user&#039;s query.&lt;br&gt;&lt;br&gt;The companies with the real competitive advantages are not those that can eke out 5% more accuracy in topic models; they are the companies that can figure out how to make topic models useful and simple for everyday users.</description>
		<content:encoded><![CDATA[<p>Actually, I don&#39;t think the main impediment to smarter systems is more data. The real problem is the terrible interface.</p>
<p>Right now, the interface for using machine learning is to go talk to a scientist. BAAD interface.<br />Business people are not sure what technologies are applicable to their problems, or how machine learning can empower them. Users are confused by the interface to the artificial intelligence, which is either daunting or oversimplified.</p>
<p>This is likely to remain the case for the foreseeable future. Until natural language processing improves enough that natural language communication becomes a viable interface, the real gains will be seen by companies that can creatively and naturally present machine learning to everyday users.</p>
<p>For example, look at the WSJ story &#8220;My TiVo Thinks I&#39;m Gay&#8221; (<a href="http://online.wsj.com/article_email/SB1038261936872356908.html" rel="nofollow">http://online.wsj.com/article_email/SB103826193&#8230;</a>)<br />The problem is that the workings of the system are opaque.<br />This example underscores the importance of having an interface that communicates with the user and attempts to convey its interpretation of the user&#39;s query.</p>
<p>The companies with the real competitive advantages are not those that can eke out 5% more accuracy in topic models; they are the companies that can figure out how to make topic models useful and simple for everyday users.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lightbody</title>
		<link>http://cdixon.org/2009/08/30/to-make-smarter-systems-its-all-about-the-data/comment-page-1/#comment-1768</link>
		<dc:creator>lightbody</dc:creator>
		<pubDate>Sun, 30 Aug 2009 12:17:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.cdixon.org/?p=340#comment-1768</guid>
		<description>Kindle is a good one. Also, a friend of mine who works at Mozilla on Firefox was telling me about the crazy amount of data they are capturing/will capture when you use Firefox, much more advanced than basic URL history. That could be a great feed for systems that automatically determine your interests and try to solve the &quot;firehose&quot; problem that most social networks have today.</description>
		<content:encoded><![CDATA[<p>Kindle is a good one. Also, a friend of mine who works at Mozilla on Firefox was telling me about the crazy amount of data they are capturing/will capture when you use Firefox, much more advanced than basic URL history. That could be a great feed for systems that automatically determine your interests and try to solve the &#8220;firehose&#8221; problem that most social networks have today.</p>
]]></content:encoded>
	</item>
</channel>
</rss>


