Stored Hashcash

One of the greatest inventions in the history of computer security is Hashcash. Internet blights like spam and denial-of-service attacks are what economists call “tragedy of the commons” problems. They exploit the fact that it’s free to send email and make web requests. At zero cost, you can have a profitable business even at extremely low success rates.

One way to fix these problems is to impose tariffs that hurt bad actors without hurting good actors. For example, you could impose “postage fees” on every email and web request. Unfortunately, in practice this is impossible, because you’d have to set up billing relationships between every pair of computers that want to communicate.

The brilliant idea behind Hashcash is to replace a monetary postage fee with a computational postage fee. In order to send an email, the sender first has to solve a math problem: find an input whose cryptographic hash begins with a required number of zero bits. Legitimate senders suffer an imperceptible delay, but illegitimate activities that depend on massive volume are hobbled.
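To make that concrete, here is a minimal sketch of the mechanic in Python. Real Hashcash stamps carry a version, date, recipient address, and random salt, and the canonical difficulty is 20 zero bits over SHA-1; this toy keeps only the core puzzle:

```python
import hashlib
from itertools import count

def mint_stamp(resource: str, bits: int = 20) -> str:
    """Find a stamp whose SHA-1 hash begins with `bits` zero bits.
    `resource` would be the recipient's address; `bits` sets the cost."""
    for nonce in count():
        stamp = f"{resource}:{nonce}"
        digest = hashlib.sha1(stamp.encode()).digest()
        # Read the 160-bit digest as an integer; the top `bits` bits
        # are zero exactly when this shift leaves nothing behind.
        if int.from_bytes(digest, "big") >> (160 - bits) == 0:
            return stamp

def check_stamp(stamp: str, bits: int = 20) -> bool:
    """Verification costs a single hash -- cheap for the recipient."""
    digest = hashlib.sha1(stamp.encode()).digest()
    return int.from_bytes(digest, "big") >> (160 - bits) == 0

stamp = mint_stamp("alice@example.com", bits=16)  # ~65,000 tries on average
assert check_stamp(stamp, bits=16)
```

The asymmetry is the whole trick: minting a 20-bit stamp takes about a million hash attempts on average, while checking one takes a single hash.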

Hashcash is a great idea, but cumbersome in practice. For example, the cost imposed on senders varies widely depending on the performance of their email servers. It also hinders legitimate bulk email, like clubs and retailers sending updates to their mailing lists.

The offline analogy to Hashcash is a postal system where senders are required to perform some work every time they want to send something. If you’re a lawyer, you need to practice some law before you send mail. If you’re a doctor, you need to cure something before you send mail. Etc. This of course would be a preposterous postal system.

Adam Smith called money “stored labor”. You do your work and then store your labor as money, which you can later exchange for labor stored by other people. Storing labor in the form of money turns out to be a very flexible system for trading labor, and far superior to the barter system of performing work whenever your counterparty performs work.

So Adam Smith’s version of Hashcash is a system where you get credits for doing computation. You store your computational credits and spend them at your leisure. If you want to send an email, you can spend a little stored Hashcash. If I send you an email and you reply, we’re even. If you send out a billion spam emails, it costs you a lot and undermines your spammy business model. 
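Here is a toy sketch of what “storing” the work might look like, again in Python. The design is hypothetical; `_pow_ok` repeats the puzzle check from the sketch above:

```python
import hashlib
from itertools import count

BITS = 16  # difficulty: more bits means more stored work per token

def _pow_ok(token: str) -> bool:
    digest = hashlib.sha1(token.encode()).digest()
    return int.from_bytes(digest, "big") >> (160 - BITS) == 0

def mint_tokens(owner: str, n: int) -> list[str]:
    """Do the work up front and store it: each token is one email's postage."""
    tokens, nonces = [], count()
    while len(tokens) < n:
        candidate = f"{owner}:{next(nonces)}"
        if _pow_ok(candidate):
            tokens.append(candidate)
    return tokens

class Verifier:
    """Recipient side: accept each valid token exactly once."""
    def __init__(self):
        self.spent: set[str] = set()

    def redeem(self, token: str) -> bool:
        if token in self.spent or not _pow_ok(token):
            return False
        self.spent.add(token)
        return True

wallet = mint_tokens("alice", 3)  # computed at leisure, stored for later
v = Verifier()
assert v.redeem(wallet[0])        # spend one token per email
assert not v.redeem(wallet[0])    # double-spending is rejected
```

The hard part a real system would have to solve is preventing the same token from being spent with many different recipients at once, which is why it would need some kind of shared record of spent tokens.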

There are other important problems that stored Hashcash could solve. Denial-of-service attacks are essentially spam attacks, except they happen over HTTP instead of SMTP and the payoff is ransom instead of spam offers. Computer scientists have long believed that pricing schemes could dramatically reduce network congestion. Like every large-scale distributed system, the Internet benefits when scarce resources are efficiently allocated.

It seems plausible that if a system like stored Hashcash were developed, some people would prefer to purchase stored Hashcash directly instead of generating it themselves. A market for stored Hashcash would emerge, with the value being some function of the supply and demand of scarce Internet resources.

So here’s my question: suppose someone invented a way to store Hashcash. It could dramatically reduce spam and denial-of-service attacks, and more efficiently allocate network bandwidth and other Internet resources. How valuable would stored Hashcash be?

Some thoughts on the iPhone contact list controversy and app security

1. I’ve heard rumors that lots of apps have been uploading user contact lists for years. One person who knows the iOS world well told me “if you download a lot of apps, your contact list is on 50 servers right now.” I don’t understand why Apple doesn’t have a permission dialog box for this (that said, I’m not sure that’s the best solution – see #4 below). Apple has dialogs for accessing location and for enabling push notifications. Accessing users’ contact lists seems like an obvious thing to ask permission for.

2. I don’t know what the product design motivations are for uploading contacts, but I assume there are legitimate ones. [commenters suggest it is mainly to notify users when their friends join the service].  If this or something similar is the goal, you could probably do it in a way that protects privacy by (convergently?) encrypting the phone numbers on the client side (I’m assuming the useful info is the phone numbers and not the names associated with the phone numbers since the names would be inconsistent across users).
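As a sketch of what that client-side step could look like (purely hypothetical code, not what any actual app does): normalize each number and upload only a deterministic digest, so identical numbers still match on the server. One caveat worth flagging is that phone numbers form a small keyspace, so bare hashes can be brute-forced; a careful design would use a keyed hash or a private set intersection protocol.

```python
import hashlib

def normalize(phone: str) -> str:
    """Strip formatting so the same number matches across users."""
    return "".join(ch for ch in phone if ch.isdigit())

def digest(phone: str) -> str:
    """Deterministic one-way digest: identical numbers produce identical
    digests, so the server can match contacts without seeing raw numbers."""
    return hashlib.sha256(normalize(phone).encode()).hexdigest()

# The client uploads digests only; the server joins them against the
# digests of registered users to power "your friend joined" features.
contacts = ["(212) 555-0100", "+1 212 555 0199"]
upload = [digest(p) for p in contacts]
```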

3. Many commentators have suggested that a primary security risk is the fact that the data is transmitted in plain text. Encrypting over the wire is always a good idea, but in reality “man-in-the-middle” attacks are extremely rare. I would worry primarily about the far more common cases of 1) someone (insider or outsider) stealing the company’s database, and 2) a government subpoena for the company’s database. The best protection against these risks is encrypting the data in such a way that hackers and the company itself can’t decrypt it (or not sending the data to the servers in the first place).
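Here is a sketch of that stronger approach, using the third-party cryptography package: the key is generated and kept on the client, so the server only ever stores ciphertext it cannot read.

```python
# pip install cryptography
from cryptography.fernet import Fernet

# The key lives on the client (e.g., in the phone's keychain) and is
# never sent to the server.
key = Fernet.generate_key()
f = Fernet(key)

ciphertext = f.encrypt(b"contact list payload")
# The server stores only `ciphertext`. A database thief or a subpoena
# yields nothing readable, and the company itself cannot decrypt it.
assert f.decrypt(ciphertext) == b"contact list payload"
```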

A bad outcome from this controversy would be for companies to encrypt sensitive data over the network and then not encrypt it on their servers (the simplest way to do this is to switch to https, a technology that is much more about security theater than security reality). This would make it impossible for 3rd parties (e.g. white-hat hackers) to detect that sensitive data is being sent over the network, but would keep the data vulnerable to server-side breaches and subpoenas. Unless Apple or someone else steps in, I worry that this is what apps will do next. It is the quickest way to preserve product features and minimize PR risk.

4. I worry that by just adding tons of permission dialogs we are going back to the Microsoft IE/ActiveX model of security. With lots of permission popups, users get fatigued and confused and just end up clicking “Yes” to everything. And then the security model says: if the user says “yes”, and the app uses “best practices” like https, it can do whatever it wants. We saw how this played out with the spyware/adware epidemic on the web from 2001 to 2006, and it wasn’t pretty.

Collective knowledge systems

I think you could make a strong argument that the most important technologies developed over the last decade are a set of systems that are sometimes called “collective knowledge systems”.

The most successful collective knowledge system is the combination of Google plus the web. Of course Google was originally intended to be just a search engine, and the web just a collection of interlinked documents. But together they provide a very efficient system for surfacing the smartest thoughts on almost any topic from almost any person.

The second most successful collective knowledge system is Wikipedia. Back in 2001, most people thought Wikipedia was a wacky project that would at best end up being a quirky “toy” encyclopedia. Instead it has become a remarkably comprehensive and accurate resource that most internet users access every day.

Other well-known and mostly successful collective knowledge systems include “answer” sites like Yahoo Answers, review sites like Yelp, and link sharing sites like Delicious.  My own company Hunch is a collective knowledge system for recommendations, building on ideas originally developed by “collaborative filtering” pioneer Firefly and the recommendation systems built into Amazon and Netflix.

Dealing with information overload

It has been widely noted that the amount of information in the world, and especially the amount in digital form, has been growing exponentially. One way to make sense of all this information is to try to structure it after it is created. This method has proven to be, at best, partially effective (for a state-of-the-art attempt at simple information classification, try Google Squared).

It turns out that imposing even minimal structure on information, especially as it is being created, goes a long way. This is what successful collective knowledge systems do. Google would be vastly less effective if the web didn’t have tags and links. Wikipedia is highly structured, with an extensive organizational hierarchy and set of rules and norms. Yahoo Answers has a reputation and voting system that allows good answers to bubble up. Flickr and Delicious encourage users to tag items explicitly instead of trying to infer tags later via image recognition and text classification.
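A toy illustration of why structure at creation time goes such a long way: if users supply tags explicitly, retrieval becomes a simple index lookup rather than a machine learning problem.

```python
from collections import defaultdict

# Items declare their tags at creation time, as on Flickr or Delicious.
index: dict[str, set[str]] = defaultdict(set)

def add_item(item: str, tags: list[str]) -> None:
    for tag in tags:
        index[tag].add(item)

add_item("sunset.jpg", ["sunset", "beach"])
add_item("surf.jpg", ["beach", "surfing"])

# No image recognition or text classification needed at query time.
print(sorted(index["beach"]))  # ['sunset.jpg', 'surf.jpg']
```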

Importance of collective knowledge systems

There are very practical, pressing needs for better collective knowledge systems. For example, noted security researcher Bruce Schneier argues that the United States’ biggest anti-terrorism intelligence challenge is to build a collective knowledge system across disconnected agencies:

What we need is an intelligence community that shares ideas and hunches and facts on their versions of Facebook, Twitter and wikis. We need the bottom-up organization that has made the Internet the greatest collection of human knowledge and ideas ever assembled.

The same could be said of every organization, large and small, formal and informal, that wants to get maximum value from the knowledge of its members.

Collective knowledge systems also have pure academic value. When Artificial Intelligence was first being seriously developed in the 1950s, experts optimistically predicted they would soon create machines that were as intelligent as humans. In 1965, AI expert Herbert Simon predicted that “machines will be capable, within twenty years, of doing any work a man can do.”

While AI has had notable victories (e.g. chess), and produced an excellent set of tools that laid the groundwork for things like web search, it is nowhere close to achieving its goal of matching – let alone surpassing – human intelligence. If machines are ever going to be smart (and eventually try to destroy humanity?), collective knowledge systems are the best bet for getting there.

Design principles

Should the US government just try putting up a wiki or micro-messaging service and see what happens? How should such a system be structured? Should users be assigned reputations and tagged by expertise? What is the unit of a “contribution”? How much structure should those contributions be required to have? Should there be incentives to contribute? How can the system be structured to “learn” most efficiently? How do you balance requiring up front structure with ease of use?

These are the kinds of questions you might think are being researched by academic computer scientists. Unfortunately, academic computer scientists still seem to model their field after the “hard sciences” instead of what they should be modeling it after — social sciences like economics or sociology. As a result, computer scientists spend a lot of time dreaming up new programming languages, operating system architectures, and encryption schemes that, for the most part, sadly, nobody will ever use.

Meanwhile the really important questions related to information and computer science are mostly being ignored (there are notable exceptions, such as MIT’s Center for Collective Intelligence). Instead most of the work is being done informally and unsystematically by startups, research groups at large companies like Google, and a small group of multi-disciplinary academics like Clay Shirky and Duncan Watts.

Security through diversity

Someone asked me the other day whether I thought the United States was vulnerable to a large scale “cyber” attack. While I have no doubt that any particular organization can be compromised, what comforts me at the national level is the sheer diversity of our systems. We have – unintentionally – employed a very effective defensive strategy known as “security through diversity.”

Every organization’s IT system is composed of multiple layers: credential systems, firewalls, intrusion detection systems, tripwires, databases, web servers, OS builds, encryption schemes, network topologies, etc. Due to a variety of factors — competitive markets for IT products, lack of standards, diversity of IT managers’ preferences — most institutions make independent and varied choices at each layer. This, in turn, means that each institution requires a customized attack in order to be penetrated. It is therefore virtually impossible for a single software program (virus, worm) to infiltrate a large portion of them.
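A back-of-envelope calculation shows how quickly this compounds. The market shares below are invented and the layers are assumed to be independent, but the shape of the result holds: an attack that must match the defender’s choice at every layer only works against the small fraction of institutions running that exact stack.

```python
# Hypothetical market share of the most-targeted choice at each layer.
layers = {
    "os_build":       0.40,
    "web_server":     0.35,
    "database":       0.30,
    "firewall":       0.25,
    "credential_sys": 0.20,
}

vulnerable = 1.0
for share in layers.values():
    vulnerable *= share  # the attack must match every layer at once

print(f"{vulnerable:.4%} of institutions match the full stack")
# -> 0.2100%: even modest per-layer diversity compounds quickly
```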

On the web, a particular form of uniformity that can be dangerous is the centralized login system, like Facebook Connect. But even this is preferable to the current dominant “single sign-on system”: most regular people use the same weak password over and over for every site, because it’s too hard to remember more than that (let alone multiple strong passwords). This means attackers only need to penetrate one weak link (like the recent RockYou breach), and they get passwords that likely work on many other sites (including, presumably, banking and other “important” sites). At least with Facebook Connect there is a well-funded, technically savvy organization defending its centralized repository of passwords.

I first heard the phrase “security through diversity” from David Ackley who was working on creating operating systems that had randomly mutated instances (similar ideas have since become standard practice, e.g. stack and address space randomization). It struck me as a good idea and one that should be built into systems intentionally. But meanwhile we get many of the benefits unintentionally. The same factors that frustrate you when you try to transfer your medical records between doctors or network the devices in your house are also what help keep us safe.
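You can see a descendant of that idea in action today: on an operating system with address space layout randomization enabled, this snippet prints a different address on nearly every run.

```python
import ctypes

# Allocate a C buffer and print its address. With ASLR enabled, each
# process instance gets a different memory layout, so the value
# changes from run to run.
buf = ctypes.create_string_buffer(64)
print(hex(ctypes.addressof(buf)))
```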

Information security – are we experiencing a Pax Romana?

My last startup was an information security company — SiteAdvisor — that was acquired by McAfee, where I then worked for a while. I am no longer working in security, but have many friends that do and I try to stay in touch with what’s going on in the area.

The widespread sense I get is that we are going through a period of unusual calm, especially on the consumer side.   Instead of repeating the historical pattern where new types of threats emerge every few years, we’ve seen the opposite: threat types have actually gone away or been seriously mitigated. Spyware/adware is basically gone, as most of the businesses that were pushing it (yes, it was mostly driven by legal, US-based businesses) have gone bankrupt.  Spam has been mostly controlled, at least if you use Gmail or a good spam filter like Postini.  If you use a Mac you don’t have to worry about viruses or malware.  Mobile security hasn’t ever really become an issue, mostly because the telecom carriers (and now Apple) carefully screen the installation of 3rd party apps.  Identity theft is a real issue but not really something consumers can do anything about – most of it happens offline or through enterprise data center breaches.

On the enterprise and government side, things are more turbulent.   Distributed denial of service attacks using botnets remain almost impossible to defend against. There have been a number of breaches of sensitive consumer information and those will likely only get more common, especially as more information gets centralized in the cloud. Military and terrorist computer attacks also seem to be a likely future threat.

All in all, though, the good guys have been keeping the bad guys down. This relative calm is generally great news for computer users, but – let’s be honest – bad news for the computer security industry and venture capital investors. As an investor, I’ve only made one security investment in the last few years — in a cloud security startup called Vaultive. Everything else I’ve seen seems to be solving non-problems or rehashing solutions that were developed years ago.

Inevitably, the calm will end and new classes of threats will emerge. But for now we should enjoy the relative peace.