How To Catch Terrorists

Posted by Christopher Smith Thu, 04 Oct 2007 16:30:00 GMT

I like Bruce Schneier. I loved reading Applied Cryptography in the days of my youth, not to mention The Electronic Privacy Papers. I have been generally impressed with his work on Blowfish and Twofish, and I like his other books as well as his Crypto-Gram Newsletter and his various essays. So, it is with some trepidation that I’ve decided to publicly criticize his essay entitled How To Not Catch Terrorists.

First off, I don’t, in fact, take issue with his central thesis: data mining through the general population as a means of generating a list of suspects to be investigated by FBI agents doesn’t seem like a good way to hunt down terrorists. My problem is that it is a straw man argument that likely misrepresents the manner in which authorities are attempting to use data mining.

I know a thing or two about the applications of data mining. While not an expert in the field, I’ve worked at search engine companies for the last five years, three of which I spent in research, where I had a chance to talk to researchers about the successes and failures of the field. So I have some sense of how one can usefully (and not so usefully) use data mining techniques. As Bruce states, it is a terribly useful tool in all kinds of situations, but it is far from a magic bullet. You need to know how to use it. His essay seems to assume the government is astonishingly ignorant in this regard, despite investing countless billions in the technology.

I think the central flaw in Mr. Schneier’s thinking likely stems from its starting point, which I suspect is best summarized by this line:

There is something un-American about a government program that uses secret criteria to collect dossiers on innocent people and shares that information with various agencies, all without oversight.

Of course, the truth is, this kind of thing is (with apologies to Robert Wuhl) “as American as apple pie”, which is exactly why Mr. Schneier is fearful of it (I’d say “paranoid about it”, but that implies that his fear isn’t rationally justifiable).

So, let’s assume three things:

  1. The government isn’t completely incompetent at understanding how to employ these technologies (remember, this is all “brought to you by the people who brought you ECHELON”), and more importantly is informed by the multiple experts in the fields of machine learning and data mining that it employs.
  2. Folks working on these programs are highly motivated to catch terrorists.
  3. Folks working on these programs are at least somewhat fearful of the same kind of abuse of powers that Mr. Schneier is, particularly if they think it could be directed at them or their friends and loved ones.

You might take issue with any of those assertions, but they all seem highly plausible to me, although my “American as apple pie” links make me somewhat dubious about #3. Frankly, if the other two aren’t true, we have got way bigger problems to worry about than a little misuse and abuse of data mining and machine learning techniques.

Now, if we go with these assumptions, how might the government employ something like ADVISE? Let me suggest some ways:

Sifting and sorting the raw data

This one comes right from the horse’s mouth. Mr. Schneier quotes Michael Chertoff as saying, “It is an experiment to see how you can better analyze data that you already have, that you’ve already legally collected, to see if you can understand it, sort it and make use of it more readily than simply doing it manually.” Let’s say for a moment that you’ve got all this data on 1000 people, and someone tells you, “there is a decent chance one of them is a terrorist with plans to attack innocent Americans in the coming year, could you go through this data yourself and give us a best guess as to who to investigate and in what order?” Okay, I’d probably get a team of people reading over every scrap of data right away. I’d start to collect more detailed data on the top candidates, but I’m not going to start asking for court orders, because all I’d have would be someone’s best guess as to whether this person is a terrorist.

Now imagine you have the same data on 300 million people, or worse still, everyone on the planet. Are you really going to assemble a team to go over the data of all those people? Of course not! They’d likely retire and perhaps die before completing the task, and by then the output would be moot anyway. Now, what if you could get a computer to sift through and sort the data such that it could produce a list of the top 1000 candidates? I’d sure be interested in that computer’s list. Sure, I’d know that there are nowhere near 1000 terrorists and that the odds are only something like 1 in 100 that even one of the candidates is actually a terrorist, so I’m not going to start arresting these people or opening up files on them, but I might assemble a team exactly like in the first example in order to refine the list further. Suddenly, I’ve gone from literally having no actionable data to being able to act on it, albeit in a limited way.
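As a rough illustration of the "sift and sort" step, here is a minimal Python sketch. It assumes some model has already assigned each record a numeric score; the scores here are just random numbers, and the names `top_candidates` and `demo` are made up for this example. The only point is that pulling a top-1000 shortlist out of a very large scored population is cheap for a computer, where a manual review would never finish.

```python
import heapq
import random


def top_candidates(scores, k):
    """Return the k (record_id, score) pairs with the highest scores.

    `scores` can be any iterable, so the full population never has to
    be sorted or even held in memory at once.
    """
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


def demo(population=100_000, k=1000, seed=0):
    """Score a synthetic population with random numbers and shortlist it."""
    rng = random.Random(seed)
    # Hypothetical scores: a real system would compute these from data.
    scores = ((record_id, rng.random()) for record_id in range(population))
    return top_candidates(scores, k)


if __name__ == "__main__":
    shortlist = demo()
    print(len(shortlist), "records handed to the human team for review")
```

The shortlist is only a starting point for the human team described above, not a list of suspects.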

Trimming down a suspect list

Okay, our crack investigative team has identified and arrested some middleman who we know sold supplies to a terrorist with plans to strike. The problem is, this guy sold supplies to a lot of people: some innocent civilians, some organized crime types, and this one terrorist. We also have no idea where or when he sold supplies to this terrorist. We do have a tape recording of him talking to the terrorist about that crazy waiter at Random Regional Restaurant. This guy has been in business for a while and has been successful, so he has had a LOT of customers over the years. We have a list of all his customers, but it is understandably quite extensive, possibly ten thousand names. We only have enough resources and time to investigate ten people in more detail. How do we decide which ten to look at? Well, we know that our target either lived in or visited Random Region in the past. We just might want to have a computer program go through the customer list and exclude all the candidates who’d never been to Random Region. We’d know that it is entirely possible that we don’t have complete data on the movements of all the individuals (particularly since some are shady characters who don’t like their movements tracked), but filtering in this manner brings us down to a small enough candidate list that we can do some basic investigations, like calling each of them to see if their voice sounds like the one on the tape. From that, maybe we can identify candidates for further investigation.
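The filtering step above might look something like the following Python sketch. Everything in it is hypothetical: the `Customer` record, its fields, and the `trim` function are invented for illustration. Because the movement data is known to be incomplete, the sketch keeps a separate list of customers whose travel history has gaps rather than silently dropping them.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    # Hypothetical record layout, invented for this example.
    name: str
    regions_visited: set = field(default_factory=set)
    travel_data_complete: bool = True


def trim(customers, region):
    """Split a customer list by known presence in `region`.

    Returns (matched, unknown): customers known to have been in the
    region, and customers with no recorded visit whose travel data is
    incomplete, so they cannot be ruled out.
    """
    matched = [c for c in customers if region in c.regions_visited]
    unknown = [
        c for c in customers
        if region not in c.regions_visited and not c.travel_data_complete
    ]
    return matched, unknown


if __name__ == "__main__":
    customers = [
        Customer("A", {"Random Region", "Elsewhere"}),
        Customer("B", {"Elsewhere"}),
        Customer("C", set(), travel_data_complete=False),
    ]
    matched, unknown = trim(customers, "Random Region")
    print([c.name for c in matched], [c.name for c in unknown])
```

Investigators would start with the matched list and fall back to the unknown list only if the first pass turns up nothing.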

Prioritizing

You have information suggesting someone might try to strike Metropolis in the next couple of months. You don’t know who, but you’ve narrowed it down to people either in Metropolis or visiting Metropolis in the next couple of months, and you have some additional information that narrows the candidates down to a list you should be able to investigate in a month, now that you have wiretap authorizations for each of their phones. Here’s the problem: the terrorist could strike a lot sooner than a month. So, the order in which you investigate the candidates is very important, only you have no idea where to start beyond gut instinct! Wouldn’t it be nice to shove that list into a computer and have it sort the list based on the probability that each candidate is a terrorist and how soon they’ll be in Metropolis? Even if you knew the thing was only 70% accurate, you’d take it, because it would still give you better odds of stopping an incident than if you didn’t use it.
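To make the "better odds" intuition concrete, here is a small Python simulation. It does not model "70% accurate" literally. Instead it assumes, purely for illustration, that a noisy model gives the real target a score that is on average `signal` standard deviations higher than an innocent candidate's score, and then measures how many innocents get investigated before the target. Setting `signal` to zero is equivalent to investigating in random order. The function name and parameters are invented for this sketch.

```python
import random


def average_target_position(n_candidates, signal, trials, seed=0):
    """Average number of innocents ranked ahead of the one real target.

    Each innocent gets a score drawn from a normal(0, 1) distribution;
    the target's score is drawn from normal(signal, 1). Candidates are
    investigated in descending score order.
    """
    rng = random.Random(seed)
    total_ahead = 0
    for _ in range(trials):
        target_score = rng.gauss(signal, 1.0)
        # Count the innocents that the noisy model scores above the target.
        total_ahead += sum(
            1 for _ in range(n_candidates - 1)
            if rng.gauss(0.0, 1.0) > target_score
        )
    return total_ahead / trials


if __name__ == "__main__":
    random_order = average_target_position(100, signal=0.0, trials=2000)
    noisy_model = average_target_position(100, signal=1.0, trials=2000)
    print("innocents checked first, random order:", random_order)
    print("innocents checked first, noisy model: ", noisy_model)
```

Under these assumptions even a weak, error-prone signal moves the target noticeably closer to the front of the queue, which is all the scenario above requires.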

It is not too hard to come up with other scenarios, but these are the first three I could think of that most closely resemble Mr. Schneier’s original straw man. All of these scenarios would benefit from the application of data mining and machine learning, even if you have crummy false positive and false negative rates and a target population of one that you are trying to identify. None of these are ridiculous “24”-style scenarios that never happen in reality.

So, what are the problems, beyond motivation, in Mr. Schneier’s reasoning?

Well, first, the “base rate fallacy” is only a problem if the cost of a false positive or false negative is high enough to outweigh the benefits of identifying at least some of the true positives. The classic example of this is with medical tests. Even if a disease is rare in the general population, and even if there is a significant cost for doing subsequent tests, it may make economic sense to perform the test anyway if the savings from a correctly identified patient are large enough. Sometimes a single true positive is enough to justify the expense of mislabeling half of a population. I see lawyers throwing up ads for mesothelioma all over the place, even though they know that most of the time the ads won’t be seen by a single person with the disease, and that they’ll undoubtedly encourage several unsavory types to show up at their offices and waste costly staff time as they try to pretend to have the disease. Why? Because every single real candidate they do identify is worth a jackpot of money to them.
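The cost-benefit argument can be written out as simple arithmetic. The Python sketch below uses made-up numbers throughout (population size, error rates, the value of a hit, and the cost of a false alarm are all assumptions chosen for illustration). It shows the base rate effect, where almost everyone flagged is innocent, alongside the net benefit calculation that decides whether flagging is still worthwhile.

```python
def flag_outcomes(population, targets, true_positive_rate, false_positive_rate):
    """Expected true hits, false alarms, and precision of a screening test."""
    true_hits = targets * true_positive_rate
    false_alarms = (population - targets) * false_positive_rate
    # Precision: the fraction of flagged people who are real targets.
    precision = true_hits / (true_hits + false_alarms)
    return true_hits, false_alarms, precision


def net_benefit(true_hits, false_alarms, value_per_hit, cost_per_false_alarm):
    """Expected value of screening: gains from hits minus cost of false alarms."""
    return true_hits * value_per_hit - false_alarms * cost_per_false_alarm


if __name__ == "__main__":
    # All figures below are invented for illustration.
    hits, alarms, precision = flag_outcomes(
        population=1_000_000, targets=10,
        true_positive_rate=0.9, false_positive_rate=0.01,
    )
    print(f"precision: {precision:.4%}")
    print("net benefit:", net_benefit(hits, alarms, 10_000_000, 1_000))
```

With these particular numbers the precision is tiny, yet the net benefit is still positive because a hit is assumed to be worth far more than a false alarm costs. Raise the cost of a false alarm enough and the sign flips, which is exactly the condition under which the base rate objection bites.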

Second, the objection that there is no “well defined terrorist profile” assumes that because one is using a computer, a deductive reasoning model must be employed to reach a conclusion. Supervised learning methods like SVMs excel at performing classifications by doing the statistical equivalent of inductive reasoning. Sure, they get it wrong some of the time, but in cases where there are a multitude of factors that help in the decision making process, these methods can often outperform a human working with the same data (and obviously taking far more time). Humans tend to excel at intuitively or logically identifying a few key indicators out of a possible set of millions, but computers excel at statistically identifying a complex model of interactions between millions of key indicators, which is handy if we don’t even know whether any of the factors we’re considering have a real direct cause and effect relationship with the prediction we’re looking for.
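For readers who have not seen one, here is a toy sketch of the inductive approach: a linear SVM trained by stochastic gradient descent on the hinge loss, written in plain Python. No profile is written down in advance; the weights are learned from labeled examples. The data points are abstract two-feature vectors invented for the example, the hyperparameters are arbitrary, and a real system would use a proper library and far more features.

```python
import random


def train_linear_svm(samples, labels, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Fit weights w and bias b by SGD on the L2-regularized hinge loss.

    `labels` must be +1 or -1. This is a teaching sketch, not a tuned solver.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Point is inside the margin or misclassified: move toward it.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Classify x as +1 or -1 using the learned linear boundary."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Invented, well-separated toy data.
    samples = [(2, 2), (3, 3), (2, 3), (3, 2),
               (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
    labels = [1, 1, 1, 1, -1, -1, -1, -1]
    w, b = train_linear_svm(samples, labels)
    print(predict(w, b, (4, 4)), predict(w, b, (-4, -4)))
```

The classifier never receives a rule describing either class. It induces a decision boundary from examples, which is the distinction drawn above between deductive and inductive use of a computer.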

Finally, there is an assumption that using these techniques necessarily leads to the government surveilling innocent people they’d otherwise have left alone. That is a matter of application rather than something intrinsic to the technology. One could just as easily use the technology to identify innocent people who don’t need a file opened up on them. Heck, if we’d had this kind of thing in Hoover’s day, we might have been able to more easily identify the irregularities in how he was selecting candidates for wiretapping. We’re always hearing reports of authorities using racial profiling, or of mere irregularities in one’s behavior leading to inappropriate scrutiny or persecution (remember the insanity over the whole Trench Coat Mafia thing?). Data mining and machine learning techniques could not only help identify inappropriate police behavior, but also provide a “second opinion” about whether someone was genuinely worthy of further investigation.

In general, tracking down the needle in the haystack is a very hard problem and one would want all tools available to do the job. Sure, I can see how the technology could be used to infringe upon people’s privacy, but that is no reason to throw out the baby with the bath water. Some basic oversight and rules ought to be sufficient to prevent the worst abuses.
