Robert Lemos

…articles and musings of a technology and science journalist


NSA probably can do napkin math

July 10th, 2006

A brief analysis of the National Security Agency’s eavesdropping program using Bayes’ Theorem concluded that the program has no value for fighting terrorism, but the analysis seems to make the mistake of assuming the program operates in a vacuum.

The analysis was written in late May by Floyd Rudmin, a professor of social and community psychology at the University of Tromsø in Norway, and was highlighted by security researcher Bruce Schneier on his blog. The brief report uses Bayesian probability to argue that, even with a 70 percent probability of detecting a terrorist and a 0.01 percent chance of falsely flagging a citizen as a terrorist, only about 2 percent of the people fingered by the system would be terrorists.

“The whole NSA domestic spying program will seem to work well, will seem logical and possible, if you are paranoid,” Rudmin stated. “Instead of presuming there are 1,000 terrorists in the USA, presume there are 1 million terrorists.”
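For readers who want to check the arithmetic, here is a minimal sketch of the single-system calculation in Python. The 70 percent detection rate, the 0.01 percent false-positive rate and the two terrorist counts come from the figures quoted above; the U.S. population of 300 million is my assumption, as are the function and variable names.

```python
def posterior(prior, hit_rate, false_positive_rate):
    """P(terrorist | flagged), by Bayes' Theorem."""
    true_positives = hit_rate * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


POPULATION = 300_000_000      # assumption: not stated in this post
HIT_RATE = 0.70               # chance of flagging an actual terrorist
FALSE_POSITIVE_RATE = 0.0001  # 0.01 percent chance of flagging an innocent person

# Compare the 1,000-terrorist premise with the "paranoid" 1 million premise.
for terrorists in (1_000, 1_000_000):
    share = posterior(terrorists / POPULATION, HIT_RATE, FALSE_POSITIVE_RATE)
    print(f"{terrorists:>9,} terrorists: {share:.1%} of flagged people are terrorists")
```

Under these assumptions, the 1,000-terrorist case lands at roughly 2 percent, while the 1 million case climbs above 95 percent, which is the contrast Rudmin is drawing.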

The NSA has come under fire in recent months for its alleged surveillance program that eavesdrops on American citizens’ phone calls and e-mail messages to foreign terrorist suspects. The accusation that the NSA tapped the communications of U.S. citizens came last December in a New York Times article. A follow-up article by the same reporters asserted that the agency is also mining large amounts of Internet connection and e-mail routing data to find patterns that could link certain people with terrorists. Two class-action lawsuits have already been filed against AT&T following revelations that the company allowed the NSA to have full access to any data crossing its network.

In May, a USA Today article claimed that the NSA obtained the phone records of tens of millions of Americans through the cooperation of AT&T, Verizon and BellSouth. The latter two companies have denied cooperating with the agency. The Federal Communications Commission refused to investigate whether the telecommunications companies broke the law by allegedly cooperating with the NSA.

As anyone who works in the anti-spam, anti-virus or content filtering community might suspect, there seems to be a problem with Rudmin’s analysis.

The professor assumes that the NSA has put all its eggs in a single basket: that the agency pays attention only to the results of its surveillance and that being flagged by the surveillance system is enough to get a suspect interrogated. However, if the NSA merely marked the person as suspicious and then used a second system (perhaps the recently outed system to detect terrorist financing) to vet the list of suspicious people, the results improve. For example, a second system with the same chance of detecting a terrorist and a higher false-positive rate of 0.1 percent could process the list of suspicious people and create a list of suspects among whom 93 percent would be terrorists, according to Bayes’ Theorem. (And it would catch 49 percent of the total terrorists in the U.S.)

(To be fair, this is my math and, considering my engineering degree is likely beyond its expiration date, it could be wrong. Feel free to do your own napkin math as well.)
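The two-system napkin math above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a U.S. population of 300 million (my assumption), 1,000 terrorists, and a second system whose errors are statistically independent of the first. The function names are made up for the example.

```python
def bayes_update(prior, hit_rate, false_positive_rate):
    """P(terrorist | flagged), given the prior share of terrorists."""
    true_positives = hit_rate * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


def two_stage(prior, first, second):
    """Chain two detectors, each given as (hit_rate, false_positive_rate).

    Assumes the second detector's errors are independent of the first's.
    """
    after_first = bayes_update(prior, *first)
    after_second = bayes_update(after_first, *second)
    caught = first[0] * second[0]  # share of all terrorists flagged by both
    return after_first, after_second, caught


PRIOR = 1_000 / 300_000_000  # assumption: 1,000 terrorists among 300 million people

after_first, after_second, caught = two_stage(
    PRIOR,
    first=(0.70, 0.0001),  # surveillance system: 0.01 percent false positives
    second=(0.70, 0.001),  # second system: 0.1 percent false positives
)
print(f"after first system:  {after_first:.1%}")
print(f"after second system: {after_second:.1%}")
print(f"terrorists caught:   {caught:.0%}")
```

With these inputs the second stage comes out at roughly 94 percent; rounding the first-stage result to 2 percent before feeding it into the second stage gives about 93 percent. Either way, 49 percent of the terrorists are flagged by both systems.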

Just as today’s antivirus and antispam systems use heuristics, signatures and behavioral analysis to increase the likelihood of catching malicious code and decrease the likelihood of flagging good code as bad, a comparable NSA system could use various detection mechanisms to bolster each other for better results. The more important question to debate is not whether it works, but whether it makes sense for a modern democracy.

Note: I’ve already had one harsh critique of the mathematics, which referred to my napkin math as “amazingly sloppy thinking.” As part of the discussion, I realized that I did not stress that any second system would have to be measuring events that are statistically independent of those measured by the first. The ultimate question is whether the entirety of the NSA’s surveillance capabilities can be expressed in a single Bayesian model. If that’s so, then perhaps the false positive rate (while it seems ridiculously low on first mention) could actually be smaller. Yet then we start getting into a realm where the position on an issue is determining the mathematics rather than the other way around. (3:50 p.m. EST)

Tags: Blog · Critical infrastructure · Government · Privacy · Security
