2008 OCT 20 - (NewsRx.com) -- All U.S. agencies with counterterrorism programs that collect or "mine" personal data -- such as phone, medical, and travel records or Web sites visited -- should be required to systematically evaluate the programs' effectiveness, lawfulness, and impacts on privacy, says a new report from the National Research Council. Both classified and unclassified programs should be evaluated before they are set in motion and regularly thereafter for as long as they are in use, says the report. It offers a framework agencies can use to assess programs, including existing ones (see also National Academy of Sciences).
The report also says that Congress should re-examine existing law to assess how privacy can be protected in such programs, and should consider restricting how personal data are used. And it recommends that any individuals harmed by violations of privacy be given a meaningful form of redress.
"The danger of terror attacks on the U.S. is real and serious, and we should use the information technologies at our disposal to combat this threat," said William Perry, co-chair of the committee that wrote the report, former U.S. secretary of defense, and Michael and Barbara Berberian Professor at Stanford University. "However, the threat does not justify government activities that violate the law, or fundamental changes in the level of privacy protection to which Americans are entitled."
At the request of the U.S. Department of Homeland Security and the National Science Foundation, the report examines the technical effectiveness and privacy impacts of data-mining and behavioral surveillance techniques. Each time a person makes a telephone call, uses a credit card, pays taxes, or takes a trip, he or she leaves digital tracks, records that often end up in massive corporate or government databases. Through formal or informal agreements, the government has access to much of the data owned by private-sector companies. Agencies use sophisticated techniques to mine some of these databases -- searching for information on particular suspects, and looking for unusual patterns of activity that may indicate a terrorist network.
The Reality of a Serious Terrorist Threat
The terrorist threat to the United States is all too real, the committee said. Terrorist acts that could inflict enormous damage on the nation remain possible. Such acts could cause, and have caused, major casualties as well as severe economic and social disruption.
The most serious threat today comes from terrorist groups that are international in scope; these groups use the Internet to recruit, train, and plan operations and use public channels to communicate. Intercepting and analyzing these information streams might provide important clues about the nature of the threat they pose, the report says. Key clues might also be found in commercial and government databases that record a wide range of information about individuals, organizations, and their behavior. But successfully identifying signs of terrorist activity in these masses of data is extremely difficult, the committee said.
Pattern-Seeking Data-Mining Methods Are of Limited Usefulness
Routine forms of data mining can provide important assistance in the fight against terrorism by expanding and speeding traditional investigative work, the report says. For example, investigators can quickly search multiple databases to learn who has transferred money to or communicated with a suspect. More generally, if analysts have a historical basis for believing a certain pattern of activity is linked to terrorism, then mining for similar patterns may generate useful investigative leads.
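To illustrate the kind of subject-based search the report describes, the sketch below joins hypothetical money-transfer and phone-call records to list everyone who has sent money to or called a known suspect. The table layouts, names, and data are assumptions for demonstration only; they are not drawn from the report or any real system.

```python
# Illustrative sketch only: a subject-based query across two hypothetical
# databases to generate investigative leads tied to a named suspect.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE transfers (sender TEXT, receiver TEXT, amount REAL, ts TEXT);
CREATE TABLE calls     (caller TEXT, callee TEXT, ts TEXT);
INSERT INTO transfers VALUES ('alice', 'suspect_1', 900.0, '2008-09-01'),
                             ('bob',   'carol',      40.0, '2008-09-02');
INSERT INTO calls     VALUES ('dave',  'suspect_1', '2008-09-03'),
                             ('erin',  'frank',     '2008-09-04');
""")

suspect = "suspect_1"
# Union of everyone who sent money to, or phoned, the suspect.
cur.execute("""
SELECT sender AS contact, 'transfer' AS channel FROM transfers
 WHERE receiver = ?
UNION
SELECT caller AS contact, 'call' AS channel FROM calls
 WHERE callee = ?
""", (suspect, suspect))
for contact, channel in cur.fetchall():
    print(f"lead: {contact} linked to {suspect} via {channel}")
```

This is the "expanding and speeding traditional investigative work" case: the search starts from a known suspect or a pattern with an established link to terrorism, rather than trawling for anomalies.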
Far more problematic are automated data-mining techniques that search databases for unusual patterns of activity not already known to be associated with terrorists, the report says. Although these methods have been useful in the private sector for spotting consumer fraud, they are less helpful for counterterrorism precisely because so little is known about what patterns indicate terrorist activity; as a result, they are likely to generate huge numbers of false leads. Such techniques might, however, have some value as secondary components of a counterterrorism system to assist human analysts. Actions such as arrest, search, or denial of rights should never be taken solely on the basis of an automated data-mining result, the report adds.
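A rough back-of-the-envelope calculation shows why rare-event pattern matching floods analysts with false leads. The figures below are illustrative assumptions, not the committee's numbers: even a detector that flags 99 percent of genuine threats while mistakenly flagging only 0.1 percent of innocent records would still produce on the order of a hundred false leads for every true one when run across hundreds of millions of records.

```python
# Illustrative base-rate arithmetic (all numbers are assumptions):
# a highly accurate pattern detector still yields overwhelmingly false leads
# when the behavior it looks for is extremely rare in the scanned population.
population     = 300_000_000   # records scanned
true_actors    = 3_000         # hypothetical number of genuine threats
detection_rate = 0.99          # fraction of genuine threats flagged
false_pos_rate = 0.001         # fraction of innocent records flagged

true_hits  = true_actors * detection_rate
false_hits = (population - true_actors) * false_pos_rate
precision  = true_hits / (true_hits + false_hits)

print(f"true leads:  {true_hits:,.0f}")     # ~2,970
print(f"false leads: {false_hits:,.0f}")    # ~300,000
print(f"share of flagged records that are genuine: {precision:.2%}")  # ~1%
```

Under these assumed figures, roughly 99 of every 100 flagged individuals are innocent, which is why the committee confines such techniques to a supporting role behind human analysts.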
The committee also examined behavioral surveillance techniques, which try to identify terrorists by observing behavior or measuring physiological states. There is no scientific consensus on whether these techniques are ready for use at all in counterterrorism, the report says; at most they should be used for preliminary screening, to identify those who merit follow-up investigation. Further, they have enormous potential for privacy violations because they will inevitably force targeted individuals to explain and justify their mental and emotional states.
Oversight Needed to Protect Privacy, Prevent "Mission Creep"
Collecting and examining data to try to identify terrorists inevitably involves privacy violations, since even well-managed programs necessarily produce some "false positives," in which innocent people are flagged as possible threats and their personal information is examined. A mix of policy and technical safeguards could minimize these intrusions, the report says. Indeed, reducing the number of false positives also improves programs' effectiveness by focusing attention and resources on genuine threats.
Policymakers should consider establishing restrictions on the use of data, the committee said. Although some laws limit what types of data the government may collect, there are few legal limits on how agencies can use already-collected data, including those gathered by private companies. An agency could obtain and mine a database of financial records for counterterrorism purposes, for example, and then decide to use it for an entirely different purpose, such as uncovering tax evaders. Restrictions on use can help ensure that programs stay focused on the particular problems they were designed to address, and guard against unauthorized or unconsidered expansion of government surveillance power.
Poor-quality data are a major concern in protecting privacy because inaccuracies may cause data-mining algorithms to identify innocent people as threats, the report says. Linking data sources together tends to compound the problem; current literature suggests that a "mosaic" of data assembled from multiple databases is likely to be error-prone. Analysts and officials should be aware of this tendency toward errors and the consequent likelihood of false positives.
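A simple calculation, again using assumed figures rather than anything from the report, shows how linkage compounds error: if each individual source matches the right person only 97 percent of the time, a profile stitched together from several sources is error-free far less often.

```python
# Illustrative sketch (assumed accuracy figures): a "mosaic" profile that
# requires every linked source to be matched correctly degrades quickly.
per_source_accuracy = 0.97      # assumed chance a single source is matched correctly
for n_sources in (1, 2, 3, 5, 8):
    p_all_correct = per_source_accuracy ** n_sources
    print(f"{n_sources} linked sources -> "
          f"{p_all_correct:.1%} chance the assembled profile is error-free")
# 0.97**5 is about 86%, 0.97**8 about 78%: the more sources linked, the
# likelier the composite record misdescribes the person, feeding false
# positives downstream.
```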
All information-based programs should be accompanied by robust, independent oversight to ensure that privacy safeguards are not bypassed in daily operations, the report says. Systems should log who accesses data, thus leaving a trail that can itself be mined to monitor for abuse.
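One way to picture that recommendation is an append-only access log that is itself periodically mined for misuse. The sketch below is a minimal illustration under assumed record formats and thresholds; a real system would need durable storage, richer context about each query, and more sophisticated anomaly detection than a simple volume cutoff.

```python
# Minimal sketch of auditing the auditors: log every data access and
# periodically scan the log for unusual activity. Formats and the review
# threshold are assumptions for illustration.
from collections import Counter
from datetime import datetime, timezone

access_log = []  # in practice: durable, append-only, tamper-evident storage

def log_access(analyst: str, record_id: str, purpose: str) -> None:
    """Append one access event; every data lookup should pass through here."""
    access_log.append({
        "analyst": analyst,
        "record_id": record_id,
        "purpose": purpose,
        "ts": datetime.now(timezone.utc).isoformat(),
    })

def flag_unusual_access(max_per_analyst: int = 100) -> list[str]:
    """Return analysts whose access volume exceeds a review threshold."""
    counts = Counter(event["analyst"] for event in access_log)
    return [a for a, n in counts.items() if n > max_per_analyst]

# Example: one analyst pulls far more records than peers and is flagged for review.
for i in range(150):
    log_access("analyst_7", f"rec-{i}", "case-1234")
log_access("analyst_2", "rec-9", "case-5678")
print(flag_unusual_access())   # ['analyst_7']
```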
The report notes that another area ripe for congressional action is legislation clarifying private-sector rights, responsibilities, and liability in turning data over to the government, questions that current law leaves unsettled. Although the committee did not recommend specific content for this legislation, it noted that private companies should not be held liable simply for complying with government requirements to turn over data.
A Framework to Assess Effectiveness, Privacy Impacts
The report offers two sets of criteria and questions to help agencies and policymakers evaluate data-based counterterrorism programs. One set is designed to determine whether a program is likely to be effective. For example, a system should be tested with a data set of adequate size to see if it will work when used on a large scale, and should be resistant to countermeasures. A second set of criteria assesses likely privacy impacts and helps ensure that, if implemented, the program protects privacy as much as possible. Each program should operate with the least amount of personal data consistent with its objective, for instance, and should have a process in place for the reporting and redress of privacy harms due to false positives.
These evaluations should involve independent experts, and the results should be made available to the broadest audience possible, the report says. Evaluations may result in a program being modified or even cancelled, it notes.
"We hope this framework will help agencies and policymakers determine whether new programs are likely to be effective and consistent with our nation's laws and values and continually improve programs in operation," said Charles Vest, committee co-chair and president of the National Academy of Engineering. "Decisions to use or continue programs need to be based on criteria more stringent than 'it's better than doing nothing.'"
The report was sponsored by the U.S. Department of Homeland Security and the National Science Foundation. The National Academy of Sciences, National Academy of Engineering, Institute of Medicine, and National Research Council make up the National Academies. They are private, nonprofit institutions that provide science, technology, and health policy advice under a congressional charter. The Research Council is the principal operating agency of the National Academy of Sciences and the National Academy of Engineering. A committee roster follows.