Analyzing Big Data

March 11, 2013
Vendors are finally releasing security products that can take advantage of the trillions of bytes of information we generate

As our information world continues to generate an unfathomable amount of stored data, a surprisingly common term has been coined to describe this expanding mass — BIG, or Big Data, to be accurate. How big is BIG, you ask? The answer: big enough that conventional data structures and analysis cannot effectively deal with it.

IBM calculates that 2.5 quintillion bytes of new data are created every day (a quintillion is a million trillion, so that is 2.5 exabytes). It further estimates that, by 2020, the amount of digital information created and replicated in the world will grow to 35 trillion gigabytes, or 35 zettabytes. Some of this data is highly structured (e.g., financial information), while other data, such as IP video, is unstructured. The science of big data has given rise to a new job title, data scientist: the person who can scientifically and creatively make sense of all of this.
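For readers who want to check the scale of those figures, here is a minimal sketch of the unit arithmetic. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes, an exabyte is 10^18 bytes and a zettabyte is 10^21 bytes; the variable names are illustrative only.

```python
# Decimal (SI) byte units; an assumption, since the figures do not say
# whether decimal or binary units were intended.
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21

# 2.5 quintillion bytes created per day (2.5 x 10^18).
daily_bytes = 25 * 10**17

# 35 trillion gigabytes projected for 2020 (35 x 10^12 GB).
projected_2020_bytes = 35 * 10**12 * GIGABYTE

daily_exabytes = daily_bytes / EXABYTE
projected_zettabytes = projected_2020_bytes / ZETTABYTE

print(f"Daily creation:  {daily_exabytes} exabytes")
print(f"2020 projection: {projected_zettabytes} zettabytes")
```

In other words, the daily figure is 2.5 exabytes and the 2020 projection is 35 zettabytes.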

When the STE editors asked me to write about this, I knew it would be a BIG task, so I decided to examine the subject from three perspectives that affect our security world: information security, security and crime event correlation, and video analysis. In this column, I will discuss the first two; video analysis will be the topic of next month’s column.

Information Security

A reading of the daily papers and online services reminds us that the threats of hacking, malware, and network attacks for fun and ill-gotten profit are only getting worse. Compare an information attack with an attack of the flu — we know that flu shots are only partially effective, and they don’t protect against all flu strains. If I were to contract an especially bad case, I would want the best diagnosis possible, and I would hope that my doctor would research outside resources to come up with the best course of treatment. Most enterprises have been hit with an information “flu” that escaped their defenses. The sooner it can be tracked down and dispensed with, the less the damage due to critical information loss or compromise.

On January 30, RSA, the Security Division of EMC Corp., announced the release of RSA Security Analytics, described as “a transformational security monitoring and investigative solution designed to help organizations defend their digital assets against today’s most sophisticated internal and external threats.” Because the techniques used to compromise information security are applied broadly, trends and threats can be spotted if sufficient data can be captured, correlated and analyzed. The RSA product combines external threat intelligence with analysis of internal traffic down to the packet level, feeding an analytics and reporting engine that provides network security visibility, actionable intelligence and investigative capability. Logs and packets are captured by a decoder appliance that collects, reassembles and normalizes traffic at OSI Layers 2 through 7. The RSA Investigation module uses a patented metadata framework for organizing the data (e.g., as nouns and verbs) in a way that supports timely investigation.

On January 31, IBM announced IBM Security Intelligence with Big Data which, the company says, combines “leading security intelligence with big data analytics capabilities for both external cyber security threats and internal risk detection and prevention.” This is achieved by “analyzing structured, enriched security data alongside unstructured enterprise data” and “helps find malicious activity hidden deep in the masses of an organization’s data.” Structured data includes alerts from security devices, operating system logs, DNS transactions and network flows, while unstructured data could be email, social media interactions, full packet information or business transactions. In its announcement, IBM specifically mentioned the vulnerabilities posed by insider threats.
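To make the idea of analyzing structured data alongside unstructured data more concrete, here is a minimal sketch of one simple form of correlation: finding IP addresses that appear both in structured alerts and in free text. It is not how the IBM or RSA products work internally; the records, field names and the `correlate` function are all hypothetical, and the addresses come from ranges reserved for documentation.

```python
import re
from collections import defaultdict

# Hypothetical structured records (e.g., firewall alerts, DNS transactions).
structured_alerts = [
    {"source": "firewall", "ip": "203.0.113.7", "event": "blocked outbound connection"},
    {"source": "dns", "ip": "198.51.100.23", "event": "lookup of rare domain"},
]

# Hypothetical unstructured text (e.g., email bodies, help-desk notes).
unstructured_documents = [
    "User reports a popup asking to connect to 203.0.113.7 after opening an attachment.",
    "Quarterly sales figures attached, please review before Friday.",
]

# Loose IPv4 matcher; it does not validate that each octet is 0-255.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def correlate(alerts, documents):
    """Group alerts and documents that mention the same IP address."""
    matches = defaultdict(lambda: {"alerts": [], "documents": []})
    for alert in alerts:
        matches[alert["ip"]]["alerts"].append(alert["event"])
    for text in documents:
        for ip in IPV4_PATTERN.findall(text):
            matches[ip]["documents"].append(text)
    # Keep only addresses seen in both kinds of data.
    return {
        ip: hit
        for ip, hit in matches.items()
        if hit["alerts"] and hit["documents"]
    }


correlated = correlate(structured_alerts, unstructured_documents)
for ip, hit in correlated.items():
    print(ip, hit["alerts"], len(hit["documents"]))
```

A real deployment would draw on far richer indicators than a shared IP address, but the principle is the same: an alert that looks routine on its own becomes more interesting when an unrelated data source points to the same thing.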

Security and Crime Event Correlation

The same types of big data analysis techniques that are built into products like RSA’s and IBM’s can be used to provide additional insights into patterns of crime and potential physical security vulnerabilities.

In law enforcement, efforts such as predictive policing and intelligence-led policing (ILP) have evolved, based on methodologies for assembling data from disparate sources and tools such as GIS, applying analysis, and using the results to guide decision making. The hope is that moving from a reactive mode to a proactive one will allow effective anticipation, leading to the prevention of, or response to, predicted crime. As these techniques become more refined and are proven effective, the increasingly limited dollars available for public safety can be better targeted, including through risk-based deployment of resources.
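As a rough illustration of what risk-based deployment can mean in practice, here is a minimal sketch that buckets incident locations into a map grid and ranks the busiest cells. It is a deliberate simplification, not a description of any agency’s or vendor’s method: the coordinates are made up, the cell size is an assumption, and counting past incidents is only the crudest stand-in for a predictive model.

```python
from collections import Counter

# Hypothetical incident locations as (latitude, longitude) pairs.
incidents = [
    (40.7128, -74.0060),
    (40.7131, -74.0052),
    (40.7135, -74.0049),
    (40.7306, -73.9866),
    (40.7580, -73.9855),
]

# Assumed grid resolution: 0.01 degree, roughly 1 km on a side at mid-latitudes.
CELL_SIZE_DEGREES = 0.01


def grid_cell(lat, lon, cell_size=CELL_SIZE_DEGREES):
    """Map a coordinate to the integer index of the grid cell containing it."""
    return (int(lat // cell_size), int(lon // cell_size))


def rank_hotspots(points, top_n=3):
    """Return the top_n grid cells by incident count, busiest first."""
    counts = Counter(grid_cell(lat, lon) for lat, lon in points)
    return counts.most_common(top_n)


for cell, count in rank_hotspots(incidents):
    print(cell, count)
```

With this sample data, the three incidents clustered around the first location fall into the same cell, which ranks first. A real system would weight recency, crime type and other data sources rather than relying on raw counts.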

There appears to be no shortage of data or statistics, but, until now, these predictive efforts have been limited by available analysis techniques, hindering law enforcement’s ability to interpret and use the data. It is easy to see how big data analytics will also be a major tool in fighting fraud, credit card theft and identity theft. This will no doubt encompass access control data, both physical and network, and ultimately affect the way Physical Security Information Management (PSIM) systems are implemented.

To sum it up, we’re on the threshold of something BIG.

Ray Coulombe is Founder and Managing Director of, enabling interaction with specifiers in the physical security and ITS markets; and Principal Consultant for Gilwell Technology Services. Ray can be reached at [email protected], through LinkedIn, or followed on Twitter @RayCoulombe.
