Analyzing Big Data

Vendors are finally releasing security products that can take advantage of the trillions of bytes of information we generate

As our information world continues to generate an unfathomable amount of stored data, a surprisingly common term has been coined to describe this expanding mass — BIG, or Big Data, to be accurate. How big is BIG, you ask? The answer: big enough that conventional data structures and analysis cannot effectively deal with it.

IBM calculates that every day, 2.5 quintillion bytes of new data are created (a quintillion is a million trillion, so that is 2.5 exabytes). It further estimates that, by 2020, the amount of digital information created and replicated in the world will grow to 35 trillion gigabytes (35 zettabytes), hence the “big data” moniker.
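For readers who want to check the scale of these figures, here is a minimal sketch of the unit arithmetic. It assumes decimal (power-of-ten) units, and the yearly total assumes the quoted daily rate stays constant, which is my simplification and not part of IBM's estimate.

```python
# Decimal (SI) byte units, expressed as powers of ten.
GIGABYTE = 10**9
EXABYTE = 10**18    # one quintillion bytes
ZETTABYTE = 10**21

# Figures quoted in the column.
daily_bytes = 25 * 10**17                  # 2.5 quintillion bytes per day
forecast_bytes = 35 * 10**12 * GIGABYTE    # 35 trillion gigabytes

daily_exabytes = daily_bytes / EXABYTE
forecast_zettabytes = forecast_bytes / ZETTABYTE

# Assumption for illustration only: the daily rate never changes.
yearly_zettabytes = daily_bytes * 365 / ZETTABYTE

print(f"Daily volume:    {daily_exabytes} exabytes")
print(f"2020 forecast:   {forecast_zettabytes} zettabytes")
print(f"One year at the daily rate: {yearly_zettabytes:.4f} zettabytes")
```

At a constant rate, a full year of data creation comes to a little under one zettabyte, which puts the 35-zettabyte forecast in perspective.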

Some of this data is highly structured (e.g., financial information), while other data, such as IP video, tweets and other social media content, is unstructured. The science of big data has given rise to a new job title, data scientist: someone who can scientifically and creatively make sense of all of this. By its nature, big data is so large and complex that it is difficult, if not impossible, to process using traditional database management tools or data processing applications. Still, with the right tools in hand, we can analyze this data and use it to our advantage as it relates to security.

When the STE editors asked me to write about this, I knew it would be a BIG task, so I decided to examine the subject from three perspectives that affect our security world: information security, security and crime event correlation, and video analysis. In this column, I will discuss the first two; the video portion of the topic will appear in next month’s column.


Information Security

A reading of the daily papers and online services reminds us that the threats of hacking, malware and network attacks for fun and ill-gotten profit are only getting worse. Compare an information attack with an attack of the flu — we know that flu shots are only partially effective, and they do not protect against all flu strains. If I were to contract an especially bad case, I would want the best diagnosis possible, and I would hope that my doctor would research outside resources to come up with the best course of treatment.

Most enterprises have been hit with an information “flu” that escaped their defenses. The sooner it can be tracked down and dispensed with, the less damage there is from the loss or compromise of critical information.

On Jan. 30, RSA, the Security Division of EMC Corp., announced the release of RSA Security Analytics, described as “a transformational security monitoring and investigative solution designed to help organizations defend their digital assets against today’s most sophisticated internal and external threats.” Because the techniques used to compromise information security are applied on a broad basis, there is potential to spot trends and threats if sufficient data can be captured, correlated and analyzed.

The RSA product combines external threat intelligence with an analysis of internal traffic down to the packet level. Together, these feed an analytics and reporting engine that provides deeper network security visibility, actionable intelligence and investigative capability. In other words, it harnesses “big data” to provide trends that you, as a security director, can actually use (more on that below). The product’s investigation module has a patented metadata framework for organizing the data (e.g., nouns and verbs) in a way that supports timely investigations.
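To make the general idea concrete, here is a minimal sketch of correlating an external threat feed with internal traffic records. It is not RSA's implementation. The `Flow` record, the function names and the sample addresses are all hypothetical, and it works on summarized flows instead of raw packets to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical summary of one internal network connection."""
    src_ip: str
    dst_ip: str
    bytes_sent: int


def correlate(flows, bad_ips):
    """Return the flows whose destination appears in the external feed."""
    return [flow for flow in flows if flow.dst_ip in bad_ips]


def hosts_by_hits(matches):
    """Rank internal hosts by how many flagged connections they made."""
    return Counter(flow.src_ip for flow in matches).most_common()


# Made-up sample data using reserved documentation address ranges.
threat_feed = {"203.0.113.7", "198.51.100.23"}
flows = [
    Flow("10.0.0.5", "203.0.113.7", 1200),
    Flow("10.0.0.5", "192.0.2.10", 300),
    Flow("10.0.0.8", "198.51.100.23", 50000),
    Flow("10.0.0.5", "203.0.113.7", 800),
]

matches = correlate(flows, threat_feed)
ranking = hosts_by_hits(matches)

for host, hits in ranking:
    print(f"{host}: {hits} connection(s) to flagged addresses")
```

A ranked list like this is the kind of "trend you can actually use": it points an investigator at the internal machines that most often talk to known-bad destinations.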

On Jan. 31, IBM announced IBM Security Intelligence with Big Data, which, the company says, combines “leading security intelligence with big data analytics capabilities for both external cyber security threats and internal risk detection and prevention.” It does so by “analyzing structured, enriched security data alongside unstructured enterprise data,” an approach that, according to IBM, “helps find malicious activity hidden deep in the masses of an organization’s data.”
