Data identification: It’s time to think outside the box

July 15, 2020
Leading data identification solutions should function more and more like behind-the-scenes assistants

Data identification should be the first step in implementing a comprehensive data protection strategy. In fact, data identification holds the key to the effectiveness of downstream technologies such as encryption, data loss prevention (DLP) and data classification, to name just a few.

Unfortunately, it has long been considered one of the hardest areas to address. The dilemma with data identification stems not from the “why” but from the “how”: profiling potentially sensitive and valuable data can be quite complicated. For starters, sensitive data is hard to find and flag because it can reside anywhere. Profiling is also difficult when data has attributes and characteristics that are likely to change over time.

To streamline data identification, companies often assign data to predetermined “buckets” based on a small number of broad categories. While this “big picture” approach can help jumpstart the process, it lacks the granular view and in-depth understanding needed to keep pace as businesses and regulations evolve. As a result, this common but constrained approach can leave data exposed to security and compliance risks.

Identify, then Classify

That’s why it’s increasingly important to think outside the box when finding and classifying sensitive information. Think of data identification as the essential precursor to data classification: you must know what data exists, and where, before it can be classified and protected.

Since data identification should inform and guide data classification, the ability to apply context, business logic and automation is crucial. Proper data identification considers legal, compliance and regulatory issues as it looks for information within emails, Word docs, shared files and other places where it could be hiding in plain sight.

A best-practices data identification approach requires a close look at corporate information to determine if any of the following apply:

  • Does the data have a certain sensitivity because it contains personally identifiable information (PII)?
  • Is it controlled unclassified information (CUI)?
  • Does it have a specific time-to-value or retention requirement?
  • Is it subject to regulations or industry standards, such as the Payment Card Industry Data Security Standard (PCI DSS)?
  • Does the data include confidential intellectual property (IP)?
  • Is Protected Health Information (PHI) involved?
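As a rough illustration of how some of these questions can be answered automatically, the following Python sketch flags two of the categories above: payment card numbers (the data covered by the Payment Card Industry Data Security Standard, or PCI DSS), validated with the Luhn checksum, and US Social Security numbers as one kind of PII. The function names, tag strings and patterns are hypothetical; commercial tools combine many more signals, such as context, metadata and machine learning, and simple patterns like these will produce both false positives and false negatives.

```python
import re

# Hypothetical detectors for two of the checklist categories. This is a sketch,
# not a product implementation: real tools use far more context and signals.

# Candidate card numbers: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# US Social Security number in the common ddd-dd-dddd layout (one example of PII).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by payment cards."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def identify(text: str) -> set[str]:
    """Return the set of hypothetical sensitivity tags detected in a piece of text."""
    tags = set()
    for match in CARD_CANDIDATE.finditer(text):
        # Strip separators before running the checksum.
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            tags.add("PCI")
            break
    if SSN_PATTERN.search(text):
        tags.add("PII")
    return tags
```

The Luhn check filters out many digit strings that merely look like card numbers, which is one reason pattern matching alone is rarely enough to answer these questions reliably.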

Automated tools can increase the speed and accuracy of data identification while offering guidelines for determining business value. Additionally, data identification solutions that combine metadata with detection aided by machine learning can streamline downstream actions, such as deciding whether data may move only within the enterprise or beyond it. Being able to identify sensitive data in motion, such as in email, also helps minimize compliance and exposure risks.
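The idea of using identification results to drive a downstream decision can be sketched as a simple outbound email check. This is a hypothetical example rather than any product’s behavior: it assumes tags have already been produced both by content detection and from metadata (such as an existing label), and it treats an illustrative set of tags as internal-only, refusing a message when restricted data is addressed to a recipient outside the company’s domain. The class, function and tag names are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical policy: data carrying any of these tags may move only within the enterprise.
INTERNAL_ONLY_TAGS = {"PII", "PHI", "PCI", "CUI", "IP"}


@dataclass
class Message:
    sender: str
    recipients: list[str]
    # Tags produced by content detection (e.g., {"PCI"}).
    detected_tags: set[str] = field(default_factory=set)
    # Tags taken from metadata, such as an existing label (e.g., {"IP"}).
    metadata_tags: set[str] = field(default_factory=set)


def external_recipients(msg: Message, internal_domain: str) -> list[str]:
    """Return recipients whose address is outside the internal domain (case-insensitive)."""
    suffix = "@" + internal_domain.lower()
    return [r for r in msg.recipients if not r.lower().endswith(suffix)]


def check_outbound(msg: Message, internal_domain: str) -> tuple[bool, list[str]]:
    """Decide whether a message may leave as addressed; return (allowed, reasons)."""
    # Combine what the content says with what the metadata says.
    tags = msg.detected_tags | msg.metadata_tags
    restricted = sorted(tags & INTERNAL_ONLY_TAGS)
    outsiders = external_recipients(msg, internal_domain)
    if restricted and outsiders:
        reasons = [f"{tag} data addressed to {', '.join(outsiders)}" for tag in restricted]
        return False, reasons
    return True, []
```

Refusing the message is only one possible response; under the same assumptions, the check could instead warn the sender or route the message for review.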

Guidelines and Guardrails

Proper data identification offers much-needed visibility into a day in the life of your organization, including who talks to whom and what kinds of emails are exchanged during normal business. Understanding interactions with legal, human resources and compliance teams also sheds light on how often potentially sensitive data travels both within and outside the company.

This assessment is not intended to scare people into thinking data identification is a multi-year, labor-intensive project with no end in sight. Rather, the intent is to offer guidelines and guardrails for identifying sensitive data using agile methods that allow extra attributes to be added as new business, compliance and regulatory requirements dictate.

This is where automation makes a huge difference. By expediting the labeling of different types of data, automation spares users from weighing 20 different questions every time they want to send an email. Modern data identification solutions rely on automation and machine learning to keep multi-dimensional data profiles up to date.

These solutions typically take a page or two from time-tested and field-proven processes employed by government agencies and the military. There are plenty of lessons to be gleaned from how these organizations remain diligent in focusing on the flow of data without losing sight of its value or impact on operations.

A Deeper, Wider View

Advancements in data identification empower enterprises of all types to go deeper and wider in their efforts to find, understand and classify their data. Most importantly, these solutions are becoming more ingrained in overall data protection strategies, increasing the effectiveness of encryption, DLP and other solutions.

Leading data identification solutions should function more and more like behind-the-scenes assistants, much like a spellcheck feature that offers corrections and recommendations. This not only frees companies from trying to shove data into narrowly defined, constrained boxes but also allows data identification to become a more integral and vital part of successful data protection efforts.
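The spellcheck analogy can be made concrete with a minimal Python sketch that recommends a label rather than enforcing one. The label names, their ordering and the tag-to-label table are assumptions made purely for illustration; in this design, adding a new attribute only means adding a row to the table.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label scheme, ordered from least to most restrictive.
LABELS = ["Public", "Internal", "Confidential", "Restricted"]

# Hypothetical minimum label warranted by each detected tag.
# New attributes can be added here as business or regulatory needs change.
MIN_LABEL_FOR_TAG = {
    "PII": "Confidential",
    "IP": "Confidential",
    "PCI": "Restricted",
    "PHI": "Restricted",
    "CUI": "Restricted",
}


@dataclass
class Suggestion:
    label: str
    reasons: list[str]


def suggest_label(current_label: str, tags: set[str]) -> Optional[Suggestion]:
    """Like a spellchecker: return a recommended label, or None if the current one suffices."""
    required = current_label
    reasons = []
    for tag in sorted(tags):
        needed = MIN_LABEL_FOR_TAG.get(tag)
        if needed is None:
            continue  # tags without a rule produce no recommendation
        if LABELS.index(needed) > LABELS.index(current_label):
            reasons.append(f"{tag} detected; {needed} or higher recommended")
            # Keep the most restrictive label any tag calls for.
            if LABELS.index(needed) > LABELS.index(required):
                required = needed
    return Suggestion(required, reasons) if reasons else None
```

For example, `suggest_label("Internal", {"PII"})` recommends Confidential, while content already labeled Restricted produces no suggestion at all, much as a spellchecker stays quiet when a word is spelled correctly.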

About the author: Stephane Charbonneau is a co-founder of Titus and serves as its Chief Technology Officer. His background as an IT Security Architect helps bridge the gap between customer requirements and the product suites offered by the company. Steph has worked as a senior architect at a major U.S. financial institution and for several Canadian federal government departments. A frequent speaker at numerous security conferences and events worldwide, he holds an Honours Degree in Computer Science from the University of Waterloo.