Data Analytics Part 2: Dark Data

Get the Scoop on Dark Data

Congratulations: your business is online, you have gone “paperless,” you are dutifully collecting “Big Data,” and you back up religiously. You have plenty of storage because it is cheap and you never know when you are going to need it.

The problem is that data does not take up space we can see, so it is easy to ignore. Like a storage room filled with the detritus of modern life, servers fill up daily with reams of mindlessly stored data, costing you money and leaving your enterprise vulnerable to attack, appropriation, and misuse.

The truth is, much of the data you collect in your daily operations is what most would consider unusable. IBM analytics experts believe that “…about 90 percent of data generated by most sensors and A-to-D (analog to digital) conversions on the market never get utilized, and 60 percent of that data loses its true value within milliseconds.”(source 1)

This mountain of bits and bytes is known as “dark data,” which is defined as…

“… the information assets organizations collect, process and store during regular business activities, but fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only.”(source 2)

Do you know what your systems are collecting? Do you know what may be harmful to your company, clients, customers, and constituents? Do you know what to keep, what to delete, what data might be worth analyzing, monetizing or reporting on? Do you know how to locate it, utilize it, protect it, get rid of it?

Dark data is so prevalent, and so much of a problem, that the most egregious portion has its own moniker: ROT, which stands for Redundant, Obsolete and Trivial data. The first pass in sorting out this mess is looking for the ROT in your pile. It has been said that anywhere from 33% to 70% of the storage in an unmanaged server is filled with such useless data.(source 3)
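As a rough illustration of that first pass, here is a minimal Python sketch that walks a folder and flags files in each ROT category. Everything specific in it is an assumption made for the example, not a standard: the find_rot name, the three-year cutoff for "obsolete," treating only empty files as "trivial," and treating byte-for-byte identical copies as "redundant." A real policy would come from your own retention and compliance rules.

```python
import hashlib
import os
import time
from collections import defaultdict

# Hypothetical thresholds for illustration; tune them to your retention policy.
OBSOLETE_AFTER_DAYS = 365 * 3
TRIVIAL_MAX_BYTES = 0  # this sketch only treats empty files as trivial


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_rot(root, now=None):
    """Walk `root` and list files flagged as redundant, obsolete, or trivial."""
    now = time.time() if now is None else now
    cutoff = now - OBSOLETE_AFTER_DAYS * 86400
    by_digest = defaultdict(list)
    report = {"redundant": [], "obsolete": [], "trivial": []}

    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            info = os.stat(path)
            if info.st_size <= TRIVIAL_MAX_BYTES:
                report["trivial"].append(path)
                continue
            if info.st_mtime < cutoff:
                report["obsolete"].append(path)
            by_digest[file_digest(path)].append(path)

    # Every copy after the first with identical content counts as redundant.
    for paths in by_digest.values():
        report["redundant"].extend(sorted(paths)[1:])
    return report
```

Calling find_rot on a shared folder returns three lists of paths for a person to review. The sketch only reports; it deliberately deletes nothing, since some old files may be exactly the ones you are required to keep.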

The costs of managing this easily ignored data are seldom noticed until issues like these make it a problem:

  • Searching for mission-critical information in a glut of useless data can stress staff, incur needless overtime pay, and lead to missed deadlines and incomplete or inaccurate work product
  • Failing to recognize and monetize untapped resources, one of the most widely heralded reasons for collecting big data, is not good for the bottom line
  • A security breach can damage a company’s reputation and cost business opportunities
  • Unforeseen, time-sensitive demands for records in response to litigation can cause havoc, and can incur fines and ill will from the courts, when those records are not easily accessible

A few real-world examples of data that has been mined and used in ways wholly unintended by its originators include:

  • Computer analytics companies mining Facebook posts to customize targeted advertising to influence buyer behavior
  • Insurance companies using price optimization software to determine which customers do not compare pricing and are vulnerable to price increases
  • Data breach monitoring sites becoming wide open resources for hackers

Facing your storage issues is not unlike rolling up your sleeves and tackling that extra room full of long-forgotten trophies, worthless trinkets, family heirlooms, garbage, important papers, and uncashed checks. In all, it is more junk than treasure.

Stop the Madness

Because data accumulates quickly, make your people aware of the problem right away and work to recognize types of data and develop the processes for managing them. Keep rules simple, intuitive and easy to follow or people will fall back on default behavior.

Develop Training

Once you’ve developed a process for storing data, all staff need to be trained in compliance. Because data is generated constantly, this is not a fix-it-and-forget-it problem.

Invest in Software

People cause this problem, but people alone can’t clean it up. Automate what you can. The proper software will help prune needless time-sensitive, extraneous, and redundant bits of data before they ever reach the server.
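To make the idea of pruning before storage concrete, here is a small Python sketch of a filter that drops expired and duplicate records before they are written anywhere. The Record fields and the prune function are hypothetical names invented for this example; commercial tools apply far richer rules, but the principle is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical incoming record; the field names are illustrative only."""
    key: str
    payload: str
    expires_at: float  # Unix timestamp after which the record has no value


def prune(records, now):
    """Return only the records worth storing: unexpired and not already seen."""
    seen = set()
    kept = []
    for record in records:
        if record.expires_at <= now:
            continue  # time-sensitive data that has already lost its value
        fingerprint = (record.key, record.payload)
        if fingerprint in seen:
            continue  # redundant copy of something already kept
        seen.add(fingerprint)
        kept.append(record)
    return kept
```

Run on each incoming batch, with the current time passed as now, a filter like this keeps stale and repeated records from ever taking up space on the server.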

Protect Vulnerable Data

Identify and protect data that can compromise your company, clients and the public at large.

Stop Adding Storage

Storage always seems to get filled. Limit the space and compliance becomes mandatory.

An organized data set makes information easier to process, find, and use. Making it clear what data needs to be saved, where it needs to be stored, and what should be deleted will save your company time, money, and reputation, and may even reveal unnoticed revenue streams.


  1. http://siliconangle.com/blog/2015/10/30/ibm-is-at-the-forefront-of-insight-economy-ibminsight/
  2. http://www.gartner.com/it-glossary/dark-data/
  3. http://info.aiim.org/digital-landfill/newaiimo/2011/09/20/5-myths-about-rot-redundant-obsolete-and-trivial-files


This article was originally published on LinkedIn: http://bit.ly/2mPTpYD

Robert Endo is the founder and Engagement Manager of Intrepid Data.

Intrepid Data is a full-service developer that builds platforms for web-based applications.