Beyond Data Hoarding: A Blueprint for Responsible Data Management

Many businesses adhere to the notion that retaining all data since inception is a prudent step, driven by the belief that the data might hold some unforeseen value in the future. This instinct to retain aligns with our proclivity to keep items in our closets just in case we might need them someday. While this approach might initially appear conservative, it conceals several challenges for responsible data management, most notably an escalating epidemic of data hoarding.

One of the main drawbacks of retaining everything is the cost of the technical resources required to do so: storage, memory, compute, power, and bandwidth. Without a clear strategy for data retention, companies can find themselves spending substantial sums on storing and managing data that is of little or no value to their operations. Over time, these costs compound, contributing significantly to the Total Cost of Ownership of technology systems.

In addition, data hoarding contributes to inefficiency. Sifting through extensive data repositories can waste resources and prolong the process of uncovering valuable insights, impacting both the efficiency and productivity of the business over time.

Further, inefficient data management can contribute significantly to data security and privacy issues down the line. When data is poorly managed, it becomes challenging to monitor and protect, leaving it susceptible to breaches and unauthorized access. When privacy regulations are violated due to such inefficiencies, it can lead to reputational damage and hefty fines.

We are ‘overstoring’ data

Many companies don’t just store their data once; they maintain multiple copies of it. While it’s recommended to store up to three copies of data for disaster recovery and data protection, I’ve seen major enterprises hold more than 20 copies of their largest and most critical data sets. What’s worse, company leaders often don’t even know that duplicated data is consuming as much as 20 to 50 percent of their total database footprint!

While some of these copies are essential for High Availability (HA), Disaster Recovery (DR), and reporting, a significant portion of this data duplication arises from decisions or issues within architecture, process design, and human factors across the enterprise. The challenge isn’t simply solved by moving to a singular data storage model either. Rather, we need to foster a heightened awareness and comprehension of the enterprise’s overarching data strategy to determine when data duplication actually makes sense.

What’s more, data hoarding can increase the risk of data breaches, as it can be harder to secure and monitor larger volumes of data. Further, companies that hoard data may run afoul of data protection and privacy regulations, which increasingly require businesses to justify their data collection, retention and security practices.

But perhaps the most important impact of data hoarding is the amount of storage used and the resources required to support petabytes of data. The financial and environmental costs associated with data centers, even green data centers, can no longer be ignored. When you weigh the supposed benefits of data hoarding against all the financial and societal damage it could wreak, it becomes essential to focus on the reduction or elimination of data hoarding as part of a new ESG mindset.

What are companies doing to control data hoarding?

Some companies are doing a good job of controlling data hoarding by focusing on customer privacy and on policies and frameworks that emphasize data minimization.

DuckDuckGo takes a privacy-centric approach: it does not collect or share the personal information of its users, unlike other search engines that track and store user data for advertising purposes. As a result, all searches are anonymous and every user gets the same level of privacy. More importantly, far fewer records are maintained.

Oracle has a robust data lifecycle management strategy which includes a comprehensive data governance framework. They employ advanced techniques like data compression and tiering to manage, store, and archive data efficiently.

Salesforce has implemented a rigorous data management policy, focusing on collecting only essential data, using it responsibly, and maintaining transparency with customers. Their data minimization practices are built around customer trust, ensuring data privacy, and regulatory compliance.

SAP, a global software company, has robust data minimization policies. They have an extensive data governance structure, with firm regulations on data collection, processing, and retention, demonstrating their commitment to preventing data hoarding.

Steps to control data hoarding at your company

Here are some steps you can take to control data hoarding at your company.

1. Implement data retention and lifecycle policies

Establish and implement a comprehensive set of guidelines dictating the appropriate duration for retaining various categories of data. Consistently assess and enhance data retention and security protocols to encompass both data purging and safeguarding measures.
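To make this step concrete, a retention policy can be written down as a simple lookup from data category to retention period. The sketch below is only an illustration: the category names, the periods, and the Record type are hypothetical placeholders, and real values have to come from your legal, regulatory, and business requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical categories and retention periods (in days), for illustration only.
RETENTION_DAYS = {
    "web_logs": 90,
    "support_tickets": 365 * 2,
    "financial_records": 365 * 7,
}


@dataclass
class Record:
    record_id: str
    category: str
    created: date


def is_past_retention(record: Record, today: date) -> bool:
    """Return True if the record has outlived its category's retention period."""
    days = RETENTION_DAYS.get(record.category)
    if days is None:
        # Categories without a policy are never flagged automatically;
        # they should be reviewed and assigned a policy first.
        return False
    return today - record.created > timedelta(days=days)
```

Keeping the policy in one table like this makes it easier to review and update on a regular schedule, which is the point of reassessing retention protocols over time.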

2. Perform data audits

Perform routine data audits to pinpoint redundant, outdated, and trivial (ROT) data. These audits will shed light on the kinds of data that are being accumulated unnecessarily and aid in rectifying this practice. During the audit, assess whether the data’s storage is justified based on Total Cost of Ownership (TCO) considerations and regulatory mandates.

Once you’ve identified data that requires securing or disposal through audits, institute policies and procedures to effectively address these identified needs.
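As a rough sketch of what part of such an audit might look like in practice, the function below flags redundant items (identical content, detected with a SHA-256 hash) and outdated items (not accessed within a threshold). The input shape, the function name, and the one-year default are assumptions made for illustration. Deciding what counts as trivial data still needs human judgment and is not covered here.

```python
import hashlib
from collections import defaultdict
from datetime import date


def audit(items, today, max_idle_days=365):
    """Flag redundant and outdated items.

    items: iterable of (name, content_bytes, last_accessed_date) tuples.
    Returns groups of names with identical content, plus names that
    have not been accessed for more than max_idle_days.
    """
    names_by_digest = defaultdict(list)
    outdated = []
    for name, content, last_accessed in items:
        # Identical content produces an identical digest.
        digest = hashlib.sha256(content).hexdigest()
        names_by_digest[digest].append(name)
        if (today - last_accessed).days > max_idle_days:
            outdated.append(name)
    redundant = [names for names in names_by_digest.values() if len(names) > 1]
    return {"redundant": redundant, "outdated": outdated}
```

The output is a list of candidates for review, not a deletion list: some duplicates exist on purpose for HA, DR, or reporting.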

3. Invest in data management and compliance tools

Allocate resources towards acquiring data management and compliance tools that can significantly enhance your data handling capabilities. Embrace automated data management tools to streamline the organization, categorization, and efficient management of data.

These versatile tools can also be instrumental in automating the data elimination process, following predefined criteria to ensure data retention aligns with regulatory and operational requirements.
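Commercial tools differ widely, so the following is not any particular product's API. It is a minimal sketch of the idea of criteria-driven elimination: a purge pass over a catalog of records that defaults to a dry run. The catalog layout, the age_days and hold fields, and the example rule are all hypothetical.

```python
from typing import Callable


def purge(catalog: dict, should_delete: Callable[[dict], bool], dry_run: bool = True) -> list:
    """Select catalog entries that match the criteria and return their ids.

    With dry_run=True (the default) nothing is removed, so the selection
    can be reviewed before any data is actually deleted.
    """
    selected = [rid for rid, meta in catalog.items() if should_delete(meta)]
    if not dry_run:
        for rid in selected:
            del catalog[rid]
    return selected


def older_than_a_year_and_not_held(meta: dict) -> bool:
    # Example criterion (an assumption): delete after 365 days unless the
    # record is flagged to be kept, for instance for regulatory reasons.
    return meta["age_days"] > 365 and not meta["hold"]
```

Defaulting to a dry run is a deliberate choice in this sketch: automated deletion is hard to undo, so the safer behavior is to report first and delete only when explicitly asked.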

4. Establish a data governance framework

Data governance is important for organizations as it establishes a structured framework to manage data effectively. By ensuring data quality, compliance with regulations, security, efficient management, accessibility, and accurate decision-making, data governance enhances operational efficiency, minimizes risks, and supports a culture of responsible data stewardship.

Such governance empowers organizations to extract meaningful insights from their data, make informed decisions, and navigate the complexities of modern data-driven landscapes with confidence.

5. Institute employee training and awareness

Provide regular training and awareness programs for employees regarding data management best practices and compliance guidelines. Educate them about the importance of data retention policies, security measures, and the implications of data hoarding.

Encouraging a culture of responsibility and knowledge will empower your workforce to make informed decisions about data storage, leading to more efficient data management practices.

Stricter data retention and privacy regulations, coupled with voluntary efforts to curtail the accumulation of data, represent the initial steps in addressing the broader challenge of data hoarding. However, these solutions are primarily reactive.

To truly confront the data hoarding dilemma, it’s essential to proactively incorporate measures into the architecture, processes, and applications we create for all types of data, whether related to monitoring, consumer behavior, or any other data domain. Failure to adopt these controls risks inviting a data apocalypse, a scenario that might already be looming unless we commit to responsible data management now.