In today’s business, nothing is more important than information. Data lost, corrupted or in the wrong hands can cripple a business. Inability to get at information you need when you need it can be nearly as bad. And woe betide any publicly traded firm that can’t produce what it is legally required to have on file when regulators come calling.
Meanwhile, the volume of data keeps growing, helped along not only by technology that produces it faster and faster but by those same regulatory requirements, which have imposed more and more limits on what businesses dare to get rid of.
According to a study conducted last August by research firm Gartner Inc. of Stamford, Conn., the top three issues for people responsible for data storage are compliance, information lifecycle management and security. In fact, the three are interrelated: each affects the others, and all of them propel interest in the same products and technologies.
New laws in the past couple of years, such as Canada’s Bill C-198 and the United States’ Sarbanes-Oxley Act, have laid out more specific requirements for publicly traded businesses to keep certain kinds of information for specified periods, and imposed harsh penalties on senior executives whose firms fail to do so. There’s nothing like the threat of the boss going to jail to get top management interested in information management practices. The result has been a flurry of activity, reviewing and revising procedures and in many cases adding storage capacity to handle the new requirements.
“That old strategy of using backup tape as an archiving strategy does not suffice,” says Eric Lottman, business development manager for information lifecycle management at Hewlett-Packard (Canada) Ltd. in Mississauga, Ont.
The compliance flap has helped draw attention to several emerging technologies. One of these is content-addressed storage, a market that Enterprise Strategy Group Inc., a Milford, Mass., research firm, estimates will reach US$1 billion this year and roughly double in 2007, says Tony Asaro, senior analyst at ESG.
Traditional storage technology is meant for data that changes, like the data in an accounting system. “No-one had ever built a storage system and taken into account the nature and characteristics of a system designed for fixed content,” says Ken Steinhardt, director of technology analysis at EMC.
Content-addressed — sometimes called object-based — storage helps with this because it stores data as objects with metadata attached. This metadata includes a sort of digital signature that verifies that the data has not changed. This is important for two reasons, Steinhardt explains: It satisfies the regulatory need for proof that information hasn’t been altered, and it makes it possible to check data for integrity.
All storage media are vulnerable to “bit flipping” — over time, small errors can be introduced into the data they store. If data is stored long enough, enough errors can accumulate to become a problem. EMC’s Centera content-addressed storage can check stored data for such errors and correct them.
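The digital-signature idea behind content-addressed storage can be illustrated with a short sketch. This toy in-memory store is purely hypothetical (the class and its methods are invented for illustration; real CAS products such as EMC’s Centera work very differently under the hood), but it shows the core mechanism: the object’s address is a cryptographic digest of its content, so a flipped bit is immediately detectable on retrieval.

```python
import hashlib

class ContentAddressedStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The address IS a digest of the content, so any later change
        # to the bytes no longer matches the address they were filed under.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Integrity check: recompute the digest and compare it to the
        # address. A single flipped bit makes the two disagree.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object is corrupt")
        return data

store = ContentAddressedStore()
addr = store.put(b"quarterly report, final version")
assert store.get(addr) == b"quarterly report, final version"

# Simulate "bit flipping" on the stored media.
store._objects[addr] = b"quarterly report, FINAL version"
try:
    store.get(addr)
except ValueError:
    print("corruption detected")
```

A production system would go further and repair the damage from a redundant copy, but the detection step is exactly this comparison of content against address.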
CAS can use metadata to record how long data needs to be kept, preventing it from being deleted too soon and if desired deleting it automatically when the required retention period ends.
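A retention policy of that kind reduces to a simple rule: refuse deletion while the mandated period is running, permit (or automatically perform) it once the period ends. The sketch below is a minimal illustration under assumed names; the object IDs, the seven-year period and the `try_delete` helper are all invented, not any vendor’s API.

```python
from datetime import date, timedelta

# Hypothetical retention metadata: the deadline recorded with each
# stored object (here, roughly seven years from the end of 2005).
retention_until = {
    "audit-log-2005": date(2005, 12, 31) + timedelta(days=365 * 7),
}

def try_delete(object_id: str, today: date) -> bool:
    """Block deletion during the retention period; allow it afterwards."""
    if today < retention_until[object_id]:
        return False                  # too soon: the record must be kept
    del retention_until[object_id]    # retention over: safe to purge
    return True

assert try_delete("audit-log-2005", date(2006, 6, 1)) is False
assert try_delete("audit-log-2005", date(2013, 1, 15)) is True
```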
Regulatory requirements not the only factor
And CAS allows information to be accessible readily enough to satisfy regulatory needs, without tying up high-performance disk capacity with data that must be kept for years but may never be used. “It all comes down to having a lower-cost tier of storage to keep online infrequently accessed data,” Asaro says.
While compliance concerns are a major reason for increased interest in content-addressed storage, it also ties in nicely with information lifecycle management. CAS’s metadata approach makes storage location-independent, so data can be moved around without affecting applications that use it, says Christina Casten, EMC’s director of strategic programs and chair of a Storage Networking Industry Association (SNIA) committee that is developing a CAS interface standard.
Steinhardt says compliance legislation is just one factor in the data proliferation driving interest in ILM. IT departments are “faced with volumes of information that are just growing exponentially,” he says, “and yet IT budgets have been relatively flat.” The only answer is to use storage more efficiently. That means keeping frequently used data on high-performance storage for the fastest possible access, and moving less-used data to cheaper, lower-performance media ranging from slower disk to tape. But all this has to be managed automatically.
Lifecycle management requires metadata
Until quite recently, Steinhardt says, most IT shops took a one-size-fits-all approach to storage. They either kept everything on high-performance disk, at higher than necessary cost, or they used the least expensive media available — “those are the people who in the last couple of years you tend to read about in industry magazines where data is lost.”
As the volumes of data make the one-size-fits-all approach increasingly impractical, tiers of storage for different requirements are gaining popularity.
“Capacity by itself is a band-aid if you don’t have the management solution,” says Tom Coughlin, president of Coughlin Associates, an Atascadero, Calif., research firm specializing in storage.
Coughlin says successful information lifecycle management starts with understanding the content and creating metadata indicating the importance of data, how long it needs to be kept and how much it will be used, then devising policies on how to handle different types of data.
ILM requires you to consider three aspects of data storage requirements, Steinhardt advises: performance, or how fast the data storage media need to be; availability, meaning how readily you need to be able to get at the data; and retention — how long the data will need to be kept. The answers to those questions will help define what tiers of storage you need and what data goes where.
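One way to picture how answers to those three questions map onto tiers is a simple classification function. The thresholds and tier names below are invented for illustration, not industry rules or anything Steinhardt prescribes.

```python
def choose_tier(needs_fast_access: bool, accesses_per_day: float,
                retention_years: float) -> str:
    """Map the three questions -- performance, availability, retention --
    onto a storage tier. Thresholds are illustrative only."""
    if needs_fast_access and accesses_per_day >= 1:
        return "high-performance disk"     # hot data, latency-sensitive
    if accesses_per_day >= 1:
        return "low-cost disk"             # active but tolerant of slower media
    if retention_years > 0:
        return "tape or archival storage"  # rarely touched, must be kept
    return "eligible for deletion"         # no access need, no retention need

assert choose_tier(True, 500, 7) == "high-performance disk"
assert choose_tier(False, 2, 7) == "low-cost disk"
assert choose_tier(False, 0.01, 7) == "tape or archival storage"
```

In a real ILM deployment the classification would be driven by the metadata policies Coughlin describes and the migration between tiers would happen automatically.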
The ultimate goal — or what Steinhardt calls the “endgame and nirvana” of information lifecycle management — is enterprise content management, or the full automation of data handling according to business rules. But very few organizations have come that far today. Even EMC, Steinhardt says, is only in the planning stages for that phase.
The proliferation of data, coupled with a steadily growing need for systems to be available all the time, has also made backup a greater headache. “Traditionally people have done backup in some sort of off hours, like the evening, when their computers are not required,” says Craig Andrews, director of sales engineering at Symantec Corp. in Toronto. “Twenty-four-hour availability of data is becoming the norm for organizations, and so those windows are kind of disappearing.”
So, more enterprises are looking at continuous data protection, in which changes to data are continually being copied to separate storage devices. This avoids shutting down the system to make periodic full backups, and in the event of data loss it can allow data to be restored faster and to a much more recent point.
One in three use continuous data protection
Because it backs up data to disk, continuous data protection is also more reliable, says Aviram Cohen, vice-president of product management at FilesX Inc., a Newton, Mass., CDP vendor. However, Steinhardt notes, CDP does not replace conventional backup. Data can still be copied to tape for longer-term or offsite backup.
Coughlin Associates says 34 per cent of storage users responding to its latest annual survey are using continuous data protection, while another 23 per cent plan to acquire it within 18 months. Coughlin says faster recovery is the Number 1 reason organizations give for using CDP, followed by liability and compliance issues and improved recovery point, meaning more recent data can be recovered.
CDP comes in two flavours. In true CDP, every change to data is written to a second or backup disk as it is written to the primary disk, so there is an up-to-the-second backup. In the on-demand or snapshot approach, all changes since the last snapshot are copied at fixed intervals. The continuous approach therefore permits recovery to a more recent point. FilesX uses the snapshot approach, and Cohen says its advantage is that it ensures a consistent view of the data with each snapshot, whereas the continuous approach may cause problems by capturing half a transaction in progress when the system fails. Currently FilesX limits customers to one snapshot per hour to keep the amount of storage required manageable, but Cohen says snapshots every minute are technically possible.
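The “true CDP” idea can be sketched in a few lines: every write to the primary copy is also appended to an ordered journal, so the system can replay the journal to rebuild the state as of any past write. This is a deliberately simplified model (class and method names are invented; real CDP products journal at the block or volume level, not as a Python dictionary).

```python
class ContinuousProtector:
    """Sketch of 'true' CDP: each write is journaled as it happens,
    so any past recovery point can be reconstructed. Illustrative only."""

    def __init__(self):
        self.primary = {}   # current state of the protected volume
        self.journal = []   # ordered log of (key, value) writes

    def write(self, key, value):
        self.primary[key] = value
        self.journal.append((key, value))

    def restore(self, up_to: int) -> dict:
        """Replay the first `up_to` journal entries to rebuild the
        state as of that recovery point."""
        state = {}
        for key, value in self.journal[:up_to]:
            state[key] = value
        return state

cdp = ContinuousProtector()
cdp.write("file.txt", "v1")
cdp.write("file.txt", "v2")
cdp.write("file.txt", "corrupted")
# Recover the state as it was after the second write.
assert cdp.restore(2) == {"file.txt": "v2"}
```

A snapshot-style product like the one Cohen describes would instead copy all accumulated changes at fixed intervals, trading recovery granularity for guaranteed consistency at each snapshot boundary.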
According to Andrews, CDP has a minimal impact on network bandwidth needs because only changes to data are transmitted. Symantec’s Backup Exec Continuous Protection Server lets network administrators control the amount of bandwidth used for CDP, he adds.
Technologies like continuous data protection and content-addressable storage are helping, but data continues to proliferate and to become ever more vital to most organizations’ survival. Further innovations and plenty of creativity will be required to keep up.