Electronic Discovery in 2010
The past 10 years have proved that the escalating costs of data collection and review in discovery, as well as the complexity of the systems themselves, demand a major realignment of how business data is maintained
- Deborah H. Juhnke
The year is 2010. Margaret Techway, a highly placed, first-generation, holographic memory engineer, has recently left her company, Innovations Inc., to join market-newcomer 3-D Strategies. Upon her departure, the “data-freeze” provision of Innovations’ e-risk management policy was implemented automatically. A remotely performed, quick forensic review of her primary workstation uncovers suspicious activity during the previous two weeks, which gives Innovations cause to file a lawsuit against Techway and 3-D Strategies for trade secret theft. The challenges of proving the case, however, are just beginning. Blogs, biometric keys, and blades are only a few of the technological hurdles attorneys will face in developing the case.
Because instant messaging (IM) has replaced e-mail as the preferred form of business communication but has not been consistently monitored or saved at Innovations, there are no e-mail archives to search. What files there are had been copied to a removable thumb drive and taken by Techway, leaving little evidence of their removal. Asking for the thumb drive in discovery will be only half the battle, however, because Techway’s thumbprint is necessary to access the drive. 3-D Strategies has adopted blade servers that are configured in a redundant array of inexpensive disks (RAID) format, meaning that Innovations’ attorneys cannot simply ask for “the server” drive. The increased capacities and more complicated backup models hamper the plaintiff’s attempts to narrow the scope of digital data discovery.
Finally, because Techway has participated in an unstructured public weblog (blog) dedicated to the discussion of new technologies (and sanctioned by Innovations), there are some questions regarding whether the trade secrets taken were, in fact, secrets anymore.
This brief vignette illustrates several points:
- Reliance on the “document” paradigm must change. In years past, discovery was comparatively simple. Ask for documents, get paper. But no longer. Much of what constitutes relevant discovery today and in the future will not, cannot, or should not be printed.
- Constant vigilance in understanding new technology as it relates to electronic discovery is required. Remember when there was no such thing as a personal digital assistant (PDA)? Over the past 10 years, fledgling technologies such as cell phones, digital documents, Web cams, and IM have become mainstream, and new sources of digital data present themselves daily.
These new technologies offer risks along with rewards. Organizations must accept that both technology and redesigned processes will be required to help manage, search, and produce an increasing variety and volume of data. As volumes increase and sources multiply, it will no longer be possible to gather and review all data.
- Computer-based discovery cannot be treated like paper-based discovery. The quill pen has given way to the digital pen, creating a responsibility to respect and protect this more fragile form of evidence.
When viewed in light of recent corporate scandals, topics such as these are more relevant than ever to records managers, lawyers, and corporate management. The past decade has provided some lessons, but there are many more to learn.
The Document Is Dead
There was a time when documents were described in discovery as “writings of every kind and description that are fixed in any form of physical media.” The problem is that the common legal definition of a document is conceptually misleading in the context of electronic discovery issues. This is particularly true for collection and review of voice, video, databases, and Internet-based communications. When addressing these types of data, the average person’s concept of a document (something that may be printed, read, and held in a person’s hand) begins to blur.
Although expanding the legal definition of a document to include electronic data creates the obligation to produce such data in discovery, it offers no guidance on how that production should be carried out. Consequently, there is significant variation in methods used to produce electronic data for discovery. The assumed intent of production is to provide meaningful information, but there are ways in which this intent may be intentionally or inadvertently circumvented.
With paper documents or even word processing files, the meaning is fairly clear. There is a beginning, an end, and a logical structure. True documents tend to be self-contained, or at worst, refer to other documents in support of their content. This makes fitting digital data into the conceptual framework of a document particularly troublesome.
There have been attempts in the past five years or so to shoehorn digital data generated in discovery into the document paradigm, including printing it to paper, printing it to image, extracting it into file structures, and posting it to the Web for review. As technology advances, however, these techniques will become less suitable. They will fall short in their ability to accommodate all relevant forms of data and must evolve to remain viable. Likewise, forays into electronic discovery that have been limited to the collection and review of e-mail should be made cautiously: the good stuff may be left behind. The case where relevant data is found buried in a single field within a corporate database is only one example.
Predictions for Electronic Discovery in 2010
Indiscriminate conversion and production of data will end.
The “document” will be replaced by the “dataset.”
Calculated (not random) sampling will be standard.
Language used to request and describe electronic discovery will become more specific.
There will be more use of technology and techniques for filtering, including search-and-review tools based on artificial intelligence models.
Computer technology will no longer be Microsoft-centric.
E-mail will give way to other forms of communication as the primary source of data discovery.
There will be a need for the wider use of experts, consultants and attorney “specialists.”
The judiciary will become more educated and experienced in the use and abuse of electronic discovery.
Key Discovery Technologies

| Technology | What It Is | Example | E-Discovery Issues |
| --- | --- | --- | --- |
| Instant messaging | Allows immediate communication via the Internet; similar to e-mail, but without constraint, tracking, or preservation | AOL, MSN Messenger | Enables users to circumvent corporate e-mail; no record unless saved proactively; informal |
| Alternative e-mail | E-mail systems that operate outside the corporate environment | Pocomail; ISP-based e-mail such as Yahoo | Enables users to circumvent corporate e-mail; no record unless saved proactively; informal |
| Biometrics | Security based on personal physical characteristics, such as retinal scan and fingerprint | Thumbprint access on PDAs or USB port drives | Can confound discovery and data retrieval efforts by making access difficult or impossible |
| Filtering software | Filters spam or other messages and files | Spam Assassin; filters embedded in ISP services such as AOL, Earthlink, and MSN | Cannot assume that data sent was received; on subscription services alone (not business e-mail), 11.7 percent of messages requested were never received, according to Information World |
| Collaboration software | Enables communication between companies and individuals in remote or Web-based environments | eRoom, WebEx | May be overlooked as source of data; may be only copy of relevant data; difficult to monitor for data preservation; eventually may become part of the operating system itself |
| Virtual offices | Business model whereby employees in selected departments work from home | JetBlue reservation agents | Dispersed data |
| Portable storage | Small, removable storage devices holding up to 40 GB and costing only about $400 | Pocket drive; Microdrive (IBM); SanDisk CompactFlash | Hard to find; hard to track; easy to steal data |
| Blogs | Web-based personal or topic-specific bulletin boards | See Blogger.com for examples | Ad-hoc nature; difficult to track, collect, or identify; if found, could be good evidence |
| Blade servers | Network-based servers based on “blades” that are added to a chassis, enabling many servers to be housed in a small space and boosting network efficiency | IBM, HP, others | More difficult for the untrained user to see; holds more data; more difficult to seize and review, as they are generally formatted as RAID |
| Digital files (beyond word processing) | Former analog files that are now digitized, including voice and audio | .wav and .mp3 | An often-forgotten source of relevant data, particularly when used to broadcast corporate information |
| Data mining | Programs that enable data from a variety of sources to be viewed in the aggregate and from varying perspectives | Generally customized or business specific, such as for hotel industry or manufacturing | Presumption is that all information is locatable because it is in data warehouse |
| World Wide Web | All content formatted for the Internet or a corporate intranet | Any Web site | Another overlooked source of evidentiary information; difficult to track and preserve |
| Digital convergence | Bringing together digital data of many types and sources into a single location | Cell phones | Creates another good source of digital evidence; hard to monitor and track |
| Peer-to-peer networking | Communication protocol that enables PCs to talk directly to one another without sharing access to a centralized server | Groove Networks | Difficult to track and monitor data maintained in this environment; poses problems for data preservation efforts because of its decentralized nature |
New data types and greater reliance on electronic communication also present a significant records management challenge, one that must be addressed by changes in process and in the technology used to manage that process.
So What’s New?
From the Fortune 50 to the “mom-and-pop,” organizations are increasingly implementing digital technologies. Unfortunately, the impact of new and more ephemeral data sources on records management and litigation is the farthest thing from the minds of those who implement new technology. Collaboration software, data warehouses, ISP-hosted e-mail, and Web-based content all present opportunities for indiscriminate archiving and dissemination of corporate information. Such consequences are often lost, however, in the cost-benefit discussions among IT staff and corporate management.
In 1995, Microsoft Windows was predominant, there were few personal computers, and the PDA had not yet been born. Storage was measured in megabytes, not gigabytes, and only “gear-heads” and professors wandered the Internet. Fast forward to 2003 and consider the current landscape: cyber hacking, computer viruses, the Linux operating system, terabytes and petabytes, Internet cafes, and cell phones that take pictures. What was once the stuff of science fiction and spy movies is now mainstream. So how do these advances impact electronic discovery?
It is increasingly difficult to identify and collect the most appropriate evidence. In this respect, technology is both a blessing and a curse. A curse because each year brings new places where relevant data may lurk and ways to exploit the weaknesses in data management structures; a blessing because as each weakness is identified, inventive companies develop the tools to bolster or eradicate it.
The enormous popularity of do-it-yourself in everything from home repair to self-help is filtering into the field of digital discovery, sometimes with disastrous results. Inadvertent overwriting of data and failure to preserve are two areas in which the do-it-yourselfer risks exposure and sanctions. The days of simply collecting e-mail from an Outlook server and calling discovery done are waning, if not already gone.
Those who find this preposterous should consider the Sarbanes-Oxley Act. Its document retention provisions alone mandate a higher standard of care. When taken in the context of litigation and discovery, however, Sarbanes-Oxley goes well beyond monetary sanctions to the specter of jail time. Thus, where to focus attention becomes increasingly important.
Instant Messaging and E-mail
IM is an immediate issue for most companies. It is ubiquitous, generally unmonitored, and a great way to circumvent restrictive corporate e-mail policies.
According to a study quoted in Information Week, “By 2007, businesses will be supporting 182 million IM users”; PC World estimates IM users will top 250 million. But when misused, IM can be used to leak everything from financial data to source code. For example, consider the possibility of an IM thread about pricing between competing companies and its implications for antitrust violations. Almost as bad as IM misuse is the fact that commercial Internet service providers such as AOL and Yahoo have introduced more sophisticated encryption options and premium e-mail services that enable customers to store more e-mail in their personal accounts for longer time periods. As cell phones and PDAs converge, they, too, will harbor data that may be subject to both retention and discovery.
The effects of increasing e-mail volume are becoming evident. Last year, as a cost-saving measure and in response to a 100-percent increase in e-mail in two years, EDS asked its employees to save messages in their local Microsoft Outlook inboxes, rather than on the Exchange server. This short-term fix is just one example of how companies react to an immediate problem without considering the long-term impact. Compounding the situation is the fact that many users have not been trained to use their e-mail systems effectively, making it much more difficult to retrieve and isolate relevant e-mail.
The good news is that the new version of Exchange, code-named Titanium, promises to protect messaging from hackers and integrates an automatic backup component that takes regular snapshots of the data. It also will further the centralization of e-mail to fewer servers, facilitating both discovery and data retention.
Storage
Storage has become personal. Corporate servers are no longer the exclusive keepers of corporate data. Thumb drives, flash cards, and micro-drives are now capable of holding gigabytes of data that can be downloaded simply and secretly. Employees can more easily take their work (or anything else) home or to a become critical in cases involving trade-secret theft, for example. Biometrics and hardware-based security can still foil an investigator’s attempts to access the data, however. IBM is placing storage of biometric factors and encryption keys on a dedicated processor on the computer’s motherboard. To gain access, some removable media require fingerprint recognition or putting the device into its host computer.
On the high end, storage area networks (SANs) are replacing the need to add larger hard drives to individual servers. Both SANs and outsourced data warehousing can easily be overlooked as a relevant data cache.
Backup
It appears the industry may also be moving beyond backup tapes into the world of “data protection appliances,” a phrase that is not a euphemism for file cabinets. Tape backup, which is linear, subject to failure, and tedious for data recovery, is being challenged by small computer system interface (SCSI) devices that keep an initial copy of a protected drive and log changes at intervals as short as 30 seconds. Disk-based backups such as these may soon supplant backup tapes whose only goal is data recovery rather than data archiving. For now, tapes continue to grow, and by 2010 super advanced intelligent tape (SAIT) may hold as much as 4 terabytes per tape.
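The "initial copy plus logged changes" model described above can be pictured with a toy sketch. This is only an illustration under stated assumptions: the `ProtectedDrive` class, its method names, and the file names are hypothetical and do not represent any vendor's appliance.

```python
# Toy model of disk-based protection: a baseline copy plus a timestamped change log.
# All names here are hypothetical; real appliances work at the block level.
class ProtectedDrive:
    def __init__(self, baseline):
        self.baseline = dict(baseline)  # initial full copy: {path: contents}
        self.log = []                   # entries of (timestamp_seconds, path, contents)

    def record(self, timestamp, path, contents):
        """Log a change; contents=None records a deletion."""
        self.log.append((timestamp, path, contents))

    def restore(self, as_of):
        """Rebuild the drive as it looked at time as_of by replaying the log."""
        state = dict(self.baseline)
        for timestamp, path, contents in sorted(self.log, key=lambda entry: entry[0]):
            if timestamp > as_of:
                break
            if contents is None:
                state.pop(path, None)   # the file was deleted at this point
            else:
                state[path] = contents  # the file was created or changed
        return state


drive = ProtectedDrive({"design.doc": "v1"})
drive.record(30, "design.doc", "v2")   # edited 30 seconds after the baseline
drive.record(60, "design.doc", None)   # deleted 30 seconds later
print(drive.restore(45))               # state between the edit and the deletion
```

For discovery purposes, the point of the sketch is that a file deleted at one interval can still be recovered from an earlier one, which tape backups taken nightly or weekly may not capture.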
The underlying issue, however, is too much data. As storage becomes less expensive and more data is generated, the temptation is simply to keep it available. If that trend continues, the potential liability and cost of gathering and filtering this data for litigation will be staggering. Consider that the reported average storage capacity of a company’s Windows NT servers is 43 terabytes. To put this number in perspective, if 43 terabytes of documents were printed, they would stack over 800 miles high.
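The stack-of-paper comparison can be checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not the author's method: the bytes-per-page and sheet-thickness figures are assumptions chosen for illustration, and the result shifts considerably if they change.

```python
# Back-of-the-envelope check of the "stack of paper" comparison.
# The first two constants are illustrative assumptions, not figures from the article.
BYTES_PER_PAGE = 3_300       # assumed: roughly one page of plain text
SHEET_THICKNESS_MM = 0.1     # assumed: typical office paper
MM_PER_MILE = 1_609_344      # exact: 1 mile = 1,609.344 meters


def stack_height_miles(terabytes, bytes_per_page=BYTES_PER_PAGE,
                       sheet_mm=SHEET_THICKNESS_MM):
    """Height in miles of a single-sided printout of the given data volume."""
    total_bytes = terabytes * 10**12          # decimal terabytes
    pages = total_bytes / bytes_per_page
    return pages * sheet_mm / MM_PER_MILE


print(f"{stack_height_miles(43):,.0f} miles")
```

Under these assumptions, 43 terabytes comes to roughly 13 billion pages and a stack of about 810 miles, which is consistent with the "over 800 miles" figure.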
Rapid Restore, a new IBM ThinkPad feature, creates a hidden service partition that backs up the entire system image, from data files to registry settings, with periodic updates. Although not the same as an evidentiary image, this backup will let users locate and restore single files that have been corrupted or deleted. That is good news for discovery but bad news for those trying to maintain tight controls.
Software and Operating Systems
Integrated messaging, version control, audit trail, and event notification are all components of the latest online collaboration tools. Objectively, they are excellent tools for streamlining such business processes as product development, corporate management, and more. When litigation threatens, however, they are just one more place where data may lurk.
Technology is slowly moving away from a Microsoft-centric view of business computing. Linux and other open-source platforms will heighten the variety and complexity of internal data review and storage. Futuristic applications such as visualization and mapping technology, rather than the printed report, may ultimately hold the best evidence. It is therefore critical that corporate managers, attorneys, and records managers understand current and future technologies and their effect on both retention requirements and proactive discovery in litigation. For example, will cell-site data (which antenna towers or wireless facilities a cell phone accesses) or .wav files be important in the company’s next litigation? Probably not, but IM and collaborative software probably will be.
Two Roads Diverged
Imagine that a person is carrying five Ping-Pong balls back and forth across the room in her hands. Each time she crosses the room, another ball is added. After only a few trips she starts to drop a ball here and there, and she suddenly realizes that she could put all the Ping-Pong balls into a box to make the task easier. She continues to carry the box back and forth, each trip adding another ball, but now baseballs, basketballs, and footballs are added. The box finally becomes too heavy to carry, however, and she eventually drops all the balls.
This not-so-subtle metaphor helps illustrate how most people have thus far approached computer-based discovery (i.e., continuing to follow the same practices used for paper-based discovery), seeking only to contain the increasing amount and variety of data in a larger container. But take a step back and consider whether carrying all those balls back and forth was really necessary.
The costs and time associated with computer-based discovery can be greatly minimized with a little prior planning. Careful selection of datasets, filtering, and sampling techniques offer ways to focus discovery efforts and limit unnecessary collection. Needless to say, if a comprehensive e-risk management plan is implemented prior to litigation, the amount of data available for review will likely be much smaller. For example, the “2003 E-Mail Rules, Policies and Practices Survey,” co-sponsored by American Management Association, The ePolicy Institute, and Clearswift, revealed a lack of e-mail retention and deletion training and policies in U.S. corporations. According to Nancy Flynn, ePolicy Institute’s executive director, “... only 27 percent of the [1,100 U.S. companies that participated in the survey] are doing any training about retention and deletion of e-mail, and only 34 percent have any retention and deletion policies at all.”
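The filtering and sampling techniques mentioned above might look something like the following minimal sketch. Everything in it is hypothetical: the record fields, custodian names, keywords, and date window are invented for illustration, and real review tools apply far richer criteria.

```python
import random
from collections import defaultdict
from datetime import date

# Hypothetical message records; field names and contents are illustrative only.
messages = [
    {"id": 1, "custodian": "techway", "sent": date(2010, 3, 1), "text": "Holographic memory specs attached"},
    {"id": 2, "custodian": "techway", "sent": date(2010, 3, 5), "text": "Lunch on Friday?"},
    {"id": 3, "custodian": "techway", "sent": date(2009, 1, 10), "text": "Old holographic prototype notes"},
    {"id": 4, "custodian": "coworker", "sent": date(2010, 3, 2), "text": "Pricing for the memory module"},
    {"id": 5, "custodian": "coworker", "sent": date(2010, 3, 3), "text": "Memory test results"},
]


def filter_messages(msgs, keywords, start, end):
    """Keep messages sent within the date window that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        m for m in msgs
        if start <= m["sent"] <= end
        and any(k in m["text"].lower() for k in lowered)
    ]


def sample_by_custodian(msgs, per_custodian, seed=0):
    """Draw up to per_custodian messages from each custodian (a stratified sample)."""
    rng = random.Random(seed)          # fixed seed so the sample can be reproduced
    groups = defaultdict(list)
    for m in msgs:
        groups[m["custodian"]].append(m)
    sample = []
    for custodian in sorted(groups):
        items = groups[custodian]
        sample.extend(rng.sample(items, min(per_custodian, len(items))))
    return sample


filtered = filter_messages(messages, ["holographic", "memory"],
                           date(2010, 2, 15), date(2010, 3, 15))
review_set = sample_by_custodian(filtered, per_custodian=1)
print([m["id"] for m in filtered], len(review_set))
```

Here the date and keyword filter narrows five messages to three, and sampling one message per custodian leaves two for an initial review. Grouping by custodian before drawing is one way to make a sample deliberate rather than purely random.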
A do-it-yourself trend is beginning to emerge, as lawyers and IT personnel take on more responsibility for managing electronic discovery. Large companies may want to build in-house expertise in electronic discovery. However, they must recognize that in-house staff will require significant training and an ongoing program to keep them current on tools and technologies.
The law is not settled as to form, scope, and cost of electronic discovery. Two recent cases, Zubulake v. UBS Warburg LLC, 2003 ILRWeb (P&F) 2253 [SDNY, 2003], and Rowe Entertainment, Inc. v. The William Morris Agency, Inc. 205 F.R.D. 421 (January 16, 2002), offer guidance but do not acknowledge the coming storm created by the compression of court dockets and the expansion of information and new technologies.
The Future of Electronic Discovery
Some technologies will flourish, some will die, some will just keep hanging on. Predicting which will survive is like predicting the outcome of the next Kentucky Derby. One thing is certain, however: Our computing environments will continue to change and impact discovery in litigation.
Clearly, information managers must develop an understanding of hardware and software beyond that gained through personal experience to adequately pursue or defend electronic discovery in litigation. It is likewise easy to take a “been there, done that” attitude toward electronic discovery, but the times are quickly changing. AOL alone now generates 3 terabytes of logs a month. A single advanced server, when clustered, can hold a whopping 11 petabytes (11,000 terabytes) of data.
Emerging best practices for data retention and preservation can help corporate counsel address these issues proactively. They will require rethinking in terms of how to approach discovery. Records and information management systems have not historically been deployed with litigation in mind, but perhaps they should be. The escalating costs of data collection and review in discovery, as well as the complexity of the systems themselves, demand a major realignment of how data is maintained in the ordinary course of business. Thus, an e-risk management plan has become an imperative. As with most things, a focus on minimizing risk now will yield benefits in the future.
Deborah H. Juhnke is Vice President of Seattle-based Computer Forensics Inc. and is a leader in the field of electronic data recovery. She may be contacted at djuhnke@forensics.com.
References
“Corporate Instant Messaging Ready to Take Off.” Information Week. 2 April 2003.
“E-mail Habits Are Risky Business.” PCWorld.com. 24 June 2003.
“The E-mail Scandal.” Infoworld.com. 25 November 2002.
“How Secure Is Instant Messaging?” PC World. October 2002.
“More Than an In-Box.” InformationWeek.com. 6 May 2002.
“2003 Infoworld Storage Survey.” Infoworld.com. 2003.
The Information Management Journal • November/December 2003