What Makes a Computer Hash So Important to eDiscovery?
Think of a computer hash as the fingerprint of a digital file. It is a series of numbers and letters that serves as the descriptor for a piece of information stored on a computer or in the cloud. Hashing a file produces an individualized alphanumeric value that you can record alongside the information you’re tracking.
Hashes are not assigned sequentially. If you compare the hashes of different files, each one appears completely random, and the values look unrelated even when the files themselves are nearly identical.
In technical terms, a hash is the fixed-length output of a mathematical function applied to a file's contents, usually written in hexadecimal notation. If two files are the same, they will have the same hash. Yet if you make even one tiny edit in a 6,000-word document, the edited file will produce a completely different hash. Two different files could in theory share a hash, but with a modern algorithm the odds are so small that each hash is treated as unique in practice. For any legal proceeding, this makes hashing an important safeguard for data integrity.
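To make the fingerprint idea concrete, here is a minimal Python sketch using the standard library's hashlib module. The sample sentences and the helper name sha256_hex are made up for illustration, and SHA-256 is simply one common algorithm choice, not a requirement.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of some bytes as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical document contents, used only for illustration.
original = b"The parties agree to the terms set out in Exhibit A."
identical_copy = b"The parties agree to the terms set out in Exhibit A."
edited = b"The parties agree to the terms set out in Exhibit B."

print(sha256_hex(original))        # same value as the next line
print(sha256_hex(identical_copy))  # identical content, identical hash
print(sha256_hex(edited))          # one character changed, unrelated-looking hash
```

The first two values match exactly, while the third shares no visible pattern with them even though only a single character differs.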
A Digital Bates Stamp
Using numbers and letters to identify paper evidence dates back more than 100 years to the Bates stamp. The Bates Automatic Numbering-Machine was created by Edwin G. Bates in the late 19th century as a way to quickly stamp important documents. It made it easy to add identifying numbers to documents, as well as the date if needed. Since its inception, the Bates stamp has become the standard for identifying and categorizing documents in the legal industry.
There is some debate as to whether the computer hash will actually make the Bates stamp obsolete. The answer is that each has its place: hashing is essential in today’s digital era, while the Bates stamp will continue to track the individual records that are used during the discovery and production phases of a trial.
The growing significance of the computer hash reflects the widespread use and acceptance of digital data in the cloud.
How Does a Computer Hash Affect eDiscovery?
Hashing is the backbone of an intelligent eDiscovery process. A hash helps to ensure the authenticity of data by making any alteration detectable: if a file changes, its hash no longer matches the value recorded at collection. Without a computer hash, collected data is difficult to authenticate, which can undermine its value as evidence. Every single file needs to be accounted for from the moment it is collected to the moment it is used as evidence in court. As a result, it’s imperative for firms to have a plan in place to ensure that a hash is computed and recorded for every file during the initial data collection process. Hashing is also fast and highly accurate. Since two identical hashes defensibly indicate that documents are duplicates, this technology is often used to de-duplicate electronic documents before the review phase of discovery.
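As a rough illustration of hash-based de-duplication, the Python sketch below hashes every file in a folder and groups files that share a hash. The file names and contents are invented for the demo, the helper names hash_file and group_by_hash are hypothetical, and real eDiscovery platforms may de-duplicate differently (for example, by hashing selected email fields).

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file's contents in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(paths):
    """Map each content hash to the list of files that share it."""
    groups = defaultdict(list)
    for path in paths:
        groups[hash_file(path)].append(path)
    return dict(groups)


# Demo with three made-up collected files, two of which are byte-for-byte identical.
with tempfile.TemporaryDirectory() as folder:
    root = Path(folder)
    (root / "memo.txt").write_bytes(b"Quarterly results attached.")
    (root / "memo_copy.txt").write_bytes(b"Quarterly results attached.")
    (root / "memo_v2.txt").write_bytes(b"Quarterly results attached!")

    groups = group_by_hash(sorted(root.iterdir()))
    # Keep only the hashes shared by more than one file.
    duplicate_sets = [
        [p.name for p in files] for files in groups.values() if len(files) > 1
    ]

print(duplicate_sets)
```

The same hash_file helper could be run again later in a matter and compared against the value recorded at collection; a mismatch would indicate that the file changed somewhere along the way.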
Understanding the Practical Hashing Process
So what’s the best way to start working with computer hashes? Most tools in the eDiscovery processing world, including many collection tools, will generate defensible hashes, but not all hashes are created equal. Many tools rely on a blend of metadata to create a unique value, which is then hashed. Others hash the exact bytes of the file in question. Furthermore, algorithms in the SHA family, such as SHA-256, produce substantially longer hashes than older standards like MD5 (256 bits versus 128 bits), which makes an accidental collision even less likely. Talk to an eDiscovery expert to determine what’s best for your situation.
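The differences described above can be sketched in a few lines of Python. The first part compares hash lengths across algorithms. The second part contrasts a content hash with a metadata hash; the fields used there (sender, subject, sent date) and the metadata_hash helper are assumptions chosen for illustration, not the recipe of any particular eDiscovery tool.

```python
import hashlib

content = b"Please see the attached agreement."

# The same bytes hashed with three algorithms produce values of different lengths.
md5_hex = hashlib.md5(content).hexdigest()
sha1_hex = hashlib.sha1(content).hexdigest()
sha256_hex = hashlib.sha256(content).hexdigest()

for name, value in (("MD5", md5_hex), ("SHA-1", sha1_hex), ("SHA-256", sha256_hex)):
    # Each hexadecimal character represents 4 bits.
    print(f"{name}: {len(value) * 4} bits")


def metadata_hash(sender: str, subject: str, sent_date: str) -> str:
    """Hash a blend of metadata fields (hypothetical field choice)."""
    combined = "\x1f".join((sender, subject, sent_date))  # unit separator between fields
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


# Two stored copies of the same email whose bytes differ slightly,
# for example because of different line endings.
copy_a = b"Please see the attached agreement.\r\n"
copy_b = b"Please see the attached agreement.\n"

same_metadata = (
    metadata_hash("alice@example.com", "Agreement", "2020-01-15T09:30:00Z")
    == metadata_hash("alice@example.com", "Agreement", "2020-01-15T09:30:00Z")
)
same_bytes = hashlib.sha256(copy_a).hexdigest() == hashlib.sha256(copy_b).hexdigest()

print(same_metadata)  # True: the metadata blend matches
print(same_bytes)     # False: the raw bytes differ
```

This is why the choice of method matters: a metadata-based hash would treat the two copies as duplicates, while a byte-based hash would not.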
While computer hashing is certainly something you can perform yourself, outsourcing the discovery process to an eDiscovery firm that can manage data collection, including computing a hash for every document it finds, can save your firm time and money. To learn more about how eDiscovery can benefit you, download our free e-book, How to Use eDiscovery to Compete With the Big Boys.