A Simple Predictive Coding Definition

One of the biggest buzzwords in the realm of eDiscovery is predictive coding. While those in the litigation support and discovery business may be familiar with the term, the average lawyer or paralegal may not know (a) what it is or (b) why they should care about it. To demystify this piece of legal tech jargon, we have come up with a plain-English predictive coding definition.

What is Predictive Coding?

While definitions vary slightly from source to source, the most generally accepted predictive coding definition is this: a machine-driven (computer) process that takes a series of human commands and applies them as search parameters across a large set of data.

Programmers and legal tech geeks will sometimes call it by other names, such as:

1.) Predictive intelligence

2.) Technology-assisted review (TAR)

3.) Computer-assisted review (CAR)

4.) Advanced Analytics

The goal is to allow the machine to filter and classify data based on pre-programmed parameters, thus reducing the volume of redundant or irrelevant data. The practical result is that predictive coding cuts out much of the laborious keyword searching and document review that most attorneys and paralegals conduct during the discovery process.

How Does Predictive Coding Work?

Predictive coding uses sophisticated computer algorithms to examine a variety of digital file types, such as emails or PDFs, once they have been categorized or ‘tagged’ in a document review platform like iConect’s Xera or kCura’s Relativity. For example, when a document is marked with a ‘Privilege’ designation, the system compares that record with others that share the same category, finds similarities, and learns which characteristics suggest that other records should automatically receive the same designation.
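To make the idea of "finding similarities" concrete, here is a minimal Python sketch. It is an illustration only and does not describe how Xera, Relativity, or any other review platform works internally. It assumes similarity can be approximated by simple word overlap (a Jaccard score), and the function names, sample documents, and 0.3 threshold are all hypothetical.

```python
import re


def tokens(text):
    """Lowercase a document and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of distinct words two documents have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def suggest_same_tag(tagged_doc, candidates, threshold=0.3):
    """Return the candidates whose word overlap with the tagged
    document meets the (hypothetical) threshold."""
    return [doc for doc in candidates if jaccard(tagged_doc, doc) >= threshold]


# A document a reviewer has already tagged 'Privilege' (made-up text)
privileged = "Attorney client privileged advice regarding the pending merger"

# Untagged documents the system might compare it against (made-up text)
others = [
    "Privileged advice from our attorney regarding the merger terms",
    "Lunch menu for the Friday team outing",
]

similar = suggest_same_tag(privileged, others)
```

Here only the first untagged document shares enough wording with the privileged one to be suggested for the same tag. Real systems rely on far richer signals than word overlap, but the principle is the same: a human decision on one record is used to predict the right designation for others.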

The process works as follows:

1.) Seed sets are sampled – “Seed sets,” or representative samples of documents, are set aside from the full volume of data to be reviewed.

2.) Filters are applied – Reviewers label the seed documents in categories like “responsive” (relevant to the search) or “unresponsive” (not relevant). The software scans the seed set, creating an algorithm that can be applied across additional data sets.

3.) Full data is searched – All data is examined using the predefined filters, leaving a smaller, more relevant data set.

4.) The algorithm is refined – Additional data samples are drawn from the full data set as the algorithm is honed. This is done via continuous active learning, in which the algorithm helps choose the additional documents, or passive learning, a more random process.

5.) Final review – Based on the final algorithm, the entire data set is searched, and the computer labels each document “responsive” or “unresponsive.”

In the end, all that is left is the data relevant for the case.
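For readers who like to see the moving parts, the five steps above can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not the method used by any particular product: it swaps in a simple naive Bayes word-count model for the "algorithm," and every name here (train, score, pick_next, review, the sample documents, and the human_label stand-in for a reviewer) is hypothetical.

```python
import math
import random
import re
from collections import Counter

LABELS = ("responsive", "unresponsive")


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labeled):
    """Build per-label word counts from (text, label) pairs."""
    counts = {label: Counter() for label in LABELS}
    docs = Counter()
    for text, label in labeled:
        counts[label].update(words(text))
        docs[label] += 1
    return counts, docs


def score(model, text):
    """Estimate the probability that a document is responsive."""
    counts, docs = model
    vocab = set(counts["responsive"]) | set(counts["unresponsive"])
    log_p = {}
    for label in LABELS:
        total = sum(counts[label].values())
        # Smoothed log prior plus smoothed log likelihood of each word
        lp = math.log((docs[label] + 1) / (sum(docs.values()) + 2))
        for w in words(text):
            lp += math.log((counts[label][w] + 1) / (total + len(vocab) + 1))
        log_p[label] = lp
    diff = log_p["unresponsive"] - log_p["responsive"]
    diff = max(min(diff, 700.0), -700.0)  # keep math.exp in range
    return 1 / (1 + math.exp(diff))


def pick_next(model, unlabeled, k, active=True, rng=random):
    """Choose k documents for the next round of human review."""
    if active:
        # Active learning: documents the model is least sure about
        return sorted(unlabeled, key=lambda d: abs(score(model, d) - 0.5))[:k]
    # Passive learning: a plain random sample
    return rng.sample(unlabeled, k)


def review(corpus, human_label, seed_size=4, rounds=2, batch=2,
           active=True, seed=0):
    rng = random.Random(seed)
    seed_docs = rng.sample(corpus, seed_size)            # step 1: seed set
    labeled = [(d, human_label(d)) for d in seed_docs]   # step 2: human labels
    for _ in range(rounds):                              # steps 3 and 4
        model = train(labeled)
        done = {d for d, _ in labeled}
        unlabeled = [d for d in corpus if d not in done]
        if not unlabeled:
            break
        k = min(batch, len(unlabeled))
        for d in pick_next(model, unlabeled, k, active, rng):
            labeled.append((d, human_label(d)))
    model = train(labeled)                               # step 5: final pass
    return {d: "responsive" if score(model, d) >= 0.5 else "unresponsive"
            for d in corpus}


# Made-up documents; human_label stands in for a reviewer's judgment
corpus = [
    "merger contract draft terms", "merger pricing contract review",
    "merger board approval contract", "merger contract signature deadline",
    "lunch party friday menu", "office party lunch order",
    "holiday party lunch plans", "lunch party parking notice",
]


def human_label(doc):
    return "responsive" if "merger" in doc else "unresponsive"


result = review(corpus, human_label)
```

In this tiny example the reviewer ends up labeling every document, so the final labels simply echo the human calls. On a real matter the corpus is vastly larger than what people review by hand, and the value lies in the model's labels for the documents no one has looked at. Setting active=False switches the refinement rounds from active learning to random (passive) sampling.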

The Value of Predictive Coding for Law Firms

The discovery process is one of the most time-consuming parts of the legal process. Lawyers spend countless hours each year combing through files. The advent of digital technology only complicated the data collection process, adding video, texts, and emails to an already burgeoning pile of documents. Predictive coding speeds up the discovery process, saving time and money.

Some lawyers have been slow to leverage computer technology to discover legal evidence. Litigation is focused on established methodologies and traditions. But the digital era has forced many attorneys to concede there is simply no better way to capture, review, and disseminate all of the data that businesses generate than with eDiscovery, especially when predictive coding is used.

Finding the right software makes a predictive coding convert out of most firms. But because predictive coding requires human input, it’s also important to find an experienced eDiscovery firm. Pairing predictive coding with the right litigation support firm will help to ensure a successful technology-assisted review process.



Author Sid Newby
