Learn how to get the best results from predictive coding during a technology-assisted review.

When it comes to mining large volumes of electronically stored information (ESI), predictive coding is one of the most valuable tools a firm can use, if not the most valuable. The amount of data included in the average eDiscovery process increases with each passing year. Heck, even your data has data.

It’s now prohibitively difficult (if not outright impossible) for most firms to comb through this data manually. Instead, predictive coding can be used to analyze data sets based on samples that have been coded by human reviewers. But predictive coding alone does not an eDiscovery process make — there are still some things you have to do to ensure success.

1. Always Ensure the Accuracy of the Sample Sets

The sample sets used during predictive analysis are what separate accurate results from inaccurate ones. A single inaccuracy in the sample sets could cascade into hundreds or thousands of errors when the full data set is analyzed. Because of this, the largest share of time in the technology-assisted review process should go to making sure these sample sets are correct. The good news is that a sample set does not have to be perfect on the first pass: the seed can be refined over several iterations of document review.
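
One simple way to check a sample set for accuracy is to have more than one reviewer code some of the same documents and then flag any disagreements for a second look. The Python sketch below is only an illustration of that idea, not a feature of any particular review platform; the function name `find_conflicts`, the reviewer names, and the document IDs are all hypothetical.

```python
from collections import defaultdict

def find_conflicts(codings):
    """Return the IDs of documents that received more than one distinct code.

    codings: iterable of (doc_id, reviewer, code) tuples (hypothetical format).
    """
    codes_by_doc = defaultdict(set)
    for doc_id, _reviewer, code in codings:
        codes_by_doc[doc_id].add(code)
    # A document is in conflict when reviewers assigned different codes to it.
    return sorted(doc for doc, codes in codes_by_doc.items() if len(codes) > 1)

# Made-up codings from two reviewers on an overlapping sample.
codings = [
    ("DOC-1", "reviewer_a", "relevant"),
    ("DOC-1", "reviewer_b", "relevant"),
    ("DOC-2", "reviewer_a", "relevant"),
    ("DOC-2", "reviewer_b", "not relevant"),
    ("DOC-3", "reviewer_a", "not relevant"),
]
conflicts = find_conflicts(codings)
print(conflicts)
```

Documents that come back in the conflict list are the ones to resolve before the sample set is used for training.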

2. Take Multiple Types of Sample Sets

Sample sets are almost always best taken from random samplings of the data. Otherwise the predictive coding tool may only be trained to analyze certain types of documents or documents from specific sources. But firms may not want to rely on random samples alone. They may also want to sample some of the most challenging documents they have located in the ESI, so that the predictive coding system can handle such documents as they are processed. Xera’s predictive analytics really shines in this realm.
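
The mix of random and targeted samples described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Xera or any other platform builds its seed sets; the function name `build_seed_set`, the document IDs, and the sample sizes are all hypothetical.

```python
import random

def build_seed_set(all_doc_ids, hard_doc_ids, random_size, seed=0):
    """Combine a simple random sample with hand-picked challenging documents.

    Every name and size here is hypothetical and for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    hard = set(hard_doc_ids)
    # Draw the random portion only from documents that were not hand-picked.
    pool = [doc for doc in all_doc_ids if doc not in hard]
    random_part = rng.sample(pool, min(random_size, len(pool)))
    # Record how each document entered the seed set.
    return ([(doc, "random") for doc in random_part]
            + [(doc, "targeted") for doc in sorted(hard)])

corpus = [f"DOC-{i:04d}" for i in range(1000)]
challenging = ["DOC-0007", "DOC-0420"]
seed_set = build_seed_set(corpus, challenging, random_size=50)
print(len(seed_set), "documents in the seed set")
```

Tagging each document with its source makes it easy to see later how much of the training data came from random sampling and how much was deliberately chosen.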

3. Clean Up the Samples as You Go

One of the most common mistakes firms make is leaving old sample sets in place as they continue to refine the process. As an example, a sample set that reviewers found to be inaccurate may be left in the system as newer, more accurate sample sets are added. This older data will still be informing the predictive coding algorithms and analysis, thereby skewing results. By removing older, inaccurate sample sets, you can improve the accuracy of the system's predictions.
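
As a rough sketch of what this cleanup looks like in practice, the Python below keeps a flag on each sample set and builds the training data only from sets that have not been retired. The `SampleSet` class, its fields, and the codes are hypothetical; a real review platform will manage this through its own interface.

```python
from dataclasses import dataclass, field

@dataclass
class SampleSet:
    name: str
    round_number: int
    coded_docs: dict = field(default_factory=dict)  # document ID -> reviewer's code
    retired: bool = False  # set to True once reviewers judge the set inaccurate

def active_training_data(sample_sets):
    """Merge the coded documents from every sample set that is still active."""
    merged = {}
    for sample in sorted(sample_sets, key=lambda s: s.round_number):
        if sample.retired:
            continue  # retired sets must not inform the model
        merged.update(sample.coded_docs)  # later rounds override earlier codes
    return merged

rounds = [
    SampleSet("round-1", 1, {"DOC-1": "relevant", "DOC-2": "relevant"}, retired=True),
    SampleSet("round-2", 2, {"DOC-2": "not relevant", "DOC-3": "relevant"}),
    SampleSet("round-3", 3, {"DOC-3": "not relevant"}),
]
training = active_training_data(rounds)
print(training)
```

Here the codes from the retired first round never reach the training data, and where two active rounds coded the same document, the more recent code wins.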

4. Think Carefully About Your Relevancy

How relevant do you need your documents to be? Which documents are you looking for: only documents directly related to the subject matter, or documents that may be several steps away? A high relevancy setting will reduce the number of documents returned but could also exclude important information. A low relevancy setting may simply return too many results. Your firm will need to run multiple tests to determine the best relevancy settings for each individual discovery. A best practice is to keep predictive coding concepts simple and separated. For example, a review team may create completely different predictive coding initiatives, such as relevant, privileged, or issue-based panels, and run them simultaneously if the review platform supports it. Again, Xera allows for this workflow while many others do not.
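
The trade-off between high and low relevancy settings can be tested against a small set of documents that humans have already coded. The Python sketch below uses the two usual measures, precision (how many returned documents are truly relevant) and recall (how many truly relevant documents were returned). The scores, labels, and cutoffs are made up, and the assumption that the platform exposes a relevancy score between 0 and 1 is ours.

```python
def precision_recall(scored_docs, threshold):
    """Compute precision and recall at a given relevancy cutoff.

    scored_docs: list of (score, is_relevant) pairs from a human-coded control set.
    """
    returned = [is_relevant for score, is_relevant in scored_docs if score >= threshold]
    true_hits = sum(returned)
    total_relevant = sum(is_relevant for _, is_relevant in scored_docs)
    precision = true_hits / len(returned) if returned else 0.0
    recall = true_hits / total_relevant if total_relevant else 0.0
    return precision, recall

# Made-up relevancy scores paired with the human reviewer's decision.
control_set = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
               (0.55, False), (0.40, True), (0.30, False), (0.10, False)]

for cutoff in (0.3, 0.6, 0.9):
    precision, recall = precision_recall(control_set, cutoff)
    print(f"cutoff={cutoff:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

On this toy data, raising the cutoff returns fewer documents and a larger share of them are relevant, but more of the relevant documents are missed. Running the same comparison on your own control set is one way to choose a setting for a given discovery.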

5. Pay Attention to the Certainty of the Algorithm

Most predictive coding systems will give you a “certainty” rating in the form of a percentage. The system may be 60%, 70%, or 80% certain that its analysis is accurate. If the system isn’t certain of its results, it needs more training: additional sample sets will have to be coded, and these sets will have to be thoroughly checked for consistency.
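
One way to act on the certainty rating is to accept predictions above a chosen cutoff and send everything else back to human reviewers, whose decisions then become new training samples. The Python below is a minimal sketch of that triage. The function name `triage_by_certainty`, the 0.80 cutoff, and the sample predictions are all hypothetical, and certainty is written as a fraction rather than a percentage.

```python
def triage_by_certainty(predictions, min_certainty=0.80):
    """Split predictions into those accepted and those needing human review.

    predictions: list of (doc_id, predicted_code, certainty) tuples,
    where certainty is a fraction between 0 and 1 (hypothetical format).
    """
    accepted, needs_review = [], []
    for doc_id, code, certainty in predictions:
        if certainty >= min_certainty:
            accepted.append(doc_id)
        else:
            needs_review.append(doc_id)  # candidates for the next training round
    return accepted, needs_review

# Made-up predictions for illustration.
predictions = [
    ("DOC-1", "relevant", 0.92),
    ("DOC-2", "not relevant", 0.61),
    ("DOC-3", "relevant", 0.80),
    ("DOC-4", "relevant", 0.45),
]
accepted, needs_review = triage_by_certainty(predictions)
print("accepted:", accepted)
print("needs review:", needs_review)
```

The right cutoff depends on the matter and on how your platform calculates certainty, so treat 0.80 as a placeholder rather than a recommendation.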

If properly used, predictive coding will produce solid, accurate data in a mere fraction of the time a human review would take. As the amount of data the average organization stores continues to increase, predictive coding becomes an essential tool within the technology-assisted review process. For more information about predictive coding and how it can help your firm, contact the experts at Platinum IDS today.

Author Sid Newby
