Textual Detection of Near Duplicates with Xera Analytics

Near-duplicate detection doesn’t rely on hash values the way traditional exact deduplication strategies do, and that’s a good thing.

As demonstrated in this video, de-duplication processes can often leave textual duplicates or near duplicates behind because the documents are not exact, byte-for-byte matches. In our demonstration, we find 5 textually identical copies of a record with different timestamps and, in some cases, different email distribution recipients. Because some of the metadata differed (time, recipient), the hashes created to fingerprint the records didn’t match, so the copies were not considered true duplicates.
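The gap between hash-based and text-based matching can be illustrated with a minimal Python sketch. This is not Xera’s implementation. The record fields (sent, to, body), the MD5 fingerprint computed over metadata plus body, and the word-shingle Jaccard similarity are all illustrative assumptions standing in for whatever the platform actually uses.

```python
import hashlib
import re


def fingerprint(record: dict) -> str:
    """Exact-match fingerprint over metadata plus body (assumed field layout)."""
    payload = "|".join([record["sent"], ",".join(sorted(record["to"])), record["body"]])
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 3) -> set:
    """Overlapping k-word shingles of the lowercased body text; metadata is ignored."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: dict, b: dict) -> float:
    """Jaccard similarity of the two bodies' shingle sets (1.0 = identical text)."""
    sa, sb = shingles(a["body"]), shingles(b["body"])
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


body = "Please review the attached draft agreement before Friday's call."
original = {"sent": "2019-03-04 09:15", "to": ["alice@example.com"], "body": body}
# Same text, different timestamp and recipients: a textual copy, not a byte-for-byte one.
forward = {"sent": "2019-03-04 11:42", "to": ["bob@example.com", "carol@example.com"], "body": body}
# A lightly edited version of the same message (hypothetical example data).
revised = dict(original, body=body.replace("draft", "revised draft"))

print(fingerprint(original) == fingerprint(forward))  # False: exact dedup keeps both
print(similarity(original, forward))                  # 1.0: textually identical
print(round(similarity(original, revised), 2))        # 0.55: shares most phrasing
```

In practice a review platform would compare a score like this against a configurable threshold to group versions together. That threshold, like the 3-word shingle size used here, is a tuning choice rather than a fixed rule.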

So what’s the big deal? In legal document review, documents are methodically reviewed and tagged, or ‘coded’, for relevance in accordance with document production requests. This labor is traditionally distributed across many reviewers, so the human element is multiplied and the chance of error rises sharply. In traditional review scenarios, this means some versions of a document may be tagged as non-responsive or privileged, while other versions are tagged differently by other reviewers. It happens.

Thankfully, technologies like textual analytics (or near-duplicate detection) can help prevent these inconsistencies.

Near-duplicate analysis can help you find all versions of a particular document with ease.

ARMA claims that over 80% of electronically created documents either come from ‘boilerplates’ or ‘templates’ or exist as revised copies of another document. Considering that over 600 BILLION Microsoft Word documents alone were created in 2019, that’s a lot of tracked changes, and a lot of potential to miss an important alternate version of a record.

Check out the video below to see how near-duplicate detection is implemented in Xera’s analytics platform.

Real Accessibility.

Accessibility is about effectiveness, cost and ease of use. By including Xera’s Advanced Analytics platform with every Cost Confidence case we serve, we’ve made analytics such as Predictive Coding, Xmplar, Concept Clustering, Near Duplicate Analysis and Threading available to far more people.

Interested in trying it for yourself? Reach out today and talk with an eDiscovery expert about what Xera Analytics and Platinum can do for you. We’ve got a number of sample datasets available if you’d like to test-drive the tools, or we can even set up a no-cost pilot so you can try it with your own data.

Litigation Support is tough. Be more effective with Xera Analytics.