Textual Detection of Near Duplicates with Xera Analytics

Xera’s New Analytics Engine Is Included With Every New Cost Confidence Project, Starting July ’18.

The wait is finally over. Starting in July 2018, Advanced Analytics is bundled with every standard Cost Confidence program we launch, for the foreseeable future.

Xera has seamlessly integrated Ayfie’s remarkable Text and Metadata Analytics engine into the platform, turning its extraordinary power into simple features that are much more accessible to users. Now, we’re including it with our all-inclusive Cost Confidence program.

Today, let’s talk about Near Duplicate Detection and why your team should demand it on every case, as so many of our clients do.

Near duplicate detection doesn’t rely on hash values the way traditional exact de-duplication strategies do, and that’s a good thing.

As demonstrated in this video, de-duplication processes can often leave textual duplicates or near duplicates behind, even when the documents’ text is an exact match. In our demonstration, we find 5 textually identical copies of a record with different timestamps and, in some cases, different email distribution recipients. Because some of the metadata differed (time, recipient), the hashes created to fingerprint the records didn’t match one another, so the copies were not considered true duplicates.
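To make the hashing point concrete, here is a minimal Python sketch, not Xera’s or Ayfie’s actual implementation. It assumes an exact-duplicate fingerprint built with SHA-1 over the sent time, recipient and body; that field choice, the sample emails, and the names `metadata_hash` and `text_hash` are hypothetical. Two emails with identical bodies but different metadata get different metadata hashes, while a hash over normalized body text alone matches.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 hex digest of a UTF-8 string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical email records: identical body text, different metadata.
emails = [
    {"sent": "2018-03-01 09:14", "to": "alice@example.com",
     "body": "Please review the attached contract before Friday."},
    {"sent": "2018-03-01 11:02", "to": "bob@example.com",
     "body": "Please review the attached contract before Friday."},
]

def metadata_hash(rec: dict) -> str:
    """Exact-duplicate fingerprint that includes metadata fields
    (an assumed construction; real tools vary in what they hash)."""
    return sha1_hex("|".join([rec["sent"], rec["to"], rec["body"]]))

def text_hash(rec: dict) -> str:
    """Fingerprint over whitespace- and case-normalized body text only."""
    normalized = " ".join(rec["body"].lower().split())
    return sha1_hex(normalized)

meta = [metadata_hash(e) for e in emails]
text = [text_hash(e) for e in emails]
print("metadata hashes match:", meta[0] == meta[1])  # False
print("text hashes match:", text[0] == text[1])      # True
```

A text-only hash still catches only exact textual matches: changing a single word produces a completely different digest, which is why near duplicate detection compares content similarity rather than hashes.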

What’s the big deal? In legal document review, documents are methodically reviewed and tagged, or ‘coded’, for relevance in accordance with document production requests. This labor is traditionally distributed across many reviewers, and as the number of reviewers grows, so does the chance for error. In traditional review scenarios, this means some versions of a document may be tagged as non-responsive or privileged while other versions are tagged differently by other reviewers. It happens.

Textual analytics technologies such as near duplicate detection can help prevent this by grouping every version of a document together so the versions can be coded consistently.
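Once documents are grouped, spotting inconsistent coding becomes a simple lookup. The sketch below assumes a hypothetical review log of (document ID, duplicate group ID, tag) rows; the group IDs stand in for whatever grouping an analytics engine produces, and the document IDs and tag names are illustrative only.

```python
from collections import defaultdict

# Hypothetical review log: (document id, duplicate group id, reviewer tag).
coding = [
    ("DOC-001", "G1", "Responsive"),
    ("DOC-002", "G1", "Responsive"),
    ("DOC-003", "G1", "Non-Responsive"),
    ("DOC-004", "G2", "Privileged"),
    ("DOC-005", "G2", "Privileged"),
]

def find_conflicts(rows):
    """Return {group_id: sorted distinct tags} for every group whose
    members were coded inconsistently."""
    tags_by_group = defaultdict(set)
    for _doc_id, group_id, tag in rows:
        tags_by_group[group_id].add(tag)
    return {g: sorted(tags) for g, tags in tags_by_group.items() if len(tags) > 1}

conflicts = find_conflicts(coding)
print(conflicts)  # {'G1': ['Non-Responsive', 'Responsive']}
```

Flagged groups can then be routed to a single reviewer for a consistent decision instead of surfacing as an inconsistency after production.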

Near duplicate analysis can help you find all versions of a particular document with ease.
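One common textbook way to find versions of a document is to break each text into overlapping word sequences (“shingles”) and compare the resulting sets with Jaccard similarity. The Python sketch below shows that technique. The post does not say which method Ayfie’s engine uses, and the shingle size `k=3`, the 0.5 threshold, and the sample documents are arbitrary assumptions for illustration.

```python
from itertools import combinations

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_near_duplicates(docs: dict, threshold: float = 0.5) -> list:
    """Cluster documents whose pairwise shingle similarity meets the threshold.
    Union-find merges chains (A~B, B~C) into a single version family."""
    parent = {d: d for d in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sigs = {d: shingles(t) for d, t in docs.items()}
    for a, b in combinations(docs, 2):
        if jaccard(sigs[a], sigs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in docs:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())

docs = {
    "v1": "The supplier shall deliver all goods by March 1 at its own cost.",
    "v2": "The supplier shall deliver all goods by April 1 at its own cost.",
    "memo": "Lunch order for the Friday team meeting is attached.",
}
print(group_near_duplicates(docs))  # [['memo'], ['v1', 'v2']]
```

Here the two contract versions share 8 of 14 distinct shingles (similarity of about 0.57), so they land in one family, while the memo stands alone. Pairwise comparison is quadratic in the number of documents; at collection scale, systems commonly approximate Jaccard similarity with MinHash signatures and locality-sensitive hashing rather than comparing every pair.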

ARMA claims that over 80% of the documents created electronically are either built from ‘boilerplates’ or ‘templates’, or exist as revised copies of another document. Considering that over 600 BILLION Microsoft Word documents alone were created in 2017, that’s a lot of tracked changes, and plenty of opportunity to miss an important alternate version of a record.

Check out the video below to see how near duplicate detection is implemented in Xera’s analytics platform.

Real Accessibility.

Accessibility is about cost and ease of use. By including Xera’s Advanced Analytics platform with every Cost Confidence case we serve, we’ve made analytics such as Predictive Coding, Xmplar, Concept Clustering, Near Duplicate Analysis and Threading available to far more people.

Interested in trying it for yourself? Reach out today and talk with an eDiscovery expert about what Xera Analytics and Platinum can do for you. We have a number of sample datasets available if you’d like to test drive the tools, or we’ll even set up a no-cost pilot so you can try it with your own data.

Litigation Support is tough. Be more effective with Xera Analytics.
