Xera Analytics. Let’s Talk About Textual Detection of Near Duplicates

Textual Detection of Near Duplicates with Xera Analytics

Near-duplicate detection doesn't rely on hash values like traditional exact-duplicate strategies, and that's a good thing. As demonstrated in this video, de-duplication processes can often leave textual duplicates or near duplicates behind because the documents are not exact matches at the hash level. In our demonstration, we find five copies of a record with identical text but different timestamps and, in some cases, different email recipients. Because some of the metadata was indeed different (time, recipient), the hashes created to fingerprint the records didn't match each other, and the records were therefore not considered true duplicates.

So what's the big deal? In legal document review, documents are methodically reviewed and tagged or 'coded' for relevance in accordance with document production requests. This labor is traditionally distributed across many reviewers. Therefore, the human element is multiplied,