WE’VE GOT AN APP FOR THAT.
Let’s face it. The Portable Document Format (PDF) is practically a godsend when it comes to sharing content between parties in litigation.
Nothing beats a robust review database when millions of documents need review for discovery. But need to get 300 documents to an expert? Look no further than PDF.
However, if you’ve got to combine 30,000 pages into 1,000 exhibits by the end of business, you’ve got a problem. Seriously. Let’s take a moment to talk about PDF and what can be done to build these portable documents faster and at lower cost than ever before.
LET’S TALK ABOUT THE PROS.
IT’S SECURE. Adobe Acrobat supports 256-bit AES keys among other encryption mechanisms. Passwords are easy to apply and most PDF viewers handle them natively, so any recipient with the password can decrypt and access the shared content from anywhere, from cell phones to trial presentation decks.
IT’S FREE. Applications like Adobe Acrobat are available at a low cost, and many free PDF creation suites are widely available for download. What’s more, most operating systems and cloud content management systems can convert documents to PDF natively. This means it’s easier than ever to create a PDF and start sharing content right away.
IT’S EASY. This one’s a no-brainer. PDF support is widely available on user devices, multifunction copiers and cloud services, and creating a PDF is generally just a click away.
IT’S COMPLIANT. With robust security baked in and the ability to generate a flattened PDF/A document for the court, PDF is accepted as a standard production format by legal teams and courts across the country (although, depending on volume and purpose, other production formats may serve the need more effectively).
LET’S TALK ABOUT THE CONS.
CREATION IS EXTREMELY SLOW. You heard that right. PDF creation can run at a snail’s pace, which makes quick-turnaround PDF jobs very difficult to estimate and puts deadlines in jeopardy every day. Most litigation support teams spread the work across as many workstations as they can, but resources are scarce: the PDF conversion process consumes nearly all available compute resources, rendering a workstation almost useless while the conversion is in progress. Need to make last-minute designation changes to your exhibits for trial in the morning? Good luck.
REQUIRES LOTS OF COMPUTE RESOURCES. We touched on this above, but consider some actual stats. Some image-to-PDF conversion jobs run at 1,000 pages an hour. Need to convert 150,000 pages? That’s 150 machine-hours of work, so you’d need 19 machines committed to make an 8-hour deadline.
At our company, we recently converted 3 million pages into 1 million PDFs. We had three days to complete it. By traditional standards, we’d need 42 machines running for 72 hours straight to complete the assignment on time. Of course, that assumes none of the machines or applications crash, which is a common occurrence. Because crashes are so common, a ridiculous amount of labor is necessary to monitor and maintain the conversion process, which can be taxing across 42 workstations. We were able to complete the project with a fraction of the time and resources and with almost no labor.
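The workstation counts quoted above follow from simple arithmetic. Here is a minimal sketch in Python, assuming a steady 1,000 pages an hour per machine, perfectly even load and no crashes (all optimistic). The function name `machines_needed` is ours, for illustration only.

```python
import math


def machines_needed(total_pages: int, pages_per_hour: int, deadline_hours: float) -> int:
    """Estimate how many workstations must run in parallel to hit a deadline.

    Assumes every machine sustains the same throughput for the whole
    window and nothing crashes, so treat the result as a lower bound.
    """
    machine_hours = total_pages / pages_per_hour
    # Round up: a fraction of a machine still means one more workstation.
    return math.ceil(machine_hours / deadline_hours)


# 150,000 pages against an 8-hour deadline
print(machines_needed(150_000, 1_000, 8))     # 19
# 3 million pages against a 72-hour (three-day) deadline
print(machines_needed(3_000_000, 1_000, 72))  # 42
```

In practice you would pad the result to allow for crashes and restarts, which is exactly the labor cost described above.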
PDF IS A LOOSE STANDARD. This can create technical challenges when dealing with PDF files. Some PDF files are text and vector based (PostScript-style), which means there are few or no actual images embedded in the document. These can be very lightweight and easy to work with, even if they contain many thousands of pages. Some PDF files are image based, and we’ve seen some with hundreds of individual uncompressed 1 GB image ‘pages’ inside: aerial photographs, GIS data, as-built drawings or CAD renderings. Files like these will crash the majority of PDF conversion applications because conversion is so CPU and memory intensive.
Finally, some PDF files contain native documents like emails and attachments. We call these ‘Portfolio PDFs’ and they are the bane of any litigation support professional’s existence. They can create a number of unique challenges that the industry’s still struggling to solve.
NOT OPTIMAL FOR REVIEW PLATFORMS. PDFs are generally not recommended for discovery review. Unless the PDF is optimized for web view, or the system renders images from the PDF on the fly, many review platforms must download the entire PDF to the user’s browser before it can be displayed. Also, a single PDF can contain thousands of individual emails and attachments. It’s hard to mark content as responsive or privileged when it’s all encapsulated in a single record. This is a big problem for a lot of people right now.
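‘Optimized for web view’ is what the PDF specification calls linearization. A linearized file carries a `/Linearized` entry in a dictionary that must sit within the first 1,024 bytes, so a quick pre-check before loading PDFs into a review platform is possible. The sketch below is a heuristic only, not a validator: a file that was linearized and later saved with incremental updates may still pass. The function names are ours.

```python
def looks_linearized(head: bytes) -> bool:
    """Heuristic: does the start of the file look like a linearized PDF?

    `head` should be at least the first 1,024 bytes of the file.
    """
    return head.startswith(b"%PDF-") and b"/Linearized" in head[:1024]


def file_looks_linearized(path: str) -> bool:
    """Read only the first 1,024 bytes from disk and apply the heuristic."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(1024))


# Two tiny hand-made headers for illustration (not complete PDF files)
web_ready = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 12345 >>\nendobj\n"
plain = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(looks_linearized(web_ready), looks_linearized(plain))
```

Files that fail the check are candidates for re-saving with web optimization turned on before they reach reviewers.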
ORIGINAL METADATA IS LOST WHEN YOU CONVERT TO IMAGE OR PDF. Of course, it’s replaced with new metadata, which is basically worthless to your review team. Creation Date, Author, To, From, CC… Forget about it. Unless it’s captured in some kind of cross-reference file for the review platform, it’s gone for good. By the way, most PDF conversion workflows don’t account for this important legal requirement. That could add a tremendous amount of time and cost to your project if it becomes an issue at a later date.
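To show what a cross-reference file might hold, here is a minimal Python sketch that pulls header metadata from a raw email before conversion and writes it to a CSV. It is an illustration, not a description of any particular platform’s load file: the column names and the `DocID` value are assumptions, and it covers email headers only, not file-system properties such as Creation Date or Author.

```python
import csv
import io
from email import policy
from email.parser import BytesParser

# Hypothetical column layout; real load file specifications vary by platform.
FIELDS = ["DocID", "From", "To", "Cc", "Date", "Subject"]


def email_metadata(doc_id: str, raw: bytes) -> dict:
    """Extract header metadata from a raw RFC 5322 message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw, headersonly=True)
    row = {"DocID": doc_id}
    for name in FIELDS[1:]:
        # Missing headers become empty strings rather than errors.
        row[name] = str(msg.get(name, ""))
    return row


def write_cross_reference(rows, out) -> None:
    """Write one CSV row per source document, keyed by DocID."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


sample = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Date: Mon, 02 Oct 2017 09:30:00 -0400\r\n"
    b"Subject: Exhibit list\r\n"
    b"\r\n"
    b"Body text"
)
buffer = io.StringIO()
write_cross_reference([email_metadata("DOC0001", sample)], buffer)
print(buffer.getvalue())
```

The point is the ordering: capture the metadata while the native file is still in hand, keyed by the same ID the converted PDF will carry.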
WHAT CAN BE DONE?
THE GOOD NEWS IS CULLABLE PDF IS COMING SOON. Cullable is a cloud-based eDiscovery solution that solves everyday eDiscovery processing challenges by offering important ESI processing workflows as standalone modules that can be chained together to create complex eDiscovery workflows.
Cullable is being released in modules. The first module released is Cullable OCR, a cloud-based OCR engine that can take litigation images and load files and produce the world’s highest quality OCR at a rate of up to 20,000 pages a minute. Cullable can scale to thousands of processing nodes instantly, making short work of OCR deadlines at an extremely low cost.
The next module, scheduled for release at the end of the month (October 30, 2017), is Cullable PDF. With Cullable PDF, you’ll be able to upload any image or image-based PDF file and produce OCR text files or fully searchable PDF files in a fraction of the time of any other solution available. Shortly after that, we’ll roll out endorsement, allowing teams to apply Bates numbers and other formal ID assignments along with messages like ‘confidential’ or ‘attorney-client privilege’.
Finally, in the fourth quarter we’re launching the alpha of our eDiscovery data processing module, which will support PDF as a standard output format from eDiscovery processing initiatives. This will include portfolio PDF handling.
All Cullable modules are built with ultimate scalability in mind. They combine machine learning and autoscaling in an encrypted-at-rest cloud environment with an extremely easy-to-use interface that requires little to no training.
We’re building tools for everyone in the litigation field: firms, agencies, service providers and corporations. We want you to get involved. Sign up for the open beta of Cullable today and get access to Cullable’s OCR and PDF open beta on October 30.