People use text for lots of things, and we’ll assume that every one of them deserves better than 70% accuracy. Most OCR engines average about 70% accuracy, and their job ends once they embed the recognized text in a PDF or write it to a text file. We do that too, but we use computer vision instead of conventional OCR, and we average well over 90% accuracy on medium- to poor-quality documents, with much higher rates in many cases. Still, for many users, searchable text alone leaves much to be desired.
The team at Cullable asks: what’s next? Downstream processes matter, and Cullable is constantly adding creative transformations and practical operational staples downstream of text recognition. These range from language translation, data loss prevention (DLP), automated redaction and summarization with large language models such as OpenAI’s GPT, all the way to .json files that catalog the x and y positions of words and characters, or the edges (“bounding boxes”) of recognized words. These capabilities help developers and support healthcare, knowledge management, legal and digital transformation initiatives across a number of industries. Cullable’s technologies power form automation and machine learning initiatives that harvest intelligence from legacy document formats, lower the operational cost of healthcare, and determine taxonomies and access-privilege rules for enterprise data lake migrations. Text recognition is useful in far more ways than ‘making this PDF searchable’.
We’re launching the Cullable desktop app early access program with Text Recognition as the cornerstone of the solution. We plan to implement all of Cullable’s downstream processes over time, prioritizing feature releases according to customer interest and intent. The text recognition module is considered very stable and is packed with features: it can generate PDF files from images, produce document- or page-level text files (with .OPT support for defining document boundaries), or embed the recognized text in an output PDF file. It should serve as a solid foundation on which to build.
Cullable’s desktop app is intended to be licensed to an organization, with org-specific API keys generated for individual users or departments throughout the company. Each app uses its own API key, which allows Cullable to report on a specific department’s or user’s utilization and cost, or to roll up activity over time to chart and understand organizational usage, adoption and ROI. Cullable’s text recognition engine is offered at a low per-page cost, and volume discounts are always available.
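For readers unfamiliar with how a .OPT file defines document boundaries, here is a minimal sketch. It assumes the .OPT files in question follow the common Opticon load-file layout used in litigation support: one page image per comma-delimited row (image key, volume, image path, document break, folder break, box break, page count), where a `Y` in the fourth field marks the first page of a new document. This is a generic illustration of that format, not Cullable’s implementation or output; the function name `group_opt_pages`, the `Document` class, and the sample Bates numbers and paths are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    begin_key: str  # image key (e.g. Bates number) of the document's first page
    pages: List[str] = field(default_factory=list)  # image paths in page order


def group_opt_pages(opt_text: str) -> List[Document]:
    """Group Opticon-style .OPT rows into documents using the document-break flag.

    Assumed (hypothetical for this sketch) row layout:
    ImageKey,Volume,ImagePath,DocBreak,FolderBreak,BoxBreak,PageCount
    """
    documents: List[Document] = []
    for row in csv.reader(io.StringIO(opt_text)):
        if not row:
            continue  # skip blank lines
        row += [""] * (7 - len(row))  # pad rows that omit trailing empty fields
        image_key, image_path, doc_break = row[0], row[2], row[3]
        # A "Y" in the DocBreak field starts a new document; treat the first
        # row as a document start even if its flag is missing.
        if doc_break.strip().upper() == "Y" or not documents:
            documents.append(Document(begin_key=image_key))
        documents[-1].pages.append(image_path)
    return documents


# Hypothetical sample: a two-page document followed by a one-page document.
sample = r"""ABC000001,VOL001,IMAGES\001\ABC000001.tif,Y,,,2
ABC000002,VOL001,IMAGES\001\ABC000002.tif,,,,
ABC000003,VOL001,IMAGES\001\ABC000003.tif,Y,,,1
"""

for doc in group_opt_pages(sample):
    print(doc.begin_key, len(doc.pages))
```

Running this prints `ABC000001 2` and then `ABC000003 1`. Because only the first page of each document carries the break flag, each page is appended to the most recent document until the next `Y` appears. The page-count field is ignored here, though a stricter reader could use it as a consistency check.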
When a job runs, Cullable bundles the source images or files, encrypts them in a secure container, and transfers them to your organization’s private cloud storage bucket. From there, pre-processing operations are performed and the images are shipped to Cullable’s API cloud. OCR runs, records are reassembled, and the results are automagically downloaded to your configured destination directory. We don’t see your files. We don’t retain any text or ‘operational state’. We only track usage and top-level project names so that we can report them back to you.
Capitalize on the raw power of Cullable’s cloud with a drop-dead simple workflow that anyone can learn in minutes, producing the world’s fastest, most accurate OCR almost instantly (at up to 20,000 pages per minute).
A full list of Cullable’s current OCR and downstream features is included on the following pages. Where do you want us to go next?