Solving the Insurance Application Challenge: Going Beyond OCR

Optical character recognition (OCR) is a decades-old technology that converts scanned images to text, and it has played an important role across many industries over the years. But when it comes to digitizing applications for commercial insurance underwriting, OCR is only the first step. Having worked extensively on improving the insurance application process, I would like to share some thoughts on ways to raise underwriting productivity. Artificial intelligence (AI), and specifically machine learning, can take the intake process well beyond OCR, making it possible to capture data accurately from scanned documents, derive meaning from that data, and apply it in ways that make the underwriter’s job easier and more profitable.


Insurance applications come in varied fonts and formats that can make OCR challenging. OCR cannot derive insights from the context of a sentence it has scanned from an image; it only knows what it sees at face value. It certainly lacks the insurance-specific contextual intelligence to tell whether a particular document is using the abbreviation “incr” to mean “increased” or “incurred.”
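To make the “incr” example concrete, here is a minimal sketch of context-based disambiguation. It is a toy keyword heuristic, not a trained model and not how any particular product works; the cue-word lists and the `expand_incr` function are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical cue words (assumption for illustration). A production system
# would learn such signals from labeled insurance documents instead.
CONTEXT_CUES = {
    "incurred": {"loss", "losses", "claim", "claims", "paid", "reserve", "reserves"},
    "increased": {"limit", "limits", "premium", "coverage", "rate", "value"},
}


def expand_incr(text: str) -> str:
    """Guess what 'incr' stands for from the other words on the same line."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Score each candidate meaning by how many of its cue words appear.
    scores = {meaning: len(words & cues) for meaning, cues in CONTEXT_CUES.items()}
    best = max(scores, key=scores.get)
    # With no cue words, or a tie, leave the decision to a human reviewer.
    if scores[best] == 0 or list(scores.values()).count(scores[best]) > 1:
        return "unknown"
    return best


print(expand_incr("Total incr losses 2019: 45,000"))
print(expand_incr("Incr limit requested for building coverage"))
```

Plain OCR would return the same four letters in both lines. The surrounding words are what separate the two readings, and that is the kind of signal a machine learning model can learn at scale.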

DataCubes’ experience indicates that approximately 65 percent of all insurance applications for commercial underwriting are scanned, and many are in formats that not all software can process. These documents benefit from state-of-the-art, AI-based intake technologies such as DataCubes’ d3 Intake™, which go well beyond OCR.

d3 Intake applies custom machine learning algorithms that have been rigorously trained on thousands of relevant insurance documents. Its machine learning models can convert digitized OCR results and other application text into accurate, meaningful and valuable underwriting information, reducing noise and minimizing irrelevant data.


With the use of these new technologies, machine learning algorithms can understand industry forms and terminology. They can recognize the difference between a loss run and a statement of values (SOV), and quickly extract and contextualize the data accordingly. They can also rationalize the SOVs from dozens of different carriers, all with their own, customized SOV formats.
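As a rough illustration of what rationalizing SOV formats involves, the sketch below maps carrier-specific column headers onto a single schema. The alias table, the canonical field names, and both functions are assumptions made for this example; a real system would likely learn these mappings from data rather than hard-code them.

```python
import re
from typing import Dict, Optional

# Hypothetical aliases (assumption for illustration): each canonical field
# lists header spellings that different carriers might use for it.
HEADER_ALIASES = {
    "building_value": {"bldg value", "building value", "building limit"},
    "contents_value": {"contents", "contents value", "bpp"},
    "address": {"address", "location address", "street"},
}


def canonical_header(header: str) -> Optional[str]:
    """Return the canonical field name for a raw column header, if known."""
    # Lowercase, replace punctuation and digits with spaces, collapse whitespace.
    cleaned = " ".join(re.sub(r"[^a-z ]+", " ", header.lower()).split())
    for field, aliases in HEADER_ALIASES.items():
        if cleaned in aliases:
            return field
    return None


def normalize_row(row: Dict[str, str]) -> Dict[str, str]:
    """Keep only recognized columns, renamed to the canonical schema."""
    normalized = {}
    for header, value in row.items():
        field = canonical_header(header)
        if field is not None:
            normalized[field] = value
    return normalized


row = {"Bldg. Value": "1,200,000", "Location Address": "12 Main St", "Notes": "n/a"}
print(normalize_row(row))
```

Once every carrier’s layout lands in the same schema, the values can be compared and passed downstream without format-specific handling.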

Trained on underwriting best practices, insurance-specific machine-learning models can understand the relevance of each item of data. The extracted data can then be combined with other information pulled from the insurance applications, and used to auto-populate the editable fields of the client’s underwriting tools and feed into downstream rating and policy administration systems.

With all relevant information at hand, underwriters are spared from having to manually enter information from insurance applications. They can apply data science and AI in real time to generate well-informed underwriting insights and recommendations. With products such as d3 Answers™, underwriters can also receive expert answers to their questions about the risks presented by each insurance applicant. Freed from a great many manual tasks, they have more time to focus on exceptions and customer service.

Technologies based on data science and machine learning are not only making underwriting easier and more powerful; they’re also laying the foundation for a broad range of AI-enhanced innovations for commercial insurance underwriting and more. Today’s solutions are just a start.