Insights in Insurance: Like Needles in Haystacks

Every day, 2.5 quintillion bytes of data are created. By some estimates, more data has been created in the last 10 years than in all of prior human history. By 2025, the research group IDC predicts annual data output of 163 zettabytes (a single zettabyte is one trillion gigabytes).

This surge in data volume has generated a lot of excitement among commercial underwriting experts, who see it as a means to improve due diligence and research. Combined with artificial intelligence tools, there seems to be no excuse for not having deep insight into every aspect of a company, a building, or an asset.

Unfortunately, the idea of data as a crystal ball is, for now, only a half-baked myth. We are still in the early days of using big data to its fullest potential, and one barrier is the variety and integrity of the data sources. Sorting through today’s massive loads of unstructured data (data not organized in a predefined manner) to answer a complex question is like finding a needle in a haystack.

Bricks in the road

Knowing that we can have more data to inform underwriting is certainly good news, but the road to making that data actionable is long and unpaved, with a few potholes along the way.

First, consider how much information an underwriter must review when assessing a business risk, which can include up to 10 years of historical records. With more information available than ever before, it seems logical that deciding whether to issue a policy, and how to price it, should become easier.

The problem for insurers is that while a lot of data exists, much of it remains unstructured and unorganized. The wide variation in formats makes this data very difficult to parse and use. Historical government data on any given company may still be sitting in Microsoft Access, on a CD, on a website, or in an Excel file with no common fields or formats. Company permit files may simply have been scanned into unindexed PDFs that sit on a server or a large cloud platform, today’s equivalent of a dusty file cabinet. And actual dusty file cabinets aren’t off limits either!

Data is not always digital, which presents an enormous challenge when looking for insights.

To make matters more complicated, there are over 3,000 counties in the United States, each of which may keep records such as permits and licenses in its own formats, so there could be hundreds of thousands of these data formats in existence. The fact that more data is available than ever before does not mean that this data is easily accessible.

DataCubes has collected, normalized, and mined these disparate datasets to surface insights for commercial underwriting. The resulting “data lake” eliminates redundant processes, such as having to cross-reference license and permitting documents to verify information. New software and machine learning techniques can automatically extract information from that dusty PDF locker and add it to the data lake, saving the underwriter time and money.
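To make “normalizing” and “cross-referencing” a little more concrete, here is a minimal sketch, assuming two hypothetical county exports that describe the same kind of permit record with different field names and date formats. The county names, field names, date formats, and the common schema (business_name, permit_type, issued) are all illustrative assumptions, not DataCubes’ actual pipeline. The sketch maps each raw record onto one schema and then groups records that appear to belong to the same business.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical field mappings: each source's column name -> common schema field.
FIELD_MAPS = {
    "county_a": {"Biz Name": "business_name", "Permit": "permit_type", "Issue Date": "issued"},
    "county_b": {"OWNER_NAME": "business_name", "PERMIT_CLASS": "permit_type", "DT_ISSUED": "issued"},
}

# Hypothetical date format used by each source.
DATE_FORMATS = {"county_a": "%m/%d/%Y", "county_b": "%Y%m%d"}


def normalize(source: str, record: dict) -> dict:
    """Map one raw record from `source` onto the common schema."""
    out = {}
    for raw_key, common_key in FIELD_MAPS[source].items():
        value = record.get(raw_key)
        if value is None:
            continue  # field missing in this record; leave it out
        if common_key == "issued":
            # Parse the source-specific date string into a datetime.date.
            value = datetime.strptime(value, DATE_FORMATS[source]).date()
        elif isinstance(value, str):
            value = value.strip()  # drop stray whitespace from manual entry
        out[common_key] = value
    out["source"] = source
    return out


def business_key(name: str) -> str:
    # Crude matching key: case-insensitive, whitespace-collapsed name.
    return " ".join(name.casefold().split())


raw = [
    ("county_a", {"Biz Name": " Acme Roofing LLC ", "Permit": "Roofing", "Issue Date": "03/15/2019"}),
    ("county_b", {"OWNER_NAME": "ACME ROOFING LLC", "PERMIT_CLASS": "ROOF", "DT_ISSUED": "20200702"}),
]

records = [normalize(src, rec) for src, rec in raw]

# Cross-reference: group normalized records by the matching key.
by_business = defaultdict(list)
for r in records:
    by_business[business_key(r["business_name"])].append(r)

for key, recs in by_business.items():
    print(key, [r["issued"].isoformat() for r in recs])
```

Real-world matching is much harder than a case-insensitive name key (note that the permit types “Roofing” and “ROOF” are left unreconciled here), which is part of why this kind of normalization is valuable when done at scale.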

DataCubes not only maintains existing data, but can also reveal trends and answer questions that were never asked. This speeds up the process, uncovers additional insights, and enables underwriters to ask fewer, more relevant questions.

Ultimately, for data to be valuable it needs to be searchable and relevant. If the last 10 years created more data than ever before, the next 10 years are likely to bring continued, dramatic improvements in how we interact with it.