Artificial Intelligence has been in the headlines for years. Whether it brings the end of humanity or the end of human suffering, a different world has long been the expected product of thinking machines. While we are still a long way from sentient, self-aware computers, several techniques that fall under the category of “Artificial Intelligence” have taken hold in the document management services industry.
For decades, companies have sought to reduce the labor required to organize and extract data from their records. Beginning with simple cross-reference indexing, which told users which page in a book or which frame on a roll of film held a document, we’ve tried to make indexing and data capture easier.
With simple Optical Character Recognition (OCR), companies could first use full-text search to augment indexed fields. For instance, a company might index a couple of fields, like client name and date of service, in order to organize and manage its files effectively, and then use OCR to find keywords in the files it is looking for.
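The combination of indexed fields and full-text search can be illustrated with a minimal sketch. It assumes the OCR text has already been produced by some engine; the `Document` class, the `search` function, and the sample records are hypothetical and only show the idea of narrowing by a keyed field and then matching a keyword.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Index fields keyed by a human operator (hypothetical field names).
    client_name: str
    date_of_service: date
    # Full text produced by an OCR engine (assumed to exist already).
    ocr_text: str


def search(documents, client_name, keyword):
    """Narrow by an indexed field, then match a keyword in the OCR text."""
    keyword = keyword.lower()
    return [
        doc for doc in documents
        if doc.client_name == client_name and keyword in doc.ocr_text.lower()
    ]


# Made-up sample records for illustration only.
documents = [
    Document("Acme Corp", date(2019, 3, 1), "Statement of work for roof repair"),
    Document("Acme Corp", date(2019, 4, 12), "Invoice for plumbing services"),
    Document("Birch LLC", date(2019, 4, 15), "Invoice for roof repair"),
]

matches = search(documents, "Acme Corp", "roof")
print([doc.date_of_service.isoformat() for doc in matches])
```

Only two fields had to be keyed by hand here. The keyword search over the OCR text does the rest of the narrowing.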
But simply augmenting search wasn’t enough. With the advent of more accurate OCR that could process large volumes of documents, software companies began to combine OCR with templates. A template identifies where information is likely to be on a page, so the software can attempt to extract field-level data automatically from structured and semi-structured documents like tax forms and checks.
This approach works very well for documents that are highly consistent and were designed with this kind of treatment in mind, like U.S. tax Form 1040. In contrast, documents without any structure, like legal agreements, and highly variable forms, like tax transcripts, are very hard to capture because the data moves around from one document to the next.
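The template idea can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular product works: OCR output is assumed to arrive as words with page-relative coordinates, and the `Word` class, the `TEMPLATE` zones, and the field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, as a fraction of page width (0.0 to 1.0)
    y: float  # top edge, as a fraction of page height (0.0 to 1.0)


# Hypothetical template: each field maps to a fixed zone (x0, y0, x1, y1).
TEMPLATE = {
    "name": (0.05, 0.10, 0.60, 0.15),
    "tax_year": (0.80, 0.02, 0.95, 0.06),
}


def extract_fields(words, template):
    """Return the text of the OCR words that fall inside each field's zone."""
    fields = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        inside.sort(key=lambda w: (w.y, w.x))  # approximate reading order
        fields[field] = " ".join(w.text for w in inside)
    return fields


# Made-up OCR output for a single page.
words = [
    Word("2019", 0.85, 0.03),
    Word("Jane", 0.06, 0.12),
    Word("Doe", 0.14, 0.12),
    Word("Wages", 0.06, 0.40),
]
result = extract_fields(words, TEMPLATE)
print(result)
```

Because the zones are fixed, a name printed only slightly lower on the page would fall outside its zone and come back empty. That is the weakness with variable documents described above.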
To deal with the slight variations that can occur in even the most structured forms, software developers took another step forward. They combined OCR with software-driven business rules that let operators describe the location of information on the page relative to other information that consistently appears there. For instance, you could tell the software, “Look in the top right quadrant of the page for the word ‘Invoice’. If you find it, look to the left and below it for a sequence of numbers. Capture that sequence of numbers and place it in the Invoice Number field.”
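The invoice rule quoted above might be expressed along these lines. This is a sketch only: it assumes OCR words carry page-relative coordinates, and it adds one tie-breaking assumption that is not in the rule itself, namely that the run of digits nearest the anchor wins. The `Word` class and `find_invoice_number` function are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position, 0.0 (left) to 1.0 (right)
    y: float  # vertical position, 0.0 (top) to 1.0 (bottom)


def find_invoice_number(words):
    """Anchor on 'Invoice' in the top right quadrant, then take the
    nearest run of digits that lies to the left of and below it."""
    anchors = [
        w for w in words
        if w.text.lower() == "invoice" and w.x >= 0.5 and w.y <= 0.5
    ]
    if not anchors:
        return None
    anchor = anchors[0]
    candidates = [
        w for w in words
        if re.fullmatch(r"\d+", w.text) and w.x < anchor.x and w.y > anchor.y
    ]
    if not candidates:
        return None
    # Assumption: when several digit runs qualify, prefer the closest one.
    nearest = min(
        candidates,
        key=lambda w: (w.x - anchor.x) ** 2 + (w.y - anchor.y) ** 2,
    )
    return nearest.text


# Two made-up layouts where the invoice number sits in different places.
layout_a = [
    Word("Invoice", 0.80, 0.05),
    Word("004512", 0.78, 0.09),
    Word("Total", 0.70, 0.80),
    Word("2019", 0.20, 0.90),
]
layout_b = [Word("INVOICE", 0.60, 0.20), Word("77310", 0.55, 0.24)]

print(find_invoice_number(layout_a))
print(find_invoice_number(layout_b))
```

The same rule handles both layouts because it locates the number relative to the anchor word, not at a fixed position on the page.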
This approach allows data to be captured automatically from a document that contains predictable elements, like an invoice, even when that data moves around on the page. It can be very effective on the right kinds of documents. However, both templates and the rules-based approach, called “Intelligent Data Capture”, require process owners to manage large collections of templates and rules whenever the documents come in a large number of variations. Better OCR alone could not solve the challenge of template and rules management. We needed a better way for software to predict where on a page information would be located, or what type of document it was looking at.
Enter the age of Artificial Intelligence. About five years ago, companies in the document processing services industry began working on new technologies to overcome the problems created by the last decade’s solutions. How do we get the computer to create and manage the rules itself, using these new technologies, so that we are not overwhelmed?
So first, what is AI in the context of document processing? We are talking about two related technologies that are being added to existing approaches to overcome the challenges of templates and rules: machine learning and deep learning.
Deep learning is an AI technique loosely modeled on the way the human brain processes data and recognizes patterns in order to make decisions. It is a part of machine learning that uses networks capable of learning from unstructured data without operator oversight. It is also known as deep neural learning, and its models are called deep neural networks.
Machine learning is the ability of a computer program to learn from new data and adapt to it without the aid of a human operator, so that its algorithms stay up to date with the newest inbound information.
What drives success in AI of the deep learning and machine learning kind is sheer volume. The more documents these systems have seen, the better they are at predicting what to do with the next one. In the world of document processing, this works out very nicely: most document processing companies have millions of documents that humans have already processed correctly, and these can be fed into the systems to build the base of knowledge that drives the algorithms to future success.
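To make the idea of learning from already-processed documents concrete, here is a deliberately small sketch. It uses a naive Bayes word-count classifier, a much simpler technique than the deep learning systems discussed here, chosen only because it fits in a few lines of standard-library Python. The training texts, labels, and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per document type from documents already labeled by people."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, model):
    """Return the most likely document type (naive Bayes, add-one smoothing)."""
    word_counts, doc_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Start from how common this document type is in the history.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            count = word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Stand-in for the millions of documents humans have already processed.
history = [
    ("invoice number total amount due", "invoice"),
    ("invoice date amount due remit payment", "invoice"),
    ("pay to the order of dollars memo", "check"),
    ("pay to the order of signature memo", "check"),
]
model = train(history)

print(classify("amount due on invoice", model))
print(classify("pay to the order of", model))
```

Nobody wrote a rule saying that “amount due” signals an invoice. The association comes entirely from the labeled history, so adding more correctly processed documents gives the counts more to draw on.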
With AI, we are seeing systems that can automatically detect where line-item grid information sits on a page and adjust capture accordingly. This is a very significant capability. It shows that the AI can adapt to changing document structures on the fly, without human intervention, and can capture data that is structured in novel ways without an operator pre-mapping documents or creating business rules logic to assist it. Once trained, the AI can determine where the data that needs to be captured is located and go after it.
It’s still too early to say for certain that AI will completely eliminate document classification and data extraction as jobs done by humans. But it’s clear that these technologies will greatly assist us in organizing and extracting data from huge sets of documents for less money, with higher accuracy, and faster than we humans could on our own.
In the future, we should expect an explosion in the number of documents that are identified and tracked as part of business processes. The data extracted from each document will increase in both volume and accuracy as OCR and AI continue to improve and mature. For clients with clean source documents, produced through document scanning or submitted as digital originals, a zero-touch process will become a reality in the next few years, if it has not already.
For those with documents that come from uncontrolled sources, like pictures taken with cell phones, the ability to process some documents without human intervention will continue to improve as well. In the next decade it will likely reduce the reliance on human operators by between 15% and 95%, depending on the complexity of the business process and the quality of the documents.
As we move towards our goal of a zero-touch world of document processing, we are learning how artificial intelligence can aid us, though not without risk. As with past solutions, the problems that AI solves will give rise to new problems, and solving those will require something we aren’t aware of yet. But ultimately, we will arrive at a place where everyone has the opportunity to do great work, because we’ve eliminated the manual, repetitive tasks of the past.