Artificial intelligence (AI) is the key technology that will change the way we live and do business in virtually every industry. Fuelled by spectacular success stories, such as smart machines defeating world champions in complex board games, the discussion about AI is often narrowed down to the most recent developments in machine learning (ML).
This leaves the impression that AI is all about data and that human experts will play an ever-decreasing role in the near future. Using the example of contract analytics, I will make a case for a somewhat different approach to AI, one that leverages available expert knowledge to guide and focus the machine’s capabilities to process data.
AI is the scientific discipline concerned with developing systems that can handle tasks for which a human would need to use their intelligence. Nowadays, AI is often erroneously equated with ML.[1] ML stands for a set of techniques with which a system can improve its performance on a given task without being explicitly programmed to do so. To this end, the system extracts certain types of patterns from training data and represents them in the form of a model. Such a model can be, for example, a decision tree or a neural network whose internal weights have been adjusted to the data at hand. One particularly successful ML paradigm is deep learning (DL) – an approach based on a particular type of neural network which requires huge amounts of training data. A series of spectacular success stories, ranging from winning a game show[2] to defeating the human champion of Go[3] (along with many more useful applications in speech recognition, natural language processing, etc.), has fundamentally changed the way people talk about AI.
According to mainstream public discussion of AI applications (triggered by advertising that clearly oversells the technology), all we need to do is gather really large data sets, feed them into a DL system and wait for the perfect solution to be generated. While this is indeed a feasible route in certain types of application scenario, there often exist significantly smarter and more efficient ways to generate results of at least similar quality. In the rest of this article I will make a case for a somewhat more ‘traditional’ approach to AI that leverages human expert knowledge and brings about a number of advantages.
To this end, let’s have a closer look at the training phase of a machine learning system. For the sake of brevity, we will restrict our considerations to supervised learning. In this class of ML approaches, the training data needs to be labelled. That is, somebody with subject matter expertise has to provide additional information on what a certain data record actually stands for. When training an image recognition system, for example, the system has to be told explicitly what is depicted in each training image. This label then stands for the meta-information that image #1 belongs to the class ‘cat’ – as opposed to ‘dog’ or ‘bird’, which represent the contents of other images. While this is an easy task for a human, things become much more complex when we want to, for example, extract certain pieces of information from a contract written in English – such as the start and end dates, the names of the parties involved, penalties for violation of certain conditions, etc. Marking this information in thousands of sample contracts is extremely time-consuming and error-prone – even if we neglect the fact that collecting such huge numbers of documents might be infeasible in many situations.[4]
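To make the difference in labelling effort concrete, here is a minimal sketch of what one labelled training record might look like in each case. The field names, the `Annotation` class, the `annotate` helper and the sample sentence are all assumptions made up for illustration; they do not come from any particular annotation tool.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One labelled text span (hypothetical format): character offsets plus a label."""
    start: int
    end: int
    label: str


def annotate(text, snippet, label):
    """Label the first occurrence of `snippet` in `text` (raises ValueError if absent)."""
    start = text.index(snippet)
    return Annotation(start, start + len(snippet), label)


# Image classification: a single label describes the whole record.
image_example = {"file": "image_0001.jpg", "label": "cat"}

# Contract analytics: every item of interest needs its own labelled span.
contract_text = (
    "This Agreement between Alpha Ltd and Beta GmbH shall commence on "
    "1 April 2019 and shall expire on 31 March 2021."
)
contract_example = {
    "text": contract_text,
    "annotations": [
        annotate(contract_text, "Alpha Ltd", "party"),
        annotate(contract_text, "Beta GmbH", "party"),
        annotate(contract_text, "1 April 2019", "start_date"),
        annotate(contract_text, "31 March 2021", "end_date"),
    ],
}
```

Even this one-sentence example needs four separate span labels, whereas the image needs one. Across a real contract of many pages, that difference is what makes manual labelling so laborious.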
In such cases, it would be great if we could use additional ‘background’ information in the form of expert knowledge to facilitate the task at hand. The hope is that this knowledge adds structure and semantics to the data, thus significantly reducing the amount of training data required and easing the process of generating the information extraction model. The good news is that, in many cases, this can be achieved by, for example, constructing a knowledge graph (an approach pursued by Google, among others)[5] or an ontology (a network or hierarchy of the semantic concepts of a domain and their interrelations).[6] Such construction processes can be supported by computer systems, but typically require a final editing step by a human.
Another approach consists of eliciting the required knowledge directly from a human expert. This method obviously reaches its limits in overly complex applications where huge numbers of rules would be required to adequately represent the expert’s knowledge (which turned out to be the main reason for the failure of expert systems in the 1980s). For moderately complex application scenarios, however, doing so can be extremely valuable. Let’s get back to the example of information extraction from English-language contracts.
Such a contract typically has a strictly defined structure. It can be decomposed into a number of clauses such as the termination clause, the indemnification clause, etc. Each clause comprises a relatively small set of information items that could be of interest for contract analytics. With this in mind, it is relatively easy for a legal expert to come up with a complete list of contract clauses and enumerate the information snippets they contain. How does this help in our attempt to train an information extraction system for contracts using a non-deep-learning approach?
First of all, the system has to be told where to find the various clauses in a set of sample contracts. This can easily be done by marking the respective portions of text and labelling them with the names of the clauses they contain. On this basis we can train a classifier model that – when reading through a previously unseen contract – recognises what type of contract clause can be found in a certain text section. With a ‘conventional’ (i.e. not DL-based) algorithm, a small number of examples should be sufficient to generate an accurate classification model that is able to partition the complete contract text into the various clauses it contains.
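As an illustration of such a ‘conventional’ classifier, the following sketch trains a small multinomial Naive Bayes model on a handful of labelled text sections. Naive Bayes is only one possible choice and is an assumption here, as are the `ClauseClassifier` class, the clause labels and the sample sentences; a real system would use more examples and a more careful tokeniser.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class ClauseClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training sections
        self.vocab = set()

    def train(self, labelled_sections):
        for text, label in labelled_sections:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: text sections marked and labelled by an expert.
TRAINING = [
    ("Either party may terminate this Agreement upon thirty days written notice.", "termination"),
    ("This Agreement may be terminated immediately in the event of a material breach.", "termination"),
    ("The Supplier shall indemnify and hold harmless the Customer against all claims and losses.", "indemnification"),
    ("Each party shall indemnify the other for damages arising from third party claims.", "indemnification"),
    ("This Agreement commences on the Effective Date and continues for a term of two years.", "term"),
    ("The initial term shall begin on 1 January 2020 and end on 31 December 2021.", "term"),
]

classifier = ClauseClassifier()
classifier.train(TRAINING)

# Classify a previously unseen section.
print(classifier.predict("The Customer may terminate the contract by giving written notice."))
```

With only two examples per clause type the model already separates these toy sentences, because each clause type has a small, distinctive vocabulary. Real contracts vary more, so the number of examples needed in practice would have to be determined empirically.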
Once a clause is identified within a certain contract of the training data, a human can identify and label the interesting information items contained within it. Since the text portion of a single clause is relatively small, only a few examples are required to come up with an extraction model for the items in one particular type of clause. Depending on the linguistic complexity and variability of the formulations used, this model can be generated using ML, by writing extraction rules that make use of keywords, or – in exceptionally complicated situations – by applying natural language processing algorithms that dig deep into the syntactic structure of each sentence. In any case, the resulting model can be expected to be fairly precise and robust to variations in the respective wording. This is mainly due to the fact that the search space only consists of the text of one clause, not a few hundred pages of a complex contract. And it is easy to see that identifying one particular date (e.g. the starting date of the contract) in a short text is much easier than sorting out which of the dozens of dates listed in the complete text is actually the one we are looking for.
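The effect of restricting the search space can be sketched with a simple keyword-based extraction rule. The contract text, the clause names, the date format and the rule itself are assumptions chosen for illustration; the sketch presumes that the contract has already been partitioned into clauses.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Dates written as e.g. "1 April 2019" (an assumed format).
DATE = r"\d{1,2} (?:" + MONTHS + r") \d{4}"

# Keyword rule: a date that follows "commence(s) on", "begin(s) on" or "start(s) on".
START_RULE = re.compile(r"(?:commence|begin|start)\w*\s+on\s+(" + DATE + ")", re.IGNORECASE)

# Hypothetical contract, already split into labelled clauses.
CONTRACT = {
    "preamble": "This Agreement is made on 3 March 2019 between Alpha Ltd and Beta GmbH.",
    "term": "This Agreement shall commence on 1 April 2019 and shall expire on 31 March 2021.",
    "payment": "The first invoice is due on 15 May 2019.",
}


def all_dates(text):
    """Return every date mentioned in the text, in order of appearance."""
    return re.findall(DATE, text)


def start_date(term_clause_text):
    """Apply the keyword rule to the text of the term clause only."""
    match = START_RULE.search(term_clause_text)
    return match.group(1) if match else None


full_text = " ".join(CONTRACT.values())
print(len(all_dates(full_text)))         # candidate dates in the whole contract
print(len(all_dates(CONTRACT["term"])))  # candidate dates in the term clause
print(start_date(CONTRACT["term"]))      # the date picked by the keyword rule
```

In this toy contract the number of candidate dates drops from four to two once the search is limited to the term clause, and a single keyword rule is enough to pick the right one. In a contract running to hundreds of pages the reduction would be correspondingly larger.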
The useful expert
To summarise, expert knowledge about the clause structure of contracts can be used to accurately identify the position of a certain clause and then extract the relevant information items within this clause. The expert knowledge helps to significantly reduce the amount of training data – and thus the human effort required to label it – and facilitates the actual information extraction by restricting the search space to the relatively short text of the current clause.
What does this tell us about the use of artificial intelligence in general and machine learning in particular? First of all, there is no standard algorithm, or even paradigm, that serves all purposes equally well. In application scenarios where huge amounts of data are easily available and can be labelled without significant effort from human experts, a purely data-driven approach might make perfect sense. Examples include image collections such as YFCC100M, a collection of almost 100 million images.[7] The majority of these images have been labelled by their respective photographers. This means the effort of providing additional information about the training data could be distributed among a similarly huge number of human experts. In the Go example mentioned above, the system trained itself by simply playing games against itself. Labelling could then be fully automated – the system simply marked the data representing each match as won or lost.
When neither crowd-sourcing nor automatic labelling is feasible, making use of expert knowledge as described above can be a good alternative. Doing so not only significantly reduces the amount of training data required and the effort of labelling it, but typically also helps create much more transparent – and thus comprehensible – models that can be easily maintained and repaired if their performance degrades, for instance because of changes in the domain. Deep learning models, on the other hand, currently must be considered black boxes whose internal behaviour cannot be easily explained – making them explainable is the goal of ongoing research.
The ultimate lesson learned from these considerations is the reassuring observation that humans will continue to play a crucial role for the foreseeable future. AI systems still need expert knowledge to make sense of data collections. Thus, combining the respective strengths of both humans and AI systems will form the basis for many successful applications that currently exceed human capabilities alone.
Footnotes:
1. In fact, ML is just one of many disciplines forming the field of AI. Other examples include robotics, automated theorem proving, and natural language processing. Among these fields, however, ML assumes a special position as it has since become part of the standard repertoire of methods in virtually all approaches to AI.
3. https://www.nytimes.com/2017/05/23/business/google-deepmind-alphago-go-champion-defeat.html
4. Unsupervised learning, on the other hand, deals with unlabelled data. The model to be generated by the ML system then serves the purpose to identify clusters of similar data records which might stand
5. https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html
6. https://www.ontotext.com/knowledgehub/fundamentals/what-are-ontologies/