Auto-correct for information extraction models

In the previous blog post, we explained why the quality of annotations is of utmost importance. In this blog post, we will show you a new and innovative way to improve model quality: by using auto-correct for information extraction models.

HOW IT WORKS

Suggested Review Tasks are tasks that are automatically created after you have trained a model. These tasks contain documents with annotations that the Metamaze A.I. believes are wrong and need to be verified. This innovative auto-correct feature is an immense time saver to make your model perform more accurately with less human training time needed.

WHY IT WORKS

After a decent amount of data is annotated, we know we can train a good, accurate model. That model's predictions should be as good as, or better than, a human's, which means we can use the model to verify the human annotations.

After doing this suggested QA task, you will have successfully corrected up to 10% of your documents by automatically detecting misannotations.

The end result?
Accurate training data with only 15% of the normal validation effort!

TECHNICAL DETAILS

After training a model, Metamaze will automatically calculate for which documents the model predictions and the annotations disagree.

For every document, we calculate a sort of “annotation confidence” on how well the annotation and the predictions agree, and when this annotation confidence is too low, we add the document to the Suggested Review Task.

A disagreement, i.e. a low annotation confidence, always has to be validated by a human to resolve the “conflict”.
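Metamaze does not publish the exact formula behind the annotation confidence, but the idea can be sketched as an F1-style overlap between the human annotation spans and the model's predicted spans. Everything below (the span representation, the 0.9 review threshold) is an illustrative assumption:

```python
def annotation_confidence(annotations, predictions):
    """Toy annotation-confidence score: the F1 overlap between the human
    annotations and the model predictions for one document. Spans are
    (label, start, end) tuples. Illustrative only; Metamaze's actual
    metric is not published."""
    annotations, predictions = set(annotations), set(predictions)
    if not annotations and not predictions:
        return 1.0  # nothing annotated, nothing predicted: full agreement
    overlap = len(annotations & predictions)
    precision = overlap / len(predictions) if predictions else 0.0
    recall = overlap / len(annotations) if annotations else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# One matching span, one span with a boundary disagreement.
doc_annotations = {("invoice_number", 10, 18), ("total", 42, 48)}
doc_predictions = {("invoice_number", 10, 18), ("total", 40, 48)}
conf = annotation_confidence(doc_annotations, doc_predictions)
needs_review = conf < 0.9  # threshold is an assumption
```

A document where annotation and prediction fully agree scores 1.0 and is skipped; any disagreement pushes the score below the threshold and routes the document into the Suggested Review Task.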

So in total, three cases can occur:


1. Annotation and model prediction agree (~85% of documents; annotation confidence = 100%). These documents are not part of the Suggested Review task: they are very likely correctly annotated and correctly predicted by the model, so no action is needed and you can safely ignore them.
2. Annotation and model prediction disagree (annotation confidence < 100%). These documents are part of the Suggested Review task, and a human needs to verify which of the following two cases applies:

|                       | Model is correct | Model is wrong |
|-----------------------|------------------|----------------|
| Annotation is correct | 85% of documents. Annotation confidence = 100%. No action needed. | 5% of documents. Annotation confidence < 100%. Action: confirming that the model is wrong and the annotation is correct will increase the document's weight in the next training to make sure it gets modelled accurately. |
| Annotation is wrong   | 10% of documents. Annotation confidence < 100%. Action: correcting the annotation benefits overall model quality. | Impossible to detect automatically. |
A) Annotation is wrong and model is correct (~10% of documents). In this case, the annotator missed an occurrence or accidentally added extra characters. The annotation should be corrected.

B) Annotation is correct and model is wrong (~5% of documents). In this case, an annotator should confirm again that this is indeed correct. This document will receive a higher weight in the next training session.
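The two review outcomes above can be sketched as a small triage routine. The weight value 2.0 is an illustrative assumption, not Metamaze's actual setting:

```python
def apply_review_outcome(annotation_correct):
    """Map a reviewer's verdict on a flagged document to an action.
    Sketch only; the training weight of 2.0 is an assumption."""
    if annotation_correct:
        # Case B: the model was wrong. Keep the annotation and upweight
        # the document so the next training run models it more accurately.
        return {"action": "keep_annotation", "training_weight": 2.0}
    # Case A: the annotation was wrong. Correcting it directly improves
    # the training data, so a normal weight suffices.
    return {"action": "fix_annotation", "training_weight": 1.0}

outcome = apply_review_outcome(annotation_correct=True)
```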

CROSS-VALIDATION

To avoid overfitting on the training data, the model predictions are taken from the validation folds of a 10-fold cross-validation training regime: every document is scored by a model that never saw it during training.
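A generic sketch of this out-of-fold regime, using only NumPy and a toy nearest-centroid classifier as a stand-in for the real extraction model (all names and the fold count handling are assumptions for illustration):

```python
import numpy as np

def out_of_fold_predictions(X, y, train_and_predict, k=10, seed=0):
    """Return a prediction for every sample, each made by a model trained
    on the other k-1 folds, so agreement with y is never inflated by the
    model having memorised its own training data."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    folds = np.array_split(order, k)
    preds = np.empty(len(X), dtype=y.dtype)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        preds[val_idx] = train_and_predict(X[train_idx], y[train_idx], X[val_idx])
    return preds

def nearest_centroid(X_train, y_train, X_val):
    """Toy stand-in model: predict the class whose centroid is closest."""
    centroids = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}
    classes = np.array(sorted(centroids))
    stacked = np.stack([centroids[c] for c in classes])
    dists = np.linalg.norm(X_val[:, None, :] - stacked[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Two well-separated clusters standing in for annotated documents.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)
oof = out_of_fold_predictions(X, y, nearest_centroid)
disagreements = int((oof != y).sum())  # candidates for the review task
```

Documents where the out-of-fold prediction disagrees with the annotation are exactly the ones that end up in the Suggested Review Task.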

Since we have already calculated an annotation confidence for every document, we can use that information to reweigh those documents and benefit from it even before the annotations are corrected! Interested in how that works? The technical details are explained well in CrossWeigh: Training Named Entity Tagger from Imperfect Annotations by Wang et al. (2019).
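The core of the CrossWeigh reweighing scheme from Wang et al. (2019) is simple: every time a held-out model disagrees with a sample's annotation, that sample's training weight is scaled down by a factor epsilon. The epsilon value below is an illustrative choice, not a recommendation from the paper or from Metamaze:

```python
def crossweigh_weight(n_disagreements, epsilon=0.7):
    """CrossWeigh-style training weight: scale the weight down by
    epsilon for each cross-validation round in which a held-out model
    disagreed with the annotation (epsilon value is an assumption)."""
    return epsilon ** n_disagreements

# A document that several folds disagree with gets a much smaller weight,
# so a likely misannotation pollutes the next training run far less.
weights = [crossweigh_weight(c) for c in [0, 1, 3]]
```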

Jos Polfliet
CTO Metamaze
For the past 8 years, Jos has contributed to 100+ Artificial Intelligence projects in various industries, countries and use cases. For the past 4 years, Jos has created and led multiple Artificial Intelligence implementation teams including the teams at Faktion, Chatlayer and now Metamaze. His mission is to inspire companies and individuals to benefit greatly from the use of A.I. and M.L. by doing useful things.
