In the previous blog post, we explained why the quality of annotations is of utmost importance. In this blog post, we will show you a new and innovative way to improve model quality: by using auto-correct for information extraction models.
HOW IT WORKS
Suggested Review Tasks are tasks that are automatically created after you have trained a model. These tasks contain documents with annotations that the Metamaze A.I. believes are wrong and need to be verified. This innovative auto-correct feature saves an immense amount of time: your model becomes more accurate with less human annotation effort.
WHY IT WORKS
After a decent amount of data is annotated, we know we can train a good, accurate model. That model makes predictions that should be as good as, or better than, a human's, which means we can use the model to verify the human annotations.
After completing this Suggested Review task, you will have successfully corrected up to 10% of your documents by automatically detecting misannotations.
The end result?
Accurate training data with only 15% of the normal validation effort!
After training a model, Metamaze will automatically calculate for which documents the model predictions and the annotations disagree.
For every document, we calculate a sort of “annotation confidence” on how well the annotation and the predictions agree, and when this annotation confidence is too low, we add the document to the Suggested Review Task.
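One way to compute such an annotation confidence is as span-level agreement between the human annotations and the model predictions. The sketch below is illustrative only: the actual Metamaze formula is not public, and the F1-style score, the `(start, end, label)` span representation, and the threshold are our assumptions.

```python
# Sketch of an "annotation confidence" score for one document, assuming
# annotations and predictions are sets of (start, end, label) spans.
# The F1-style agreement used here is one plausible choice, not
# necessarily the formula Metamaze uses.

def annotation_confidence(annotations, predictions):
    """Return agreement between human annotations and model predictions, in [0, 1]."""
    ann, pred = set(annotations), set(predictions)
    if not ann and not pred:
        return 1.0  # nothing annotated, nothing predicted: full agreement
    overlap = len(ann & pred)
    # F1 of the two span sets: 1.0 means perfect agreement
    return 2 * overlap / (len(ann) + len(pred))

doc_annotations = [(0, 10, "invoice_number"), (20, 30, "total_amount")]
doc_predictions = [(0, 10, "invoice_number"), (40, 50, "due_date")]

conf = annotation_confidence(doc_annotations, doc_predictions)
if conf < 1.0:
    print(f"confidence {conf:.2f}: add document to Suggested Review Task")
```

In this example only one of the two annotated spans is also predicted, so the confidence drops to 0.5 and the document is flagged for review.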
A disagreement, i.e. a low annotation confidence, always has to be validated by a human to resolve the conflict.
So in total, three cases can occur:

1. Annotation and model prediction agree (~85% of documents, annotation confidence = 100%). These documents are not part of the Suggested Review task: they are very likely correctly annotated and correctly predicted by the model, so no action is needed and you can safely ignore them.
2. Annotation and model prediction disagree (annotation confidence < 100%). These documents are part of the Suggested Review task, and a human needs to verify which of the following two cases is occurring:
| | Model is correct | Model is wrong |
| --- | --- | --- |
| **Annotation is correct** | ~85% of documents. Annotation confidence = 100%. No action needed. | ~5% of documents. Annotation confidence < 100%. Action: confirming that the model is wrong and the annotation is correct increases the document's weight in the next training to make sure it gets modelled accurately. |
| **Annotation is wrong** | ~10% of documents. Annotation confidence < 100%. Action: correcting the annotation benefits overall model quality. | Impossible to detect automatically. |
A) Annotation is wrong and model is correct (~10% of documents). The annotator corrects the annotation, which directly improves the quality of the training data.

B) Annotation is correct and model is wrong (~5% of documents). In this case, an annotator should confirm that the annotation is indeed correct. The document will then receive a higher weight in the next training session.
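The outcomes above can be summarized as a small decision function. This is an illustrative sketch of the review flow, not Metamaze's actual implementation:

```python
def review_action(annotation_correct: bool, model_correct: bool) -> str:
    """Map a reviewed document to the action described above."""
    if annotation_correct and model_correct:
        # ~85% of documents: full agreement, never reaches the review task
        return "no action"
    if not annotation_correct:
        # ~10% of documents: fix the annotation to improve the training data
        return "correct annotation"
    # ~5% of documents: annotation confirmed, model wrong -> upweight next training
    return "confirm and upweight"
```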
To avoid overfitting on the training data, the model predictions are taken from the validation folds of a 10-fold cross-validation training regime, so every document is scored by a model that never saw it during training.
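Out-of-fold prediction can be sketched as follows. This is a minimal illustration with a simple modulo fold split; `train_fn` and `predict_fn` are hypothetical placeholders for the actual training and inference routines:

```python
def out_of_fold_predictions(docs, labels, train_fn, predict_fn, n_folds=10):
    """Return one prediction per document, each made by a model that was
    trained only on the other folds."""
    preds = [None] * len(docs)
    for fold in range(n_folds):
        # Simple deterministic split; real pipelines typically shuffle first.
        val_idx = [i for i in range(len(docs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(docs)) if i % n_folds != fold]
        model = train_fn([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        for i in val_idx:
            preds[i] = predict_fn(model, docs[i])
    return preds
```

Comparing these held-out predictions against the human annotations is what yields the per-document annotation confidence without rewarding a model for memorizing its own training labels.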
Since we have already calculated an annotation confidence for every document, we can use that information to reweigh those documents and benefit from it even before the annotations are corrected. Interested in how that works? The technical details are explained well in CrossWeigh: Training Named Entity Tagger from Imperfect Annotations by Wang et al. (2019).
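The core idea of CrossWeigh is to downweight potentially mislabeled examples exponentially in the number of times the out-of-fold prediction disagreed with the annotation. A sketch, with a weakening factor ε around 0.7 as suggested in the paper:

```python
def crossweigh_weight(times_mispredicted: int, epsilon: float = 0.7) -> float:
    """Training weight for a document, in the spirit of CrossWeigh
    (Wang et al., 2019): weight = epsilon ** t, where t counts how often
    the out-of-fold prediction disagreed with the annotation."""
    return epsilon ** times_mispredicted

# A document the model always agreed with keeps full weight 1.0;
# one mispredicted in 3 folds drops to 0.7 ** 3 = 0.343.
```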