In the previous blog post, we explained why the quality of annotations is of utmost importance. In this blog post, we will show you a new and innovative way to improve model quality: by using auto-correct for information extraction models.
Suggested Review Tasks are created automatically after you train a model. They contain documents with annotations that the Metamaze A.I. believes are wrong and need to be verified. This innovative auto-correct feature is an immense time-saver: your model becomes more accurate with less human annotation effort.
Once a decent amount of data has been annotated, we can train a good, accurate model. That model makes predictions that should be as good as, or better than, a human's, which means we can use the model to verify the human annotations.
After completing this Suggested Review Task, you will have corrected up to 10% of your documents by automatically detecting misannotations.
The end result?
Accurate training data with only 15% of the normal validation effort!
After training a model, Metamaze will automatically calculate for which documents the model predictions and the annotations disagree.
For every document, we calculate an “annotation confidence” score that measures how well the annotations and the model predictions agree. When this score drops below a threshold, we add the document to a Suggested Review Task.
Any disagreement, i.e. low annotation confidence, has to be validated by a human to resolve the “conflict”.
| | Model is correct | Model is wrong |
| --- | --- | --- |
| **Annotation is correct** | 85% of documents. Annotation confidence = 100%. No action needed. | 5% of documents. Annotation confidence < 100%. Action: confirming that the model is wrong and the annotation is correct increases the document’s weight in the next training run, to make sure it gets modelled accurately. |
| **Annotation is wrong** | 10% of documents. Annotation confidence < 100%. Action: correcting the annotation benefits overall model quality. | Impossible to detect automatically. |
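To make the idea concrete, here is a minimal sketch of how an annotation confidence could be computed by comparing annotated spans against predicted spans. The span representation, matching logic, and the 0.8 threshold are illustrative assumptions, not Metamaze's actual implementation.

```python
# Hypothetical sketch: score agreement between human annotations and
# model predictions for one document. Spans are (start, end, label) tuples.

def annotation_confidence(annotations, predictions):
    """Return the fraction of spans on which annotator and model agree."""
    if not annotations and not predictions:
        return 1.0  # nothing to disagree about
    agreed = len(annotations & predictions)   # spans both sides produced
    total = len(annotations | predictions)    # all distinct spans
    return agreed / total

# Toy example: one span agrees, two conflict.
doc_annotations = {(0, 10, "INVOICE_NUMBER"), (25, 35, "DATE")}
doc_predictions = {(0, 10, "INVOICE_NUMBER"), (40, 48, "TOTAL")}

conf = annotation_confidence(doc_annotations, doc_predictions)

# Documents below an (assumed) threshold land in the Suggested Review Task.
THRESHOLD = 0.8
needs_review = conf < THRESHOLD
```

A real system would use fuzzier matching (partial span overlap, per-entity weighting), but the principle is the same: low agreement flags a document for human review.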
To avoid overfitting on the training data, the model predictions are taken from the validation folds of a 10-fold cross-validation training regime.
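In other words, every document is scored by a model that never saw it during training. A small sketch of that idea, using scikit-learn's `cross_val_predict` as a stand-in (the features, labels, and classifier here are toy placeholders, not Metamaze's extraction model):

```python
# Out-of-fold predictions via 10-fold cross-validation: each document's
# prediction comes from the fold in which it was held out, so the score
# is not inflated by training-set memorisation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # toy document features
y = (X[:, 0] > 0).astype(int)    # toy labels

oof_preds = cross_val_predict(LogisticRegression(), X, y, cv=10)

# Documents where out-of-fold prediction and label disagree are the
# candidates for a Suggested Review Task.
disagreements = np.flatnonzero(oof_preds != y)
```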
Since we have already calculated an annotation confidence for every document, we can use that information to reweight those documents and benefit from it even before the annotations are corrected! Interested in how that works? The technical details are well explained in CrossWeigh: Training Named Entity Tagger from Imperfect Annotations by Wang et al. (2019).
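The reweighting idea can be sketched in a few lines. CrossWeigh down-weights sentences by a factor ε for each cross-validation fold that disagrees with the annotation; the continuous variant below, including the ε = 0.7 default, is our own simplification for illustration, not the paper's or Metamaze's exact formula.

```python
# CrossWeigh-style reweighting sketch: documents whose annotations
# disagree with out-of-fold predictions get a lower training weight,
# since they are more likely to be mislabelled.

def training_weight(annotation_confidence, epsilon=0.7):
    """Map an annotation confidence in [0, 1] to a training weight.

    Full agreement keeps weight 1.0; full disagreement shrinks the
    weight to `epsilon` (assumed value, following the paper's spirit).
    """
    disagreement = 1.0 - annotation_confidence
    return epsilon ** disagreement

# The suspicious documents contribute less to the next training run.
weights = [training_weight(c) for c in (1.0, 0.9, 0.5, 0.0)]
```

These weights would then be passed to the trainer (for example via a per-sample weight in the loss), so suspect documents still contribute, just less.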
CONTACT US
Curious how Metamaze works and what it can mean for your enterprise?
Metamaze is a no-code Intelligent Document Processing platform that uses AI to automate every document and email workflow.