We used invoices for the benchmark experiment. An invoice model was trained on all our invoice data except one dataset. This enabled us to replicate how a pretrained invoice model works.
Each provider extracts different entities, which is why we have different metrics for Metamaze models. When we compare with Google Invoice model, we only included the entities that both the Metamaze model and the Google model extract, and discarded the others. Same for the other providers: we always only take into account the entities that we have in common with the provider. List of entities per provider is included in the detailed report (see below).