Benchmarking jobs call each selected service to collect translations (usage fees apply) for the text samples in the selected dataset(s). While doing so, they also log each service's response latency and any time-outs caused by API failures.
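As a rough illustration, the sketch below times a single translation request and records a time-out instead of failing the whole job. The endpoint URL, authentication header, and JSON payload shape are hypothetical placeholders, not aiXplain's actual API.

```python
import time
import requests

def call_service(endpoint: str, api_key: str, text: str, timeout_s: float = 30.0) -> dict:
    """Send one sample to one translation service; record latency and time-outs."""
    start = time.perf_counter()
    try:
        resp = requests.post(
            endpoint,  # hypothetical endpoint URL
            headers={"Authorization": f"Bearer {api_key}"},
            json={"text": text},  # hypothetical payload shape
            timeout=timeout_s,
        )
        resp.raise_for_status()
        return {
            "translation": resp.json().get("translation"),  # hypothetical field name
            "latency_s": time.perf_counter() - start,
            "timed_out": False,
        }
    except requests.Timeout:
        # A time-out caused by an API failure is logged, not fatal to the job.
        return {"translation": None, "latency_s": timeout_s, "timed_out": True}
```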
Well-established benchmarking metrics such as BLEU and COMET are calculated on aiXplain's infrastructure to enable a performance comparison of the services (this usage is covered by your aiXplain Standard Membership).
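For reference, a string-based metric like BLEU can be computed with the open-source sacrebleu package, as in the minimal sketch below with made-up sentences (COMET is model-based and is typically computed with the separate unbabel-comet package); the source does not specify aiXplain's internal implementation.

```python
import sacrebleu  # pip install sacrebleu

# One candidate translation per sample, plus a parallel stream of references.
hypotheses = ["The cat sat on the mat.", "It is raining hard today."]
references = [["The cat is sitting on the mat.", "It rains heavily today."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}, chrF = {chrf.score:.2f}")
```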
These metrics are then aggregated and reported to you. Optionally, you can allow us to share those benchmarks publicly with users shopping for services on the aiXplain platform.
Since the metrics are calculated per dataset, and datasets can cover different domains, they also reveal the best- and worst-performing domains in each language pair for your own service or any service of your choice. You can leverage this information to obtain more labelled training data in those domains and improve your service specifically where it is weakest.
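To illustrate the per-domain analysis, the short pandas sketch below uses made-up scores and hypothetical column names, and picks the best- and worst-scoring domain for each service and language pair.

```python
import pandas as pd

# Hypothetical benchmark results: one row per (service, language pair, domain).
results = pd.DataFrame([
    {"service": "svc_a", "lang_pair": "en-de", "domain": "legal",   "bleu": 31.2},
    {"service": "svc_a", "lang_pair": "en-de", "domain": "medical", "bleu": 24.7},
    {"service": "svc_a", "lang_pair": "en-fr", "domain": "legal",   "bleu": 38.9},
    {"service": "svc_a", "lang_pair": "en-fr", "domain": "medical", "bleu": 29.1},
])

# Best- and worst-scoring domain for each service and language pair,
# pointing to the domains where extra labelled training data would help most.
by_pair = results.groupby(["service", "lang_pair"])["bleu"]
best = results.loc[by_pair.idxmax(), ["service", "lang_pair", "domain", "bleu"]]
worst = results.loc[by_pair.idxmin(), ["service", "lang_pair", "domain", "bleu"]]
print("Best domains:\n", best, "\nWorst domains:\n", worst, sep="")
```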