How To Benchmark

Modified on Tue, 19 Sep 2023 at 07:43 AM

This article walks you through creating and running a Benchmarking job on aiXplain's platform so that you can compare model performance effectively.

Benchmarking lets you evaluate and contrast the performance of different models on a given task or dataset. It can help you select the best model for your needs, or identify areas for improvement in your existing models.

To create and run a Benchmarking job, you need to follow these key steps:

Testing Dataset:

Upload a testing dataset for evaluation. A testing dataset is a collection of data points that are used to measure how well a model can perform on a specific task. For example, if you want to benchmark models that can generate summaries from text, you need to upload a dataset that contains pairs of text and summaries. You can upload your own dataset, or choose from the existing datasets on aiXplain's platform. To learn how to upload your own dataset, please refer to the following link: LINK TO ASSET ONBOARDING ARTICLE
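As an illustration of the text-and-summary example above, here is a minimal sketch that prepares such pairs as CSV before uploading. The column names `text` and `summary` and the helper `build_dataset_csv` are assumptions made for this example, not a schema required by aiXplain's platform.

```python
import csv
import io

# Hypothetical column names; use whatever schema your dataset actually requires.
FIELDNAMES = ["text", "summary"]


def build_dataset_csv(pairs):
    """Serialize (text, summary) pairs to a CSV string, rejecting empty fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for index, (text, summary) in enumerate(pairs):
        if not text.strip() or not summary.strip():
            raise ValueError(f"row {index} has an empty text or summary")
        writer.writerow({"text": text, "summary": summary})
    return buffer.getvalue()


pairs = [
    ("The city council approved the new budget after a long debate.",
     "Council approves budget."),
    ("Heavy rain caused flooding, and several roads were closed.",
     "Flooding closes roads."),
]
dataset_csv = build_dataset_csv(pairs)
```

Checking for empty fields before upload is a design choice in this sketch: a data point without a reference summary cannot be scored by reference-based metrics.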

Model Selection: 

Add models to benchmark with similar parameters. You can select models from the asset drawer, where you can find over 36,500 ready-to-use AI assets. You can also use your own custom models that you have created or uploaded on aiXplain's platform. 

To learn how to create or upload your own models, please refer to the asset onboarding article https://INCLUDELINKHERE

You need to make sure that the models you select have similar input and output formats, and are compatible with the task and the dataset you have chosen.
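The compatibility requirement can be pictured as a simple pre-check. The sketch below is only an illustration: `ModelSpec`, its fields, and the model names are hypothetical and do not come from aiXplain's platform or SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of a model's task and data formats."""
    name: str
    task: str
    input_type: str
    output_type: str


def incompatible_models(models, task, input_type, output_type):
    """Return the names of models that do not match the benchmark's task and formats."""
    expected = (task, input_type, output_type)
    return [
        m.name
        for m in models
        if (m.task, m.input_type, m.output_type) != expected
    ]


candidates = [
    ModelSpec("summarizer-a", "summarization", "text", "text"),
    ModelSpec("summarizer-b", "summarization", "text", "text"),
    ModelSpec("speech-model", "speech-recognition", "audio", "text"),
]
rejected = incompatible_models(candidates, "summarization", "text", "text")
```

Here `rejected` contains only `speech-model`, because it expects audio input and targets a different task than the summarization dataset.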

Evaluation Metrics:

Specify metrics to measure model performance. Metrics are numerical values that indicate how well a model can perform on a specific task or dataset. For example, if you want to benchmark models that can generate summaries from text, you might use metrics such as ROUGE, BLEU, or METEOR, which compare the similarity between the generated summaries and the reference summaries. 
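To give a feel for what such a metric measures, here is a simplified sketch of ROUGE-1, which scores the unigram overlap between a generated summary and a reference summary. It assumes lowercased whitespace tokenization with no stemming, so its values may differ from those produced by the platform or by standard ROUGE packages.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap as precision, recall, and F1."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
```

In this example five of the six words in each summary overlap, so precision, recall, and F1 all come out to 5/6.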


Specify the number of segments in the dataset. Segments are subsets of the dataset that split the benchmarking process into smaller batches, which can reduce computation time and memory usage and give you more granular results. You choose how many segments to benchmark. Keep in mind that the number of segments affects the cost of the benchmark job: the more segments you benchmark, the more resources the job needs, and the higher the cost. Try to balance the number of segments against your budget and the level of detail you want.
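The trade-off between segments and cost can be sketched as follows. This is an illustration only: the linear cost model, the `unit_cost` value, and both helper functions are assumptions, and actual pricing on aiXplain's platform may work differently.

```python
def split_into_segments(data_points, segment_size):
    """Split a list of data points into consecutive segments of at most segment_size items."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        data_points[start:start + segment_size]
        for start in range(0, len(data_points), segment_size)
    ]


def estimate_cost(num_segments, num_models, unit_cost):
    """Assumed linear model: every model is run once on every selected segment."""
    return num_segments * num_models * unit_cost


data_points = list(range(10))            # stand-in for ten dataset rows
segments = split_into_segments(data_points, 4)
selected = segments[:2]                  # benchmark only the first two segments
cost = estimate_cost(len(selected), num_models=3, unit_cost=0.5)
```

Under these assumptions, adding a segment raises the estimate by `num_models * unit_cost`, which is why fewer segments keep the job cheaper at the expense of detail.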

Benchmark Report:

Run the report to track progress and insights. Once you have configured all the parameters for your benchmarking job, you can run the report to start the evaluation process. You can monitor the progress of the benchmarking job on aiXplain's platform, and see how each model performs on each segment and metric. You can also view some insights and tips on how to improve your models or choose the best one for your needs.

Report Analysis:

View and analyze the Benchmark report. After the benchmarking job is completed, you can view and analyze the Benchmark report on aiXplain's platform. The report contains various charts and tables that show the performance of each model on each segment and metric, as well as the overall performance across all segments and metrics. You can also download or share the report with others.
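If you download the results, the step from per-segment scores to overall performance can be reproduced with a few lines of code. The sketch below assumes a plain mean over segments for a single metric; the model names and scores are made up for illustration, and the report's own aggregation may differ.

```python
from statistics import mean

# Hypothetical per-segment scores for one metric, keyed by model name.
segment_scores = {
    "model-a": [0.50, 0.75, 0.25],
    "model-b": [0.75, 0.75, 0.75],
}


def overall_scores(scores_by_model):
    """Average each model's per-segment scores into one overall score."""
    return {model: mean(values) for model, values in scores_by_model.items()}


def rank_models(overall):
    """Sort models from best to worst, assuming a higher score is better."""
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


overall = overall_scores(segment_scores)
ranking = rank_models(overall)
```

Looking at the per-segment values alongside the average is useful: in this made-up data, `model-a` varies widely between segments while `model-b` is consistent.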

We hope this article has been helpful and informative. Thank you for choosing aiXplain as your AI creation and optimization partner. If you have any questions or feedback, please don't hesitate to reach out to us.

Contact us at:
