Here’s what you need to know before uploading your own dataset:
Upload file format
You should be sure that the files you upload are correctly formatted. Otherwise, errors might occur during the upload, or your data may not look right in your reports.
UTF-8 encoding
Your upload file should be in UTF-8 encoding. This is the standard encoding for most applications on the web. However, if you are exporting data from certain desktop products, such as Microsoft Excel™, you may need to convert your file to UTF-8 before uploading it to aiXplain.com.
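As a minimal sketch of such a conversion, the Python snippet below re-encodes a file as UTF-8 and checks the result. The function names are hypothetical, and the default source encoding (cp1252, common for exports on Windows) is an assumption; use whatever encoding your application actually wrote.

```python
from pathlib import Path


def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file as UTF-8.

    source_encoding is an assumption: cp1252 is common for desktop exports
    on Windows, but your file may use a different encoding.
    """
    text = Path(src_path).read_bytes().decode(source_encoding)
    Path(dst_path).write_bytes(text.encode("utf-8"))


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Running is_valid_utf8 on the file before uploading is a cheap way to catch an encoding problem early.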
CSV files
The CSV files must have a regular structure of rows and columns. Each row must have the same number of columns, even if data is missing for a particular cell in the table. Trying to upload a file with merged cells or an inconsistent structure will fail with an upload error.
One column in the CSV file should contain the source data, and one or more additional columns can contain target data.
Don’t place CSV files inside ZIP archives.
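One way to check the row structure before uploading is sketched below in Python. It is an illustration only, not part of the aiXplain tooling: find_ragged_rows is a hypothetical helper that reports every row whose column count differs from the header's.

```python
import csv


def find_ragged_rows(path, delimiter=","):
    """Return (line_number, column_count) for each row whose width
    differs from the header row's width."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:  # empty file: nothing to compare
            return []
        expected = len(header)
        return [
            (reader.line_num, len(row))
            for row in reader
            if len(row) != expected
        ]
```

An empty result means every row has as many columns as the header. Blank lines are reported too, since they parse as rows with zero columns.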
Separators
All the fields in your data must be separated from each other by commas or semicolons.
If a field you want to upload contains commas within the actual data, that field must be surrounded by double quotes ("like this"). If your data includes double quotes, you can surround the field with single quote characters ('like this') instead.
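The quoting rules above could be applied when generating a file, as in this Python sketch. The helpers quote_field and format_row are hypothetical. The case of a field containing both single and double quotes is not described above, so the sketch rejects it rather than guessing.

```python
def quote_field(value, delimiter=","):
    """Quote one field following the rules described above."""
    if '"' in value:
        if "'" in value:
            # Assumption: this case is not covered above, so refuse it.
            raise ValueError("field contains both quote types: %r" % value)
        return "'" + value + "'"
    if delimiter in value:
        return '"' + value + '"'
    return value


def format_row(fields, delimiter=","):
    """Join already-validated fields into one line (without a line break)."""
    return delimiter.join(quote_field(f, delimiter) for f in fields)
```

For example, format_row(["a, b", "c"]) produces the line "a, b",c with only the first field quoted.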
Header row
The first line in your file must be a header row. This row will help map the data columns correctly in the aiXplain upload process. Header names must be unique, so you can’t have duplicate values in your header row.
Column names must:
Contain only letters, numbers, or underscores. Other punctuation marks and special characters are not allowed.
Start with a letter or underscore.
Be at most 128 characters long.
The header row must also follow the rules for separators mentioned above.
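The column name rules can be checked with a short Python sketch like the one below. header_problems is a hypothetical helper, and treating "letters" as ASCII letters only is an assumption, since the rules above do not say whether accented or non-Latin letters are accepted.

```python
import re

# Assumption: "letters" means ASCII letters A-Z and a-z.
# One leading letter or underscore, then up to 127 more characters (128 total).
COLUMN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def header_problems(names):
    """Return a list of human-readable problems found in a header row."""
    problems = []
    seen = set()
    for name in names:
        if not COLUMN_NAME.fullmatch(name):
            problems.append("invalid column name: %r" % name)
        if name in seen:
            problems.append("duplicate column name: %r" % name)
        seen.add(name)
    return problems
```

An empty list means the header passed all three rules and contains no duplicates.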
Line breaks
Each line in the file must end with a line break. The file upload does not support line breaks inside your data, even if they are enclosed in quotes.
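Both conditions can be checked locally before uploading. The Python sketch below uses two hypothetical helpers: one confirms that the file ends with a line break, and the other finds records with a line break inside a quoted field.

```python
import csv


def ends_with_line_break(path):
    """Return True if the last byte of the file is a newline character."""
    with open(path, "rb") as f:
        return f.read().endswith(b"\n")


def records_with_line_breaks(path, delimiter=","):
    """Return the line number on which each offending record ends.

    A record is offending if any of its fields contains a line break.
    """
    bad = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        for row in reader:
            if any("\n" in field or "\r" in field for field in row):
                bad.append(reader.line_num)
    return bad
```

Any record reported here would need its line breaks removed or replaced, for example with a space, before the upload.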
ZIP files
To upload media files (.wav/.mp3 only), you should upload them in a ZIP file. Separately upload a CSV file that lists the relative path of each media file in one column (that is, the source data column). A second column in the CSV should contain the target data (transcripts).
Select all audio files and create one ZIP archive from them directly, without first placing them in a folder.
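The archive and its companion CSV could also be produced with a script. The Python sketch below is one possible approach, not an official tool. build_upload is a hypothetical helper, and the column names "source" and "target" are placeholders. Each audio file is stored at the top level of the archive, so its relative path is just its file name.

```python
import csv
import zipfile
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3"}


def build_upload(audio_dir, transcripts, zip_path, csv_path):
    """Create a flat ZIP of the audio files plus a matching CSV.

    transcripts maps each audio file name to its transcript text.
    Returns the archived file names in the order they were written.
    """
    audio_files = sorted(
        p for p in Path(audio_dir).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for p in audio_files:
            # arcname drops the directory part, keeping the archive flat.
            archive.write(p, arcname=p.name)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target"])  # placeholder header names
        for p in audio_files:
            writer.writerow([p.name, transcripts[p.name]])
    return [p.name for p in audio_files]
```

Note that csv.writer escapes a double quote inside a field by doubling it, which differs from the single-quote convention described under Separators, so transcripts containing double quotes may need separate handling.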