Data Collection Services

We provide data collection services to improve machine learning at scale. As a global leader in our field, we can quickly deliver large volumes of high-quality data across multiple data types, including image, video, speech, audio, and text, for your specific AI program needs.

We provide several different data collection solutions and services to best suit your specific needs.

You may have the most appropriate algorithm, but if you train your model on bad data, it will learn the wrong lessons and fail to work as you (or your customers) expect. Your success is almost entirely reliant on your data.

Key advantages of using us as your AI training data provider are:

  • All AI training data is collected according to legal standards aligned with GDPR requirements
  • Participants are fairly compensated for the data they provide, in accordance with our Fair Pay policy
  • An end-to-end managed service covering collection design, large-scale field operation, data QA, and annotation, backed by over 20 years of deep expertise
  • Truly global coverage of markets across all continents, in over 180 languages and dialects, with access to our curated crowd of over one million people

Off-the-Shelf Speech Datasets

Quickly expand your voice recognition products with licensable speech recognition databases and text corpora. Our high-quality licensable datasets include:

  • Transcribed speech datasets for broadcast, call center, in-car, and telephony applications
  • Pronunciation lexicons, both general and domain specific (e.g. names, places, natural numbers)
  • POS-tagged lexicons and thesauri
  • Text corpora annotated for morphological information and named entities

New off-the-shelf resources are being developed across all media (speech, image, video). You can also contact us to discuss creating new licensable datasets on request, provided the specification is broad enough to be of interest to other clients.

Open Source Datasets

Curated from the Bride Valley platform, these free-to-download datasets are for the entire data science and machine learning community.

The template used to annotate each dataset can be duplicated, so you can expand the datasets on the platform if needed. Inside each dataset, you’ll find the raw data, job design, description, instructions, and more.

We are committed to the highest standards of excellence across all aspects of our business. We look for opportunities to exceed expectations and strive with determination to win and be the best at what we do. We value outstanding results: they’re a source of pride, recognition, and reward in support of the high-performance culture we are creating.