Job Title: Audio Collection Specialists for Wake Word and Non-Wake Word Dataset Project
Project ID: SEP-0110(SH432)
Location: Remote
Contract Type: Freelance
About the Project
ServiceEdgePro is seeking experienced audio collection specialists for a new project to collect wake word and non-wake word datasets in four languages: English (US), Spanish, Taiwanese Mandarin, and Japanese.
Project Scope
- Collect wake words and utterances from a diverse group of speakers.
- Each speaker will say the wake words 25 times to capture variations.
- Participants will read 4 minutes of unique sentences for non-wake word recordings.
In Scope
- Vendors with experience in audio collection.
- Crowdsourcing teams.
Wake Words and Non-Wake Words Dataset Requirements
Wake Words: Each speaker must record 25 wake word utterances in total (500 speakers x 25 utterances = 12,500 utterances per language). Variations include:
- Hey Google
- Hi Google
- Hello Google
Speaker Demographics:
- 500 speakers per language across various regions.
- Gender distribution: 50% male, 50% female.
- Age range: 20-59 years, split into the bands below (percentages are shares of all speakers in a language).
- Males: 20-29 (20%), 30-39 (15%), 40-49 (10%), 50-59 (5%).
- Females: 20-29 (20%), 30-39 (15%), 40-49 (10%), 50-59 (5%).
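For planning recruitment, the percentages above can be turned into head counts per gender and age band. The snippet below is a minimal sketch, assuming the percentages are shares of all 500 speakers in one language (they sum to 100% across both genders). The names `AGE_BAND_PERCENT` and `speaker_quotas` are illustrative only.

```python
# Sketch only: converts the posted percentages into speaker counts.
# Assumption: each percentage is a share of ALL speakers in one language.
SPEAKERS_PER_LANGUAGE = 500
AGE_BAND_PERCENT = {"20-29": 20, "30-39": 15, "40-49": 10, "50-59": 5}


def speaker_quotas(total=SPEAKERS_PER_LANGUAGE):
    """Return {(gender, age_band): speaker_count} for one language."""
    quotas = {}
    for gender in ("male", "female"):
        for band, percent in AGE_BAND_PERCENT.items():
            # Integer arithmetic avoids floating-point rounding surprises.
            quotas[(gender, band)] = total * percent // 100
    return quotas


if __name__ == "__main__":
    for (gender, band), count in speaker_quotas().items():
        print(f"{gender:6s} {band}: {count} speakers")
```

Under this reading, each language needs 100, 75, 50, and 25 speakers per gender in the four age bands, which adds up to 250 males and 250 females.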
Languages:
- 4 languages: English (US), Spanish, Taiwanese Mandarin, and Japanese.
- Each language requires 12,500 utterances, totaling 50,000 utterances across all languages.
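The utterance totals follow directly from the speaker count and the number of repetitions. A short sketch of the arithmetic, using only figures stated in this posting (the function name `utterance_totals` is illustrative):

```python
# Sketch only: reproduces the wake word utterance totals from the posted figures.
LANGUAGES = ["English (US)", "Spanish", "Taiwanese Mandarin", "Japanese"]
SPEAKERS_PER_LANGUAGE = 500
WAKE_WORD_UTTERANCES_PER_SPEAKER = 25


def utterance_totals():
    """Return (utterances per language, utterances across all languages)."""
    per_language = SPEAKERS_PER_LANGUAGE * WAKE_WORD_UTTERANCES_PER_SPEAKER
    return per_language, per_language * len(LANGUAGES)


if __name__ == "__main__":
    per_language, overall = utterance_totals()
    print(f"{per_language} per language, {overall} overall")
```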
Audio Details
- Format: WAV
- Encoding: PCM, signed, 16-bit, little endian
- Sampling Rate: 16 kHz
- Bit Depth: 16 bits per sample
- Channel: Mono (1 channel)
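Vendors may want to check file headers automatically before delivery. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is an assumption for illustration, not part of any required tooling. The `wave` module reads uncompressed PCM WAV files, and 16-bit PCM in a WAV container is signed and little endian, so checking sample width, rate, and channel count covers the header requirements above.

```python
import wave

# Sketch only: verifies a file header against the posted audio details.
EXPECTED_CHANNELS = 1        # mono
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16 bits
EXPECTED_SAMPLE_RATE = 16000 # Hz


def check_wav_format(path):
    """Return a list of problems found; an empty list means the header matches."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            if wav.getnchannels() != EXPECTED_CHANNELS:
                problems.append(f"channels: {wav.getnchannels()} (expected 1)")
            if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
                problems.append(f"bit depth: {wav.getsampwidth() * 8} (expected 16)")
            if wav.getframerate() != EXPECTED_SAMPLE_RATE:
                problems.append(f"sample rate: {wav.getframerate()} Hz (expected 16000)")
    except wave.Error as exc:
        # Raised for files the module cannot parse, e.g. compressed formats.
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems
```

A file passes when `check_wav_format("speaker001_utt01.wav")` returns an empty list (the file name here is hypothetical).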
Recording Guidelines
- Peak amplitude should range between -21 dBFS and -6 dBFS.
- Record in a quiet environment with background noise below 50 dB SPL and a signal-to-noise ratio (SNR) above 25 dB.
- No distortion, missing samples, or chopping.
- Normal speech speed, verified by subjective listening.
- Segment each wake word into a separate .wav file, ensuring no part of the word is cut off, with leading and trailing silence trimmed to less than 0.1 seconds.
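The peak level and edge silence limits above can be screened with a short script before manual listening. This is a rough sketch using only the Python standard library. It assumes 16-bit mono input, and the -50 dBFS silence threshold is an arbitrary assumption, since the posting does not define what counts as silence. Function names are illustrative.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768  # largest magnitude representable in signed 16-bit PCM


def read_samples(path):
    """Read a 16-bit mono WAV file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        samples = array.array("h", wav.readframes(wav.getnframes()))
        rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little endian
    return samples, rate


def peak_dbfs(samples):
    """Peak amplitude in dBFS; -inf for an all-zero signal."""
    peak = max((abs(s) for s in samples), default=0)
    return 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")


def peak_in_range(samples, low=-21.0, high=-6.0):
    """True if the peak lies within the posted -21 to -6 dBFS window."""
    return low <= peak_dbfs(samples) <= high


def edge_silence_seconds(samples, rate, threshold_dbfs=-50.0):
    """Return (leading, trailing) silence in seconds.

    threshold_dbfs is an assumed cutoff: samples quieter than it count as silence.
    """
    limit = FULL_SCALE * 10 ** (threshold_dbfs / 20)
    voiced = [i for i, s in enumerate(samples) if abs(s) >= limit]
    if not voiced:
        duration = len(samples) / rate
        return duration, duration
    return voiced[0] / rate, (len(samples) - 1 - voiced[-1]) / rate
```

A segment would pass this screen when `peak_in_range(samples)` is true and both values from `edge_silence_seconds(samples, rate)` are below 0.1. Checks for distortion, SNR, and speech speed still need separate measurement or listening.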
Non-Wake Word Recording
- Each speaker must read 4 minutes of unique sentences that contain none of the wake words listed above. Content should differ for each speaker and vary across the 4 languages.
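Reading scripts can be screened before recording to catch wake words and sentences reused between speakers. The sketch below is one possible approach, not a required tool: it assumes scripts are available as plain text sentences keyed by a hypothetical speaker ID, and it only matches the three wake word phrases as written in this posting.

```python
import re

WAKE_WORDS = ("hey google", "hi google", "hello google")


def normalize(sentence):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", sentence.lower())
    return " ".join(no_punct.split())


def contains_wake_word(sentence):
    """True if any listed wake word phrase appears in the sentence."""
    text = normalize(sentence)
    return any(phrase in text for phrase in WAKE_WORDS)


def shared_sentences(scripts):
    """Find sentences assigned to more than one speaker.

    scripts: dict mapping a speaker ID to a list of sentences.
    Returns {normalized_sentence: set_of_speaker_ids} for reused sentences.
    """
    seen = {}
    for speaker, sentences in scripts.items():
        for text in {normalize(s) for s in sentences}:
            seen.setdefault(text, set()).add(speaker)
    return {text: ids for text, ids in seen.items() if len(ids) > 1}
```

Simple substring matching can raise false alarms (for example, a word ending in "hi" followed by "Google"), so flagged sentences are best reviewed by hand.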
Application Process
Interested candidates are invited to apply by filling out the Google Form below: