We are delighted to announce that the atypical speech team has had two papers on dysarthric speech recognition accepted at ICASSP 2025! Dr. Singh will present the papers on the team's behalf.
The first paper, entitled “Dysarthric Speech Conformer: Adaptation for Sequence-to-Sequence Dysarthric Speech Recognition,” introduces a two-phase adaptation method built on the Conformer architecture, a cutting-edge AI model for speech processing. The system is first trained on typical speech and then adapted to individual dysarthric speakers. In our tests, it reached a Word Error Rate of 21.5% on the UASpeech dataset and just 12.7% on the TORGO dataset, marking a significant advancement in assistive speech technology.
The second paper, “Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition,” evaluates the newly released Speech Accessibility Project (SAP-1005) dataset, which includes speech data from individuals with Parkinson’s disease. Using OpenAI’s Whisper model, our system achieved an impressive Character Error Rate (CER) of 6.99% and a Word Error Rate (WER) of 10.71% on SAP-1005. To test its versatility, we also evaluated the model on the TORGO dataset, which contains speech from individuals with cerebral palsy (CP) and amyotrophic lateral sclerosis (ALS). Performance decreased in this cross-etiology test (a CER of 25.08% and a WER of 39.56%), but the results still demonstrate the model’s ability to generalize across different types of dysarthria.
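For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how WER and CER are commonly computed: the edit distance between the reference transcript and the system output, divided by the reference length. The function names and the example sentences are illustrative assumptions, not code or data from the papers. The papers' exact text normalization (casing, punctuation, whether spaces count toward CER) is not stated here, so this sketch simply counts every character, including spaces.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis prefix: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: spaces are counted as characters; conventions vary.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: one substituted word out of four.
reference = "the quick brown fox"
hypothesis = "the quick brown box"
print(f"WER = {wer(reference, hypothesis):.2%}")
print(f"CER = {cer(reference, hypothesis):.2%}")
```

Because a single wrong character makes the whole word count as an error, WER is typically higher than CER on the same output, which matches the pattern in the figures reported above.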
These studies mark a significant step toward developing more inclusive ASR systems that can aid individuals with speech impairments, regardless of their specific condition or speech patterns.