Text-to-speech (TTS) development is limited by the scarcity of high-quality, publicly available speech data for most languages beyond a few high-resource ones. We present Nord-Parl-TTS, an open TTS dataset for Finnish and Swedish based on speech found in the wild. Using recordings of Nordic parliamentary proceedings, we extract 900 hours of Finnish and 5090 hours of Swedish speech suitable for TTS training. The dataset is built using an adapted version of the Emilia data processing pipeline and includes unified evaluation sets to support model development and benchmarking. By offering open, large-scale data for Finnish and Swedish, Nord-Parl-TTS narrows the resource gap in TTS between high- and lower-resourced languages.
Nord-Parl-TTS is built on parliamentary recordings from Finland and Sweden. The Finnish subset contains 900 hours of speech, and the Swedish subset contains 5090 hours. The table below shows the duration statistics for each language.
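For a rough sense of the language balance, the two durations quoted above can be combined in a few lines of Python. This is only an illustrative sketch: the variable names are hypothetical, and the figures are the rounded hour counts stated in the text, not values recomputed from the audio.

```python
# Rounded hour counts as stated in the text above.
hours = {"Finnish": 900, "Swedish": 5090}

# Total duration and each language's percentage share of it.
total_hours = sum(hours.values())
shares = {lang: 100 * h / total_hours for lang, h in hours.items()}

for lang, h in hours.items():
    print(f"{lang}: {h} h ({shares[lang]:.1f}% of {total_hours} h)")
```

By these rounded figures the dataset totals 5990 hours, of which roughly 15% is Finnish and 85% is Swedish.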
We show some samples from Nord-Parl-TTS here.
We show the data processing pipeline for Finnish and Swedish in the figure below. The pipeline is an adapted version of the Emilia data processing pipeline.
In this section, we show samples from two monolingual TTS models trained on Nord-Parl-TTS: Matcha-TTS and F5-TTS.