Nord-Parl-TTS: Finnish and Swedish TTS Dataset from Parliament Speech

Zirui Li¹ Jens Edlund² Yicheng Gu³ Nhan Phan¹ Lauri Juvela¹ Mikko Kurimo¹

¹ Department of Information and Communications Engineering, Aalto University, Espoo, Finland
² Speech, Music & Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
³ School of Data Science, The Chinese University of Hong Kong, Shenzhen, China

Abstract

Text-to-speech (TTS) development is limited by the scarcity of high-quality, publicly available speech data for all but a few high-resource languages. We present Nord-Parl-TTS, an open TTS dataset for Finnish and Swedish based on speech found in the wild. Using recordings of Nordic parliamentary proceedings, we extract 900 hours of Finnish and 5090 hours of Swedish speech suitable for TTS training. The dataset is built using an adapted version of the Emilia data processing pipeline and includes unified evaluation sets to support model development and benchmarking. By offering open, large-scale data for Finnish and Swedish, Nord-Parl-TTS narrows the TTS resource gap between high-resource and lower-resource languages.

Nord-Parl-TTS

Overview

Nord-Parl-TTS is built from recordings of parliamentary proceedings in Finland and Sweden. The Finnish subset contains 900 hours of speech, and the Swedish subset contains 5090 hours. The table below shows the duration statistics for each language.
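To put the two subset sizes in proportion, the short sketch below computes the combined duration and each language's share of it. It uses only the two hour counts stated above; the names `HOURS` and `summarize` are illustrative and are not part of the dataset or its tooling.

```python
# Hours per language as reported in the overview above.
HOURS = {"Finnish": 900, "Swedish": 5090}


def summarize(hours):
    """Return the total hours and each language's share of the total in percent."""
    total = sum(hours.values())
    shares = {lang: 100 * h / total for lang, h in hours.items()}
    return total, shares


total, shares = summarize(HOURS)
print(f"Total: {total} h")
for lang, pct in shares.items():
    print(f"{lang}: {HOURS[lang]} h ({pct:.1f}%)")
```

By these figures the dataset totals 5990 hours, of which roughly 15% is Finnish and 85% is Swedish, which may be worth keeping in mind when comparing monolingual models trained on the two subsets.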

Data Preview

We show some samples from Nord-Parl-TTS here.

Nord-Parl-TTS Data Processing Pipeline

We show the data processing pipeline for Finnish and Swedish in the figure below. The pipeline is an adapted version of the Emilia data processing pipeline.

Demos

In this section, we show samples from two monolingual TTS models trained on Nord-Parl-TTS: Matcha-TTS and F5-TTS.
