Estonian Emotional Speech Corpus

What is the Estonian Emotional Speech Corpus?

The Estonian Emotional Speech Corpus (EEKK) has been created in the framework of the National Programme for Estonian Language Technology at the Institute of the Estonian Language. The corpus contains sentences expressing anger, joy and sadness, as well as neutral sentences.

The underlying principle of the corpus is that emotions can be recognised in natural, non-acted speech, and non-acted speech is a precondition for synthesizing natural speech (see Iida et al. 2003).

The corpus has two objectives:

The reliability of the corpus is ensured by perception tests: each corpus sentence is provided with perception test data on the recognisability of the rendered emotion.

The corpus is open to expansion in every respect: readers, sentences, emotions, etc. can be added.

For more on the corpus, see Altrov 2007, 2008; Altrov, Pajupuu 2008, 2010, 2012.

User options, queries

Users can search the corpus for sentences expressing anger, joy or sadness, as well as for neutral sentences (see Reports).

Sentences are displayed as text and can be listened to by clicking on them.

The emotion perception percentage of each sentence is also displayed.

Sentences can also be searched by the perception percentage.
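The perception percentage is presumably the share of perception-test listeners who attributed a given emotion to a sentence. Under that assumption, the following minimal Python sketch shows how such a percentage could be computed and used as a search filter. The `Sentence` class, the list-of-labels response format and the `search` function are hypothetical illustrations and do not reflect the corpus's actual database schema or query interface.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical record: sentence text plus one emotion label per listener."""
    text: str
    responses: list[str]

    def dominant_emotion(self) -> tuple[str, float]:
        """Return the most frequently chosen emotion and its share in percent."""
        if not self.responses:
            raise ValueError("no perception-test responses for this sentence")
        emotion, count = Counter(self.responses).most_common(1)[0]
        return emotion, 100.0 * count / len(self.responses)


def search(sentences: list[Sentence], emotion: str,
           min_percent: float) -> list[tuple[str, str, float]]:
    """Return sentences whose dominant emotion matches and reaches the threshold."""
    hits = []
    for s in sentences:
        label, percent = s.dominant_emotion()
        if label == emotion and percent >= min_percent:
            hits.append((s.text, label, percent))
    return hits


# Usage with made-up data (not taken from the corpus):
sample = [
    Sentence("Sentence A", ["anger"] * 8 + ["neutral"] * 2),
    Sentence("Sentence B", ["joy"] * 6 + ["neutral"] * 4),
]
print(search(sample, "anger", 70))  # [('Sentence A', 'anger', 80.0)]
```

The threshold is inclusive here, so a query for 60% would also return sentences recognised by exactly 60% of listeners; whether the corpus interface behaves the same way is not stated.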

Queries can be restricted to include only sentences in which

The sound and text of sentences can be downloaded and saved (WAV, TextGrid).
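Downloaded WAV and TextGrid files can also be inspected outside Praat. The following is a minimal sketch using only the Python standard library. It assumes the TextGrids are saved in Praat's long text format with interval tiers, which is not confirmed by the corpus documentation, and the helpers `wav_duration`, `read_textgrid` and `textgrid_intervals` are hypothetical names introduced for illustration.

```python
from __future__ import annotations

import re
import wave


def wav_duration(path: str) -> float:
    """Length of a WAV recording in seconds (standard-library wave module)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def read_textgrid(path: str) -> str:
    """Read a TextGrid file; Praat may save it as UTF-16 (with BOM) or UTF-8."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16")  # the BOM selects the byte order
    return raw.decode("utf-8-sig")  # also strips a UTF-8 BOM if present


# Assumed long text format: each interval of an IntervalTier reads
#     xmin = 0.5
#     xmax = 2.5
#     text = "..."
# Literal quotes inside a label are doubled ("").
_INTERVAL = re.compile(
    r'xmin\s*=\s*([-+\d.eE]+)\s*xmax\s*=\s*([-+\d.eE]+)\s*text\s*=\s*"((?:[^"]|"")*)"'
)


def textgrid_intervals(source: str) -> list[tuple[float, float, str]]:
    """Return (start, end, label) for every interval; point tiers are ignored."""
    return [
        (float(start), float(end), label.replace('""', '"'))
        for start, end, label in _INTERVAL.findall(source)
    ]
```

For example, `textgrid_intervals(read_textgrid("sentence.TextGrid"))` lists the labelled intervals, and the end time of the last interval should roughly match `wav_duration("sentence.wav")`; both file names are placeholders.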

Technical description and downloadable data

The system's technical documentation is here.

The corpus is a web-based application built on free, open-source software: Linux, PostgreSQL, Python, Praat and NLTK.

All corpus metadata can be downloaded as a PostgreSQL dump here.

The corpus data can also be loaded into EMU. All currently available databases are listed here, and a short guide to installing EMU is here.


