Estonian Emotional Speech Corpus

What is the Estonian Emotional Speech Corpus?

The Estonian Emotional Speech Corpus (EEKK) was created at the Institute of the Estonian Language within the framework of the National Programme for Estonian Language Technology. The corpus contains sentences expressing anger, joy and sadness, as well as neutral sentences.

The underlying principle of the corpus is that emotions can be recognised in natural, non-acted speech, and non-acted speech is a precondition for synthesizing natural speech (see Iida et al. 2003).

The corpus has two objectives:

The reliability of the corpus is ensured by perception tests: each corpus sentence is provided with perception test data on the recognisability of the rendered emotion.

The corpus is, in every way, open for expansion: by adding readers, sentences, emotions, etc.

For more on the corpus, see Altrov 2007, 2008; Altrov, Pajupuu 2008, 2010, 2012.

User options, queries

Users can search the corpus for sentences expressing anger, joy or sadness, and for neutral sentences (see Reports).

Sentences are displayed as text and can be listened to by clicking on them.

The emotion perception percentage of each sentence is also displayed.

Sentences can also be searched by the perception percentage.
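As a rough illustration of this kind of query, the following Python sketch filters a list of sentence records by emotion and by a minimum perception percentage. It does not reflect the corpus's actual database schema or web interface: the `Sentence` class, its field names, the `search` function and the sample records are all hypothetical, and the percentage is assumed to be the share of listeners in the perception test who recognised the emotion.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical record; field names are assumptions, not the corpus schema."""
    text: str
    emotion: str           # "anger", "joy", "sadness" or "neutral"
    perception_pct: float  # assumed: percentage of listeners who recognised the emotion


def search(sentences, emotion, min_pct=0.0):
    """Return sentences with the given emotion and at least min_pct perception percentage."""
    return [
        s for s in sentences
        if s.emotion == emotion and s.perception_pct >= min_pct
    ]


# Made-up placeholder records, not actual corpus data.
SAMPLE = [
    Sentence("placeholder sentence A", "anger", 82.5),
    Sentence("placeholder sentence B", "anger", 51.0),
    Sentence("placeholder sentence C", "joy", 90.0),
    Sentence("placeholder sentence D", "neutral", 67.0),
]

if __name__ == "__main__":
    # List anger sentences recognised by at least 60% of listeners.
    for s in search(SAMPLE, "anger", min_pct=60.0):
        print(s.text, s.perception_pct)
```

Keeping the emotion label and the perception percentage as separate fields means the two search criteria can be combined or used on their own.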

Queries can be restricted to include only sentences in which

The audio and text of each sentence can be downloaded and saved (WAV, TextGrid).
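A downloaded WAV file can be inspected with Python's standard `wave` module. The sketch below is only an illustration: the helper name `wav_info` and the file name in the usage line are hypothetical, and nothing is assumed about the sample rate or channel count of the corpus recordings. The `wave` module reads uncompressed PCM WAV files.

```python
import wave


def wav_info(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,  # length of the recording in seconds
        }


if __name__ == "__main__":
    import sys
    # Usage: python wav_info.py downloaded_sentence.wav  (file name is a placeholder)
    if len(sys.argv) > 1:
        print(wav_info(sys.argv[1]))
```

The accompanying TextGrid file is a Praat annotation format and is most easily opened in Praat itself.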

Technical description and downloadable data

The system's technical documentation is here.

The corpus is a web-based application built on free software: Linux, PostgreSQL, Python, Praat and NLTK.

All the corpus metadata can be downloaded in the PostgreSQL dump format here.

The corpus data can also be loaded into EMU. All currently available databases are listed here, and a short guide to installing EMU is here.

References

Altrov, Rene 2007. Emotsionaalse kõne korpuse loomine eesti keele tekst-kõne sünteesi jaoks. Tekstimaterjali evalvatsioon viha näitel. Magistritöö. Tartu Ülikool.

Altrov, Rene 2008. Eesti emotsionaalse kõne korpus: teoreetilised toetuspunktid. Keel ja Kirjandus, 4, 261–271.

Altrov, Rene; Pajupuu, Hille 2008. The Estonian Emotional Speech Corpus: Release 1. In: Proc. of the Third Baltic Conference on Human Language Technologies, František Čermak, Rūta Marcinkevičienė, Erika Rimkutė, Jolanta Zabarskaitė (eds.), 9-15. Vytauto Didžiojo Universitetas; Lietuviu Kalbos Institutas, Vilnius.

Altrov, Rene; Pajupuu, Hille 2010. Estonian Emotional Speech Corpus: Culture and Age in Selecting Corpus Testers. In: Human Language Technologies - The Baltic Perspective - Proc. of the Fourth International Conference Baltic HLT 2010, Inguna Skadiņa, Andrejs Vasiļjevs (eds.), 25-32. Amsterdam: IOS Press.

Altrov, Rene; Pajupuu, Hille 2012. Estonian Emotional Speech Corpus: Theoretical base and implementation. In: 4th International Workshop on Corpora for Research on Emotion Sentiment & Social Signals (ES3), Devillers, L., Schuller, B., Batliner, A., Rosso, P., Douglas-Cowie, E., Cowie, R., Pelachaud, C. (eds.), 50–53. Istanbul.

Boersma, Paul; Weenink, David 2009. Praat: doing phonetics by computer (Version 5.1.01) [Computer program]. Retrieved February 26, 2009.

Iida, Akemi; Campbell, Nick; Higuchi, Fumito; Yasumura, Michiaki 2003. A corpus-based speech synthesis system with emotion. Speech Communication, 40, 161–187.