General Pronunciation Dictionary for ASR

The general Icelandic pronunciation dictionary for ASR is based on the Pronunciation dictionary (see link on The dictionary contains about 136,000 words, transcribed with the IPA.


• Icelandic pronunciation data, V1, for the Kaldi ASR toolkit: here (15 MB). CC BY 4.0

About the general Icelandic pronunciation dictionary for ASR
The dictionary is based on the Pronunciation dictionary (see link on which was developed during the Hjal project in 2003. It is being used in the open ASR system for Icelandic developed at Reykjavik University.
The vocabulary
The ASR system was trained using the Málrómur speech corpus (see link on and the original lexicon was extended by words from that corpus. Icelandic texts from the Leipzig Wortschatz project (see ‘Íslenskur orðasjóður’ on were used for language modeling, and common words from that corpus were added to the lexicon as well. Furthermore, some words from the original lexicon not found in the training corpora were deleted.
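The vocabulary selection described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the function name `build_vocabulary`, the `min_count` threshold for what counts as a "common" word, and the toy word lists are all assumptions made for the example.

```python
from collections import Counter


def build_vocabulary(original_lexicon, speech_corpus_words, text_corpus_words, min_count=2):
    """Return a vocabulary (set of words) following the three steps in the text.

    min_count is a hypothetical cut-off for "common" text-corpus words;
    the real threshold is not stated in the source.
    """
    text_counts = Counter(text_corpus_words)
    speech_vocab = set(speech_corpus_words)

    # Common words from the language-modeling text corpus.
    common_text_vocab = {w for w, c in text_counts.items() if c >= min_count}

    # Every word attested in either training corpus.
    seen_in_training = speech_vocab | set(text_counts)

    # Original entries are kept only if they occur in the training corpora.
    kept_original = {w for w in original_lexicon if w in seen_in_training}

    # Extend with speech-corpus words and common text-corpus words.
    return kept_original | speech_vocab | common_text_vocab


# Toy data, for illustration only.
vocab = build_vocabulary(
    original_lexicon={"hestur", "kýr", "sjaldgæft"},
    speech_corpus_words=["hestur", "hundur"],
    text_corpus_words=["kýr", "kýr", "bók", "bók", "penni"],
)
print(sorted(vocab))
```

In this toy run, "sjaldgæft" is dropped because it appears in neither corpus, and "penni" is not added because it falls below the assumed frequency threshold.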
Phonetic transcription
The manually created transcriptions of the original dictionary have been partly revised. Some inconsistencies have been corrected by limiting the transcriptions to the closed phoneme set defined for the ASR system. These transcriptions were then used to train an automatic grapheme-to-phoneme converter, which was used to transcribe all new words in the lexicon.
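Restricting transcriptions to a closed phoneme set amounts to a simple consistency check. The sketch below assumes that transcriptions are stored as space-separated phoneme symbols; the helper `find_out_of_set`, the toy phoneme set and the sample entries are hypothetical and not taken from the dictionary.

```python
def find_out_of_set(lexicon, phoneme_set):
    """Map each word to the symbols in its transcription that are not in phoneme_set.

    lexicon: dict of word -> transcription (space-separated symbols, an assumed format).
    Words whose transcription uses only allowed symbols are omitted from the result.
    """
    problems = {}
    for word, transcription in lexicon.items():
        unknown = [p for p in transcription.split() if p not in phoneme_set]
        if unknown:
            problems[word] = unknown
    return problems


# Toy phoneme set and entries, for illustration only.
PHONEMES = {"h", "ɛ", "s", "t", "ʏ", "r"}
sample = {
    "hestur": "h ɛ s t ʏ r",  # uses only symbols from the set
    "test": "t e s t",        # "e" is not in the toy set
}
print(find_out_of_set(sample, PHONEMES))
```

Entries flagged this way would be the candidates for manual correction before training a grapheme-to-phoneme model on the lexicon.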
Pronunciation data to use with Kaldi
The data needed for the Kaldi toolkit have been extracted from the pronunciation dictionary and can be downloaded, as can a simple text version of the dictionary.
The pronunciation dictionary for ASR will be updated on a regular basis, along with the data for work with Kaldi.
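As a rough illustration of working with such data: Kaldi's `lexicon.txt` convention is one entry per line, a word followed by its whitespace-separated phones. The sketch below assumes the text version of the dictionary follows that layout, which is not confirmed here; the function names and sample lines are hypothetical.

```python
def parse_lexicon(lines):
    """Parse lines of the form 'word phone1 phone2 ...' into (word, phones) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, *phones = line.split()
        entries.append((word, phones))
    return entries


def nonsilence_phones(entries):
    """Collect the sorted set of phones used in the lexicon entries."""
    return sorted({p for _, phones in entries for p in phones})


# Toy lines, for illustration only.
sample_lines = ["hestur h ɛ s t ʏ r", "", "te t ɛ"]
entries = parse_lexicon(sample_lines)
for phone in nonsilence_phones(entries):
    print(phone)
```

A phone inventory derived like this corresponds to what Kaldi expects in `nonsilence_phones.txt`, with silence phones listed separately.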

Using the Pronunciation Dictionary for ASR
You can download the Pronunciation Dictionary for ASR here. Prospective users must register and accept the terms and conditions. The texts are available under a CC BY 4.0 licence.

Anna Björk Nikulásdóttir
Jón Guðnason