Digital Speech and Signal Processing (DSSP) research group
Association research group Multimedia and Imaging Team (aOG MIT)
Electronics and Information Systems (ELIS) department

Faculty of Engineering
Ghent University (Belgium)
AUTONOMATA
Many modern applications, such as directory assistance, name dialing and car navigation, need a speech recognizer, a speech synthesizer, or both: the former to recognize spoken user commands, the latter to pronounce information found in a database. Both components rely on phonetic transcriptions of the words to be recognized or pronounced. To build such an application, the developer therefore needs a tool that accepts words or sentences and returns their phonetic transcriptions. The core of such a tool is the so-called grapheme-to-phoneme converter, often abbreviated to g2p.
A well-known problem of the standard, commercially available g2p converters is that they perform rather badly on names (proper names, address items, brand names). Yet names play an important part in most of the applications mentioned above.
One of the major goals of this project is to investigate whether, for particular name categories, it is possible to develop a phoneme-to-phoneme (p2p) post-processor that automatically corrects a number of the mistakes made by the standard g2p.
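As a rough illustration of the p2p idea, the Python sketch below post-processes an initial phoneme sequence (as a g2p might return it) with a small list of rewrite rules. Everything in it is an assumption made for illustration only: the function name apply_p2p, the (pattern, replacement) rule format and the placeholder symbols are hypothetical and do not describe the project's actual method.

```python
from typing import List, Sequence, Tuple

# A rule rewrites one phoneme subsequence into another.
# Hypothetical format: (pattern, replacement).
Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def apply_p2p(phonemes: Sequence[str], rules: Sequence[Rule]) -> List[str]:
    """Correct an initial transcription with rewrite rules.

    The sequence is scanned left to right. At each position the first
    rule whose pattern matches is applied; if no rule matches, the
    phoneme is copied unchanged.
    """
    corrected: List[str] = []
    i = 0
    while i < len(phonemes):
        for pattern, replacement in rules:
            # Empty patterns are skipped so the scan always advances.
            if pattern and tuple(phonemes[i:i + len(pattern)]) == pattern:
                corrected.extend(replacement)
                i += len(pattern)
                break
        else:
            corrected.append(phonemes[i])
            i += 1
    return corrected


if __name__ == "__main__":
    # Placeholder symbols, not real phonemes of any language.
    initial = ["a", "b", "c"]
    rules = [(("b", "c"), ("x",))]
    print(apply_p2p(initial, rules))  # prints ['a', 'x']
```

In a real system the rules would have to be established per name category and would probably also depend on the spelling of the name; the sketch only shows where such a corrector sits, namely after the g2p and before the recognizer or synthesizer.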
Another problem, especially in view of the recognition of names, is the existence of different pronunciations for the same name. These pronunciations often depend on the background (mother tongue) of the user. Typical examples are the pronunciation of foreign city names, foreign proper names, etc.
The second goal of the project is to collect a large number of name pronunciations and to provide manually corrected phonetic transcriptions of the name utterances. Together with metadata on the speakers, this corpus can become a valuable resource for research on better name recognition.


