Contents | Start | End | Previous: Appendix F: Speech Profile Reference | Next: Change Log


Appendix G: Alphabet Description Reference

This topic describes the XML-based language that Jutoh uses to specify lexicon alphabets.

About alphabet descriptions

The lexicon editor uses alphabet descriptions to show the phonetic vocabulary in the lexicon entry dialog and to translate phonemes between alphabets. This appendix describes the syntax of these descriptions. To add your own alphabet descriptions, place the XML files in the Alphabets folder under the Jutoh application data folder, where Jutoh keeps its other settings. On Windows, this might be c:\Users\<user name>\AppData\Roaming\Jutoh 2\Alphabets. Alphabets are read when a lexicon is first edited in the current Jutoh session.

To add alphabets for selection in the user interface without adding a whole XML description file, edit the setting Preferences/Advanced/Lexicon alphabets.

Please note that this is an experimental feature, and the few alphabet descriptions available so far are incomplete. Please contact us if you would like to help improve this feature.

Alphabet syntax

An alphabet file consists of an alphabet top-level element, containing phone elements that describe each phone of an alphabet. This is an example of part of an alphabet file:

<?xml version="1.0" encoding="utf-8"?>
<alphabet version="1.0" xml:lang="en-GB" name="Microsoft SAPI American English"
    abbreviation="x-microsoft-sapi" description=""
    vendor="Microsoft" phone-separator=" " syllable-separator="-" word-separator="&amp;" supports-equivalents="">
    <phone value=" " example-word="" example-pronunciation="" description="Phone separator" equivalents=""/>
    <phone value="-" example-word="" example-pronunciation="" description="Syllable separator" equivalents=""/>
</alphabet>

The attributes of the alphabet and phone elements are defined as follows.

version is the alphabet file format version, usually 1.0.

xml:lang describes the language the alphabet is relevant for.

name is a long name for the alphabet.

abbreviation is the abbreviated alphabet name, such as x-sampa or x-microsoft-sapi.

description is text describing the alphabet or phone.

vendor is the alphabet vendor (if any).

phone-separator is the text used to separate phones.

syllable-separator is the text used to separate syllables.

word-separator is the text used to separate words.

supports-equivalents is a comma-separated list of the alphabets that this alphabet can translate to. Each phone element must then also have an equivalents attribute, defined below.

equivalents is a comma-separated list of alphabet:phone pairs, indicating the equivalent phone for the given alphabet.

value is the phone representation.

example-word is a comma-separated list of example words.

example-pronunciation is a comma-separated list of pronunciations in the alphabet’s phones, corresponding to example-word.
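
The example earlier in this topic leaves most attributes empty, so the following sketch shows how populated phone entries might look. It is a hypothetical alphabet written for illustration only, not a file shipped with Jutoh. The abbreviation x-example, the phone values, the example words and the x-sampa mappings are all assumptions, and the pronunciations assume that phones are joined with the phone-separator character.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical alphabet for illustration only; not a file shipped with Jutoh. -->
<alphabet version="1.0" xml:lang="en-US" name="Example Alphabet"
    abbreviation="x-example" description="Illustrative alphabet with populated phone entries"
    vendor="" phone-separator=" " syllable-separator="-" word-separator="&amp;"
    supports-equivalents="x-sampa">
    <!-- Each equivalents entry is an alphabet:phone pair; these mappings are assumed. -->
    <phone value="k" example-word="cat,kit" example-pronunciation="k ae t,k ih t"
        description="Consonant as in cat" equivalents="x-sampa:k"/>
    <phone value="ae" example-word="cat" example-pronunciation="k ae t"
        description="Vowel as in cat" equivalents="x-sampa:{"/>
    <phone value="ih" example-word="kit" example-pronunciation="k ih t"
        description="Vowel as in kit" equivalents="x-sampa:I"/>
    <phone value="t" example-word="cat,kit" example-pronunciation="k ae t,k ih t"
        description="Consonant as in cat" equivalents="x-sampa:t"/>
</alphabet>
```

Here each alphabet named in supports-equivalents has a matching alphabet:phone pair in every phone's equivalents attribute, and each entry in example-pronunciation lines up with the word at the same position in example-word.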

