As some of you may know, I am one of those übergeeks who actually likes to create languages for fun. I even produce and host a podcast about the art of creating languages. On that podcast, one particular topic has come up tangentially more than once: romanization. Many of the constructed languages I have seen have quite odd romanizations, though most have been understandable. Of course, an odd romanization scheme is not necessarily a deal breaker. Indeed, quite a few natural languages have annoying problems with romanization, particularly those languages for which the Latin alphabet simply isn't well suited (and there are a great many of those).
It has struck me that there are four competing design goals that a language creator (or indeed, a field linguist) needs to consider when creating a romanization scheme. I will do my best to explain them:
- Elegance: One of my priorities is to have as elegant a romanization scheme as possible. This means trying my best to keep to a ratio of one grapheme per phoneme, minimizing the number of digraphs and diacritics, and overall making the romanization as simple as possible while expressing all the necessary information. Certain aspects of your language's phonology can affect just how elegant your romanization can be. For instance, if you have a large vowel inventory, you will have to resort to digraphs or diacritics. If you have a three-way voiced-voiceless-aspirated distinction, you are probably going to have to use digraphs for one part of that, and if you make any significant use of tone, you are almost certain to use diacritics. This is also the pressure that militates against apostrophes that have no phonetic use. Ultimately, an elegant romanization will have as few graphemes as possible while still leaving the phonemes of any given word explicit and unambiguous.
- Accessibility: If you want your conlang to be appreciated by people who are not linguistically savvy (an uphill battle at the start), or to use it in a context where non-linguists will need to read the words, such as in fiction, then your romanization needs to be accessible. This means that the graphemes you use should be easily understood by speakers of the target audience's language. For instance, an English-speaking audience should fairly easily understand that <kh> represents /x/ or something like it, and will be less likely to make a mistake than if you use <ch> or <x>. However, for a Spanish-speaking audience, <j> is an even better choice, as it is used in Spanish exclusively for /x/. Accessible romanizations, like elegant romanizations, will try to reduce ambiguity, but for accessibility one needs to consider not only ambiguity among the language's own phonemes but also ambiguity with the target audience's language. Thus, languages that use <c> for /k/ in all positions lose some accessibility with an English-speaking audience (though Welsh speakers would have no problem). I should note that accessibility need not militate toward giving readers the correct native pronunciation, which is often impossible to convey purely through orthography (how do you tell an English speaker there is an ejective in a word without some explanation?). Readers merely need to be able to produce a passable approximation, or an appropriate Anglicization, Hispanicization, etc., particularly where proper names are concerned. How often do you hear a news announcer pronounce a foreign name in a non-Anglicized manner? How about when those names are not Spanish or French in origin?
- Aesthetics: Many language creators apply their own artistic preferences when designing an orthography. For instance, someone may simply dislike the letter <y> and use <j> or <i> for all instances of /j/ for no other reason. In my experience, aesthetic considerations are among the most frequent reasons for language creators to make odd choices in romanization. Why else would Teonaht use <ht> for /θ/, if not for an odd aesthetic preference on the part of the author? And since artistic preferences are all over the map, a priority placed on aesthetics can lead to some pretty strange orthographies.
- History: This is not actual history, but world-internal history. Some conlangers derive their languages from real-world languages written in the Latin alphabet, and thus understandably derive their spellings from those real-world spellings. Others develop complex histories for their languages, and so may make certain choices based on spellings that would have made sense in earlier forms of the language, particularly when such choices jibe with the native script. This seems much less common in constructed languages than in the real world, though part of that may come from the fact that many real-world romanization schemes were actually created at an earlier stage of the language (think of the Postal Map romanization of Chinese, which uses <k> for both /k/ and /tç/ because the sound change that produced /tç/ was still in progress when the romanization was devised).
The above design goals are by no means the only factors involved in creating a romanization. Obviously, the phonology of a language is a key factor. As I mentioned above, many phonological choices can severely limit how elegant you can make your romanization, and they can also limit how accessible it can be. Certain phonological features might be treated differently depending on priorities, however. For instance, a conflict between elegance and accessibility to English speakers seems to be the reason some romanizations of Japanese represent /si/ as <si> and others write it as <shi> (though differing opinions on how to analyze Japanese [ʃi] may also come into play; romanizing natlangs is so much more complicated).
Think of a language with heavy lenition. A conlanger who prioritized elegance would likely represent the lenited consonants the same as the underlying phonemes in all cases. Someone concerned with accessibility would probably represent the various lenited forms differently from the underlying phonemes. Someone interested in aesthetics would choose whatever they felt looked better, perhaps even creating a deliberately obtuse system for denoting lenited forms because they felt like it. And the historical conlanger might decide to represent them according to the older forms, perhaps from before the sound changes leading to lenition occurred, thus producing something similar to the scheme used by the elegant conlanger.
Some language creators apply different design priorities in different areas. For instance, Tolkien bowed to aesthetics over accessibility when he chose to use <c> for /k/ in nearly all positions in his Elven languages, as any fan who has mistakenly pronounced Celeborn as /sɛlɛbɔ˞n/ and been corrected knows painfully well. But he explicitly introduced the diaeresis for reasons of accessibility, saying it was there to disambiguate vowels that English speakers might otherwise read as part of a digraph, as part of a diphthong rather than a separate vowel, or as silent, such as <e> at the end of a word after a consonant. (How successful he was is hard to say, given that English speakers often ignore diacritics.) I doubt that anyone could really be described as relying purely on one design parameter. Even someone who cares only about aesthetics might need some way to break a tie between two graphemes they like equally for a given sound.
My own preferences hew toward prioritizing elegance and accessibility, with English speakers as my target audience. Thus, I try to represent as many phonemes as possible with a single letter, never use <c> for /k/, use <'> only for the glottal stop, etc. As for the lenition example above, I would represent lenited consonants as their underlying form, except where the lenited forms also exist as phonemes in the language, in which case I would write those phones with the letter for the corresponding phoneme. Thus, I strike a balance between elegance and accessibility. I don't necessarily advocate that position: I cared much more about aesthetics and very little about elegance when I started conlanging, and I don't find a particular problem with that approach, despite my negative feelings toward <c> for /k/. I hope that people who read this might simply use it to better understand other people's romanization choices, or even as a way to think about their own, since, in my opinion, mindful art is often better art. And romanization really is an art, particularly in the world of conlanging.
EDIT: I made an error in the previous version of this post, claiming that Wade-Giles uses <k> for /tç/. In fact, it uses <ch>, making it more or less up to date. If anything, Wade-Giles is simply less elegant than modern pinyin (with some attempt at accessibility, though it's difficult to make a Chinese romanization truly accessible).