Phonetic Writing Systems
l | s | I |
i | o | t |
k | m |   |
e | e | l |
  | t | o |
t | h | o |
h | i | k |
i | n | s |
s | g |   |
. |   |   |
Since this page is about Japanese characters, you're going to need a Japanese font, such as MS Mincho, and a compatible browser to get much out of it beyond the Romaji section. Recent versions of Mozilla, Opera, and IE all display this page properly (at least for me) but I can't say anything with certainty about other browsers or earlier versions.
Before getting into the character sets used in Japanese, know that Japanese may be written horizontally or vertically. Horizontal writing is borrowed from the West and, as such, is read in rows, each row read left to right, starting with the topmost row and moving down (like this text). Vertical writing, the traditional Japanese form, is read in columns, each column read top to bottom, starting with the rightmost column and moving left, as shown in the demonstration to the right.
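For the programmatically inclined, the column layout described above can be mimicked in a few lines of Python. This is only a rough sketch: the function name `vertical_layout` and the fixed column height are assumptions made for illustration, and it ignores details of real vertical typesetting such as marks that change shape when written vertically.

```python
def vertical_layout(text: str, height: int) -> str:
    """Arrange text in top-to-bottom columns, first column on the right."""
    # Cut the text into columns of `height` characters, padding the last one.
    columns = [text[i:i + height].ljust(height)
               for i in range(0, len(text), height)]
    # The first column must end up rightmost, so reverse the column order.
    columns.reverse()
    # Each printed row takes one character from every column.
    rows = [" ".join(col[r] for col in columns).rstrip()
            for r in range(height)]
    return "\n".join(rows)


print(vertical_layout("It looks something like this.", 10))
```

Reading the printed result column by column, starting from the right, gives back the original sentence.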
In any case, Japanese uses four different character sets. Here are three of them, in order of what is likely to be increasing foreignness from the perspective of the average Westerner. The fourth, kanji, is on its own page.
ローマ字 (ローマじ) Romaji
This one should be nothing new. It's just the Roman alphabet (the one English uses). It's rarely used in written Japanese, though it does show up occasionally. Though sometimes it appears for impact, or because ASCII tends to be less trouble for computers, the main use seems to be in providing sort of an intermediate level between Western languages like English and standard Japanese. This makes it very useful for those beginning to learn the language. There are a few things to watch out for, though. Just because it looks English doesn't mean it's pronounced like English (the vowels, at least, are closer to Spanish). Here are all the important pronunciation points that I can think of for now:
Vowels
- 'a' (short a): Similar to the 'a' in English "father"
- 'aa' (long a): Same sound as 'a' but lasts longer.
- 'i' (short i): Similar to English long 'e', as in "beech", also similar to English short i, as in "ribbon". Additionally, when combined with a voiceless consonant (k, s, t, h, p) and followed by another voiceless consonant or (to a lesser extent) the end of a word, it tends to be weakly pronounced, so, for example, "ashita" tends to sound more like "ashta".
- 'ii' (long i): Same sound as 'i' but lasts longer. Closer to English long 'e'.
- 'u' (short u): Similar to 'oo' in English "boot", but not in "foot". Additionally, when combined with a voiceless consonant (k, s, t, h, p) and followed by another voiceless consonant or nothing, it tends to be very weakly pronounced, so, for example, "desu" tends to sound more like "des" (except when the pronunciation is exaggerated).
- 'uu' (long u): Same sound as 'u' but lasts longer.
- 'e' (short e): Similar to English long a, as in "cane", also similar to English short e, as in "pet".
- 'ee' (long e): Same sound as 'e' but lasts longer.
- 'ei' : 'e' + 'i', virtually the same sound as "ee" (even native speakers can't always tell the difference). Classes and textbooks have told me that there is no difference, but listening closely to actual pronunciation, particularly in music, has convinced me that they are not identical.
- 'o' (short o): similar to English long o, as in "open"
- 'oo' or 'ou' (long o): same sound as 'o' but lasts longer. The difference between 'oo' and 'ou' may reflect either the kana spelling or the preference of whoever Romanized it, and has absolutely no effect on pronunciation in modern Japanese.
As you have likely noticed, a long vowel in Japanese (and in most European languages, for that matter) has the same sound as the short vowel but is voiced for a longer period of time. English oddities aside, there's a reason long vowels are called that.
Consonants
- 'k','z','t','d','p','m': Much like their English equivalents
- 'g': Like English g in "goat", but not in "gym"
- 's': Similar to English s, but never hissed or pronounced as 'z'
- 'sh': Similar to English, though 'sha' may sound more like 'sya', and so on for the other vowels. This is really the same consonant as 's', but the subtle difference from English 's' is more conspicuous when combined with 'i'.
- 'j': Sort of a cross between English 'z' and 'j' (at least that's my impression of it), though 'ja' is sort of a cross between 'zya' and 'jya', and so on for the other vowels. As indicated by the kana (see below), this is the "voiced" counterpart to 'sh', so also has a similarity to that sound. This is really the same consonant as 'z', but the subtle difference from English 'z' is more conspicuous when combined with 'i'.
- 'ch': Similar to English, though 'cha' is sort of a 'tya' sound, and so on for the other vowels. This is really the same consonant as 't', but the subtle difference from English 't' is more conspicuous when combined with 'i'.
- 'ts': More or less like in English "ants", for example. This is really the same consonant as 't', but the subtle difference from English 't' is more conspicuous when combined with 'u'. If it sounds like an 's', you're saying it wrong. "Tsunami" is not "sunami".
- 'dj', 'dz': Pronounced essentially the same as 'j' and 'z'. The spelling difference, when used, reflects the kana spelling and has little if any discernable effect on pronunciation in modern Japanese. It's not uncommon for Romanizations to use 'j' and 'z' even when the 'dji' and 'dzu' kana are used. These are really the same consonant as 'd', but the subtle difference from English 'd' is more conspicuous when combined with 'i' or 'u'.
- 'n': When combined with a vowel, essentially the same as English 'n'. Otherwise (when followed by a consonant, at the end of a word, or written with an apostrophe after it—this is the "syllabic n") it's still more or less an 'n', but may sound more like a nasalization of the previous vowel when at the end of a word or followed by a vowel, an 'm' when followed by 'm', 'p', or 'b', or 'ng' as in English "song" when followed by 'k' or 'g'. The variation is a fairly natural result of the surrounding sounds, and isn't worth worrying about.
- 'h': Similar to English h, but sounds more like an 'f' in 'hu'/'fu' (alternate Romanizations of the same character), since it's not quite the same as English 'h'. However, the particles は (wa) and へ (e) (check the grammar section for more on those) are sometimes Romanized as 'ha' and 'he', respectively, because they're written using those kana, probably for historical reasons. I don't much like that rendition, since it runs counter to the pronunciation, and one of the primary purposes of Romaji is to aid pronunciation.
- 'f': Like a cross between English 'h' and 'f' in 'hu'/'fu' (see 'h', above). It's a bit more 'f'-like when followed by a different vowel, which is uncommon and occurs only in borrowed words.
- 'b': Similar to English 'b', though it may also have a bit of English 'v' to it
- 'y': Always pronounced as a consonant as in English "yodel", never as a vowel as in "baby".
- 'l', 'r': This is the really fun one. It's a lot like a cross between English 'r' and 'l' with a bit of 'd' thrown in for good measure. You know how you press your tongue to the roof of your mouth behind your teeth to make an 'l' sound, but not for an 'r', even though they're basically the same otherwise? Try tapping your tongue on the top of your mouth, maybe a bit further back, for an instant while making either sound... that's about as well as I can describe how to do it. I've always thought it sounds more like English 'l' than 'r', but it's most often Romanized as 'r'. How much it sounds like either 'l' or 'r' also depends both on the surrounding sounds and on the speaker. I've heard some singers that pronounce it so much like an English 'l' I can't tell the difference, while others make it more 'r'-like.
- 'w': Only two characters in modern Japanese use this consonant. In 'wa', it's much like in English, but in 'wo', it's less pronounced and sounds almost, but not quite, the same as 'o'. Borrowed words may use nonstandard combinations of kana to come up with 'wu's and 'we's and such; feel free to pronounce these as written.
- Doubled consonants: In theory, the doubled consonant is held longer. This works fine for sounds like 's' that can be prolonged, but for sounds like 'k', the net effect is that the second consonant is pronounced and the first acts more as a pause, with the preceding vowel cut off abruptly. This effectively strengthens the consonant sound. In any case, the consonant is not actually said twice.
Miscellany
- Katakana is often Romanized in all capitals while hiragana and kanji are usually assigned lowercase. This preserves the emphasis that katakana usually represents (see the katakana section for more).
- Japanese is based on syllables, though linguists insist that they're morae, not syllables, because of some obscure difference between the two terms. Regardless, the point is that each syllable, or mora if you prefer, is pronounced for (roughly) the same amount of time when said correctly (at least officially; there are of course variations in actual usage, such as when someone elongates part of a word for emphasis). I could try explaining how you can pick out the morae from a word, but it's easier to just check a kana chart (like those below) to see what they are.
- Japanese has a pitch accent rather than a stress accent, which basically means that, instead of one syllable being pronounced louder/longer as in English, each mora is said with essentially the same volume and duration, with some pronounced at a higher or lower pitch than others. Which morae have which pitch apparently varies regionally and rarely makes enough difference to matter much (unlike in Chinese, which I've heard places critical importance on pitch). Pitch accents warrant only a single paragraph in my Japanese textbook, and at worst, fouling up pitches in Japanese will make your speech sound somewhat awkward and unnatural, not mangle the meaning beyond all recognition.
- The best way to learn to pronounce Japanese is to listen to it. Audio resources are available in numerous places on the Internet and probably in libraries. If you happen to know a native Japanese speaker, or even an experienced non-native, even better.
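If you'd rather let a computer pick out the morae, here is a minimal sketch that works on kana rather than Romaji. It assumes that every kana counts as one mora except the small vowel and y-kana, which merge with the kana before them; the name `split_morae` and the character set `SMALL_COMBINING` are made up for this example.

```python
# Small kana that combine with the preceding kana into a single mora.
# The small tsu, the syllabic n, and the long vowel mark are deliberately
# left out: each of those takes up a mora of its own.
SMALL_COMBINING = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def split_morae(kana: str) -> list[str]:
    """Split a kana string into morae (a rough approximation)."""
    morae: list[str] = []
    for ch in kana:
        if ch in SMALL_COMBINING and morae:
            morae[-1] += ch      # attach to the previous kana
        else:
            morae.append(ch)     # start a new mora
    return morae


print(split_morae("きょう"))              # ['きょ', 'う']
print(len(split_morae("アイスクリーム")))  # 7
```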
Because different people think differently, there are several different Romanization schemes. Several official ones, even. I cover those differences and my personal preferences in the section on hiragana.
片仮名 (かたかな) Katakana
This character set is primarily used to write words borrowed from other languages. The top two languages borrowed from are English and Portuguese (not counting Chinese, which usually falls under kanji). However, just because you know an English word that Japanese borrowed doesn't mean you'll be able to pick it out. Since the sounds don't match exactly, words usually have to be adapted to fit the kana available, like ice cream → アイスクリーム (AISU KURIIMU); try saying it out loud, keeping in mind the way Romaji is pronounced. And since there are hardly any redundant sounds in Japanese, homonyms and near-homonyms from other languages end up with the same kana (like "race" and "lace", both レース).
Katakana is additionally used for emphasis, scientific names, sound effects, and possibly other purposes that I haven't come across yet or can't think of at the moment, so don't assume that all words in katakana must automatically be borrowed. It's sort of like the italics of Japanese.
Anyway, here's the standard katakana chart and some extended characters (actually variations of the standard in most cases), with my preferred Romanization (more on that a bit later). The kana invented to better accommodate foreign words are relatively recent and therefore less common, and often not completely standardized. However, I have seen many of them at least occasionally in actual usage.
Standard chart | ||||
---|---|---|---|---|
ア a | イ i | ウ u | エ e | オ o |
カ ka | キ ki | ク ku | ケ ke | コ ko |
サ sa | シ shi | ス su | セ se | ソ so |
タ ta | チ chi | ツ tsu | テ te | ト to |
ナ na | ニ ni | ヌ nu | ネ ne | ノ no |
ハ ha | ヒ hi | フ fu | ヘ he | ホ ho |
マ ma | ミ mi | ム mu | メ me | モ mo |
ヤ ya | | ユ yu | | ヨ yo |
ラ ra | リ ri | ル ru | レ re | ロ ro |
ワ wa | ヰ wi | | ヱ we | ヲ wo |
ン n or n' | ||||
Other morae | ||||
---|---|---|---|---|
ガ ga | ギ gi | グ gu | ゲ ge | ゴ go |
ザ za | ジ ji | ズ zu | ゼ ze | ゾ zo |
ダ da | ヂ dji | ヅ dzu | デ de | ド do |
バ ba | ビ bi | ブ bu | ベ be | ボ bo |
パ pa | ピ pi | プ pu | ペ pe | ポ po |
ー (long vowel mark) | ||||
ッ (gemination mark) | ||||
2-character morae | ||
---|---|---|
キャ kya | キュ kyu | キョ kyo |
ギャ gya | ギュ gyu | ギョ gyo |
シャ sha | シュ shu | ショ sho |
ジャ ja | ジュ ju | ジョ jo |
チャ cha | チュ chu | チョ cho |
ヂャ dja | ヂュ dju | ヂョ djo |
ニャ nya | ニュ nyu | ニョ nyo |
ヒャ hya | ヒュ hyu | ヒョ hyo |
ビャ bya | ビュ byu | ビョ byo |
ピャ pya | ピュ pyu | ピョ pyo |
ミャ mya | ミュ myu | ミョ myo |
リャ rya | リュ ryu | リョ ryo |
Invented morae | ||||
---|---|---|---|---|
ヴァ va | ヴィ vi | ヴ vu | ヴェ ve | ヴォ vo |
クァ kwa | クィ kwi | クェ kwe | クォ kwo |
グァ gwa | グィ gwi | グェ gwe | グォ gwo |
キェ kye | ギェ gye |
スィ si | ズィ zi |
シェ she | ジェ je |
ツァ tsa | ツィ tsi | ツェ tse | ツォ tso |
ティ ti | ディ di |
トゥ or テュ tu | ドゥ or デュ du |
チェ che |
ニェ nye |
ファ fa | フィ fi | フェ fe | フォ fo |
フャ fya | フュ fyu | フョ fyo |
ヒェ hye |
ビェ bye | ピェ pye |
ミェ mye |
リェ rye |
ウィ wi | ウェ we | ウォ wo |
- Note that the ァ, ィ, ゥ, ェ, ォ, ャ, ュ, and ョ used in combinations are written the same way as the full-sized characters ア, イ, ウ, エ, オ, ヤ, ユ, and ヨ, but smaller. The double consonant character ッ is similarly a smaller version of the character ツ.
- Though the character ヶ appears to be a small ケ (and is typically input to computers as though it were), it's not actually a kana at all, but shorthand for the kanji 箇 or 个 and usually pronounced か (ka), が (ga), or こ (ko). There's also a ヵ character, which can be used in its place when the pronunciation is か (ka), though apparently purists are against it.
- The characters ヰ (wi) and ヱ (we) are not used in modern Japanese. Borrowed words use ウィ and ウェ instead.
- ヲ is only used when the particle "wo" is written in katakana instead of hiragana. Borrowed words use ウォ instead.
- ー, called the Katakana-Hiragana Prolonged Sound Mark in Unicode and 長音符 (chouonpu, literally long vowel mark) in Japanese, is normally used to indicate long vowels in katakana, though words that are normally written in hiragana or kanji tend to use vowels instead. Thus, キー has a long 'i' sound and is Romanized as KII. When Japanese is written vertically, the ー character becomes a vertical mark. The ー is not the same as the dash ―.
- ッ, officially called the 促音 (sokuon, literal meaning similar to "urge sound") and often referred to descriptively as the 小さい「つ」 (chiisai tsu = small 'tsu'), more or less extends the following consonant sound backward the way ー extends the preceding vowel sound forward (the technical term for this is gemination). Many consonants don't extend well, though, so it ends up being more like a pause much of the time. Additionally, when an utterance ends with a ッ, there is no consonant to extend. In these cases, it indicates an abrupt cutoff of the sound before it (a glottal stop). Finally, a double 'n' is written with a ン, not a ッ.
- As if there weren't enough nonstandard kana already, written sound effects and similar cases may make up even more. ア゛ーーッ! could be a strangled scream, for instance.
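To make the chart and the notes on ー and ッ concrete, here is a small, unofficial sketch of a katakana-to-Romaji converter. It covers only the standard chart, the voiced rows, and the 2-character morae (none of the invented morae), writes ン as a plain 'n', and does not insert spaces between words. The names `ROWS`, `KANA`, and `romanize` are invented for this example, and the choice to render ー after 'o' as 'OU' and otherwise as a repeated vowel is an assumption based on the sample Romanizations used on this page.

```python
VOWELS = "aiueo"
ROWS = {
    "": "アイウエオ", "k": "カキクケコ", "s": "サシスセソ", "t": "タチツテト",
    "n": "ナニヌネノ", "h": "ハヒフヘホ", "m": "マミムメモ", "r": "ラリルレロ",
    "g": "ガギグゲゴ", "z": "ザジズゼゾ", "d": "ダヂヅデド", "b": "バビブベボ",
    "p": "パピプペポ",
}
# Regular consonant + vowel readings, then the irregular ones on top.
KANA = {kana: cons + v for cons, row in ROWS.items()
        for kana, v in zip(row, VOWELS)}
KANA.update({"シ": "shi", "チ": "chi", "ツ": "tsu", "フ": "fu",
             "ジ": "ji", "ヂ": "dji", "ヅ": "dzu",
             "ヤ": "ya", "ユ": "yu", "ヨ": "yo",
             "ワ": "wa", "ヲ": "wo", "ン": "n"})
SMALL_Y = {"ャ": "a", "ュ": "u", "ョ": "o"}


def romanize(katakana: str) -> str:
    """Romanize katakana in all capitals (sketch; see assumptions above)."""
    out: list[str] = []
    geminate = False
    for ch in katakana:
        if ch == "ッ":                      # doubles the next consonant
            geminate = True
        elif ch == "ー":                    # lengthens the previous vowel
            last = out[-1][-1] if out else ""
            if last in VOWELS and last:
                out.append("u" if last == "o" else last)
        elif ch in SMALL_Y and out:         # キ + ャ -> kya, シ + ャ -> sha
            stem = out[-1][:-1]             # drop the trailing 'i'
            glide = "" if stem.endswith(("sh", "ch", "j")) else "y"
            out[-1] = stem + glide + SMALL_Y[ch]
        else:
            syllable = KANA.get(ch, ch)     # unknown characters pass through
            if geminate:
                syllable = syllable[0] + syllable
                geminate = False
            out.append(syllable)
    return "".join(out).upper()


print(romanize("キー"))      # KII
print(romanize("ヒット"))    # HITTO
print(romanize("シューズ"))  # SHUUZU
```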
Converting from other languages
What makes katakana so interesting and useful even if you don't know a word of Japanese is that, as explained above, it's usually used to write words that aren't Japanese in origin. Especially in recent years, more katakana words are borrowed from English than from any other language, and video games (just to give an example) frequently give English names to items, skills, etc. If you know katakana and understand how words tend to be adapted, you stand a good chance of being able to figure out the original word. Here are some of the conventions generally used to convert English (specifically, though much of this applies to other languages as well) words to katakana.
- English short vowels are often unchanged, in the sense that the Romanization has the same letter for it as the original English.
  - memo → メモ (MEMO)
  - opera → オペラ (OPERA)
  - pajamas → パジャマ (PAJAMA)
- Other vowel sounds tend to come out as whatever sounds the closest to the source word. The most notable is that English long 'i' is equivalent to Japanese 'a'+'i'.
  - queen → クイーン (KUIIN)
  - science → サイエンス (SAIENSU)
  - blade → ブレイド (BUREIDO)
  - lightning → ライトニング (RAITONINGU)
- More often than not, pronunciation is what matters, not spelling. However, some words treat the spelling as Romaji and go from there, which usually distorts the pronunciation significantly. Since the kana-ization rules change, and are not universally agreed on, some words have several katakana versions.
  - aura → オーラ (OURA) (common, based on pronunciation) or アウラ (AURA) (uncommon, based on spelling)
  - pizza → ピザ (PIZA) (common), ピッツァ (PITTSA) (somewhat common), or ピッツア (PITTSUA) (uncommon)
- The 's' in words that are typically used in the plural is often dropped (as Japanese generally ignores the concept of plural), but may be kept instead. Whatever works, I guess.
  - pajamas → パジャマ (PAJAMA)
  - shoes → シューズ (SHUUZU)
  - sports → スポーツ (SUPOUTSU)
- As you may have noticed, there are numerous combinations of consonants that simply aren't possible in Japanese. Most of the time, the problem of having too many consonants in one place is solved by adding the fairly weak vowel 'u' as needed. 't' and 'd' usually become ト and ド in these cases to avoid 'tsu' and 'dzu', and 'n' (and sometimes 'm' as well) becomes ン (ヌ is almost never used). This pattern also applies when a word ends in a consonant and when a vowel is silent in the English. Note that no vowel is added where none is needed.
  - mint → ミント (MINTO)
  - McDonald's → マクドナルド (MAKU DONARUDO)
  - instant → インスタント (INSUTANTO)
  - knife → ナイフ (NAIFU)
  - computer → コンピューター (KONPYUUTAA)
  Sample exceptions:
  - sport → スポーツ (SUPOUTSU), not スポート (SUPOUTO)
  - salad → サラダ (SARADA), not サラド (SARADO)
- Sometimes, consonants are doubled in Japanese when these extra vowels are added. I'm not sure exactly how to tell when this will happen, but it seems common with ending 't' and 'd' sounds (unless they come after ン) and when the vowel is too prominent (I know, that's entirely too subjective). There might be a more precise rule, but I doubt it. In any case, here are a few....
  - apple → アップル (APPURU)
  - hit → ヒット (HITTO)
- L and R sounds both become 'r'.
  - delta → デルタ (DERUTA)
  - wrist → リスト (RISUTO)
  The exception is that vowel+'r' combinations (in "car", "oar", etc.) are usually treated as vowel sounds. 'ar', 'er', 'ir', and 'ur' sounds usually become a long 'a', and 'or' usually becomes a long 'o'.
  - car → カー (KAA)
  - bluebird → ブルーバード (BURUUBAADO)
  - corn → コーン (KOUN)
- Using ヴ for 'v' is a recent concept, and fairly uncommon. Most words with a 'v' sound use 'B' instead, especially if they've been around for a while.
  - video → ビデオ (BIDEO)
  - drive → ドライブ (DORAIBU)
- Japanese has no 'si' sound, so シ is used for both 'shi' and 'si'. スィ may be used occasionally but is very uncommon.
  - simple → シンプル (SHINPURU)
  - cinnamon → シナモン (SHINAMON)
  - shield → シールド (SHIIRUDO)
- Japanese has no equivalent for either pronunciation of 'th'. The soft 'th' as in "thought" and "bath" generally becomes 's', while the hard 'th' found in "this" and "that" tends to become 'z'.
  - thunderbird → サンダーバード (SANDAABAADO)
  - rhythm → リズム (RIZUMU)
- Words may be abbreviated, especially in popular names, and particularly in technology and video games.
  - American football → アメフト (AMEFUTO)
  - upload, update → アップ (APPU)
  - pocket monster → ポケモン (POKEMON)
Reverting to other languages
Since some tweaking goes on, it's understandable that it can be difficult to decipher a borrowed word, particularly on unusual borrows such as those often found in video games. Here are some common points of confusion.
- Added vowels: Since many words need to add vowels when borrowed, any given short 'u' (or 'o' after 't' or 'd') may or may not be from the original. It helps to check against the possibilities and see what makes the most sense in context.
- Ambiguous consonants: Since 'l' and 'r' both become 'R', 's' and soft 'th' both become 'S', 'z' and hard 'th' both become 'Z', 'b' and (usually) 'v' both become 'B', and 'si' and 'shi' both become 'SHI', it's ambiguous which consonant is appropriate in these cases. Again, it helps to check and see what makes sense. The translators for "Lufia 2" apparently didn't do this (though I enjoyed the game anyway) and came up with monsters like the "Iron gorem" (should be "Iron golem") and "Asashin" (should be "Assassin").
- Vowel sounds in general: This can get hideous in translations. Is that long 'A' supposed to be 'ar', 'er', 'ir', 'ur', just an extended 'a', or none of them? Is this long 'O' a long 'o', an 'or', or something else? When the party encounters monsters called オーク, are they oaks or orcs? What do you do with vowel sounds that people are likely to mispronounce no matter how you Romanize them?
- English words that sound the same but have different meanings, especially when the spellings are also different, only make things worse. Should ベア be "bear" or "bare"?
- Mix and match for more confusion. Is ロード "load", "lode", "lord", "road", or "rode"?
- All this gets even worse when something needs to be written "in English" that, like many character and place names, isn't derived from any existing word. Here are just a few that have been argued about: Is クレス (Tales of Phantasia) Cless or Cress? In FF7, is エアリス Aeris or Aerith? What do you do with クルル, from FF5 (I've seen Cara, Krile, and the plain Romanization Kururu)?
- Since Japanese rarely uses spaces, one chunk of katakana may actually be two or more words. As just one example, this seems to be the cause of a typo in the Wild Arms 3 manual that reads "forcibility" where it clearly should say "force ability" (top of page 32 if anyone's curious), and this is even though it correctly says "force ability" further down the page.
- As mentioned above, borrowed words are often shortened or otherwise modified. While they usually aren't that hard to figure out, like ファミコン being a fami(ly) com(puter), i.e., video game system, some borrowed words are counterintuitive from an English point of view. For example, パンツ isn't "pants" like you might expect, it's (usually) actually underpants. ズボン (trousers, from the French "jupon"), ジーンズ (jeans), and トレーニングパンツ (sweatpants, from "training pants") are better choices when talking about pants in Japan. Another confusing example is that while マンション looks like it should mean "mansion", and even comes from the word, it actually refers to an apartment.
平仮名 (ひらがな) Hiragana
This is the most commonly used phonetic character set in Japanese writing. Any Japanese word can be written using only hiragana. Hiragana represent the same sounds as katakana, but the sounds added to better fit borrowed words don't normally apply to hiragana, which is almost never used for borrowed words. The only situation I can think of that a borrowed word would be written in hiragana is if it needs special emphasis for some reason, and while this does sometimes occur, it's uncommon. So here's the hiragana chart.
Standard chart | ||||
---|---|---|---|---|
あ a | い i | う u | え e | お o |
か ka | き ki | く ku | け ke | こ ko |
さ sa | し shi | す su | せ se | そ so |
た ta | ち chi | つ tsu | て te | と to |
な na | に ni | ぬ nu | ね ne | の no |
は ha | ひ hi | ふ fu | へ he | ほ ho |
ま ma | み mi | む mu | め me | も mo |
や ya | | ゆ yu | | よ yo |
ら ra | り ri | る ru | れ re | ろ ro |
わ wa | ゐ wi | | ゑ we | を wo |
ん n or n' | ||||
Other morae | ||||
---|---|---|---|---|
が ga | ぎ gi | ぐ gu | げ ge | ご go |
ざ za | じ ji | ず zu | ぜ ze | ぞ zo |
だ da | ぢ dji | づ dzu | で de | ど do |
ば ba | び bi | ぶ bu | べ be | ぼ bo |
ぱ pa | ぴ pi | ぷ pu | ぺ pe | ぽ po |
っ (gemination mark) | ||||
2-character morae | ||
---|---|---|
きゃ kya | きゅ kyu | きょ kyo |
ぎゃ gya | ぎゅ gyu | ぎょ gyo |
しゃ sha | しゅ shu | しょ sho |
じゃ ja | じゅ ju | じょ jo |
ちゃ cha | ちゅ chu | ちょ cho |
ぢゃ dja | ぢゅ dju | ぢょ djo |
にゃ nya | にゅ nyu | にょ nyo |
ひゃ hya | ひゅ hyu | ひょ hyo |
びゃ bya | びゅ byu | びょ byo |
ぴゃ pya | ぴゅ pyu | ぴょ pyo |
みゃ mya | みゅ myu | みょ myo |
りゃ rya | りゅ ryu | りょ ryo |
- As in katakana, the small characters ゃ, ゅ, ょ, and っ are written just like the larger equivalents, except for their size.
- The ー is occasionally used to indicate long vowels in hiragana, but long vowels are normally indicated, unsurprisingly, by adding another of the vowel that is to be lengthened. The exception is that a long 'o' is usually written using う, though some words use お because of the kanji involved. Also, an 'e' followed by an 'i' is very nearly the same as a long 'e', but not quite identical no matter how much my references insist otherwise.
- The characters ゐ (wi) and ゑ (we) are not used in modern Japanese.
Voiced, Unvoiced, and Semi-Voiced
Those funny little marks:
By now you've probably noticed that many of the basic kana have other kana that look the same except for a few little marks in the corner. There's a reason for that. The consonants 'k', 's', 't', and 'h' are what linguists call "unvoiced" or "voiceless" consonants, which means that they are pronounced without the use of the vocal cords. Adding the mark ゛, called the 濁点 (dakuten) or informally the てんてん (ten ten = dot dot), to kana with these consonants produces the equivalent "voiced" consonants 'g', 'z', 'd', and 'b'. As you may have guessed, voiced consonants are those that require use of the vocal cords to pronounce. Additionally, kana with the 'h' consonant may also take the mark ゜, called the 半濁点 (handakuten) or informally the まる (maru = circle), to produce the 'p', a "half-voiced" consonant.
There are also several uses of the dakuten that don't quite fit the normal usage. The katakana ウ (u) may appear with a dakuten as ヴ to represent a 'vu' sound, though the 'b' consonant is used for 'v' just as often. In addition, kana that cannot normally have a dakuten may be written with one when indicating abnormal or distorted noises similar to the base kana. For instance, あ゛ seems to be fairly popular for rendering strangled shouts.
It seems that linguists also use the handakuten on 'k' kana to represent an 'ng' sound, but I've never seen it personally. Anyway, 'ngu' would look like く゜, for example.
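Unicode happens to model these marks in much the same way: a precomposed kana such as が decomposes under normalization form NFD into its base kana plus a combining dakuten or handakuten. The following is a minimal sketch built on Python's standard `unicodedata` module; the function name `split_voicing` and the labels it returns are made up for this example.

```python
import unicodedata

VOICED_MARKS = {"\u3099", "\u309b"}       # combining and standalone dakuten
SEMI_VOICED_MARKS = {"\u309a", "\u309c"}  # combining and standalone handakuten


def split_voicing(kana: str) -> tuple[str, str]:
    """Return (base kana, mark type) for one kana, optionally followed by a mark."""
    # NFD splits e.g. が into か + U+3099; standalone marks are left as they are.
    decomposed = unicodedata.normalize("NFD", kana)
    base, marks = decomposed[0], set(decomposed[1:])
    if marks & VOICED_MARKS:
        return base, "voiced"
    if marks & SEMI_VOICED_MARKS:
        return base, "semi-voiced"
    return base, "plain"


print(split_voicing("が"))  # ('か', 'voiced')
print(split_voicing("ぱ"))  # ('は', 'semi-voiced')
print(split_voicing("ヴ"))  # ('ウ', 'voiced')
```

The "plain" label is used instead of "unvoiced" because kana such as な never take a mark even though their consonants are voiced.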
Sorting
The basics:
The usual ordering is called 五十音順 (gojuu on jun = 50-sound order) after the kana table (which originally contained 50 sounds rather than the modern 45), or あいうえお順 (aiueo order) after the first row of kana, much as English alphabetical order is also called the ABC order.
Plain hiragana follow the order of the standard kana chart: あいうえおかきくけこさしすせそたちつてとなにぬねのはひふへほまみむめもやゆよらりるれろわゐゑを. This much is fully standardized. ん doesn't exactly fit into the standard chart, but typically comes after を.
The kana は (ha) and へ (he) are considered to be the same even when used as particles and pronounced as 'wa' and 'e', respectively.
Except for tiebreaking purposes, all variants of a kana are treated as the same character. Specifically, a hiragana character and the equivalent katakana character are considered the same, unvoiced (は) and voiced (ば) and semivoiced (ぱ) kana are considered the same, and normal-sized (つ) and reduced-sized (っ) kana are considered the same. This is somewhat similar to upper-case and lower-case English letters being considered the same except for tiebreaking purposes, if more complicated.
The ヴ character invented to handle 'v' sounds in foreign words is typically handled as a "voiced" ウ, if only because that's what it looks like. Some instead treat ヴァ as a variant of バ, etc., but while this has the advantage of placing very similar sounds together, it breaks with the usual method of handling each kana separately.
As in English, [end of term] comes before any character. In other words, shorter terms come before longer ones that start out the same, and 'same' in this case means the same base kana, ignoring any variants. To give concrete examples, くろ comes before ぐろう or クロウ, each of which comes before クロウチ. This is much like in English sorting, where "an" comes before "ant", which comes before "antihero".
Kanji have no effect on ordering, in the sense that the kanji themselves do not matter, except when the kanji themselves are being sorted, rather than terms. Kanji terms are sorted by their reading, the way they would appear if written in kana.
Tiebreakers and other tricky stuff:
As noted previously, hiragana and katakana, unvoiced, voiced, and semi-voiced kana, and regular and small kana are all considered equivalent when not directly competing, and the ー complicates things further. So what happens if two items are identical except for one of these equivalent characters? This is where the tiebreaking comes into play. Unfortunately, the system for doing so appears to be somewhat less than universal.
- Unvoiced kana come before voiced kana. Semi-voiced (p-row) kana come after both. This is a standard rule.
  - つく before つぐ
  - ハイン before バイン before パイン
- Hiragana usually comes before katakana.
  - あんな before アンナ
  - しゃい before シャイ
  - I'm not sure how kanji vs. kana figures into this... presumably words written in kana come before those in kanji as part of the tendency to place basic unmodified hiragana before anything else.
- Large (normal) kana may come either before or after their shrunken equivalents, as long as the sorting is consistent within the dictionary/index/whatever. I get the impression that large before small is considered more correct, but since computerized character encodings put the small kana before their large equivalents, machine-sorted lists put small before large, and indifference takes over. Personally, I think it makes sense to sort the large kana first, in keeping with the tendency to place basic unmodified hiragana before anything else.
  - びよういん before びょういん
  - きやく before きゃく
  - かつて before かって
- Most handle the ー symbol for indicating long vowels as equivalent to the extended vowel, but others consider it equivalent to no character and effectively drop it when sorting, like the English hyphen. Rarely, it will instead be handled as a completely different character and sorted after ん, which I consider to be very poor handling since it puts words that are phonetically identical far apart in sort order. Even machine sorting usually knows better.
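The small-before-large behavior of machine sorting mentioned in the list above is easy to reproduce. This short sketch uses Python's built-in `sorted`, which compares strings by code point with no locale-aware collation; the variable names are arbitrary.

```python
words = ["びよういん", "びょういん", "かつて", "かって"]

# Plain code-point comparison: っ (U+3063) sorts before つ (U+3064),
# and ょ (U+3087) sorts before よ (U+3088).
naive_sorted = sorted(words)
print(naive_sorted)  # ['かって', 'かつて', 'びょういん', 'びよういん']
```

In both pairs the small kana wins, which is the opposite of the large-first order shown in the examples above.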
As if all that weren't a big enough mess already, there's the question of what to do if the rules you're using conflict. For example, if unvoiced comes before voiced and hiragana comes before katakana, which comes first, が (hiragana, but voiced) or カ (unvoiced, but katakana)? Again, there don't seem to be any standardized rules here. Fortunately, this sort of conflict is relatively uncommon, especially in indices and informal lists that aren't likely to spell out their rules. Dictionaries will typically describe what conventions they use.
While I'm no dictionary, I do think it makes sense to define an ordering system, even if I never need to use the full details of it. The examples given in the following steps are invented for convenience and unlikely to correspond to actual words.
- Sort first by the base kana, putting shorter terms before longer terms that begin with the same base kana. Regard each kana as an individual unit, regardless of whether or not it's part of a compound sound (きゃ, ヴィ, etc.). For now, regard all variants as the same kana, ignoring voicing, size, and character set. For now, also regard the long vowel marker ー as identical to the preceding vowel sound, including 'e' and 'o', even though those could be Romanized as 'ei' and 'ou'.
  - かあき ⇒ カーキク ⇒ かーきくけ ⇒ カアキクケコ
  - ちゃつ ⇒ ちやつて ⇒ ちゃってと ⇒ ちやってとた
  - はひ ⇒ ばひふ ⇒ はぴぶへ ⇒ ぱひふへほ
- If any two (or more) terms are regarded as identical so far but are not written identically, then within these terms, sort unvoiced before voiced and voiced before semi-voiced. If more than one mismatch occurs, all earlier mismatches count as larger differences than all later ones. Regard ヴ as a voiced ウ.
  - さしす ⇒ さしず ⇒ さじす ⇒ ざしす ⇒ ざしず
  - かきく ⇒ カキグ ⇒ がきく ⇒ ガキグ ⇒ ガギグ
  - ちゃふ ⇒ ちやぶ ⇒ ちゃぷ ⇒ ぢゃぶ ⇒ ぢやぷ
- If any two (or more) terms that are not written identically are still regarded as identical, then within these terms, sort normal-sized kana before small ones. If more than one mismatch occurs, all earlier mismatches count as larger differences than all later ones.
  - キヤフオテイ ⇒ キヤフオティ ⇒ キヤフォテイ ⇒ キャフオティ ⇒ キャフォティ
  - きやつえ ⇒ キヤツェ ⇒ きゃつえ ⇒ キャツェ
- If any two (or more) terms that are not written identically are still regarded as identical, then within these terms, sort hiragana before katakana and both before kanji (the long vowel marker counts as whatever the preceding character is). If more than one mismatch occurs, all earlier mismatches count as larger differences than all later ones.
  - あいうえお ⇒ あいうエお ⇒ あいウえオ ⇒ あイうえお ⇒ アイウえお ⇒ アイウエオ
  - えーのー ⇒ ええのオ ⇒ えーノー ⇒ えエのー ⇒ エエノオ
- If any two (or more) terms that are not written identically are still regarded as identical, then within these terms, sort actual kana before the long vowel marker. If more than one mismatch occurs, all earlier mismatches count as larger differences than all later ones.
  - パアトナア ⇒ パアトナー ⇒ パートナア ⇒ パートナー
- If any two (or more) terms that are not written identically are still regarded as identical, then within these terms, I give up and sort them at random. The only case I can think of when this would occur is when they have identical kana, but different kanji. While there are several kanji-sorting schemes, I'm not familiar enough with any to attempt to use them.
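For anyone who wants to try this ordering out, here's a rough Python sketch of the steps above. It's only an illustration under a few assumptions of mine: the names (`kana_features`, `sort_key`, and the lookup tables) are made up for this example, base kana are compared by their Unicode code points (which follow the usual あいうえお order for plain hiragana), and kanji or anything else that isn't kana is not handled at all.

```python
import unicodedata

# Small kana mapped to their normal-sized counterparts (hiragana only;
# katakana are converted to hiragana before this lookup).
SMALL_TO_LARGE = dict(zip("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ"))

# Which vowel each base kana ends in, used to expand the long vowel marker.
COLUMNS = ["あかさたなはまやらわ", "いきしちにひみりゐ", "うくすつぬふむゆる",
           "えけせてねへめれゑ", "おこそとのほもよろを"]
VOWEL_OF = {k: v for v, col in zip("あいうえお", COLUMNS) for k in col}


def kana_features(term):
    """Return one (base, voicing, size, script, marker) tuple per character."""
    feats = []
    for ch in term:
        if ch == "ー" and feats:
            # The marker takes the preceding vowel sound and character set.
            base, _, _, script, _ = feats[-1]
            feats.append((VOWEL_OF.get(base, base), 0, 0, script, 1))
            continue
        # NFD splits a voiced kana into its base plus a combining mark.
        decomposed = unicodedata.normalize("NFD", ch)
        core = decomposed[0]
        voicing = {"\u3099": 1, "\u309a": 2}.get(decomposed[1:], 0)
        script = 0
        if "ァ" <= core <= "ヶ":
            core = chr(ord(core) - 0x60)  # katakana to hiragana
            script = 1
        size = 1 if core in SMALL_TO_LARGE else 0
        feats.append((SMALL_TO_LARGE.get(core, core), voicing, size, script, 0))
    return feats


def sort_key(term):
    """One tuple per rule, in priority order: base kana, voicing, size,
    character set, then actual kana versus the long vowel marker."""
    feats = kana_features(term)
    return tuple(tuple(f[i] for f in feats) for i in range(5))
```

With this, `sorted(["ガキグ", "かきく", "がきく"], key=sort_key)` puts かきく first, then がきく, then ガキグ. Python compares tuples element by element, which is what makes earlier mismatches count for more than later ones within each rule, and what puts a shorter term before a longer one that starts the same way.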
いろは order:
An alternate order exists but is rarely used for sorting. Actually a poem known as the いろは (Iroha) after its first three kana, it is remarkable primarily for using each of the 47 kana in use at the time exactly once. The poem is traditionally divided into lines as follows, though this results in breaking up several words:
いろはにほへと
ちりぬるをわか
よたれそつねな
らむうゐのおく
やまけふこえて
あさきゆめみし
ゑひもせす
Though this order is uncommon for sorting, the kana sometimes appear in this order as labels for an ordered list, for example.
For the curious, there is an online classical Japanese database with translations of the いろは.
Romanization conventions
There are at least three different major Romanization schemes in use, and that's not counting all the variants from people (like me) who don't care much what's official. Here's a quick guide to certain variants that I'm aware of and which ones I normally use.
Kana | Variants | My preference |
---|---|---|
しゃ/シャ | sya, sha, shya | sha |
し/シ | si, shi | shi |
しゅ/シュ | syu, shu, shyu | shu |
しょ/ショ | syo, sho, shyo | sho |
じゃ/ジャ | zya, jya, ja | ja |
じ/ジ | zi, ji | ji |
じゅ/ジュ | zyu, jyu, ju | ju |
じょ/ジョ | zyo, jyo, jo | jo |
ちゃ/チャ | tya, cha, chya | cha |
ち/チ | ti, chi | chi |
ちゅ/チュ | tyu, chu, chyu | chu |
ちょ/チョ | tyo, cho, chyo | cho |
ぢゃ/ヂャ | dya, dja, djya, ja, jya | dja |
ぢ/ヂ | di, dji, ji | dji |
ぢゅ/ヂュ | dyu, dju, djyu, ju, jyu | dju |
ぢょ/ヂョ | dyo, djo, djyo, jo, jyo | djo |
つ/ツ | tu, tsu | tsu |
づ/ヅ | du, dzu, zu | dzu |
ふ/フ | hu, fu | fu |
を/ヲ | wo, o | wo |
ん/ン | n' always, n always, n' when ambiguous but n otherwise, nn (thanks to typing conversions) | n' when ambiguous but n otherwise |
ら/ラ | ra, la | ra |
り/リ | ri, li | ri |
る/ル | ru, lu | ru |
れ/レ | re, le | re |
ろ/ロ | ro, lo | ro |
'A'+ー | AA, A-, Â, Ā | AA |
'a'+あ | aa, â, ā | aa |
'I'+ー | II, I-, Î, Ī | II |
'U'+ー | UU, U-, Û, Ū | UU |
'u'+う | uu, û, ū | uu |
'E'+ー | EE, EI, E-, Ê, Ē | EE |
'O'+ー | OO, OU, OH, O-, Ô, Ō | OU |
'o'+お | oo, oh, ô, ō | oo |
'o'+う | oo, ou, oh, ô, ō | ou |
っち/ッチ | cchi, tchi | tchi |
None of this matters when a term has an official Romanization. 東京 is "Tokyo" even though it should be "Toukyou", ローマ字 is "Romaji" instead of "Roumaji", etc.
All others use the renderings given on the kana charts above. The only exceptions are that I typically Romanize the particles は and へ as 'wa' and 'e', respectively, because that's how they're pronounced, regardless of the kana. Some insist on using 'ha' and 'he' due to the kana, and while that arguably has some merit, it obscures the pronunciation rather than indicating it.
As I see it, my combination of choices has the advantage of approximating the English sounds while assigning a different Romanization to every common mora, with the exception of を/ヲ and ウォ, which doesn't matter much because ウォ is only used for borrowed words, while を/ヲ is never used for borrowed words (well, maybe not never, but close enough). This should go without saying, but I use my preferences throughout the site, so get used to them.
What I mean by n being ambiguous at times is with such kana as に, んい, and んに. They all clearly need an 'i' and an 'n' or two, but all three are different and even have different pronunciations. If you make ん always "n", then they're "ni", "ni", and "nni", which ignores the difference between に and んい. On the other hand, if it's always "n' ", you get "ni", "n'i", and "n'ni", which, for んに, is redundant and funny-looking, not to mention that it leaves a lot of words with an apostrophe on the end. I prefer "ni", "n'i", and "nni" for these reasons. It's the same thing with にゃ, んや, and んにゃ, which I Romanize as "nya", "n'ya", and "nnya".
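The rule is simple enough to write down as code. Here's a small Python sketch of it, assuming a deliberately tiny kana table (`ROMAJI` and `romanize` are names I made up for this example, and the table only covers the kana used above): ん becomes "n'" only when the next mora starts with a vowel or 'y', and plain "n" everywhere else, including at the end of a word.

```python
# Hypothetical mini-table: just enough kana for the examples in the text.
ROMAJI = {"にゃ": "nya", "に": "ni", "い": "i", "や": "ya"}


def romanize(kana):
    # First pass: split into morae, trying two-kana compounds first
    # and leaving ん as a placeholder to resolve afterwards.
    morae = []
    i = 0
    while i < len(kana):
        pair = kana[i:i + 2]
        if len(pair) == 2 and pair in ROMAJI:
            morae.append(ROMAJI[pair])
            i += 2
        else:
            morae.append("ん" if kana[i] == "ん" else ROMAJI[kana[i]])
            i += 1

    # Second pass: ん needs an apostrophe only where "n" alone could be
    # misread as the start of the next mora.
    out = []
    for j, mora in enumerate(morae):
        if mora != "ん":
            out.append(mora)
            continue
        nxt = morae[j + 1] if j + 1 < len(morae) else ""
        out.append("n'" if nxt and nxt[0] in "aiueoy" else "n")
    return "".join(out)
```

So `romanize("んい")` comes out as "n'i" and `romanize("んに")` as "nni", while a ん at the end of a word never picks up a trailing apostrophe.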
It probably makes more sense to write the 'r' row with 'l's, considering that I've always thought it sounds more like an 'l' anyway. The 'r' writing is so prevalent, though, that it's essentially uncontestable. Kind of like how モーグリ is a lot closer to "moagly", but "moogle" is too widely known to bother arguing about. This is kind of funny, considering that the Japanese government's official system of Romanization writes them with 'l's (this system is rarely used in practice).
My preference of 'OU' for 'O'+ー is purely because I hate seeing 'OO' for words that use it. This partly stems from seeing some people Romanize 'o'+う as 'oo', which goes entirely against the kana. ありがとう will never be "arigatoo" to me.
I also can't see writing を as 'o'. It's not the same sound as お, even if it is very close.