When borrowing from Chinese isn't "Sinoxenic"

7 February 2016

1. What is Sinoxenic?
2. Non-Sinoxenic borrowing

1. What is Sinoxenic?

In around October last year, a rather curious question appeared on Quora:

If the Ainu language were to borrow words directly from Chinese, how would they differ from the other Sino-Xenic vocabulary while taking both certain Proto-Ainu sounds into account (like certain vowels) & modern survivors?

This interesting hypothetical question contains a flawed assumption: that vocabulary directly borrowed from Chinese is automatically "Sinoxenic vocabulary". This is not, in fact, the case. So-called "Sinoxenic vocabulary" is the result of a very particular mode of borrowing involving the wholesale adoption of the Chinese script. This is a very striking, possibly unique, mode of borrowing, and plays a large part in giving the major languages of East Asia their peculiar flavour and remarkable commonalities.

Sinoxenic vocabulary reflects systems of reading Chinese characters that were developed by the Vietnamese, Koreans, and Japanese well over a millennium ago. It is a product of the process of gaining literacy in Chinese at a time when it was the most prestigious language and dominant cultural vehicle of East Asia. It involved a wholesale attempt to master the Chinese written language, including virtually the entire Chinese literary vocabulary. This has left these languages in modern times with broad swathes of vocabulary, or building blocks of vocabulary, borrowed from Chinese. Japanese and, to a lesser extent, Korean, still use Chinese characters as a key element of their writing systems, and even in Vietnamese, which has abolished the use of Chinese characters, they loom as an unseen presence behind a latinised script.

There is a great deal that is hazy about the circumstances and details of how these vocabulary systems developed. Some distinctive features of Sinoxenic vocabulary are as follows:

1. Sinoxenic vocabulary is the result of formal language learning. At least in the early stages, it involved the active transmission of literacy from teachers to students, either directly from Chinese teachers, or in some cases through intermediaries (notably in Japan, where Korean teachers may have been involved). Learning how to read Chinese involved not only the memorisation of characters, but also the recitation, reading, and (ideally) composition of Classical Chinese texts. (The concept that Sino-Vietnamese resulted from contact between Chinese-speaking teachers and Vietnamese-speaking learners has been questioned recently by John Duong Phan, who posits the existence of a bilingual situation in Vietnam.)

2. Characters were read by the Koreans, Japanese, and Vietnamese in as close an approximation to the Chinese pronunciation as was possible. However, pronunciations were inevitably modified to fit the different phonological systems of these languages. For instance, the Chinese character 六, which is believed to have been pronounced /luwk̚/, became two syllables, /ro-ku/, in Japanese pronunciation.

3. The adoption of Chinese characters involved borrowing the system as a whole, comprising literally thousands of characters. The main systems of readings for Chinese characters date from the second half of the first millennium -- Late Old Chinese to Late Middle Chinese in the case of Korean, and southern varieties of Late Middle Chinese in the case of Vietnamese. In Japanese, the wholesale borrowing of characters and their accompanying sets of pronunciations took place on two occasions, with the result that Japanese now has two main sets of readings: the go-on system (Late Old Chinese - Early Middle Chinese), which was probably adopted from a more southerly area of the northern dialect or possibly involved Korean teachers; and the later kan-on system (Late Middle Chinese), which was borrowed from the prestigious Chang'an dialect in the northwest. So systematised did these two sets of readings become that later dictionary-makers even tried to fill in gaps where the go-on or kan-on reading for a particular character was missing.

4. Systems of pronunciation eventually became conventionalised ways of reading texts, divorced from the contemporary Chinese pronunciation and probably passed on by local teachers. In Japan, which is geographically isolated from the Mainland, it's quite likely that many teachers and students of Chinese texts never heard Chinese spoken during their entire lives. Despite changes in pronunciation, character readings still generally run in parallel in Modern Standard Mandarin (MSM), Chinese dialects, and Sinoxenic languages, even where the actual physical pronunciation has diverged. The parallels become clear when characters are placed in lists:

(Table 1)

Characters	English	Chinese (MSM)	Japanese go-on	Japanese kan-on	Korean	Vietnamese
京	'capital'	jīng	きょう kyō	けい kei	경 gyeong	kinh
形	'shape'	xíng	ぎょう gyō	けい kei	형 hyeong	hình
情	'feeling'	qíng	じょう jō	せい sei	정 jeong	tình
明	'bright'	míng	みょう myō	めい mei	명 myeong	minh
停	'stop'	tíng	じょう jō	てい tei	정 jeong	đình
丁	'fourth heavenly stem; adult male'	dīng	ちょう chō	てい tei	정 jeong	đinh
平	'flat, peaceful'	píng	びょう byō	へい hei	평 pyeong	bình

A second series:

(Table 2)

Characters	English	Chinese (MSM)	Japanese go-on	Japanese kan-on	Korean	Vietnamese
筆	'writing implement'	bǐ	ひち hichi	ひつ hitsu	필 pil	bút
蜜	'honey'	mì	みち michi	びつ bitsu	밀 mil	mật
質	'quality'	zhì	しち shichi	しつ shitsu	질 jil	chất
実	'actual'	shí	じち jichi	じつ jitsu	실 sil	thực, thật
日	'sun'	rì	にち nichi	じつ jitsu	일 il	nhật

Despite irregularities (including a few that I've omitted from the list), the parallels are remarkable. A study of these "Sinoxenic" readings of Chinese characters has been instrumental in modern scholars' attempts to elucidate the phonology of Middle Chinese.
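The regularity in Table 1 can also be checked mechanically. What follows is a minimal Python sketch rather than part of the original analysis: it takes the romanised readings from Table 1 and finds the longest ending shared by every reading in a column. The function name common_ending and the layout of the data are illustrative assumptions.

```python
import unicodedata

# Romanised readings copied from Table 1 (京 形 情 明 停 丁 平).
TABLE_1 = {
    "Chinese (MSM)": ["jīng", "xíng", "qíng", "míng", "tíng", "dīng", "píng"],
    "Japanese go-on": ["kyō", "gyō", "jō", "myō", "jō", "chō", "byō"],
    "Japanese kan-on": ["kei", "kei", "sei", "mei", "tei", "tei", "hei"],
    "Korean": ["gyeong", "hyeong", "jeong", "myeong", "jeong", "jeong", "pyeong"],
    "Vietnamese": ["kinh", "hình", "tình", "minh", "đình", "đinh", "bình"],
}


def common_ending(readings):
    """Return the longest string that every reading ends with."""
    # Normalise to NFC so that a letter plus its diacritic counts as one character.
    words = [unicodedata.normalize("NFC", r) for r in readings]
    ending = ""
    # Walk backwards through all readings in step until they disagree.
    for chars in zip(*(reversed(w) for w in words)):
        if len(set(chars)) != 1:
            break
        ending = chars[0] + ending
    return ending


endings = {column: common_ending(r) for column, r in TABLE_1.items()}
for column, ending in endings.items():
    print(f"{column}: -{ending}")
```

For Table 1 this gives -ng, -ō, -ei, -eong, and -nh respectively. A shared ending is only a rough proxy for a rime correspondence: the tone marks on the Mandarin and Vietnamese vowels differ from word to word, so the vowel drops out of the result for those two columns.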

5. Since Chinese characters were borrowed as a combination of form (the written character), meaning, and pronunciation (the 'reading' of the character), they were eventually taken in as morphemes in the host languages. These borrowed morphemes could then be used to create new Sinitic-style vocabulary by reading them with their respective conventionalised readings. Such vocabulary was available for borrowing between the different Sinoxenic languages. The Japanese, in particular, coined large amounts of Sinoxenic vocabulary in the modern era to render new concepts that had come from the West, and this has been widely borrowed into Korean, Chinese, and Vietnamese.

For example, the following words are shared between the four languages, based on the same combinations of morphemes:

(Table 3)

Characters	English	Chinese (MSM)	Japanese	Korean	Vietnamese
形態	'form'	xíngtài	けいたい keitai	형태 hyeongtae	hình thái
形狀	'shape'	xíngzhuàng	けいじょう keijō	형상 hyeongsang	--
狀態	'state'	zhuàngtài	じょうたい jōtai	상태 sangtae	trạng thái
狀況	'situation'	zhuàngkuàng	じょうきょう jōkyō	상황 sanghwang	trạng huống
形成	'formation'	xíngchéng	けいせい keisei	형성 hyeongseong	hình thành

(There are, however, limits, as not all languages have borrowed or use all combinations, and meaning and usage can differ.)
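The building-block nature of this vocabulary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-character readings are read off the compounds in Table 3, and the names READINGS and read_compound are hypothetical. Simple concatenation also ignores any sound changes at morpheme boundaries in real words.

```python
# Readings of single morphemes, read off the compounds in Table 3.
READINGS = {
    "形": {"MSM": "xíng", "Japanese": "kei", "Korean": "hyeong", "Vietnamese": "hình"},
    "態": {"MSM": "tài", "Japanese": "tai", "Korean": "tae", "Vietnamese": "thái"},
    "狀": {"MSM": "zhuàng", "Japanese": "jō", "Korean": "sang", "Vietnamese": "trạng"},
    "況": {"MSM": "kuàng", "Japanese": "kyō", "Korean": "hwang", "Vietnamese": "huống"},
    "成": {"MSM": "chéng", "Japanese": "sei", "Korean": "seong", "Vietnamese": "thành"},
}

LANGUAGES = ("MSM", "Japanese", "Korean", "Vietnamese")


def read_compound(characters, language):
    """Read a compound by stringing together the reading of each character."""
    # Vietnamese is written with a space between syllables.
    separator = " " if language == "Vietnamese" else ""
    return separator.join(READINGS[ch][language] for ch in characters)


for compound in ["形態", "狀態", "狀況", "形成"]:
    print(compound, [read_compound(compound, lang) for lang in LANGUAGES])
```

Note that the sketch will happily produce a reading for any combination, including ones that a given language does not actually use (Table 3 gives no Vietnamese form for 形狀, for instance), so it shows how readings combine, not which words exist.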

As a final note, Sinoxenic languages also have vocabulary that doesn't cleanly belong within the Sinoxenic framework. Vietnamese has a few hundred words that were borrowed before the systematic borrowing of Sinoxenic vocabulary took place. As a result, there are many forms or words that were borrowed twice, once as "Old Sino-Vietnamese", and later on a much more comprehensive scale as "Sino-Vietnamese". In general, the earlier borrowings tend to be regarded by speakers as native Vietnamese forms, while the later Sinoxenic forms form a separate stratum of the Vietnamese vocabulary. A few examples include:

(Table 4)

Chinese (MSM)	Meaning	Old Sino-Vietnamese	Sino-Vietnamese
味 wèi	'flavour'	mùi	vị
黃 huáng	'yellow'	vàng	hoàng
鶯 yīng	'oriole'	anh	oanh

Japanese also has some vocabulary of this type, notably 梅 ume 'plum' and 馬 uma 'horse', from Middle Chinese /mwəj/ and /maɨX/ respectively. Note that an additional vowel /u/ has been added at the start of the word in both cases. Both words are regarded by speakers as 'native Japanese' vocabulary. In Korean, older layers of borrowed vocabulary do not form a separate stratum and are subsumed under Sino-Korean as a whole. [But see the Disqus note from "Geisendorf" below.]

2. Non-Sinoxenic borrowing

While the Sinoxenic model has traditionally held the limelight as the most distinctive and influential model for borrowing Chinese vocabulary, it is not the only model. Here I'll look at a language that has borrowed a considerable amount of vocabulary from Chinese but isn't a "Sinoxenic" language.

Chronologically, Mongolian borrowing of Chinese vocabulary took place later than that of Sinoxenic languages. The vast majority of words have been borrowed in the past 800 years, sourced from Early, Middle, and Modern Mandarin as spoken in northern China.

Modes of borrowing are not uniform. Some vocabulary was borrowed indirectly, such as the word for 'writing', ᠪᠢᠴᠢᠭ᠌ бичиг bichig, which appears to have entered from Turkic in ancient times but ultimately derives from Chinese 筆 (MSM bǐ 'writing brush'). The same Chinese word was later borrowed via Tibetan as ᠪᠢᠷ бийр biir 'writing instrument'.

In more recent times, by far the greater proportion of words has been borrowed directly. Some are significantly different from the Chinese pronunciation, a result of either great age, or of impressionistic auditory borrowing. One example is the word ᠴᠣᠩᠬᠣ цонх tsoŋx 'window', from Chinese 窗戶 chuānghu 'window'.

Another likely example is the word ᠲᠠᠢᠢᠪᠣᠩ тайван taivaŋ 'peace', which is supposedly from Chinese 太平 MSM tàipíng 'peace'. The traditional spelling (which equates to taibuŋ) makes no attempt to reproduce the original vowel in 平 píng.

Since the Qing dynasty, however, spellings in the traditional Mongolian script have generally come closer to the Chinese pronunciation. For example, ᠶᠠᠩᠬᠠᠨ янхан yaŋxaŋ 'prostitute', spelt yaŋxan in the traditional script, accurately reproduces the vowels and consonants of Chinese 養漢 yǎnghàn. Whether yaŋxan is how the Mongolians heard the word (auditory loan), or whether this represents an auditory loan that has been rectified to conform to the Chinese pronunciation in its written form, is not clear. It is likely, however, that spelling conventions in the traditional script reflect familiarity with the Chinese written language among at least part of the Mongolian intellectual elite. This accords with the actual situation during the Qing, in which there was considerable intimate interaction between the Mongols (particularly in the south) and the Chinese cosmopolis.

The tendency for the written form to accurately reflect the Chinese pronunciation has become even more marked in modern times, to the extent that the transliteration of Chinese words into Mongolian has now been fully standardised in China, as found here and here.

While transliteration from Chinese is now carried out according to rigid rules, there is a crucial difference between Mongolian and Sinoxenic languages: Mongolian borrowing of Chinese vocabulary is not the result of an attempt to adopt Chinese as the literary language of the Mongols, and has never involved adopting the Chinese writing system as a whole. Mongolian has had its own scripts since the time of Genghis Khan and no consistent attempt has been made to use Chinese characters to write Chinese loan words in Mongolian. One of the most important documents in Mongolian history, the Secret History of the Mongols, survives only in a transliteration into Chinese characters, but these characters are used purely for their phonetic value, similar to but clumsier than Japanese kana. (In modern Inner Mongolia, however, with the teaching of Chinese in the educational system, bilingual Mongolians at times informally use Chinese characters to write Chinese words in a Mongolian context.)

Words also tend to be borrowed as single forms, without being analysed into constituent morphemes for use in further word-building. Borrowings are 'one-off' in nature rather than systematic. There are, of course, monosyllabic words that do not need further analysis and can be used as independent morphemes, such as ᠵᠢᠩ жин ǰiŋ 'weight, catty', which is from the Chinese word 斤 jīn 'catty'. But bisyllabic words from Chinese tend to be perceived as single units, much like native Mongolian words, which are generally composed of more than one syllable.

For example, both the archaic term ᠲᠠᠢᠢᠭᠠᠨ тайган taigaŋ 'eunuch' (written taigan) and ᠲᠠᠢᠢᠪᠣᠩ тайван taivaŋ 'peace' (written taibuŋ) contain the Chinese morpheme 太 tài, but this is not perceived as an identifiable morpheme in Mongolian. Similarly, ᠬᠣᠪᠣᠩ хоовон xoovoŋ 'brazier' (traditionally spelt xoboŋ) and ᠲᠦᠩᠫᠦᠩ төмпөн tömpöŋ 'pots and pans' (traditionally spelt töŋpöŋ) both contain the Chinese morpheme 盆 pén 'bowl', but it is unlikely that a Mongolian speaker would be conscious of a connection between -вон -voŋ and -пөн -pöŋ in the two words.

There are, of course, some morphemes that are clearly identifiable in borrowed vocabulary. One is the morpheme 匠 jiàng, which is used in a number of words relating to tradesmen, including the very common word мужаан muǰaaŋ 'carpenter'. The traditional spelling of this morpheme in Mongolian is -ᠵᠢᠶᠠᠩ (ǰiyaŋ), and even in Cyrillic it is normally -жаан -ǰaaŋ, making it an easily recognisable form. This is reinforced in words like түнжаан tünǰaaŋ 'brazier (copper worker)', which ignores vowel harmony and strengthens the perception of -жаан -ǰaaŋ as a productive morpheme. However, there are also forms where vowel harmony detracts from this unity of form, e.g., the word for a stone mason, which is pronounced as шожоон šoǰooŋ.

(Table 5)

Mongolian	Trad script	Meaning	Chinese characters	Chinese (MSM)	Japanese	Korean	Vietnamese
мужаан (muǰaan)	ᠮᠤᠵᠢᠶᠠᠩ (muǰiyaŋ)	'carpenter'	木匠	mùjiang	ぼくしょう, もくしょう bokushō, mokushō	목장 mokjang*	mộc tượng*
инжаан (inǰaan)	ᠢᠨᠵᠢᠶᠠᠩ (inǰiyaŋ)	'silversmith'	銀匠	yínjiang	ぎんしょう ginshō*	은장 eunjang*	ngân tượng*
тижаан (tiǰaan)	ᠲᠢᠵᠢᠶᠠᠩ? (tiǰiyaŋ?)	'ironworker'	鐵匠	tiějiang	てつしょう tetsushō*	철장 cheoljang*	thiết tượng*
түнжаан (tünǰaan)	ᠳᠥᠨᠵᠢᠶᠠᠩ (tünǰiyaŋ)	'brazier'	銅匠	tóngjiang	どうしょう dōshō*	동장 dongjang*	đồng tượng*
шожоон (šoǰoon)	ᠱᠤᠵᠢᠶᠠᠩ (šoǰiyaŋ)	'stone-mason'	石匠	shíjiang	せきしょう sekishō*	석장 seokjang*	thạch tượng*

While the traditional script often tends to highlight the original Chinese pronunciation, the Cyrillic orthography, which spells words as they are pronounced, serves to obscure any connection with Chinese. For example, the word ᠶᠠᠨᠲᠣᠩ (yantuŋ) 'pipe, chimney', from Chinese 煙筒 yāntong, is spelt яандан (yaandan) in Cyrillic, considerably removed from the traditional spelling.

Moreover, syllable-final н in the Cyrillic script is pronounced /ŋ/ in Mongolia, thus neutralising the earlier distinction between /ŋ/ and /n/ in this position and further obscuring the regularity of relationships with Chinese. This distinction between /ŋ/ and /n/ is maintained in Inner Mongolian.

A few examples of the way in which Mongolian words are a poor reflection of the Chinese original will make the situation clear. The main source is Монгол хэлний Хятад ормол үгийн судалгаа by Н. Балжинням, and I'm assuming for the sake of this exercise that the Chinese etymologies given are accurate. I've picked two sets of words that fall into typical patterns to illustrate the point. What is noteworthy is the way that Mongolian vocabulary follows set patterns that tend to obscure the patterns of the original Chinese.

The Chinese pronunciation given is that of the modern standard language, although many Mongolian borrowings may actually have been from local northern dialects of Chinese. Sinoxenic pronunciations are given for comparison. Asterisks indicate words that do not actually occur in the Sinoxenic language in question. There are also a few words where I don't have a reliable source for the traditional spelling. Note that final н in Cyrillic, which I've transliterated as 'n', is actually read ŋ.
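The convention for final н can be stated as a one-line rule. The Python sketch below is an illustration only, and the function name as_pronounced is hypothetical. It converts the 'n' used in the tables to the ŋ used in the running text, and it handles the word-final case only, although forms such as yaŋxaŋ show that ŋ can also occur inside a word.

```python
def as_pronounced(transliteration):
    """Replace a word-final 'n' with 'ŋ', following the note on final н."""
    if transliteration.endswith("n"):
        return transliteration[:-1] + "ŋ"
    return transliteration


# Two forms ending in н and one that does not.
for word in ["baivan", "taigan", "xuluu"]:
    print(word, "->", as_pronounced(word))
```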

Pattern One

(Table 6)

Mongolian	Trad script	Meaning	Chinese characters	Chinese (MSM)	Japanese	Korean	Vietnamese
байван (baivan)	ᠪᠠᠢᠢᠪᠠᠩ (baibaŋ)	'vitriol'	白礬	báifán	はくばん hakuban*	백반 baegban	bạch phèn
тайган (taigan)	ᠲᠠᠢᠢᠭᠠᠨ (taigan)	'eunuch'	太監	tàijiàn	たいかん taikan	태감 taegam	thái giám
янхан (yanxan)	ᠶᠠᠩᠬᠠᠨ (yaŋxan)	'prostitute'	養漢	yǎnghàn	ようかん yōkan*	양한 yanghan	dưỡng hán*
зондон (zondon)	ᠵᠣᠩᠳᠠᠨ (ǰoŋdan)	'satin, silk'	妝緞	zhuāngduàn	そうどん sōdon*	장단 jangdan	trang đoạn*
хоовон (xoovon)	ᠬᠣᠪᠣᠩ (xoboŋ)	'brazier'	火盆	huǒpén	かぼん kabon*	화분 hwabun	hỏa bồn*
төмпөн (tömpön)	ᠲᠦᠩᠫᠦᠩ (töŋpöŋ)	'bowl, basin'	銅盆	tóngpén	どうぼん dōbon*	동분 dongbun	đồng bồn*
зайсан (zaisan)	ᠵᠠᠢᠢᠰᠠᠩ (ǰaisaŋ)	'clan or religious title'	宰相	zǎixiàng	さいしょう saishō	재상 jaesang	tể tương
паалан (paalan)	ᠫᠠᠯᠠᠩ (palaŋ)	'enamel'	珐琅 / 琺瑯	fàláng	ほうろう hōrō	법랑 beoblang	pháp lang
яандан (yaandan)	ᠶᠠᠨᠲᠣᠩ (yantuŋ)	'pipe, chimney'	煙筒	yāntong	えんとう entō*	연통 yeontong	yên đồng*
тайван (taivan)	ᠲᠠᠢᠢᠪᠣᠩ (taibuŋ)	'calmness, peace'	太平	tàipíng	たいへい taihei	태평 taepyeong	thái bình
лууван (luuvan)	ᠯᠣᠣᠪᠠᠩ (luubaŋ)	'carrot'	蘿蔔	luóbo	らふく, らほく rafuku, rahoku	나복, 라복 nabog, labog	la bặc
цүүвэн (tsüüven)	ᠴᠦᠦᠪᠦᠩ (čüübüŋ)	'rent'	租費	zūfèi	そひ sohi*	조비 jobi*	to phí*
дааман (daaman)	ᠳᠠ ᠮᠧᠨ (da mEn)	'gates'	大門	dàmén	だいもん daimon	대문 daemun	đại môn*
хуасан (xuasan)	ᠬᠣᠸᠠᠱᠧᠩ (xuwašEŋ)	'peanut'	花生	huāshēng	かせい kasei*	화생 hwasaeng*	hoa sinh*

Pattern Two

(Table 7)

Mongolian	Trad script	Meaning	Chinese characters	Chinese (MSM)	Japanese	Korean	Vietnamese
хулуу (xuluu)	ᠬᠣᠯᠣ (xulu)	'pumpkin'	葫蘆	húlu	ころ koro	호로 holo	hồ lô*
саахуу (saaxuu)	ᠰᠠᠬᠣᠣ (saxuu)	'teapot'	茶壺	cháhú	ちゃこ chako*	다호, 차호 daho, chaho	trà ho, chè ho
санхүү (sanxüü)	ᠰᠠᠩᠬᠦᠦ (saŋxüü)	'warehouse'	倉庫	cāngkù	そうこ sōko	창고 chang-go	thương kho*
пүнлүү (pünlüü)	ᠫᠦᠩᠯᠦ (püŋlü)	'salary'	俸祿	fènglù	ほうろく hōroku	봉록 bonglog	bổng lục*
цууянбуу (tsuuyanbuu)	ᠴᠣᠣᠶᠠᠩᠪᠣ (čuuyaŋbu)	'calico'	粗洋布	cūyángbù	そようふ soyōfu*	조양포 joyangpo*	to dương bố*
мантуу (mantuu)	ᠮᠠᠨᠲᠠᠣ (mantau)	'flat cake'	饅頭	mántou	まんとう mantō*	만두 mandu	man đầu
лантуу (lantuu)	ᠯᠠᠩᠲᠣᠣ (laŋtuu)	'sledgehammer'	榔頭	lángtou	ろうとう rōtō*	랑두, 낭두 rangdu, nangdu	lang đầu*
жинтүү (ǰintüü)	ᠵᠢᠨᠲᠦᠦ (ǰintüü)	'pillow'	枕頭	zhěntóu	ちんとう chintō	침두 chimdu	chấm đầu*
пямбуу (pyambuu)	ᠫᠢᠩᠪᠣᠣ? (piŋbuu?)	'carpenter's plane'	平刨	píngbào	へいほう heihō*	평포 pyeongpo*	bình bào, bình bao
ембүү (yembüü)	ᠶᠣᠠᠮᠪᠣᠣ (yuwambuu), ᠶᠣᠠᠨᠪᠣᠣ (yuwanbuu)	'ingot'	元寶	yuánbǎo	げんぽう genpō	원보 wonbo	nguyên bảo
тэмбүү (tembüü)	ᠲᠡᠮᠪᠦᠦ (tembüü)	'syphilis'	天疱	tiānpào	てんぽう tenpō	천포 cheonpo	thiên bào*
хуажуу (xuaǰuu)	ᠬᠣᠸᠠᠵᠣᠣ (xuwaǰuu)	'pepper'	花椒	huājiāo	かしょう kashō*	화초 hwacho	hoa tiêu*
чинжүү (činǰüü)	ᠴᠢᠨᠵᠦᠦ (činǰüü), ᠴᠢᠩ ᠵᠢᠶᠣᠣ (čiŋ ǰiyuu)	'green pepper'	青椒	qīngjiāo	せいしょう seishō*	청초 cheongcho	thanh tiêu*
поолуу (pooluu)	ᠫᠣᠣᠯᠣᠣ (pooluu)	'wicker basket'	笸籮	pǒluó	はら hara*	--	--
дэнлүү (denlüü)	ᠳᠧᠩᠯᠦ (dEŋlüü)	'lantern'	燈籠	dēnglóng	とうろう tōrō	등롱, 등농 deungrong, deungnong	đèn lồng

With one exception, санхүү saŋxüü, all pronunciations follow the Mongolian requirement for vowel harmony. In modern times, Mongolian has adopted the phoneme /f/ in borrowing vocabulary from Russian, but at the time this vocabulary was borrowed /f/ was alien to the Mongolian sound system and /p/ was used. In modern Inner Mongolian usage, ᠪᠠᠢᠢᠹᠠᠩ (baifaŋ) is also used instead of ᠪᠠᠢᠢᠪᠠᠩ (baibaŋ).
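The vowel harmony claim can be spot-checked on the Cyrillic forms. The following Python sketch is a deliberate simplification and not a full account of Mongolian phonology: it only tests whether a word mixes back ("masculine") and front ("feminine") vowel letters, it treats и and й as neutral, it ignores the ambiguous letter ю, and it does not look at rounding harmony at all. The vowel classes and the name breaks_harmony are assumptions made for this illustration.

```python
# Simplified vowel classes for Mongolian Cyrillic (an assumption of this sketch).
BACK = set("\u0430\u043e\u0443\u044f\u0451\u044b")  # а о у я ё ы
FRONT = set("\u044d\u04e9\u04af\u0435")             # э ө ү е
# и and й count as neutral; ю is ambiguous and is left out of both classes.


def breaks_harmony(word):
    """Return True if the word mixes back and front vowel letters."""
    letters = set(word.lower())
    return bool(letters & BACK) and bool(letters & FRONT)


# A selection of forms from Tables 5 to 7.
WORDS = ["хулуу", "саахуу", "санхүү", "пүнлүү", "мантуу",
         "жинтүү", "тэмбүү", "дэнлүү", "түнжаан", "шожоон"]

for word in WORDS:
    print(word, "mixed" if breaks_harmony(word) else "harmonic")
```

On this selection the check flags санхүү, in line with the text, and also түнжаан from Table 5, which was described earlier as ignoring vowel harmony.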

The above suggests that Mongolian borrowing of Chinese vocabulary has involved both auditory borrowing of an impressionistic type and, increasingly, more systematic borrowing based on a close acquaintance with Chinese phonology.

However, Mongolian has largely avoided the borrowing of Sinoxenic morphemes that is a strong feature of Sinoxenic languages, as seen in Table 3 above. Instead, Mongolian has tended to form compounds of native Mongolian words based on the Chinese model, i.e., calques. But this is the stuff of a separate post...

(This post is a patchy revision of my answer at Quora. In writing this post I've become increasingly aware of shortcomings in that answer, which was based on a superficial inspection of borrowed forms in the Cyrillic script. In the process of writing the Quora answer and this post, I've referred to Н. Балжинням's Монгол хэлний Хятад ормол үгийн судалгаа; Marc Hideo Miyake's Old Japanese: a phonetic reconstruction; several dictionaries featuring the traditional Mongolian script; John Duong Phan's dissertation preview of "Lacquered Words: the evolution of Vietnamese under Sinitic influences from the 1st century BCE to the 17th century CE"; Wikipedia; and various flavours of Wiktionary. However, a deeper understanding of the process of borrowing vocabulary from Chinese requires a far greater familiarity with the history and nature of borrowing than I currently possess.)
