Thursday, July 26, 2018

Panlatin browser extension 2.4 for Chrome and Firefox

(this is a prerelease text for the upcoming new version 2.4)

Do you know a bit of many languages but lack the time to become really fluent?
While you read the web, the computer can assist by presenting the texts in Latin letters. An experimental display of pronunciation details for languages written in Latin letters has also been added.

The extension is described in two posts.

The web extension is available for the Chrome and Firefox browsers. Click the names of the browsers to get to their web stores.

There is a dedicated support page for questions; please ask them there.

Jože Fabčič

Tuesday, December 26, 2017

Adding dictionaries

Since version 2.2, other dictionaries can be added to the Panlatin browser extension. So far, the following three have been tested; any other dictionary of the same format can be added, too:
  • Chinese-German: HanDeDict, Kollaborativ entwickeltes, freies chinesisch-deutsches Wörterbuch, Veröffentlicht von Gábor L Ugray
  • Chinese-French: CFDICT, le dictionnaire chinois-francais libre par Chine Informations
  • Japanese-English: EDICT2 Japanese-English Electronic Dictionary Files by Electronic Dictionary Research & Development Group
For today, let me just give short instructions. In a week or so, simple step by step instructions will be added here.

For handedict, download handedict.u8.gz from, unpack it to handedict.u8 and load it to the Chinese dictionaries.

For cfdict, download from, unpack it to get cfdict.u8 and load it to the Chinese dictionaries.

For edict2, download edict2.gz from, unpack it to edict2. Then change the encoding to utf8. I did it in Windows git bash terminal with "iconv -f EUC-JP -t UTF-8 edict2 > edict2.u8". Then load edict2.u8 to Japanese dictionaries.
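If iconv is not at hand, the unpacking and re-encoding can also be done with a short script. The following is a minimal sketch in Python that assumes the downloaded file is named edict2.gz and is EUC-JP encoded, as described above; the function name convert_edict2 is made up for this illustration.

```python
import gzip
from pathlib import Path


def convert_edict2(src_gz: str, dst_u8: str) -> int:
    """Unpack an EUC-JP encoded .gz file and save it as UTF-8.

    Returns the number of lines written.
    """
    # gzip.open in text mode decompresses and decodes EUC-JP in one step.
    with gzip.open(src_gz, "rt", encoding="euc_jp") as src:
        text = src.read()
    # Write the same text back out as UTF-8.
    Path(dst_u8).write_text(text, encoding="utf-8")
    return len(text.splitlines())
```

Calling convert_edict2("edict2.gz", "edict2.u8") should produce a file that can be loaded into the Japanese dictionaries in the same way as the iconv result.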

The following two screenshots show both Chinese dictionaries loaded using (1). The French dictionary is moved to the top of the list so that its results are shown first (2). The browser is shown in French localization.

The translations from all three dictionaries are enabled and shown.

The following screenshot shows Japanese text. Hiragana is shown above the native text, and pointing with the mouse shows the English translation.

Panlatin support

You can ask questions about the Panlatin browser extension here. In the past months, a few questions arrived in different places. This is now the preferred place to ask a question.

Wednesday, November 29, 2017

Romanized and Detailed Spellings

You can read a text in a different spelling than the one used for writing it. Computers can change the spelling of text before displaying it. There are two main reasons to use different spellings:

  • If the native spelling is in a different script, computers can save us the time of learning so many other scripts: Chinese, Cyrillic, Japanese, Arabic, Hindi, Greek, etc. Romanized (transliterated) spellings show the text in Latin script.
  • The native spelling doesn't show all the pronunciation details. We can always look up the words in a dictionary, but it's easier if the computer does it instead. This is useful for our mother language, too. The standard pronunciation is usually different from our own dialect, and we can't be sure that our mothers and friends have taught us the standard.

The examples use Unicode combining characters, and some computers can't show them. In Windows, Notepad and fonts like Lucida Sans Unicode, Arial, or Consolas can be used to view the text correctly.

Example romanizations

The following example texts show the possible romanizations in Panlatin browser extension.


Russian: дочь, пишет, училище, уже, не, отъезд, черный, солнце, Россия
iso9: dočʹ, pišet, učiliŝe, uže, ne, otʺezd, černyj, solnce, Rossiâ
r9: dočʹ, pišet, učilišče, uže, ne, otʺezd, černyj, solnce, Rossija
en: dochʹ, pishet, uchilishche, uzhe, ne, otʺezd, chernyy, solntse, Rossiya
pan: dočʸ, pʸišet, učilʸiśe, uže, nʸe, ot··ʸezd, černỳy, solnc̍e, Rossʸiʸa
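The iso9 line above can be reproduced with a plain one-letter-to-one-character table, which is the idea behind ISO 9. The sketch below is only an illustration for Russian: the table and the function name iso9 are written for this example and are not taken from the extension's code.

```python
import unicodedata

# ISO 9 style table for the lowercase Russian letters (illustrative sketch).
ISO9 = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "\u0451": "ë",  # ё
    "ж": "ž", "з": "z", "и": "i",
    "\u0439": "j",  # й
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f", "х": "h",
    "ц": "c", "ч": "č", "ш": "š", "щ": "ŝ",
    "ъ": "\u02ba",  # hard sign -> modifier letter double prime
    "ы": "y",
    "ь": "\u02b9",  # soft sign -> modifier letter prime
    "э": "è", "ю": "û", "я": "â",
}


def iso9(text: str) -> str:
    """Romanize Russian text letter by letter; other characters pass through."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        latin = ISO9.get(ch.lower())
        if latin is None:
            out.append(ch)             # punctuation, spaces, Latin text
        elif ch.isupper():
            out.append(latin.upper())  # keep capital letters capital
        else:
            out.append(latin)
    return "".join(out)
```

Because every Cyrillic letter maps to exactly one Latin character, a table like this can be inverted, which is what makes this kind of romanization reversible.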


Greek: Κλειδωνιά, Νίθαυρη, Λευκόγεια, Δοϊράνης, Χαϊδαρίου, Παύλου; αυε, αβε
pan: Kleidòniá, Níthav̀rì, Lef̀kógeia, Doïránìs, CHaïdaríou, Páv̀lou; av̀e, ave
elot: Kleidonia, Nithavri, Lefkogeia, Doiranis, CHaidariou, Pavlou; ave, ave


Arabic: اللغة العربية
Romanized: āllghh ālʻrbīh
The Arabic script doesn't show most vowels, so they're not shown in the romanized text.


Chinese: 中文
Romanized: tā, tā, tā; 中文Zhōngwén
Chinese romanization (pinyin) adds Latin letters after every Chinese word so that it can be pronounced. The original Chinese text can be extracted from such a form. Also showing the Chinese characters helps distinguish different characters with the same pinyin value.

In the position "above", Chinese words are also translated. Romanization and translations use CEDICT.
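Extracting the original Chinese text from the combined form can be pictured as removing the Latin letters that directly follow Chinese characters. The following sketch assumes that only the basic CJK block (U+4E00 to U+9FFF) is used and that pinyin is written with precomposed letters; the function name strip_pinyin is hypothetical, and the extension may work differently.

```python
import re

# A run of non-CJK letters that directly follows a CJK ideograph.
_PINYIN_AFTER_HANZI = re.compile(
    r"(?<=[\u4e00-\u9fff])[^\W\d_\u4e00-\u9fff]+"
)


def strip_pinyin(text: str) -> str:
    """Remove Latin letters glued to the end of a Chinese word."""
    return _PINYIN_AFTER_HANZI.sub("", text)
```

For example, strip_pinyin applied to the romanized form 中文Zhōngwén gives back 中文. Latin text that legitimately follows a Chinese character without a space would be removed as well, so this is only a rough approximation.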

Examples for detailed spellings

The orthography of a typical language that uses the Latin script usually leaves out a few details. The letters and their combinations don't always show the pronunciation.
The English system uses a dictionary (CMUDICT) for most words; the other languages just show how the pronunciation can be marked.


Traditional long vowels are marked (a>aʸ, e>è, i>ᴬi, o>oʷ, u>ʸu) as in made (maʸde̤), theme (thème̤), fine (fᴬine̤), home (hoʷme̤), cute (cʸute̤). Also the final unpronounced /e/ is marked with two dots below. The same sound marks can appear in other words: cold (coʷld).

A double dot below any letter means it's not pronounced: write (w̤rᴬite̤), right (rᴬig̤h̤t), sign (sᴬig̤n), build (bṳild).

The grave mark (`) is generally used as the “first alternate form”. Further examples are à: water (wàter), ò: some (sòme̤), ù: cut (cùt), òu: round (ròund), èa: sea (sèa). Also /s/ pronounced as /z/ is marked with a grave mark, see examples below.

Stress (when not on the first syllable) is marked with an acute on vowels that have no grave mark (áéíóúý): awáy, caréer, bèlów. Stressed vowels with a grave (èòù) are shown with a double grave (ȅȍȕ): Chᴬinȅs̀e̤, abȍut, adȕlt.

The sequences /ee/, /oo/, /ay/ are unmarked when pronounced the usual way.

A few more consonants:
/c/ is marked before /eiy/ as /ç/: cell (çell), or as /c·/: McEnroe (Mᴬc·Enroʷe̤); otherwise it is unmarked if pronounced /k/, as in “class”.
/c/ pronounced the Italian way as /tᶴ/ is shown as /č/: cello (čelloʷ).
/d/ pronounced as /dᶾ/ is shown as /dᶾ/: soldier (soʷldᶾie̤r).
/g/ is marked before /eiy/ as in gem (ĝem) or in get (g·et); otherwise it is unmarked if pronounced /g/, as in “gap”.
/g/ pronounced the French way as /ʒ/ is shown as /ǧ/: regime (reǧíme̤).
/s/ pronounced as /z/ is shown as /s̀/: result (rès̀ȕlt), days (days̀).
/s/ pronounced as /ʃ/ is shown as /š/: extension (èxténši̤on).
/t/ pronounced as /tᶴ/ is shown as /tᶴ/: culture (cùltᶴure̤).
/z/ pronounced as /ʒ/ is shown as /ž/: azure (ažure̤).

/-tion/ and /-sion/ are marked as in nation (naʸt̰i̤on), extension (èxténši̤on), mission (mišši̤on), vision (viši̤on). The last word is pronounced with /ʒ/ whereas the previous ones are pronounced with /ʃ/, but this is not distinguished.

/ea/ can be pronounced in a few common ways: /iː/: sea (sèa), /e/: health (hea̤lth), /eɪ/: great (grea̰t).

/ou/ and /ow/ can also be pronounced in a few ways: /aʊ/: round (ròund), cow (còw); American /oʊ/ or British /əʊ/ are unmarked: soul, own; /ʌ/: country (co̤ùntry).

A middle dot /·/ is used to separate two letters that usually form a digraph: foothill (foot·hill), single (sin·gle̤).
Schwa is occasionally inserted to show a syllable: rhythm (rh̤ythᵊm).

See the list of words below for more examples.

Standard: benefit below, engine enhance; gear gem regime jet, round soul country, cow own, idea each great health, modal model, put cut; page rite home theme unite sir, new; days, general generals, get getting, say says, long longer; nation national, foothill, record, cell sell call, what write right, cold called; culture, soldier

Detailed: benefit bèlów, enĝine̤ enhánçe̤; g·èar ĝem reǧíme̤ jet, ròund soul co̤ùntry, còw own, ᴬidȅa èach grea̰t hea̤lth, moʷdal model, put cùt; paʸĝe̤ rᴬite̤ hoʷme̤ thème̤ ʸunᴬíte̤, sir, nḛ̏w; days̀, ĝene̤ral ĝenerals̀, g·et g·etting, say sa̰y̤s̀, lōng lōn·ger; naʸt̰i̤on nat̰i̤onal, foot·hill, record, çell sell càll, wh̤at w̤rᴬite̤ rᴬig̤h̤t, coʷld càlle̤d; cùltᶴure̤, soʷldᶾie̤r
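In the English examples above, the detailed form only adds combining marks, raised letters, and middle dots to the standard spelling. Under that assumption, the standard spelling can be recovered by removing these additions. The sketch below is an illustration only; the function name strip_detail_marks is made up, and the approach would damage words whose standard spelling itself contains diacritics.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"


def strip_detail_marks(text: str) -> str:
    """Drop combining marks, modifier letters and middle dots."""
    # NFD splits letters such as "ç" into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch for ch in decomposed
        # Mn = combining marks, Lm = modifier (raised) letters
        if unicodedata.category(ch) not in ("Mn", "Lm") and ch != MIDDLE_DOT
    ]
    return unicodedata.normalize("NFC", "".join(kept))
```

Applied to the detailed forms of home, cell, regime and foothill, this returns the standard spellings. It does not apply to the other languages as they stand, for example where a letter is replaced rather than marked.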


German words are usually stressed on the first syllable; when they are not, the stress is marked with an acute (áéíóú) as in Kilometer (Kilométer). The letter /v/ is usually pronounced as /f/, but sometimes it stays /v/ as in Violine (Ṿiolíne). The letter /g/ is in some words pronounced like the soft French /g/ as in Passagier (Passaĝíer). The /i/ in /ie/ is sometimes pronounced similarly to the letter /j/ as in Familie (Famílje). The letter /h/ after a vowel usually marks that the vowel is long, and the /h/ itself is not pronounced. When it is pronounced, it's separated with a middle dot as in Alkohol (Alko·hol). The middle dot also marks separate syllables as in ideel (ide·el).

Standard: Kilometer, Violine vier, Passagier, Familie, Alkohol Zahl, ideel Idee

Detailed: Kilométer, Ṿiolíne vier, Passaĝíer, Famílje, Alko·hol Zahl, ide·él Idee


Italian distinguishes open and closed variants of the vowels /oe/. The open vowels are shown as /ɔɛ/. The stress in Italian is usually on the penultimate syllable. When it's not there, it's marked with an acute as in celere (cɛ́lere). The letter /z/ can be voiced as in zero (ᴰzɛro) or voiceless as in brezza (breᵀzza). Sometimes the sequence /gli/ doesn't represent a single sound, as in glicerina (g·licerina). Words with foreign spelling can be detailed like computer (compʸuter) or similar.

Standard: celere celeste, però zero brezza, glicerina, computer Bucholz

Detailed: cɛ́lere celɛste, perɔ̀ ᴰzɛro breᵀzza, g·licerina, compʸuter Bukholᵀz


In stressed syllables, there are 8 vowel phonemes in Slovene. In the pan system, they're /aeɛəioɔu/. In detailed spelling, open /ɛɔ/ are shown as /êô/ and are always long. Detailed spelling additionally uses grave àèìòù for short stressed and acute áéíóú for long stressed vowels. Schwa /ə/ is not expressed in Slovene stress marking, but it is nevertheless used here. It occurs before initial /r/ as in rdeč (ərdéč) or in place of unstressed /e/ before /lmnr/ as in moder (mṓdər).

Standard: zelen moder rdeč, veja agent agenta pet petega, govor visok visoka, oče sestra, slovenščina Celje

Detailed: zelén módər ərdèč, vêja agènt agênta pét pêtega, gôvor visòk visôka, ôče sêstra, slovénščina Cêlje

Pan-Latin spellings

As seen from the list of detailed spellings, different languages use different conventions for representing sounds and variations. For students of many languages, a more unified system can be more suitable. The “pan” series of conversion systems uses mostly unified conventions that still allow fluent reading.
An acute (áéíóúɛ́ɔ́) always marks the stress position. Open /eo/ and schwa /ə/ vowels are shown as in IPA. The IPA sounds /tʃ, ʤ, ʃ, ʒ, t͡s, d͡z/ are marked in only a few ways: /ch č, ǰ ǧ ᴰž, sh š, zh ž, ts c̍ ᵀz, ᴰz/. A grave, on vowels (àèìòù) and also on consonants, marks a language-specific alternate pronunciation. A macron (āēīōū) shows a long vowel. Two dots below (e̤) show a deleted vowel, as in English. A middle dot (·) shows that a multi-letter sequence is not to be read with its combined meaning. Phonetic raised letters (ᵀᴰᴬʸʷ) are used as parts of sounds or as additional sounds not shown in ordinary spelling.
Here are a few examples:
Italian: celere celeste, però zero brezza, glicerina, computer Bucholz, seta visto polso autobus sbaglio casa cassa, greco greci greche saggio saggi liscio, questo, virtù città è e perché
Pan Italian: čɛ́lere čelɛste, perɔ́ ᴰzɛro breᵀzza, g·ličerina, kompʸuter Bukh̤olᵀz, seta visto polso autobus s̀balʸo kas̀a kassa, greko greči greke saǧǧo saǧǧi lišo, kwesto, virtú čittá ɛ́ e perké
Slovene: zelen moder rdeč, veja agent agenta pet petega, govor visok visoka, oče sestra, slovenščina Celje
Pan Slovene: zelḗn mṓdər ərdéč, vej̍a agént agɛ́nta pḗt pɛ́tega, govor visók visɔ́ka, ɔ́če sɛ́stra, slovḗnščina C̍ɛ́lj̍e
Similar conventions are also used for the English detailed system and for the different romanizations.

Jože Fabčič

Monday, November 27, 2017

Panlatin Browser Extension Setup

Basic Setup

The Panlatin extension simplifies the reading of foreign texts for us Westerners. To read a page in Latin script, it usually needs some setup. Let me guide you through an example of reading a Chinese weather forecast.

The instructions were tested with Panlatin version 2.2, so first check that you have this version installed. Open the extension list in the browser, verify the version, and click Options to enable conversion for the wanted language, Chinese.

The numbers in the screenshot show the sequence of actions.
  1. Click the abbreviation “zh” for Chinese. This changes the language description under the line.
  2. Verify the language abbreviation.
  3. Click the word “disabled” so that it becomes “enabled” as pictured.
  4. Click the transliteration system name. For Chinese, only one system is currently defined, pinyin. Other languages can have several systems. Soon after clicking, the "Transliterated text" window shows the transliteration.
  5. Point the mouse to different positions and you can read the descriptions. For now, leave the position as "default".

Then open the page you want in the browser. I chose the official weather forecast.

The page opens, but transliteration is not yet enabled. By default, the Panlatin extension uses a simple language detection that relies on the page authors to mark the correct language. The weather forecast is not marked with the lang="zh" attribute in HTML, so let's mark it within the extension. Click the popup. There is no abbreviation “zh” after the "Page language" label. Click the Change button (1).
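Detection that relies on the page author amounts to reading the lang attribute of the html element. The following sketch shows that idea with Python's standard HTML parser; the function name declared_language is hypothetical, and reducing a value such as "zh-CN" to "zh" is an assumption of this example, not a documented behavior of the extension.

```python
from html.parser import HTMLParser
from typing import Optional


class _HtmlLangParser(HTMLParser):
    """Remembers the lang attribute of the first <html> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def declared_language(html_text: str) -> Optional[str]:
    """Return the primary language subtag declared by the page, or None."""
    parser = _HtmlLangParser()
    parser.feed(html_text)
    if not parser.lang:
        return None  # the page author did not mark the language
    return parser.lang.split("-")[0].lower()
```

A page without the attribute yields None, which corresponds to the situation with the weather forecast page described here.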

The popup changes.
  1. To enable the language transliteration, leave the default “Enabled modifying”.
  2. Verify the language abbreviation. By default, it’s the same as selected at the Options screen. For Chinese, it’s “zh”.
  3. Verify or change the range of pages that will be marked as Chinese. The default is to mark all pages within the same site, which is “”. “www” is skipped. If you want to mark * or *.cn as Chinese, use the buttons Shorten and Extend to change the range (address) to “” or “cn”. You can also define it for specific pages within the site.
  4. Finally, click the “Add” button.
Once a rule is added, it can be deleted, its language can be corrected, or another detailed rule can be specified.
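The effect of shortening and extending the range can be pictured as matching the host name against rules from the most specific to the most general. The sketch below is only an assumption about how such matching could work; the function name language_for_host and the host names in the usage note are invented for illustration.

```python
from typing import Dict, Optional


def language_for_host(host: str, rules: Dict[str, str]) -> Optional[str]:
    """Find the language rule for a host, trying longer names first."""
    labels = host.lower().split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]  # "www" is skipped
    # Try the full name first, then drop the leftmost label each time.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in rules:
            return rules[candidate]
    return None
```

With the rule {"cn": "zh"}, a made-up host such as www.weather.example.cn is treated as Chinese, and so is every other host ending in .cn. With the narrower rule {"weather.example.cn": "zh"}, only that site is.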

Refresh the web page. It now shows the native and transliterated words.

Another view is possible that also shows the possible dictionary translations. Go back to the Panlatin Options page and, at step 5, select the position “above”. Then refresh the web page. When you point to particular words, possible translations are shown.

Oops, some suggestions from the dictionary are not good enough. In the title, the characters are incorrectly grouped as “三sān 天全tiānquán 国guó” while they should be “三天sāntiān 全国quánguó”. The dictionaries are not perfect.

You can add up to 1000 entries to the custom dictionary. Let’s enter this particular missing entry. Click the popup, then select the “Dictionaries” button.

Enter the characters into the 4 cells as shown in the screenshot. You can copy the text from the web page; the formatting will be skipped. You can also skip either the traditional or the simplified entry. Then click the “Add” button. The entry is stored.

The additional buttons allow deleting all the entries, deleting one entry or entering many entries. If you have plenty of dictionary entries, you can click “Show as file” and the following screen is shown.

For the Chinese dictionary, each line needs to have exactly 4 fields (as said above, some can be empty) separated by tabs. Such text can be produced in Microsoft Word or in LibreOffice Writer from a table of 4 columns that is then converted into text with the cells separated by tabs.
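Before pasting many entries, it can help to check that every line really has 4 tab-separated fields. The sketch below does that check in Python. The field names (traditional, simplified, pinyin, translation) are an assumption made for this example, since the exact meaning and order of the cells is shown only in the screenshot; the limit of 1000 entries is the one mentioned above.

```python
from typing import List, NamedTuple

MAX_ENTRIES = 1000  # limit of the custom dictionary stated in the post


class Entry(NamedTuple):
    # Assumed field order; check it against the cells in the extension.
    traditional: str
    simplified: str
    pinyin: str
    translation: str


def parse_custom_dictionary(text: str) -> List[Entry]:
    """Parse tab-separated lines into entries, rejecting malformed lines."""
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore completely empty lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(
                f"line {number}: expected 4 fields, got {len(fields)}"
            )
        entries.append(Entry(*(field.strip() for field in fields)))
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"too many entries: {len(entries)}")
    return entries
```

A line that starts with a tab is accepted: it simply has an empty first field, which matches the rule that some fields can be left empty.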

To finish our simple example, show the weather page again and refresh it. The title looks better.

Did you notice that the popup is different after the page language is defined? You can enable or disable conversion for the language; if the language is disabled, no pages are changed.

A Few More Options

The options page is divided into three sections by horizontal lines. The active (clickable) texts are shown with a gray background.
The upper section has three buttons for enabling and disabling, the list of abbreviations for the languages that can be romanized or detailed, and an additional display button. The languages with enabled romanization or detailing show their abbreviations in bold. The elements are:
  • The button "Enable All non-Latin" enables romanization for all the languages that don't use the Latin script. It enables romanization of these languages: Arabic (ar), Bulgarian (bg), Greek (el), Hindi (hi), Korean (ko), Macedonian (mk), Russian (ru), Serbian (sr), Ukrainian (uk), Chinese (zh).
  • The button "Enable All Latin" enables detailing for the languages that use the Latin script. It enables the experimental detailed display of pronunciation for these languages: German (de), English (en), Spanish (es), French (fr), Italian (it), Polish (pl), Slovenian (sl).
  • The button "Disable All" disables any changes to the texts.
  • Clicking a language abbreviation shows the details of that language in the second section.
  • It's also possible to switch on “Showing romanization examples while selecting language”. This is used for an overview of the changes. If this option is enabled, the next click on a language abbreviation shows an example, provided conversion is enabled and the position is default.

The middle section shows the options for one language. It shows the following:
  • Conversion: either "enabled" or "disabled". Clicking one of these two values changes the status.
  • System: the list of romanization or detailing systems. Some languages have only one system, "default". Other languages have several systems. If the conversion for this language is enabled and the position is default, clicking the system name also shows an example romanization. The system name in bold is the current system.
  • Position of the original and converted text. For most languages, the default position displays only the romanized text. For Chinese, the romanized word follows the native word. The position "above" shows the converted words above the native words on a pale-yellow background. In Chinese, the tooltip above each word shows the dictionary translation. The position name in bold is the current position.
  • The Native and Converted texts are shown for each language and conversion system. For most languages, it's possible to convert text back and forth: type new text into the Converted box or the Native box, and the other box shows the corresponding text. There are a few limitations: there is no conversion from Converted text for English, Korean, and Chinese. The Clear button removes all the text from both example boxes. The Unicode button shows the Unicode character numbers for the native text.
  • Some romanizations are not reversible. While all native texts can be romanized, it's not possible to produce all native texts from the romanized form using just character replacement rules. For example, Greek standard domestic (ELOT) romanization converts Κλειδωνιά to Kleidonia. When the latter word is converted back to Greek spelling, it becomes Κλειδονια. Stress marks are lost and distinctions between omega and omicron are lost, too. You can try another romanization that better preserves the original form.
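The loss of information in the Greek example can be reproduced with two small character tables. The sketch below covers only the letters of the word Κλειδωνιά and is written for this illustration; it is not the extension's ELOT table, and the function names are made up.

```python
import unicodedata

# Forward table: omega and omicron both become "o".
TO_LATIN = {"κ": "k", "λ": "l", "ε": "e", "ι": "i", "δ": "d",
            "ω": "o", "ο": "o", "ν": "n", "α": "a"}
# Reverse table: "o" can map back to only one Greek letter.
TO_GREEK = {"k": "κ", "l": "λ", "e": "ε", "i": "ι", "d": "δ",
            "o": "ο", "n": "ν", "a": "α"}


def _convert(text: str, table: dict) -> str:
    out = []
    for ch in text:
        mapped = table.get(ch.lower(), ch)
        out.append(mapped.upper() if ch.isupper() else mapped)
    return "".join(out)


def romanize(greek: str) -> str:
    """Romanize without stress marks, as in the ELOT example."""
    decomposed = unicodedata.normalize("NFD", greek)
    # Dropping combining marks removes the stress mark (tonos).
    plain = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return _convert(plain, TO_LATIN)


def to_greek(latin: str) -> str:
    """Convert back using character replacement only."""
    return _convert(latin, TO_GREEK)
```

Here romanize("Κλειδωνιά") gives "Kleidonia", and to_greek("Kleidonia") gives "Κλειδονια": the stress mark and the omega are gone, because both were merged with something else on the way to Latin.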

Context menu and popup

The Panlatin form is designed so that the original form can be produced from it, so it may differ from some other romanizations. The original form, to be sent to another program, is available from the context menu.
When the text was romanized with the default position, a part of it can be selected, and the context menu then has the item "Copy original text from Panlatin form". Clicking it copies the native text to the clipboard. The native text is generated from the romanized Panlatin text, so it can differ from the original text if the romanization system is not reversible.
To get the original text in such a case, either select a reversible romanization system or select the position "above".

The web extension is available for the Chrome and Firefox browsers.

There is a dedicated support page for questions; please ask them there.

Last verified on 27-dec-2017 for Panlatin version 2.2
by Jože Fabčič

Saturday, September 16, 2017

Kazakhstan is changing the alphabet for the Kazakh language from Cyrillic to Latin

In the past, different scripts were used for the Kazakh language or its predecessors. At present, the Cyrillic alphabet is used. An example text: "Қазақ тілі — Қазақстан Республикасының мемлекеттік тілі, сонымен қатар Ресей, Өзбекстан, Қытай, Моңғолия және т.б. елдерде тұратын қазақтардың ана тілі.".

In 2017, a new standard for using the Latin alphabet is being developed. In September 2017, a basic transliteration scheme was published. Kazakh in the Latin alphabet is not going to use diacritics; only the plain letters a-z will be used. The letter "x" is not used for native Kazakh words. There are 8 digraphs for specific Kazakh sounds.

Conversion table:

Cyrillic: а,б,ц,д,е,ф,г,х,һ,i,и,й,к,л,м,н,о,п,қ,р,с,т,ұ,в,у,ы,з,ә,ө,ү,ң,ғ,ч,ш,ж
Latin: a,b,c,d,e,f,g,h,h,i,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,ae,oe,ue,ng,gh,ch,sh,zh

More details will become known later. For example, the Cyrillic alphabet uses "я" and "э", and no mapping for them has been published yet.


Panlatin browser extension version 1.6 can transliterate Kazakh HTML texts into the planned Latin alphabet.
The steps are as follows:

1. Add the “Panlatin” extension to the Chrome or Firefox browser.
2. Select “Options” in “Manage Extensions”.
3.1 Click the language abbreviation “kk” for the Kazakh language.
3.2 Change the value of “Showing romanization examples while selecting language” to “yes”.
3.3 Change the value of “Conversion” to “enabled”.

The example window shows the transliteration: “Qаzаq tili — Qаzаqstаn Respwblikаsynyng memlekettik tili, sonymen qаtаr Resej, Оеzbekstаn, Qytаj, Monggholija zhаеne t.b. elderde turаtyn qаzаqtаrdyng аnа tili.”.

From then on, web pages that declare the Kazakh language with the “lang” attribute of the <html> element are converted to Latin script. Some websites don't define the “lang” attribute. For such sites, the language must be defined below, as already done for the site “”. The site (domain, computer, and optional path) comes before the colon “:”. After the colon, the language abbreviation “kk” is specified.

To disable the conversion, change the value “enabled” to “disabled”.

There is more information about the Panlatin extension (in English) in Panlatin browser extension for Chrome and Firefox.

See also the English version of this text.

Jože Fabčič, Slovenia

Kazakhstan is changing the script for Kazakh language from Cyrillic to Latin

In the past, different scripts were used for the Kazakh language or its predecessors. Currently, the Cyrillic script is used. An example text is: "Қазақ тілі — Қазақстан Республикасының мемлекеттік тілі, сонымен қатар Ресей, Өзбекстан, Қытай, Моңғолия және т.б. елдерде тұратын қазақтардың ана тілі.".

In 2017, a new standard for the use of the Latin script is under development. In September 2017, a basic transliteration scheme was published. Latin Kazakh is not going to use any diacritics; just the plain letters a-z will be used. The letter "x" is not used for native Kazakh words. There are 8 digraphs for specific Kazakh sounds.

The conversion table is:

Cyrillic: а,б,ц,д,е,ф,г,х,һ,i,и,й,к,л,м,н,о,п,қ,р,с,т,ұ,в,у,ы,з,ә,ө,ү,ң,ғ,ч,ш,ж
Latin: a,b,c,d,e,f,g,h,h,i,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,y,z,ae,oe,ue,ng,gh,ch,sh,zh


More details will become known later. For example, the Cyrillic script uses "я" and "э", and no mapping for them has been published yet.
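The published scheme is a plain lookup table, so it can be sketched in a few lines. In the sketch below, the table follows the 35 Cyrillic letters listed above with their plain a-z values and 8 digraphs; the function name transliterate is made up, and letters without a published mapping (such as "я") are left unchanged on purpose.

```python
import unicodedata

KAZAKH_2017 = {
    "а": "a", "б": "b", "ц": "c", "д": "d", "е": "e", "ф": "f", "г": "g",
    "х": "h",
    "\u04bb": "h",  # һ
    "\u0456": "i",  # і
    "и": "i",
    "\u0439": "j",  # й
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "қ": "q",
    "р": "r", "с": "s", "т": "t", "ұ": "u", "в": "v", "у": "w", "ы": "y",
    "з": "z",
    # the 8 digraphs
    "ә": "ae", "ө": "oe", "ү": "ue", "ң": "ng", "ғ": "gh",
    "ч": "ch", "ш": "sh", "ж": "zh",
}


def transliterate(text: str) -> str:
    """Convert Cyrillic Kazakh letters; unmapped characters pass through."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        latin = KAZAKH_2017.get(ch.lower())
        if latin is None:
            out.append(ch)                  # e.g. "я": no published mapping
        elif ch.isupper():
            out.append(latin.capitalize())  # "Ө" becomes "Oe"
        else:
            out.append(latin)
    return "".join(out)
```

For the start of the example text, this gives "Qazaq tili", and "Өзбекстан" becomes "Oezbekstan".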


Panlatin browser extension version 1.6 can transliterate existing Cyrillic Kazakh HTML texts into the currently planned Latin script. See also the screenshot with Russian texts below.
The steps are as follows:
1. Add the “Panlatin” extension to the Chrome or Firefox browser.
2. Select the “Options” button in “Manage Extensions”.
3.1 Click the language abbreviation “kk” for the Kazakh language.
3.2 Change the value of “Showing romanization examples while selecting language” to “yes”.
3.3 Change the value of “Conversion” to “enabled”.

The example window shows the transliteration: “Qаzаq tili — Qаzаqstаn Respwblikаsynyng memlekettik tili, sonymen qаtаr Resej, Оеzbekstаn, Qytаj, Monggholija zhаеne t.b. elderde turаtyn qаzаqtаrdyng аnа tili.”.

From then on, web pages that declare the Kazakh language with the “lang” attribute of the <html> element are converted to Latin script. Some websites don't define the “lang” attribute. For such sites, the language needs to be defined below, as already done for the site “”. The site (domain, computer, and optional path) comes before the colon “:”. After the colon, the language abbreviation “kk” is specified.
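A rule of this shape, a site followed by a colon and a language abbreviation, is easy to take apart. The sketch below splits at the last colon and assumes the site part is written without a scheme such as "http:"; the function name parse_site_rule and the site in the usage note are invented for illustration.

```python
from typing import Tuple


def parse_site_rule(rule: str) -> Tuple[str, str]:
    """Split "site:lang" into its two parts at the last colon."""
    site, colon, lang = rule.strip().rpartition(":")
    if not colon or not site or not lang:
        raise ValueError(f"not a site:language rule: {rule!r}")
    return site, lang
```

For a made-up rule such as "example.kz/news:kk", this returns the site "example.kz/news" and the language "kk".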

To disable the conversion, change “enabled” back to “disabled”.

There is more information on the Panlatin extension at Panlatin Browser Extension for Romanization and Detailing.

See also the Russian version of this text.

Jože Fabčič