Ergonomic Arabic transcription in Vim

May 28, 2023
Tags: transcription vim
Length: short

In a previous post I described how I implemented the Alt-Latin keyboard layout for Arabic transcription in Vim. While this works reasonably well for one-off words, it gets tedious if you do a lot of transcription, for example when writing a large number of linguistic examples or entire paragraphs.1 This is because the layout includes some awkward key-chords, like Alt-a+u for ū or Alt-w+g for ġ. (A plus indicates sequential key presses.) These are not only physically cumbersome to type but also somewhat difficult to remember. This got me thinking about other solutions that might be faster, more intuitive, and more ergonomic. After some testing I came up with the following scheme, which I find works much better.

This alternative approach relies on linearly sequenced key presses without the use of modifier keys. It exploits the fact that some character sequences, such as aa, .d, and _t, are rare or non-existent in English and in many other languages. These sequences can therefore be used to insert transcription-specific characters without interfering with other typing. The entire scheme can be described as follows:

  • Long vowels with a macron (ā, ī, ū) are typed by doubling the corresponding letter.2
    a+a = ā, etc.

  • Dotted versions of letters (ḍ, ṭ, ġ, etc.) are typed with a dot followed by the letter. The dot is placed above or below as appropriate.
    .+d = ḍ,
    .+g = ġ, etc.

  • Underlined letters (ḏ, ṯ) are typed with an underscore followed by the letter.
    _+d = ḏ, etc.3

  • š is typed with v followed by s.
    v+s = š

All of these also have corresponding uppercase versions, typed as you’d expect: .+D gives Ḍ, for example.

For ʿayn and hamza I have not figured out a good combination, so I (somewhat hesitantly) keep the Alt-Latin chording:

  • Alt-p = ʿ

  • Alt-P = ʾ

The code below is what I have in my .vimrc to provide this functionality. It is toggled on and off for the current buffer with :EALLToggle. The code is rather primitive, just enabling and disabling a bunch of insert-mode mappings, but it is simple, easy to adapt to other transcription systems or user preferences, and it gets the job done.

Overall, I have found that this scheme offers much more comfortable and ergonomic typing of Arabic transcription than Alt-Latin style key-chording does.

" Toggle buffer-local insert-mode mappings for Arabic transcription.
function! EALLToggle()
  if !exists("b:eallmappings")
    let b:eallmappings = 0
  endif
  if b:eallmappings == 0
    let b:eallmappings = 1
    echo "EALL mappings activated for this buffer"
    inoremap <buffer> <M-p> ʿ
    inoremap <buffer> <M-P> ʾ
    inoremap <buffer> aa ā
    inoremap <buffer> ii ī
    inoremap <buffer> uu ū
    inoremap <buffer> AA Ā
    inoremap <buffer> II Ī
    inoremap <buffer> UU Ū
    inoremap <buffer> .d ḍ
    inoremap <buffer> .D Ḍ
    inoremap <buffer> .t ṭ
    inoremap <buffer> .T Ṭ
    inoremap <buffer> .s ṣ
    inoremap <buffer> .S Ṣ
    inoremap <buffer> .r ṛ
    inoremap <buffer> .R Ṛ
    inoremap <buffer> .z ẓ
    inoremap <buffer> .Z Ẓ
    inoremap <buffer> .h ḥ
    inoremap <buffer> .H Ḥ
    inoremap <buffer> .g ġ
    inoremap <buffer> .G Ġ
    inoremap <buffer> vs š
    inoremap <buffer> vS Š
    inoremap <buffer> _d ḏ
    inoremap <buffer> _D Ḏ
    inoremap <buffer> _t ṯ
    inoremap <buffer> _T Ṯ
  elseif b:eallmappings == 1
    let b:eallmappings = 0
    echo "EALL mappings deactivated for this buffer"
    " :iunmap takes only the left-hand side of the mapping
    iunmap <buffer> <M-p>
    iunmap <buffer> <M-P>
    iunmap <buffer> aa
    iunmap <buffer> ii
    iunmap <buffer> uu
    iunmap <buffer> AA
    iunmap <buffer> II
    iunmap <buffer> UU
    iunmap <buffer> .d
    iunmap <buffer> .D
    iunmap <buffer> .t
    iunmap <buffer> .T
    iunmap <buffer> .s
    iunmap <buffer> .S
    iunmap <buffer> .r
    iunmap <buffer> .R
    iunmap <buffer> .z
    iunmap <buffer> .Z
    iunmap <buffer> .h
    iunmap <buffer> .H
    iunmap <buffer> .g
    iunmap <buffer> .G
    iunmap <buffer> vs
    iunmap <buffer> vS
    iunmap <buffer> _d
    iunmap <buffer> _D
    iunmap <buffer> _t
    iunmap <buffer> _T
  endif
endfunction

command! EALLToggle call EALLToggle()
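
Since the function above only enables and disables a fixed list of mappings, the same list could also be kept in a single table so that mapping and unmapping cannot drift apart. The following is a minimal sketch of that idea, not the code I actually use; the names s:eall_map, EALLToggleTable, and :EALLToggleTable are made up for illustration, and it assumes a Vim recent enough to support get() on b: variables.

```vim
" Hypothetical table-driven variant of the toggle above.
" s:eall_map and EALLToggleTable are illustrative names only.
let s:eall_map = {
      \ '<M-p>': 'ʿ', '<M-P>': 'ʾ',
      \ 'aa': 'ā', 'ii': 'ī', 'uu': 'ū',
      \ 'AA': 'Ā', 'II': 'Ī', 'UU': 'Ū',
      \ '.d': 'ḍ', '.D': 'Ḍ', '.t': 'ṭ', '.T': 'Ṭ',
      \ '.s': 'ṣ', '.S': 'Ṣ', '.r': 'ṛ', '.R': 'Ṛ',
      \ '.z': 'ẓ', '.Z': 'Ẓ', '.h': 'ḥ', '.H': 'Ḥ',
      \ '.g': 'ġ', '.G': 'Ġ',
      \ 'vs': 'š', 'vS': 'Š',
      \ '_d': 'ḏ', '_D': 'Ḏ', '_t': 'ṯ', '_T': 'Ṯ',
      \ }

function! EALLToggleTable() abort
  " Flip the per-buffer state; an unset variable counts as 'off'.
  let l:on = !get(b:, 'eallmappings', 0)
  for [l:lhs, l:rhs] in items(s:eall_map)
    if l:on
      execute 'inoremap <buffer>' l:lhs l:rhs
    else
      " Unmapping needs only the left-hand side.
      execute 'iunmap <buffer>' l:lhs
    endif
  endfor
  let b:eallmappings = l:on
  echo 'EALL mappings ' . (l:on ? 'activated' : 'deactivated') . ' for this buffer'
endfunction

command! EALLToggleTable call EALLToggleTable()
```

With this layout, adapting the scheme to another transcription system should only require editing the table, since both branches of the toggle read from the same entries.
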
  1. Arabic transcription of entire paragraphs is generally a bad idea, but may be required for academic publishing in certain journals. 

  2. Of course, if you want to extend this to non-standard long vowels like ō, it will sōn run into trouble. 

  3. This is not optimal, but -d interferes with the hyphenated article, and doubled consonants (dd etc.) are too common.