Arabic plural forms for GNU ngettext

Copyright (c) 2004, Arabeyes.org, Muayyad Saleh Alsadi <alsadi [AT] gmail [DOT] com>

This document is released under the terms of the GNU FDL.

Introduction

The GNU gettext family of functions offers a nice way to translate programs. ngettext lets the translator offer a more human translation for messages like

Copying XXX File(s)
جار نسخ XXX ملف/ملفات
so that the plural form of the word "file" in the example is selected according to the value of XXX. The GNU libc manual offers examples for many languages: some have a single plural form, which means that they use the same word for singular and plural, while others have two to four plural forms. None of those examples fits Arabic. Arabic is confusing here because, although there are well-defined rules for the forms of numbers, that is not our subject. The subject is not the number ("العدد") but the word following that number ("المعدود"). There is no single fixed rule because of the flexibility of the Arabic language: Arabic is so rich that one writer could structure his sentence to need a single plural form, while another could need 6 plural forms!

Trivial solution: A single plural form

As a mathematician, this is the first solution that I should have thought of, but it was the last one I thought of (maybe we mathematicians tend to ignore trivial solutions! ha ha, just being serious)

nplurals=1; plural=0;
which means that we have a single plural form (which is too few, IMHO):
جار نقل 0 من الملفات
جار نقل 1 من الملفات
جار نقل 2 من الملفات
جار نقل 3 من الملفات
جار نقل 11 من الملفات
جار نقل 99 من الملفات
جار نقل 100 من الملفات
جار نقل 101 من الملفات
جار نقل 103 من الملفات
This solution is grammatically correct; the last one is read as "جار نقل ثلاثة ومئة من الملفات". How flexible our language is!

Arabeyes currently used solution: 4 plural forms

This is a temporary solution:

nplurals=4; plural=n==1 ? 0 : n==2 ? 1 : n>=3 && n<=10 ? 2 : 3;
جار نقل 0 ملف
جار نقل ملف واحد (1)
جار نقل ملفين إثنين (2)
جار نقل 3 ملفات
جار نقل 11 ملف
جار نقل 99 ملف
جار نقل 101 ملف
جار نقل 103 ملف
The last one is read as جار نقل ثلاثة ومئة ملف and not as جار نقل مئة وثلاث ملفات
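As a rough illustration of how the 4-form rule picks a string, the following C sketch evaluates the plural= expression and uses the result to index a table of translations. The names plural4, forms4 and select4 are made up for this example, and the table is hypothetical (its strings are modelled on the samples above); a real program would leave the lookup to ngettext and the message catalog.

```c
#include <assert.h>

#define NPLURALS4 4  /* nplurals=4 */

/* The plural= expression of the 4-form rule, written as a C function. */
unsigned long plural4(unsigned long n)
{
    return n == 1 ? 0 : n == 2 ? 1 : (n >= 3 && n <= 10) ? 2 : 3;
}

/* Hypothetical msgstr[0..3] entries, modelled on the samples above. */
const char *const forms4[NPLURALS4] = {
    "جار نقل ملف واحد (%d)",    /* msgstr[0]: n == 1 */
    "جار نقل ملفين إثنين (%d)", /* msgstr[1]: n == 2 */
    "جار نقل %d ملفات",         /* msgstr[2]: n in 3..10 */
    "جار نقل %d ملف"            /* msgstr[3]: n == 0 and n >= 11 */
};

/* Return the format string that the rule selects for n. */
const char *select4(unsigned long n)
{
    unsigned long idx = plural4(n);
    assert(idx < NPLURALS4);  /* the index must stay below nplurals */
    return forms4[idx];
}
```

Note that zero falls through to the last form together with 11 and above, which is why "0 ملف" and "11 ملف" share one string in the samples.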

Arabeyes Rich Classical solution: 6 plural forms

Arabeyes.org has derived the plural forms for the Arabic language to be

nplurals=6; plural=n == 0 ? 0 : n == 1 ? 1 : n == 2 ? 2 : n >= 3 && n <= 10 ? 3 : n >= 11 && n <= 99 ? 4 : 5;
which means that we have 6 plural forms (which is too many, IMHO). I used to think that the last rule [5] had to be followed by a plural noun; otherwise, what would be the difference between it and the previous rule? The difference is in the grammatical marker (Tashkeel): for the last rule the noun is always Majroor. Messages look native and intelligent. For a call like:
printf (ngettext ("copying %d file\n", "copying %d files\n", n), n);
a sample translation could yield
لم يتم نقل الملف بعد
جار نقل ملف واحد
جار نقل ملفين إثنين
جار نقل 3 ملفات
جار نقل 11 ملفاً
جار نقل 99 ملفاً
جار نقل 101 ملفٍ
جار نقل 103 ملفٍ
I used to consider the cases of 101 and 103 wrong, but they are correct if read from right to left as "ثلاثة ومئة".
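The 6-form rule can be read as a chain of range tests. The sketch below restates it as a C function and checks it against the sample values listed above. plural6 and check_samples6 are names chosen for this example only.

```c
#include <assert.h>

/* The plural= expression of the 6-form rule, written as a C function. */
unsigned long plural6(unsigned long n)
{
    return n == 0 ? 0                 /* form 0: zero */
         : n == 1 ? 1                 /* form 1: one */
         : n == 2 ? 2                 /* form 2: two */
         : (n >= 3 && n <= 10) ? 3    /* form 3: 3..10, e.g. "3 ملفات" */
         : (n >= 11 && n <= 99) ? 4   /* form 4: 11..99, e.g. "11 ملفاً" */
         : 5;                         /* form 5: 100 and above, e.g. "101 ملفٍ" */
}

/* Check the function against the sample list above. */
void check_samples6(void)
{
    assert(plural6(0) == 0);
    assert(plural6(1) == 1);
    assert(plural6(2) == 2);
    assert(plural6(3) == 3);
    assert(plural6(11) == 4);
    assert(plural6(99) == 4);
    assert(plural6(101) == 5);
    assert(plural6(103) == 5);
}
```

Forms 4 and 5 differ only in the marker on the noun, which is the distinction the paragraph above describes.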

This formula could cause a real-world 'printf' message like this:

printf (ngettext (
		"%s is copying %d file from %s to %s\n",
		"%s is copying %d files from %s to %s\n",
		n),
	pname, n, "/some/where", "/else/where");
to SEGFAULT for n = 0, 1 or 2, because omitting the number '%d' from the translation would shift the arguments, so that the integer value is passed as a pointer to the next '%s'. A smart translator could work around this by including the number in parentheses when he can't include it natively. This approach would yield
برنامج test ينقل 0 ملف من ‎/some/where‎ إلى ‎/else/where‎
برنامج test ينقل 1 ملف من ‎/some/where‎ إلى ‎/else/where‎
برنامج test ينقل ملفين إثنين (2) من ‎/some/where‎ إلى ‎/else/where‎
برنامج test ينقل 3 ملفات من ‎/some/where‎ إلى ‎/else/where‎
برنامج test ينقل 11 ملفاً من ‎/some/where‎ إلى ‎/else/where‎
برنامج test ينقل 103 ملفٍ من ‎/some/where‎ إلى ‎/else/where‎
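One way to catch the crash described above before a catalog ships is to compare the conversion specifiers of each translated form with those of the original message. The following is a minimal sketch, assuming only plain specifiers such as %d and %s with no flags, widths or positional arguments. conversions and same_conversions are hypothetical helpers written for this example; they are not part of gettext.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the conversion characters of fmt (e.g. "sdss") into out.
 * Assumes simple specifiers only: '%' followed directly by one character.
 * "%%" is skipped because it consumes no argument.
 * Returns the number of conversions found, or -1 if out is too small. */
int conversions(const char *fmt, char *out, size_t outsize)
{
    size_t k = 0;

    assert(fmt != NULL && out != NULL);
    if (outsize == 0)
        return -1;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '\0')
            break;          /* stray '%' at the end of the string */
        if (*p == '%')
            continue;       /* literal percent sign */
        if (k + 1 >= outsize)
            return -1;      /* no room left for this conversion */
        out[k++] = *p;
    }
    out[k] = '\0';
    return (int)k;
}

/* Return 1 if msgstr consumes the same argument types, in the same
 * order, as msgid; return 0 otherwise. */
int same_conversions(const char *msgid, const char *msgstr)
{
    char a[32], b[32];

    if (conversions(msgid, a, sizeof a) < 0 ||
        conversions(msgstr, b, sizeof b) < 0)
        return 0;
    return strcmp(a, b) == 0;
}
```

With this check, a translation that keeps the number in parentheses passes, while one that drops '%d' altogether is rejected.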

Modern solution: 2 plural forms

I derived another solution independently, based on the common way of reading and writing numbers at present. It is based on reading the digits from left to right, except for the first two digits (the units and tens). It is correct because the parts are added together, as in 100 plus 12.

Syntax:

nplurals=2; plural=(n>9)?(n % 100 >2 && n % 100 < 11):(n >2);
This has only 2 plural forms, which yields safe and clean results for both the toy message and the real-world 'printf' message. For some numbers, however, the message won't look natural and seems artificial. Look at this sample:
جار نقل 0 ملف
جار نقل 1 ملف
جار نقل 2 ملف
جار نقل 3 ملفات
جار نقل 11 ملف
جار نقل 101 ملف
جار نقل 103 ملفات
جار نقل 127 ملف
Notice that it only seems artificial for n=2. It's not wrong (as it's Tamyeez), just artificial, and it could be considered an extension of our very flexible language. This formula is much simpler: it has only two plural forms, it won't SEGFAULT the application, and it needs no extra workarounds from the translator. The translation for n=2 may seem unnatural, but the parentheses in the workaround seem even more unnatural.
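To make the modern rule concrete, the sketch below restates it as a C function and adds a second, shorter variant. The shorter variant is an observation made for this example, not part of the rule as given: for n below 10, n % 100 equals n, so both branches of the conditional test the same range. The names plural_modern, plural_modern_short and check_modern are invented for illustration.

```c
#include <assert.h>

/* The plural= expression of the modern rule, written as a C function. */
unsigned long plural_modern(unsigned long n)
{
    return (n > 9) ? (n % 100 > 2 && n % 100 < 11) : (n > 2);
}

/* Shorter variant: only the last two digits decide the form. */
unsigned long plural_modern_short(unsigned long n)
{
    unsigned long last_two = n % 100;
    return last_two > 2 && last_two < 11;
}

/* Check both variants against each other and against the samples above. */
void check_modern(void)
{
    for (unsigned long n = 0; n <= 1000; n++)
        assert(plural_modern(n) == plural_modern_short(n));

    assert(plural_modern(2) == 0);    /* "2 ملف" */
    assert(plural_modern(3) == 1);    /* "3 ملفات" */
    assert(plural_modern(11) == 0);   /* "11 ملف" */
    assert(plural_modern(101) == 0);  /* 101 % 100 = 1 */
    assert(plural_modern(103) == 1);  /* 103 % 100 = 3 */
    assert(plural_modern(127) == 0);  /* 127 % 100 = 27 */
}
```

The remainders in the comments show why 103 takes the plural noun while 101 and 127 do not.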

Translators, please note: do NOT use Tanween al-Fath (تنوين الفتح) or any other grammatical marker for the Tamyeez (تمييز), as it becomes Majroor (مجرورة بالإضافة) for n=1000. Follow the foolproof principle "سكن تسلم" (roughly: use the sukun and you are safe).

New Simplified Classical solution: 2 plural forms

I modified the previous rule so that the number is read from right to left, using the following syntax:

nplurals=2; plural=(n<=2 || n>=11)?0:1;
This also has only 2 plural forms, which yields safe and clean results for both the toy message and the real-world 'printf' message. For some numbers, however, the message won't look natural and seems artificial. Look at this sample:
جار نقل 0 ملف
جار نقل 1 ملف
جار نقل 2 ملف
جار نقل 3 ملفات
جار نقل 10 ملفات
جار نقل 11 ملف
جار نقل 99 ملف
جار نقل 103 ملف
It's acceptable, just like the previous one, but the problem is that we can't show the grammatical marker (for example in 99 and 103), because both numbers share one form.
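The limitation can be seen by evaluating the rule: 99 and 103 receive the same index, so a single translated string has to serve both and cannot carry a marker that differs between them. A small sketch follows; plural_classic2, shares_form and check_classic2 are names invented here.

```c
#include <assert.h>

/* The plural= expression of the simplified classical rule. */
unsigned long plural_classic2(unsigned long n)
{
    return (n <= 2 || n >= 11) ? 0 : 1;
}

/* Return 1 if a and b are translated with the same form. */
int shares_form(unsigned long a, unsigned long b)
{
    return plural_classic2(a) == plural_classic2(b);
}

/* Check the observations made about the sample list above. */
void check_classic2(void)
{
    assert(shares_form(99, 103));  /* one string for both markers */
    assert(shares_form(0, 11));    /* zero joins the 11-and-above group */
    assert(shares_form(3, 10));    /* 3..10 take the plural noun */
    assert(!shares_form(2, 3));
}
```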

Final classical solution

I derived another solution based on the Arabeyes solution by reducing the 6 rules to 3. The zero, one and two cases are omitted as separate forms and merged into a third case, which I consider an extension of our flexible language. It uses the following syntax:

nplurals=3; plural=(n>99)?1:(n > 2 && n < 11)?2:0;
This has only 3 plural forms:

جار نقل 0 ملفاً
جار نقل 1 ملفاً
جار نقل 2 ملفاً
جار نقل 3 ملفات
جار نقل 10 ملفات
جار نقل 11 ملفاً
جار نقل 99 ملفاً
جار نقل 100 ملفٍ
جار نقل 103 ملفٍ
Again, the cases 0, 1 and 2 are not wrong. They are correct, in the same way that "ثلاثة ومئة ملفٍ" is correct but less common than "مئة وثلاثة ملفات".
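The reduction from 6 rules to 3 can be written down as a small lookup table. The sketch below is an illustration rather than a definitive mapping: the table six_to_three was worked out for this example by comparing the two formulas, and the names plural6, plural3 and check_reduction are chosen here.

```c
#include <assert.h>

/* The 6-form rule, repeated so that this example is self-contained. */
unsigned long plural6(unsigned long n)
{
    return n == 0 ? 0 : n == 1 ? 1 : n == 2 ? 2
         : (n >= 3 && n <= 10) ? 3
         : (n >= 11 && n <= 99) ? 4 : 5;
}

/* The plural= expression of the 3-form rule. */
unsigned long plural3(unsigned long n)
{
    return (n > 99) ? 1 : (n > 2 && n < 11) ? 2 : 0;
}

/* Which 3-form index each 6-form index collapses into:
 * forms 0, 1, 2 and 4 (0, 1, 2 and 11..99) -> 0
 * form 3 (3..10)                           -> 2
 * form 5 (100 and above)                   -> 1 */
const unsigned long six_to_three[6] = { 0, 0, 0, 2, 0, 1 };

/* Check that the table agrees with both formulas for 0..1000. */
void check_reduction(void)
{
    for (unsigned long n = 0; n <= 1000; n++)
        assert(six_to_three[plural6(n)] == plural3(n));
}
```

The table shows that only the zero, one and two forms lose their own strings; the other three ranges keep the boundaries of the 6-form rule.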

Conclusion

Our language, Arabic, is very flexible, and we could handle plurals with anywhere from 1 to 6 plural forms. Which rule is the most natural, I don't know. Here is a table of the most unnatural, artificial or wrong cases of each:
Rule | Criticism | Example
one form | Not human-like | جار نقل 1 من الملفات
4 forms | The number is dropped or moved into parentheses in the first cases, and the Tashkeel is dropped | -
6 forms | The number is dropped or moved into parentheses in the first cases | لم يتم نقل أي ملف (0) or جار نقل ملف واحد (1) or جار نقل ملفين إثنين (2)
2 modern forms | The zero, one and two cases and the Tashkeel marks; in addition, it is not authentic but an innovation that reads from left to right, and it even mixes directions, as in 127 | جار نقل 0 ملف ، جار نقل 2 ملف
2 classic forms | The first cases up to two and the Tashkeel marks; however, it agrees with the authentic way of reading | جار نقل 0 ملف ، جار نقل 2 ملف
3 classical forms | The first three cases; but it solves the Tashkeel problem, and in an authentic way | جار نقل 0 ملفاً ، جار نقل 2 ملفاً
I think I prefer the last solution, the one with 3 forms.

I think it's very important to provide at least three (Perl) scripts: one to convert the old 4 forms to the 6 forms, and others to convert the 6 forms to the modern 2 forms or to the 3 classical forms.