Friday 27 January 2012

A Case for Shahmukhi Script for the Punjabi Language

One of the major disadvantages of using Shahmukhi to write Punjabi, as critics point out, is that unlike Gurumukhi, it's not phonetic. In addition, short vowel sounds aren't usually represented and there is no letter for the consonant [ɳ]. As a result ھن may be pronounced [hʊn] (now) or [hən] (are). All this can be pretty confusing for a student. What is rarely mentioned, however, is that there is a flip side to the story too.

Let's begin with what critics gripe about the most: an unphonetic script. While it's true that there are three letters for S (س،ص،ث), four for Z (ز،ذ،ظ،ض), two for K (ق،ک), two for G (گ،غ) and so on, it's also true that these characters aren't used randomly. There is a logic behind all this.

Take the letters س،ص،ث for instance. If you find ص or ث you can be pretty certain the word has come from Arabic or Persian, while words with س are more likely either to have Sanskrit origins or to be recent borrowings from English.

The same is true of the letters گ and غ: the letter غ is normally found in words from Arabic or Persian, while گ is predominantly used for words that have come from Sanskrit.
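The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not a real etymology checker: the first set holds the letters named above as markers (ص, ث, غ), while the second set (ق, ذ, ظ, ض) is an assumed extension that treats the remaining "duplicate" K and Z letters the same way. The function names are made up for this example.

```python
# Letters this post names as signs of an Arabic or Persian loanword.
MARKERS_FROM_POST = {
    "\u0635",  # ص, one of the three S letters
    "\u062b",  # ث, one of the three S letters
    "\u063a",  # غ, the second G letter
}

# Assumed extension (not stated explicitly in the post): the other
# "duplicate" letters for K and Z are treated as markers too.
ASSUMED_MARKERS = {
    "\u0642",  # ق
    "\u0630",  # ذ
    "\u0638",  # ظ
    "\u0636",  # ض
}

ALL_MARKERS = MARKERS_FROM_POST | ASSUMED_MARKERS


def loanword_markers(word, markers=ALL_MARKERS):
    """Return the set of marker letters that occur in a Shahmukhi word."""
    return {ch for ch in word if ch in markers}


def looks_perso_arabic(word):
    """True if the spelling contains at least one marker letter."""
    return bool(loanword_markers(word))


waqt = "\u0648\u0642\u062a"         # وقت
kitab = "\u06a9\u062a\u0627\u0628"  # کتاب

print(looks_perso_arabic(waqt))   # True: the spelling contains ق
print(looks_perso_arabic(kitab))  # False: no marker letter in the spelling
```

The check only works in one direction. A marker letter hints at a loanword, but a word without one may still be borrowed, so a False result says nothing about origin.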

Gurumukhi, though simple, robs you of this wealth of etymological information and virtually veils the richness of the language and the culture of the people who influenced it in the past.

Unlike Gurumukhi, Shahmukhi doesn't care about short vowels, so they are often omitted in writing. This makes the script a little ambiguous but at the same time more tolerant of different pronunciations.

Take for instance the name of the holy city of Amritsar. While educated Punjabis (influenced by Hindi-Sanskrit) tend to pronounce it [əmrɪt'sər], the better Punjabi pronunciation is [əmrət'sər]. While the Gurumukhi spelling tends to favour the Hindi-influenced elites, the Shahmukhi spelling امرتسر (Amrtsr) leaves both options open.
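A toy model makes the point concrete. The sketch below is a deliberate simplification and not a real transliterator: it assumes the short vowels are [ə], [ɪ] and [ʊ], writes a word-initial short vowel as "A" (standing in for alif, as in "Amrtsr" above), drops every other short vowel, and expects stress marks to be removed beforehand. The function name is hypothetical.

```python
# Short vowels that, in this simplified model, are not written.
SHORT_VOWELS = {
    "\u0259",  # ə
    "\u026a",  # ɪ
    "\u028a",  # ʊ
}


def written_skeleton(ipa):
    """Reduce an IPA string to the letters that would actually be written.

    A word-initial short vowel becomes "A" (alif); any other short
    vowel is dropped; everything else is kept unchanged.
    """
    out = []
    for i, ch in enumerate(ipa):
        if ch in SHORT_VOWELS:
            if i == 0:
                out.append("A")
            continue
        out.append(ch)
    return "".join(out)


# Both pronunciations of Amritsar collapse to the same written form.
print(written_skeleton("əmrɪtsər"))  # Amrtsr
print(written_skeleton("əmrətsər"))  # Amrtsr

# The same collapse is what makes "now" and "are" look alike.
print(written_skeleton("hʊn") == written_skeleton("hən"))  # True
```

The same property cuts both ways: it is the source of the tolerance praised here and of the ambiguity critics complain about.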

Finally, and this is the point many supporters of Shahmukhi can't counter, the script doesn't have a letter for [ɳ]. Fortunately, I have come across several works where authors have used an O over ں to create a new letter for [ɳ].

Here is a short sentence to illustrate what I have just said:

IPA: 

kʌl sku:l d͡ʒɑ:ɳ vəkt d͡ʒəðð ʊsnɛ zəmɪ:n tɔ̃ bəstɑ: t͡ʃʊkkɛ'ɑ: tɑ̃ ʊsnu: jɑ:ð a:'ja: kɪ ʊsðɪ: kɪtɑ:b faʈɪ: ho'jɪ: æ tɛ kʌləm ʈʊʈʈɪ: ho'jɪ: æ

Shahmukhi:

کل سکول جان وقت جدّ اس نے زمین توں بستا چکّیا تاں اس نو یاد آیا کہ اس دی کتاب پھٹی ھوی اے تے قلم ٹٹّی ھوی اے۔

Here's how it's spelled:

KL SKWL JAN WQT JDD US NY ZMYN TWN BSTA CHKKYA TAN US NW YAD AYA KE US DY KTAB PHTY HWY AY TY QLM TTTY HWY AY.

The underlined words are Arabic in origin. Gurmukhi gives no clue whatsoever to this. The unphonetic spellings of waqt (phonetically vakt) and qalam (phonetically kalam) give you an idea.

Thursday 12 January 2012

Languages, books and the National Library of India

Languages, books and the National Library of India (summary, originally in French)

The National Library of India has more than 22,65,000 books, and three out of four are in English. There are only 6,43,255 books in Indian languages. In the foreign-language section, Chinese has 15,000 books, Arabic has 12,000 and Persian has 12,000. There are 5,000 books in French, and the Slavic-language section has 65,000 books.

The library holds 85,000 books in Bengali, but the number of books in Hindi is only 80,000. The website gives no information on how many of the library's books are in Punjabi and Telugu.

The current situation in India shows it: although the Indian government officially uses two languages, Hindi and English, at the federal level, in truth English is more popular, and many Indians in the south and east of the country prefer to use English. The books at the National Library are further proof of this phenomenon.

========================================

Sometimes I wonder why they don't declare English our national language. My compatriots in South India and East India would surely welcome the decision, although North Indians (especially in the Hindi-speaking states) may not agree. Currently India doesn't have a national language. The Central Government uses Hindi and English but, as you would suspect, Hindi is only nominally used. Perhaps they preserve it to use on World Hindi Day.

I found another illustration of this a couple of days ago, while I was browsing through the website of the National Library of India. From there I figured out that about three in four books in the library are in English. They didn't explicitly mention this fact, but if only 6,43,255 of the 24,65,352 books in the library are in indigenous languages, you can safely assume the rest aren't mostly in Chinese or Russian.
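As a rough check of that arithmetic, here is a minimal Python sketch. It uses only figures quoted in this post, and it assumes that every book that is neither in an Indian language nor in one of the foreign-language collections mentioned here is in English. The library does not state this, so the result is an estimate, not an official number.

```python
# Figures as quoted in this post (Indian digit grouping written out in full).
TOTAL_BOOKS = 2_465_352          # 24,65,352
INDIAN_LANGUAGE_BOOKS = 643_255  # 6,43,255

# Foreign-language collections mentioned in this post.
FOREIGN_COLLECTIONS = {
    "Slavic languages": 65_000,
    "Chinese": 15_000,
    "Arabic": 12_000,
    "Persian": 12_000,
    "French": 5_000,
    "Romanian": 2_000,
}


def english_share_estimates(total, indian, foreign):
    """Return (low, high) estimates of the fraction of books in English.

    high: everything that is not in an Indian language counts as English.
    low:  the listed foreign-language collections are subtracted first.
    """
    not_indian = total - indian
    high = not_indian / total
    low = (not_indian - sum(foreign.values())) / total
    return low, high


low, high = english_share_estimates(
    TOTAL_BOOKS, INDIAN_LANGUAGE_BOOKS, FOREIGN_COLLECTIONS
)
print(f"Estimated English share: {low:.0%} to {high:.0%}")
```

With these inputs the estimate comes out at roughly 69% to 74%, that is, somewhere around seven books in ten.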

Only 80,000 books in the library are in Hindi; even Bengali, a regional language, boasts 85,000 books. But that's perhaps because the National Library is located in West Bengal.

Coming back home, I was really saddened to find that, of all the languages, Punjabi, English and Telugu are the only ones for which they don't list the number of books they have. Perhaps that's a polite way of saying they haven't got much in Punjabi.

It reminds me of a famous saying here in Punjab and it goes something like this:

If a Bengali were to suddenly get rich, he would construct a library in his house, but if a Punjabi got a lot of money overnight, he would make the flashiest of bars and have a collection of the most exquisite wines.

Perhaps the Bengalis took it literally. They gave the world the first non-European Nobel Laureate, but that doesn't mean they have earned the right to look down upon unruly Punjabis.

Turning to other Indian languages, Sanskrit has over 20,000 books, which is huge when you compare it to the mere 500 books in the Kashmiri section. This makes me wonder whether we really know the people we are, or at least claim to be, fighting for. After all, we have fought four wars with Pakistan over Kashmir, and now the territory is divided into three regions, one each controlled by Pakistan, India and China.

Here is the report card of other Indian languages:

Tamil: 57,000
Gujarati: 37,000
Marathi: 37,000
Malayalam: 34,000
Kannada: 32,000
Urdu: 20,000
Oriya: 19,500
Assamese: 12,000
Sindhi: 2,100

Impressive as it may sound to some, when you compare it with the Slavic-language section, which houses a decent 65,000 books, you wonder which languages are really indigenous and which are foreign.

There are more books in Chinese at the library than in Assamese or Sindhi: 15,000. That's probably because there is a Chinatown in Calcutta.

Arabic and Persian have 12,000 each; next comes French with 5,000 books, and Romanian has 2,000.
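The per-language figures for Indian languages quoted in this post can also be added up and compared with the 6,43,255 total. The sketch below is a back-of-the-envelope check only: it assumes the rounded per-language figures and the total are counted the same way, and it treats "over 20,000" for Sanskrit as exactly 20,000.

```python
INDIAN_LANGUAGE_TOTAL = 643_255  # 6,43,255, as quoted in this post

# Per-language figures quoted in this post.
LISTED = {
    "Bengali": 85_000,
    "Hindi": 80_000,
    "Tamil": 57_000,
    "Gujarati": 37_000,
    "Marathi": 37_000,
    "Malayalam": 34_000,
    "Kannada": 32_000,
    "Sanskrit": 20_000,  # "over 20,000", taken as 20,000 here
    "Urdu": 20_000,
    "Oriya": 19_500,
    "Assamese": 12_000,
    "Sindhi": 2_100,
    "Kashmiri": 500,
}

listed_total = sum(LISTED.values())
unlisted = INDIAN_LANGUAGE_TOTAL - listed_total

print(f"Listed languages:  {listed_total:,}")
print(f"Not accounted for: {unlisted:,}")
```

On these assumptions, about 207,000 books are left over for Punjabi, Telugu and any other Indian language without a published figure. The numbers cannot say how that remainder is split.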

Saturday 7 January 2012

Dutch Wikipedia has a million articles now, at least one in ten is computer generated

The Dutch Wikipedia recently crossed the one million mark. It is the fourth edition, after the English, German and French Wikipedias, to pass that number. Despite this, it has been removed from the ten largest Wikipedias that show up on the homepage.



In its place there is now the Chinese Wikipedia, and I believe this has happened because the Dutch used bots (automated computer programmes) to create more than 100,000 articles in the space of only ten days. If I remember correctly, that happened between 7 December 2011 and 17 December 2011. Unfortunately, I don't remember where I read that, nor can I find more information on it.

I find it hard to believe that these computer-generated articles add anything of value to an encyclopaedia. Personally speaking, I would rather it covered only 100,000 articles in depth than 1,000,000 superficially. But then again, showing off is perhaps a part of contemporary culture.

Dutch Wikipedia reaches 1,000,000 articles; more than 10% were written by computers

Dutch Wikipedia recently reached 1,000,000 articles. With this, it has become the fourth Wikipedia, after the English, German and French Wikipedias, to reach the 1,000,000 mark. Despite this, Dutch Wikipedia is no longer counted among the ten largest Wikipedias. It has been removed from that list.

There is a small reason behind this: the Dutch took the help of bots, that is, computer programs, to increase the number of articles. If doing all this is not haram under Wikipedia's rules, it cannot be called halal either. More than 100,000 articles were written on Dutch Wikipedia within just 10 days, and all this happened between 7 December 2011 and 17 December 2011.

I don't think there will be much worth reading in articles written by computer. Writing 100,000 articles well is many times better than writing 1,000,000 articles badly.

Friday 6 January 2012

The end of the Iraq war...

I mentioned Iraq in the first post of the new year. America will leave Iraq, but no one has answered this question: if the deaths of 4,400 American soldiers were a sacrifice for their country, then what are the deaths of at least 100,000 Iraqis, which are not even mentioned in the major international newspapers???

Thursday 5 January 2012

Second-class human beings

Summary (originally in Esperanto):

Today a new article about the Iraq war in The Indian Express tells its readers how much money the United States spent and how many American soldiers died in the war, but it is silent about how many innocent Iraqis died. Moreover, the article suggests that Iraqis are not "intelligent" enough to solve their own problems.

Today, once again, an article written by The Economist on page 13 of The Indian Express hints that we, who live in the Third World, are perhaps second-class human beings.

The article is titled Make it Federal.* It says:

1. More than 4,400 American soldiers were killed in Iraq,
2. The Iraq war has cost $8000 crore US dollars,
3. And because of the Iraq war, America has made new enemies in the Arab world.

But the article makes no mention of one very important thing: no one has even an estimate of how many innocent Iraqis fell victim to this unnecessary war. According to the Iraq Family Health Survey, 104,000 to 223,000; according to the Lancet survey, at least 601,027; and according to the Opinion Research Business survey, more than 1,220,580. We don't even have an estimate because Iraqis are second-class human beings, and so their life or death doesn't matter.

The article doesn't stop there. According to it:

"But it would be very good if Iraqis solved their problems themselves, but if we look at the past few years, it doesn't seem that they [Iraqis] can carry out this task sensibly."

Robert Fisk often says that the dictators of Arab countries treat their people as children. Reading The Economist, it seems they are not alone.

* = Make it Federal, The Economist, 5 January 2012 via The Indian Express