Celebrating the Arabic Language, as Artificial Intelligence Helps It Expand

This year’s World Arabic Language Day arrives amid growing world interest in the language and as artificial intelligence is helping to expand its presence online.

The United Nations observes the day on December 18, which coincides with the day in 1973 that Arabic was adopted as an official language of the organisation. The theme of this year’s observance is “Arabic: the Language of Poetry and Arts”.

Arabic’s long history of enriching philosophy and the arts has been acknowledged in verse. Ahmad Shawqi (1870-1932), known as the “Prince of Poets”, praised Arabic, saying, “He who filled languages with beauties … placed beauty and its secret in the letter Ḍād,” a letter peculiar to Arabic.

With more than 350 million native speakers, Arabic is the fourth most widely spoken language in the world, according to Babbel magazine.

In a message on last year’s World Arabic Language Day, Audrey Azoulay, director-general of Unesco, noted that “the Arabic language constitutes a link between three continents, at the intersection of Europe, Asia, and Africa.”

“Its geographical centrality meant that it was not only the language of merchants, but also of scholars, artists and philosophers,” Azoulay said, adding that the language had gained strength and diversity from this position, “resulting in thinking of exceptional historical significance.”

Arabic is also the language of the Qur’an, and for many centuries it prevailed as a language of politics, science, and literature, directly or indirectly influencing many other languages. It aided the transfer of scientific and philosophical knowledge to Europe during the Middle Ages. It also enabled dialogue between cultures along the land and sea routes of the Silk Road, from the coasts of India to the Horn of Africa.

Artificial Intelligence Supports Arabic

In a famous 1903 poem titled “The Arabic Language Laments Its Fate with Its People,” Hafez Ibrahim (1872–1932), the “Nile Poet”, personified Classical Arabic and imagined how it felt about contemporary efforts to replace it with colloquial forms. In the poem, Arabic speaks: “The Book of God is expansive in word and purpose, and is not narrow in its verses and sermons. How can I narrow down today to describe a machine, and format names for inventions?”

Today, it is machines and inventions that are describing Arabic, as advances in artificial intelligence and other technologies help computers master the language and expand its presence online and in the world.

Among the Arab researchers leading such efforts is Ahmed Ali, principal engineer in the Arabic Language Technologies Group at Hamad Bin Khalifa University’s Qatar Computing Research Institute (QCRI). He recently discussed his group’s work in an interview with Al-Fanar Media. A transcription of that conversation follows, edited for length and clarity.

Al-Fanar Media: Do you think that the recent events in Gaza have increased interest in the Arabic language?

Ahmed Ali: Of course the recent events in Gaza had an impact on international public opinion and interest in the Arabic language. For example, the world wants to understand many statements and audio recordings, in Classical and Levantine Arabic, especially the Palestinian dialect. This requires modern technology to understand the content of these files. Therefore, these events have broadly increased interest in learning about the Arabic language, as well as its different dialects. We believe that investing in this technology is an important means, so that the world can see part of the truth about what is happening in our Arab region.

The Arabic language is less used in scientific research outputs at the global level. How can we enhance its spread?

Ahmed Ali: Worldwide, most research output is in English. The Arabic language has its own advantages; it has a unique diversity between spoken dialects and writing style.

During the past two decades, there have been two waves of increase in research on Arabic, both spoken and written. The first was after the attacks of September 11, 2001, when interest in the Arabic language increased for the purposes of understanding it. The second wave accompanied Web 2.0 and the boom in Arabic content on social media in 2011. For example, the Qatar Computing Research Institute (QCRI) has published hundreds of research papers on understanding and analysing the Arabic language at leading scientific conferences.

Does the development of artificial intelligence (AI) endanger Arabic?

Ahmed Ali: AI is a machine that can learn from the data it sees, just like a little child. For example, we developed a speech machine, which converts written text into spoken Standard Arabic, for news bulletins and educational curricula. We have also developed auto-speakers for different dialects for social purposes, where vernacular dialects are dominant.

Based on your expertise, how can speech be processed by computer?

Ahmed Ali: Given the richness of the Arabic language, processing it means addressing many challenges, and doing so requires a huge amount of data. These challenges include the richness and complexity of Arabic morphology: a single word such as وسيعالجونها translates as a whole clause, “and they will cure it” (and+they+will+cure+it).

Mastering diacritics and their influence on meaning is another challenge, as in the words علم، عِلْم، عَلَم، عَلِمَ، عَلَّمَ (imagine if “cr” could signify car, care, cure, or core).

There is also the difficulty of identifying proper nouns (in English, they begin with capital letters, but Arabic has only lowercase letters), as well as the presence of many spelling and grammatical errors in Arabic script and a lack of adherence to punctuation marks, in addition to the lack of electronic linguistic resources available to researchers.
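The diacritics challenge described above can be illustrated with a minimal Python sketch. It is not QCRI's method, only an illustration using the standard `unicodedata` module, and the helper name `strip_diacritics` is hypothetical. It shows that the four vocalised words quoted in the interview collapse into the same bare spelling once their diacritics are dropped, which is how Arabic is usually written online.

```python
import unicodedata

# The bare form and the four vocalised spellings quoted in the interview.
words = [
    "علم",    # undiacritised form, ambiguous on its own
    "عِلْم",   # "knowledge"
    "عَلَم",   # "flag"
    "عَلِمَ",  # "he knew"
    "عَلَّمَ",  # "he taught"
]

def strip_diacritics(word: str) -> str:
    """Drop combining marks (Unicode category Mn), which include the Arabic short-vowel signs."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Every spelling reduces to the same three letters.
bare_forms = {strip_diacritics(w) for w in words}
print(len(bare_forms))  # prints 1
```

A system reading undiacritised text therefore has to recover the intended word from context, which is one reason large amounts of training data are needed.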

Are there technologies or tools that have been developed to support the use of the Arabic language?

Ahmed Ali: At QCRI and the Arabic Language Technologies Group, we aim to support the presence of the Arabic language online by building technologies that help computers master the Arabic language and make it a first-class citizen in cyberspace.

The institute works to contribute to building and supporting Arabic-language technologies, and making them accessible to developers and programmers, as well as users, by publishing research related to these technologies, and building and developing programmes to process the Arabic language.

There are major projects concerned with Arabic language technologies, such as the “Farasa” programme, for automatic Arabic language processing, and ASAD (Arabic Social Media Analytics and unDerstanding), which detects hostile language, hate speech, sentiment, dialect, and more. There are the Canary and Natiq programmes, for converting speech into written text and vice versa; the Shaheen programme, for automatic translation between Arabic (and its dialects) and English; and Tanbih, for analysing news, identifying propaganda and rumour campaigns, and determining intellectual and political positions. Moreover, the NeuroX programme works to interpret neural networks and the mechanisms behind their predictions.

How can these technologies be utilised in Arab higher education?

Ahmed Ali: Hamad Bin Khalifa University is the first in the Middle East to offer innovative massive open online courses, in cooperation with edX. These courses are a great opportunity to use modern technology in education. Speech recognition can be used to transcribe online lectures, and text analytics can be used to track students’ learning progress.

Do these techniques contribute to teaching the Arabic language to non-native speakers?

Ahmed Ali: The QVoice project aims to build speech technology for automatic Arabic pronunciation learning, empowering Arabic learners of different age groups and native-language (L1) backgrounds through accurate detection of pronunciation mistakes and appropriate feedback. It aims to enable learners, especially non-native speakers, to learn and practise Modern Standard Arabic. It also aims to help native Arabic speakers reduce the impact of accents, and to provide a learning experience tailored to the specific needs of individual learners, with targeted feedback. Such an experience can boost confidence and encourage learners to continue learning.

The project also aims to enhance Arabic speech research, by better modeling the Arabic phonetic space, to deal with different dialects and speaking styles. It also aims to improve second language speech models, explore different acoustic modeling and augmentation techniques, enrich multilingualism, and introduce multimodality into Arabic speech research.

Amid widespread misinformation in digital content and the need to improve verification efforts, how can we benefit from these technologies?

Ahmed Ali: QCRI’s Tanbih project works to develop techniques related to information analysis, especially in the context of combating rumours and enhancing media screening skills. It also aims to address the challenges posed by the spread of false information and rumours in Arabic-language content. The project includes research and development of tools and techniques to detect, analyse and combat the spread of misinformation online.

The project stresses the importance of enhancing critical thinking and media examination skills among users, to enable them to distinguish between reliable and unreliable information. QCRI has been actively involved in research related to natural language processing, machine learning, and information retrieval, which are key aspects of developing rumour-detection tools, and has benefited media organisations as well as the United Nations.
