
These efforts seek to preserve PH languages in the age of AI

Language is expressive, cultural, and dynamic, and now it is also liberating. Under the theme “Wikang Mapagpalaya” (liberating language), the Komisyon sa Wikang Filipino (KWF) celebrated more than a hundred Philippine languages this August.

“Mapagpalaya” evokes a sense of agency and inclusivity. In an August 22 webinar by the KWF, DOST-PCIEERD (Philippine Council for Industry, Energy and Emerging Technology Research and Development) Executive Director Dr. Enrico C. Paringit emphasized the role of technological advances, particularly in natural language processing (NLP), in improving the digital accessibility of Philippine languages.

“’Yung bilang na lamang ng wika na nasa atin ay maaaring maging dahilan para magbuhos pa tayo ng atensiyon sa paglilinang ng wika. At hindi lamang ito dahil gusto nating pagandahin ang wika. Ito rin ang susi upang mapayabong pa natin, mapaganda pa natin ang lebel ng teknolohiya,” said Dr. Paringit.

(The sheer number of languages we have can be a reason for us to pour more attention into cultivating them, not just because we want to improve our languages, but because this is also the key to raising the level of our technology.)

Natural language processing is a subfield of artificial intelligence that enables computers to process, interpret, and, today, generate text written in human language. NLP systems rely on data drawn from collections of written and spoken text, usually digitized, known as corpora.

A notable effort to collect data for a Philippine Languages Database is 2011’s ISIP Project 6, which produced a corpus with native pronunciations and contexts for Filipino words. In 2018, researchers from UP Diliman developed Marayum, a community-built web dictionary for low-resource languages Asi, Cebuano, Kinaray-A, and Hiligaynon.

“Ang pangarap natin diyan ay hindi lang itong apat na ito ang maitaguyod or malagyan ng corpora o web dictionary, kung hindi ay pati na rin ang ibang lenggwahe sa Pilipinas.”

(Our dream is to build corpora or web dictionaries not just for these four [languages], but for the other languages of the Philippines as well.)

Meanwhile, the MinNa LProc Research & Development Laboratory was established in 2021 for NLP research, including corpus-building, for languages specific to Mindanao.

Since the generative AI boom of the early 2020s, “natural language processing” has been colloquially associated with generative artificial intelligence. ChatGPT, a chatbot built on large language models, is an example of an NLP application that captured public attention when it launched in late 2022.

Building on OpenAI’s technologies, DOST-ASTI developed the iTanong project, an interface that lets users interact with relational databases through questions in Filipino, including Taglish terms.

Simply put, if Philippine languages are not built into the systems that the people of tomorrow will likely use or interact with, such as those powered by NLP, the usage of these languages may shrink and, in extreme cases, disappear altogether.

NLP as a gateway for interaction

Relational databases are integral to data storage. Think of them as sophisticated, Excel-like tables that hold government and business information. Traditionally, the barrier to accessing their contents has been technical skill, such as knowing a query language like SQL.

With iTanong, however, it may soon be possible for Filipinos to interact directly with these databases without technical know-how. A user processing a government document, for example, may be able to ask the interface to fetch information about their application and get verifiable results in their native tongue.
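To make the idea concrete, here is a minimal sketch of what such an interface does conceptually: it turns a question asked in Filipino or Taglish into a database query and answers in the user's language. This is not how iTanong works; the project reportedly builds on OpenAI's technologies, while this toy uses simple keyword matching. The `applications` table, its rows, and the `answer` function are hypothetical, invented for illustration.

```python
import sqlite3

# Hypothetical table standing in for a government database.
# The schema, names, and rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applications (applicant TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO applications VALUES (?, ?)",
    [("Maria Santos", "approved"), ("Juan Dela Cruz", "pending")],
)


def answer(question: str) -> str:
    """Toy keyword matcher that maps a Filipino/Taglish question to SQL.

    A rule-based stand-in for the language model a real system would use;
    it only recognizes questions about an application's status.
    """
    q = question.lower()
    # "status" (Taglish) or "kalagayan" (Filipino for "status/condition").
    if "status" in q or "kalagayan" in q:
        # Look for a known applicant name mentioned in the question.
        for (name,) in conn.execute("SELECT applicant FROM applications"):
            if name.lower() in q:
                # Parameterized query: the user never writes SQL themselves.
                row = conn.execute(
                    "SELECT status FROM applications WHERE applicant = ?",
                    (name,),
                ).fetchone()
                # Reply in Filipino: "The status of <name>'s application is ..."
                return f"Ang status ng aplikasyon ni {name} ay {row[0]}."
    # Fallback in Filipino: "Sorry, I did not understand the question."
    return "Paumanhin, hindi ko naintindihan ang tanong."


print(answer("Ano ang status ng application ni Maria Santos?"))
```

A real natural-language interface would have to handle far more varied phrasing, spelling, and code-switching than this keyword rule, which is exactly where the language-specific data discussed above comes in.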

Research on iTanong is ongoing, but in an interview with AI advocate Dominic Ligot, DOST-ASTI senior research specialist Elmer Peramo revealed plans to take iTanong beyond English, Tagalog, and Taglish and cater to other Philippine languages as well.

“Understanding and interpreting a mix of languages and dialects allows iTanong to handle data more accurately, especially in contexts where meaning and intent can significantly shift based on subtle linguistic cues,” said DOST-ASTI.


“In environments where specific terminologies and localized expressions are prevalent, iTanong’s nuanced understanding ensures that it can meet specialized needs and support a wide range of operational scenarios.”

iTanong shows how offering the option to converse with a system in Filipino does double duty: it makes a potentially helpful technology easier for the Filipino majority to use, and it keeps the language in use.

DOST-ASTI is optimistic that by June 2025, several government agencies, including DOST, will begin using the technology. It is aiming for a wider rollout in January 2026.

“Kung hindi tayo makikiisa at makikibahagi sa mga pananaliksik ukol sa paggamit ng teknolohiya sa ating wika, maaaring ito rin ay isang kaparaanan upang ang ating wika ay tuluyan nang mawala, sapagkat alam natin na kapag hindi natin naipapasok sa larangan ng tinatawag na ‘digitalization’ ang mga bagay-bagay, parang hindi sila nag-e-exist, ‘di ba?” said Paringit.

(If we do not join and take part in research on the use of technology for our language, this could also lead to our language disappearing entirely, because we know that when things are not brought into so-called “digitalization,” it is as if they do not exist, right?)

A diagram shows a use case for the iTanong project

Despite these advances, Paringit flagged key concerns in NLP today, among them ambiguity, data bias and fairness, and privacy and ethics.

To address these issues, he called for continuous research and development, comprehensive policies on AI use, and collaboration between linguistics and AI.

“Isang prospekto na gusto kong ibahagi ay ‘yung pagbubuo ng mga patakaran para sa paglago pa ng artificial intelligence. At sana po ay makita natin ang pagkilala rin sa papel ng wika upang ito ay higit pang malinang at makilala rin. Hindi lamang ito parang topic o subject na gusto lang nating pag-aralan sapagkat gusto lang natin ma-improve yung tool o mapaganda yung tool; kung hindi, meron din tayong malalim na kagustuhan na mapaglinang rin ang ating wika.”

(One prospect I’d like to share is the creation of policies for the further growth of artificial intelligence. I hope we also see recognition of the role of language, so that it can be further cultivated and recognized. It’s not just a topic or subject we want to study because we simply want to improve our tools; we also have a deep desire to cultivate our language.)

Research and development for iTanong is set to conclude in December 2024. – Rappler.com
