
ChatGPT for Bahasa Indonesia

Ayu Purwarianti, Hammam Riza, Ivan Lee, Michell Setyawati Handaka, On Lee
April 18, 2023

Introduction

Today, we announce the development of a “ChatGPT for Bahasa Indonesia.”

In today's rapidly evolving technological landscape, groundbreaking advancements set the stage for future innovations. One such revolutionary development is the Large Language Model (LLM), exemplified by OpenAI's ChatGPT.

However, most LLM research has predominantly centered on English, leaving a void in the market for other languages and concentrating the technology's advantages primarily among English-speaking nations.

Problem: ChatGPT Limitations

ChatGPT, OpenAI's Large Language Model, garnered over 100 million users within two months of its launch. Despite this impressive growth, it has certain limitations:

  1. Limited Bahasa Indonesia support: ChatGPT's training data for Bahasa Indonesia is significantly smaller than that for English, resulting in limited support for the language. According to Statista's data from January 2023, which ranks languages by their share of websites, English accounts for a dominant 58.8% of web content, while Indonesian accounts for only 0.6%, a gap of roughly 98 to 1. This disparity highlights the need for more research and development focused on Bahasa Indonesia.
  2. Country-specific knowledge: As a "Jack of all trades, master of none," ChatGPT lacks specialized, in-depth knowledge about particular countries, topics, and industries. For example, it easily recognizes US brands like Coca-Cola but may not recognize Indonesian household brands like Limun Linggardjati.
  3. Outdated information: ChatGPT's training data covers material only up to 2021, so it lacks knowledge of events and developments after that point. Consequently, it cannot provide real-time updates on weather conditions, stock market prices, and other current affairs.

Recently, there has been a notable increase in demand from Indonesian companies looking for ChatGPT-like capabilities tailored specifically for Bahasa Indonesia.

To address this demand, Datasaur.ai, GLAIR.ai, and Prosa.ai have collaborated to develop a Bahasa Indonesia-specific LLM that caters to the diverse needs of businesses in the region by addressing the above ChatGPT limitations.

Solution: Promising Preliminary Results

Below are some preliminary results where “ChatGPT for Bahasa Indonesia” outperforms ChatGPT.

Legal Questions

Below is an example of the chatbot answering questions about the Omnibus Law (“Undang-Undang Cipta Kerja”):

Question: Apa itu ketenagakerjaan?

English translation: What is employment?

Expected answer: See the image below.

English translation: Article 1 - In this law, the following terms are defined: Employment refers to all matters related to the labor force before, during, and after the period of employment.

GPT-4 answer: The response is overly general and fails to cite the origin of the information provided. This can lead to mistrust regarding whether the information is correct.

ChatGPT for Bahasa Indonesia answer: The response is precise and concise. The definition is derived directly from a government document source, which is cited and provided.
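
This post does not describe how the model produces cited answers. One common way to get this kind of behavior is to look up a passage in a curated corpus and return it together with its source. The sketch below illustrates the idea with a toy word-overlap lookup. Everything in it is an assumption for illustration, not the authors' system: the `Passage` type, the `answer_with_citation` function, the scoring rule, and the corpus, which reuses the English translation quoted above and adds one made-up filler entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Passage:
    text: str
    source: str  # citation returned alongside the answer


# Toy corpus (hypothetical). The first entry reuses the English translation
# quoted in this post; the second is filler so there is something to reject.
CORPUS = [
    Passage(
        text=("Employment refers to all matters related to the labor force "
              "before, during, and after the period of employment."),
        source="Undang-Undang Cipta Kerja, Article 1",
    ),
    Passage(
        text="Placeholder passage about an unrelated topic.",
        source="hypothetical-document",
    ),
]


def tokenize(text: str) -> set:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = {word.strip(".,?!:;\"'()").lower() for word in text.split()}
    return words - {""}


def answer_with_citation(question: str, corpus: list) -> Optional[tuple]:
    """Return (answer, source) for the passage sharing the most words
    with the question, or None when no passage shares any word."""
    question_words = tokenize(question)
    best = max(corpus, key=lambda p: len(question_words & tokenize(p.text)))
    if not question_words & tokenize(best.text):
        return None  # nothing relevant: decline rather than guess
    return best.text, best.source


if __name__ == "__main__":
    print(answer_with_citation("What is employment?", CORPUS))
```

Returning the source with every answer, and declining when nothing matches, is what lets a reader verify the response instead of taking it on trust. A real system would replace the word-overlap score with a proper retrieval method and pass the retrieved passage to the language model.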

Financial Questions

Question: Berapa limit harian transfer antar bank?

English translation: What is the daily transfer limit between banks?

Expected answer: For the purposes of this initial model, our Bahasa Indonesia training data includes a corpus of information provided by BCA Bank. See the image below.

English translation: A table of Interbank Transfer rates

GPT-4 answer: The response is overly general and fails to cite the origin of the information provided.

ChatGPT for Bahasa Indonesia answer: The response is precise and concise. The answer is derived directly from information on BCA’s website.

Conclusion

Developing a Bahasa Indonesia LLM offers significant advantages to Indonesian companies and users: it better understands country- and language-specific prompts, and its answers are more concise and precise.

The preliminary results of ChatGPT for Bahasa Indonesia are encouraging. These findings demonstrate that it is feasible to harness the capabilities of an LLM and tailor it specifically for Bahasa Indonesia.

Future developments will focus on two areas: feeding in more diverse types of data, including Indonesia’s many local dialects and everyday slang, and providing better tools for understanding document scans, tables, and images via Optical Character Recognition (OCR).

Together, we are ushering in a new era of language technology that will shape the future of communication and collaboration among speakers of Bahasa Indonesia.
