Today, we announce the development of a “ChatGPT for Bahasa Indonesia.”
Large Language Models (LLMs), exemplified by OpenAI's ChatGPT, are among the most significant recent advances in technology, and they are setting the stage for future innovation.
However, LLM research has centered predominantly on English. This leaves a gap in the market for other languages and concentrates the technology's benefits among English-speaking nations.
Despite its impressive growth and success, with over 100 million users within two months of its launch, ChatGPT has certain limitations:
Recently, there has been a notable increase in demand from Indonesian companies looking for ChatGPT-like capabilities tailored specifically for Bahasa Indonesia.
To meet this demand, Datasaur.ai, GLAIR.ai, and Prosa.ai have collaborated to develop a Bahasa Indonesia-specific LLM that addresses the ChatGPT limitations above and serves the diverse needs of businesses in the region.
Below are some preliminary results where “ChatGPT for Bahasa Indonesia” outperforms ChatGPT.
Below is an example of the chatbot answering a question about the Omnibus Law on Job Creation (“Undang-Undang Cipta Kerja”):
Question: Apa itu ketenagakerjaan?
English translation: What is employment?
Expected answer: See the image below.
English translation: Article 1 - In this law, the following terms are defined: Employment refers to all matters related to the labor force before, during, and after the period of employment.
GPT-4 answer: The response is overly general and does not cite its source, which can leave users unsure whether the information is correct.
ChatGPT for Bahasa Indonesia answer: The response is precise and concise. The definition is taken directly from a government document, which is cited as the source.
Question: Berapa limit harian transfer antar bank?
English translation: What is the daily transfer limit between banks?
Expected answer: For the purposes of this initial model, our Bahasa Indonesia training data includes a corpus of information provided by BCA Bank. See the image below.
English translation: A table of interbank transfer rates
GPT-4 answer: The response is overly general and fails to cite the origin of the information provided.
ChatGPT for Bahasa Indonesia answer: The response is precise and concise. The answer is taken directly from information on BCA’s website.
Developing a Bahasa Indonesia LLM offers significant advantages to Indonesian companies and users: the model better understands country- and language-specific prompts, and its answers are more concise and precise.
The preliminary results of ChatGPT for Bahasa Indonesia are encouraging. They show that it is feasible to harness the capabilities of an LLM and tailor it specifically to Bahasa Indonesia.
Future development will focus on two areas: training on more diverse data, including Indonesia’s many local dialects and everyday slang, and providing better tools for understanding document scans, tables, and images via Optical Character Recognition (OCR).
Together, we are ushering in a new era of language technology that will shape the future of communication and collaboration among speakers of Bahasa Indonesia.