A group of Polish research centres have launched a project to create a Polish equivalent of the large language model (LLM) that powers the ChatGPT artificial intelligence (AI) system.
They intend for their product – called PLLuM, the Polish Large Language Universal Model – to be open source and free to use, and for it to form the basis of a Polish-speaking intelligent assistant that can help provide public services.
“We cannot afford to be left behind,” wrote the Research and Academic Computer Network (NASK), a state research institute, announcing the plans last week just ahead of the one-year anniversary of the launch of ChatGPT.
NASK is part of a consortium of six institutions, led by the Wrocław University of Science and Technology (PWr), that will develop PLLuM.
Poles are not geese; they have their own Polish-language AI model. Or, more precisely, they soon will, because here comes… PLLuM! 🗣️
— NASK (@NASK_pl) December 1, 2023
NASK notes that existing AI systems, such as ChatGPT or Google’s Bard, have two general problems – they cost money to use and are closed (meaning that their algorithms cannot be examined or modified by users) – but also one specific to Poland: that they were trained using little Polish-language content.
“There is a high chance that, when preparing its answers, [ChatGPT] overwrites some knowledge of Polish culture, customs and facts with data from other languages,” says Jan Kocoń, an AI scholar at PWr. “There are also some grammatical and stylistic errors.”
Training PLLuM with “a significantly greater share of texts originally written in Polish and containing information about Poland (Polish science, art, history, law, economy and others) will increase the visibility of our language and culture, which are noticeably marginalised in currently available models”, adds NASK.
“We already have almost 300 gigabytes of text collected from various sources, and this number is growing all the time,” says Kocoń. His university has labelled the project a “Polish ChatGPT”.
🧵🇵🇱 PLLuM (a Polish large language model), trained mainly on Polish-language content, and an intelligent assistant based on it will be created through the collaboration of six leading research units in the field of AI and natural language processing. https://t.co/zyxN8YbcT8
— Michał Podlewski (@TrajektoriaAI) December 2, 2023
PWr says that, by giving Poland a leg up in the “technology race”, the project “can support not only the development of science, but also small and medium-sized enterprises, which in the field of IT and AI are the driving force of the Polish economy”.
NASK adds that “PLLuM is intended to serve not only scientists and businesspeople, but above all Polish society”. This will include the creation of “a Polish-speaking intelligent assistant, which will aim to increase the availability of public services, both digital and during a traditional visit to an office”.
The team behind PLLuM plan to have the first version available for open testing in the first half of 2024. They say that the project is to be carried out in accordance with ethical and responsible AI practices, including keeping the data representative, transparent and fair.
Main image credit: NASK/Twitter
Agata Pyka is an assistant editor at Notes from Poland. She is a journalist and a political communication student at the University of Amsterdam. She specialises in Polish and European politics as well as investigative journalism and has previously written for Euractiv and The European Correspondent.