History January 2023

Is AI-generated Art a Creation?

Karlo AI logo

Exploring the Coexistence of Humans and AI with Kakao Brain - Part 1


In March 2016, the five-game Go match between professional player Lee Sedol and the artificial neural network-powered AlphaGo took place in Seoul, attracting the world's attention. AlphaGo, trained on 160,000 human-played Go records, emerged victorious with a 4-1 win against Lee Sedol. Despite receiving some human assistance, the fact that AlphaGo learned to play Go better than most human players in just five weeks was enough to amaze and frighten people.


In 2017, Internet companies and telecom providers began releasing artificial intelligence speakers one after another. The speakers recognized colloquial speech and responded with understandable content, and their human-sounding voices made machine speech feel familiar to children and adults alike. This was when AI began to gain a foothold among the general public, moving beyond the realm of engineers.


Over the past five to six years, the conveniences brought by AI have permeated every aspect of daily life. The old-fashioned, clearly robotic ARS guidance voice on the other end of the phone and older smartphones that lack facial or fingerprint recognition now feel inconvenient, without our having noticed the change.


What will be the next AI that penetrates everyone's life? 


Two AI technologies became the talk of the town in 2022: multi-modal image generation and super-large AI language models. Applications of these technologies are making a stir both domestically and abroad. At the same time, some fear that "the day AI replaces human jobs is right around the corner." Will it really be so? Together with Kakao Brain krews, we talked about the present and near future of artificial intelligence and its wise coexistence with humans. Part 1 looks at multi-modal image generation technology, and Part 2 focuses on the Korean super-large AI language model.


#Please tell me what you want to draw. We'll create the style based on our data. 

‘Upper body of an Asian man wearing a hip-hop style hooded t-shirt. Expressed in Renaissance style oil painting.’

a portrait painted by Karlo


‘Upper body of an Asian man wearing a hip-hop style hooded t-shirt. Pencil sketch.’

a portrait sketched by Karlo


To generate these images, we typed descriptions into Kakao Brain's AI artist app B^DISCOVER, keeping the theme similar but changing the mode of expression. After a short while, 14 visually appealing creations appeared. If the output does not match what the requester had in mind, the request can be adjusted and submitted again. If the requester has a more specific requirement, such as a side view or a painting style reminiscent of Michelangelo from the Renaissance, the model will accommodate it.

How artificial intelligence is implemented as a service

An AI model is built by training on a dataset, and the trained model is then provided in the form of an API. Services for general users are implemented on top of that API.

The B^DISCOVER service uses the state-of-the-art multi-modal AI, "Karlo." This AI was developed by Kakao Brain and introduced in April 2022, as a refined version of "minDALL-E," which was released in December 2021. Karlo has 3.9 billion parameters and has been trained on a massive dataset of 120 million text-image pairs. Compared to minDALL-E, Karlo's model size has increased threefold, while its image generation speed and training dataset size have doubled.


The training dataset is crucial to the learning process for AI, acting as a virtual repository of information. By carefully selecting the content of the dataset, the AI's knowledge base can be tailored and adapted to specific domains. The breadth and depth of the AI's understanding are directly influenced by the diversity of the data in the training dataset.


Coyo is a large-scale dataset of about 740 million image-text pairs collected by Kakao Brain. In other words, it is a library whose composition and diversity can be tuned as needed. The model can train on additional data to suit the service to be released, which gives Kakao Brain a distinctive competitive edge as an AI-specialized company.


This dataset is also what allows the model to render a natural-looking image for an unfamiliar request, such as "the appearance of a man wearing a hoodie expressed with the style of Renaissance painting."


There were also disappointing parts. Poorly matching pictures came out for descriptions such as "ink-painting strokes in Chosun Dynasty," for which little data is available on the Internet. Hands, which AI finds challenging to draw, also looked awkward, and some results may be far from one's taste. Even so, the output is impressive considering the working speed and the quality of the results.


Can AI-generated images be considered works of art? Joy, the PO of the 'B^ DISCOVER' service, explains, "When humans create artwork, they start from inspiration and go through a creative process to capture their soul. However, 'B^ DISCOVER' performs work based on the commands inputted by humans and the publicly available data, so it can be seen as 'results' at best." Despite its remarkable abilities, which can evoke amazement and fear, AI still depends on human involvement to accomplish anything.


#A well-trained AI is a brilliant tool in the hands of its creator

However, the technology only shines in its true brilliance when it is transformed into a functional piece, much like a gem turned into jewelry. Prompt engineers play a crucial role in 'transforming' AI technology into user-friendly services by asking appropriate questions based on their engineering knowledge, allowing AI to perform its tasks effectively. The questions they ask the AI, the results it generates, and the adjustments that follow are all reflected in the product development stage. They can be considered "trainers" of AI.


B^ DISCOVER is an English-based service. Being fluent in English is a must for prompt engineers. But Molly and Hailey believe that having a curious mindset and being interested in the world around them is just as important. That's because asking questions from human perspectives can maximize AI's potential.

Karlo used personal pictures from Krews to redo their profile images


In North America, many multi-modal image generation models have been announced but are not accessible to the general public, or are operated only in a limited capacity, because of political or gender equality controversies. Kakao Brain, by contrast, strives for an open community approach. Coyo and Karlo were made available to the developer community on GitHub in September and December 2022, respectively, followed by the public-facing service B^DISCOVER. The company favors open sharing so that it can gather a variety of perspectives and update rapidly, and it is confident it can handle any prompt because Karlo was trained on the company's own dataset.


While preparing for "B^ DISCOVER," these krews met various practicing artists. Like the general public, the artists were divided over multi-modal image generation technology. Supporters point out that it can help a great deal with simple tasks. When digital creation tools like Photoshop first emerged, there was a similar debate about whether images made with those tools were artworks. Opponents, on the other hand, worry about artists who make their living from such work.


Marilyn says, "The age has come where creative thinking is being noticed more than production skills." She adds, "Professional artists sometimes make it a rule to keep their prompts private, considering them their unique creative know-how." Meanwhile, Hailey notes, "The creation process involves the pain of realizing one's intention through prompts," and predicts, "If you try using multi-modal image generation technology directly, you will notice it is a novel, useful tool for the creator, not a human substitute."


At present, B^ DISCOVER, which primarily operates on its T2I (Text to Image) feature, only utilizes around 20% of its capacity. Joy announced plans to enhance the feature to generate more diverse images, repair damaged parts of images in a less noticeable way to humans, and add further visual elements of the same style by simply inputting a word or natural language description.


While some people may have vague fears about AI creation, the changes set to unfold in the near future may come from unexpected sources. Looking ahead, Hailey said, "The system can actively encourage discussions and idea exchanges based on one's own thoughts, so it could be noticed as a creative learning tool in the field of education."


People previously hindered in drawing due to physical constraints can now easily express their artistic aspirations. This opens up new opportunities for those with abundant imagination but limited access to art education. AI can perform the 'creation' task when artists simply convey their thoughts in language.


#Big, Large, Hyper? What truly matters is human thought

The expressions used to describe artificial intelligence, including big data, deep learning, large scale, and hyper-scale, have evolved over the years. These are relative terms used to reflect AI's extent, depth, or scale. What was once considered large and deep data during the "deep learning" era is now just a size that students can use for training purposes.


A lot of the data that used to require human intervention for labeling is now automated. The "data labeler" job was once seen as a promising future career, but that is no longer the case. 


Artificial intelligence is a leading technology. When people envision various possibilities for it, new applications and services emerge and bring about real-life changes. There is no need to over-interpret it or to heighten fear by focusing on just one aspect of the technology.


The widespread use of personal computers was once feared to lead to job loss for many. However, most people have now embraced their potential and used them as a tool. Artificial intelligence is currently making its way into the mainstream, just like personal computers before it.


Hands-on experience speaks louder than words. Explore publicly available AI and imagine the different ways it can transform your life. On B^DISCOVER, you can bring Michelangelo back to life as your personal artist with just a few written lines.


#Karlo's atelier  
This is an image drawn by Karlo after adding the prompts of Pizza, Delivery, and Action Movie, with the subject of a cat.


These are the milestones in the research and development of Kakao Brain's large-scale AI image generation models: the minDALL-E model in December 2021, RQ-Transformer in March 2022, the COYO-700M dataset in August 2022, B^ DISCOVER in October 2022, and the Karlo 1.0 API in January 2023.


📍Karlo has been updated to version 2.0 

On July 6, 2023, Kakao Brain released the Karlo 2.0 API, which improves image quality and generation speed by training on a dataset of 300 million text-image pairs.

You can try the features of the Karlo 2.0 API on the Karlo service (web).


📍A companion article covers KoGPT, the Korean super-giant AI language model.
