Open-source AI models and big tech monopolies

Sungkyu Lee

July 5, 2023

The most compelling social value of open source software lies in its ability to reduce barriers to entry. Open source obliterates capital-based technological barriers effortlessly. Examples like Linux, Firefox, and Chromium demonstrate this. By offering open code, software has fostered new ecosystems of innovation and increased the involvement of volunteer code contributors from diverse communities. Collaborative in nature, open source has become the culture of developers.

Amidst the flourishing generative AI revolution, concerns about the concentration of power in big tech have intensified. OpenAI, Google, and Microsoft are on the brink of monopolizing the emerging market. Various Korean companies have already voiced concerns about "technology dependence and colonization" by global big tech, prompting calls for regulation. While these claims may involve some exaggeration and leaps of logic, the diagnosis is not entirely incorrect. The nature of the technology market allows early movers like Google, Facebook, and Apple to capture a dominant share of it.

Open source presents a potential remedy for market monopolies. The premise is the emergence of competitive open source alternatives. Stability AI's Stable Diffusion and Meta's LLaMA are already on par with any generative AI model currently available. These open-source AIs are accessible to all and can be customized. They have become so popular that they are often considered rivals to ChatGPT and DALL-E. This has sparked excitement among many developers.

However, the issue lies in the "span of control" of these open-source generative AI models. Unlike previous open source software, these models are the result of developers training on their own datasets. While the code is open, the "intelligence" remains closed. Fine-tuning may not alter the essence, because the model still operates within the bounds of its developer's planning and design. While the barrier to entry has been lowered, the degrees of freedom have narrowed. Meta's LLaMA model, in particular, reveals the design intentions of Meta, a late entrant among big tech companies aiming to expand its market.

Now, more than ever, it is crucial to open up high-quality datasets for open-source AI models to combat market concentration and incorporate public value. This need extends beyond data in a single language. However, constructing high-quality, reliable, and legally risk-free datasets is a formidable barrier to entry. It is costly, and the stakes are high. Managing this task outside the realm of big tech is not easy. Policy intervention is thus indispensable in this space.

Numerous approaches exist to counter big tech's market dominance, yet few achieve this without stifling innovation. One option is to combine open-source AI with open datasets. To amplify the synergies in a global ecosystem, governments must take action, invest, and foster collaboration. Korean developers should be able to train models using European data, and European developers should have the opportunity to enhance open-source AI models with Korean data. The path ahead won't be easy, but it is time for governments to make a decision. If they acknowledge the detrimental effects of Google's monopoly in the advertising market, they must act swiftly.