The internet has flourished on a philosophy of openness, collaboration, and, most notably, sharing. Among these principles, sharing stands out as the linchpin. The vast reservoir of information and art that is the internet owes its existence to the voluntary contributions of creative individuals. These pioneers rebelled against the outdated copyright frameworks of the print era, giving rise to open, collaborative licensing models. Creative Commons and the various open source licenses serve as a testament to this spirit of sharing.
Unfortunately, the very sharing that fueled the internet's revolutionary growth is now facing a crisis. The "goodwill sharing" that gave birth to a transformative remix culture is being co-opted for commercial gain by a select few tech companies. We rejected traditional intellectual property rights and chose to exchange knowledge, content, and artwork for the common good, but in return we now face an existential threat. This poses a daunting challenge to those who believed in the power of collaboration to enrich the creative ecosystem. Generative tools like DALL-E, Stable Diffusion, and Midjourney are accelerating this trend.
Initially, open sharing tools like the Creative Commons licenses aimed to build a richer commons, with the CC0 public domain dedication being a shining example. It allows anyone to use, transform, and combine images, videos, and music without any restrictive conditions. It is a selfless gift extended to passionate creators, artists, and researchers. However, no one foresaw that machines would exploit this commons.
The exploitation of this commons gives rise to two significant problems: human creators are discouraged from sharing, and machine-generated content saturates the market. As AI platforms learn from the wealth of data created by humans and stored in digital commons, they replicate human creative styles with increasing sophistication. For instance, AI platforms can draw from petabytes of web data gathered by crawling projects such as Common Crawl to enhance the quality of their output. Sadly, Common Crawl is now working against creators and artists, discouraging sharing and prompting sharing-avoidance behavior. What once served as a digital commons for innovative startups and researchers has become a data goldmine from which tech giants can easily amass vast datasets. This is why more and more websites are blocking Common Crawl's crawler. In the process, the cherished spirit of sharing that has sustained the internet is slowly dimming, and machine-generated content is filling the void.
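To make the blocking mentioned above concrete: a site typically opts out of a crawler by adding a rule for that crawler's user agent to its robots.txt file, and Common Crawl's crawler identifies itself as CCBot. The following is a minimal sketch, not a description of any particular site's setup. The robots.txt content, the example URL, and the helper name `can_fetch` are all hypothetical and chosen for illustration; only Python's standard `urllib.robotparser` module is used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish to opt out of Common Crawl
# (user agent "CCBot") while leaving the site open to other visitors.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/gallery/photo.jpg"  # illustrative URL only
    print("CCBot allowed:", can_fetch(ROBOTS_TXT, "CCBot", url))
    print("Other agents allowed:", can_fetch(ROBOTS_TXT, "Mozilla/5.0", url))
```

Note that robots.txt is a voluntary convention: it signals a site's wishes to crawlers that choose to honor it, and it does not by itself give creators enforceable control over how their work is used.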
Ensuring safe sharing has become an urgent concern in the digital ecosystem. A fresh framework is needed to give creators control over their data without hindering sharing. There is a compelling argument for separating primary works, such as images and videos, from the data used for machine learning, and for managing the two under distinct licenses. The responsibility now falls on creators to devise innovative solutions that keep the connection between sharing, emulation, and creation intact. If necessary, regulatory measures may need to be applied to AI platforms, making it clear that they cannot exist without human creativity.