How To Protect Your Art From Big AI


20 August 2024
Words – Jacky Winter
Illustration – Sebastian Cestaro

In an ocean of fear and anxiety about the rise of generative AI, we've teamed up with our good friends from Jacky Winter to break down how AI image models are trained, and what you can do to protect your work from being used to train them.

  


  

Whether you’re open to seeing how AI could be used to assist your creative process, or sit more towards the Hayao Miyazaki “I strongly feel that this is an insult to life itself” end of the spectrum, most of us in the creative industry can agree that when it comes to how our work is used, consent, control and compensation are paramount.


Unfortunately, when it comes to training generative AI models on existing art, many AI companies have adopted an “it's better to ask for forgiveness than permission” approach. The copyright laws we have in place to protect artists were not written with concepts such as generative AI, data mining or machine learning in mind, and regulation in this area is lagging far behind the pace at which the technology has developed. With AI now making it possible to generate an image, or mimic a specific artist’s exact style, with the click of a button and a simple text-based prompt, many creatives are justifiably feeling more than a little concerned.

There are, however, some tips and tools you can use to make it harder for AI companies to use your work for AI training purposes. But before we dive into what’s possible (and not possible) protection-wise, let's start with a little more context.

How does AI training work? 

  • AI companies start by extracting huge amounts of data from the internet. Basically, any artwork, image, photo or other media that has been uploaded online and is publicly viewable can potentially be collected, or “scraped”, by AI web crawlers.

  • Once collected, each piece of data is labelled or categorised. For example, a picture of a tree is tagged as a “tree”, so that the AI model can learn to identify it.

  • The AI model processes all the information it is fed and, with some fine-tuning, learns to recognise patterns, make predictions and ultimately generate images or other content.

  • The performance of an AI model and its ability to learn effectively depend heavily on the quality, quantity and diversity of the data it is trained on. For reference, one of the largest free datasets, LAION-5B (which Stable Diffusion was trained on), currently contains links to almost 6 billion captioned images. The simplified sketch below illustrates the first two steps.
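For the technically curious, here is a deliberately tiny Python sketch of the scrape-and-label steps described above. The URL and caption shown are hypothetical placeholders; real pipelines automate all of this at billion-image scale:

    # Tiny sketch of the "scrape and label" steps described above.
    import urllib.request

    def scrape(url):
        # Step 1: download a publicly viewable image ("scraping")
        return urllib.request.urlopen(url).read()

    def build_dataset(pairs):
        # Step 2: pair each image's bytes with a text label ("labelling")
        return [(scrape(url), caption) for url, caption in pairs]

    # Step 3 (not shown): a model is trained on these (image, caption) pairs
    # until it can generate new images from a text prompt. Example usage,
    # with a placeholder URL:
    # dataset = build_dataset([
    #     ("https://example.com/tree.jpg", "a watercolour painting of a tree"),
    # ])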
     

     

    PROTECTING YOUR WORK:
    OPT-OUTS AND DO-NOT-TRAIN REQUESTS 

    Below is a summary of tips and tools currently available to protect your work from AI. This is not an exhaustive list, but we have tried our best to cover the main methods and platforms. A heads up: short of deleting everything and going offline completely, there is no method that provides 100% protection.

    Our aim is to provide you with a starting point to implement at least some protection until longer-term solutions, such as AI-specific laws and regulations, are put in place.

    Have I Been Trained / Do Not Train Registry

    • “Have I Been Trained” is a free search tool by Spawning.ai that anyone can use to see if their work has been included in the LAION-5B dataset. Relevant images from the search results can then be selected and added to the Do Not Train Registry.

    • This Registry is essentially a designated list where artists can register their work or other intellectual property, which tells AI companies that they do not consent to it being scraped.

    • Adding your work to the Do Not Train Registry does not remove it from models that have already been trained, nor does it prevent scraping by AI training platforms or organisations that choose to ignore it.

    • However, major players such as Hugging Face (the largest repository of models and datasets) and Stability AI (creators of Stable Diffusion) have agreed to honour it.


     

      Robots.txt and HTML meta tags

      • A robots.txt file is a text file that tells bots and crawlers which pages they can and cannot access on your website. There are currently rules you can add to your website’s robots.txt file to block certain AI crawlers, including those from OpenAI (the organisation behind the DALL·E image generator and ChatGPT) and Google (whose Google-Extended token controls AI training for Bard/Gemini); see the example snippets after this list.

      • Meta tags are snippets of HTML code that provide metadata, or information about the content on your website. You can currently add “noai” and “noimageai” meta tags, which signal to crawlers that you are opting out of having your content used for AI training purposes.

      • GitHub has a full list of currently available tags and settings. Note that these rules and tags are simply requests for AI crawlers not to scrape, which means that while they may be sufficient to stop some, other crawlers can choose to ignore them.
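As a minimal sketch of what these opt-outs look like in practice (the crawler names below are the documented tokens used by OpenAI, Google and Common Crawl at the time of writing; check each company’s documentation for the current list), the robots.txt rules are:

    # robots.txt: ask known AI crawlers not to scrape any page on this site
    User-agent: GPTBot            # OpenAI's crawler
    Disallow: /

    User-agent: Google-Extended   # Google's AI-training opt-out token
    Disallow: /

    User-agent: CCBot             # Common Crawl, widely used to build AI datasets
    Disallow: /

And the meta tags sit in the <head> of each HTML page:

    <!-- signals to compliant crawlers that this page opts out of AI training -->
    <meta name="robots" content="noai, noimageai">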


      Kudurru (currently in beta)

      • Kudurru is another tool from Spawning.ai, which actively blocks AI scrapers from your website by rejecting or misdirecting them. Notably, this tool will even work on crawlers that ignore opt-outs or no-scrape requests (a toy sketch of the general idea follows below).

      • A plugin for WordPress websites is currently available, with support for other hosting platforms in development.

        Those who self-host can also email a request to kudurru@spawning.ai to participate in the beta.
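For the curious, here is a toy Python sketch of the general idea behind active blocking and misdirection. This is not Kudurru’s implementation (which identifies scrapers far more robustly); it only shows the concept of serving a decoy to requests whose User-Agent matches a known AI crawler:

    # Toy sketch only: NOT Kudurru's implementation. Serves a decoy to
    # requests that identify themselves as known AI crawlers.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    KNOWN_AI_CRAWLERS = ("GPTBot", "CCBot")  # illustrative list

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            agent = self.headers.get("User-Agent", "")
            self.send_response(200)
            self.end_headers()
            if any(bot in agent for bot in KNOWN_AI_CRAWLERS):
                # Misdirect: return decoy bytes instead of the real artwork
                self.wfile.write(b"decoy bytes, not the real image")
            else:
                self.wfile.write(b"real page content")

    # Example usage (starts a local server on port 8000):
    # HTTPServer(("", 8000), Handler).serve_forever()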

         


          

        PLATFORM-SPECIFIC OPT-OUTS


        Meta (including Instagram and Facebook) 

        • Meta recently announced that it would be training its AI models on Instagram and Facebook posts. Opting out is currently only available to users in the EU/UK, thanks to the strict data and privacy laws in effect there; there is no opt-out option for other countries.
          • EU/UK users can opt out by lodging their “right to object” via the settings and privacy menu of their account. The process is more tedious than it should be, but this article by MIT Technology Review walks through it, covering both Facebook and Instagram.

           

          Adobe (including Photoshop, Illustrator, Lightroom) 

          • Adobe recently faced significant backlash when it released a badly worded update to its Terms of Use, which caused many of its users to believe that their files and content would be used to train Firefly, Adobe’s AI system.

          • Adobe has since responded by clarifying the terms and stating that “We don’t train generative AI on customer content...We don’t train generative AI models on your or your customers’ content unless you’ve submitted the content to the Adobe Stock marketplace.”

            So it appears this one doesn’t actually require an “opt out”, but given the recent controversy and high number of artists and illustrators who rely on Adobe’s programs, we thought it was worth covering. 

            ARTWORK CLOAKING TOOLS

            Glaze and Nightshade

            Glaze and Nightshade are anti-AI tools developed at the University of Chicago as part of The Glaze Project. These apps make calculated changes to the pixels within an image, essentially creating a “cloak” which distorts the way AI sees and processes it.

            • The changes are hard for the human eye to detect and are not easily removed by actions such as cropping, resizing, screenshotting or applying another filter to the artwork.

            • Glaze distorts the style of art seen by AI. For example, a cartoon-style drawing that has been Glazed might instead appear to be in an etching style when processed by AI. The Glaze software can be downloaded and installed on your computer, and invitation-based access to a web app is available for those without the required computer specs. Glaze is also currently integrated into the anti-AI art-sharing platform, Cara.

            • Nightshade changes the subject matter. For example, an image of a cow may instead be seen as a purse through an AI lens. So, while an image may still be scraped by an AI crawler, its value as a training tool is greatly diminished. You can download Nightshade here.

            • Both Glaze and Nightshade can be applied to a single artwork for layered protection. However, the changes made by these tools are more visible on art with flat colours and smooth backgrounds, so they may not be suitable for artists or illustrators working in this style.
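For the technically curious, here is a toy Python sketch of the general idea behind pixel-level cloaking: every pixel is nudged within a small budget, so the change is hard to see but the data an AI model ingests is altered. To be clear, this is not the Glaze or Nightshade algorithm (both compute targeted, adversarial perturbations against specific AI models rather than random noise); it only illustrates what an imperceptible perturbation budget means:

    # Toy illustration only: NOT the Glaze/Nightshade algorithm.
    import numpy as np
    from PIL import Image  # pip install pillow

    def perturb(path_in, path_out, budget=4):
        # Shift each pixel by at most `budget` intensity levels (0-255):
        # hard to see, but it changes every value an AI model would ingest.
        img = np.asarray(Image.open(path_in).convert("RGB"), dtype=np.int16)
        noise = np.random.randint(-budget, budget + 1, size=img.shape)
        cloaked = np.clip(img + noise, 0, 255).astype(np.uint8)
        Image.fromarray(cloaked).save(path_out)

    # Example usage with hypothetical file names:
    # perturb("artwork.png", "artwork_cloaked.png")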

               

              This article has a handy scroll-over feature which shows cloaked vs original artwork. The Glaze Project has warned that these are not permanent solutions against AI mimicry.

               

              Glaze offers excellent protection from “style imitation” by making subtle changes to images that (depending on your artwork and settings) are undetectable to the naked eye. Example illustration by Sebastian Cestaro.

               

              Mist

              Mist is another artwork-cloaking tool, developed by Psyker Team, that poisons AI systems so that they cannot effectively imitate an artist’s signature style.

              • AI systems trained on “Misted” images typically output images with an ugly full-bleed watermark that renders them useless to bad-faith users.

              • The developers claim that Mist offers enhanced protection against AI through imperceptible noise that takes only 3-5 minutes to apply and is resilient against denoising methods.
                 
              • Misted images do tend to display a visible swirl overlay, but much like Glaze and Nightshade, the intensity varies depending on the level of detail and texture in the image.

              • Get Mist here


                Overlai App

                Overlai is a promising iPhone app that lets users process images directly on their device before uploading them to their website or social platform of choice. An Android version and an Adobe Photoshop plugin are planned for release in 2024.

                • Overlai embeds an invisible watermark and metadata in your image to signal to compliant AI models that the image should not be used for training purposes.

                • Uses blockchain technology to create a permanent register for your images. 

                • For models that ignore the watermark protocol, randomised data poisoning is introduced to the dataset to protect the creator’s work and discourage further unauthorised use.

                • Founded by world-renowned photographers Paul Nicklen and Cristina Mittermeier.

                • Our experiments showed Overlai to be a convenient and fast way to process images, with little to no noticeable effect on images, even flat vector graphics.

                  Get Overlai here

                   

                   

                   

                  “IT IS ALWAYS POSSIBLE FOR TECHNIQUES THAT ARE CURRENTLY EFFECTIVE TO BE OVERCOME BY A FUTURE ALGORITHM, POSSIBLY RENDERING PREVIOUSLY PROTECTED ART VULNERABLE. THE HOPE IS THAT TOOLS LIKE GLAZE AND NIGHTSHADE WILL PROVIDE SOME PROTECTION TO ARTISTS UNTIL LONGER TERM LEGAL OR REGULATORY SOLUTIONS ARE ESTABLISHED”.

                     

                  "NO AI TRAINING" NOTICES AND COPYRIGHT

                  It is worth adding a “No AI Training” notice or clause to your website and contracts, stating that any use of your work to train generative AI models is prohibited. The Authors Guild recently released a sample clause to this effect (it is written with writers in mind, but something similar could be adapted for use by artists).
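As a rough illustration only (this is our own hypothetical wording, not the Authors Guild clause, and not legal advice), such a notice might read:

    No AI Training: without the copyright owner’s express written permission,
    the content of this website may not be used to train, fine-tune or
    develop any generative AI or machine learning model.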
                   

                  • Legal notices can also be ignored, but having them in place gives you a stronger starting point should you ever wish to pursue formal legal action, or in the event laws are eventually enacted which require AI companies to remove from their systems any content that has been scraped without permission or against an artist’s terms.

                  • Those based in the USA may consider registering their works with the US Copyright Office. Fees vary depending on the type of registration, but start at US$65 for a standard application.


                    We just want to clarify here that you don’t need to register in order to get copyright ownership: copyright is automatically granted to you the moment you create the work. Registration is only a requirement for US works if you actually want to go to court and sue for infringement.

                     


                     

                    SUPPORT INDUSTRY GROUPS AND ARTIST RIGHTS ORGANISATIONS

                    Though it can at times feel like we are alone in this fight, there are some incredibly smart people advocating on our behalf behind the scenes to educate the public, lobby legislators and even bring class-action lawsuits against rogue AI companies.

                    One of the best things you can do to protect the future of human creativity is to join or support an industry group that actively advocates for artists’ rights. Most are based in Europe and the US, but membership is open to artists from all over the world and comes with huge benefits.

                    Europe

                    North America

                     

                    Now, with all that said, our intention is not to fear-monger, or to make people feel like they need to put all their time and energy into Glazing their works or becoming web-coding experts in an effort to avoid all AI scraping.

                    We’re sharing this information as we believe it’s important to stay informed and know what your options are, so you can do what’s practicable for you.

                    Optimistically, we do believe that artists and illustrators will make it out the other end of this. Art isn’t dead, and there will always be a demand for great work made by real-life humans.

                     


                     

                    Guest post by Jacky Winter, a next-gen artist agency that gives creative ideas and careers a place to soar. 


                    Website  |  Instagram  |  PencilBooth  |  LinkedIn