• 31337@sh.itjust.works
    +2 / -1 · 1 month ago (edited)

    I stated that they can do this, and asked whether they could be sued if they used near-verbatim code generated by an LLM, just as they could be sued if they copy-pasted AGPL code.

    Edit: Tools like Copilot tell you if your code is similar to publicly available code, so you can avoid these issues.

    Edit: Just looked up EFF’s position and I tend to agree with it:

    Artificial Intelligence and Copyright Law

    Artists are understandably concerned about the possibility that automatic image generators will undercut the market for their work. However, much of what is criticized is already considered fair use under copyright law, even if done at scale. Efforts to change copyright law to transform certain fair uses into infringement carry serious implications, are likely to interfere with the innovative potential of AI tools, and ultimately do not benefit artists. In fact, the use of these tools could expand the capacity of artists to create expressive works. Policymakers should emphasize the importance of human labor and investment in what receives copyright protection to maintain wages and dignity. Artists should be protected from efforts by large corporations to both substitute their labor with AI tools and create a new, unnecessary copyright regime around AI-generated art.

    Machine Learning is a Fair Use

    The process of machine learning for generative AI art is like how humans learn—studying other works—it is just done at a massive scale. Huge swaths of data (images, videos, and other copyrighted works) are analyzed and broken into their factual elements where billions of images, for example, could be distilled into billions of bytes, sometimes as small as less than one byte of information per image. In many instances, the process cannot be reversed because too little information is kept to faithfully recreate a copy of the original work.

    The analysis work underlying the creation and use of training sets is like the process to create search engines. Where the search engine process is fair use, it is very likely that processes for machine learning are too. While the act of analysis may potentially implicate copyright, when that act is a necessary step to enabling a non-infringing use, it regularly qualifies as fair use. If the intermediate step were not permitted, fair use would be ineffective. As such, when factual elements of copyrighted works are studied and processed to create training sets—which, once again, is how we humans learn and are inspired by themes and styles in art and other works—that is likely to be found a fair use.

    https://www.eff.org/document/eff-two-pager-ai

    • trollbearpig@lemmy.world
      +1 / -2 · 1 month ago

      What point are you trying to make? That the fact that someone can break the law means we should not have laws? I honestly don’t get what you are trying to say.

      • 31337@sh.itjust.works
        +3 / -1 · 1 month ago

        I’m saying that using code for training is a different issue than copyright infringement. I edited my post above to lay out my position more clearly.

        • trollbearpig@lemmy.world
          +1 / -2 · 1 month ago (edited)

          And that’s the whole point of my comment; did you even read it? To summarize, there is currently a loophole in the law that allows these bullshit arguments that it’s different from straight-up copying shit (though this hasn’t been litigated yet, so it’s not yet clear whether these arguments are actually valid). This means that while a person reading my AGPL code and copying it (without following the license) is 100% illegal, doing the same through an LLM may be legal. So open source licenses can be bypassed by first training an LLM on the code and then extracting the code from the LLM. This is terrible for open source, and in general for anyone who wants to make a living from creating copyrighted work. So we should close this loophole, and I’m glad there is a push to close it through better laws, even if those laws are coming from Disney, Sony, and all those awful companies.

          So again, what’s the point you are trying to make here? That we shouldn’t make these laws stronger to prevent this bullshit? I honestly don’t understand what you are arguing; nothing you have said has anything to do with this conversation.

          • 31337@sh.itjust.works
            +3 / -1 · 1 month ago

            That we already have laws that protect against copyright infringement (which, it seems, would still apply whether or not the code was spit out by an LLM), and no more should be made. That training on public data is fine.

            • trollbearpig@lemmy.world
              +1 / -2 · 1 month ago

              Any arguments to defend your position? I’m giving you a very clear example of the awful consequences of following that path. And the same applies to any creative work. You are just being dismissive without proposing any real solution. Do better, man.

              • 31337@sh.itjust.works
                +3 / -1 · 1 month ago

                The EFF link I posted above provides evidence. Again, here’s a quote from part of it:

                The process of machine learning for generative AI art is like how humans learn—studying other works—it is just done at a massive scale. Huge swaths of data (images, videos, and other copyrighted works) are analyzed and broken into their factual elements where billions of images, for example, could be distilled into billions of bytes, sometimes as small as less than one byte of information per image. In many instances, the process cannot be reversed because too little information is kept to faithfully recreate a copy of the original work.

                As I mentioned before, Copilot, at least, helps people avoid copyright infringement by notifying them if their code is similar to public code. The solution I’m proposing is no new laws, just enforcing the ones we have. Most of the laws being proposed look like attempts at regulatory capture to me.