• CapeWearingAeroplane@sopuli.xyz
    7 months ago

    First of all, no: training a model and selling the model is demonstrably equivalent to redistributing the raw data.

    Secondly: what about all the copyleft work in there? That work is specifically licensed so that nobody can use it to create a non-free derivative, which is exactly what OpenAI has done.

    • Rodeo@lemmy.ca
      7 months ago

      Copyleft is the only valid argument here. Everything else falls under fair use as it is a derivative work.

      • CapeWearingAeroplane@sopuli.xyz
        7 months ago

        If I scrape a bunch of data, put it in a database, and then make that database queryable only using obscure, arcane prompts: Is that a derivative work permitted under fair use?

        Because if you can get ChatGPT to spit out raw training data with the right prompt, it can essentially be used as a database of copyrighted stuff that is very difficult to query.

        • Rodeo@lemmy.ca
          7 months ago

          No, because that would be distribution, as I’ve already stated.

          If it doesn’t spit out raw data and instead changes it somehow, it’s a derivative work.

          I can spell out the distinction for you twice more if you still don’t get it.