Re: [std-proposals] Dedicated website with AI that has processed all papers

From: Jonathan Wakely <cxx_at_[hidden]>
Date: Wed, 28 May 2025 13:32:40 +0100
On Wed, 28 May 2025 at 10:18, Frederick Virchanza Gotham via
Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Wed, May 28, 2025 at 12:33 AM Oliver Hunt wrote:
> >
> > Well now you have one person saying that they do not
> > give you permission to use their work.
>
>
> That's three now: Jonathan, Oliver, René. I'm keeping a list here.
>
>
> > Even if we were to try to assume goodwill on your part, when someone
> > said you did not have their consent to steal their work, you said “I don’t
> > care, I’m going to do it anyway”. That you think that is a reasonable, or
> > even remotely ethical, behavior is absurd. Step 1 of being in any community
> > is demonstrating some amount of respect for others in the community, and
> > you haven’t demonstrated even the slightest semblance of that.
>
>
> This is clearly an emotive subject for some.
>
> If I told you that I was using 'grep' at the command line to go
> through papers, I don't think that you'd complain (I hope not). Now
> this is where my mindset diverges from the mindsets of a few people
> here: I don't think that using 'grep' at the command line is much
> different from using an offline (i.e. no internet connection) large
> language model to do retrieval-augmented generation to search for
> papers. The contents of the paper, and also the data derived from it,
> are erased from the AI's memory as soon as the search ends -- nothing
> is stored persistently anywhere.

If you're not training the LLM on the papers and producing derivative
works from those papers, then copyright isn't relevant.

> We don't all have to be aligned here in our morals, beliefs and
> values, but we have to be respectful of other people's wishes when it
> comes to using their work. There are limits to this though.
>
> In my previous email I gave facetious examples such as "drinking
> decaffeinated coffee" or "a USB stick purchased on the night of a full
> moon", but I'll get more down to Earth here. Let's say someone writes
> a paper and puts at the end of it, "I don't consent for you to use
> 'grep' on this paper", or perhaps something like "I don't consent for
> you to search for individual terms in this paper", and they email it
> to Nevin. I don't think that such a paper should be accepted and given
> a document number -- more to the point I think that there should be a
> clear list of things you're agreeing to by submitting a paper to the
> worldwide C++ community for it to be scrutinised and voted on.

We don't need to reinvent copyright law. Creating a derivative work is
not the same as searching using grep.

Authors cannot prevent you from using grep on a document that you have
a copy of. They do not need to say "you may not create derivative
works without permission" because that is what the law says for any
copyrighted work. Permission to distribute or modify must be granted
explicitly; it doesn't need to be removed explicitly.

> Note, by the way, that the original idea for my program was to do
> nothing more than pluck out papers when you ask it something like
> "Which papers mention adding a new cv-qualifier to the language?".
> This kind of thing would be near-impossible to search for properly
> using a semantic search tool -- I mean lots and lots of papers mention
> 'cv-qualifier' without actually suggesting adding a new one to the
> language.
>
> What's going on here in this thread I think boils down to this:
> Some people see offline AI as nothing more than another tool along the
> same lines as 'grep', whereas other people are strongly averse to the
> use of AI (or at least its use in some contexts) and don't want their
> work being processed by an AI. I imagine that this will play out all
> over the world in all manner of scenarios and situations as AI becomes
> more and more advanced in the coming months and years.
>
> There are times when it is appropriate to use a person's work in a way
> that they are asking you not to. I'm not saying that this is
> definitely one of those situations, but in my own personal opinion, I
> think offline AI retrieval-augmented generation is fair use. But
> anyway I've pressed pause on the AI side of it for now, as I'm going
> to focus on using the Xapian library first of all. Here's how the
> project looks right now:
>
> https://github.com/healytpk/paperkernelcxx/
>
> And you can see that all the AI stuff is disabled in the following
> file because the preprocessor macro 'PAPERKERNELCXX_USE_AI' is
> undefined -- and also the GitHub Actions workflow doesn't retrieve the
> AI library (i.e. 'libllama'):
>
> https://github.com/healytpk/paperkernelcxx/blob/main/main_program/ai.cpp
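
For readers unfamiliar with the mechanism described in the quoted text,
here is a minimal sketch of compiling a feature out when a preprocessor
macro is left undefined. Only the macro name 'PAPERKERNELCXX_USE_AI'
comes from the message above; 'ai_enabled' and 'search_backend' are
hypothetical names for illustration and are not taken from the linked
repository, whose actual code may be structured quite differently.

```cpp
#include <string>

// The macro name comes from the quoted message; everything else in this
// sketch is a hypothetical illustration, not code from the repository.
#ifdef PAPERKERNELCXX_USE_AI
constexpr bool ai_enabled = true;
#else
constexpr bool ai_enabled = false;
#endif

// Hypothetical helper: reports which search path was compiled in.
std::string search_backend()
{
#ifdef PAPERKERNELCXX_USE_AI
    // An AI-backed build would call into the AI library here.
    return "llm";
#else
    // With the macro undefined, only the non-AI path exists in the binary.
    return "keyword";
#endif
}
```

Built without '-DPAPERKERNELCXX_USE_AI', the AI branch is never seen by
the compiler, so the AI library does not need to be present at build or
link time.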

Received on 2025-05-28 12:32:56