Date: Wed, 28 May 2025 10:17:38 +0100
On Wed, May 28, 2025 at 12:33 AM Oliver Hunt wrote:
>
> Well now you have one person saying that they do not
> give you permission to use their work.
That's three now: Jonathan, Oliver, René. I'm keeping a list here.
> Even if we were to try to assume goodwill on your part, when someone
> said you did not have their consent to steal their work, you said “I don’t
> care, I’m going to do it anyway”. That you think that is a reasonable, or
> even remotely ethical, behavior is absurd. Step 1 of being in any community
> is demonstrating some amount of respect for others in the community, and
> you haven’t demonstrated even the slightest semblance of that.
This is clearly an emotive subject for some.
If I told you that I was using 'grep' at the command line to go
through papers, I don't think that you'd complain (I hope not). Now
this is where my mindset diverges from the mindsets of a few people
here: I don't think that using 'grep' at the command line is much
different from using an offline (i.e. no internet connection) large
language model to do retrieval-augmented generation to search for
papers. The contents of the paper, and also the data derived from it,
are erased from the AI's memory as soon as the search ends -- nothing
is stored persistently anywhere.
We don't all have to be aligned here in our morals, beliefs and
values, but we have to be respectful of other people's wishes when it
comes to using their work. There are limits to this though.
In my previous email I gave facetious examples such as "drinking
decaffeinated coffee" or "a USB stick purchased on the night of a full
moon", but I'll get more down to Earth here. Let's say someone writes
a paper and puts at the end of it, "I don't consent for you to use
'grep' on this paper", or perhaps something like "I don't consent for
you to search for individual terms in this paper", and they email it
to Nevin. I don't think that such a paper should be accepted and given
a document number -- more to the point I think that there should be a
clear list of things you're agreeing to by submitting a paper to the
worldwide C++ community for it to be scrutinised and voted on.
Note, by the way, that the original idea for my program was to do
nothing more than pluck out papers when you ask it something like
"Which papers mention adding a new cv-qualifier to the language?".
This kind of thing would be near-impossible to search for properly
using a semantic search tool -- I mean lots and lots of papers mention
'cv-qualifier' without actually suggesting adding a new one to the
language.
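
To illustrate the point about term matching, here is a minimal sketch of a plain keyword lookup, which is not taken from the project. The 'Paper' struct, the 'keyword_hits' and 'sample_papers' functions, and the three excerpts are all made up for illustration. It shows that a verbatim match on 'cv-qualifier' returns every paper that mentions the term, whether or not the paper proposes a new one.

```cpp
#include <string>
#include <vector>

// Hypothetical record: the ids and excerpts below are invented, not real papers.
struct Paper {
    std::string id;
    std::string text;
};

// Return the ids of all papers whose text contains the term verbatim.
std::vector<std::string> keyword_hits(const std::vector<Paper>& papers,
                                      const std::string& term) {
    std::vector<std::string> hits;
    for (const Paper& p : papers) {
        if (p.text.find(term) != std::string::npos) {
            hits.push_back(p.id);
        }
    }
    return hits;
}

// Three invented excerpts: only "A" actually proposes a new cv-qualifier.
std::vector<Paper> sample_papers() {
    return {
        {"A", "We propose adding a new cv-qualifier to the language."},
        {"B", "Overload resolution ignores the top-level cv-qualifier here."},
        {"C", "This paper adds a range adaptor."},
    };
}
```

Searching the sample for 'cv-qualifier' yields both "A" and "B", so the match alone cannot separate a proposal from a passing mention.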
What's going on here in this thread, I think, boils down to this:
Some people see offline AI as nothing more than another tool along the
same lines as 'grep', whereas other people are strongly averse to the
use of AI (or at least its use in some contexts) and don't want their
work being processed by an AI. I imagine that this will play out all
over the world in all manner of scenarios and situations as AI becomes
more and more advanced in the coming months and years.
There are times when it is appropriate to use a person's work in a way
that they are asking you not to. I'm not saying that this is
definitely one of those situations, but in my own personal opinion, I
think offline AI retrieval-augmented generation is fair use. But
anyway I've pressed pause on the AI side of it for now, as I'm going
to focus on using the Xapian library first of all. Here's how the
project looks right now:
https://github.com/healytpk/paperkernelcxx/
And you can see that all the AI stuff is disabled in the following
file because the preprocessor macro 'PAPERKERNELCXX_USE_AI' is
undefined -- and also the GitHub Actions workflow doesn't retrieve the
AI library (i.e. 'libllama'):
https://github.com/healytpk/paperkernelcxx/blob/main/main_program/ai.cpp
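
As a rough sketch of how a macro such as 'PAPERKERNELCXX_USE_AI' can compile a feature out, consider the following. Only the macro name comes from the message above; the functions 'ai_enabled' and 'ai_answer' are hypothetical and are not claimed to match what ai.cpp actually contains.

```cpp
#include <string>

// Reports at compile time whether the AI code path was built in.
constexpr bool ai_enabled() {
#ifdef PAPERKERNELCXX_USE_AI
    return true;
#else
    return false;
#endif
}

#ifdef PAPERKERNELCXX_USE_AI
// Hypothetical AI-backed path; a real build would call into the AI library here.
std::string ai_answer(const std::string& query) {
    return "AI answer for: " + query;
}
#else
// Stub used when the macro is undefined: no AI code is compiled or linked.
std::string ai_answer(const std::string&) {
    return "";
}
#endif
```

When the macro is left undefined, only the stub is compiled, so the build needs neither the AI library's headers nor the library itself.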
Received on 2025-05-28 09:17:49