Date: Wed, 28 May 2025 18:41:56 +0200
It is a (slippery?) slope or gradient of possibilities:
 
 -> With conventional tools you can create tags/keywords and index and cross-reference all papers.
 
 -> With AI you could also automatically create summaries and cross-references of the papers and then load all summaries at once for each request, not only single papers.
 
Whether that is fair use probably depends on the length of the summary: 5 keywords, 1 sentence, 1 paragraph, or 1 page per paper?
 
 
 
Remark:
If you want to work with several papers at once:
The summaries would fit into the context window, which is like RAM, whereas all the papers together would be too large for it.
You would have to train or at least "fine-tune" a model on all papers to work with the full text, or pre-select which ones are loaded into the context window for each question.
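
A minimal sketch of the pre-selection idea, under loud assumptions: the paper IDs and summaries are made up, the token count is a crude word count rather than a real tokenizer, and relevance is plain word overlap with the question. It only illustrates how a fixed context budget forces a choice of what gets loaded.

```python
# Hypothetical data: made-up paper IDs and one-line summaries.
SUMMARIES = {
    "P0001": "contracts for function preconditions",
    "P0002": "reflection of class members",
    "P0003": "contracts and reflection interplay in templates",
}


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # Real tokenizers count differently.
    return len(text.split())


def select_for_context(question: str, summaries: dict, budget: int) -> list:
    """Pick the summaries that share words with the question,
    best match first, until the token budget is used up."""
    q_words = set(question.lower().split())
    scored = []
    for paper_id, summary in summaries.items():
        overlap = len(q_words & set(summary.lower().split()))
        scored.append((overlap, paper_id))
    # Highest overlap first; ties broken by paper ID for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen, used = [], 0
    for overlap, paper_id in scored:
        cost = estimate_tokens(summaries[paper_id])
        if overlap > 0 and used + cost <= budget:
            chosen.append(paper_id)
            used += cost
    return chosen
```

With a small budget only the best-matching summaries fit; raising the budget lets more of them in, which is the same trade-off as summaries versus full papers.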
 
 
-----Original Message-----
From:Oliver Hunt via Std-Proposals <std-proposals_at_[hidden]>
Sent:Wed 28.05.2025 18:28
Subject:Re: [std-proposals] Dedicated website with AI that has processed all papers
To:std-proposals_at_[hidden]; 
CC:Oliver Hunt <oliver_at_[hidden]>; 
I think you need to be clearer about your goals.
If the intent is to publish a tool where people can post a paper and get a summary of it, that’s hugely different from a tool that you can ask “reword (explain/describe) proposal X”. If your intent is not to share a model that is trained on those papers, I _suspect_ you’re fine.
The problems arise (at least for me) when you distribute a model that is trained on copyrighted works (some folk may be ok with you using their papers for training, but you would need their permission).
Let’s drop reference to AI and go for a much simpler tool for rewording/searching: grep and sed
One option is “I have trained a model on all these papers and it can provide a summary”: This is functionally equivalent to “I have copied all of these papers into a directory, and if you ask about a paper it produces that paper after running sed to replace a bunch of words with their synonyms and remove the author’s names”.
The other is: “A person loads a paper into my AI and gets a description”: This is functionally equivalent to the user supplying a paper they have downloaded (so they have the authors and original doc available), and your program running sed and providing searching, etc.
The first option involves copying other people’s work and distributing it without consent, and the second doesn’t. It’s super important to understand that “AI” descriptions are not anything more than a very expensive sed+grep - they’re a purely mechanical transform of original works, and the only reason for any variance is the deliberate addition of randomness - in the above sed example, it would be equivalent to having multiple synonyms for each word and choosing the synonym randomly each time.
—Oliver
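
The sed analogy in the quoted message can be sketched in a few lines. This is only an illustration of "multiple synonyms per word, chosen randomly each time"; the synonym table is invented for the example, and nothing here claims to describe how an actual model works.

```python
import random
import re

# Hypothetical synonym table, invented for this illustration.
SYNONYMS = {
    "fast": ["quick", "rapid"],
    "paper": ["document", "article"],
}


def reword(text: str, rng: random.Random) -> str:
    """Mechanically replace each known word with a randomly chosen synonym.

    Words without an entry in SYNONYMS are left untouched.
    """
    def swap(match):
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        return rng.choice(options) if options else word

    return re.sub(r"[A-Za-z]+", swap, text)
```

Passing a seeded `random.Random` makes the output repeatable; without a fixed seed the wording varies from call to call, and that variation comes only from the random choice.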
 
Received on 2025-05-28 16:49:45
