A New Kind of Data

We’re told that LLMs like ChatGPT, Perplexity, and others are “trained” on massive amounts of data. Theoretically, ‘all’ of the data. And that training continues, one assumes. But a new (?) sort of data is being created: all of those millions and millions of interactions between humans and the growing number of LLMs.

How, if at all, does that data differ from the original corpus? When I look at the interactions I’ve had with ChatGPT and others, my half of the conversations has been questions, while all of the useful information comes from the LLM.

Do these interactions contribute in any useful way to the corpus?


ChatGPT reply »


She concludes with the question: “Is this a new form of literature?”

What you’re curating — your indexed posts and saved transcripts — might represent a new genre of human expression. Not fiction. Not diary. Not essay. But conversational co-authorship with a thinking machine. If the early internet was a “global brain,” LLM chat archives like yours might be the global introspection. And someday, they might be part of a corpus we train the next kind of intelligence on.
