- Essentially what ChatGPT and other LLMs are doing is asking the question “Given the text so far, what should the next word be?”, repeating this again and again after each new word. The answer is based on the huge corpus of text the model was trained on, scraped from the internet.
- ChatGPT will not always choose the most probable word, however; this is done to avoid the text becoming flat and potentially copying some other text verbatim.
- Temperature, as applied to language models like this, refers to how likely the model is to pick something other than the most probable next word. Certain temperatures appear to work better for certain types of writing, like 0.8 being best for writing essays. The element of randomness makes the text seem more creative.
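A rough sketch of what temperature-scaled sampling might look like; the candidate words and scores below are invented for illustration, not taken from any real model:

```python
import math
import random

def sample_next_word(candidates, temperature=0.8):
    """Pick the next word from a list of (word, score) pairs.

    Lower temperature -> almost always the top-scoring word;
    higher temperature -> more randomness.
    """
    words, logits = zip(*candidates)
    # Softmax with temperature: divide scores by T before exponentiating.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(words, weights=probs, k=1)[0]

# Hypothetical scores for continuing "The best thing about AI is its ability to ..."
candidates = [("learn", 5.0), ("predict", 4.2), ("make", 3.9), ("understand", 3.1)]
print(sample_next_word(candidates, temperature=0.8))
```

At temperature near 0 this collapses to always taking the top word (the flat, repetitive case); raising it spreads the probability mass over the alternatives.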
- Rather than using just probabilities of single words or letters, these models use probabilities of strings of words. But if this were done directly even for strings of three words, there would be a huge number of possibilities, and there is not enough material on earth to train a model that way. The workaround: “The big idea is to make a model that lets us estimate the probabilities with which sequences should occur—even though we’ve never explicitly seen those sequences in the corpus of text we’ve looked at”.
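The scale of the problem is easy to check with back-of-the-envelope arithmetic, using Wolfram's figure of roughly 40,000 commonly used English words:

```python
# Rough count of possible word strings (n-grams) over a 40,000-word vocabulary.
vocab = 40_000
for n in range(1, 4):
    print(f"{n}-word strings: {vocab ** n:.1e} possibilities")
# 1-word strings: 4.0e+04
# 2-word strings: 1.6e+09  (already more than any corpus could cover)
# 3-word strings: 6.4e+13
```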
- So how does it actually work? It is trained on a huge sample of human text; it starts with a prompt and continues with text like the text it was trained on, using a neural net to reflect on what it has written so far for every new word it generates.
- It is essentially using that prompt to produce things that sound right based on what the corpus sounded like. “ChatGPT is “merely” pulling out some “coherent thread of text” from the “statistics of conventional wisdom” that it’s accumulated. But it’s amazing how human-like the results are.”
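Put as code, that generation loop might look something like the minimal sketch below; `model` and its `next_word_logits` method are hypothetical stand-ins for the neural net, and `sample_next_word` is the helper from the temperature sketch above:

```python
def generate(model, prompt, max_words=50, temperature=0.8):
    """Autoregressive loop: the net looks at everything written so far,
    proposes candidate next words, and we sample one, again and again."""
    text = prompt.split()
    for _ in range(max_words):
        # Hypothetical call: returns (word, score) pairs for the text so far.
        candidates = model.next_word_logits(text)
        text.append(sample_next_word(candidates, temperature))
    return " ".join(text)
```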
- While ChatGPT's methods of generating language are somewhat similar to the brain’s, differences in hardware and training strategies make ChatGPT less efficient. Unlike the brain, ChatGPT lacks loops or the ability to recompute on data, limiting its computational capabilities. Improving this could make future versions more brain-like. Despite its current limitations, ChatGPT showcases how simple computational systems can achieve remarkable results, offering insights into human language and thought processes.
- Stephen Wolfram: A mathematician and physicist, living off the proceeds of a piece of software he created (Mathematica). Claimed to have invented “a new kind of science” related to cellular automata.