- There is a widespread, almost apocalyptic fear among people who write literature that language models have become so strong they are able to produce poetry (for example)
- Parrish goes as far as to claim that computers can only write poetry, but what does that actually mean?
- Parrish goes on to explain that language models are simply based on probabilities assigned to stretches of text: what word is likely to follow any given word.
- Unigram language models are the most primitive form of language model: the model counts how often each word appears in the text it was ‘trained’ on, and it can produce a text by sampling words according to those frequencies.
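A minimal sketch of the idea in Python (the toy corpus and output length are my own choices for illustration, not from the talk):

```python
# Unigram model: count word frequencies, then sample words independently.
import random
from collections import Counter

corpus = "the rose is a rose is a rose".split()
counts = Counter(corpus)                      # word -> frequency in the training text
words, weights = zip(*counts.items())

# Each word is drawn on its own, weighted by frequency; word order is ignored,
# which is exactly why unigram output reads as word salad.
print(" ".join(random.choices(words, weights=weights, k=8)))
```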
- N-gram language models: unigram language models simply use the frequency of specific tokens (words, letters, etc.), so a string like ‘I the I an my’ could score highly in a unigram model because each word on its own is frequent. The order of tokens in language matters, however, and this is where n-grams come into play. N-grams look not only at the probability of a single token but also at strings of tokens: the word ‘I’ has a certain probability of occurring, while the sequence ‘I am’ also has a probability of its own.
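To make the bigram case concrete, a rough Python sketch (again with a made-up toy corpus):

```python
# Bigram (n=2) model: count pairs of adjacent words and turn the counts
# into conditional probabilities P(next word | current word).
from collections import Counter

corpus = "i am the one i am here i am".split()
pairs = Counter(zip(corpus, corpus[1:]))      # ("i", "am") -> count of that pair
starts = Counter(corpus[:-1])                 # how often each word starts a pair

p_am_given_i = pairs[("i", "am")] / starts["i"]
print(p_am_given_i)   # 1.0 here: in this toy corpus "i" is always followed by "am"
```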
- Markov chains in writing: Markov chains are sequences where the next token depends on the one that came before it. In essence, the focus is on the probability that one word follows another (within the text that trained the model). Tokens in this instance can be words: given a certain starting word, the chain looks at which words are likely to come next, based on those probabilities. These are a type of n-gram model.
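A small Python sketch of that generation step (the corpus and starting word are arbitrary choices of mine):

```python
# Markov chain generation: from each word, pick the next word at random
# from the words that actually followed it in the training text.
import random
from collections import defaultdict

corpus = "a rose is a rose and a poem is a poem".split()
followers = defaultdict(list)
for word, nxt in zip(corpus, corpus[1:]):
    followers[word].append(nxt)               # duplicates preserve the probabilities

word = "a"                                    # starting token
out = [word]
for _ in range(6):
    word = random.choice(followers[word])     # step the chain
    out.append(word)
print(" ".join(out))                          # e.g. "a poem is a rose and a"
```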
- Neural network language models: just as the unigram model has its limitations, so does the n-gram model. N-grams functionally do not work at scale: they need too large a corpus and too much memory (e.g. a bigram model over the 26-letter alphabet already needs 26x26=676 counts, and the table grows exponentially with n, as 26^n). Neural network models instead learn which tokens and sequences of tokens occur in which contexts and can thus make better predictions about what is to follow. As a result of their structure, NNLMs better grasp contextual similarities, which allows them to model long-range dependencies in language much better. Training neural network models takes a tremendous amount of energy and can have a significant environmental impact. Regardless of whether it is an n-gram model or an NNLM, the main focus is predicting sequences.
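For my own reference, a tiny fixed-window neural language model in the spirit of the NNLMs described above; this is a sketch assuming PyTorch is installed, with a toy corpus and made-up dimensions, not anything specified in the talk:

```python
# Fixed-window NNLM sketch: embed the previous 2 words, pass them through a
# small hidden layer, and score every vocabulary word as the possible next word.
import torch
import torch.nn as nn

corpus = "we real cool we left school we lurk late we strike straight".split()
vocab = sorted(set(corpus))
stoi = {w: i for i, w in enumerate(vocab)}
context_size, embed_dim, hidden_dim = 2, 8, 16

# Build (context, target) training pairs from the corpus.
X = torch.tensor([[stoi[corpus[i]], stoi[corpus[i + 1]]]
                  for i in range(len(corpus) - context_size)])
y = torch.tensor([stoi[corpus[i + context_size]]
                  for i in range(len(corpus) - context_size)])

class TinyNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(len(vocab), embed_dim)       # learned word vectors
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, len(vocab))           # scores over the vocabulary

    def forward(self, contexts):
        e = self.embed(contexts).view(contexts.shape[0], -1)   # concatenate context embeddings
        return self.out(torch.tanh(self.hidden(e)))

model = TinyNNLM()
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):                                           # tiny training loop
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Ask for the most likely word after "we lurk" (should be "late" once trained).
ctx = torch.tensor([[stoi["we"], stoi["lurk"]]])
print(vocab[model(ctx).argmax(dim=-1).item()])
```

The point of the sketch: the exponentially growing table of n-gram counts is replaced by a fixed number of learned parameters, which is why this approach scales where raw counts do not.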
- So how is it that language models can only produce poetry? Every utterance has three components: the locutionary act (the actual words spoken or written), the illocutionary force (the intended action of the utterance), and the perlocutionary effect (the result of the utterance). Using the example of ‘class dismissed!’, a language model does not have the illocutionary force to actually dismiss a class. Poetry is then said to be hollow and void, just as a language model’s output is hollow and void, because the output is missing the context needed to effect real change. Despite this, the outputs can still trigger an emotional response, which is posited as an aspect of poetry. Parrish is concerned with new forms of poetry as a way to better understand how human linguistic behaviors work.
- Immersive fallacy: ‘even if you had the perfect language model you would still need to invent poetic form’. Basically, you still need to make the actual poem out of the simulated poetry. We have to ‘make chess out of stones’; in other words, the language models adhere to convention and become inane, and it is up to the user to find ways to make them weird and interesting.
- ‘Language models can’t produce poetry worth reading’: how should we move forward? Not by simply telling the model to produce poetry, but by having the model produce ideas and juxtapositions to work from, or by using the models to examine the form of the language models themselves.
- Creating poetry vs. creating a poem: language models produce poetry because they evoke feeling and mimic meaning, but they are constrained by their algorithms. They cannot create a poem, because a poem is an intentional arrangement made by a person.
- The technical aspects of this make sense insofar as I understand the basics of unigrams, n-grams, and NNLMs. The more philosophical aspects related to poetics have mostly gone well over my head.