The Hacker News discussion primarily revolves around comparing and contrasting Large Language Models (LLMs) with traditional Markov chains for text generation.
Here are the three most prevalent themes:
1. LLMs as Sophisticated/High-Order Markov Chains (or not)
There is significant debate over whether LLMs fundamentally differ from Markov chains or are simply very high-order, continuous-space versions of them. Some argue that any autoregressive system conditioning on a finite context window satisfies the Markov property, regardless of the underlying implementation (neural network or lookup table); see the sketch after the quotes below. Others counter that the continuous latent space and the attention mechanism escape the discrete, fixed-context limitations of traditional N-gram Markov models, making the analogy unhelpful for practical comparison.
Quote: "An LLM inference system with a loop is trivially Turing complete if you use the context as an IO channel, use numerically stable inferencing code, and set temperature to 0... If you want to make a universal Turing machine out of an LLM only requires a loop and the ability to make a model that will look up a 2x3 matrix of operations based on context and output operations to the context on the basis of them (the smallest Turing machine has 2 states and 3 symbols or the inverse)." - @vrighter
Quote: "Both LLMs and n-gram models satisfy the markov property, and you could in principle go through and compute explicit transition matrices... But LLMs aren't trained as n-gram models, so besides giving you autoregressive-ness, there's not really much you can learn by viewing it as a markov model" - @krackers
2. Creativity, Novelty, and Generation Outside of Training Data
A core tension in the discussion is whether LLMs can genuinely create or solve problems "not in their training data," in contrast to Markov chains, which are often asserted to only reproduce sequences seen in their corpus (see the bigram sketch after the quotes below for what is actually being argued). Users debate whether LLM outputs that deviate from the training material, such as the simulated Trump speech, reflect genuine creativity or merely "hallucinations": novel token combinations drawn from within the learned probability distribution.
Quote: "LLMs will generate token sequences that didn't exist in the source material, whereas Markov Chains will ONLY generate sequences that existed in the source." - @Sohcahtoa82
Quote: "Hallucinations are not novel ideas. They are novel combinations of tokens constrained by learned probability distributions." - @johnisgood
3. The Importance of Attention and Latent Space over Discrete States
Commenters highlight that the key differentiator, beyond sheer scale, is how LLMs use continuous representations. Attention lets LLMs dynamically weigh long-range dependencies, whereas a traditional Markov chain's explicit state space grows exponentially with context length (the curse of dimensionality). This lets LLMs capture semantic relationships that are impractical for simple N-gram models, as the sketch after the quotes below illustrates.
Quote: "The main reason here is that sometimes, the last N words (or tokens, whatever) simply do not have sufficient info about what the next word should be. Often times some fragment of context way back at the beginning was much more relevant. ... Attention solves this problem." - @ComplexSystems
Quote: "With NN-based LLMs, you don't have that exact same issue: even if you have never seen that n-word sequence in training, it will get mapped into your high-dimensional space. And from there you'll get a distribution that tells you which words are good follow-ups." - @kleiba