Better Markdown Streams
If you've used ChatGPT, Claude, Grok, or other AI chat applications, you may have noticed that responses sometimes change shape while they stream. A line that looked like a plain paragraph turns into a heading. A row of pipes suddenly becomes a table. A link stays plain until its closing parenthesis lands.
This instability comes from the app trying to render a markdown response before the response is finished. Markdown is how chat apps turn plain model output into structure: headings, lists, links, tables, blockquotes, and code blocks. In a normal document, such as a blog post or documentation page, this is straightforward: wait until the full markdown text exists, parse it, and render the result.
The hard part is that, in a chat interface, the document spends most of its lifetime in an ambiguous state. While a response is streaming, the app is rarely holding a complete markdown document. It is holding a prefix of one: a heading without its trailing newline, or a paragraph that might still become a table header, or a link reference that may resolve only after its definition appears later, or an open code fence that might continue for another 200 lines.
The obvious implementation makes the feeling worse: keep the accumulated markdown string, append every chunk, parse the whole thing again, and render whatever tree comes back. This works, and for small outputs it can even be acceptable, but it scales poorly. For long responses, it means the first paragraph is parsed hundreds of times after it has already become impossible for that paragraph to change. Worse, the UI keeps asking the renderer to reinterpret old content while the user is trying to read.
This is a framing problem, not just a performance problem.
The better way to think about a streamed response is not as a document that gets recomputed. It is a document that becomes stable from top to bottom.
The parser should remember
Incremental parsing starts with a small change in posture. Instead of treating each update as a reason to reinterpret the whole message, the parser should keep state across chunks and remember what it is inside, what has already been closed, and what is still waiting for more input.
That difference matters. If a heading has closed, emit the heading. If the parser is still inside a paragraph and the next line might change what that paragraph means, emit nothing yet. The parser is no longer a pure function over a growing string. It is a reader with memory.
This gives the renderer a much stronger contract. It does not need to diff a brand-new document against the old one. It receives newly finalized blocks, appends them to the committed list, and leaves previous blocks alone. The only interesting place left is the end of the stream: the live edge. In other words, instead of treating the streamed response as one unstable document, it is split into two regions: committed content and pending content. The pending content is where markdown is still deciding.
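The two-region model can be sketched in a few lines of TypeScript. Everything here (Block, ParserState, feedLine) is a hypothetical illustration of the idea, not the API of any particular parser:

```typescript
// A minimal sketch of the committed/pending split. All names here are
// illustrative, not the API of a real library.
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "paragraph"; text: string };

interface ParserState {
  committed: Block[]; // finalized blocks; never revisited
  pending: string[];  // raw lines still open to reinterpretation
}

// Feed one complete line and commit whatever is now unambiguous.
function feedLine(state: ParserState, line: string): void {
  const atx = /^(#{1,6})\s+(.*)$/.exec(line);
  if (atx && state.pending.length === 0) {
    // An ATX heading is decided by its own line: commit immediately.
    state.committed.push({ kind: "heading", level: atx[1].length, text: atx[2] });
    return;
  }
  if (line.trim() === "") {
    // A blank line closes the open paragraph, if any.
    if (state.pending.length > 0) {
      state.committed.push({ kind: "paragraph", text: state.pending.join("\n") });
      state.pending = [];
    }
    return;
  }
  state.pending.push(line); // provisional until something closes it
}
```

A real parser tracks far more state (lists, fences, tables), but the contract is the same: committed grows monotonically, and only pending is ever reinterpreted.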
Committed content and pending content
Markdown is mostly line-oriented. Many blocks reveal themselves as soon as their first line is read. An ATX heading like # Heading is already a heading. A thematic break like --- is already a break. A fenced code block announces that everything after it is literal until the closing fence.
But some lines can reach backward. They can take something that looked like paragraph text and reveal that it was something else all along.
Setext headings are the cleanest example. This first line looks like a paragraph while it is the only line the parser has seen:
Tables
Then the next line arrives, and the meaning of the previous line changes:
Tables
======
If the renderer committed Tables as a paragraph the moment it appeared, the second line would force a correction. The paragraph would disappear and an H1 would take its place. The parser was not being indecisive; the grammar simply had not given it enough information yet.
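That rule can be sketched like this (hypothetical names; the underline check only makes sense against the line still held as pending):

```typescript
// Sketch: a setext underline retroactively promotes the pending
// paragraph line to a heading. Illustrative names only.
function isSetextUnderline(line: string): 1 | 2 | null {
  if (/^=+\s*$/.test(line)) return 1; // "======" makes the line above an H1
  if (/^-+\s*$/.test(line)) return 2; // "------" makes it an H2
  return null;
}

function resolveSetext(
  pending: string,
  next: string
): { kind: "heading"; level: 1 | 2; text: string } | null {
  const level = isSetextUnderline(next);
  return level === null ? null : { kind: "heading", level, text: pending };
}
```

Because the decision depends on next, the parser cannot commit pending as a paragraph until it has seen at least one more line.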
Tables behave the same way. This first line is still just paragraph text with pipes:
| Feature | Benefit |
Only when the delimiter row arrives does it become the header of a table:
| Feature | Benefit |
| ------- | ------- |
Only then can the parser say, with confidence, that the previous line was a table header. If the UI has already treated it as a paragraph, the UI now has to take that back.
Link reference definitions are quieter but just as important. A line like [note]: https://example.com is not prose. It is invisible bookkeeping for [note] elsewhere in the document. If it is rendered eagerly as text, it has to be removed later.
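A sketch of that check, covering only the common single-line form; the function name is hypothetical:

```typescript
// Sketch: a link reference definition is bookkeeping, not prose, so it
// should be swallowed rather than committed as a paragraph.
// This regex handles only the simple single-line form.
function isLinkReferenceDefinition(line: string): boolean {
  return /^\s{0,3}\[[^\]]+\]:\s*\S+/.test(line);
}
```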
This is the heart of the problem. Most Markdown blocks become safe quickly. Paragraphs are the block that absorbs uncertainty. They are what a line becomes when nothing else has claimed it yet, and they are the block most likely to be corrected by what comes next.
Once you see paragraphs as the source of uncertainty, the rest of the design becomes simpler. Do not make every block wait for the slowest block. Let each block stream according to the amount of uncertainty it carries.
Some blocks are atomic. Headings and thematic breaks are decided on a single line, so the parser can emit them immediately. Some blocks stay open, but their identity is locked in once they begin. Fenced code blocks, blockquotes, lists, and many HTML blocks fall into this group. A table joins the group as soon as its delimiter row confirms the header.
Those blocks can be rendered while they are still open because future input can add to them, but it cannot turn them into a different kind of block. The renderer can keep their DOM in place and update only the live portion.
Paragraphs need a different rule. Until a paragraph closes, the parser should treat it as provisional. The renderer can either hold it back or show it as an explicit preview at the tail. What it should not do is mix provisional paragraph DOM into the committed block list and then pretend it will never have to move.
Tables show why this feels better
Tables are where the payoff is easiest to see. A naive renderer shows a table being typed character by character. Cell contents grow, columns resize, and the whole thing jitters until the row is done.
An incremental parser gives the renderer a better unit: the row. A table row is one line of Markdown. Before the newline, the cells are not final. After the newline, they are. So the table can appear once the delimiter row confirms the header, then grow one complete row at a time.
At first, while only the header line has arrived, the parser waits. The line could still be a paragraph. The moment the delimiter row (| --- | --- |) is complete, the table becomes real. Alignment is known. Column count is known. From there, each body row is appended as a stable unit.
The user sees rows arrive, not cells twitch. The table feels like it is being built, not repaired.
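The delimiter-row check that makes the table real can be sketched like this (hypothetical names; real GFM parsing has more edge cases):

```typescript
// Sketch: validate a GFM delimiter row. A successful parse confirms the
// previous pending line as the table header and fixes the column count
// and alignment. Illustrative only.
type Align = "left" | "right" | "center" | "none";

function parseDelimiterRow(line: string): Align[] | null {
  const cells = line.trim().replace(/^\|/, "").replace(/\|$/, "").split("|");
  const aligns: Align[] = [];
  for (const raw of cells) {
    const m = /^(:?)-+(:?)$/.exec(raw.trim());
    if (!m) return null; // not a delimiter row; the line above stays pending
    if (m[1] && m[2]) aligns.push("center");
    else if (m[2]) aligns.push("right");
    else if (m[1]) aligns.push("left");
    else aligns.push("none");
  }
  return aligns;
}
```

Once this returns non-null, each later body row can be committed as a whole unit as soon as its newline arrives.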
Code blocks stream smoothly
Code blocks sit at the other end of the spectrum. Once a fenced code block opens, everything inside is literal until the closing fence. There is no inline parsing to resolve and no paragraph waiting to be reinterpreted. The block is unfinished, but its identity is settled.
That means the renderer can keep a single <pre> on the page and append text as chunks arrive. It does not need to wait for line boundaries, and it definitely should not wait for the closing fence. For long code answers, holding the whole block back would turn a stable interaction into a long blank pause.
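A line-based sketch of that behavior (a real implementation can also append sub-line chunks; the fence string is built at runtime only to avoid nesting literal fences inside this example):

```typescript
// Sketch: once a fence opens, every line is literal until the closing
// fence. Names are illustrative.
const FENCE = "`".repeat(3); // i.e. three backticks

interface OpenCodeBlock { lang: string; text: string }

function openFence(line: string): OpenCodeBlock | null {
  if (!line.startsWith(FENCE)) return null;
  return { lang: line.slice(3).trim(), text: "" };
}

// Returns true when the closing fence arrives; otherwise appends the
// line verbatim so the renderer can update its <pre> in place.
function feedCodeLine(block: OpenCodeBlock, line: string): boolean {
  if (line.startsWith(FENCE) && line.slice(3).trim() === "") return true;
  block.text += line + "\n";
  return false;
}
```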
Paragraphs are the honest pause
Paragraphs are the price of correctness. They are the one place where the parser may need one more line before it can tell the renderer what is really happening.
In practice, that's a shorter wait than it sounds. A paragraph closes on a blank line, at the end of the stream, or when the parser recognizes a block that is allowed to interrupt it. Those signals are usually visible from the first few characters of the next line — ## , > , + , ```, or just an empty line.
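Those signals can be sketched as one predicate (simplified relative to the CommonMark rules; for instance, an ordered list only interrupts a paragraph when it starts at 1):

```typescript
// Sketch: can this line close the open paragraph? Simplified relative
// to CommonMark, for illustration only.
function interruptsParagraph(line: string): boolean {
  return (
    line.trim() === "" ||                  // blank line
    /^#{1,6}\s/.test(line) ||              // ATX heading
    line.startsWith(">") ||                // blockquote
    /^([*+-]|\d{1,9}[.)])\s/.test(line) || // list item
    line.startsWith("`".repeat(3))         // fenced code block
  );
}
```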
From React's point of view, a committed list with stable keys means React can reconcile updates efficiently, without re-rendering or shuffling previously rendered items. The list grows monotonically, with new elements only appended at the end, so React simply adds to the DOM without touching existing elements. The result is smoother updates and better performance during streaming and incremental rendering.
The parser-renderer contract
The whole story comes down to a contract between the parser and the renderer.
The parser advances incrementally. It remembers the open structure, consumes new text, and emits only blocks that have become stable since the last chunk. It may also expose the current open block for a preview, but that block is marked as provisional by its position at the live edge.
The renderer commits incrementally. It mounts finalized blocks once, preserves their keys, and never asks old DOM to prove itself again. It renders open-safe blocks live and treats the open paragraph as provisional.
Everything above the live edge is boring on purpose. New chunks move the parser forward. They do not cause old blocks to reflow, remount, or shift. The document grows downward instead of constantly reforming.
Closing thought
LLM streaming turns Markdown from a snapshot into a session. The right question is no longer simply “what does this document look like?” It is “given everything seen so far, what part of this document is stable enough to keep?”
A good streaming Markdown pipeline answers that question carefully. It parses incrementally, emits closed blocks aggressively, streams open-safe blocks live, and lets the tail be the only volatile place on the page. The result is a response that feels like it is being written in front of you, not assembled and reassembled after every token.
The ideas in this article are implemented in markdown-parser, an open-source, streaming-capable Markdown parser written in TypeScript. It follows the CommonMark spec, supports GFM tables, and is designed to emit stable blocks incrementally so your renderer never has to reparse what it has already committed. A demo is available here if you want to see incremental parsing in action or use it in your own projects.