Serve Markdown to AI agents: content negotiation done right

One Cloudflare demo page fell from 12,345 HTML tokens to 725 as Markdown. Serve it to AI agents with Accept: text/markdown, the right Content-Type, and Vary.

By IMozz · Updated 2026-06-07
Serving Markdown to AI agents isn't a new protocol — it's ordinary HTTP content negotiation. An agent sends Accept: text/markdown, your server answers with Content-Type: text/markdown; charset=utf-8, and it tells caches Vary: Accept. That's the whole mechanism. The media type text/markdown has been registered since RFC 7763, and the rules are plain RFC 9110 (IETF). No AI-specific standard is needed.

Think of it as one layer in the readiness stack, not a standalone trick. A bot first has to reach your page and read it without running JavaScript. A Markdown representation makes that read cleaner. It hands the model your text instead of a DOM full of nav, styles, and widgets. It's the natural next step after you list .md files in llms.txt: there you point at Markdown, here you actually serve it.

Key takeaways

  • Markdown for agents is standard content negotiation over a registered media type, not a new AI protocol (IETF RFC 7763).
  • In one Cloudflare demo page, the HTML was estimated at 12,345 tokens and the Markdown at 725. That's a per-page example, not an industry average (Cloudflare).
  • Three ways to ship it: .md URL variants, same-URL runtime negotiation, or build-time export. A hybrid often beats any single one (Vercel).
  • Vary: Accept alone won't save you on every CDN — CloudFront and Cloudflare need explicit cache-key config, or the cache gets poisoned.
  • Most failures are the same few: HTML labeled as Markdown, missing Vary, missing charset, and soft 404s. aiSiteReady checks this and scores your site 0–100.

Why do AI agents prefer Markdown over HTML?

It's an operational preference, not a rule. No RFC says agents must "prefer Markdown." Models, retrieval pipelines, and browsing tools mostly extract and process text, not DOM trees. So Cloudflare and Vercel both recommend serving agents Markdown, and some AI clients already request it directly (Cloudflare, Vercel).

The clearest win is token density. In Cloudflare's own example for a single documentation page, the HTML was estimated at 12,345 tokens and the Markdown at 725, roughly 94% less (Cloudflare). Treat that as a demonstration of one page, not an average. Cloudflare's own announcement post lands more conservatively: 16,180 HTML tokens versus 3,150 as Markdown, about 80% less (Cloudflare).
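
Both percentages come from the same one-line calculation. Here's a minimal sketch of it; tokenReduction is an illustrative helper name of mine, not something Cloudflare publishes, and the inputs are just the token counts quoted above.

```typescript
// Hypothetical helper: fraction of tokens saved when Markdown replaces HTML.
function tokenReduction(htmlTokens: number, markdownTokens: number): number {
  if (htmlTokens <= 0) throw new RangeError("htmlTokens must be positive");
  return 1 - markdownTokens / htmlTokens;
}

// Token counts quoted from Cloudflare's two examples.
const demoPage = tokenReduction(12345, 725);      // about 0.941
const announcement = tokenReduction(16180, 3150); // about 0.805

console.log(`${(demoPage * 100).toFixed(1)}% and ${(announcement * 100).toFixed(1)}% fewer tokens`);
```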

Figure: a side-by-side comparison. An HTML page card whose nav, sidebar, cookie banner, scripts, and footer are highlighted as noise and stamped 12,345 tokens sits next to a compact Markdown card showing only headings and text, stamped 725 tokens, with a 94 percent reduction badge between them.

Why does that matter beyond bandwidth? Every menu, footer, cookie banner, and client-side widget the model reads is context spent on the wrong thing. RFC 7763 draws the line cleanly: HTML is a publishing format, Markdown is a writing format (IETF). Handing an agent text that's already clean is simpler and more predictable than asking it to rebuild your main content from markup on every fetch.

How does text/markdown content negotiation work?

The semantics are entirely standard. Accept is a request header where the client lists media types it can use. The server picks a representation and declares it with Content-Type. The q weights run 0 to 1: q=1 is top priority, q=0 means "not acceptable," and a missing q defaults to 1 (IETF RFC 9110).
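
To make the q rules concrete, here's a minimal sketch of an Accept parser. It's a simplification under stated assumptions: parseAccept and qualityOf are names I made up, media-type parameters other than q are ignored, and a production server would normally lean on its framework's negotiation helper instead.

```typescript
// One entry of a parsed Accept header: a media range and its q weight.
interface AcceptEntry {
  type: string; // e.g. "text/markdown", "text/*", "*/*"
  q: number;    // 0..1; 0 means "not acceptable"
}

// Parse an Accept header value into entries sorted by descending q.
function parseAccept(header: string): AcceptEntry[] {
  const entries: AcceptEntry[] = [];
  for (const part of header.split(",")) {
    const pieces = part.split(";").map((s) => s.trim());
    const type = (pieces[0] ?? "").toLowerCase();
    if (type === "") continue;
    let q = 1; // a missing q defaults to 1
    for (const param of pieces.slice(1)) {
      const eq = param.indexOf("=");
      if (eq === -1) continue;
      const name = param.slice(0, eq).trim().toLowerCase();
      const raw = param.slice(eq + 1).trim();
      const value = Number(raw);
      if (name === "q" && raw !== "" && !Number.isNaN(value)) {
        q = Math.min(1, Math.max(0, value)); // clamp into 0..1
      }
    }
    entries.push({ type, q });
  }
  // Array.prototype.sort is stable, so equal-q entries keep header order.
  return entries.sort((a, b) => b.q - a.q);
}

// q weight the client assigns to one concrete type; the most specific range wins.
function qualityOf(entries: AcceptEntry[], mediaType: string): number {
  const major = mediaType.split("/")[0] ?? "";
  const exact = entries.find((e) => e.type === mediaType);
  if (exact) return exact.q;
  const partial = entries.find((e) => e.type === `${major}/*`);
  if (partial) return partial.q;
  const wildcard = entries.find((e) => e.type === "*/*");
  return wildcard ? wildcard.q : 0;
}

const parsed = parseAccept("text/markdown, text/html;q=0.8, application/json;q=0");
console.log(qualityOf(parsed, "text/markdown"), qualityOf(parsed, "text/html"));
```

With this in hand, "does the client want Markdown?" becomes a comparison of two numbers: the q for text/markdown against the q for text/html.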

Figure: content negotiation flow. A client request carrying Accept: text/markdown reaches a decision: does the client want Markdown, and does the resource have it? If yes, the server returns 200 with Content-Type text/markdown; charset=utf-8 and Vary: Accept. Otherwise the default is 200 text/html with Vary: Accept. If nothing is acceptable and the server won't fall back, it returns 406 Not Acceptable with Vary: Accept.

Two details trip people up. First, charset is required on text/markdown. RFC 7763 says so explicitly, so always send ; charset=utf-8. Second, Vary: Accept does double duty. It tells caches the response may only be reused for requests with the same Accept, which effectively extends the cache key. It also signals that the body was chosen by negotiation (IETF RFC 9110). Here's the same URL answering two different requests: one that prefers Markdown, and one that asks for a type the server doesn't offer.

GET /guide HTTP/1.1
Accept: text/markdown, text/html;q=0.8

HTTP/1.1 200 OK
Content-Type: text/markdown; charset=utf-8
Vary: Accept
Cache-Control: public, s-maxage=600

# Guide
Page text...

GET /guide HTTP/1.1
Accept: application/json

HTTP/1.1 406 Not Acceptable
Content-Type: text/plain; charset=utf-8
Vary: Accept

No acceptable representation for this resource.

When the server can't satisfy Accept and won't fall back, 406 Not Acceptable is the correct status. But RFC 9110 also lets a server ignore the preference and serve a default instead of returning 406. So this is a policy choice, not an automatic behavior (IETF RFC 9110). RFC 7763 defines an optional variant parameter for Markdown flavor, and RFC 7764 registers specific flavors. It's only an identifier, not a real compatibility mechanism, so most public deployments leave it off.
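
Since 406-versus-fallback is a policy switch, it helps to see it as one. The sketch below is illustrative only: chooseRepresentation and its parameters are hypothetical names, and it assumes the Accept header has already been reduced to a best-first list of concrete types with q above 0 (wildcards are not handled here).

```typescript
type Outcome = { status: 200 | 406; contentType: string; vary: "Accept" };

// acceptable: concrete types the client accepts, best first (assumed pre-parsed).
// available: types the server can produce, default representation first.
// fallbackToDefault: the policy choice RFC 9110 leaves to the server.
function chooseRepresentation(
  acceptable: string[],
  available: string[],
  fallbackToDefault: boolean
): Outcome {
  const match = acceptable.find((type) => available.includes(type));
  if (match !== undefined) {
    return { status: 200, contentType: `${match}; charset=utf-8`, vary: "Accept" };
  }
  const fallback = available[0];
  if (fallbackToDefault && fallback !== undefined) {
    // Ignore the stated preference and serve the default representation.
    return { status: 200, contentType: `${fallback}; charset=utf-8`, vary: "Accept" };
  }
  // Nothing acceptable and no fallback: say so explicitly.
  return { status: 406, contentType: "text/plain; charset=utf-8", vary: "Accept" };
}

const available = ["text/html", "text/markdown"];
const strict = chooseRepresentation(["application/json"], available, false);
const lenient = chooseRepresentation(["application/json"], available, true);
console.log(strict.status, lenient.status, lenient.contentType);
```

Either branch keeps Vary: Accept, because the outcome depended on the request header in both cases.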

What are the three ways to serve Markdown?

There are three practical patterns, and the right one depends on how much you control your CDN. Separate .md files are the safest and most cacheable. Same-URL negotiation is the cleanest but the most cache-sensitive. Build-time export is the most predictable for static sites. Vercel recommends supporting both .md endpoints and negotiation, because plenty of agents never send Accept: text/markdown at all (Vercel).

| Approach | Complexity | Caching | Best for | Choose when |
|---|---|---|---|---|
| .md URL variants | Low | Simplest | Static hosting | You want a fast, safe launch |
| Same-URL negotiation | Medium–high | Most finicky | API-style logic | You control origin and CDN |
| Build-time export | Medium | Very predictable | Docs / SSG | Content comes from one source |

.md URL variants give every HTML page a sibling, like /docs/intro and /docs/intro.md. Diagnosis is trivial, static hosting works out of the box, and you never depend on Vary. The cost is URL duplication, so keep HTML canonical and advertise the Markdown as an alternate. In Next.js, a rewrite plus a Route Handler does it. App Router handlers use the standard Web Request/Response API (Next.js).

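// app/api/md/[...path]/route.ts
// loadDocAsMarkdown is the app's own loader (not shown); it is expected to
// resolve to { body: string }, or null when the document doesn't exist.
type RouteContext = { params: Promise<{ path: string[] }> };

export async function GET(_req: Request, { params }: RouteContext) {
  const { path } = await params; // params is a Promise in Next.js 15+
  const slug = path.join('/');
  const doc = await loadDocAsMarkdown(slug);
  if (!doc) return new Response('Not found', { status: 404 });

  return new Response(doc.body, {
    status: 200,
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
      // Point search engines back at the canonical HTML page.
      'Link': `</docs/${slug}>; rel="canonical"`,
    },
  });
}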

Then declare the alternate on the HTML page: <link rel="alternate" type="text/markdown" href="/docs/intro.md" />.
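
Keeping the two pointers in sync is easier when both come from one function. A minimal sketch, with assumptions labeled: markdownSibling, alternateLinkTag, and canonicalLinkHeader are hypothetical helper names, the /index.md mapping for the root path is my own convention, and paths are assumed to be URL-safe already (no HTML escaping is done).

```typescript
// Map a canonical HTML path to its Markdown sibling: /docs/intro -> /docs/intro.md
function markdownSibling(htmlPath: string): string {
  const trimmed = htmlPath.replace(/\/+$/, ""); // drop trailing slashes
  // Assumption: the site root maps to /index.md.
  return trimmed === "" ? "/index.md" : `${trimmed}.md`;
}

// <link> tag for the HTML page's <head>, advertising the Markdown alternate.
function alternateLinkTag(htmlPath: string): string {
  return `<link rel="alternate" type="text/markdown" href="${markdownSibling(htmlPath)}" />`;
}

// Link header value for the .md response, pointing back at the canonical HTML URL.
function canonicalLinkHeader(htmlPath: string): string {
  return `<${htmlPath}>; rel="canonical"`;
}

console.log(alternateLinkTag("/docs/intro"));
console.log(canonicalLinkHeader("/docs/intro"));
```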

Same-URL negotiation lets a human and an agent hit one address while the server chooses the format. Architecturally it's the cleanest: one resource, no navigation duplicates. The catch is caching. The response must carry Vary: Accept, or the first variant to arrive poisons the cache for everyone else. An Express handler captures the logic.

app.get('/guide', async (req, res) => {
  res.vary('Accept'); // every response here, errors included, depends on Accept
  const page = await loadPage('guide'); // { html, markdown }
  if (!page) return res.status(404).type('text/plain').send('Not found');

  // HTML is listed first so that */* or a missing Accept header falls back to
  // HTML; a client that ranks text/markdown higher still gets Markdown.
  const best = req.accepts(['text/html', 'text/markdown']);
  if (best === 'text/markdown')
    return res.type('text/markdown; charset=utf-8').send(page.markdown);
  if (best === 'text/html')
    return res.type('text/html; charset=utf-8').send(page.html);
  return res.status(406).type('text/plain').send('No acceptable representation');
});

Build-time export publishes both HTML and Markdown from one source at build time. There's no runtime converter, the cache is maximally predictable, and you can check the artifacts before deploy. Hugo treats this as native through multiple output formats (Hugo).

[mediaTypes."text/markdown"]
  suffixes = ["md"]
[outputFormats.MARKDOWN]
  mediaType = "text/markdown"
  isPlainText = true
  permalinkable = true
[outputs]
  page = ["HTML", "MARKDOWN"]

Why does Vary: Accept matter on a CDN?

Content that depends on Accept is only safe once the cache actually distinguishes the variants, and real CDN behavior diverges from the spec. The standard answer is Vary: Accept. But Fastly treats it as a secondary cache key (Fastly), CloudFront doesn't cache on request headers by default (AWS), and Cloudflare doesn't consider a generic Vary in caching decisions at all — so it needs a custom cache key or a Worker to tell Markdown and HTML apart (Cloudflare). So "I set Vary: Accept, therefore I'm safe" is only partly true.

Here's the failure mode Vercel calls out explicitly. An agent requests the URL first with Accept: text/markdown, the origin honestly returns Markdown, the CDN caches that object as plain /guide, and the next browser gets Markdown instead of the page (Vercel). On CloudFront that's the default outcome, because without a cache policy it doesn't vary on headers at all.

When I built aiSiteReady's Markdown check, this was the exact trap I hit most. On CloudFront's default policy, the first agent's Markdown response got cached as plain /guide and then served to every browser that followed. The origin code was correct; the cache key was the bug.

Figure: a four-step cache-poisoning sequence. An AI agent requests /guide with Accept: text/markdown, the origin returns Markdown correctly, a CDN with no Vary caches it as the plain /guide object, and a browser then receives raw Markdown instead of HTML, flagged as the wrong result.
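
The poisoning sequence is easy to reproduce with a toy cache. This is a simulation under loud assumptions, not a model of any real CDN: origin and makeCache are made-up stand-ins, and the origin decides with a naive substring check purely to keep the sketch short.

```typescript
type Format = "markdown" | "html";

// Stand-in origin that honors Accept (naive substring check, for illustration).
function origin(accept: string): Format {
  return accept.includes("text/markdown") ? "markdown" : "html";
}

// Toy edge cache. keyOf decides which requests share a cached object.
function makeCache(keyOf: (path: string, accept: string) => string) {
  const store = new Map<string, Format>();
  return (path: string, accept: string): Format => {
    const key = keyOf(path, accept);
    const hit = store.get(key);
    if (hit !== undefined) return hit; // served from cache
    const fresh = origin(accept);      // cache miss: ask the origin
    store.set(key, fresh);
    return fresh;
  };
}

const agentAccept = "text/markdown, text/html;q=0.8";
const browserAccept = "text/html,application/xhtml+xml";

// Key ignores Accept: the agent's Markdown is replayed to the browser.
const naive = makeCache((path) => path);
naive("/guide", agentAccept);
const poisoned = naive("/guide", browserAccept); // "markdown" (wrong)

// Key includes Accept, which is what Vary: Accept asks for.
const varied = makeCache((path, accept) => `${path}|${accept}`);
varied("/guide", agentAccept);
const correct = varied("/guide", browserAccept); // "html"

console.log({ poisoned, correct });
```

The origin function is identical in both runs. Only the cache key changes, which is why this bug never shows up when you test against localhost.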

The fixes are CDN-specific. On CloudFront, whitelist Accept in the cache policy's ParametersInCacheKeyAndForwardedToOrigin. A response-headers policy does not fix the cache key: it only changes the headers CloudFront returns to viewers, and it applies whether the response comes from cache or origin (AWS). On Fastly, normalize Accept down to two buckets first, since raw Accept permutations are huge and can wreck your hit ratio (Fastly). On Cloudflare, add a custom cache key or use the managed Markdown-for-Agents feature. The rule of thumb: if you're unsure about your CDN, ship .md siblings instead.
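
The "two buckets" idea translates to any edge runtime. Below is a minimal sketch in TypeScript rather than a real Fastly or Cloudflare config. acceptBucket, explicitQ, and cacheKey are hypothetical names, and the bucketing rule itself is my assumption: Markdown only when text/markdown is listed explicitly with a non-zero q that is at least as high as any explicit text/html.

```typescript
type Bucket = "markdown" | "html";

// q for a media type that appears literally in the header; 0 if it isn't listed.
function explicitQ(accept: string, mediaType: string): number {
  for (const part of accept.split(",")) {
    const pieces = part.split(";").map((s) => s.trim().toLowerCase());
    if (pieces[0] !== mediaType) continue;
    const qParam = pieces.slice(1).find((p) => p.startsWith("q="));
    const q = qParam === undefined ? 1 : Number(qParam.slice(2));
    return Number.isNaN(q) ? 1 : q;
  }
  return 0;
}

// Collapse any Accept header into one of two cache buckets.
function acceptBucket(accept: string | undefined): Bucket {
  if (!accept) return "html"; // no header: treat as a browser-style request
  const md = explicitQ(accept, "text/markdown");
  return md > 0 && md >= explicitQ(accept, "text/html") ? "markdown" : "html";
}

// Cache key with exactly two variants per path, however messy the header is.
function cacheKey(path: string, accept: string | undefined): string {
  return `${path}#${acceptBucket(accept)}`;
}

console.log(cacheKey("/guide", "text/markdown, text/html;q=0.8"));
console.log(cacheKey("/guide", "text/html,application/xhtml+xml,*/*;q=0.8"));
```

Whatever rule you pick, the origin has to make the same decision as the bucket function. If the two disagree, the cache stores the wrong body under the right key.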

What mistakes break Markdown for agents?

Almost every broken deployment lands on the same short list. Each one follows directly from the Content-Type / Accept / Vary rules or from how search engines treat status codes.

| Mistake | How to detect it | The fix |
|---|---|---|
| HTML body labeled text/markdown | curl -i shows the header but the body starts <!doctype html> | Return real Markdown, or correct the Content-Type |
| Negotiation with no Vary: Accept | Compare two Accept values through the CDN | Add Vary: Accept to both responses |
| Missing charset | Header check | Send Content-Type: text/markdown; charset=utf-8 |
| A missing .md returns 200 | curl -i /page.md, Search Console | Return 404 or 410, not a pretty 200 page |
| Wrong status for unsupported types | Test Accept: application/json | Return 406 when you won't fall back |
| Canonical only on HTML | Inspect headers on the .md | Advertise .md as alternate; add a canonical Link header |

That soft-404 row matters most for SEO. Google defines a soft 404 as a page whose content says "not found" while the server still answers a 2xx status like 200 OK (Google). If a .md resource is gone, return 404 or 410. If the negotiation layer itself breaks, that's a 5xx — never a 200 with a stub. And if your facts only appear after client-side JavaScript runs, no static Markdown export will capture them.
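
One way to keep those status rules from drifting is to route every Markdown lookup through a single mapping. The sketch below is illustrative: the Lookup type, statusFor, and looksLikeSoft404 are hypothetical names, and the phrases the soft-404 heuristic searches for are my own guess at typical error copy, not Google's definition.

```typescript
// Possible outcomes of resolving a .md resource (hypothetical shape).
type Lookup =
  | { kind: "found"; markdown: string }
  | { kind: "missing" }                 // never existed
  | { kind: "gone" }                    // existed once, removed on purpose
  | { kind: "error"; message: string }; // converter or negotiation layer broke

// Map each outcome to an honest HTTP status.
function statusFor(result: Lookup): number {
  switch (result.kind) {
    case "found":
      return 200;
    case "missing":
      return 404;
    case "gone":
      return 410;
    case "error":
      return 500;
  }
}

// Heuristic soft-404 check: a 2xx status whose body reads like an error page.
function looksLikeSoft404(status: number, body: string): boolean {
  const is2xx = status >= 200 && status < 300;
  const head = body.slice(0, 500); // error copy usually sits near the top
  return is2xx && /\b(not found|page (does not|doesn't) exist)\b/i.test(head);
}

console.log(statusFor({ kind: "gone" }), looksLikeSoft404(200, "# Not found"));
```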

How do you verify your Markdown setup?

Check two planes at once: the negotiation headers and the actual body. A few curl calls confirm status, Content-Type, the presence of Vary: Accept, and that the body matches its declared type. Run them against the public edge URL, not just localhost — most negotiation bugs live in the cache, not the origin code.

# headers only
curl -sSI -H 'Accept: text/markdown' https://example.com/guide
# full body — should be Markdown, not <html>
curl -sS  -H 'Accept: text/markdown' https://example.com/guide | head -20
# negative test: unsupported type should 406 (or fall back by policy)
curl -sSI -H 'Accept: application/json' https://example.com/guide

Wire that into CI as a smoke test so a regression can't ship. The check is small and catches the common breaks before production.

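#!/usr/bin/env bash
# Smoke test: the URL must negotiate to real Markdown with the right headers.
set -euo pipefail
URL="${1:?usage: check-md.sh https://example.com/page}"
HEADERS="$(curl -sSI -H 'Accept: text/markdown' "$URL")"
BODY="$(curl -sS -H 'Accept: text/markdown' "$URL")"
grep -qi '^content-type: text/markdown;.*charset=utf-8' <<<"$HEADERS" || { echo 'FAIL: Content-Type'; exit 1; }
# Match Accept as a whole list item, so "Vary: Accept-Encoding" alone doesn't pass.
grep -Eqi '^vary:([^,]*,)*[[:space:]]*accept[[:space:]]*(,|$)' <<<"$HEADERS" || { echo 'FAIL: missing Vary'; exit 1; }
# -E keeps the alternation portable across GNU and BSD grep.
if grep -Eqi '<html|<!doctype' <<<"$BODY"; then echo 'FAIL: body is HTML'; exit 1; fi
echo OK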
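
If your CI already runs Node, the same three assertions fit in a pure function you can unit-test without a network. This is a sketch, not aiSiteReady's actual check: checkMarkdownResponse is a hypothetical name, and it assumes the caller has fetched the response and lower-cased the header names.

```typescript
// Validate one negotiated response. Returns a list of failures (empty means OK).
function checkMarkdownResponse(
  headers: Record<string, string | undefined>,
  body: string
): string[] {
  const failures: string[] = [];

  // 1. Honest Content-Type, including the required charset.
  const contentType = (headers["content-type"] ?? "").toLowerCase();
  if (!contentType.startsWith("text/markdown") || !/charset\s*=\s*"?utf-8"?/.test(contentType)) {
    failures.push("Content-Type is not text/markdown; charset=utf-8");
  }

  // 2. Vary must list Accept as its own item (Accept-Encoding doesn't count).
  const vary = (headers["vary"] ?? "").toLowerCase().split(",").map((v) => v.trim());
  if (!vary.includes("accept") && !vary.includes("*")) {
    failures.push("Vary does not list Accept");
  }

  // 3. The body must not start like an HTML document.
  if (/^\s*<(!doctype|html)\b/i.test(body)) {
    failures.push("body starts like HTML");
  }
  return failures;
}

const good = checkMarkdownResponse(
  { "content-type": "text/markdown; charset=utf-8", vary: "Accept" },
  "# Guide\nPage text"
);
const bad = checkMarkdownResponse(
  { "content-type": "text/markdown", vary: "Accept-Encoding" },
  "<!doctype html><html></html>"
);
console.log(good.length, bad);
```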

One honest caveat before you over-invest: agent support for Accept: text/markdown is still uneven. Many agents don't send the header, which is exactly why the hybrid (.md files plus negotiation) is more reliable than betting on a single path (Vercel). Keep your agent Markdown conservative, since flavor extensions aren't standardized identically across consumers.

How do you check your whole site at once?

Validating one route by hand is fine. Checking every template, on every release, across Markdown negotiation and the layers beneath it, is not. That's the job aiSiteReady does. It fetches your site the way an agent would and runs a Markdown-negotiation test inside the content-accessibility block, returning prioritized fixes and reproducible HTTP evidence — as part of a 0–100 score.

This sits alongside the discoverability and protocol checks, on top of the content-accessibility gate that decides whether an agent can read your page at all. The exact checks and weights live on the methodology page. This guide is the serving counterpart to the discovery map you publish with llms.txt. Both are spokes under what AI agent readiness means.

Run a free scan to see whether ChatGPT, Claude, Perplexity, and Google AI can fetch a clean Markdown version of your pages: whether your Content-Type is honest, whether Vary: Accept is present, and which fixes to make first, in English, Ukrainian, or Russian.

The short version: you don't need a new standard to feed agents clean text. You need the right Content-Type, an honest Vary, and a cache that tells HTML and Markdown apart.

IMozz has 20 years in software development, with the past year spent building with LLMs. He builds aiSiteReady, a read-only scanner that checks whether AI agents can read a site. It server-renders its own content as a working example.

Frequently asked questions

Do I need a special protocol to serve Markdown to AI agents?
No. It's ordinary HTTP content negotiation: the agent sends Accept: text/markdown, and you reply Content-Type: text/markdown; charset=utf-8 plus Vary: Accept. The media type was registered in RFC 7763, and the negotiation rules are standard RFC 9110. There is no new AI-specific standard involved; you're just using HTTP the way it was designed.

Should I use .md URLs or same-URL content negotiation?
Often both. Vercel recommends supporting .md endpoints and negotiation together, because many agents never send Accept: text/markdown. Separate .md files are the safest, most cacheable rollout and work on static hosting out of the box. Same-URL negotiation is cleaner architecturally, but only if your CDN actually keys the cache on Accept.

Is Vary: Accept enough to make my CDN safe?
Not by itself. Fastly treats Vary as a secondary cache key, but CloudFront ignores request headers unless your cache policy whitelists Accept, and Cloudflare doesn't act on a generic Vary without a custom cache key or Worker. Always test the public edge URL, not just your origin, before trusting the setup.

Will serving Markdown hurt my SEO?
Not if you keep HTML canonical. Advertise the Markdown copy as an alternate format with a link rel=alternate type=text/markdown tag, and on the .md file send a rel=canonical HTTP Link header, which Google supports for non-HTML documents. Avoid soft 404s: a missing .md must return 404 or 410, never a 200 with a stub page.