The Content Formats LLMs Cite (and the Ones They Ignore)

2026-07-03 · 12 min read
Illustration: an AI assistant assembling an answer by lifting short, self-contained passages from a few source cards. Original data, a comparison table and a definition block are highlighted as cited, while a dense wall-of-text card and a keyword-stuffed listicle card are greyed out and ignored.

The short answer

Over nine months we tracked which of our own published pages ChatGPT, Perplexity and Google's AI Overviews actually cited - and which they ignored, even when the ignored pages ranked well on Google. Six formats earned citations reliably: original data and benchmarks, direct-answer definitions, comparison tables, honest buying guides, numbered step-by-step processes, and genuine FAQ blocks. Five formats effectively never earned a citation, no matter how much traffic they pulled: keyword-stuffed listicles, undifferentiated "ultimate guides", opinion with no evidence, thin or gated content, and narrative posts that bury the answer. The dividing line was never the topic. It was structure: cited pages contained a short, factual, self-contained passage a model could lift and attribute without reading the rest of the page. Ignored pages didn't. That single distinction now shapes how we brief every piece.

How we actually ran this

This is not a theory post. For a set of client and owned pages, we did the boring thing: we fixed a list of the questions our buyers ask an assistant, ran those prompts across ChatGPT, Perplexity and Google's AI Overviews on a repeating cadence, and logged three things each time - whether our page was cited, which passage was quoted, and whether the same page also ranked in classic Google results. We wrote up the tracking method in full in how to track AI brand mentions across ChatGPT and Perplexity; this piece is what we learned from the log once it had enough entries to show a pattern.
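A minimal sketch of what one row of such a log could look like, assuming a flat CSV file is enough. The record name `CitationCheck`, its field names and the example.com URLs are illustrative assumptions, not the format we actually used; the three logged facts (cited or not, which passage, ranking or not) are the ones described above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CitationCheck:
    """One prompt run against one assistant (hypothetical schema)."""
    run_date: str          # ISO date of the run
    assistant: str         # "ChatGPT", "Perplexity" or "AI Overviews"
    question: str          # the fixed buyer question that was asked
    page_url: str          # the page we hoped would be cited
    cited: bool            # was the page cited in the answer?
    quoted_passage: str    # the passage that was quoted, "" if none
    ranks_in_google: bool  # does the same page rank in classic results?


def write_log(checks, handle):
    """Write the checks as CSV rows, one per prompt run."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(CitationCheck)])
    writer.writeheader()
    for check in checks:
        writer.writerow(asdict(check))


checks = [
    CitationCheck("2026-01-05", "Perplexity", "What is answer engine optimization?",
                  "https://example.com/what-is-aeo", True,
                  "AEO is optimising content so AI assistants can lift and cite a clean answer.",
                  True),
    CitationCheck("2026-01-05", "ChatGPT", "What is answer engine optimization?",
                  "https://example.com/what-is-aeo", False, "", True),
]

buffer = io.StringIO()   # stands in for a real file opened with newline=""
write_log(checks, buffer)
log_text = buffer.getvalue()
```

Keeping the quoted passage as its own column is the design choice that matters here: it is what later lets you see which passages, not just which pages, are doing the work.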

The most uncomfortable finding came first. Ranking and citation were only loosely correlated. Some of our best-ranking pages - genuinely useful, well-linked, technically clean - were never cited once. Some pages that ranked modestly got quoted constantly. When we lined up the cited pages next to the ignored ones and asked what the cited group had in common, the answer wasn't topic, length, or keyword targeting. It was that every cited page had at least one passage you could copy out and drop into a conversation, where it would still be true and complete on its own. That is the whole game, and the rest of this post is what that looks like in practice.

The six formats that got cited

1. Original data and benchmarks

This was the runaway winner, and it wasn't close. Any page that published a number we generated ourselves - a benchmark, a test result, an aggregated outcome across client accounts - got cited far more often than any explainer, however good. The reason is mechanical, not mysterious. A definition can be paraphrased from a hundred sources, so an assistant has no reason to name you. A number that exists on exactly one page in the world forces attribution: if the model wants to use it, it has to point at you.

You do not need a research department. One honest benchmark from work you already did - "across the accounts we manage, X happened in Y percent of cases" - is more citable than a 3,000-word guide that restates what everyone already published. The bar is originality and honesty, not scale. This is the highest-leverage thing most brands are not doing, and it is why we push every client to publish at least one piece of proprietary data.

2. Direct-answer definitions

"What is answer engine optimization?" style pages got cited constantly - but only the versions that answered the question in the first two to four sentences, before any history, context or throat-clearing. The pattern that worked was almost formulaic: state the definition crisply, give the one qualifier that makes it accurate, then stop and elaborate below for humans. The versions that failed opened with three paragraphs of "in today's fast-moving landscape" and buried the actual definition halfway down.

Assistants lift the top. If your definition is a clean, standalone unit near the start of the section, it gets quoted. If a reader - or a model - has to scroll to assemble the answer from scattered sentences, it doesn't. We rebuilt several pages around this single rule and watched previously-ignored definitions start getting cited within weeks. It is the cheapest structural fix available, and it is the core of answer engine optimization.

3. Comparison content, structured as tables

"X vs Y" content punched above its weight, especially when the comparison was laid out as an actual table rather than prose. Buyers ask assistants comparison questions constantly - "is A or B better for my situation" - and a clean comparison table is the ideal liftable structure: rows of attributes, two columns of values, no narrative to untangle. We saw comparison tables quoted almost verbatim in AI Overviews.

The failure version was the fake comparison - a page that claims to compare two things but spends the whole time arguing for one, with no honest columns where the other option wins. Assistants seem to distrust these, and buyers do too. The comparisons that got cited were the ones honest enough to say where each option loses. If you want the deeper architecture behind this, we broke it down in comparison-page SEO and BOFU architecture that ranks.

4. Honest buying guides framed by use case

"Best X for a specific use case" pages got cited when they were genuinely a guide and not a thinly-disguised pitch. The format assistants reward is a guide that names criteria, applies them honestly, and is willing to say "for this situation, not us." Counterintuitive for a brand, but the citation data was unambiguous: the more honest and criteria-led the guide, the more often it got quoted, because the model treats it as a reference rather than an advertisement.

We learned to write these the way we write our city and category buyer's guides - lead with the evaluation criteria, apply them without flinching, and let the brand appear as one option assessed on the same yardstick as the rest. That restraint is exactly what makes the page citable.

5. Step-by-step processes as numbered lists

Anything framed as "how to do X" and laid out as a genuine numbered sequence - each step self-contained, each stating what to do and why - got lifted regularly, especially into AI Overviews, which love ordered lists. The structural requirement is that each step stands on its own. A step that reads "next, do the thing we discussed above" cannot be lifted; a step that reads "3. Audit faceted navigation for near-duplicate URLs, because filter combinations quietly inflate crawlable pages" can.

The failure mode was process content written as flowing narrative - the steps were in there, but tangled into paragraphs the model couldn't cleanly extract. Same information, wrong structure, no citations. Rewriting the same content as a real numbered list, with each step complete in itself, was often all it took.

6. Genuine FAQ blocks

FAQ sections were one of our most-cited formats, which surprised no one once we saw why: a question followed by a short, complete answer is already the exact unit an assistant wants. The catch is the word genuine. Fake FAQs - questions invented to stuff keywords, answered in one vague sentence - got ignored completely. The FAQs that got cited used the real questions buyers ask, drawn from search data and sales calls, each answered in two to five self-contained sentences.

Pairing that with FAQPage schema helps machines parse the structure, but the honesty and self-containment of the answers did the heavy lifting. Every page we publish now ends with a real FAQ block for exactly this reason.
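For readers who have not written FAQPage markup before, here is a small sketch that builds the JSON-LD from question and answer pairs. The `FAQPage`, `Question`, `acceptedAnswer` and `Answer` names are schema.org vocabulary; the helper name `faq_jsonld` and the sample pair are assumptions for illustration.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_jsonld([
    (
        "What is answer engine optimization?",
        "AEO is optimising content so AI assistants can lift and cite a clean "
        "answer. It differs from SEO because it targets extraction, not ranking.",
    ),
])

# The serialised object is what goes inside a
# <script type="application/ld+json"> element on the page.
markup = json.dumps(faq, indent=2)
```

The markup should mirror the visible FAQ text exactly. It describes answers that are already on the page; it does not replace them.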

Figure: How reliably each format earned an AI citation. Relative citation frequency across nine months of tracking - not the topic, the structure.

Above the citation threshold, in the order shown in the chart:

  1. Original data / benchmarks
  2. Direct-answer definitions
  3. Comparison tables
  4. Genuine FAQ blocks
  5. Step-by-step processes
  6. Honest buying guides

Citation threshold - below this line, effectively never cited:

  • Keyword-stuffed listicles
  • Undifferentiated "ultimate guides"
  • Opinion with no evidence
  • Thin / gated content
  • Narrative that buries the answer

Illustrative of the pattern we observed - relative, not absolute, frequencies.

The split was structural, not topical: every format above the line shares one trait - a short, self-contained passage a model can lift and attribute.

The five formats that never worked

The failures were as instructive as the wins, because they killed some content we were proud of.

Keyword-stuffed listicles. "21 tips for X" pages that pulled decent traffic but never earned a single citation. They repeat the target phrase and gesture at breadth, but no single item is a crisp, complete answer - so there is nothing to lift. Traffic without citation is the signature of this format.

Undifferentiated "ultimate guides." Long, competent posts that restate what a hundred other pages already say. They are not wrong; they are just not distinctive. An assistant synthesising an answer has no reason to name a source that adds nothing the others don't. Length is not the moat we spent years assuming it was.

Opinion with no evidence. Thought-leadership pieces making confident claims with nothing behind them. Assistants are cautious about attributing claims they cannot verify, so a strong opinion unsupported by data, examples or a named source mostly gets skipped. The fix is not softer opinions - it is attaching evidence to the strong ones.

Thin or gated content. If the substance sits behind a form or is only three thin paragraphs, there is nothing for a model to retrieve. Gating remains a legitimate lead-gen tactic, but understand the trade: gated content forfeits AI citation entirely, because the assistant never sees the good part.

Narrative that buries the answer. This one stung, because we like writing this way. Posts that open with a long personal build-up before reaching the point often ranked fine but rarely got cited - the answer was in there, three scrolls down, tangled in story. The lesson wasn't to stop telling stories. It was to state the answer first, then tell the story underneath for the humans who stay.

The rule underneath all of it

Once we stopped looking at format labels and looked at what the cited pages physically contained, the whole thing collapsed into one rule: an assistant cites a passage it can lift, verify and attribute without needing the rest of the page. Every winning format is just a different way of producing that passage. Every losing format fails to produce it, or buries it.

A liftable passage has four properties. It is short - two to four sentences. It is self-contained - it makes sense with nothing above or below it. It is factual or specific - a claim with a number, a definition, a clear comparison, not a mood. And it is attributable - ideally something only your page says, so the model must name you. Original data hits all four at once, which is why it wins. A well-structured definition hits three. A keyword-stuffed listicle hits none.
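The four properties can be turned into a rough self-check. The sketch below is a crude heuristic under stated assumptions, not how any assistant actually selects passages: sentence splitting is naive, "specific" is approximated as "contains a digit or opens with a definition-style sentence", and "attributable" cannot be detected from the text at all, so the caller supplies it. The function and pattern names are hypothetical.

```python
import re

# Phrases that make a passage depend on text outside itself (illustrative list).
CONTEXT_DEPENDENT = re.compile(
    r"\b(?:mentioned|discussed|noted|described|see) (?:above|earlier|below)\b",
    re.IGNORECASE,
)
# A first sentence shaped like "X is ..." / "X means ..." (rough definition test).
DEFINITION_OPENING = re.compile(r"^[A-Z][\w\s-]{0,60}\b(?:is|are|means)\b")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def liftability(passage, unique_to_page=False):
    """Return the four properties as booleans for one candidate passage."""
    sentences = split_sentences(passage)
    first = sentences[0] if sentences else ""
    return {
        "short": 2 <= len(sentences) <= 4,
        "self_contained": CONTEXT_DEPENDENT.search(passage) is None,
        "specific": bool(re.search(r"\d", passage))
        or DEFINITION_OPENING.search(first) is not None,
        "attributable": unique_to_page,  # judgement call, supplied by the caller
    }


definition = (
    "AEO is optimising content so AI assistants can lift and cite a clean answer. "
    "It differs from SEO because it targets extraction, not ranking."
)
buried = (
    "In today's fast-evolving landscape, marketers face many challenges. "
    "To understand AEO, we must first look back at the history of search..."
)
dangling_step = "Next, do the thing we discussed above."

definition_score = liftability(definition)
buried_score = liftability(buried)
step_score = liftability(dangling_step)
```

Under these assumptions the definition passes short, self-contained and specific, which matches the "hits three" case above. A check like this is useful only for flagging pages to review by hand; it cannot judge whether a passage is true or worth quoting.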

Figure: Anatomy of a passage an LLM will cite. The liftable passage is:

  • Short: 2 - 4 sentences
  • Self-contained: needs no context
  • Specific: number / definition
  • Attributable: only your page says it

Cited - answer stated first: "AEO is optimising content so AI assistants can lift and cite a clean answer. It differs from SEO because it targets extraction, not ranking."

Ignored - answer buried: "In today's fast-evolving landscape, marketers face many challenges. To understand AEO, we must first look back at the history of search..."

Same topic, same facts - the only difference is whether the answer sits in a clean unit at the top or is buried under a build-up.

How to retrofit content you already have

You almost certainly do not need to start over. Most of the value in our own tracking came from restructuring pages that already ranked, not from writing new ones. The retrofit is fast:

  1. Pull your top 20 pages by traffic and ask, for each, whether an assistant could lift a clean answer from the first screen. If the answer is buried, that page is leaking citations.
  2. Add a "short answer" block near the top of each - two to four sentences that state the page's core answer completely, before any context. This alone moved previously-ignored pages into the cited set for us.
  3. Convert any comparison into an actual table. If a page argues "A vs B" in prose, restructure it into rows and columns with honest values on both sides.
  4. Turn buried processes into numbered lists where each step is self-contained and states its own why.
  5. Add a genuine FAQ block using the real questions buyers ask - not invented ones - answered in two to five standalone sentences, with FAQPage schema on top.
  6. Publish one piece of original data you already have sitting in a spreadsheet. This is the single highest-leverage new asset most brands can ship this quarter.

None of this requires more words. Most of it requires fewer, arranged better. That is the whole reframe: content marketing in the AI era is an editing discipline as much as a writing one, and the technical structure underneath - clean headings, valid schema, crawlable text - is what lets machines find the good passage once you have written it.

What we'd tell our past selves

Three mistakes cost us the most time before the pattern was obvious.

We optimised for length when we should have optimised for extractability. The 3,000-word guide felt like the safe bet; it usually wasn't. A tight page with one original number beat it every time.

We treated schema as the citation lever. It isn't. Schema helps a machine parse a good answer, but it never rescued a page whose answer was buried or thin. Structure the passage first, then add markup to content that already deserves citing.

And we assumed ranking would carry us into AI answers. It doesn't reliably. We now treat "does this rank" and "can this be cited" as two separate questions with two separate checklists - which is the core distinction we unpack in SEO vs AEO vs GEO, and the reason we run answer-engine and AI SEO work as a discipline alongside classic SEO rather than assuming one delivers the other.

Tools, KPIs and how to keep score

You cannot manage what you do not measure, and AI citations do not appear in Search Console. Build the scoreboard yourself:

  • Citation rate. Of a fixed set of 20 to 30 buyer questions, in what share does an assistant name your brand or page? Track it monthly across ChatGPT, Perplexity and AI Overviews.
  • Share of voice. For those same questions, how often are you cited versus named competitors? This is the number that tells you whether you are winning or just present.
  • Passage-level signal. Log which specific passage gets quoted. Over time this tells you exactly what to write more of - it is the fastest feedback loop we have found.
  • Rank-vs-cite gap. Flag pages that rank but are never cited. Each one is a fast retrofit waiting to happen.
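The scoreboard above can be computed from a simple list of observations. This is a sketch under assumptions: the `Observation` record, its field names, the brand labels and the page paths are all hypothetical, and share of voice is counted here as our share of all brand mentions across the observed answers, which is one reasonable definition among several.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One answer from one assistant to one tracked question (hypothetical)."""
    question: str
    assistant: str
    cited_brands: tuple   # every brand the answer named or cited
    our_page: str         # our page for this question
    our_page_cited: bool
    our_page_ranks: bool  # ranks in classic Google results
    quoted_passage: str = ""


def citation_rate(observations, brand):
    """Share of tracked questions where any assistant named the brand."""
    questions = {o.question for o in observations}
    cited = {o.question for o in observations if brand in o.cited_brands}
    return len(cited) / len(questions) if questions else 0.0


def share_of_voice(observations, brand):
    """The brand's share of all brand mentions across the observed answers."""
    counts = Counter(b for o in observations for b in o.cited_brands)
    total = sum(counts.values())
    return counts[brand] / total if total else 0.0


def quoted_passages(observations):
    """Passage-level signal: how often each passage was quoted."""
    return Counter(o.quoted_passage for o in observations if o.quoted_passage)


def rank_vs_cite_gap(observations):
    """Pages that rank somewhere but were never cited anywhere."""
    ranking = {o.our_page for o in observations if o.our_page_ranks}
    cited = {o.our_page for o in observations if o.our_page_cited}
    return sorted(ranking - cited)


observations = [
    Observation("q1", "ChatGPT", ("us", "rival"), "/guide", True, True, "Passage A"),
    Observation("q1", "Perplexity", ("us",), "/guide", True, True, "Passage A"),
    Observation("q2", "Perplexity", ("rival",), "/pricing", False, True),
    Observation("q3", "AI Overviews", (), "/faq", False, False),
]

rate = citation_rate(observations, "us")      # cited on 1 of 3 questions
voice = share_of_voice(observations, "us")    # 2 of 4 brand mentions
passages = quoted_passages(observations)
gap = rank_vs_cite_gap(observations)          # pages to retrofit first
```

Run monthly over the same fixed question set, these four numbers are directly comparable from one period to the next, which is the point of fixing the list in the first place.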

The mechanics of building this are in our AI citation tracking method, and the wider strategic picture - why brands rank yet stay invisible in assistants - is in the AI search gap. If you want the cluster-level view of how these citable pages fit together into topical authority, we covered that in content silos that rank and get cited by AI, and the tactical, engine-by-engine detail lives in how to rank on ChatGPT and how to rank on Perplexity.

The bottom line

After nine months of watching which of our pages got cited and which got ignored, the lesson is smaller and more useful than we expected. LLMs do not cite topics, authors or word counts. They cite passages - short, self-contained, specific, attributable units they can lift and stand behind. Six formats produce those passages naturally: original data, direct-answer definitions, comparison tables, honest buying guides, numbered processes and genuine FAQs. Five formats reliably don't, no matter how much traffic they pull. The work is not to write more. It is to make sure every page you already have contains at least one passage worth quoting - and, ideally, one number only you can provide.

If you want a clear read on which of your pages are getting cited, which rank but stay invisible, and which formats to build next, that diagnosis is exactly what our team runs. Talk to us about an AI-era content and citation audit. It is the same process behind our AI SEO services and the wider SEO programmes we run for brands that want to be the answer, not just a result.

Aditya Kathotia

Founder & CEO

CEO of Nico Digital and founder of Digital Polo, Aditya Kathotia is a trailblazer in digital marketing. He's powered 500+ brands through transformative strategies, enabling clients worldwide to grow revenue exponentially. Aditya's work has been featured on Entrepreneur, Economic Times, HubSpot, Business.com, Clutch, and more.
