Why the Same Page Gets Cited 18% on ChatGPT and 0% on Perplexity: The Citation Asymmetry Problem

This post represents my personal views and not those of Profound.

AI visibility is not one number, and the dashboards selling you a single AEO score are flattening a problem that is structurally different on every engine. I’ve spent the last two years inside this category, and the question I get more than any other is some version of: why does this page rank #1 on ChatGPT and not appear at all on Perplexity? The answer isn’t a bug, a freshness issue, or a content-quality gap. It’s that the eight engines we track at Profound are not variants of the same search index. They are different retrieval systems with different rewriting behavior, different source taste, and different verification heuristics. Treating them as one funnel is the most expensive mistake mid-market AEO programs make.

This essay is the long version of an argument I keep making in customer calls: citation share is the new market share, but only per engine. Roll it up and you’ve built the GDP of your AEO program: interesting at the macro level, useless as a strategy input.

TL;DR: five reasons engines disagree on the same page

  • Per-engine retrieval differs before ranking: ChatGPT fans a single prompt out into queries that are 91% unique; Perplexity keeps 88% of the original prompt words. They’re not searching the same thing.
  • Popularity taste varies by engine: ChatGPT cites 100K+ view YouTube videos 18.2% of the time; Gemini reaches into the <1K view long tail 28.6% of the time.
  • ChatGPT triangulates with co-citation pairs: Edmunds and KBB get cited together in 32% of car queries; Glassdoor and Indeed in 29% of career queries. Other engines don’t pair this way.
  • Language and geography widen the gap: Cross-lingual Jaccard overlap of cited hostnames drops to 0.15 on Gemini. 34% of cross-language prompt pairs produce zero hostname overlap.
  • Source-side formatting has a low ceiling: Profound’s Markdown vs HTML A/B across 381 pages produced ~16% lift in bot visits, not statistically significant.

If you read nothing else, internalize this: measure each engine separately, pick two to win on, and accept that the third will look worse for structural reasons you can’t write your way out of.

The premise everyone gets wrong: “AI visibility” is not one number

The industry default (the one most AEO dashboards quietly enforce) is that you sum citations across engines, divide by something, and call it a visibility score. It’s a tidy number. It’s also misleading in the same way that averaging your performance across Google, Bing, and DuckDuckGo would have been misleading in 2015: it hides the only signal that matters, which is where your customers actually are and whether you show up there.

Profound’s own data shows that ChatGPT’s AI search results overlap with Google’s traditional search by about 12%. That’s not a small delta. That’s a different internet. When the cited-source overlap between two engines is that low, any unified metric you build on top of them is averaging two distributions that don’t share a tail. You end up with a number that moves smoothly while the underlying reality (which engine is sending you traffic, which one is repeating a competitor’s framing of your category, which one your buyers actually use) moves in directions the score can’t see.

I want to be precise about the claim. I’m not saying engines are random relative to each other. They cluster: Wikipedia anchors most of them, Reddit shows up everywhere, and a handful of vertical authorities recur. The point is that the next layer down (which specific page, which specific YouTube video, which specific LinkedIn post) diverges so sharply that an aggregate misses the entire optimization surface. The work happens at the per-engine level, or it doesn’t happen at all.

The asymmetry starts before retrieval: engines rewrite your prompt differently

Here’s the part most people miss: by the time an AI engine fetches a single web page, it has already decided what to search for, and that decision is engine-specific. The user types one prompt. The engine rewrites it (sometimes into one query, sometimes into a dozen) and the rewrite logic is wildly different across systems. This is upstream of every content optimization you can do.

Profound ran 10,000 prompts across ChatGPT, Copilot, and Perplexity over two weeks in March and April 2026 to measure exactly this. The results don’t look like minor variance; they look like three different products that happen to share a UI metaphor.

| Engine | % Unique fanout queries | % Original prompt words kept | Retrieval behavior |
|---|---|---|---|
| ChatGPT | 91% | 13% | Reformulates and expands; casts wide net |
| Copilot | 47% | 50% | Compresses to shorter strings, adds adjectives |
| Perplexity | 14% | 88% | Near-literal; behaves like classic search |
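The exact definitions behind these two columns aren't spelled out here, so treat the following as a minimal sketch under stated assumptions: "words kept" is taken to be the fraction of distinct prompt words that reappear in a rewritten query, and a fanout query counts as "unique" if it is neither a normalized copy of the prompt nor a repeat of an earlier query. The function names and the sample prompt are hypothetical.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())

def words_kept(prompt: str, query: str) -> float:
    """Fraction of distinct prompt words that survive into the rewritten query."""
    prompt_words = set(tokens(prompt))
    if not prompt_words:
        return 0.0
    return len(prompt_words & set(tokens(query))) / len(prompt_words)

def unique_query_share(prompt: str, queries: list[str]) -> float:
    """Fraction of fanout queries that are neither a normalized copy of the
    prompt nor a duplicate of an earlier query (an assumed definition)."""
    seen = {" ".join(tokens(prompt))}
    unique = 0
    for q in queries:
        norm = " ".join(tokens(q))
        if norm not in seen:
            unique += 1
            seen.add(norm)
    return unique / len(queries) if queries else 0.0

# Made-up example: one literal query and two reformulations.
prompt = "best CRM for a small sales team"
queries = [
    "best CRM for a small sales team",
    "top rated CRM software small business 2026",
    "CRM pricing comparison",
]
print(words_kept(prompt, queries[0]))        # 1.0 (literal, Perplexity-style)
print(words_kept(prompt, queries[1]))        # 2 of 7 prompt words kept, about 0.29
print(unique_query_share(prompt, queries))   # 2 of 3 queries are new, about 0.67
```

Run per engine over the same prompt set, numbers like these make the rewrite layer comparable before any page is fetched.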

Now layer in what each engine changes when it rewrites. Copilot adds adjectives in 47% of rewrites and reshapes the user’s constraints in 24%. Perplexity reshapes constraints in 40% of cases. ChatGPT only adds adjectives in 22% of rewrites but expands the prompt in other ways. Every engine touches brand mentions to some degree: ChatGPT in 11% of rewrites, Copilot and Perplexity in 22 to 24%. The retrieval query rarely matches what the user typed, and the ways it differs are not consistent across engines.

Think about what this means if you’re optimizing a page. You can write the cleanest, best-structured answer to the literal question your customer would ask. ChatGPT will go searching for a reworded, expanded version of that question and may find a page that answers the expansion, not the original. Perplexity will search for something close to your customer’s exact wording. Copilot will search for a compressed, possibly adjective-loaded version. Three different engines, three different retrieval sets, before any quality signal on your content gets evaluated.

The asymmetry isn’t in the ranking. It starts before ranking exists.

📊 The retrieval-side takeaway: If your category gets long, descriptive prompts (“best CRM for a 50-person sales team selling to mid-market healthcare”), Copilot’s aggressive adjective layering changes which pages it sees. If your category gets brand-direct prompts, the rewrite behavior collapses and the engines start to converge. Your prompt mix determines how much the rewrite layer matters to you.

The cleanest illustration of engine-level taste is YouTube, because both ChatGPT and Gemini cite the same source domain, and they disagree on what to surface from it. Profound compared 10,250 YouTube citations from each engine between late December 2025 and late January 2026. Same domain, same time window. Completely different selection criteria.

| Engine | % citing 100K+ view videos | % citing <1K view videos | % citing 1K to 100K view middle |
|---|---|---|---|
| ChatGPT | 18.2% | 20.3% | 61.5% |
| Gemini | 12.0% | 28.6% | 59.4% |

ChatGPT skews toward popularity. Gemini over-indexes on long-tail, niche content. This is the kind of asymmetry that no “AI visibility score” can survive. Imagine you’ve published a thoughtful 600-view YouTube tutorial on a specific niche workflow. On Gemini, that video is part of the eligible long tail and stands a real chance of getting pulled into responses. On ChatGPT, you’re competing against 100K+ view videos that the model treats as more credible-by-popularity. The same asset is differentially valuable depending on which engine your audience lives in.

A unified visibility number tells you none of this. A per-engine breakdown tells you to make different YouTube bets for different audiences.

The reason this matters beyond YouTube is that the same logic almost certainly applies to other domains where engines have access to engagement signals. Reddit thread upvotes, LinkedIn post reactions, news article share counts: wherever popularity is observable, engines make different decisions about how much to weight it. ChatGPT seems to use popularity as a proxy for credibility more aggressively than Gemini does. That’s not a fact about your content. It’s a fact about the engine’s training and ranking, and you can’t write your way out of it. You can only choose which game you’re playing.

Co-citation patterns: ChatGPT triangulates, others don’t

The other reason the same page gets very different fates on different engines is that ChatGPT validates answers by pairing sources, and most other engines don’t, at least not as visibly. Across 700,000+ U.S. English ChatGPT conversations between October and December 2025, Profound found that about 18% of conversations trigger at least one web search, and the cited sources cluster in distinctly vertical pairs.

| Vertical | Co-cited pair | Co-citation rate |
|---|---|---|
| Auto research | Edmunds + KBB | 32% |
| Career research | Glassdoor + Indeed | 29% |
| Real estate | Redfin + Zillow | 28% |
| Travel | Kayak + Expedia | 21% |
| News | AP + Reuters | 15% |

Wikipedia sits underneath all of it as the default knowledge layer, appearing in roughly 1 in 6 cited conversations. ChatGPT is doing something close to source triangulation: when it cites one vertical authority, it tends to cite the closest competitor alongside it for verification.
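A co-citation rate like the ones in the table can be approximated by counting how often two domains appear in the same conversation. This is a minimal sketch, assuming the rate is the share of a vertical's conversations in which both domains are cited; the denominator actually used in the analysis isn't stated here, and the function name and sample conversations are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def co_citation_rates(conversations: list[set[str]]) -> dict[tuple[str, str], float]:
    """Share of conversations in which each pair of domains is cited together."""
    pair_counts: Counter = Counter()
    for cited in conversations:
        # Sort so each pair is counted under one canonical key.
        for pair in combinations(sorted(cited), 2):
            pair_counts[pair] += 1
    total = len(conversations)
    if total == 0:
        return {}
    return {pair: n / total for pair, n in pair_counts.items()}

# Made-up auto-research conversations, each reduced to its set of cited domains.
conversations = [
    {"edmunds.com", "kbb.com"},
    {"edmunds.com", "kbb.com", "wikipedia.org"},
    {"caranddriver.com"},
    {"kbb.com"},
]
rates = co_citation_rates(conversations)
print(rates[("edmunds.com", "kbb.com")])  # 0.5: cited together in 2 of 4 conversations
```

Comparing a pair's rate against each domain's solo citation rate is one way to check whether the pairing is stronger than chance.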

🔍 The strategic consequence: On ChatGPT, your citation outcomes are partially coupled to your nearest competitor’s citation outcomes. If you’re a Zillow competitor and Zillow is being cited, you have a structural opportunity to ride along as the co-citation pair. If you’re not the obvious second name in your vertical, ChatGPT is harder to break into than a single-source engine would be. The model is looking for a pair, and the pair slot is contested. On engines without this triangulation behavior, the dynamics are simpler: be the single best answer for the query and you get pulled in.

The other piece of this is the well-known earned-media skew. Across 27 million answer-engine prompts and responses Profound has analyzed, Tier-1 publishers (Forbes, AP, Bloomberg, the prestige outlets) account for only 2.6% of citations. The other 97.4% comes from everything else: vertical specialty sites, Reddit threads, LinkedIn posts, niche YouTube channels, mid-tier blogs. When a brand-specific prompt enters the system, social citations roughly triple (rising from 5.4% to 15%), and Reddit alone takes 8% of citations on brand prompts. The takeaway competitors often miss is that the Tier-1 PR play, which costs the most and is hardest to influence, is the smallest slice of the pie. The 97.4% slice is where ChatGPT’s co-citation pairing actually plays out, and it’s where AEO programs have leverage. For a deeper read on which tools instrument this layer, see my breakdown of the best AEO tools ranked by citation share.

Language and geography make the gap wider, not narrower

If you think the cross-engine gap is bad in English, watch what happens when the prompt shifts language. Profound ran 300 matched English-Spanish prompt pairs across ChatGPT and Gemini over fifteen days in April 2026 and measured Jaccard similarity of cited hostnames.

| Language pair | Jaccard overlap (ChatGPT) | Jaccard overlap (Gemini) |
|---|---|---|
| English to Spanish (matched prompts) | 0.33 to 0.34 | 0.15 to 0.17 |
| US English to Spain English | ~0.40 to 0.50 | ~0.30 to 0.35 |
| English to Spanish, zero-overlap rate | ~12% of pairs | 34% of pairs |

In Gemini specifically, 34% of cross-language prompt pairs produced zero hostname overlap. The English version and the Spanish version of the same question cited entirely different sets of websites. Language drives more divergence than geography. English-vs-Spanish gaps were larger than US-vs-Spain gaps within the same language. That’s a structural fact about how each engine handles multilingual retrieval, and it lands directly on the desk of any global brand running AEO across markets.
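For readers who haven't worked with it, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to cited hostnames for one matched prompt pair. It assumes hostnames are compared after stripping a leading "www." (the normalization used in the actual study isn't specified), and the URLs are placeholders on reserved example domains.

```python
from urllib.parse import urlparse

def hostnames(urls: list[str]) -> set[str]:
    """Set of cited hostnames, lowercased, with a leading 'www.' removed."""
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname  # None when the URL has no host part
        if host:
            hosts.add(host.removeprefix("www."))
    return hosts

def jaccard(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def zero_overlap_rate(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Share of prompt pairs whose cited hostnames do not overlap at all."""
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if not (a & b)) / len(pairs)

# Placeholder citations for the English and Spanish versions of one prompt.
english = hostnames([
    "https://www.example.com/guide",
    "https://docs.example.org/x",
    "https://forum.example.net/t/1",
])
spanish = hostnames([
    "https://example.com/es/guia",
    "https://es.example.org/x",
    "https://noticias.example.net/y",
])
print(jaccard(english, spanish))  # 0.2: one shared hostname out of five
```

Note that at the hostname level, language subdomains such as docs.example.org and es.example.org count as different sources, which is one reason cross-language overlap can look lower than a domain-level comparison would suggest.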

The language gap also shows up in citation rates against an English baseline. In Google AI Overviews, where the English (US/UK/AU) social citation baseline sits at 15.5%, Brazilian Portuguese over-indexes by roughly 12 percentage points and Mexican and Spain Spanish by about 5. Italian and German underperform English by 2 to 3 points, French by 5, Japanese by 7, Arabic by 9, Swedish by 12. In ChatGPT, no non-English market exceeds the English baseline at all. Every language underperforms, with Portuguese and Spanish coming closest and Swedish and Arabic falling 6 to 7 points behind.

What this means in practice: if you’re a global brand reading a single AI visibility score, you’re seeing an average of markets that are structurally over-cited and structurally under-cited, on engines that disagree with each other about which hostnames are even eligible. The right unit of analysis is engine × language × market, not a global number. I’ve seen mid-market customers waste a quarter trying to “raise their AEO score” before realizing that their Mexican market was already at or above parity on Google AI Overviews and their Swedish market was facing a 12-point structural headwind no amount of content work would close quickly.

Why the obvious fix (write better content) doesn’t close the gap

The instinctive response to all of this is: fine, then we write better content. Cleaner structure, fresher data, schema everywhere, Markdown for the crawlers. That instinct isn’t wrong, but it has a ceiling lower than most AEO consultants will admit.

Profound A/B tested serving Markdown versus HTML to LLM crawlers across 381 pages over three weeks. Markdown produced about 16% higher mean bot visits, but the result was not statistically significant. The lift that did exist concentrated in pages already at the 60th percentile of traffic or above; the median page gained roughly one extra bot visit. Serving LLMs a cleaner format did not, on its own, produce a reliable lift in bot visits, which is the precondition for any citation lift. That’s a finding I’d have bet against before the test ran, and it’s the single best illustration of the limit on source-side optimization.
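It can seem odd that a 16% lift in the mean fails a significance test. With heavy-tailed traffic, a few high-volume pages dominate the mean, so a gap that size is easy to produce by chance. The sketch below shows the effect with a two-sided permutation test on the difference in means. It is not the test used in the experiment (that isn't specified here), and the visit counts are invented toy numbers chosen only to give a similar mean lift.

```python
import random
from statistics import mean, median

def permutation_p_value(control, treatment, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in means."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = abs(mean(treatment) - mean(control))
    pooled = list(control) + list(treatment)
    n_control = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Invented bot-visit counts per page: mostly small, with a heavy tail.
html_visits = [1, 2, 2, 3, 3, 4, 5, 8, 40, 120]
markdown_visits = [1, 2, 3, 3, 4, 4, 6, 9, 45, 141]

lift = mean(markdown_visits) / mean(html_visits) - 1
p_value = permutation_p_value(html_visits, markdown_visits)
print(f"mean lift: {lift:.0%}")  # about 16%
print(f"median: {median(html_visits)} vs {median(markdown_visits)}")
print(f"p-value: {p_value:.2f}")  # well above 0.05 for this toy data
```

Two pages in the tail account for most of the gap, which is why the mean and the median tell different stories.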

The asymmetry between engines isn’t primarily driven by how your content is formatted. It’s driven by upstream retrieval choices (prompt rewriting, source weighting, co-citation logic, language handling) that you don’t control. You can write the cleanest answer in your category and still be invisible on Perplexity because Perplexity issued a literal version of the user’s prompt and your page didn’t match those exact strings; meanwhile, ChatGPT expanded the prompt into something your page does answer and pulled you in at #1. Same content, two outcomes, and the variable is upstream of you.

There are still real moves on the content side that change which engines you show up on. LinkedIn is the cleanest recent example. Between November 2025 and February 2026, LinkedIn’s citation rank on ChatGPT moved from #11 to #5, and the mix shifted: profile-page citations fell from 33.9% to 14.5% while published-post citations climbed from 20.9% to 26%, and long-form article citations rose from 6% to 8.9%. Combined published content went from 26.9% to 34.9% of LinkedIn’s citation share. That’s not a passive shift. Publishing posts and articles on LinkedIn changed what ChatGPT cited from the domain. It’s a real lever, but notice that the lever moves citations on ChatGPT specifically. The same publishing strategy isn’t guaranteed to move the needle the same way on Perplexity or Gemini. Even your wins are engine-specific.

The brand-prompt vs open-ended split changes which engine matters

One more layer, and it’s the one most mid-market AEO programs underweight: the query type mix in your category determines which engines actually matter. Brand-direct prompts and open-ended prompts trigger completely different engine behaviors, and the gap is large enough to flip your strategy.

Profound’s analysis of 3,380 prompts on ChatGPT 5.4 in March 2026 found that 40% of brand-direct prompts include at least one site: query directed at the brand’s domain, and 71% of those site: queries appear within the first two searches. Open-ended prompts trigger site: queries only 16% of the time, and when they do, only 42% appear in the first two searches. When the user names your brand, ChatGPT goes straight to your site early in the retrieval process. When the user doesn’t, ChatGPT roams.
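Both measurements in that paragraph (whether a prompt triggers a site: query on the brand's domain, and whether it lands in the first two searches) are straightforward to compute once you have the ordered list of searches an engine issued. A minimal sketch, assuming you already log those queries per prompt; the function names and the acme.example domain are hypothetical.

```python
import re

# Matches the site: search operator and captures its target.
SITE_OPERATOR = re.compile(r"\bsite:(\S+)", re.IGNORECASE)

def site_query_positions(queries: list[str], brand_domain: str) -> list[int]:
    """1-based positions of searches that use site: on the brand's domain
    or one of its subdomains."""
    positions = []
    for position, query in enumerate(queries, start=1):
        for target in SITE_OPERATOR.findall(query):
            host = target.lower().removeprefix("www.").split("/")[0]
            if host == brand_domain or host.endswith("." + brand_domain):
                positions.append(position)
                break
    return positions

def summarize(queries: list[str], brand_domain: str) -> dict[str, bool]:
    """Flags matching the two measurements described above."""
    positions = site_query_positions(queries, brand_domain)
    return {
        "has_site_query": bool(positions),
        "in_first_two": any(p <= 2 for p in positions),
    }

# Made-up search sequences for a brand-direct and an open-ended prompt.
brand_direct = ["acme pricing", "site:acme.example pricing plans", "acme reviews"]
open_ended = ["best crm small team", "crm comparison", "crm api site:docs.acme.example"]

print(summarize(brand_direct, "acme.example"))  # {'has_site_query': True, 'in_first_two': True}
print(summarize(open_ended, "acme.example"))    # {'has_site_query': True, 'in_first_two': False}
```

Aggregating these flags by prompt type gives the brand-direct versus open-ended comparison for your own domain.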

Shopping behavior shows the same split even more starkly. Across roughly 2 million prompts on ChatGPT from September 2025 through January 2026, open-ended prompts triggered the Shopping carousel 12.1% of the time, while brand-direct prompts triggered it only 3.1%, a 4x gap. And whether Shopping triggers at all is almost entirely a function of product category:

  • Apparel: 5.2x baseline Shopping trigger rate
  • Physical products: 4.7x baseline
  • Consumable grocery: 2.5x baseline
  • Health and medical: 0.5x baseline
  • Travel: 0.05x baseline
  • Service-professional: 0.04x baseline
  • Software/SaaS: effectively zero
  • Financial products: literally zero

A simple category-only checklist reproduced ChatGPT Shopping trigger behavior with 95 to 97% accuracy.

The strategic implication is that your category and your query mix decide which engine surfaces actually matter for you. If you sell apparel and most of your queries are open-ended (“good waterproof jacket for backpacking”), ChatGPT Shopping is a real channel and the buy-link carousel is a real battleground; Walmart and Target alone capture 14% of all offer links there. If you sell B2B SaaS, ChatGPT Shopping is a non-event regardless of what you do, and your AEO program lives entirely on the citation side of the surface. Same engine. Completely different strategic posture depending on where your prompts land.

This is why I keep telling customers: don’t ask “how do I rank on ChatGPT.” Ask “how do I rank on the specific surface within ChatGPT that my prompts trigger,” and then ask the same question for every other engine in your stack. The surface is the unit, not the engine.

What I’d do with this if I were running AEO at a mid-market brand

💡 If I were running an AEO program at a mid-market company today, here’s the order I’d work in:

  1. Map the prompt mix first. Not the prompts I want my customers to ask, but the prompts they actually ask. Brand-direct, open-ended, comparative, transactional. The mix determines which engines and which surfaces within those engines are doing the work. Profound’s Prompt Volumes dataset exists specifically for this, and it’s the step most programs skip because it feels like research rather than action. It’s the only step that makes the later actions non-random.

  2. Pick two engines to optimize for and accept that the third will look worse. Mid-market AEO programs don’t have the resources to win on eight surfaces, and the asymmetry between engines means optimizing for all of them simultaneously usually produces no real movement on any of them. Pick based on your prompt mix. If your category is dominated by open-ended commercial intent in a Shopping-eligible product class, ChatGPT and Google AI Overviews are likely your two. If your category is professional-services research, ChatGPT and Perplexity make more sense. Be honest about the tradeoff.

  3. Measure per-engine citation share separately and ban any internal report that averages them. The averaged number masks the only signal that matters: are you gaining or losing on the specific engine where your customers are. If your ChatGPT citation share is climbing and your Perplexity share is flat, that’s not “AI visibility up.” That’s two different stories that need two different decisions. The tooling side of this lives in posts like best AI visibility optimization platforms.

  4. Treat content as engine-specific, not engine-agnostic. Publishing on LinkedIn moves the needle on ChatGPT. Reddit thread participation moves the needle wherever Reddit gets cited (which is mostly ChatGPT and Google AI Overviews, not Perplexity). YouTube long-tail content has different value on Gemini than on ChatGPT. The format-choice decisions cascade from the engine-choice decisions, not the other way around.

  5. Accept that some of this is structural and won’t change in a quarter. If you’re operating in Swedish and Google AI Overviews under-cites Swedish content by 12 points against the English baseline, no amount of content work fixes that within the calendar year. You optimize within the constraint and report against the engine-specific baseline, not against a fantasy of parity.
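Step 3 above is easy to enforce in code: compute citation share per engine and never collapse it. The sketch below assumes "citation share" means the fraction of an engine's responses that cite your domain, which is one reasonable definition among several. The function name, the domain, and the sample responses are made up.

```python
from collections import defaultdict

def citation_share_by_engine(responses, domain: str) -> dict[str, float]:
    """Fraction of each engine's responses that cite `domain`.

    `responses` is an iterable of (engine, cited_domains) tuples.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for engine, cited in responses:
        totals[engine] += 1
        if domain in cited:
            hits[engine] += 1
    return {engine: hits[engine] / totals[engine] for engine in totals}

# Made-up tracked responses for one prompt set.
responses = [
    ("chatgpt", {"acme.example", "wikipedia.org"}),
    ("chatgpt", {"acme.example"}),
    ("chatgpt", {"reddit.com"}),
    ("chatgpt", {"rival.example"}),
    ("perplexity", {"rival.example"}),
    ("perplexity", {"reddit.com"}),
    ("perplexity", {"wikipedia.org"}),
    ("perplexity", {"rival.example", "reddit.com"}),
]

shares = citation_share_by_engine(responses, "acme.example")
blended = sum(1 for _, cited in responses if "acme.example" in cited) / len(responses)
print(shares)   # {'chatgpt': 0.5, 'perplexity': 0.0}
print(blended)  # 0.25, a number that describes neither engine
```

The blended 0.25 is the kind of figure the averaged report would show; the per-engine dictionary is what a decision actually needs.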

Citation share is the new market share, but only per engine

The phrase I keep coming back to is citation share is the new market share. I still believe it. But I want to add the qualifier the dashboards quietly drop: only per engine. An aggregate AI visibility number is the GDP of your AEO program: a useful headline figure for a board slide, an actively misleading input for strategic decisions. Treat it the way you’d treat “global revenue” without a market breakdown: a summary, not a plan.

The eight engines we track at Profound are not eight versions of the same product. They’re eight different retrieval systems that happen to share a chat interface. They rewrite prompts differently. They weight popularity differently. They co-cite differently. They handle language differently. They surface Shopping differently. They pull YouTube videos from different ends of the view-count distribution. The same page is a top-cited source on one and invisible on another, and the reasons are upstream of anything your content team can change in a sprint.

If your brand can be summarized, it can be cited, but it will be cited differently in eight different places, and your job as an operator is to know which of those places matters to your buyers and to measure each one on its own terms. The surface is the unit, not the engine. The asymmetry isn’t a problem to be solved. It’s the shape of the surface you’re optimizing for. The sooner an AEO program internalizes that, the sooner it stops chasing a unified score and starts winning on the engines that actually convert.

Frequently Asked Questions

Why does the same page get cited on ChatGPT but not Perplexity?

The two engines use different retrieval architectures. ChatGPT generates 91% unique fanout queries from a single prompt, casting a wide net. Perplexity keeps 88% of the original prompt words and runs roughly one query per prompt. They’re searching different things, so they find different things. A page that answers the expanded version of a prompt shows up on ChatGPT; a page that matches the literal prompt wording shows up on Perplexity.

Should I track AI visibility as one combined score?

No. A unified score averages across engines whose retrieval behavior, source preferences, and prompt rewriting patterns are structurally different. Per-engine citation share is the metric that matches the underlying reality. Combining them produces a number that moves smoothly while the underlying engines move in directions the aggregate can’t see.

How different are AI engines from each other in practice?

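Profound’s cross-lingual analysis measured Jaccard overlap of cited hostnames between matched English and Spanish prompts within each engine: 0.33 to 0.34 on ChatGPT and 0.15 to 0.17 on Gemini. Even when the same engine is asked the same question in two languages, no more than about a third of the combined set of cited hostnames is shared, and the two engines diverge sharply in how much. On Gemini specifically, 34% of cross-language prompt pairs produce zero hostname overlap at all, against roughly 12% on ChatGPT.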

Does serving Markdown to LLM crawlers improve citations?

In Profound’s A/B test across 381 pages over three weeks, Markdown produced about 16% higher mean bot visits than HTML, but the result was not statistically significant. The lift that did appear concentrated in pages already in the top 40% of traffic; median pages barely moved. Source-side formatting tweaks have real limits because the asymmetry between engines is upstream of your content.
