Primary Research · EU E-Commerce · AI-Mediated Commerce

What happens when your customer
asks an AI what to buy —
and your brand doesn't appear?

Over the last year I tested how EU products are recommended across ChatGPT, Gemini, Perplexity, Claude and ChatGPT Shopping — the platforms increasingly shaping how customers decide what to buy.

Each test was repeated across multiple query types and user conditions to isolate how platforms evaluate products.

75+ systematic tests
38 EU products analysed
5 AI platforms: ChatGPT, Gemini, Perplexity, Claude, ChatGPT Shopping
The Threshold Trap · Quality Without Visibility · The Platform Contradiction · The Precision Paradox · The Language Divide · The Context Switch

"What if, in 18 months, our customers discover brands through AI agents — and we're invisible?"

This is not a future concern. AI platforms are already the first stop for millions of EU consumers making purchase decisions. What they recommend is determined by signals most brands have never measured.

Your paid acquisition playbook was built for a search-based world. The infrastructure for AI-mediated commerce already exists. ChatGPT, Gemini, and Perplexity are not coming; they are here. The question is what they say about your products when your customers ask.

I tested it. These are six things I found.

Six findings. All from primary research.

Each finding traces to a specific query, a specific platform, a specific result. I publish only what I can evidence directly from observed tests.

1
Fitness Equipment · Germany · Customer Acquisition

Four words in a product description cost this brand 72% of its AI-mediated customers

The product was the top AI recommendation for one type of buyer and explicitly listed as a product to avoid for another. The only thing that changed was a single phrase.

Every EU product with a performance specification — speed, capacity, weight limit, range — is exposed to this dynamic. AI platforms do not treat technical limits as capabilities. They treat them as risk signals. The product operating "at maximum" loses to the product with "headroom." Your product descriptions are making that judgement before a single customer asks.

General query: 74 (baseline visibility score)
Budget query, under €1,000: 86% platform coverage
Performance query, 20 km/h+: 14% platform coverage

I tested a €700–920 German-market treadmill with a listed maximum speed of 20 km/h. For budget queries it dominated — top result on ChatGPT Shopping, 7.3/10 average across platforms. For performance queries, coverage dropped to 14%. On Claude, the same product ranked #1 in the budget context appeared under "Models to avoid despite 20 km/h claims." Reason given: "Exactly at threshold. No headroom."

Zero product overlap between the budget and performance recommendation sets on the same platform. The buyers with the highest purchase intent — those with a specific performance requirement — were being actively redirected to competitors.

The manufacturer had no visibility into this. Good reviews, solid distribution, competitive price. Every traditional metric said it was performing. The AI-mediated channel carrying its highest-intent buyers was routing them elsewhere.

The fix required no product change. "20 km/h maximum speed" becomes "20 km/h sustained running speed, 22 km/h peak capability." The signal changes from "at limit" to "has headroom." My model projects recovery from 14% to 70–75% platform coverage on performance queries from this language change alone.
Platform | Budget query | Performance query
ChatGPT Reasoning | 7/10 · Recommended | —
ChatGPT Shopping | 10/10 · Top result | 9/10 · One run only
Gemini | 9/10 · Near-top | —
Perplexity | 9/10 · Recommended | —
Claude (Incognito) | 7/10 · Recommended | —
Claude (Contextualised) | 9/10 · #1 pick | —

Tests conducted December 2024 – January 2026. ChatGPT Shopping tested across two independent runs. Claude tested in incognito and contextualised sessions.

2
Baby Care · Germany · Market Share

Certified. Stocked in 900 stores. Invisible to every AI platform I tested.

The product had a top-tier certification, wide distribution, and a competitive price. AI platforms recommended everything around it — not because it was worse, but because its signals were weaker.

This is the finding that matters most to brand and category managers measuring success through retail KPIs — distribution, shelf space, certification — while AI visibility is being determined by an entirely different set of criteria. Quality and distribution are necessary. They are no longer sufficient.

AI visibility, baseline: 0% (all platforms, standard query)
Query robustness: ~10% (visible in roughly 1 in 10 queries)
ÖKO-TEST rating: "Sehr gut" (certified, dermatologically tested)

I tested a DM own-label baby face cream — "sehr gut" (very good) ÖKO-TEST rating, dermatologically tested, hypoallergenic, priced at €0.95–€2.45, available in over 900 DM stores. Under "best baby cream for sensitive skin in Germany", it appeared on none of the five platforms I tested. Not in position 10. Not mentioned in passing. Not there.

I compared it against alverde — another DM own-label product, same retailer, similar price. alverde appeared consistently at around 60% query robustness. The difference was not brand type. alverde holds a top-5 ÖKO-TEST ranking, an "ÖKO-TEST winner" badge on dm.de, and standalone editorial coverage. The product I tested had a mid-tier "sehr gut" and a standard listing.

Same retailer. Same price tier. Opposite outcomes. AI platforms require a product to be top-tier in at least one signal dimension to cross the visibility threshold. Being "good" across several dimensions produces the same result as having no signals at all.
Weleda Calendula: ~70% query robustness
alverde Baby (DM own-label, ÖKO-TEST winner): ~60% query robustness
Babylove (DM own-label, ÖKO-TEST "sehr gut"): ~10% query robustness
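The query-robustness figures above are simply the share of tested query variants in which a product appears in the recommendation set. A minimal sketch of that calculation; the run data below is illustrative, not my actual test log:

```python
# Query robustness: fraction of query variants where a product was
# visible anywhere in the platform's recommendations.
# The boolean run lists below are illustrative examples only.

def query_robustness(appearances: list[bool]) -> float:
    """Share of query variants in which the product appeared."""
    return sum(appearances) / len(appearances) if appearances else 0.0

# Ten variants of "baby cream for sensitive skin"; True = appeared.
babylove_runs = [False, False, True, False, False,
                 False, False, False, False, False]
alverde_runs = [True, True, False, True, True,
                False, True, False, True, False]

print(f"Babylove: {query_robustness(babylove_runs):.0%}")  # 10%
print(f"alverde:  {query_robustness(alverde_runs):.0%}")   # 60%
```

The metric is deliberately coarse: it ignores rank position and only asks whether the product crosses the visibility threshold at all, which is exactly the threshold behaviour this finding describes.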

Any brand investing in product quality without a parallel strategy for the external signals AI platforms index — third-party editorial coverage, top-tier certification rankings, expert endorsement framing — is building something AI cannot see. Your customers using AI to decide what to buy cannot find you.

3
Baby Care · Germany · Brand Risk

Recommended by four platforms. Warned against by the one with better data.

Five platforms processed the same query. Four recommended the same brand in their top three results. One issued an active safety warning. All five were correct — based on the data each had access to.

For established brands with high market visibility, this is the most uncomfortable finding. The brands most at risk are not the ones with low AI coverage. They are the ones with high coverage and unresolved product quality signals. High visibility alongside an active safety warning on one authoritative platform is not a neutral position. It is a liability that compounds as AI adoption grows.

Penaten is one of Germany's most recognised baby care brands — 80–100% platform coverage, above 70% query robustness. By every standard metric, it performs well in AI-mediated channels. But Penaten Baby Pflegecreme received a "mangelhaft" (deficient) rating in ÖKO-TEST 2024/2025, with confirmed contamination markers: MOAH, paraffin, BHT, and microplastics. The issue had been documented since 2020. No reformulation followed.

Gemini issued an active safety warning in my tests. It cited the ÖKO-TEST results directly and recommended customers choose alternatives. I verified this against the published 2024/2025 results. Gemini was factually correct. The other four platforms recommended Penaten in their top three in the same query.
Platform | Penaten recommended? | Safety warning?
ChatGPT Shopping | Yes · Top 3 | Context-dependent
ChatGPT Reasoning | Yes · Recommended | —
Gemini | Visible with caveat | Yes · Active warning
Perplexity | Yes · Top results | —
Claude | Yes · Top 3 | —

Each platform draws on different source data and weights safety signals differently. No single approach is wrong. But they produce contradictory outputs for the same product, in the same market, on the same day. Knowing your aggregate coverage tells you nothing about what the most safety-conscious platform — the one your most risk-aware customers may trust most — is telling them.

You need to know what each platform says individually. A single platform with accurate safety data and an active warning is doing more damage to your brand than four platforms with positive recommendations can undo. And you will not see it unless you test.
4
Baby Care · Germany · New Customer Acquisition

Your product only exists for customers who already know it exists

The same product required four simultaneous conditions to become visible to AI. Removing any one of those four conditions returned zero results across every platform I tested.

The buyers in your addressable market are not equally informed. A small fraction already know your product, your retailer, your exact category term. The majority are still forming their preference. AI-mediated discovery is where that majority now goes first. If you only appear for buyers who already know you, AI is not a customer acquisition channel for you. It is a late-stage confirmation tool for existing customers.

Query tested | Conditions met | Product visible?
Best baby cream for sensitive skin in Germany | 2 | No
Best baby cream fragrance-free for sensitive skin in Germany | 3 | No
Best baby cream for sensitive skin at DM in Germany | 3 | No
Best baby face cream for sensitive skin at DM in Germany | 4 | Yes · 3 of 5 platforms

The four conditions: category precision (face cream, not just cream), use case (sensitive skin), retailer (DM), and geography (Germany). All four, simultaneously. The word "face" alone — one word — was the difference between visibility and invisibility.

"Face cream" and "care cream" route to different product databases on AI platforms. This is a category indexing problem, not a quality problem. And it is invisible without systematic testing across query variations.
Established products appear with 1–2 conditions. Fragile products need 4 or more. The gap between those numbers is your AI-mediated top-of-funnel exposure. It is the difference between AI working as a customer acquisition channel and AI working only for people who were already going to buy from you.
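Testing condition sensitivity systematically means running the full query plus every variant that drops one condition. A sketch of that test matrix; the condition-to-phrase mapping and query templates are illustrative, not the exact wording used in the tests:

```python
# Generate the full four-condition query and every three-condition
# variant (each dropping one condition). The phrase fragments are
# illustrative assumptions, not a platform API or my exact prompts.
from itertools import combinations

CONDITIONS = {
    "category precision": "face cream",  # vs. the generic "cream"
    "use case": "for sensitive skin",
    "retailer": "at DM",
    "geography": "in Germany",
}

def build_query(active: set[str]) -> str:
    """Assemble a query string from the active condition set."""
    category = CONDITIONS["category precision"] if "category precision" in active else "cream"
    parts = ["Best baby", category]
    for key in ("use case", "retailer", "geography"):
        if key in active:
            parts.append(CONDITIONS[key])
    return " ".join(parts)

full = set(CONDITIONS)
# The full query, then every drop-one variant.
queries = [build_query(full)] + [build_query(set(c)) for c in combinations(full, 3)]
for q in queries:
    print(q)
```

Running each generated variant against each platform and logging visibility per row is what makes a gap like "4 conditions required" observable at all; a single hand-written query cannot reveal it.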
5
Multi-Language Testing · EU Markets · Market Expansion

Visible in German. Invisible in French. Same product, same platform.

A brand can have strong AI visibility in German and be functionally invisible in French, Dutch, or English queries for the same product on the same platform. The platform is not translating your visibility — it rebuilds it from scratch in each language.

80%
Divergence in product recommendations between English and German queries on the same platform, same product category
5
EU language variants tested, each producing a distinct recommendation set with limited overlap

I tested the same product queries in German, English, and other EU language variants on the same AI platforms. Recommendation sets changed substantially across languages — up to 80% divergence in which products were recommended.

AI platforms draw on different training data, different editorial sources, and different trust signals depending on the language of the query. German testing bodies — ÖKO-TEST, Stiftung Warentest — carry significant weight in German-language results. They are largely absent from English-language outputs. The language of the query determines which signals the platform weights. The geography of the sale does not.

Multi-market EU brands have as many AI visibility profiles as they have language markets. Your German visibility score tells you nothing about what French or Dutch buyers find when they ask the same question about your products.
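The divergence figure above is the complement of the overlap between two languages' recommendation sets. A minimal sketch using Jaccard overlap; the product names are placeholders, not the tested brands:

```python
# Divergence between two recommendation sets, defined here as
# 1 - |A ∩ B| / |A ∪ B| (Jaccard distance).
# 0.0 = identical sets, 1.0 = no shared products.
# The product names below are illustrative placeholders.

def divergence(set_a: set[str], set_b: set[str]) -> float:
    """Jaccard distance between two recommendation sets."""
    union = set_a | set_b
    if not union:
        return 0.0
    return 1 - len(set_a & set_b) / len(union)

german = {"Product A", "Product B", "Product C"}
english = {"Product C", "Product D", "Product E"}

print(f"{divergence(german, english):.0%}")  # 80%
```

Computing this per language pair, per category, turns "the sets looked different" into a number that can be tracked over time and compared across platforms.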
Before entering a new EU language market, audit AI visibility in that language. The signals that make a product visible in one language corpus may not exist in another. Finding out after launch is significantly more expensive than finding out before.
6
Context Inheritance Effect · Session Testing · Claude · Governance

Same prompt. Different context. €1,185 apart.

The difference was not in the question. It was in what Claude already knew. The same prompt produced completely different recommendations depending on whether the session was logged-in or incognito — because one session carried personal context, and the other did not.

0%
Product overlap between logged-in session and incognito session recommendations. Same platform, same prompt, same day.
€1,185
Average price difference between the two recommendation sets. The logged-in session produced budget recommendations. Incognito produced premium ones.

Using the same prompt in Claude produced completely different recommendation sets depending on session context. In a logged-in session, Claude used prior knowledge about the user to infer budget sensitivity — the user was building a consulting business, and Claude inferred cost-consciousness from that context — and recommended more cost-conscious products. In incognito mode, without that context, it shifted towards premium recommendations.

The difference was not caused by the prompt. It was caused by session-level personal context. When asked directly, Claude confirmed this: the recommendations were shaped by what it knew about the user, not by what the user had asked.

AI systems may use inherited personal context to reshape recommendations even when the explicit prompt stays the same. A premium brand may be systematically suppressed for its most loyal, highest-intent customers — precisely because those customers are the most likely to be logged in and carrying the most context.

This finding opens three important conversations at once.

Methodology. Logged-in testing and clean-session testing are not interchangeable. A brand that tests only in one condition can draw the wrong conclusion about its visibility and recommendation behaviour. Both conditions must be tested to understand the full range of outcomes.

Commercial strategy. AI recommendations may increasingly depend not only on product signals, but on inferred buyer profile and economic assumptions. The same brand may be recommended differently to different people even when the explicit intent is identical.

Governance. If an AI system uses inferred personal context to shape product recommendations, questions of transparency, fairness, and explainability follow. A consumer who does not know their prior session history is shaping the products they are shown has no way to challenge that inference. Under EU frameworks, inferred economic profiling of this kind is not a neutral technical feature. It raises questions around transparency obligations, the right to explanation, and commercial influence over purchasing decisions.

Every AI visibility assessment needs both conditions tested. Clean-session results show what undeclared customers find. Logged-in results show what returning, high-intent buyers find. The gap between the two is your context inheritance exposure — and it is not visible without testing both.
Platform scope: This finding is specific to Claude. The other five findings are observed across multiple platforms.

About this research

All findings trace to documented platform tests on real EU products, conducted between December 2024 and January 2026. I test across query types, user contexts, platform conditions, and language variants. I publish only what I can evidence directly from observed results — not from platform documentation, not from projections.

What AI platforms claim to do and what they demonstrably do are not always the same thing. My methodology was built to find the difference.

5
AI platforms tested
38
EU products analysed
75+
Systematic tests conducted
4
Scoring pillars, P2V framework

What do AI platforms say about your brand when your customers ask what to buy?

I tested products I chose. The findings apply to any EU brand selling through channels where AI now influences the purchase decision. If you want to know your actual visibility profile — which platforms include you, which exclude you, and what is driving each outcome — that is exactly what my assessment delivers.

Book an Intro Strategy Call Explore Services

Maria Berrio · Lex Agentica · Munich, Germany