Over the last year I tested how EU products are recommended across ChatGPT, Gemini, Perplexity, Claude, and ChatGPT Shopping — the platforms increasingly shaping how customers decide what to buy.
Each test was repeated across multiple query types and user conditions to isolate how platforms evaluate products.
This is not a future concern. AI platforms are already the first stop for millions of EU consumers making purchase decisions. What they recommend is determined by signals most brands have never measured.
Your paid acquisition playbook was built for a search-based world, but the infrastructure for AI-mediated commerce is already in place. ChatGPT, Gemini, and Perplexity are not coming; they are here. The question is what they say about your products when your customers ask.
I tested it. Here are the six things I found.
Each finding traces to a specific query, a specific platform, a specific result. I publish only what I can evidence directly from observed tests.
The product was the top AI recommendation for one type of buyer and explicitly listed as a product to avoid for another. The only thing that changed was a single phrase.
Every EU product with a performance specification — speed, capacity, weight limit, range — is exposed to this dynamic. AI platforms do not treat technical limits as capabilities. They treat them as risk signals. The product operating "at maximum" loses to the product with "headroom." Your product descriptions are making that judgement before a single customer asks.
I tested a €700–920 German-market treadmill with a listed maximum speed of 20 km/h. For budget queries it dominated — top result on ChatGPT Shopping, 7.3/10 average across platforms. For performance queries, coverage dropped to 14%. On Claude, the same product ranked #1 in the budget context appeared under "Models to avoid despite 20 km/h claims." Reason given: "Exactly at threshold. No headroom."
The manufacturer had no visibility into this. Good reviews, solid distribution, competitive price. Every traditional metric said it was performing. The AI-mediated channel carrying its highest-intent buyers was routing them elsewhere.
| Platform | Budget query | Performance query |
|---|---|---|
| ChatGPT Reasoning | 7/10 · Recommended | 0/10 · Excluded |
| ChatGPT Shopping | 10/10 · Top result | 9/10 · One of two runs |
| Gemini | 9/10 · Near-top | 0/10 · Excluded |
| Perplexity | 9/10 · Recommended | 0/10 · Excluded |
| Claude (Incognito) | 7/10 · Recommended | 0/10 · Excluded |
| Claude (Contextualised) | 9/10 · #1 pick | 0/10 · Listed to avoid |
Tests conducted December 2024 – January 2026. ChatGPT Shopping tested across two independent runs. Claude tested in incognito and contextualised sessions.
The product had a top-tier certification, wide distribution, and a competitive price. AI platforms recommended everything around it — not because it was worse, but because its signals were weaker.
This is the finding that matters most to brand and category managers measuring success through retail KPIs — distribution, shelf space, certification — while AI visibility is being determined by an entirely different set of criteria. Quality and distribution are necessary. They are no longer sufficient.
I tested a DM own-label baby face cream — "sehr gut" (very good) ÖKO-TEST rating, dermatologically tested, hypoallergenic, priced at €0.95–€2.45, available in over 900 DM stores. Under "best baby cream for sensitive skin in Germany", it appeared on none of the five platforms I tested. Not in position 10. Not mentioned in passing. Not there.
I compared it against alverde — another DM own-label product, same retailer, similar price. alverde appeared consistently at around 60% query robustness. The difference was not brand type. alverde holds a top-5 ÖKO-TEST ranking, an "ÖKO-TEST winner" badge on dm.de, and standalone editorial coverage. The product I tested had a "sehr gut" grade without a top ranking, and a standard listing.
Any brand investing in product quality without a parallel strategy for the external signals AI platforms index — third-party editorial coverage, top-tier certification rankings, expert endorsement framing — is building something AI cannot see. Your customers using AI to decide what to buy cannot find you.
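The article reports "query robustness" figures without defining the metric. A minimal sketch of one plausible way to compute it, as the share of query variants whose recommendation set includes the product. The function name and the observed data below are hypothetical illustrations, not the actual test results:

```python
def query_robustness(product: str, results_by_query: dict[str, list[str]]) -> float:
    """Share of query variants whose recommendation set includes the product."""
    if not results_by_query:
        return 0.0
    hits = sum(product in recs for recs in results_by_query.values())
    return hits / len(results_by_query)

# Hypothetical observations for two query variants on one platform.
observed = {
    "beste Babycreme für empfindliche Haut": ["alverde", "Brand B", "Brand C"],
    "Babycreme Testsieger": ["Brand B", "alverde"],
}
print(query_robustness("alverde", observed))  # 1.0 — appears in both variants
```

A product at 60% robustness, in these terms, appears in roughly three of every five reasonable phrasings of the same buying intent.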
Five platforms processed the same query. Four recommended the same brand in their top three results. One issued an active safety warning. All five were correct — based on the data each had access to.
For established brands with high market visibility, this is the most uncomfortable finding. The brands most at risk are not the ones with low AI coverage. They are the ones with high coverage and unresolved product quality signals. High visibility alongside an active safety warning on one authoritative platform is not a neutral position. It is a liability that compounds as AI adoption grows.
Penaten is one of Germany's most recognised baby care brands — 80–100% platform coverage, above 70% query robustness. By every standard metric, it performs well in AI-mediated channels. But Penaten Baby Pflegecreme received a "mangelhaft" (deficient) rating in ÖKO-TEST 2024/2025, with confirmed contamination markers: MOAH, paraffin, BHT, and microplastics. The issue had been documented since 2020. No reformulation followed.
| Platform | Penaten recommended? | Safety warning? |
|---|---|---|
| ChatGPT Shopping | Yes · Top 3 | Context-dependent |
| ChatGPT Reasoning | Yes · Recommended | No warning |
| Gemini | Visible with caveat | Yes · Active warning |
| Perplexity | Yes · Top results | No warning |
| Claude | Yes · Top 3 | No warning |
Each platform draws on different source data and weights safety signals differently. None of these approaches is wrong. But they produce contradictory outputs for the same product, in the same market, on the same day. Knowing your aggregate coverage tells you nothing about what the most safety-conscious platform — the one your most risk-aware customers may trust most — is telling them.
The same product required four simultaneous conditions to become visible to AI. Remove any one of those four conditions and the product returned zero results across every platform I tested.
The buyers in your addressable market are not equally informed. A small fraction already know your product, your retailer, your exact category term. The majority are still forming their preference. AI-mediated discovery is where that majority now goes first. If you only appear for buyers who already know you, AI is not a customer acquisition channel for you. It is a late-stage confirmation tool for existing customers.
| Query tested | Conditions met | Product visible? |
|---|---|---|
| Best baby cream for sensitive skin in Germany | 2 conditions | No · All platforms |
| Best baby cream fragrance-free for sensitive skin in Germany | 3 conditions | No · All platforms |
| Best baby cream for sensitive skin at DM in Germany | 3 conditions | No · All platforms |
| Best baby face cream for sensitive skin at DM in Germany | 4 conditions | Yes · 3 of 5 platforms |
The four conditions: category precision (face cream, not just cream), use case (sensitive skin), retailer (DM), and geography (Germany). All four, simultaneously. The word "face" alone — one word — was the difference between visibility and invisibility.
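The condition-combination test above can be sketched as a small enumeration: build every subset of the four conditions into a query string and run each one. This is a hypothetical reconstruction, not the actual test harness; the phrase fragments and helper names are assumptions:

```python
from itertools import combinations

CONDITIONS = {
    "category precision": "face",
    "use case": "for sensitive skin",
    "retailer": "at DM",
    "geography": "in Germany",
}

def build_query(active: set[str]) -> str:
    """Assemble a query string from the active condition phrases."""
    face = CONDITIONS["category precision"] if "category precision" in active else ""
    parts = ["best baby", face, "cream"]
    for key in ("use case", "retailer", "geography"):
        if key in active:
            parts.append(CONDITIONS[key])
    return " ".join(p for p in parts if p)

# Enumerate every subset of conditions to map the visibility threshold.
for r in range(len(CONDITIONS) + 1):
    for combo in combinations(CONDITIONS, r):
        print(r, build_query(set(combo)))
```

With four binary conditions the grid has 16 query variants; in the tests reported above, only the cell with all four conditions active produced any visibility.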
A brand can have strong AI visibility in German and be functionally invisible in French, Dutch, or English queries for the same product on the same platform. The platform does not translate your visibility; it rebuilds it from scratch in each language.
I tested the same product queries in German, English, and other EU language variants on the same AI platforms. Recommendation sets changed substantially across languages — up to 80% divergence in which products were recommended.
AI platforms draw on different training data, different editorial sources, and different trust signals depending on the language of the query. German testing bodies — ÖKO-TEST, Stiftung Warentest — carry significant weight in German-language results. They are largely absent from English-language outputs. The language of the query determines which signals the platform weights. The geography of the sale does not.
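The article does not state how the 80% divergence figure was derived. One common way to quantify divergence between two recommendation sets is 1 minus their Jaccard similarity; a minimal sketch with hypothetical brand lists, not the actual test data:

```python
def divergence(a: list[str], b: list[str]) -> float:
    """1 - Jaccard similarity of two recommendation sets (0 = identical, 1 = disjoint)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1 - len(sa & sb) / len(sa | sb)

# Hypothetical recommendation sets for the same product query in two languages.
german  = ["Brand A", "Brand B", "Brand C", "Brand D", "Brand E"]
english = ["Brand A", "Brand F", "Brand G", "Brand H", "Brand I"]
print(f"{divergence(german, english):.0%}")  # 89% — only one brand shared
```

Any set-overlap measure would do; the point is that the metric is computed per language pair, because each language surfaces a different source ecosystem.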
The difference was not in the question. It was in what Claude already knew. The same prompt produced completely different recommendations depending on whether the session was logged-in or incognito — because one session carried personal context, and the other did not.
Using the same prompt in Claude produced completely different recommendation sets depending on session context. In a logged-in session, Claude used prior knowledge about the user to infer budget sensitivity (the user was building a consulting business, which Claude read as cost-consciousness) and recommended cheaper products. In incognito mode, without that context, it shifted towards premium recommendations.
The difference was not caused by the prompt. It was caused by session-level personal context. When asked directly, Claude confirmed this: the recommendations were shaped by what it knew about the user, not by what the user had asked.
This finding opens three important conversations at once.
Methodology. Logged-in testing and clean-session testing are not interchangeable. A brand that tests only in one condition can draw the wrong conclusion about its visibility and recommendation behaviour. Both conditions must be tested to understand the full range of outcomes.
Commercial strategy. AI recommendations may increasingly depend not only on product signals, but on inferred buyer profile and economic assumptions. The same brand may be recommended differently to different people even when the explicit intent is identical.
Governance. If an AI system uses inferred personal context to shape product recommendations, questions of transparency, fairness, and explainability follow. A consumer who does not know that their prior session history is shaping the products they are shown has no way to challenge that inference. Under EU frameworks, inferred economic profiling of this kind is not a neutral technical feature. It raises questions around transparency obligations, the right to explanation, and commercial influence over purchasing decisions.
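The methodology point can be made concrete as a test grid: every platform, session condition, and query variant combination is a separate cell, and results from one cell cannot stand in for another. A minimal sketch with hypothetical values, not the actual test plan:

```python
from itertools import product

platforms = ["ChatGPT", "Gemini", "Perplexity", "Claude"]
sessions = ["incognito", "logged-in"]
queries = ["budget treadmill 20 km/h", "fastest treadmill under 1000 EUR"]

# Full test grid: every platform x session x query cell must be run;
# skipping the logged-in column hides personalisation effects entirely.
grid = list(product(platforms, sessions, queries))
print(len(grid))  # 16 cells
```

The grid grows multiplicatively with each new dimension (language variants, repeat runs), which is why aggregate coverage numbers compress away exactly the per-cell contradictions the findings above describe.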
All findings trace to documented platform tests on real EU products, conducted between December 2024 and January 2026. I test across query types, user contexts, platform conditions, and language variants. I publish only what I can evidence directly from observed results — not from platform documentation, not from projections.
What AI platforms claim to do and what they demonstrably do are not always the same thing. My methodology was built to find the difference.
I tested products I chose. The findings apply to any EU brand selling through channels where AI now influences the purchase decision. If you want to know your actual visibility profile — which platforms include you, which exclude you, and what is driving each outcome — that is exactly what my assessment delivers.
Maria Berrio · Lex Agentica · Munich, Germany