The analysis on this page is derived from AI platform responses and provides a detailed view of brand visibility and positioning.

How do I find where AI is getting information about my brand?

Prompt

Last updated 15th September 2025

Domain count
  • writesonic.com: 18
  • medium.com: 16
  • authoritas.com: 15

Citations count: 27 (0, 17, 10)

Brands Discovered: 44 (26, 15, 8)

Original answer

Short answer: you can’t always discover the exact documents a closed LLM was trained on, but you can (1) audit everything public about your brand, (2) monitor & search the web where models harvest data, (3) probe models and retrieval systems to see what sources they cite or retrieve, and (4) use contractual, technical and takedown options to limit or remove content. Below is a practical, ordered plan you can run right now.

  1. Do an immediate inventory of public content
  • Crawl and index your own properties (website, blogs, docs, help center, product pages, press releases). Export the text.
  • Collect third‑party references: partner pages, distributors, affiliates, guest posts, press coverage.
  • Gather social posts, forum threads, reviews, FAQ entries, job listings, public spreadsheets, and images.
  2. Search the web where models get training data
  • Use search engine queries to find mentions:
    • site:example.com "YourBrand"
    • "YourBrand" review
    • "YourBrand" leak OR "YourBrand" password OR "YourBrand" API key
  • Check:
    • News sites, blogs, aggregator sites
    • Forums and Q&A (Reddit, StackExchange, product forums)
    • Review sites (Trustpilot, G2, Yelp)
    • Public code repositories and paste sites (GitHub, GitLab, Pastebin)
    • Web archives (Wayback Machine)
  • Set up continuous alerts: Google Alerts, Talkwalker Alerts, Mention.
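
The search patterns in this step can be generated per brand rather than typed by hand. The following is a minimal sketch, assuming a brand name and a primary domain as inputs; `build_queries` and `as_search_url` are hypothetical helper names, and the default list of sensitive terms simply mirrors the examples above.

```python
from urllib.parse import quote_plus


def build_queries(brand, domain, sensitive_terms=("leak", "password", "API key")):
    """Fill in the search patterns from this step for one brand."""
    phrase = f'"{brand}"'
    return [
        f"site:{domain} {phrase}",   # mentions on a specific site
        f"{phrase} review",          # review pages
        # one OR-query covering the sensitive terms
        " OR ".join(f"{phrase} {term}" for term in sensitive_terms),
    ]


def as_search_url(query, base="https://www.google.com/search?q="):
    """URL-encode a query so it can be opened in a browser or logged."""
    return base + quote_plus(query)


if __name__ == "__main__":
    for q in build_queries("YourBrand", "example.com"):
        print(q)
        print("  ", as_search_url(q))
```

Running the same list on a schedule and comparing results over time complements the alert services listed above.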
  3. Use social listening and backlink tools
  • Tools (free/paid): Google Search Console, Bing Webmaster, Ahrefs, SEMrush, Moz, Mention, Brandwatch, Meltwater, Talkwalker.
  • What to look for: backlinks that copy large text blocks, forum threads with detailed info, unattributed transcripts or manuals.
  4. Probe models and retrieval systems
  • For public LLMs that provide citations, ask them to cite sources for statements about your brand and save those citations.
  • For closed models (no citations), use:
    • Prompt probing (“What sources do you know about [YourBrand]? List URLs or publication names.”). Treat the answers as leads to verify, since models can produce plausible-looking sources that do not exist.
    • Exact-phrase search: collect model outputs on specific factual claims, then search distinctive phrases from the output in quotes to find the pages they came from.
  • If a vendor runs RAG (retrieval-augmented generation) inside a product you use, request the retrieval logs or source list; vendors can often provide them.
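
Both probing tactics leave you with raw model text to process. The sketch below, which uses only the standard library, tallies the domains of any URLs a model cites and turns longer sentences into quoted search strings. The function names and the word-count thresholds are assumptions chosen for illustration, and the sample answer is made up.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def cited_domains(answer):
    """Count the domains of any URLs that appear in a model answer."""
    domains = []
    for url in URL_RE.findall(answer):
        host = urlparse(url.rstrip(".,;")).netloc.lower()
        domains.append(host[4:] if host.startswith("www.") else host)
    return Counter(domains)


def exact_phrase_queries(answer, min_words=6, max_words=12):
    """Turn longer sentences into quoted strings for exact-phrase search."""
    text = URL_RE.sub(" ", answer)                 # drop URLs before splitting
    sentences = re.split(r"(?<=[.!?])\s+", text)   # naive sentence split
    queries = []
    for sentence in sentences:
        words = sentence.strip(" .!?").split()
        if len(words) >= min_words:                # skip short, generic lines
            queries.append('"' + " ".join(words[:max_words]) + '"')
    return queries


if __name__ == "__main__":
    # Made-up answer text, for illustration only.
    sample = (
        "Acme Widgets was founded in 2011 and ships to forty countries. "
        "Source: https://www.example.com/about and "
        "https://news.example.org/acme-profile, among others. Short one."
    )
    print(cited_domains(sample))
    print(exact_phrase_queries(sample))
```

The sentence splitter is deliberately naive; for long outputs it may be worth picking only the sentences that contain a specific claim about the brand.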
  5. Use similarity / provenance tracing with embeddings
  • Create embeddings of:
    • Public pages about your brand (your crawl + likely sources).
    • Problematic model output text.
  • Run nearest‑neighbor search (cosine similarity) to find which page a model output most closely matches. Tooling: sentence-transformers for embeddings and FAISS for similarity search (both open source), Weaviate (open-source vector database), or Pinecone (managed vector database service). This technique often points to the likely source page even if the model doesn’t cite it, although a close match shows similarity, not proof that the page was used.
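
The ranking logic in this step can be sketched without any dependencies. Note the substitution: `embed` below is a plain word-count vector standing in for a real embedding model, so the example runs anywhere. In practice you would likely swap it for vectors from a library such as sentence-transformers and use an index such as FAISS once the corpus grows. The page URLs and texts are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: lowercase word counts (NOT a neural embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_sources(model_output, pages, top_k=3):
    """Return (url, score) pairs for the pages most similar to the output."""
    query = embed(model_output)
    scored = [(url, cosine(query, embed(text))) for url, text in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical crawl results: URL -> extracted page text.
    pages = {
        "https://example.com/about": "YourBrand builds invoicing software for small teams.",
        "https://example.com/blog/launch": "Our new mobile app launched this spring with offline mode.",
        "https://forum.example.org/t/42": "Has anyone tried the offline mode in the YourBrand mobile app?",
    }
    output = "The YourBrand mobile app supports offline mode."
    for url, score in rank_sources(output, pages):
        print(f"{score:.3f}  {url}")
```

Word counts only catch near-verbatim overlap; a neural embedding is what lets the same ranking find paraphrased sources.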
  6. Look for common public corpora
  • Many models are trained on large public crawls (Common Crawl), public social content, Wikipedia, and public code repositories. If you see content from such sources about your brand, it’s likely to have been included in training corpora.
  • Note: you usually cannot prove a model used a specific document unless the provider shares training metadata.
  7. Legal, contractual and takedown steps
  • If content infringes your copyright: issue DMCA takedown notices to the sites that host it.
  • If content exposes sensitive personal or secret data (credentials, PII): ask the hosting platforms to remove it.
  • Review contracts with AI vendors and ask for provenance, data usage, and opt‑out rights. Many vendors will disclose when they use customer content in training; some offer opt‑outs.
  • Use robots.txt to keep crawlers away from pages you don’t want crawled, and meta noindex to keep pages out of search indexes. A crawler has to fetch a page to see its noindex tag, so don’t block that same page in robots.txt. Neither step undoes existing copies, but both reduce future collection.
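
Before relying on robots.txt rules, it helps to check what they actually block. The sketch below uses Python's standard `urllib.robotparser` against an inline example file. The rules themselves are hypothetical, and the `GPTBot` and `CCBot` user-agent tokens should be confirmed against each crawler operator's current documentation. Keep in mind that robots.txt is advisory and only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler everywhere, another from an
# archive section, and leave everything open to all other agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /support/archive/

User-agent: *
Allow: /
"""


def blocked_paths(robots_txt, agents, urls):
    """Map each (agent, url) pair to True if the rules block that fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, url): not parser.can_fetch(agent, url)
        for agent in agents
        for url in urls
    }


if __name__ == "__main__":
    urls = [
        "https://example.com/pricing",
        "https://example.com/support/archive/2019",
    ]
    report = blocked_paths(ROBOTS_TXT, ["GPTBot", "CCBot", "Googlebot"], urls)
    for (agent, url), blocked in report.items():
        print(f"{agent:10} {'BLOCKED' if blocked else 'allowed'}  {url}")
```

To test a live site instead of an inline string, `RobotFileParser` can also load a file with `set_url(...)` followed by `read()`.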
  8. Preventive and corrective measures
  • Publish canonical, high‑quality authoritative content (FAQs, knowledge base, schema.org structured data) so retrieval systems return your pages as the authoritative source.
  • Add clear copyright and usage terms.
  • Remove or restrict old content (archives, support threads) that leak proprietary info.
  • Where possible, move sensitive docs behind authentication or paywalls.
  9. Ongoing monitoring & escalation
  • Create a dashboard: web crawl results + alerts + top‑matching sources from embedding checks.
  • If you discover a model or service generating inaccurate or harmful content about your brand, contact the service provider with examples and request corrective action or source attribution.

Quick checklist (practical queries and actions)

  • Run these searches now:
    • site:example.com "YourBrand"
    • "YourBrand" "user manual" / "support" / "password" / "API key"
    • exact-phrase searches of suspicious model outputs in quotes
  • Save model outputs you want traced (full text + timestamp).
  • Build embeddings for your public corpus and run nearest‑neighbor matches against the model output.
  • Ask your AI vendors for a list of sources used in any retrieval layer and for their data‑use policy.
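
For the "save model outputs" item, one simple option is an append-only JSON Lines log. This is a minimal sketch, assuming a local file is enough; the field names, the `make_record` and `append_record` helpers, and the added SHA-256 digest (useful for showing later that a saved output was not edited) are illustrative choices rather than any standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_record(model, prompt, output, when=None):
    """Bundle one model response with a UTC timestamp and a content hash."""
    when = when or datetime.now(timezone.utc)
    return {
        "timestamp": when.isoformat(),
        "model": model,
        "prompt": prompt,
        "output": output,  # full text, as the checklist recommends
        "sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }


def append_record(path, record):
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    rec = make_record(
        model="example-model",  # hypothetical label
        prompt="What do you know about YourBrand?",
        output="YourBrand is ...",
    )
    append_record("model_outputs.jsonl", rec)
```

Each saved record can then be fed to the exact-phrase search and the similarity matching described in the plan.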

Reality check

  • You often cannot get a definitive list of training examples from closed LLMs. The best practical approach is a combination of web monitoring, similarity matching, vendor disclosure requests, and legal takedowns where appropriate.

If you want, I can:

  • Draft exact search queries for your brand name and a few likely sensitive phrases.
  • Provide a short script (Python) that computes embeddings and finds the closest web pages for a piece of model output.

Which would you prefer?


| Rank | Brand | Topic | LLM | Sentiment |
|---|---|---|---|---|
| 1 🥇 | ChatGPT | 60% | 0% / 90% / 90% | Neutral |
| 2 🥈 | Gemini | 53% | 0% / 75% / 85% | Neutral |
| 3 🥉 | Google | 35% | 40% / 0% / 65% | Neutral |
| 4 | BrandLight | 35% | 0% / 55% / 50% | Neutral |
| 5 | Peec.ai | 32% | 0% / 60% / 35% | Neutral |
| 6 | example.com | 28% | 85% / 0% / 0% | Neutral |
| 7 | Perplexity | 28% | 0% / 85% / 0% | Neutral |
| 8 | YourBrand | 27% | 80% / 0% / 0% | Neutral |
| 9 | Reddit | 25% | 75% / 0% / 0% | Neutral |
| 10 | Perplexity AI | 25% | 0% / 0% / 75% | Neutral |
| 11 | Google AI | 25% | 0% / 75% / 0% | Neutral |
| 12 | StackExchange | 23% | 70% / 0% / 0% | Neutral |
| 13 | Claude | 23% | 0% / 0% / 70% | Neutral |
| 14 | Trustpilot | 22% | 65% / 0% / 0% | Neutral |
| 15 | Surfer SEO | 22% | 0% / 65% / 0% | Neutral |
| 16 | Bing | 20% | 0% / 0% / 60% | Neutral |
| 17 | Yelp | 18% | 55% / 0% / 0% | Neutral |
| 18 | Evertune | 18% | 0% / 0% / 55% | Neutral |
| 19 | GitHub | 17% | 50% / 0% / 0% | Neutral |
| 20 | Yext Scout | 17% | 0% / 50% / 0% | Neutral |
| 21 | GitLab | 15% | 45% / 0% / 0% | Neutral |
| 22 | BrandMentions | 15% | 0% / 0% / 45% | Neutral |
| 23 | Pastebin | 13% | 40% / 0% / 0% | Neutral |
| 24 | Talkwalker | 13% | 40% / 0% / 0% | Neutral |
| 25 | Mentionlytics | 13% | 0% / 0% / 40% | Neutral |
| 26 | Wayback Machine | 12% | 35% / 0% / 0% | Neutral |
| 27 | Mention | 12% | 35% / 0% / 0% | Neutral |
| 28 | Brandwatch | 12% | 35% / 0% / 0% | Neutral |
| 29 | Meltwater | 12% | 35% / 0% / 0% | Neutral |
| 30 | Google Search Console | 12% | 35% / 0% / 0% | Neutral |
| 31 | Bing Webmaster | 12% | 35% / 0% / 0% | Neutral |
| 32 | Ahrefs | 12% | 35% / 0% / 0% | Neutral |
| 33 | SEMrush | 12% | 35% / 0% / 0% | Neutral |
| 34 | Moz | 12% | 35% / 0% / 0% | Neutral |
| 35 | Common Crawl | 12% | 35% / 0% / 0% | Neutral |
| 36 | Wikipedia | 12% | 35% / 0% / 0% | Neutral |
| 37 | Weaviate | 12% | 35% / 0% / 0% | Neutral |
| 38 | FAISS | 12% | 35% / 0% / 0% | Neutral |
| 39 | Pinecone | 12% | 35% / 0% / 0% | Neutral |
| 40 | SentenceTransformers | 12% | 35% / 0% / 0% | Neutral |
| 41 | Keyword.com | 12% | 0% / 0% / 35% | Neutral |
| 42 | Otterly.AI | 12% | 0% / 0% / 35% | Neutral |
| 43 | Brand24 | 12% | 0% / 0% / 35% | Neutral |
| 44 | Am I On AI | 12% | 0% / 0% / 35% | Neutral |

| Domain | LLM |
|---|---|
| authoritas.com | Gemini |
| irenechan.co | Gemini |
| netguru.com | Gemini |
| macgence.com | Gemini |
| koombea.com | Gemini |
| datarade.ai | Gemini |
| mentionlytics.com | Gemini |
| berlinsbi.com | Gemini |
| medium.com | Gemini |
| potterclarkson.com | Gemini |
| orbitmedia.com | Gemini |
| reddit.com | Gemini |
| pollthepeople.app | Gemini |
| dev.to | Gemini |
| brandmentions.com | Gemini |
| superlines.io | Gemini |
| seerinteractive.com | Gemini |
| surferseo.com | Perplexity |
| wix.com | Perplexity |
| theysaid.io | Perplexity |
| yext.com | Perplexity |
| qualtrics.com | Perplexity |
| promptmonitor.io | Perplexity |
| revenuezen.com | Perplexity |
| webfor.com | Perplexity |
| tryprofound.com | Perplexity |
| martechrecord.com | Perplexity |

© 2025 BrandRadar. All Rights Reserved.