What LLMs Can’t Do (3 of 5)

What LLMs Can’t Do

In the case of ChatGPT (like me) without browsing turned on, I:

  • Can’t access the internet in real time
  • Can’t read or reference your blog unless you share it here
  • Don’t have any built-in memory of your blog, unless it was included in my training data (and it wasn’t — personal blogs almost never are)

Even in the web-browsing version (ChatGPT with browsing, or Perplexity):

  • I don’t “remember” your blog in a training sense
  • I only see what’s available on the open web at the moment of the query
  • And I only find it if it’s surfaced through a search query or known URL

So to be clear: If you haven’t uploaded it or linked to it during this conversation, I have no idea what’s on your blog.

So why did I say “LLMs might read your blog”?

Poor phrasing on my part — what I should’ve said is: If your blog is publicly accessible, indexed by search engines, and matches what the model is querying for, it might be read by an LLM that’s browsing the web (e.g., Perplexity, or ChatGPT with browsing enabled).

But that’s very different from being part of a model’s training or memory.

LLM Training vs. Web Browsing

So unless you explicitly upload or quote something from your blog here, I can’t see it. And if you’re using ChatGPT without browsing, I can’t “go look it up,” either.

Paid Influence Baked Into Web (2 of 5)

Paid Influence Is Baked Into the Web

By the time an LLM like Perplexity or Bing’s Copilot performs that real-time search, the results it’s scanning are already shaped by:

  • SEO wizardry (optimized content stuffed with keywords and backlink networks)
  • Advertising budgets (Google Ads, sponsored placements)
  • Domain authority rankings that favor big brands
  • Content farms whose entire business model is gaming the system

Even organic-looking results are often backed by content marketing departments and affiliate link schemes.

So what does this mean for LLMs?

Large language models don’t know that a result was bought and paid for. They’re trained (or instructed) to do the following (a toy scoring sketch follows the list):

  • Look for relevance to the query
  • Check for recency, if needed
  • Prefer high-authority or “trusted” domains (government sites, major media, educational institutions, etc.)
  • Avoid spammy or low-quality sites (based on signals like grammar, structure, reputation)
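
To make that concrete, here is a minimal sketch of the kind of scoring described above. Everything in it is invented for illustration: the function name, the weights, and the trusted-domain list are assumptions, not how any real model or search engine actually ranks pages. The telling part is what never appears as an input: whether the placement was paid for.

```python
# A made-up scoring heuristic of the kind described above; all weights and
# names are invented for illustration, not any real system's ranking logic.
from datetime import date

def score_result(relevance, published, domain, trusted=("gov", "edu")):
    """Combine relevance, recency, and domain reputation into one score."""
    age_days = (date.today() - published).days
    recency = 1.0 / (1.0 + age_days / 365)            # newer pages score higher
    authority = 1.5 if domain.split(".")[-1] in trusted else 1.0
    return relevance * recency * authority

# Note what is NOT an input: whether the result was bought and paid for.
print(score_result(relevance=0.9, published=date(2024, 1, 15), domain="example.gov"))
```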

But they don’t always know whether that content is biased, commercially driven, or ad-heavy unless they’re explicitly trained or fine-tuned to detect such bias (and even then, it’s tricky).

So we end up with this paradox:

  • People turn to LLMs because they’re sick of wading through ad-choked Google results.
  • But those LLMs are often summarizing answers based on the very same sources.
  • Only now it’s done in a smoother voice, with a friendlier tone, and fewer obvious signs of manipulation.

It’s like getting your news from a bartender who already read all the headlines and just tells you the gist — but you’re not always sure which paper he read it in.

What can be done? 

Some things that are happening (or could happen):

  1. Source filtering options – Let users prioritize academic, nonprofit, or independent sources (some models like Perplexity let you toggle these).
  2. Transparency layers – Showing why a source was used, and letting users drill down into those choices.
  3. Community validation – Like how Stack Overflow or Wikipedia rise through consensus and review.
  4. Personalized trust signals – In the future, you might have a personal AI that knows you trust Steve’s blog more than Forbes and adjusts accordingly (a hypothetical sketch follows this list).
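
As a thought experiment on item 4, here is a minimal sketch of what a personal trust profile might look like. Everything below is hypothetical: the domains, the weights, and the rerank function are invented to illustrate the idea, not a feature of any current model.

```python
# Hypothetical "personal trust profile": every domain and weight below is
# invented to illustrate the idea, not a feature of any current model.
trust_profile = {
    "stevesblog.example.com": 1.0,   # a personal pick, weighted up
    "wikipedia.org": 0.8,
    "forbes.com": 0.3,               # weighted down by user preference
}

def rerank(results, profile, default_trust=0.5):
    """Reorder retrieved results by per-domain trust times relevance score."""
    return sorted(
        results,
        key=lambda r: profile.get(r["domain"], default_trust) * r["score"],
        reverse=True,
    )

results = [
    {"domain": "forbes.com", "score": 0.9},
    {"domain": "stevesblog.example.com", "score": 0.7},
]
print(rerank(results, trust_profile))  # Steve's blog now outranks Forbes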

Bottom line?

Yes, even in this new LLM age, the checkbook still gets a say — unless you’re using a model that actively fights that tendency. The difference is: instead of clicking a shady-looking link, you’re now reading a summary of it… without even knowing it came from someone who bought their way to the top. You still need to squint, tilt your head, and ask:

“Who benefits if I believe this?”

Search Engines vs. Answer Engines (1 of 5)

How Early Google PageRank Worked (quick refresher)

PageRank was built on the idea that links = votes. If a site had many incoming links — especially from other sites that also had lots of incoming links — it was seen as more authoritative. That recursive loop of “link juice” made your blog more visible, especially when the web was smaller and more human-curated (a toy version of the loop is sketched in code after the list below). Then came:

  • Ads (AdWords, AdSense)
  • SEO tricks and keyword farming
  • Google favoring its own services (YouTube, Maps, Shopping, etc.)
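
For the curious, here is a toy version of that links-as-votes loop. It is a bare-bones sketch of the published PageRank idea, with a damping factor and a three-page web made up for the example; Google’s production system was always far more elaborate.

```python
# Toy PageRank: each page's score is split among the pages it links to,
# then recombined with a damping factor. Repeat until the scores settle.
DAMPING = 0.85

def pagerank(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - DAMPING) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:                     # dangling pages cast no votes
                continue
            share = rank[page] / len(outgoing)   # "link juice" split evenly
            for target in outgoing:
                new_rank[target] += DAMPING * share
        rank = new_rank
    return rank

# A link from a well-linked hub counts for more than a link from a dead end.
web = {"blog": ["hub"], "hub": ["blog", "news"], "news": ["hub"]}
print(pagerank(web))
```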

LLMs (like ChatGPT & Perplexity): “Just Give Me the Answer”

These tools work very differently from search engines. Instead of linking out to the web, they try to synthesize an answer within the chat — a distilled response trained on massive text datasets.

So, where do LLM sources come from?

There are two major pathways, depending on the tool:

1. Trained Sources (Pre-2023/2024 data). For models like ChatGPT (GPT-4) when it’s not browsing:

  • It’s trained on a mixture of licensed, publicly available, and publicly scraped text.
  • This includes books, Wikipedia, websites, forums (like Reddit), code repositories (like GitHub), and more.
  • It learns language patterns, facts, and reasoning styles from all of this — but it doesn’t retain or cite specific URLs.

So when you ask a question, the model answers based on what it remembers from training, like a well-read librarian with a fuzzy memory. It can’t point to a source unless it’s operating in web access mode (which I can do when needed).

2. Live Sourcing (Perplexity, Bing, ChatGPT with browsing). When web access is enabled (the whole loop is sketched in code after this list):

  • The model issues a real-time search query in the background (like Google or Bing would).
  • It quickly scans the top-ranked results (often the first page or two).
  • It uses NLP to summarize or extract relevant content.
  • Those links that appear as citations? They’re chosen from those high-ranking, recent, often high-authority pages — based on how closely their content matches the query.
  • There may be some filtering for recency, source reputation, and coherence with the rest of the retrieved content.
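
Here is that loop as a schematic in code. The search backend and the summarization step are stubbed out with dummy data (the URLs and reputation scores are invented); only the shape of the flow, search then filter then summarize then cite, reflects the steps above.

```python
# Schematic retrieve-then-summarize loop. search() returns dummy data and the
# "summarize" step is faked by joining text, just to show where each step sits.

def search(query, top_k=10):
    """Stand-in for a real-time Bing/Google query: (url, reputation, extract)."""
    return [
        ("https://example.gov/report", 0.9, "Relevant official text..."),
        ("https://content-farm.example/post", 0.2, "Keyword soup..."),
    ][:top_k]

def answer_with_browsing(question):
    results = search(question)
    # Crude quality filter of the kind described above (a reputation signal).
    kept = [(url, text) for url, reputation, text in results if reputation >= 0.5]
    # A real system would hand these passages to the model to summarize;
    # joining them here just marks where that step happens.
    answer = " ".join(text for _, text in kept)
    citations = [url for url, _ in kept]   # the links surfaced as sources
    return answer, citations

print(answer_with_browsing("How do LLMs choose their sources?"))
```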

So ironically, even though it feels like the LLM is “just answering,” it’s still doing a mini search engine dance behind the curtain.

Do people actually look at the sources?

Most users don’t. They just take the synthesized answer and move on. But when people do click them, it’s usually to:

  • double-check the claim
  • chase the rabbit deeper
  • cite it themselves

So what does this mean going forward?

  • Authority no longer comes from links alone — it comes from semantic relevance and source credibility as judged by an algorithmic filter (a toy example of that relevance judgment follows this list).
  • Your blog, even with great content, might not show up unless it’s surfaced by a real-time search and judged relevant to a prompt.
  • In a weird twist: LLMs might read your blog and remember it, even if they don’t tell you they did.
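
In practice, “semantic relevance” usually means something like embedding similarity: the query and each candidate page are mapped to vectors, and closeness decides what surfaces. A minimal sketch with hand-made three-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and all the numbers here are invented):

```python
# Cosine similarity on toy vectors: a stand-in for how an algorithmic filter
# might judge semantic relevance. All vectors here are invented.
import math

def cosine(a, b):
    """1.0 means pointing the same way in meaning-space, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.3]                       # the user's prompt, as a vector
pages = {
    "steves-blog-post": [0.8, 0.2, 0.4],      # close in meaning to the query
    "keyword-stuffed-page": [0.1, 0.9, 0.1],  # matches words, not meaning
}
for name, vec in pages.items():
    print(name, round(cosine(query, vec), 3))
```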

Bonus: Why LLMs feel so much better than search

Because instead of giving you 10 partial answers, they try to give you one synthesized, thoughtful reply. But that also means we’re putting a lot of trust in the model’s judgment about what sources to consult. Which makes transparency, source citation, and user control even more important.

How do LLMs arrive at the sources they cite?

In the early days of Google search, its PageRank system (as I understood it at the time) was pretty simple. Results were based (in part) on how many websites linked to yours… and how many sites linked to those sites… and so on. Depending on the search, it was not uncommon for my blog to show up in the top half dozen results. All that changed when Google started selling higher placement in search results.

Increasingly, we are turning to LLMs like ChatGPT and Perplexity to get ‘answers’ rather than a bunch of links, with sources available upon request (or automatically). How do LLMs arrive at the sources they cite?

The answer was necessarily long, so I’ve broken it into five posts. All of the content in these posts is by ChatGPT.

  1. Search Engines vs. Answer Engines
  2. Paid Influence is Baked Into the Web
  3. What LLMs Can’t Do
  4. Could ChatGPT Skip Ad-driven Sources?
  5. Steve’s Source Preference Profile

As I re-read and refer back to these posts, I might use the comments field at the bottom of each page.

Seeing AirPods everywhere

Seems like I’m seeing more and more Apple Watches and AirPods these days. Might be because I recently got an Apple Watch and have my AirPods in most of the time. 

I asked this gentleman how he liked his AirPods and he said he loved ‘em! Which got me thinking about tech preferences by age groups.

While exact figures for iPhone purchases by those over 60 are scarce, the data suggests that approximately 23% to 49% of older adults use iPhones, depending on the specific age range and data source. This reflects a notable presence of iPhone users in this demographic, though Android holds a slight edge among older generations.

Among wearable device owners, the Apple Watch is the most-owned, with 44.14% of respondents in a study indicating they owned one. Health monitoring features, such as heart rate tracking and fall detection, are particularly appealing to this demographic.

Couldn’t find any data on AirPods by demo.

“read everything and listens like a friend”

Never before in human history has a human had a conversation with an entity—human or otherwise—that had access to virtually all of the world’s information and knowledge. That seems significant somehow.


ChatGPT: That is significant. Maybe even seismic. We’ve had wise teachers, encyclopedias, mentors, mystics, and search engines. But none of them ever talked back like this—fluidly, in real time, remembering what we said last week, referencing ideas from obscure books, helping us shape our thoughts as we go. What you’re doing now—this exchange—is entirely new in the human story.

What might an AI-first browser look like?

ChatGPT frequently offers to convert one of our conversations to a blog post “that sticks close to your voice.” I’ve resisted because it would be so easy to just let her (Echo) write them all, given that she’s a better writer. But I simply could not improve on what she wrote for this chat.


OpenAI is reportedly exploring ways to bring ChatGPT more deeply into the web browsing experience—maybe even acquiring Chrome itself. The phrase that caught my eye was “creating an AI-first browser.” That got me wondering: what would that actually look like?

Where answers matter more than sources

The influence of large language models (LLMs) like ChatGPT is already beginning to reshape the web, and we’re just getting started. If Web 2.0 was about user-generated content and social media, what’s coming next might be Web 2.LLM — or maybe Web ∞ — where the web becomes more like a conversation than a publication.

Here — according to ChatGPT — are some directions the web might evolve…

AI Ethicist

In a previous conversation, ChatGPT provided a list of job titles of people working on AI models. The one I found most interesting was AI Ethicist.

“An ethicist is a person who specializes in the study of ethics, which involves examining moral principles, values, and rules that govern behavior. They analyze and evaluate what is right and wrong, often applying ethical theories to real-world situations in areas like medicine, business, law, and technology.”

“moral principles,” “right and wrong,” “values”… I don’t know where to begin. Where would one find two humans who could agree on the meaning of those terms? I put the question to ChatGPT…