The Magic 8-Ball vs. Gen AI: a surprisingly interesting comparison

Two products. Both fortune-tellers. Wildly different operating costs.

One Is a Plastic Sphere Filled With Alcohol. The Other Is a Trillion-Dollar Bet on the Future of Compute. They Do Roughly the Same Job.

In 1950, Brunswick Billiards commissioned a Cincinnati novelty firm to design a giveaway. The result was a plastic eight-ball with a 20-sided die suspended in dark blue alcohol and twenty pre-printed answers.

The product cost two dollars and outlasted roughly seven generations of personal computing, the BlackBerry (a few diehards aside), the entirety of New Coke, and Everybody Loves Raymond. It’s even responsive and mobile, if you have a big enough pocket.

Seventy-five years later, a different fortune-telling device has taken over the same niche. Large language models answer almost any question a human asks (should I take the job, does she like me, will this work, should I buy this stock) at a level of sophistication the 8-Ball could not have matched.

The technical achievement is real.

It also costs measurably more.

Both products do related work, with different tradeoffs. Looking at them side by side reveals where the toy holds up and where it doesn’t.

The 8-Ball won’t go in your codebase. Modern AI won’t sit on your desk for two dollars. The interesting question is what each gets right, and whether anyone has noticed how candid the older product was.

One product makes the dice visible. The other writes about it.

Honest Randomness vs. Fluent Uncertainty

Both products are, mechanically, sampling from a distribution. The 8-Ball gives you twenty possible answers (ten positive, five non-committal, five negative): a die roll dressed up as a fortune. LLMs sample tokens one at a time from a probability distribution that a model, shaped by its training data, computes from your prompt. Same operation, different scale, different marketing.
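For readers who want to see the shared mechanism, here is a toy sketch in Python. It is an illustration under stated assumptions, not how any production model works: the three-token vocabulary and its scores are invented, and only the ten/five/five split of the 8-Ball's faces comes from the paragraph above.

```python
import math
import random

# The 8-Ball's distribution: 20 equally likely faces, grouped as in the text.
# The printed answers are omitted; only the 10/5/5 split is from the article.
EIGHT_BALL_FACES = ["positive"] * 10 + ["non-committal"] * 5 + ["negative"] * 5


def shake(rng: random.Random) -> str:
    """Roll the 20-sided die: a uniform draw over the faces."""
    return rng.choice(EIGHT_BALL_FACES)


def softmax(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_token(logits: dict[str, float], rng: random.Random,
                 temperature: float = 1.0) -> str:
    """Draw one token from the softmax distribution (a weighted die roll)."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Hypothetical next-token scores for the prompt "Should I take the job?"
TOY_LOGITS = {"Yes": 2.0, "Maybe": 1.0, "No": 0.5}

rng = random.Random(8)
print(shake(rng))                     # the die you can see
print(sample_token(TOY_LOGITS, rng))  # the die you cannot
```

The only structural difference in this sketch is the weighting: the toy's die is uniform, while the model's die is loaded by the prompt.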

The interface choices are opposite. The 8-Ball wobbles in your hand. The die floats up through the liquid.

The toy is essentially saying: I am a die.

You are looking at the die.

The die has spoken.

Modern AI takes the opposite approach. The fluent prose, the chat interface, the structured formatting — they don’t lie, exactly. They just don’t bring it up. The output reads like a conversation with someone who knows. The fact that it is a probability distribution wearing a sentence is left for the user to deduce, which is generous.

The 8-Ball makes the dice visible. Modern AI wraps them in prose.

A 2024 study in Nature Machine Intelligence found that users systematically overestimate LLM accuracy because confident language reads as warranted confidence. Stanford’s RegLab found hallucination rates between 58% (ChatGPT 4) and 88% (Llama 2) on verifiable questions about random federal court cases. The models were wrong most of the time, but as composed about it as a lawyer in court.

The newer models and purpose-built legal AI tools do considerably better, which is reassuring if you plan to be sued.

Two products. Same underlying mechanism. Opposite design contracts. One says “Reply hazy try again.” The other delivers a paragraph that may or may not be related to reality, with proper grammar.

Three points of human judgment, built into a toy. Modern interfaces tend to have fewer.

Pauses by Design vs. Polish by Default

The 8-Ball required work from you. You had to frame the question — yes/no only, no follow-ups, no system prompts. You had to shake the device. You had to wait for the die to settle. You had to interpret “Reply hazy try again” as a non-answer and try again. Three different points of human judgment built into a 1950 toy. Most modern interfaces, to be clear, have zero.

Modern AI handles most of that for you. The model parses your question. The model decides when to refuse, when to hedge, when to commit. The model writes the answer in finished prose with bullets and headers, often with a concluding paragraph telling you what it just told you.

One product pushed interpretation back to the user. The other handles it on the user’s behalf. Both are real design choices.

Both have their place. For low-stakes tasks — drafting an email, debugging code — the chatbot is more convenient. For high-stakes decisions, the 8-Ball had a structural advantage: it physically prevented you from skimming, and contained no bullet points.

The design question for modern AI is how to keep the convenience while letting the user shift gears when it matters. Some products are working on this — confidence flags, source citations, structured “I’m not sure about this” patterns.

The default is still toward maximum polish, because polish demos well in keynotes.

One cost two dollars. The other cost a watershed.

Upfront Cost vs. Amortized Cost

The Magic 8-Ball cost roughly two dollars in 1950, sold once, and never asked the buyer for anything else. No subscription. No queries per month. No data center. No “we’ve updated our terms of service” email. No engagement loop. Modern AI runs on a fundamentally different economic substrate. The 8-Ball had no infrastructure. The 8-Ball was the infrastructure.

Training GPT-3 alone has been estimated to have evaporated roughly 700,000 liters of freshwater in U.S. hyperscale data centers for cooling, according to a 2023 study from UC Riverside. Per-query estimates vary by orders of magnitude, from a fraction of a milliliter to roughly half a liter per 100-word prompt, depending on cooling and how you scope “a query.” Hyperscalers are improving water efficiency.

The 8-Ball, by contrast, used some plastic and a bottle of marker fluid. The supply chain was “one person.”

One product sold its cost up front. The other amortizes it across infrastructure the user can’t see. Neither is wrong; both shape design.

The 8-Ball got visibility right: buyers knew exactly what they were paying for. Modern AI gets scale economics right: the per-query cost is small enough that “ask me anything” becomes a workable prompt.

The tradeoff is that invisibility lets the marginal query feel free. It is much easier to type “write me a screenplay about a sentient toaster” if you cannot see the water meter spinning.

There’s also a difference in pricing models. The 8-Ball was a one-time purchase; modern AI defaults to monthly billing, with success measured in tokens consumed, even when those tokens are only work-shaped objects. For most products, that’s reasonable: the value scales with usage. For narrow tools and on-device models, the 8-Ball’s one-time contract may fit better.

For product managers and designers, the practical question is: what would your interface look like if users could see the per-query cost — in compute, water, and dollars — before they hit send? Probably more conservative. Possibly less profitable.
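One way to picture that interface is a pre-send estimate shown as a range rather than a single number. The Python sketch below is hypothetical throughout: the upper bound is the "roughly half a liter per 100-word prompt" figure quoted earlier, the lower bound is an assumed stand-in for "a fraction of a milliliter", and scaling linearly with word count is a simplification.

```python
# Bounds per 100-word prompt, in milliliters. The upper bound is the article's
# "roughly half a liter"; the lower bound is an ASSUMED stand-in for
# "a fraction of a milliliter".
LOW_ML_PER_100_WORDS = 0.5
HIGH_ML_PER_100_WORDS = 500.0


def water_range_ml(prompt: str) -> tuple[float, float]:
    """Return a (low, high) water estimate in milliliters for a prompt.

    Scaling linearly with word count is a simplifying assumption.
    """
    scale = len(prompt.split()) / 100
    return LOW_ML_PER_100_WORDS * scale, HIGH_ML_PER_100_WORDS * scale


def preview(prompt: str) -> str:
    """Build the line a hypothetical interface would show before sending."""
    low, high = water_range_ml(prompt)
    return f"Estimated water use: {low:.2f} to {high:.0f} mL"


print(preview("write me a screenplay about a sentient toaster"))
```

The width of the range is the point: showing both ends is more candid than picking whichever number flatters the product.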

Twenty answers. Five refusals. A 25% refusal rate no modern product manager would survive carrying to a board meeting.

Borrowed Disclosure vs. Preserved Fluency

Designing for epistemic honesty doesn’t require returning to plastic. It requires borrowing from a fortune-telling toy at modern scale, which is a sentence I did not expect to write.

From the 8-Ball: surface the probability when the model is uncertain, at the moment of output. Label generated content as generated. Allow the prose to stop short of claims the model isn’t sure about. The 8-Ball had “Reply hazy try again” and “Better not tell you now” baked in as five of its twenty outputs — a 25% refusal rate that a modern chatbot product manager would have a hard time defending in a board meeting. The principle scales down: some calibrated refusal is honest, and users tolerate it.
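As a rough sketch of what calibrated refusal could look like in code, the Python below refuses when a confidence score falls under a floor and labels everything else as generated. It assumes the product already has a calibrated confidence score between 0 and 1 for each draft, which is a large assumption in practice; the `Draft` type, the `present` function, and the 0.75 floor are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff; a real product would tune it against measured accuracy.
CONFIDENCE_FLOOR = 0.75


@dataclass
class Draft:
    text: str
    confidence: float  # ASSUMED to be a calibrated score in [0, 1]


def present(draft: Draft, floor: float = CONFIDENCE_FLOOR) -> str:
    """Show the answer with its confidence, or refuse when confidence is low."""
    if not 0.0 <= draft.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if draft.confidence < floor:
        return "Reply hazy try again (confidence too low to answer)."
    return f"{draft.text} [generated, confidence {draft.confidence:.0%}]"


def refusal_rate(drafts: list[Draft], floor: float = CONFIDENCE_FLOOR) -> float:
    """Fraction of drafts that would be refused at this floor."""
    if not drafts:
        return 0.0
    return sum(d.confidence < floor for d in drafts) / len(drafts)


drafts = [Draft("Yes", 0.9), Draft("Probably", 0.8),
          Draft("Signs point to yes", 0.95), Draft("Maybe", 0.4)]
for d in drafts:
    print(present(d))
print(f"Refusal rate: {refusal_rate(drafts):.0%}")
```

With these made-up drafts, one in four is refused, which mirrors the toy's five-in-twenty ratio; the floor is the knob a product team would actually argue about.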

From modern AI: don’t sacrifice the fluency. The shift from “shake the ball, interpret it yourself” to “ask in natural language” is a genuine UX improvement. Going backward toward more friction is not the lesson. Nobody wants to physically shake their laptop.

The design discipline is one of restraint: let the prose stop short of claims the model isn’t sure about.

Push the interpretation back to the user where you reasonably can. Pull quotes from a source beat a paraphrased summary. Citations beat unsourced claims. Ranges beat point estimates. None of this is technically hard, and most of it is already showing up in shipping products — labeled sources, confidence ranges in copilot tools, structured “I’m not sure about this” patterns. The design language for honest AI is forming in real time.

This is the work that gets handed to designers and PMs now — not “add AI,” but “help this AI communicate what it knows and what it doesn’t, in a way that doesn’t end with someone’s apology blog post.”

The Contract vs. The Capability

Mattel still sells over a million Magic 8-Balls a year. The toy became a household object, an office fixture, a punchline at sleepovers. It lasted because the contract it offered was clear: I am a guess, treat me accordingly.

Modern AI products are doing fundamentally different work — reasoning, coding, summarizing, conversing, occasionally hallucinating federal court cases.

They’re also making a different default contract with the user: I am an answer, trust me accordingly. Both contracts are honest about something. Each is built for a different job.

The interesting question for designers and product managers isn’t which product is better — they’re different products. The question is what travels between them. Disclosure travels. Capability doesn’t go the other way.

Combining the two is the design work in front of us.

The 8-Ball got the contract right in 1950, in plastic, for two dollars. Modern AI got the capability right in the 2020s, in silicon, for the cost of running a small country.

The remaining design work is to ship both contracts, together.

The Magic 8-Ball vs. Gen AI: a surprisingly interesting comparison was originally published in UX Collective on Medium, where people are continuing the conversation by highlighting and responding to this story.
