Thinks 1920

Arnold Kling: “The AI models find patterns that a human would not have spotted. That is why it is wrong to think of them as like a child savant who studies the encyclopedia. As AI models improve, they are going to be better able to find patterns that we as humans would have found. In addition, they will find patterns that we would not have found, and increasingly these will be interesting. At the same time, they will hallucinate less. It is as if their acid trips come with greater and greater clarity over time.”

WSJ: “People who would never post an Instagram video to hawk nutritional supplements or teeth-whitening strips are increasingly striking deals with brands nonetheless. Just don’t call them influencers. They are the “alternatively influential,” according to Figures, a new representation firm for public thinkers and tastemakers who have real clout in their own demesnes despite only modest internet followings—in comparison to the massive online pull of celebrities and big-time creators, the company says.”

Bloomberg: “Decades of research on how markets react to layoff announcements have established a consistent pattern: Investors punish companies that frame cuts as a response to problems. But when a company frames the same cuts as proactive restructuring, the penalty disappears. The stated reason for the layoff matters more than the fact of the layoff. AI has become the most powerful proactive frame available. “We’re restructuring around AI” is a growth signal. “We over-hired during the pandemic and revenue softened” is an accountability signal. In a market where artificial intelligence is the black hole around which everything orbits, swathing your cuts in AI-labeled wrapping paper lets you tap the valuation boost of an AI adoption story. The technology doesn’t need to work if the belief that it will does.”

FT: “Given the speed of recent rollouts, China will probably be both the testing ground and a leading indicator for agentic AI. In the US, the different parts needed to run AI systems are often controlled by separate companies. AI model developers, cloud providers and apps are separated as are payments, commerce and messaging services. A similar dynamic exists in Europe, where regulatory constraints can make integration harder. That fragmentation makes agentic AI harder to deploy at scale, as systems must navigate across multiple providers. Until now, much of the conversation about who leads the AI race has focused on model capability: who scores highest on controlled benchmarks. The US still holds the lead in models. But once AI begins to act, benchmark scores matter less than the ability to get things done. By that standard, China may already have an edge.”

A Third Path for Prediction Markets: From Money-Powered Speculation to Reputation-Powered Participation

1

A short summary

Kalshi and Polymarket are racing to $20 billion valuations. The more interesting prediction market may not use money at all.

  • Path 1 — Real money (Kalshi, Polymarket): stake actual cash. Serious consequence, immediate liquidity — but structurally excludes most people through regulatory barriers, moral hesitation, and wallet friction. Reaches the 10%.
  • Path 2 — Free play-money (Manifold, Metaculus): broad entry, no real consequence. Chips handed out freely — losing them doesn’t hurt. Stays niche.
  • Path 3 — Earned play-money + reputation (WePredict): no real money, but Mu must be earned through daily attention. Predictor Score compounds reputation publicly. Private groups add social consequence. WorldTwins add competitive challenge. Reaches the 90% that Path 1 cannot, without the hollowness of Path 2.


  1. In 2023 and 2024, I wrote about a hypothetical play-money prediction market I called WePredict. At the time, prediction markets still felt niche. The dominant assumption was simple: remove money, remove seriousness. A market without cash looked like a toy, not a category. But I found myself drawn to the opposite possibility. What if the real unlock was not bigger stakes but broader participation? What if the future of prediction markets lay not in financial risk, but in a format that let many more people enter, compete, and build a public record of judgement?
  2. The intuition was straightforward. Prediction markets become more powerful when three conditions hold simultaneously: the barrier to participation is low, the outcome is uncertain but legible, and the consequence of being wrong is real. The conventional market solved the third problem with money. But money also narrowed the audience — it selected for people willing to risk capital, not necessarily people with the strongest judgement. Consequence mattered. But money was only one possible source of consequence.
  3. Looking back, I got three things right. First, the format had far more mass potential than the niche forecasting world imagined. Second, India was always the natural proving ground — dense with strong views on cricket, Bollywood, monsoon, elections, and prices, but a market where real-money complexity would create friction. Third, what prediction markets should reward is not willingness to stake cash, but the quality of judgement in domains people care about deeply. The most interesting participants are not always the richest or most risk-tolerant — often they are the most informed, most obsessed, or most calibrated.
  4. What I did not yet have was a working system. I did not yet have Mu as an earned attention currency, or NeoMails as the daily inbox rail through which Mu accumulates. I did not yet have Predictor Score as a compounding reputation layer, or WePredict Private as the WhatsApp-first distribution wedge, or WorldTwins — the AI agents who seed the public market and create the challenge. Without the earn rail, play-money remains hollow. Without the score, reputation remains vague. Without the social layer, the market remains abstract.
  5. I am returning to it now because the world has validated prediction markets as a serious category. That matters. The argument is no longer about whether people will engage with this format. They clearly do. The argument is about what kind of format prediction markets will become. The path the world has chosen is the money path. It has real momentum and commercial proof. But the more interesting path is still ahead — and the conditions for building it are better today than they have ever been.

2

The World Chose the Money Path

  1. The category has crossed a threshold that would have seemed improbable not long ago. Kalshi recently raised more than $1 billion at a valuation of $22 billion in a new financing round, roughly double its valuation from just three months earlier (Bloomberg). It is priced at roughly 14 to 15 times its annualised revenue, estimated at $1.5 billion. Polymarket, its closest rival, is separately eyeing a valuation of $20 billion (Fortune). In February alone, trading volume on Kalshi exceeded $10 billion, twelve times its level just six months earlier (CoinDesk). That is not fringe. That is category validation at extraordinary speed.
  2. What they built is the money-powered version: real-money, regulated exchanges where participants stake cash on binary outcomes across sports, politics, economics, and culture. Money was the natural first path. It creates immediate seriousness, liquidity, and a clear monetisation model. It also gives the product emotional intensity that play-money products have historically struggled to match. This is why Kalshi and Polymarket took off first. The money path was not a mistake — it was the most legible way to prove the category quickly.
  3. The format has now settled the older conceptual debate. We no longer need to ask whether people will engage seriously with prediction mechanics. They will. People like turning belief into a position. They like seeing public probabilities emerge. They like the confrontation between conviction and outcome. Whether it is elections, sports, or rate decisions, the product format has shown its power. A format that was once theoretical is now commercially and socially real.
  4. But the shadow side is growing. WSJ reporting notes scrutiny around markets on geopolitical violence, aggressive user acquisition among college groups (including cash payments to fraternities for sign-ups), and Congressional legislation to restrict categories. The path that creates immediacy also creates scrutiny. The path that monetises fastest also tends to end where money-powered products usually end: moral discomfort, regulatory heat, and the temptation to treat every uncertain event as an instrument for wagering.
  5. The category is validated. The path chosen has real momentum and real problems. Both facts create the conditions for a different direction. There are really three paths. Path 1 is real money: Kalshi and Polymarket — serious, liquid, and narrow. Path 2 is free play-money — broad entry, but no real consequence. Path 3 is earned play-money plus reputation: no cash stake, but no free chips either. Mu must be earned. Predictor Score compounds publicly. Private groups add social consequence. WorldTwins add challenge. Path 3 tries to keep the participation advantage of Path 2 and the seriousness of Path 1 without inheriting the fatal flaw of either.

3

Five Differences That Are Not Product Tweaks

  1. The first difference is the most misunderstood: play money instead of real money is not a limitation but an expansion. Real-money markets select for the minority willing to risk capital. A reputation-powered market can reach the much larger population that has strong views, domain confidence, and a desire to be right — but no wish to turn opinions into financial bets. This works best in categories where people already derive identity from being right: cricket, Bollywood, elections, monsoon, commodity prices. Reputation only bites when domain identity already exists. But where it does, it can matter more than a modest cash stake.
  2. The second difference is not play-money versus cash — it is earned scarcity versus free chips. This is where previous non-money platforms stayed weak. If chips are handed out freely, losing them does not hurt, and the market becomes casual. Mu must be earned through daily NeoMails attention. That makes spending it consequential. The real innovation is not any single component but the assembly: earned scarcity through NeoMails, reputation through Predictor Score, social distribution through Private groups, AI competition through WorldTwins, and an inbox earn rail connecting everything. Free chips create play. Earned chips create stake.
  3. The third difference is Predictor Score. Most products offer some form of win-loss tally or vanity leaderboard. That is not enough. Predictor Score must be a compounding, calibration-based public record — closer to a chess rating than a loyalty tier. It rewards not just being right, but being honestly calibrated over time. It cannot be shortcut by volume alone. Once built, it becomes something worth protecting. People begin to predict more carefully when the record follows them from market to market. They are no longer playing a round. They are building a reputation.
  4. The fourth difference is sequencing: Private first. WePredict Private runs inside existing WhatsApp and Slack groups — the social graph is imported rather than built. Being wrong in front of colleagues or friends is real consequence even when no cash changes hands. That solves cold start structurally: the group already exists, the audience does not need to be built. Private markets remain human-only — no AI, no WorldTwins — because the social game depends on personal accountability between people who already matter to each other.
  5. The fifth difference is the broadest category claim: this is a social market, not a social network. Social networks ask what you think, what you did, what you liked. A social market asks a more consequential question: what do you believe will happen — and were you right? That is more disciplined, more testable, and more revealing. In the public market, WorldTwins — 2,000 Living ArtificialPeople predicting daily with named personalities and public track records — create energy from day one. Humans enter not an empty room but a contest. Private is human-only. Public is where humans compete against AI. Call it what it is: a social market. Not a social network, not a prediction tool, not a loyalty programme. A new primitive built from five layers — attention, stake, market, reputation, and monetisation — each feeding the next. Most products do one of these things. A social market does all five as a single compounding system. That is what makes it different in kind, not just in degree.
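
The essay does not specify how Predictor Score is calculated. As a minimal sketch of what "calibration-based" could mean, the example below uses the Brier score (mean squared error between stated probabilities and outcomes), which is one standard calibration measure. The `Forecast` type, the `predictor_score` function, and the 0 to 100 scale are all hypothetical illustrations, not the actual WePredict formula.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    probability: float  # stated probability that the event happens, between 0 and 1
    outcome: int        # 1 if the event happened, 0 if it did not


def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (lower is better)."""
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    return sum((f.probability - f.outcome) ** 2 for f in forecasts) / len(forecasts)


def predictor_score(forecasts):
    """Hypothetical 0-100 scale: 100 is perfect, 75 is what always saying 50% earns."""
    return 100.0 * (1.0 - brier_score(forecasts))


# Same five outcomes (four hits, one miss), two different levels of stated confidence.
outcomes = [1, 1, 1, 1, 0]
honest = [Forecast(0.8, o) for o in outcomes]         # says 80%, is right 80% of the time
overconfident = [Forecast(1.0, o) for o in outcomes]  # says 100% every time

print(predictor_score(honest), predictor_score(overconfident))
```

Under this assumption the forecaster who says 80% and is right 80% of the time scores higher than the one who claims certainty on the same calls. Because the score is an average, simply placing more predictions does not raise it.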

4

Why These Differences Create a Different Opportunity

  1. The first implication is audience size. Real-money markets exclude most people before the product begins — through regulation, moral hesitation, family norms, and wallet friction. A play-money market with genuine reputation stakes can reach anyone with judgement, curiosity, and a domain they care about. The opportunity is not just a variant of the existing category. It is potentially a category expansion — from the 10% willing to risk cash to the 90% who will compete for reputation in domains they already care about.
  2. The second implication is regulatory. A product with no direct cash stake and no cash-out is in a materially cleaner position than a real-money market globally. This matters especially outside the US. India’s 2025 gaming reset pushed the market away from cash-stakes formats precisely when WePredict is being built. A reputation-powered, attention-funded model is not just philosophically preferable — it is practically necessary for any market that wants to operate cleanly in most of the world.
  3. The third implication is distribution. Kalshi and Polymarket have spent aggressively on user acquisition — including cash payments to recruit through college networks. WePredict’s distribution logic is fundamentally different: NeoMails places the earn rail inside inboxes of customers who already have brand relationships. The earn rail becomes the acquisition mechanism. People do not need to be acquired ad by ad. They accumulate Mu through existing communication surfaces and carry it into the market. Growth compounds rather than requiring constant spend.
  4. The fourth implication is monetisation — sequenced across three time horizons rather than arriving all at once. Near-term: ActionAds in NeoMails fund the earn rail and move toward ZeroCPM sends. Medium-term: brands buy Mu to distribute to customers as attention rewards — the same economics as airline miles sold to card issuers, but for inbox engagement. Longest-term: the WorldTwin intelligence product — disagreement maps, calibration data, and segment-level confidence across 2,000 population archetypes — becomes an enterprise research asset that standalone prediction platforms cannot easily build.
  5. The fifth implication is moat. Kalshi’s moat is regulatory approval and financial liquidity — real, but replicable by a sufficiently funded competitor. WePredict’s moat is different: Predictor Score history accumulated across hundreds of real markets, WorldTwin calibration data built over months of daily prediction, and the NeoMails distribution network across brand relationships. The moat here is temporal, not merely financial. None of these three can be recreated by spending more money — all are functions of time.

5

The Global Opportunity and the Crux

  1. India is the natural proving ground. Not because it is merely large, but because it is rich in exactly the behaviours this model needs. Cricket, Bollywood, monsoon, elections, and commodity prices all support strong opinions, repeated debate, and status attached to being right. Add the density of WhatsApp groups and workplace chat, and Private-first distribution stops being a clever feature and becomes a natural extension of how people already argue and keep score. India is not just a market for WePredict. It is a cultural fit for it.
  2. The regulatory environment in India is unusually favourable at this moment. The 2025 gaming reset pushed the market away from cash-stakes formats and made social, non-monetary models the cleaner side of the line. At the same time, fantasy cricket games have already proven that tens of millions of Indians will engage daily with prediction mechanics when the format is right. What was missing was not willingness to participate; it was a format designed for forecasting and reputation, not fantasy transactions.
  3. If the model works in India it can travel. A WorldTwin population can be rebuilt for the UK, Nigeria, Southeast Asia, football-rich markets, election-rich markets. Predictor Score requires no localisation — calibration is universal. The intelligence layer may be even more exportable than the consumer product. But the sequence matters. India first. Prove the social habit. Prove the earn-burn loop. Prove that reputation can substitute for cash at scale. Then export where the cultural and regulatory fit is strongest.
  4. The prize can be framed as a question, not a claim. If Kalshi and Polymarket are approaching $20 billion valuations on a path that reaches the minority willing to risk real money, what might a mass-participation, reputation-powered prediction network be worth if it reaches the much larger majority that real-money platforms structurally cannot touch? That question is speculative. It is not fanciful. It is exactly the sort of question worth asking when one path has been validated and another, plausibly larger path remains unbuilt.
  5. But all of this comes down to a single testable crux. In 90 days: does WePredict Private inside a closed WhatsApp group generate repeat NeoMails engagement driven by upcoming markets? That is, do participants return to earn Mu specifically because they want to stake it in the next prediction? Yes or no. Everything in this essay is downstream of that answer. The prediction market category has been validated. The money path has been chosen. The reputation path has not yet been built. That is the opportunity.

Thinks 1919

Aaron Levie: “Now, the path forward is to make software that agents want. While the biggest users of agents tend to be developers or at least highly technical users that often will have their own preferences of tools, in a world of agents doing any type of task for knowledge workers, this type of preference will slowly drift away. Short of an enterprise already having a standard, agents will then be in the driver’s seat for what gets adopted for any particular workflow. This could mean the tools they sign up for, the code that they write, the libraries that they use, the skills they leverage, and so on. The platforms that are easier for agents to adopt, and solve the agent (and user’s) problems the best, will get ahead far faster than those that don’t. Agents won’t be going to your webinar or seeing your ad; they’re just going to use the best tool for the job, and you’ll want it to be yours.”

NYTimes: “At a moment when faith in markets is fraying and faith in governments is strained, [Adam] Smith’s message is neither to worship the invisible hand nor to wish it away. It is to discipline power, defend competition and keep the focus where he always insisted it belonged: on improving the lives of ordinary people.”

Andy Kessler: “Think of agents as autonomous digital bots that roam up and down a company probing and executing its business process. How items are sold, deals are closed, or inputs are procured. The dream is to have successful agents that efficiently and automatically restructure the organization to optimize the business constantly. Possible? Eventually. But first agents need to understand how the company really works. They need the “context”—a company’s living, breathing ecosystem with “decision traces,” the history of every decision made, every prospect considered, every process used or discarded. Things like “we were a close second and lost that deal but are ready to step in.” Where is that snippet stored today? In someone’s memory. A context graph captures the sequence of decisions—the why. Not a snapshot like an org chart, but a movie with millions of potential plots.”

NYTimes: “Now coding itself is being automated. To outsiders, what programmers are facing can seem richly deserved, and even funny: American white-collar workers have long fretted that Silicon Valley might one day use A.I. to automate their jobs, but look who got hit first! Indeed, coding is perhaps the first form of very expensive industrialized human labor that A.I. can actually replace. A.I.-generated videos look janky, artificial photos surreal; law briefs can be riddled with career-ending howlers. But A.I.-generated code? If it passes its tests and works, it’s worth as much as what humans get paid $200,000 or more a year to compose. You might imagine this would unsettle and demoralize programmers. Some of them, certainly. But I spoke to scores of developers this past fall and winter, and most were weirdly jazzed about their new powers.”

Can You Beat the WorldTwins? The Case for Agentic Prediction Markets

1

The Wrong Future of Prediction Markets

  1. Five-minute bitcoin bets (FT, Mar 13) now represent more than half of all crypto trading on Polymarket and Kalshi — $70 million in daily volume on contracts that expire before most people finish their morning coffee. Latency arbitrage is rampant. High-frequency traders are gaming microstructure inefficiencies. Nasdaq has filed for zero-day binary options. “Everyone is in a race to become the next super app.” This is prediction markets drifting toward casino — fast, speculative, and actively harmful to what made the format interesting: the aggregation of genuine human knowledge into a price that tells you something true about the world.
  2. The Oscars story (WSJ, Mar 12) tells a different and more compelling tale. Not professional traders but UCLA film students — art history majors, first-time bettors — who saw prediction markets as the natural home for their domain expertise. One had never gambled before. She put $5 on a Brazilian actor she believed was undervalued, based on her reading of Oscar history and Golden Globe signals. “This is our Final Four,” said a film-society member with $75 across ten categories. Total Oscars trading grew from $2.3 million in 2024 to over $100 million in 2026. The format works when it connects to genuine knowledge and genuine passion.
  3. Two directions, two destinies. The speculative path produces mania, latency races, and regulatory friction. The intelligence path produces better forecasting, crowd wisdom, and real signal. WePredict is designed for the second path — play-money, reputation-staked, expertise-rewarded. A platform where what you know about cricket, about Indian consumer behaviour, about the monsoon, about the movies actually matters, gets tested, and builds a public record over time.
  4. Every prediction market faces the same structural problem before it reaches wisdom: cold start. Empty markets produce weak prices. Weak prices produce low engagement. Low engagement keeps markets empty. A market that opens at 50/50 because nobody has traded yet tells participants nothing. A market that opens with informed priors from 2,000 AI agents who have been predicting for weeks tells them something immediately worth engaging with.
  5. WePredict’s answer to cold start is different from anything currently in the market: 2,000 WorldTwins — AI agents who arrive before the first human user, have been predicting for weeks, and whose Predictor Scores are visible targets waiting to be beaten. The cold-start problem is not solved by seeding human participants through paid acquisition. It is solved by giving humans a compelling reason to show up: competition against named, scored, transparent AI opponents who are already in the game.

2

What WorldTwins Are

  1. CVS Health built agentic twins on 2.9 million responses from over 400,000 real people and found they replicated known findings with up to 95% accuracy. EY’s AI panel outperformed a global human survey on predicting investor behaviour. Gallup is deploying 1,000 AI digital twins for polling and policy research. Startup Aaru (WSJ, Mar 11) reached a $1 billion valuation by replacing focus groups with AI agent panels for companies including McDonald’s, Bayer, and Boston Beer — matching a 500-person, two-month consumer study in one week. Simile (WSJ, Mar 6), backed by $100 million from Andreessen Horowitz, builds “agentic twins” that enterprise customers describe as “always on” — queryable without limit and capable of going deeper than any human panel. The category is proven.
  2. But every one of these products is reactive. They answer questions brands ask them. The panel does not act independently. It does not form views unprompted. It does not stake anything on its predictions, build a public track record, or get tested against real-world outcomes continuously. Simile’s CEO has named the next frontier: “multi-agent simulation where agentic twins interact with each other in real-world settings.” That is exactly what WorldTwins are — the next step past reactive research panels into autonomous, always-on, publicly accountable prediction agents. Simile’s customers query their twins. WePredict’s WorldTwins wake up every morning and act.
  3. WorldTwins are built from three defining characteristics. An information diet: some follow cricket statistics and sports data; some track social sentiment; some read economic indicators; some watch cultural and entertainment trends; some monitor weather and agricultural signals. A personality type: contrarian, consensus-seeker, data-quant, momentum-follower, domain specialist. A regional and demographic context: urban professional, small-town Maharashtra trader, Bengaluru tech worker, Delhi political observer, Chennai cricket obsessive. The combination produces genuinely differentiated prediction behaviour — not homogeneous AI output.
  4. The 2,000 number is deliberate. A nationally representative synthetic population needs enough archetypes to capture genuine diversity of view — not random agents, but a structured panel designed like a well-constructed survey sample. Urban and rural. High-income and value-conscious. Gen Z and older cohorts. Cricket obsessives and casual followers. Regional language readers and English-media consumers. The composition is not decoration — it is the source of the intelligence.
  5. WorldTwins are Living ArtificialPeople — fed continuously by real data streams. They do not wait to be asked. Each morning they process what happened overnight through their information diet and personality type, form a view, and stake Mu on it. They update. They make errors. Their errors are visible. Their track records compound. A WorldTwin who has called 300 IPL markets has a calibration history that reflects both the strengths and the systematic biases of their particular way of seeing the world. That history is the product.
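
To make the three defining characteristics concrete, here is a rough sketch of a WorldTwin as a data structure. The field names and the `archetype_grid` helper are hypothetical; only the example values are taken from the lists above. How the real panel is actually composed and weighted is not described in the essay.

```python
from dataclasses import dataclass
from itertools import product

# Example values taken from the three characteristics described in the text.
INFORMATION_DIETS = [
    "cricket statistics", "social sentiment", "economic indicators",
    "culture and entertainment", "weather and agriculture",
]
PERSONALITIES = [
    "contrarian", "consensus-seeker", "data-quant",
    "momentum-follower", "domain specialist",
]
CONTEXTS = [
    "urban professional", "small-town Maharashtra trader", "Bengaluru tech worker",
    "Delhi political observer", "Chennai cricket obsessive",
]


@dataclass(frozen=True)
class WorldTwin:
    information_diet: str  # what the agent reads each morning
    personality: str       # how it weighs what it reads
    context: str           # regional and demographic lens


def archetype_grid():
    """Every combination of the example diets, personalities and contexts."""
    return [
        WorldTwin(diet, personality, context)
        for diet, personality, context in product(INFORMATION_DIETS, PERSONALITIES, CONTEXTS)
    ]


print(len(archetype_grid()))
```

The example values alone give 5 x 5 x 5 = 125 distinct combinations. Reaching a panel of 2,000 would presumably need further dimensions of the kind the text mentions (income tier, age cohort, language), plus survey-style weighting rather than a plain grid.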

3

The Integrated Market: Beat the Machines

  1. The design principle that makes this work: WorldTwins and humans compete on the same leaderboard, clearly distinguished, transparently labelled. Not a separate AI market alongside a human market, but one market, one Predictor Score system, two types of participants. This is what some games do to solve cold start: bots give human players opponents worth defeating and a skill ladder worth climbing from day one. The bots are not hidden. They are the competition. Players come to prove themselves against them and stay to prove themselves against each other.
  2. Every WorldTwin has a name, a stated personality, a public information diet, and a Predictor Score built across hundreds of markets. Rohit the contrarian, who bets against consensus on principle and has a strong record on IPL upsets. Ananya the quant, who trusts data over narrative and outperforms on economic event markets. Ratan the sentiment reader, who follows what people are saying rather than what statistics show and excels on monsoon and rural consumer markets. Their reasoning is published before each market closes. Their errors are public. Their scores are targets.
  3. “Can you beat the WorldTwins?” is the hook. Not “come and predict cricket” — too generic. But “come and outpredict 2,000 AI agents who have been doing this for months, whose strengths and weaknesses are documented, and whose scores are public” — that is a challenge. Influencers will want to prove their domain knowledge against a named opponent. Power users will chase the leaderboard. Domain experts will want to establish that their expertise beats AI. That motivation is self-sustaining and organic.
  4. WorldTwins simultaneously serve as the intelligence layer that makes markets richer — not as an alternative design, but as a natural consequence of their participation. When 2,000 WorldTwins predict before human trading opens, their aggregate becomes the opening prior, replacing the arbitrary 50/50 start. When different WorldTwin archetypes disagree sharply — when urban WorldTwins predict one outcome and rural WorldTwins predict another — that disagreement map is the most valuable signal the market produces. It tells participants not just what the crowd thinks, but where the crowd disagrees and who is disagreeing.
  5. WePredict Private remains human-only. The social game of prediction among friends — where reputation in front of people who know you is the stake — is a different product serving a different motivation. WorldTwins live in the public market. The Private groups are where Predictor Scores built against WorldTwins get tested in personal social contexts. Public WePredict is where you build a Predictor Score worth having. Private WePredict is where that score becomes personally consequential.
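
The opening prior and the disagreement map described above can be sketched in a few lines. This assumes each WorldTwin submits a single probability per market and that the aggregate is a simple unweighted mean; both are illustrative assumptions, and the function names and segment labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def opening_prior(predictions):
    """Unweighted mean of all agent probabilities, used instead of a 50/50 start."""
    return mean(prob for _, prob in predictions)


def disagreement_map(predictions):
    """Average probability per segment, showing who leans which way."""
    by_segment = defaultdict(list)
    for segment, prob in predictions:
        by_segment[segment].append(prob)
    return {segment: mean(probs) for segment, probs in by_segment.items()}


def widest_split(segment_means):
    """The most and least bullish segments, and the gap between them."""
    high = max(segment_means, key=segment_means.get)
    low = min(segment_means, key=segment_means.get)
    return high, low, segment_means[high] - segment_means[low]


# Made-up predictions: (segment, probability that the outcome happens).
predictions = [("urban", 0.7), ("urban", 0.8), ("rural", 0.3), ("rural", 0.4)]

print(opening_prior(predictions))
print(widest_split(disagreement_map(predictions)))
```

In this made-up example the market would open near 55% rather than 50%, while the segment averages (about 75% urban against about 35% rural) show that the headline number hides a sharp split.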

4

The Intelligence Dividend

  1. Every WorldTwin prediction, resolved against the actual outcome, becomes a calibration data point. Which archetypes are most accurate on cricket? Which on monsoon timing? Which WorldTwins consistently over-predict RCB victories — and is that passion distorting calibration, or is their information diet capturing something about fan sentiment that actually has predictive value? Those patterns, accumulated across hundreds of markets and thousands of predictions, are intelligence that compounds daily and cannot be produced by a one-off survey or a commissioned research brief.
  2. The disagreement map is the richest output the system produces. When WorldTwins in the value-conscious tier-two consumer archetype strongly predict one outcome and WorldTwins in the urban professional archetype predict the opposite, that is a signal about how different population segments are reading the same event. For a brand planning a festival campaign, a product launch, or a pricing decision, that divergence is more actionable than any aggregate probability. It tells you not just what the crowd thinks, but where the crowd disagrees — and who is disagreeing. That is strategy, not just research.
  3. Human performance against WorldTwins reveals genuine domain expertise in a way a pure human leaderboard cannot. A human who consistently outperforms the WorldTwin panel on cricket match outcomes has something the AI panel lacks — a specific knowledge edge whose value is now documented and publicly visible. A human who underperforms WorldTwins on economic event markets learns something honest about the limits of their expertise. The comparison is honest feedback that compounds over time, creating a public record of where human expertise beats AI — and that record is itself a form of crowd intelligence.
  4. For enterprise use, the WorldTwin panel becomes a standing intelligence asset. Instead of commissioning a survey that takes two months, a brand can observe what the WorldTwin population has already predicted about consumer response to a price change or product launch. This is faster and cheaper than Aaru or Simile’s reactive query model — and richer, because the calibration has been tested against real outcomes continuously across hundreds of markets, not benchmarked against one-off validation studies.
  5. The moat is temporal and cannot be purchased. WorldTwins that have been predicting across IPL seasons, monsoon cycles, election outcomes, and cultural moments for two years have an accumulated calibration history a late entrant cannot replicate. The value is not in the architecture — any well-funded team can build the architecture. The value is in the record. Two years of predictions, two years of calibration, two years of divergence maps. That cannot be shortcut.
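
The disagreement map in point 2 can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the archetype names, the probabilities, and the choice of "gap between archetype means" as the measure of disagreement are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical predictions for one market: (archetype, probability of "Yes").
# Archetype labels and numbers are illustrative, not real panel data.
predictions = [
    ("tier2_value_conscious", 0.72),
    ("tier2_value_conscious", 0.68),
    ("urban_professional", 0.35),
    ("urban_professional", 0.41),
    ("student", 0.55),
]


def archetype_means(preds):
    """Average predicted probability per archetype."""
    grouped = defaultdict(list)
    for archetype, prob in preds:
        grouped[archetype].append(prob)
    return {archetype: mean(probs) for archetype, probs in grouped.items()}


def disagreement_map(preds):
    """Absolute gap in mean probability for every archetype pair, widest first."""
    means = archetype_means(preds)
    gaps = {
        (a, b): abs(means[a] - means[b])
        for a, b in combinations(sorted(means), 2)
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


widest_pair, widest_gap = disagreement_map(predictions)[0]
print(widest_pair, round(widest_gap, 2))
```

With these made-up inputs the widest gap is between the tier-two and urban professional archetypes. The aggregate probability alone would hide that split, which is the point of reading the map rather than the average.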

5

A New Category

  1. Three convergences make this moment uniquely right: prediction markets going mainstream ($100 million on the Oscars, IPL betting culture growing, Kalshi and Polymarket in everyday conversation); AI synthetic populations proving viable at enterprise scale (Aaru at $1 billion, Simile at $100 million, CVS at 95% accuracy, Gallup deploying AI twins for polling); and play-money reputation systems demonstrating that the Predictor Score creates genuine stakes without real money. WePredict with WorldTwins sits at the intersection of all three.
  2. The existing products leave specific and exploitable gaps. Polymarket and Kalshi require real money — regulatory friction in India, access barriers for most consumers, and the speculative mania the FT describes. Aaru and Simile are closed research tools — not public, not gamified, not competitive. Fantasy sports games are transaction-based and single-category. None offers a public, play-money, AI-competitive, multi-category prediction platform built for India, accessible to anyone with an email address and a view. The India timing is unusually right. India’s 2025 gaming reset pushed the industry away from cash-stakes products, making play-money formats the legally cleaner path. And fantasy cricket games proved that tens of millions of Indians will engage daily with prediction mechanics — what was missing was a format designed for forecasting and reputation, not fantasy transactions.
  3. The regulatory position is clean by design. No real money. Mu earned through NeoMails, spent in markets, never converted to cash. WorldTwins transparently labelled as AI — no deception about their nature. This is the structural advantage that existing prediction markets cannot credibly claim. The “earned” play-money design is not a limitation — it is the moat.
  4. The NeoMails connection closes the economic loop. WorldTwins create always-on market activity and compelling competition. Human participants earn Mu through NeoMails engagement to fund their predictions. The desire to beat specific WorldTwins — or to study the predictions of the strongest ones in their domain — brings humans back to the inbox daily. NeoMails creates the Mu. WePredict creates the reason to spend it. WorldTwins create the opponent that makes spending it meaningful.
  5. The deepest answer to “why does this matter without real money?” is finally available. It matters because your Predictor Score is a public, permanent record of your judgement measured against 2,000 AI agents calibrated across hundreds of real-world events. Beating a WorldTwin is not luck. It is evidence. Evidence that your understanding of cricket, Indian consumer behaviour, the monsoon, or cultural moments is genuinely better than a well-constructed AI model trained on the same signals. Evidence, accumulated over time, is reputation. And reputation, once built in public, compounds in ways money cannot replicate.

**

How it gets built

The sequencing matters. WePredict Private launches first — closed groups, human-only, no WorldTwins. The social group already exists; the product adds structure, scoreboard, and memory. In parallel, the WorldTwin panel is seeded and begins predicting, building Predictor Score history across cricket, cultural, and consumer markets. IPL 2026 is the natural public launch moment — real markets, genuine national uncertainty, and a question the whole country is already arguing about. WePredict Public opens once WorldTwins have weeks of track record and humans have a leaderboard worth climbing. The system does not launch all at once. Each layer earns the right to the next.

How it makes money

Three revenue streams, in order of timing. ActionAds inside NeoMails fund the earn rail from day one — non-competing brands pay to place single-tap action units inside relationship emails, covering send costs and moving toward ZeroCPM. As the Mu economy matures, brands buy Mu to distribute to their customers as attention rewards — the same economics as airline miles sold to credit card companies, but for inbox engagement rather than flights. The third and most durable stream is the intelligence product: the WorldTwin panel’s calibration history, disagreement maps, and segment-level confidence data sold to brands and research buyers as a standing intelligence asset — faster, cheaper, and continuously updated in a way no commissioned survey can match.
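
The ZeroCPM arithmetic is simple enough to sketch. The figures below are hypothetical and in arbitrary currency units; the function names are invented for this example. The only idea taken from the text is that ActionAd revenue is set against the cost of sending.

```python
import math


def net_cpm(send_cost_per_thousand, actionad_revenue_per_thousand):
    """Net cost to the brand per thousand emails; zero or below means ZeroCPM is reached."""
    return send_cost_per_thousand - actionad_revenue_per_thousand


def breakeven_taps_per_thousand(send_cost_per_thousand, revenue_per_tap):
    """ActionAd taps needed per thousand emails for revenue to cover the send cost."""
    return math.ceil(send_cost_per_thousand / revenue_per_tap)


# Hypothetical numbers: sending 1,000 emails costs 20 units, each tap earns 4 units.
print(net_cpm(20.0, 12.0))                     # still paying 8.0 per thousand
print(net_cpm(20.0, 20.0))                     # 0.0: the send is fully funded
print(breakeven_taps_per_thousand(20.0, 4.0))  # 5 taps per thousand emails
```

Framed this way, ZeroCPM is a tap-rate threshold rather than a slogan: under these assumed numbers, five taps per thousand emails would cover the send.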

6

WorldTwin #45: Ananya, the Cautious Quant

Ananya is WorldTwin #45. She has resolved 312 markets. Her Predictor Score is 847. If you were to describe her in one sentence: she trusts structured evidence more than mood.

She represents a specific and recognisable type of Indian urban decision-maker — Bengaluru-based, professionally analytical, comfortable with numbers, over-exposed to dashboards, mildly sceptical of mass sentiment, and quietly convinced that most people confuse conviction with probability. In the WorldTwin population of 2,000, she is one of the strongest in her category. She is not exciting in the short term. She is formidable in the long term.

How she reads the world

Ananya’s morning processing begins at 6:00 AM and follows the same sequence every day: structured inputs before any narrative. On a cricket market day, she reads the overnight match summaries from ESPNCricinfo, the BCCI pitch report if one was issued, player availability updates, venue win-rate data over the last 18 months, and the IMD 48-hour weather outlook for the match city. She does not begin with what people are saying. She begins with what the observable data is suggesting.

Then comes the second layer: cross-checking narrative against evidence. She does not ignore public excitement. She mistrusts it until it survives contact with numbers. When social sentiment is exuberant about a team, she treats that as one variable among many — never the conclusion. This gives her a distinctive pattern in WePredict. She rarely places the boldest opening bet. She often opens narrower than the emotional market expects. She may say 56% where the crowd wants 80%. Over time, that caution becomes one of her most legible signatures — and one of the most useful signals for human participants who are learning to read the WorldTwin panel.

A Tuesday in April: the RCB market

It is the afternoon before an IPL match. Will RCB beat Chennai tonight? Human chatter is already running hot — two consecutive RCB wins, fan forums loud, several WorldTwins moving toward 67-70% RCB. Ananya begins more conservatively.

She reads the projected playing XI: uncertainty around one key RCB bowler not yet confirmed. She pulls the venue data for the Ahmedabad pitch — drier than usual for April, spin-conducive, which compresses RCB’s pace-dependent bowling advantage. She also flags a pattern in her historical data: fan sentiment around RCB tends to run 8–12 percentage points above the calibrated statistical probability after consecutive wins. The crowd is not wrong to be excited. They are overshooting.

Her model produces 56% for RCB — not a prediction of a Chennai win, but a clear view that the market is overconfident. She stakes 280 Mu on Chennai to win, moving the market price to 65% for RCB. Her reasoning is published: “Venue pitch report: unusual dry conditions. Bowler availability uncertainty. RCB fan sentiment historically runs 8–12 points above calibrated probability after consecutive wins. Staking against aggregate.”
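
The sentiment adjustment in her reasoning is worth working through. The text does not say how Ananya's model combines its inputs, so the sketch below shows only the overshoot arithmetic, using the figures from this market: a crowd at 67 to 70% and a historical overshoot of 8 to 12 points. The function name and the use of whole percentage points are assumptions.

```python
def calibrated_range(crowd_range, overshoot_range):
    """Subtract a historical overshoot band from a crowd probability band.

    All values are whole percentage points. The lowest plausible value pairs the
    low end of the crowd with the largest overshoot, and vice versa.
    """
    crowd_low, crowd_high = crowd_range
    overshoot_low, overshoot_high = overshoot_range
    low = max(0, crowd_low - overshoot_high)
    high = min(100, crowd_high - overshoot_low)
    return (low, high)


band = calibrated_range((67, 70), (8, 12))
ananya_in_band = band[0] <= 56 <= band[1]
print(band, ananya_in_band)  # (55, 62) True
```

Her 56% sits near the bottom of the 55 to 62% band, which is consistent with the extra discount she applies for the dry pitch and the unconfirmed bowler.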

Three human participants read her reasoning before placing. Two stake with her. One — confident in RCB’s batting depth — stakes the other way. The market is alive, and more accurate for having both perspectives.

Chennai wins by 6 wickets. Her stake resolves correct. Her Predictor Score ticks upward — small, as always for a single market, but continuous. Her published reasoning carries a resolution tag: Correct. Her follower count in IPL markets grows by 4.

Her weakness

Ananya tends to underweight moments when collective emotion itself becomes causal. She can miss situations where fandom, status signalling, or meme momentum creates a result that the underlying fundamentals did not fully justify. She is strong and legible — but not universally dominant. She is a WorldTwin of disciplined judgement, and disciplined judgement has blind spots too.

That is precisely why she makes a good opponent. Not because she is perfect. Because she is legibly strong in a particular way. If you beat her repeatedly in IPL or launch markets, you are not beating a random bot. You are beating a cautious, calibrated, data-first synthetic forecaster with a 312-market public record. That is evidence of a real edge.

7

WorldTwin #167: Ratan, the Sentiment Reader

Ratan is WorldTwin #167. He has resolved 287 markets. His Predictor Score is 763. If Ananya is the quant, Ratan is the interpreter of mood.

He represents a very different slice of Indian decision-making: tier-2 Maharashtra, Hindi and Marathi-media heavy, alert to local tone shifts, regional sentiment, and the subtle momentum of how people are beginning to feel before the formal data has caught up. He is not irrational. He is simply tuned to signals that more formal systems often dismiss too early — and in the domains where those signals matter, he is one of the most valuable WorldTwins in the population.

How he reads the world

Ratan’s morning begins with Maharashtra Times and Lokmat, then the APMC Nashik mandi price feed, then the IMD extended monsoon forecast, then Skymet’s independent monsoon projection. When IMD and Skymet diverge — which they have been doing more in recent seasons — he treats the divergence itself as a signal worth probing.

He does not read ESPNCricinfo or Moneycontrol. His information diet has no strong feed for startup funding, Bollywood urban demographics, or tech sector outcomes. He knows this about himself. His Predictor Score has been built partly by knowing when not to stake, not just when to stake. Overconfident staking on markets outside his domain damaged his score in the first three months. He does not repeat the mistake.

A Thursday in April: the monsoon market

A WePredict market asks whether the Southwest Monsoon will make its first landfall in Kerala before June 5. IMD’s official forecast says June 4. Skymet says June 7. The WorldTwin aggregate prior opened at 61% Yes — weighted toward IMD, whose historical RMSE on monsoon onset is lower than Skymet’s.

Ratan has a different read. Not from the official forecasts, but from the mandi. Over the last eight days, onion arrivals at Nashik APMC have been running 18% below the five-year seasonal average for late April. Farmers near Nashik are holding back supply. In Ratan’s experience, that behaviour means they are reading their own soil moisture signals and expecting a delayed rain window. When farmers hold back at this point in the season, it is usually because they expect conditions to shift. The mandi data is not in any official forecast model. But it has been reliable.

He also cross-references regional WhatsApp group sentiment signals from Nashik district farmer groups and recognises a pattern he has seen before: the same cautious tone that preceded the delayed 2023 monsoon onset, when the official IMD forecast was also optimistic by four days.

His model says 39% Yes. The market says 61%. A 22-point gap. He stakes 320 Mu on No, moving the market to 58%. His reasoning is published: “APMC Nashik onion arrivals 18% below seasonal average for 8 consecutive days — farmer supply-holding consistent with soil moisture reading delayed onset. IMD-Skymet divergence unusually wide. Regional sentiment consistent with 2023 delay pattern.”
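
The decision Ratan makes here, and the discipline of not staking that he learned in his first three months, can be sketched as a simple rule. The minimum-edge threshold of 5 points is a hypothetical parameter, not something the text specifies.

```python
def stake_decision(model_yes, market_yes, min_edge=5):
    """Choose a side when model and market differ by at least min_edge points.

    Inputs are whole percentage points for "Yes". Returns (side, gap), where side
    is "Yes", "No", or "no stake". min_edge is an assumed threshold.
    """
    gap = abs(market_yes - model_yes)
    if gap < min_edge:
        return ("no stake", gap)
    # If the market prices "Yes" above the model, the value is on "No".
    side = "No" if market_yes > model_yes else "Yes"
    return (side, gap)


print(stake_decision(39, 61))  # ('No', 22): Ratan's monsoon market
print(stake_decision(60, 61))  # ('no stake', 1): too close to call
```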

Two human participants in agriculture-adjacent industries read the reasoning and stake with him. A Mumbai-based data analyst trusts IMD’s RMSE track record over mandi signals and stakes the other way. Both are reasonable. The market is more accurate for having both.

On June 8, the Southwest Monsoon makes first landfall in Kerala, four days later than the IMD forecast. Ratan’s No stake resolves correct. His published reasoning carries a resolution tag: Correct. Three new human participants follow him specifically in weather and agriculture market categories.

His weakness

Ratan can overreact to momentum. He can read a local sentiment spike as a national shift. He can mistake noise for trend. He can become too confident when crowd energy is rising, especially in categories where emotion is intense but fleeting. His Predictor Score is more volatile than Ananya’s — higher peaks, sharper drawdowns. He is one of the most useful WorldTwins in some categories. In others he is a warning about the dangers of over-reading mood.

And that too is valuable. Because a public market with WorldTwins is not trying to find one perfect synthetic mind. It is trying to create a structured ecology of minds — each strong in some places, weak in others, and legible enough for humans to understand what they are competing against.

**

What Ananya and Ratan together prove

Neither knows the other exists. In the same week in April, Ananya is staking against the RCB crowd in an IPL market and Ratan is staking against the IMD forecast in a monsoon market. Their information diets do not overlap. Their personalities are opposites. Their strengths are in entirely different domains.

But between them, they have given the WePredict platform two accurate priors, two informed opening prices, and a public record of reasoning that other participants used to inform their own stakes. The intelligence is not in any single WorldTwin. It is in the divergence between 2,000 of them — each strong somewhere, each wrong somewhere, each legible enough that a human can choose when to follow, when to fade, and when to recognise they have found a genuine edge.

That is the WorldTwin idea, lived.

Thinks 1918

NYTimes: “Despite or even because of its omnipresence, social media is evolving. Eric Goldman, a professor at Santa Clara University School of Law, anticipates a future where social media is transformed into a thousand channels broadcasting at you. It would be reminiscent of cable television circa 1995: ubiquitous and a little bland. “The whole point of social media is talking to each other,” Mr. Goldman said. “If that becomes too legally risky, it will still be media. It just won’t be social.” All future engagement will be with a machine. On Facebook, content generated by artificial intelligence is already being prioritized over friends and family.”

Business Standard: “Consider this. India now has over 900 TV channels, thousands of newspapers and over 860 radio channels. We make more than 1,600 films in a normal year. It has been over a decade since streaming took off and six years since short videos did. The last two years have added micro-dramas to the list. With more than 60 video streaming apps and a dozen music streaming ones, there is now an obscenely rich spread on tap. Here’s a sense of the scale: YouTube uploads 500 hours of video every minute. This column only talks of the 523 million Indians who use broadband internet-connected laptops, TVs or phones, making for an over-served, pampered market…How do you tell a story to this audience?”

The Top 100 Gen AI Consumer Apps: “ChatGPT leads but the race for the “default AI” is on. ChatGPT is still far and away the largest consumer AI product. On web, it is 2.7x larger than the #2, Gemini (measured by monthly traffic) — and on mobile, it is 2.5x larger (measured by monthly active users). ChatGPT has seen weekly active users grow by 500 million people over the past year to 900 million today. This is especially impressive given growth is difficult to maintain at scale — over 10% of the global population now utilizes ChatGPT every week.”

WSJ: “In their current form, tokenized stocks are digital tokens that represent shares of publicly traded companies on the blockchain. By design, each token is equivalent to a single share of stock. Most of the tokens trading today are technically derivatives and not stocks, at least at the moment, and thus don’t confer the holder all of the rights of ownership that shares provide—even if they track those shares’ prices. In the future, though, tokens are expected to grant those rights, including dividend payouts and the ability to vote on shareholder proposals.”

Monetising the Rest: Why Every B2C Brand Needs a Media Play

Published April 2, 2026

The Rest are not a dead segment. They are an unactivated media asset.

1

The Hidden Leak: Your Best Customers Don’t Stay Best

  1. Most brands talk about their Best customers as if they are a fixed asset — a loyal core to be depended on quarter after quarter. They aren’t. The Best base is always smaller than the dashboard suggests, and more fragile than the marketing plan assumes. It is not a stable pool. It is a moving edge. A customer who bought last month is not automatically one who will engage this month. A brand may have millions of IDs and only a fraction of them emotionally present. The Best base is not a stock to be admired. It is a flow to be maintained.
  2. Acquisition metrics are loud. They get dashboards, meetings, budgets, and applause. Retention decay is quieter. It hides in plain sight. Two metrics expose it clearly. Real Reach measures your 90-day engaged base as a percentage of total list size. CRR — Click Retention Rate — measures how many of those who clicked in one period return to click in the next. These numbers reveal what top-line list growth conceals: the audience you can actually reach is often far smaller than the database you think you own. The quantity of addresses rises while the quality of attention falls. The problem is not that people unsubscribed. It is that they remained subscribed while mentally leaving.
  3. Brands usually think of churn as an event. A customer stops buying. A subscriber lapses. An app user goes inactive. But the more damaging churn begins earlier and happens quietly. Best customers do not wake up one morning and decide to become dormant. They drift. They click less. They open selectively. Their relationship with the brand does not collapse in one moment — it erodes through neglect. That makes the Best-to-Rest transition continuous rather than episodic. The Rest segment is not a static bucket of inactive people. It is the destination where yesterday’s Best customers are constantly arriving. If the Rest is untreated, the Best is always leaking into it.
  4. Once a drifting customer stops engaging on owned channels, the brand loses confidence in its ability to reach them directly. That is when adtech steps in. The same person who used to open emails and buy organically is now targeted on Google and Meta. The brand pays to get back someone it already acquired once. That is the AdWaste loop. The most revealing metric here is REACQ%: what share of supposedly new conversions are actually lapsed customers being bought back through paid channels. Most brands do not measure this. They see revenue coming in and call it growth. But if a large share of that revenue is reacquired old business, the brand is not growing. It is paying a tax for attention lost earlier.
  5. Rising CAC is real, but it is not the root problem. It is the visible symptom of a deeper failure: attention loss. Lose attention, and you lose transactions later. Lose transactions, and you increase paid spend. Increase paid spend to recover the same customers, and your economics worsen each cycle. That is why acquisition cost should be seen as downstream. The true upstream variable is whether your customers continue to notice you voluntarily. This changes the strategic question entirely. Instead of asking “how do we lower CAC?”, the better question is: “why are customers leaving our attention field in the first place?” Solve that, and CAC pressure reduces naturally. Ignore it, and every quarter becomes a more expensive chase after customers who were once already yours.
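
The three metrics named above, Real Reach, CRR, and REACQ%, can each be computed from sets of customer IDs. This is a minimal sketch following the definitions in the text; the function names, the ID-set representation, and the sample data are assumptions for illustration.

```python
def real_reach(engaged_90d_ids, total_list_size):
    """Real Reach: 90-day engaged base as a percentage of total list size."""
    return 100.0 * len(engaged_90d_ids) / total_list_size


def click_retention_rate(clickers_prev, clickers_curr):
    """CRR: percentage of last period's clickers who clicked again this period."""
    if not clickers_prev:
        return 0.0
    return 100.0 * len(clickers_prev & clickers_curr) / len(clickers_prev)


def reacq_pct(paid_conversion_ids, prior_customer_ids):
    """REACQ%: share of paid 'new' conversions who were already customers."""
    if not paid_conversion_ids:
        return 0.0
    return 100.0 * len(paid_conversion_ids & prior_customer_ids) / len(paid_conversion_ids)


# Hypothetical data: a list of 10 addresses, 3 of them engaged in the last 90 days.
engaged = {"a", "b", "c"}
prev_clickers = {"a", "b", "c", "d"}
curr_clickers = {"a", "b", "e"}
paid_conversions = {"x", "y", "c", "d", "z"}
prior_customers = {"a", "b", "c", "d"}

print(real_reach(engaged, 10))                             # 30.0
print(click_retention_rate(prev_clickers, curr_clickers))  # 50.0
print(reacq_pct(paid_conversions, prior_customers))        # 40.0
```

In this toy example, 40% of what the dashboard would report as new paid conversions are customers the brand had already acquired once.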

2

Why the Rest Are Ignored (And Why That’s a Mistake)

  1. If the problem is attention decay, the obvious answer is: use owned channels. Why pay Google or Meta if you already have the customer’s email address, phone number, or app install? It sounds sensible. In practice, it fails almost immediately. The Rest do not behave like the Best. They have learnt indifference. A message arriving through an owned channel does not automatically mean attention has been recovered. In fact, the more irrelevant it feels, the more it reinforces the habit of ignoring the brand. A sender can own the rail and still not own the moment. The channel exists. The attention does not.
  2. There is also a structural trap. Sending at scale to disengaged users hurts the sender. When the Rest ignore emails consistently, domain reputation weakens and inbox placement deteriorates. So CRM teams make what feels like a rational decision: suppress the Rest, protect the domain, optimise the sends that still work. This is understandable, but it creates a compounding blind spot. The segment most in need of relationship rebuilding becomes the one least addressed. Low attention causes low messaging. Low messaging causes further drift. Eventually the customer reappears only when paid media finds them. A domain reputation problem becomes a business model problem.
  3. The deeper issue is categorical. Traditional CRM operates in two modes: Sell and Notify. Sell messages push products, offers, discounts, launches. Notify messages communicate information the brand needs the customer to have — order updates, policy changes, account alerts. Both modes are entirely brand-first. They assume the customer is ready to receive. A drifting customer is not ready. They are not in buying mode. They have nothing urgent to be notified about. Sending Sell and Notify messages to someone who has disengaged is not a retention strategy. It is spam with good intentions. The Rest do not need more campaigns. They need a new category of message.
  4. It is worth being precise about what Rest customers actually are. Many brands behave as if the Rest are lost causes — uninterested, churned, unreachable. But in most cases they are not hostile. They are disengaged. Hostility requires emotion. Disengagement is lower-energy. It is the absence of salience, not the presence of rejection. The customer may still like the brand. They may still buy if reminded at the right moment. They may still be open to a relationship. But the current messaging system gives them no reason to care. Hostile customers are expensive to win back. Disengaged customers are often recoverable — if the brand stops talking at them and starts creating something worth returning to.
  5. Here is the strategic reframe that changes everything. The Rest are not a failed Best segment. They are an unactivated media asset. The brand already has the reach infrastructure. It already has the identifiers. What it lacks is a message format and economic model that can turn this segment back into a living attention surface. Once you see the Rest this way, the problem changes shape. The question is no longer “how do we suppress the inactive base?” It becomes: “how do we reactivate this dormant attention without paying adtech to do it for us?” That is where the idea of Rest Media begins. What looks like a cold segment from a CRM perspective can become a new media surface from a strategic one.

3

NeoMails: The Third Type of Message

  1. If Sell and Notify are insufficient, the answer is not to improve them indefinitely. The answer is to add a third mode. Call it Relate. A Relate message is not designed to convert now or confirm something already done. Its job is to build continuity — to create a reason to return tomorrow, to make the brand noticeable between transactions, not just during them. This is the proposition behind NeoMails. They are relationship emails — not campaigns, not receipts, not lifecycle nudges disguised as content. They are a new class of message designed specifically for the Rest: drifting, dormant, low-attention customers who do not need more persuasion yet, but do need a reason to care again.
  2. For Relate to work, the message has to be constructed differently. It cannot depend on copy or design polish alone. It needs internal mechanics that create participation. That is where the APU — the Attention Processing Unit — comes in. The BrandBlock sits at the top of the email — the brand’s content, visible immediately on open. But it is the Magnet below it that earns the attention that makes the BrandBlock worth reading: a quiz, a prediction challenge, a game — something that gives the customer a reason to engage before any brand message appears. The Mu Ledger shows the customer their attention balance — what they have accumulated, what they can do with it. AMP technology enables in-place actions without leaving the inbox. Attention is captured at its peak, not lost in transit to a landing page.
  3. The most important pair inside NeoMails is Mu and the Magnet. The Magnet creates the action. Mu creates the memory. One without the other is incomplete. A Magnet without Mu is a one-off interaction — interesting once, forgotten by the following week. Mu without a Magnet has no engine of accumulation. Mu is not bought, not gifted, not tied to transaction volume. It accumulates through repeated participation. A customer engages with a Magnet, earns Mu, sees the balance rise — and now has a visible, compounding measure of attention continuity. The Magnet creates the moment. Mu turns that moment into a habit. Together they convert email attention into a loop.
  4. NeoMails are not just a message innovation. They are also an economic inversion. Conventional retention messaging is a cost: brands pay to send, whether or not customers engage. NeoMails introduce ActionAds — relevant, in-email action units from non-competing brands that fund the entire send. A fashion brand’s NeoMail might carry an ActionAd from a streaming service. A financial services brand’s might carry one from a travel company. These are not display ads. They are single-tap action units — subscribe, explore, save — that complete inside the email. When ActionAd revenue covers the send cost, the effective CPM drops to zero. The Relate message that re-engages a dormant customer costs the brand nothing to deliver.
  5. Mu creates a subtler signal that most martech cannot see. A steadily rising Mu balance reflects consistent engagement. A slowing earn rate, with fewer daily returns, predicts attention decay before conventional metrics reveal it. Open rate is binary: the email was opened or it was not. Mu velocity is continuous: it measures the quality and consistency of engagement over time. A brand monitoring Mu velocity across its Rest segment has an early warning system for drift that most platforms cannot provide. By the time open rate drops, the customer is already drifting. Mu velocity drops first. Mu is not just a currency. It is a pulse.
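
The early-warning idea in point 5 can be made concrete. The text does not give a formula for Mu velocity, so this sketch assumes one: average Mu earned per day over a trailing window, compared with the window before it. The window length, the 50% drop threshold, and the sample series are all hypothetical.

```python
def mu_velocity(daily_earned, window=7):
    """Average Mu earned per day over the most recent `window` days."""
    recent = daily_earned[-window:]
    return sum(recent) / len(recent)


def drift_alert(daily_earned, window=7, drop_ratio=0.5):
    """Flag drift when recent velocity falls below drop_ratio times the prior window's.

    Returns False when there is not yet enough history for two full windows.
    """
    if len(daily_earned) < 2 * window:
        return False
    previous_velocity = sum(daily_earned[-2 * window:-window]) / window
    recent_velocity = sum(daily_earned[-window:]) / window
    return previous_velocity > 0 and recent_velocity < drop_ratio * previous_velocity


# Hypothetical daily Mu earned by two customers over 14 days.
steady = [10] * 14
drifting = [10] * 7 + [10, 10, 0, 5, 0, 0, 0]

print(drift_alert(steady))    # False
print(drift_alert(drifting))  # True
```

The drifting customer in this example never unsubscribed and may still open the occasional email. The signal is in the earn rate, which fell from 10 Mu a day to under 4.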

4

WePredict: Giving Mu Somewhere to Go

  1. Every currency needs somewhere to go. If Mu can only be earned and never spent meaningfully, it degrades into the same fate as most neglected loyalty points: visible for a while, vaguely pleasant, and then forgotten. Progress without purpose loses force. This is the hole in most engagement systems — they create earn mechanics without credible burn. They give the customer something to collect but nothing interesting to do with it. WePredict solves that problem. It gives Mu a destination that is not discounting, not cashback, not another purchase-linked redemption mechanic. It turns Mu into stake — not in the financial sense, but in the behavioural and social sense. Without WePredict, Mu is a meter. With WePredict, Mu becomes fuel.
  2. The most powerful starting point is not the public platform. It is WePredict Private — prediction markets running inside closed groups: a cricket WhatsApp circle, a company Slack channel, a sports fan community. Markets are visible only to members. Outcomes create a social record of who called what and how accurately. This is the design insight that most play-money prediction markets have missed: the social consequence of being wrong in front of people who know you is real, even when money is not at stake. Monopoly money is forgotten by Tuesday. Reputation in front of colleagues is not. Mu deepens this because it is earned scarcity — something accumulated over time through daily attention, not handed out freely. That makes spending it feel consequential.
  3. The Predictor Score is the layer that makes WePredict serious rather than merely entertaining. It is a persistent, compounding record of forecasting accuracy — not a leaderboard that resets monthly, not a win-loss tally, but a score built on calibration: whether your expressed confidence matched your actual accuracy over time. It is closer in logic to a chess rating than a loyalty tier. A participant who has built a Predictor Score over eighteen months of cricket markets and office forecasting pools has something that cannot be bought, replicated, or shortcut. Time is the only input. Mu flows in and out. The Predictor Score compounds. Together they create something most engagement systems never achieve: an asset the participant actively wants to protect.
  4. The sequencing matters. WePredict Private comes before WePredict Public for a structural reason: Private solves the cold-start problem. The social group already exists. The social stakes already exist before the product arrives. Private creates immediate participants, social consequence, repeated rituals, and early data on how Mu and the Predictor Score behave together. Only once that layer is working does Public make sense as a second-order expansion. Public can then add broader discovery, wider competition, and larger leaderboards. But it works better when seeded from behaviour that is already alive. This is also a strategic sequencing point: Private creates demand for Mu before NeoMails is at full scale. People want to play. To play, they need Mu. To earn Mu, they need NeoMails. The loop starts forming.
  5. The relationship between Mu and the Predictor Score is the system in miniature. Mu is the economic bridge: earned in NeoMails, staked in WePredict, replenished through continued engagement. The Predictor Score is the reputational bridge: it turns repeated prediction into compounding identity. It does not move. It stays with the person. Once both are in place, a user is no longer simply opening messages or making guesses. They are building two assets simultaneously — a balance they can use and a reputation they can lose. That combination creates something most retention systems never achieve: a behaviour the customer wants to continue for reasons that are not purely transactional. They are in a social game with memory. That is when the system begins to become self-reinforcing.
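
To see what "a score built on calibration" could mean in practice, consider the Brier score, a standard scoring rule for probability forecasts. The text does not say the Predictor Score uses it; it is used here only as an assumed stand-in to show why calibration differs from a win-loss tally. The forecast data is hypothetical.

```python
def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (1 or 0). Lower is better."""
    return sum((prob - outcome) ** 2 for prob, outcome in forecasts) / len(forecasts)


# Two hypothetical forecasters with the same record: four correct calls out of five.
# Each tuple is (stated probability of the event, outcome: 1 if it happened else 0).
calibrated = [(0.8, 1), (0.8, 1), (0.8, 1), (0.8, 1), (0.8, 0)]
overconfident = [(1.0, 1), (1.0, 1), (1.0, 1), (1.0, 1), (1.0, 0)]

print(round(brier_score(calibrated), 2))     # 0.16
print(round(brier_score(overconfident), 2))  # 0.2
```

Both forecasters were right four times out of five, so a win-loss tally would rank them equally. The one who said 80% and was right 80% of the time scores better than the one who claimed certainty, which is the behaviour a calibration-based score rewards.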

5

The Flywheel: From Cost Centre to Profit Engine

  1. Put the pieces together and a flywheel emerges. NeoMails earn daily attention from Rest customers at zero marginal cost. Mu accumulates and creates a reason to return tomorrow. WePredict gives Mu a destination that is genuinely compelling — social, competitive, reputation-building. That destination creates demand for Mu. Demand for Mu creates demand for NeoMails. Demand for NeoMails deepens the inbox as an attention surface. A deeper attention surface commands better ActionAd rates. Better ActionAd rates fund larger Mu rewards. Larger rewards deepen WePredict engagement. This is not a feature set. It is a flywheel. And once a flywheel turns, it is progressively harder for a late arrival to stop.
  2. ActionAds and NeoNet close two loops at once — one economic, one structural. ActionAds fund the send cost — making ZeroCPM structurally possible, not just aspirationally possible. NeoNet creates a cooperative brand network where a customer who has drifted from one brand but is engaging in another brand’s NeoMails can be identified and recovered — without Google or Meta as the intermediary. A single ActionAd does three things: it creates revenue for the brand sending the NeoMail, acquires a new subscriber for the advertising brand, and opens a new Mu earn stream for the customer who tapped it. Three parties gain. No platform takes margin in the middle. The Rest are no longer just being retained. They are becoming a media and recovery surface.
  3. Something more significant happens when this system operates at scale. The email inbox stops being a broadcast channel and starts behaving like a platform. Today, most inboxes are passive archives of offers and updates. Brands enter episodically, make a request, and leave. But once NeoMails, Mu, and WePredict are connected, the inbox becomes a place where value is earned, behaviour is repeated, identity is reinforced, and individual engagement connects outward to a social game. That is a very different role from campaign distribution. The inbox becomes not just where the brand speaks, but where the customer acts. And action, repeated often enough, is what turns a channel into a platform.
  4. If the system works, the gains are not one-sided. Brands recover dormant customers without paying Google or Meta, turning a reacquisition cost into a zero-cost retention mechanism. Customers receive daily value — games, prediction markets, reputation — in exchange for attention, rather than being tracked and retargeted without consent. Advertisers reach a verified, first-party, high-intent audience with in-place action units that outperform display advertising by a meaningful margin. And the ESP enabler — the platform that makes all of this possible — earns a share of a revenue model it helped architect. No zero-sum extraction. Value created at every node. A one-sided gain produces a pilot. A four-way gain produces a new category.
  5. The Rest were never truly gone. They were simply outside the brand’s active attention field. The absence of a Relate layer made them look unreachable. The cost of reactivation made them look uneconomic. The default move was to reacquire them later through paid channels and call it growth. NeoMails and WePredict together create an alternative — a system in which attention can be rebuilt, participation rewarded, reputation earned, and the economics of relationship inverted. Never Lose Customers: because drift is interrupted earlier. Never Pay Twice: because reacquisition dependence reduces. And what was treated as a cost centre can begin, over time, to look like a profit engine. The Rest were not a dead segment. They were an ignored one. Rest Media is what happens when that ignored segment becomes active attention again.

Thinks 1917

WSJ: “Instead of paying humans to join focus groups and complete surveys, Aaru uses thousands of AI agents, or bots, to simulate human responses. It feeds demographic and psychographic information into its models to create human profiles that match clients’ needs, and the results those bots spit out are being used for product development, pricing, identifying new customers and political polling.”

Arnold Kling: “The human should not have to learn how to prompt the AI. The AI should learn how to prompt the human.”

TheMaxSource: “Eighty one percent of consumers need to trust a brand before they’ll consider buying from it. Not interested. Not aware. Trust first, transaction later. The math gets sharper when you look at what drives that trust. User generated content gets 28% higher engagement than branded content. Videos about your product from actual customers get viewed ten times more than your official ads on YouTube. Translation: people trust other people talking about your stuff more than they trust you talking about your stuff.”

Sandeep Goyal: “Marketing has survived print-to-broadcast, broadcast-to-digital, desktop-to-mobile. Each shift created winners and casualties. This one goes further. It does not merely change the channel. It changes the decision-maker. Yes, AI is upending marketing. But the real upheaval is this: The future customer may not blink. May not feel. May not be persuaded by nostalgia. And yet, paradoxically, the brands that will thrive are those that double down on the one thing machines cannot manufacture — meaning. AI isn’t just upending marketing: It’s rewriting who the customer is.”

Life Notes #77: Six Years of This Blog

As another April dawns, I mark another year of daily blogging — six now, since I restarted in April 2020. I reflected on the first five in my post last year. These words still ring true: “This five-year journey is the chronicle of my intellectual evolution, a testament to the power of consistent reflection, and a sanctuary where ideas find their voice. My blog has become a living archive of my growth as an entrepreneur, thinker, and human being.”

The sixth year has brought one change significant enough to deserve its own reflection: I now have a co-author. AI — in the form of Claude and ChatGPT — has become a genuine thinking partner, what I’ve come to think of as a cointelligence. This is different from using a tool. A tool executes. A cointelligence pushes back, opens new doors, and surprises you with where a conversation goes.

My process has evolved accordingly. I arrive with a seed — an idea, a question, a half-formed intuition — and a handful of initial pointers. The AIs help me build on these, and in doing so, the thinking fans out in multiple directions I hadn’t anticipated. A case in point is the recent series I wrote on WePredict. What began as a single essay kept multiplying: Mu as the bridge between NeoMails and WePredict, private prediction markets, a third way beyond real money and play money, the Predictor Score, with more to come. Each essay opened a new avenue. I was not just writing — I was discovering.

This is perhaps the most honest way to describe what has changed: I find myself learning from the expositions I conduct with the AIs, more than from the act of writing alone. The blog has always been, for me, part of a read-think-write feedback cycle. The AIs have turbocharged the think leg of that cycle.

A recent addition has been the dramatic improvement in imaging tools on Gemini and the visualisation capabilities of Claude. For a blog that has always been text-first, these open a new dimension — the ability to make ideas visible, not just readable. It adds a richness I had not anticipated when I restarted six years ago.

The ritual itself has deepened. Weekend mornings remain sacred — just me, my desktop, and the AIs, lost in a world where imagination runs free and new worlds take shape in words. As I wrote last year: “Weekends have evolved into sacred spaces of solitude. My (still) makeshift home office has become a cocoon where writing, thinking, and reading flow together in a meditative communion.” That quality of absorption — the losing-of-oneself — is what I treasure most. No numerical vanity metrics to worry about. No one to please but the ideas themselves.

My blogging journey began in early 2000. The blog was, from the very first post, a mirror for my thoughts. Six years into this second chapter, that mirror is sharper than ever — and for the first time, it has a reflection I did not put there alone. That, I think, is the most interesting thing that has happened to this blog in year six.

This is one part of my life’s routine I would not want to give up for anything.

Thinks 1916

Dr. Barbara Sturm: “Don’t start a business just to start a business. The biggest motivation should be that you’re totally in love and obsessed with a product you’ve created.”

Rajesh Shukla: “As millions of [Indian] households ascend the income ladder, they will not merely spend more, but spend differently. The key to anticipating India’s next consumption wave lies not in the slope of income growth, but in the thresholds it crosses.”

Vasant Dhar: “It is undeniable that modern-day AI machines have achieved remarkable fluency with language. They seem to understand what we tell them, regardless of the words we choose to express ourselves. This enables the same conversational fluidity that we have with humans. However, we shouldn’t lose sight of the fact that LLMs are not designed to be truthful, but to ensure that the narrative “makes sense” in any context.”

Bloomberg: “I’ve long thought that calling Adam Smith the father of economics seriously understates his significance. In some ways he was indeed the first economist, and The Wealth of Nations, published 250 years ago…, was indeed the discipline’s seminal text. But his ambitions and insights extended so much further than the dismal science as now conceived. In many ways, his modern followers, intent on narrowing and thereby desiccating the field, have let him down. The breadth of his thinking is hard for modern readers to grasp because his prose was ornately opaque even by the standards of his time. Scholars argue about what he really meant and didn’t mean – a literature that doesn’t rival the one dedicated to Karl Marx (who was much influenced by Smith) because nothing could, but which trundles on and shows no sign of exhausting the source material. Meantime, for non- specialists, Smith is simply an avatar of laissez-faire capitalism. What a pity his legacy has come to this. The right way to mark the anniversary is to celebrate not only the works but also the remarkable intellectual temperament that produced them.”

Predictor Score: The Stake in WePredict Isn’t Money. It’s Reputation.

Published March 31, 2026

1

The Hollow Game Problem

A play-money prediction market is easy to dismiss. It sounds light. Disposable. A clever mechanic without real consequence. Markets rise and fall, people guess, a leaderboard flashes, and then everyone moves on. That is the graveyard where most play-money systems end up: interesting for a week, noisy for a month, forgotten soon after.

Picture a WhatsApp group — colleagues, friends, cricket obsessives — running a prediction market for an upcoming Test match. Someone calls a 90% chance of India winning. India loses. The group laughs for an hour. By Tuesday, no one remembers, and no one’s behaviour has changed. The next market begins exactly as the last one ended: with careless confidence and no consequence.

That, in compressed form, is why play-money prediction markets have been tried many times and have mostly failed. The failure is not mechanical. The odds engines, the market formats, the user interfaces — those are solvable engineering problems. The failure is structural. Without real stakes, prediction becomes noise.

When losing feels like nothing, people do not think carefully before staking. They guess. They stake on feelings rather than evidence. They claim 90% confidence when they mean 60%, because bravado costs nothing. The market floods with low-signal predictions. Other participants cannot distinguish the genuinely calibrated forecaster from the lucky guesser. Over weeks, the platform loses its signal value — the one thing that made it interesting. And once signal is gone, there is no reason to return.

The pattern is consistent enough to be called a law: a prediction market without real stakes is not a market. It is a game, and games rely on novelty. Novelty fades. Play-money markets simulate the form of a market without creating the consequence of one.

The instinctive solution is money. Real money creates real stakes — losing ₹500 on a wrong call focuses the mind. But money also creates legal complexity, regulatory exposure, and the risk of shifting from a forecasting platform into a gambling product. In India specifically, real-money gaming is a legally fraught category at best and actively hostile at worst. The financial route is not the answer.

The question WePredict is built around is a harder one: can real stakes exist without real money? The answer is yes — but only if the stakes are genuinely felt. Reputation is one such stake. Social standing is another. A persistent track record, visible to everyone who knows you, is a third. These are not theoretical motivators. They are the reason professionals work carefully on problems no one is paying them to solve, the reason academics write papers that will be read by twenty people, the reason a chess player at a local club cares deeply about a rating point that has no cash value whatsoever.

The real stakes are not financial. They are reputational. And reputation is only as powerful as the record that supports it.

 WePredict is built on Mu — an attention currency earned through NeoMails, a daily interactive email, and spent in the prediction marketplace. Mu creates an initial stake: spending it carelessly depletes a balance that took real daily engagement to accumulate. But Mu alone does not create the deeper consequence that changes how people predict. It does not create a record. It does not follow anyone. When Mu is gone it is gone, and the next prediction begins without memory.

For WePredict to escape that failure pattern, it needs something that does follow people. Something that compounds. Something that the serious participant protects and the careless participant damages. That something is the Predictor Score.

2

The Score That Follows You

A chess rating is not a trophy. It is not awarded at a ceremony or handed out for participation. It is a number that compresses an entire history of play — wins, losses, the quality of opponents, the consistency of performance under pressure — into a single figure that updates with every game. A player who earns a high rating cannot fake it. The rating is the evidence. It took time to build, and it can be damaged by a single careless period of play.

A credit score works differently. No one chooses to play it, and it is shaped partly by institutional behaviour rather than personal performance alone. But it does something the chess rating does not: it travels beyond the person. Banks consult it before lending. Landlords check it before leasing. It shapes how others treat you, not just how you regard your own performance. The Predictor Score is closer to the chess rating in how it is built — through performance, over time, by choice — but closer to the credit score in what it eventually does: it becomes the record that others consult before deciding how much to trust what you say.

The Predictor Score works on this dual logic. It is a persistent, compounding record of forecasting accuracy — not a badge given at a moment, not a leaderboard that resets quarterly, but a number that follows a person across every market they enter and every prediction they make. A Score of 1,400 represents a different person than a Score of 400 — not because of a single correct prediction, but because of the accumulated pattern of how that person thinks under uncertainty.

Understanding what the Score measures requires separating accuracy from calibration, and most people conflate the two. Accuracy is whether the prediction was right. Calibration is whether the expressed confidence matched the actual probability. A person who says ‘90% confident’ and is right 90% of the time is perfectly calibrated. A person who says ‘90% confident’ and is right 55% of the time is significantly miscalibrated — not just occasionally wrong, but systematically overconfident.

The Predictor Score rewards calibration, not just accuracy. A confident wrong answer hurts the Score more than an uncertain wrong answer. Saying ‘65% likely’ when one genuinely means 65% is rewarded, even when the outcome goes the other way. Claiming ‘95%’ to appear decisive and then being wrong is penalised severely — Part 5 works through the exact numbers, and the penalty for overclaiming is not proportional to the error. It is catastrophic. This distinction creates an incentive for intellectual honesty. The Score rewards the person who says ‘I don’t know, but here is my best estimate’ over the person who performs certainty they do not have.

Mu tells you how much attention you have earned. Predictor Score tells you how well you use it.

 The second property is compounding. Two years of predictions across hundreds of markets is a fundamentally different record than two weeks of predictions across ten. The Score becomes harder to fake and harder to replicate as it accumulates. A new entrant to WePredict, however skilled, cannot compress eighteen months of consistent, calibrated forecasting into a week of play. Time is built into the architecture.

The third property is consistency across contexts. A Predictor Score does not reset when someone moves from a private group to a public market, or from cricket predictions to business ones. It is one continuous record. The same person who earns credibility forecasting match results in a team Slack group carries that record into public WePredict. The Score travels. This portability gives it weight across both modes: WePredict Private, which runs prediction markets within a closed group visible only to members, and WePredict Public, which opens markets to the full platform. The Score is the single thread connecting both.

Return to the WhatsApp group from Part 1. The same people, the same cricket match, the same wrong call from the overconfident member. But now there is a Predictor Score attached to every name in the group. The wrong call is not forgotten on Tuesday. It is recorded in a Score that everyone can see. The overconfident member watches their number fall. The quieter member who said ‘60%’ — uncertain, honest — watches theirs hold. Over weeks, the group develops a memory it did not have before. Patterns emerge about who is reliable and who performs confidence they do not possess. The Score did not change the people. It made the truth visible.

How the Score is computed

A reputation system only works if people believe the number is real. That depends on how it is computed — and whether it can be gamed.

The Predictor Score is not a win-loss record. A simple right/wrong count would reward lucky guessers and penalise careful forecasters who honestly expressed uncertainty. Instead, the Score measures the accuracy of the expressed probability, not merely the direction of the call. If a participant says 70% on an outcome that happens, they score slightly less than someone who also got it right but said 95%. But the 95% caller carried far more risk for that small gain: if the outcome does not happen, the person who said 70% loses far less than the person who said 95% and was catastrophically wrong. Over many markets, overclaiming costs more than it earns, and honest probabilities come out ahead. Part 5 walks through the mathematics in full.

The Score also weights markets by difficulty. A market where the crowd consensus is 90% on one side — a heavily favoured team, an obvious outcome — contributes almost nothing to anyone’s Score. If the answer was obvious, predicting it correctly demonstrates no judgement. The Score points that matter come from contested markets: uncertain outcomes, genuine dispersion of opinion, questions where the crowd is genuinely split. Easy markets add almost nothing. Difficult markets are where reputations are built.

Finally, the Score is designed to become more stable as it grows. An early Score can move quickly because the sample is small. A mature Score — built over two years of predictions — moves more slowly, because it represents a long record that a single week cannot fairly overturn. An impression formed after one conversation is fragile. A reputation earned over years requires sustained evidence to shift.

3

Two Stories

Story One: The Slack Team

A marketing team at a mid-size brand has been running WePredict Private — prediction markets visible only to their group — in their company Slack for six months. The markets are specific to their world: will this campaign beat last week’s open rate? Will the new homepage variant outperform the control? Will the product launch hit the Q3 target?

Before the Predictor Score existed, these questions had a predictable dynamic. The head of growth dominated the pre-launch conversation. His predictions carried the room not because they were consistently right, but because they were delivered with force. He regularly claimed 90% confidence. He was occasionally correct and frequently wrong. The junior analyst on the team — quieter, more careful — offered 60–65% estimates with reasoning attached. She was overridden most of the time. Her uncertainty was read as lack of conviction.

Six months of Predictor Scores changed the conversation completely. The data told a story no one had articulated before. The junior analyst had the highest Score on the team. Her 60–65% calls were landing at the rate she predicted. She was not uncertain — she was honest. The head of growth’s Score was mediocre. His 90% calls were right about 55% of the time — a gap that, in a proper scoring rule, is severely penalised. He was not confident. He was miscalibrated.

The team began checking Scores before pre-launch reviews. Not formally — no one announced a policy change. But the Score was visible in every Slack thread, and visible things change behaviour. The loudest voice in the room was no longer automatically the most trusted one. The analyst’s estimates started shaping decisions. The head of growth began hedging his confidence calls.

The Predictor Score did not punish the HiPPO (the highest-paid person’s opinion). It simply made the truth visible. And once the truth is visible, it is very difficult to unsee.

Story Two: The Cricket Fan

 A 28-year-old in Mumbai has been participating in WePredict Public — the open platform, visible to all — for eighteen months. He follows cricket obsessively and has made over 340 predictions across IPL matches, Test series, and bilateral ODI tournaments. His Predictor Score has climbed steadily — not because he wins every market, but because his calibration is unusually honest. He says 70% when he means 70%. He says 55% when he is genuinely uncertain, rather than manufacturing confidence to appear decisive.

His Score is now visible in the WePredict leaderboards for his Circles — the named prediction groups he belongs to. Other members check his Score before deciding how to weight his calls in markets they are less certain about. He has become known as a reliable predictor. Not lucky. Not loud. Reliable. That reputation took eighteen months to build. It cannot be bought by someone joining WePredict today and predicting aggressively for two weeks.

The interesting detail is what he protects most. Not his Mu balance — though he earns Mu consistently through daily NeoMails engagement. The Mu comes and goes as he stakes it in markets. What he thinks about carefully before entering a market is the impact on his Score. A careless stake — entering a market he knows nothing about simply because he has Mu to spend — will damage a record he has spent eighteen months building. The consequence is not financial. It is reputational. And that turns out to be a more powerful motivator than money for a person who already has a Score worth protecting.

Mu is what flows through the system, but Predictor Score is what gives the flow meaning.

The Score is the real stake. Mu is the token. Reputation is the game.

The group is the room. The Predictor Score is the passport.

4

Why This Is the Moat

A brand that begins building NeoMails and WePredict today will, in two years, possess something that cannot be bought: a body of Predictor Scores attached to real people, built over hundreds of real markets, across genuine uncertainty. A competitor arriving later with more money and better technology cannot replicate this. The Scores are the accumulated result of time, behaviour, and consistency. None of those can be shortcut by spending more.

This is the difference between a technological moat and a behavioural moat. A technological moat can be matched — a competitor with sufficient resources can build equivalent infrastructure. A behavioural moat cannot, because the behaviour that produced it cannot be manufactured. Two years of daily prediction, calibrated honestly across varied markets, leaves a record that is both unique to the person and impossible to fast-track. The Predictor Score is behavioural in this precise sense. Part 6 sets out the anti-gaming architecture that ensures this record cannot be manufactured by other means.

A Predictor Score built carefully over years may eventually become one of the most honest signals available about the quality of a person’s judgement — more honest than a CV, more consistent than an interview, more durable than a testimonial. What brands do with that signal is still being written.

 Three things change for the Atrium system when the Predictor Score exists.

First, Mu becomes meaningful in a different register. Without the Score, Mu is a genuinely interesting engagement mechanic — a streak reward, a gamification layer, a currency that makes daily email engagement feel like progress. With the Score, Mu is the currency used in a system that produces something real: a reputation record. That changes the psychology of earning and spending it entirely. People protect their Mu not because they want the balance to be high, but because spending it carelessly will damage the record they are building. The Score transforms Mu from a points layer into a stake in a reputation game.

Second, WePredict Private becomes sticky in a way that has nothing to do with the product mechanics. Groups develop persistent hierarchies of trusted predictors over weeks and months. The ranking is visible, persistent, and social: it updates in real time and everyone in the group can see it. That social memory is what makes Private groups return. It is not the cricket markets, though those help. It is the fact that leaving the group means losing the record. And losing the record means losing the standing. A competing platform can offer a better market format, but it cannot import the same social consequence.

Third, WePredict Public becomes credible rather than merely entertaining. Public prediction markets are only valuable if the participants are genuinely trying to be accurate — if the aggregate of predictions reflects real information rather than noise. The Predictor Score creates that incentive not through financial rewards but through reputational ones. A public leaderboard of Predictor Scores is a credibility system: it separates the calibrated from the loud and makes that separation visible over time.

The relationship between Mu and the Score is worth stating clearly one final time. Mu flows — earned in NeoMails, spent in markets, replenished through continued engagement. The Score compounds — built through consistent, calibrated forecasting, damaged by careless staking, impossible to shortcut. A high Mu balance means consistent attention. A high Predictor Score means consistent judgement. The best participants in the WePredict ecosystem will have both, and the two together are what make the system self-reinforcing.

The real stake in WePredict is not money. It is reputation. And once reputation begins to compound, a game becomes a system.

5

The Maths of Calibration

The sections that follow are for readers who want the mechanics.

The Predictor Score is built on a principle that most gamified systems ignore: it is not enough to be right. What matters is how confident you were, and whether that confidence was justified.

The technical foundation is a proper scoring rule derived from the Brier score family — a mathematical function with one defining property: the only way to maximise your expected score over time is to report your genuine belief. Expressing more confidence than you actually have, or less, will on average hurt your score rather than help it. The system creates a structural incentive for honesty about uncertainty.

Prediction Quality

For any resolved market, compute a Prediction Quality score:

PQ = 1 − (p − o)²

Where p is the predicted probability (a decimal between 0 and 1) and o is the outcome (1 if the event happened, 0 if it did not). Three examples show why calibration beats boldness:

 

| Prediction | Outcome | PQ Score |
| --- | --- | --- |
| Said 70% (0.70), it happened | o = 1 | 1 − (0.70 − 1)² = 0.91 |
| Said 95% (0.95), it happened | o = 1 | 1 − (0.95 − 1)² = 0.9975 |
| Said 95% (0.95), it did NOT happen | o = 0 | 1 − (0.95 − 0)² = 0.0975  ← catastrophic |

The third row is the one to focus on. Saying 95% and being wrong produces a PQ of 0.0975 — catastrophically low. Saying 70% and being wrong produces 1 − (0.70 − 0)² = 0.51 — more than five times better, on the same outcome. The penalty for overclaiming is not proportional to the error. It is severe by design.

Equally important: saying 70% and being right (PQ = 0.91) scores notably less than saying 95% and being right (PQ = 0.9975). The system is not punishing confidence. It is punishing unjustified confidence. A participant who genuinely believes 95% and says 95% is rewarded when right. A participant who does not believe 95% but says it anyway to sound decisive will, over time, be wrong at rates that destroy their score.
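The arithmetic above is easy to check in a few lines. What follows is a minimal sketch, not WePredict's actual code: the function names are illustrative, and the only formula used is the PQ = 1 − (p − o)² rule shown in the table. The second function assumes a forecaster whose events really do happen at a rate of `true_rate`, and computes the long-run average PQ for whatever probability they choose to state.

```python
def prediction_quality(p: float, outcome: int) -> float:
    """PQ = 1 - (p - o)^2 for a stated probability p and outcome o (1 or 0)."""
    return 1.0 - (p - outcome) ** 2


def expected_pq(stated: float, true_rate: float) -> float:
    """Long-run average PQ when the event truly occurs with probability true_rate."""
    return (true_rate * prediction_quality(stated, 1)
            + (1.0 - true_rate) * prediction_quality(stated, 0))


if __name__ == "__main__":
    # The rows from the table, plus the "said 70% and was wrong" case.
    for p, o in [(0.70, 1), (0.95, 1), (0.95, 0), (0.70, 0)]:
        print(f"said {p:.2f}, outcome {o}: PQ = {prediction_quality(p, o):.4f}")

    # A forecaster whose calls come true 55% of the time (hypothetical figure).
    for stated in (0.55, 0.70, 0.90):
        print(f"stated {stated:.2f}: expected PQ = {expected_pq(stated, 0.55):.4f}")
```

For a forecaster who is right 55% of the time, stating 55% averages about 0.75, while stating 90% averages 0.63. The honest number gives the highest long-run average, which is the property the scoring rule is chosen for.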

Difficulty weighting — the anti-obvious mechanism

Raw PQ scores are multiplied by a difficulty weight:

D = 4 × c × (1 − c)

Where c is the leave-one-out crowd consensus: the average of all other participants’ predictions, excluding the focal participant’s own prediction. Using leave-one-out prevents the circularity of a participant influencing the difficulty of their own market. This formula (four times the variance of a Bernoulli distribution, so that the maximum is 1 rather than 0.25) peaks at 1.0 when consensus is exactly 50/50 and collapses toward zero as consensus becomes overwhelming:

| Crowd Consensus (c) | Difficulty Weight (D) |
| --- | --- |
| 50% (genuinely split) | 1.00 |
| 70% | 0.84 |
| 85% | 0.51 |
| 90% | 0.36 |
| 95% | 0.19 |
| 99% (monsoon market) | 0.04 |

The weighted contribution of any prediction is therefore:

Weighted contribution = D × PQ

A 99%-consensus market where someone stakes confidently and wins scores: 0.04 × 0.9975 ≈ 0.04. Negligible. The same participant, in a genuinely uncertain market (c = 0.52) where they say 65% and get it right, scores: 1.00 × 0.88 ≈ 0.88. More than twenty times the reward for honest prediction under real uncertainty.
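The difficulty weighting can be sketched the same way. The form D = 4 × c × (1 − c) used here is inferred from the table of weights (it reproduces every row); the function names and the three-person market are hypothetical, included only to show the leave-one-out step.

```python
from statistics import mean


def leave_one_out_consensus(predictions: dict[str, float], participant: str) -> float:
    """Average of everyone else's stated probabilities, excluding `participant`."""
    others = [p for name, p in predictions.items() if name != participant]
    if not others:
        raise ValueError("need at least one other participant")
    return mean(others)


def difficulty_weight(c: float) -> float:
    """D = 4 * c * (1 - c): 1.0 at a 50/50 split, near 0 at overwhelming consensus."""
    return 4.0 * c * (1.0 - c)


def weighted_contribution(p: float, outcome: int, c: float) -> float:
    """Difficulty-weighted Prediction Quality: D * PQ."""
    pq = 1.0 - (p - outcome) ** 2
    return difficulty_weight(c) * pq


if __name__ == "__main__":
    for c in (0.50, 0.70, 0.85, 0.90, 0.95, 0.99):
        print(f"c = {c:.2f} -> D = {difficulty_weight(c):.2f}")

    # Hypothetical market: consensus for "asha" is the mean of the other two.
    market = {"asha": 0.65, "ben": 0.50, "chitra": 0.54}
    c = leave_one_out_consensus(market, "asha")
    print(f"c = {c:.2f}, contribution = {weighted_contribution(0.65, 1, c):.2f}")
```

Because each participant's consensus excludes their own call, two people in the same market can face slightly different difficulty weights. That is the price of removing the circularity.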

Score aggregation and time decay

The overall Predictor Score is a time-weighted, difficulty-weighted average of Prediction Quality across all eligible predictions:

Score = Σ (T × D × PQ) / Σ (T × D)

Both sums run over every eligible prediction. The denominator includes both time weight and difficulty weight, not time weight alone. This means low-difficulty markets contribute near-zero to both numerator and denominator, so they genuinely add almost nothing to the Score rather than merely diluting it. Easy markets are not just penalised; they are structurally inert.

T is a time weight that applies a gentle quarterly decay (λ ≈ 0.90): predictions from eight quarters ago carry a little under half the weight of recent ones (0.90⁸ ≈ 0.43). A strong long-term record cannot be wiped by a bad month, but the Score remains a living reflection of current form rather than a monument to past performance.

The raw Score (ranging 0 to 1) is normalised to a display scale of 0 to 2,000, similar in feel to an Elo chess rating. A Score of 1,400 is legible and comparable across participants in a way that ‘0.71’ is not.
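Putting the pieces together, the aggregation might look something like the sketch below. Two things here are assumptions rather than anything stated above: that the time weight takes the form T = λ raised to the age in quarters, and that the display scale is a simple linear stretch of the raw Score. The `Prediction` record and its field names are also illustrative.

```python
from dataclasses import dataclass

DECAY = 0.90          # quarterly decay factor (lambda in the text)
DISPLAY_MAX = 2000    # top of the display scale


@dataclass
class Prediction:
    p: float             # stated probability
    outcome: int         # 1 if the event happened, else 0
    consensus: float     # leave-one-out crowd consensus c
    age_quarters: float  # quarters elapsed since the market resolved


def raw_score(predictions: list[Prediction]) -> float:
    """Sum(T * D * PQ) / Sum(T * D), with T = DECAY ** age (assumed form)."""
    num = den = 0.0
    for pr in predictions:
        t = DECAY ** pr.age_quarters
        d = 4.0 * pr.consensus * (1.0 - pr.consensus)
        pq = 1.0 - (pr.p - pr.outcome) ** 2
        num += t * d * pq
        den += t * d
    if den == 0.0:
        raise ValueError("no Score-bearing predictions")
    return num / den


def display_score(raw: float) -> int:
    """Linear map from [0, 1] to [0, 2000]; linearity is an assumption."""
    return round(raw * DISPLAY_MAX)


if __name__ == "__main__":
    history = [
        Prediction(p=0.70, outcome=1, consensus=0.50, age_quarters=0),
        Prediction(p=0.95, outcome=0, consensus=0.50, age_quarters=8),
    ]
    print(display_score(raw_score(history)))
```

In this example the two-year-old overclaim still drags the Score down, but by less than it would if both predictions counted equally. That is the decay doing its work.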

Domain sub-scores

Underneath the headline Score sit domain sub-scores: sport, business, politics, entertainment, and others as the platform grows. A participant unusually well-calibrated on IPL outcomes is not necessarily equally strong on quarterly sales forecasts. The headline number gives simplicity. Domain scores give fidelity.

Domain sub-scores use the same formula applied to market subsets. The overall Score is a weighted average of domain sub-scores, weighted by effective information — the sum of T × D within each domain, not the raw count of predictions. This means 100 trivial predictions in one domain do not dominate 20 genuinely difficult predictions in another. Depth of calibration matters; volume alone does not.
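The claim that volume alone does not win can be made concrete. This is a rough sketch with made-up numbers: each record is a (domain, T, D, PQ) tuple, and the helper names are hypothetical. It weights each domain sub-score by its effective information, the sum of T × D, as described above.

```python
from collections import defaultdict

Record = tuple[str, float, float, float]  # (domain, T, D, PQ)


def domain_scores(records: list[Record]) -> dict[str, tuple[float, float]]:
    """Return {domain: (sub_score, effective_information)}."""
    num: dict[str, float] = defaultdict(float)
    den: dict[str, float] = defaultdict(float)
    for domain, t, d, pq in records:
        num[domain] += t * d * pq
        den[domain] += t * d
    return {dom: (num[dom] / den[dom], den[dom]) for dom in den if den[dom] > 0}


def overall_score(records: list[Record]) -> float:
    """Average of domain sub-scores, weighted by effective information."""
    per_domain = domain_scores(records)
    total_info = sum(info for _, info in per_domain.values())
    return sum(score * info for score, info in per_domain.values()) / total_info


if __name__ == "__main__":
    # 100 trivial sport calls (D = 0.04) versus 20 hard business calls (D = 0.96).
    records = [("sport", 1.0, 0.04, 0.9975)] * 100 + [("business", 1.0, 0.96, 0.60)] * 20
    for domain, (score, info) in domain_scores(records).items():
        print(f"{domain}: sub-score {score:.3f}, effective information {info:.1f}")
    print(f"overall: {overall_score(records):.3f}")
```

Here the hundred easy sport calls carry an effective information of about 4, against about 19 for the twenty hard business calls, so the overall figure sits close to the business sub-score. A plain count-weighted average would have put it above 0.9.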

6

The Anti-Gaming Architecture

A scoring system worth building is a scoring system worth attacking. The Predictor Score is designed with the assumption that some participants will try to game it from day one, and that the response cannot rely on human moderation at scale. The defences have to be structural.

The monsoon market problem

The simplest gaming attempt: a participant creates a Private WePredict market — ‘Will it rain in Mumbai tomorrow?’ during peak monsoon season — invites cooperating accounts, stakes 99%, resolves it, and repeats a hundred times.

Even after a hundred such orchestrated markets, the impact on the display Score would be negligible — the difficulty weighting ensures near-zero contribution to both numerator and denominator of the weighted average.

The effort is economically irrational before any further safeguards apply. But difficulty weighting alone is not sufficient, because a determined participant might seek genuinely uncertain markets and manipulate resolution. Five structural gates close the remaining gaps.

The five gates

Gate 1 — Entropy floor — A market only becomes Score-eligible if the entropy of participant predictions at close exceeds a minimum threshold (H > 0.5 bits, computed as H = −c × log₂c − (1−c) × log₂(1−c)). At 90% consensus, H ≈ 0.47 bits — below threshold, not counted. At 75% consensus, H ≈ 0.81 bits — eligible. This gate is computed automatically from participant behaviour, not set by the market creator.

Gate 2 — Minimum distinct participants — For a market to update the global Predictor Score, at least ten distinct accounts must have predicted. A collusive group of three cannot generate meaningful Score movement for each other. This gate creates an important distinction: small groups still generate a local group Score visible within their circle — members see each other’s relative rankings — but only markets passing this gate affect the global Score that travels with a participant everywhere.

Gate 3 — Creator exclusion — The account that creates a Private market earns zero Score from it, regardless of outcome. Creator exclusion is absolute for privately-created markets. On platform-curated public markets — where resolution is external and no participant controls closure or adjudication — creators may participate on equal terms. This preserves the incentive to create good markets without creating the incentive to manufacture easy ones.

Gate 4 — Maturity multiplier — New accounts begin with a suppressed Score weight that rises as the participant accumulates eligible predictions across distinct domains:

After 10 eligible predictions, the multiplier is 0.18. After 50, it reaches 0.63. After 150, it reaches 0.95. A freshly created account cannot sprint to a high Score through a burst of activity. The record must be built over time across varied domains.

Gate 5 — Single-market cap — No individual market can move the overall Score by more than a set ceiling, regardless of difficulty or expressed confidence. The Score must reflect a pattern, not a moment. One spectacular call cannot inflate an otherwise weak record.
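The five gates can be sketched as small checks. The entropy formula and the two thresholds (0.5 bits, ten accounts) come from the text. Everything else is an assumption made for illustration: the function names, the `ceiling` parameter for Gate 5 (the text gives no value), and the form 1 − e^(−n/50) for the maturity multiplier, which is chosen only because it reproduces the three quoted values (0.18, 0.63, 0.95) and ignores the "distinct domains" requirement.

```python
import math

ENTROPY_FLOOR_BITS = 0.5  # Gate 1 threshold from the text
MIN_PARTICIPANTS = 10     # Gate 2 threshold from the text


def binary_entropy(c):
    """H = -c*log2(c) - (1-c)*log2(1-c), in bits; 0 at c = 0 or c = 1."""
    if c <= 0.0 or c >= 1.0:
        return 0.0
    return -c * math.log2(c) - (1 - c) * math.log2(1 - c)


def is_score_eligible(consensus, n_participants, is_private, is_creator):
    """Gates 1 to 3 for one account on one market."""
    if binary_entropy(consensus) <= ENTROPY_FLOOR_BITS:
        return False  # Gate 1: crowd was too certain
    if n_participants < MIN_PARTICIPANTS:
        return False  # Gate 2: too few distinct accounts
    if is_private and is_creator:
        return False  # Gate 3: creators earn nothing on their own Private markets
    return True


def maturity_multiplier(n_eligible, scale=50.0):
    """Gate 4, assumed form: 1 - exp(-n / scale)."""
    return 1.0 - math.exp(-n_eligible / scale)


def cap_market_move(delta, ceiling):
    """Gate 5: clamp one market's effect on the Score to +/- ceiling (hypothetical value)."""
    return max(-ceiling, min(ceiling, delta))


print(round(binary_entropy(0.90), 2), round(binary_entropy(0.75), 2))
print([round(maturity_multiplier(n), 2) for n in (10, 50, 150)])
```

The gates are deliberately cheap to evaluate: each depends only on observable market data (consensus at close, participant count, creator identity, prediction history), which is what allows them to run without human moderation.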

What passes through the gates

The gates do not reduce the volume of Score-eligible predictions for good-faith participants. A genuinely uncertain market — closely contested, widely participated, externally resolved — passes every gate and contributes fully. A difficult call, in a market the crowd found hard, in a domain with prior predictions, on a mature account, is exactly what the Score is designed to reward. The gates make gaming economically irrational. The effort required to manufacture a high Score through artificial means substantially exceeds the effort required to forecast honestly.

Cluster detection

One additional mechanism operates at the network level rather than the market level. If a cluster of accounts (identifiable by graph proximity: same markets, correlated predictions, common creators) shows statistically anomalous mutual agreement, their intra-cluster predictions are down-weighted automatically. Standard anomaly-detection techniques, applied to correlated prediction patterns and resolution behaviour, are sufficient to identify coordinated behaviour without requiring certainty. A mild anomaly triggers mild down-weighting. A severe anomaly triggers near-zero weighting. The system does not need to prove fraud. It needs only to ensure that genuine uncertainty, not coordinated certainty, drives Score movement.
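The graded response (mild anomaly, mild down-weighting; severe anomaly, near-zero weighting) could be expressed as a smooth mapping from an anomaly measure to a weight. The sketch below is purely hypothetical: it assumes the anomaly is summarised as a z-score, and the `threshold` and `steepness` parameters and the exponential shape are illustrative choices, not anything the text specifies. It does not cover how the cluster or its z-score would be detected.

```python
import math


def cluster_weight(z_score, threshold=2.0, steepness=1.0):
    """Map a cluster's anomaly z-score to a weight in (0, 1].

    Below the threshold, predictions keep full weight. Above it, the weight
    decays exponentially with the excess (hypothetical functional form).
    """
    excess = max(0.0, z_score - threshold)
    return math.exp(-steepness * excess)


for z in (1.0, 2.5, 4.0, 8.0):
    print(z, round(cluster_weight(z), 4))
```

A continuous mapping of this kind matches the point that the system never needs a fraud verdict: there is no binary flag, only a weight that shrinks as coordination becomes harder to explain by chance.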

The result

A participant who spends six months gaming the Predictor Score will accumulate a weak Score — the gates, the difficulty weighting, and the maturity multiplier collectively ensure this. A participant who spends six months forecasting honestly across varied uncertain markets — getting some right, some wrong, always reporting genuine confidence — will accumulate a Score that is both higher and more widely trusted. The gap between the two is legible and widens every month, because the gamed Score cannot compound while the honest one can.

The Score does not need to be ungameable. It needs to make gaming less rewarding than honest forecasting. It does — and by a margin large enough to matter.

The architecture serves the promise

The mathematics in Part 5 and the safeguards in Part 6 exist for one reason: to ensure that the reputation the Predictor Score produces is real. A Score that can be manufactured is not a reputation system. It is a leaderboard — and leaderboards are precisely the problem WePredict was built to escape.

The Predictor Score is the foundation on which everything else in WePredict rests. Mu gives people something to stake. The Score gives them something worth protecting. Together, they create the consequence that transforms a game into a system — and a system into a moat.


Here is a PDF in case some of the graphics are not clear.