How is this calculated?
This page is for the curious — the technical details behind the numbers on the rest of the site. The short version: we smooth win rates toward the community average so investigators with very few games don't show up at the top or bottom by accident, and we show honest error ranges around those rates. Cells with fewer than 5 games are hidden entirely, and the fewer games behind a cell that survives, the harder the prior pulls it toward the average. If you want the formulas, read on.
Ancient One difficulty
We compute the loss rate (defeats divided by total games) for each Ancient One that has at least 5 games logged. The error bars are 95% Wilson score intervals — well-behaved on small samples, unlike the naïve normal approximation. The list is sorted hardest (highest loss rate) first.
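For readers who want to check the intervals by hand, here is a minimal sketch of the standard Wilson score formula applied to a loss rate. The function names and the z = 1.96 constant (the usual 95% value) are illustrative assumptions, not the site's actual code; only the 5-game threshold comes from the text above.

```python
import math

MIN_GAMES = 5  # threshold stated in the text


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("need at least one game")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


def loss_rate_with_interval(defeats, games):
    """Return (loss_rate, low, high), or None when the sample is below the floor."""
    if games < MIN_GAMES:
        return None
    low, high = wilson_interval(defeats, games)
    return defeats / games, low, high
```

Unlike the normal approximation, the interval never leaves the 0 to 1 range, and it stays sensibly wide when a small sample happens to contain zero losses.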
Investigator × Ancient One
For each (investigator, Ancient One) pair, the raw win rate is just wins / games. The shrunk win rate uses a Beta(α, β) prior centered on the global win rate with a prior strength of 10 (α + β = 10), i.e. each cell is pulled toward the mean as if 10 extra games at the global average had been added. Cells with fewer than 5 games are omitted entirely.
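The shrinkage described above amounts to a posterior mean under a Beta prior. A minimal sketch, assuming the prior is parameterized as α = 10 × global rate and β = 10 × (1 − global rate), which matches the "10 extra games at the global average" reading; the function name is hypothetical.

```python
PRIOR_STRENGTH = 10  # "prior strength of 10" from the text
MIN_GAMES = 5        # cells below this are omitted


def shrunk_win_rate(wins, games, global_rate, strength=PRIOR_STRENGTH):
    """Posterior mean of a Beta prior centered on the global win rate.

    Assumed parameterization: alpha = strength * global_rate,
    beta = strength * (1 - global_rate).
    """
    if games < MIN_GAMES:
        return None  # cell is hidden entirely
    alpha = strength * global_rate
    beta = strength * (1 - global_rate)
    return (wins + alpha) / (games + alpha + beta)
```

With a global rate of 40%, a perfect 5-for-5 cell shrinks to (5 + 4) / (5 + 10) = 60%, while a 500-for-1000 cell moves only from 50% to about 49.9%.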
Modern-adjusted scores (the “Modern” toggle)
Many of the comparison charts — the matchup heatmap, the tier list, Ancient One difficulty, the synergy forests, the co-play network, and the investigator safety / consistency / rumor charts — carry an All-time ↔ Modern toggle. Here is what “Modern” does and why it exists.
The problem. A pooled win rate mixes together games played under very different rulesets. Early games were often played with the base box only (no Focus action, no “advance the current Mystery” effect) and were much harder; modern games usually have one or more big-box expansions on the table and are easier. Because of release dates, the original twelve investigators absorbed the brutal early-edition games, while expansion investigators were played almost exclusively under the friendlier modern ruleset (a character can only be fielded once its box has shipped). So the pooled table compares them on unequal footing: the core twelve look worse than they are, expansion Ancient Ones look easier than they are, and pair “synergy” between core-and-newer investigators looks better than between two core ones, purely as an artifact of which edition the games were played under.
The fix — a reference condition. The Modern score recomputes each number on the same slice of games for everyone: those that are (1) EASY — at least one non-Forsaken-Lore expansion on the table — and (2) recent — played in 2018 or later. That is “modern expansion-table play.” Restricting to it removes the edition confound directly — the comparison is simply the subset of games where everyone is judged under comparable conditions, with no extrapolation to combinations nobody actually played (the standardization below only reweights games that exist). In the current data almost every cell still has enough reference games to report; very thin ones are withheld.
Standardization. Even inside that slice, newer investigators are played with more boxes on the table on average, and more boxes make the game a little easier. For the win-rate rankings (matchup, tier list, Ancient One difficulty, network nodes) we therefore standardize: we reweight each entity’s reference games to a common distribution of expansion counts, so the number reflects the same table mix for everyone. Each chart keeps the same shrinkage convention as its all-time counterpart (the matchup, tier-list and network rates keep the Beta prior, while Ancient One difficulty stays a plain unshrunk rate), just computed on the reference slice. Confidence ranges come from a bootstrap that resamples the reference games.
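One common way to do this kind of reweighting is direct standardization: compute the win rate within each expansion-count stratum, then average those rates using one shared set of weights. The sketch below assumes that approach; the function name, the input shape, and the choice to renormalize weights over the strata an entity actually has games in (so nothing is extrapolated) are all assumptions, and the Beta shrinkage and bootstrap steps are left out for brevity.

```python
from collections import defaultdict


def standardized_rate(games, reference_weights):
    """Directly standardized win rate for one entity.

    games: iterable of (expansion_count, won) pairs from the reference slice.
    reference_weights: dict mapping expansion_count -> share of the common
        distribution (hypothetical input; shares need not sum to 1).
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for count, won in games:
        totals[count] += 1
        wins[count] += int(won)

    # Assumption: only strata the entity actually played are used, and the
    # weights are renormalized over them, so no unplayed mix is guessed at.
    covered = {c: w for c, w in reference_weights.items() if totals.get(c, 0) > 0}
    weight_sum = sum(covered.values())
    if weight_sum == 0:
        return None
    return sum(w * wins[c] / totals[c] for c, w in covered.items()) / weight_sum
```

For example, an entity with 1 win in 2 one-box games and 3 wins in 4 two-box games has a raw rate of about 66.7%, but with equal reference weights on the two strata its standardized rate is (0.5 + 0.75) / 2 = 62.5%.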
Why the ratio charts go flat. Synergy and co-play lift are both ratios (a pair’s win rate together divided by their win rate apart), and the edition confound hits the two sides unevenly, which is exactly why pooled core-and-newer pairs sit at the top. Under the Modern toggle, restricting both sides to the reference condition makes most of that apparent chemistry evaporate: lifts collapse toward 1.0×. That is the honest result: once you compare within the same ruleset, very little measurable “teamwork” survives.
What it is, and isn’t. The Modern score is a descriptive, standardized conditional rate (“how this tends to go under modern expansion-table conditions”), not a causal power ranking. Selection into those games isn’t random (stronger players may own more expansions; investigator choice correlates with player type; a team’s outcome is credited to everyone at the table), so read it as a fairer comparison rather than a verdict, and expect a few points of residual difference that this adjustment can’t separate from genuine skill. All-time stays available on every toggle because it is the real historical record: what actually happened in the logged games. Where an entity has fewer than 10 reference-condition games, the Modern number is withheld rather than guessed.
Shrinkage, visually
The leaderboards and tier list use shrunk win rates, computed against a community-mean prior with strength 10. The plot below shows what that actually does to each investigator's number. Investigators with thousands of games barely move; those with only the 30-game floor get yanked toward the middle.
Shrinkage in action
Each dot is an investigator. The diagonal would mean 'no shrinkage'; the horizontal line at the community mean shows the pull. Small-sample investigators (smaller dots) get yanked toward the middle.
Calibration check
A good shrunk estimate should match observed reality in aggregate: if we bucket cells by their predicted (shrunk) win rate, the observed (raw) average inside each bucket should fall close to the diagonal. Systematic deviation would indicate the prior is biased.
Calibration of shrunk win rates
Within each bucket of predicted (shrunk) win rate, we plot the actual observed win rate. Points on the diagonal = well-calibrated. Systematic deviation = the model over- or under-predicts.
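The bucketing behind this check can be sketched as follows. The bucket width (ten equal bins), the unweighted averaging of cells inside a bucket, and the function name are assumptions made for illustration; the site may bin or weight differently.

```python
def calibration_table(cells, n_buckets=10):
    """Group cells by predicted (shrunk) rate and compare with observed (raw).

    cells: iterable of (shrunk_rate, raw_rate) pairs, both in [0, 1].
    Returns rows of (bucket_start, mean_predicted, mean_observed, n_cells).
    """
    buckets = {}
    for shrunk, raw in cells:
        # A shrunk rate of exactly 1.0 falls into the last bucket.
        idx = min(int(shrunk * n_buckets), n_buckets - 1)
        buckets.setdefault(idx, []).append((shrunk, raw))

    rows = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        mean_pred = sum(s for s, _ in pairs) / len(pairs)
        mean_obs = sum(r for _, r in pairs) / len(pairs)
        rows.append((idx / n_buckets, mean_pred, mean_obs, len(pairs)))
    return rows
```

Plotting mean_observed against mean_predicted for each row gives the calibration curve; rows that sit consistently above or below the diagonal point to over- or under-shrinking.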
Doom-track distribution
For each Ancient One we build a histogram of the final doom-track value across all games where it was reported. The ridgeline view normalizes each AO's histogram so the densities are comparable across rows. Bimodal shapes (mass near 0 and near 15) indicate a "swingy" AO where games end decisively in either direction; smooth unimodal shapes indicate predictable pacing.
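A minimal sketch of the per-row normalization, assuming each Ancient One's histogram is scaled so its bars sum to 1 and that the doom track runs from 0 to 15 (the range implied above). The function name is hypothetical, and the real ridgeline may use a different scaling, such as peak height.

```python
from collections import Counter


def normalized_doom_histogram(final_doom_values, max_doom=15):
    """Share of games ending at each doom value, for one Ancient One.

    Games where doom was not reported (None) or falls outside 0..max_doom
    are skipped before normalizing.
    """
    reported = [v for v in final_doom_values
                if v is not None and 0 <= v <= max_doom]
    if not reported:
        return None
    counts = Counter(reported)
    return [counts.get(d, 0) / len(reported) for d in range(max_doom + 1)]
```

Because every row sums to 1, an Ancient One with 40 logged games can be compared by shape with one that has 400.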
Team size
For each (Ancient One, team size) pair we take the win rate over games with that many investigators at the table, keeping only pairs with at least 5 games. The error ranges are 95% Wilson score intervals, the same well-behaved small-sample interval used for Ancient One difficulty. We never extrapolate to team sizes nobody has actually logged.
Co-play network
Nodes are investigators with at least 10 logged games; node size scales with games played and node color compares their win rate to the community average. An edge is drawn between two investigators only if they have shared a table at least 5 times. The edge's lift is their win rate together divided by the average of each one's own overall win rate (across every game they appear in) — above 1 means the pair does better together than their individual rates would suggest, below 1 means worse. Edges close to 1 (no real signal) are dropped so the graph shows only notable chemistry.
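The lift calculation can be sketched as below. The 5-shared-games threshold comes from the text; the function names and the 0.1 cutoff used to drop edges "close to 1" are hypothetical, since the actual cutoff is not stated here.

```python
MIN_SHARED_GAMES = 5  # from the text: pairs need at least 5 shared tables


def edge_lift(wins_together, games_together, rate_a, rate_b):
    """Pair win rate divided by the average of the two overall win rates."""
    if games_together < MIN_SHARED_GAMES:
        return None  # no edge is drawn
    baseline = (rate_a + rate_b) / 2
    if baseline == 0:
        return None  # lift is undefined when neither investigator ever wins
    return (wins_together / games_together) / baseline


def keep_edge(lift, min_deviation=0.1):
    """Drop edges near 1.0; min_deviation is an assumed, illustrative cutoff."""
    return lift is not None and abs(lift - 1.0) >= min_deviation
```

A pair that wins 6 of 10 shared games, with overall rates of 50% and 30%, has a lift of 0.6 / 0.4 = 1.5.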
Rumors
Rumors are recorded per game, not per investigator: when a game logs a passed or failed rumor, we credit that outcome to every investigator who was at the table. So a "rumor success rate" for an investigator really measures "how often rumors were passed in games featuring them," not how often that specific character solved one. We only report investigators with at least 5 rumors logged (counting every rumor in their games, passed or failed), with 95% Wilson intervals. Read these as table-level context, not individual skill.
Preludes
A prelude's effect is the change in win rate between games that used it and games that did not. To avoid crediting a prelude for an easy Ancient One or a fat expansion set, we compare like-with-like: games are stratified by (Ancient One, exact set of expansions), the delta is computed within each stratum and then pooled, and we require at least 3 games on each side of the comparison. The error ranges are 95% bootstrap intervals; where the interval crosses zero, the effect can't be told apart from noise.
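A minimal sketch of the like-with-like comparison, under two assumptions the text leaves open: the 3-game minimum is applied to each side within each stratum, and strata are pooled with weights proportional to their game counts. The function name and input shape are hypothetical, and the bootstrap interval is omitted.

```python
from collections import defaultdict

MIN_PER_SIDE = 3  # from the text; applied per stratum here (an assumption)


def stratified_prelude_delta(games):
    """Pooled within-stratum win-rate difference (with prelude minus without).

    games: iterable of (ancient_one, expansions, used_prelude, won).
    """
    # stratum -> {used_prelude: [wins, games]}
    strata = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for ancient_one, expansions, used, won in games:
        cell = strata[(ancient_one, frozenset(expansions))][bool(used)]
        cell[0] += int(won)
        cell[1] += 1

    weighted, total_weight = 0.0, 0
    for sides in strata.values():
        (w1, n1), (w0, n0) = sides[True], sides[False]
        if n1 < MIN_PER_SIDE or n0 < MIN_PER_SIDE:
            continue  # not enough games on both sides to compare
        weight = n1 + n0  # assumed pooling weight: stratum size
        weighted += weight * (w1 / n1 - w0 / n0)
        total_weight += weight
    return weighted / total_weight if total_weight else None
```

Using a frozenset for the expansion list makes the stratum key independent of the order in which boxes were logged.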
Trends
The Trends page measures popularity — what the community plays, not what wins. Games are bucketed by month and each line is a 3-month rolling average, since recent volume (~55–105 games/month) is too thin for weekly detail. "Share" means different things by dimension: Ancient Ones are one per game, so their monthly shares sum to 100%; investigators and expansions are multivalued (a whole team, several boxes), so we report an appearance rate — the fraction of that month's games featuring the value — which does not sum to 100%. Game length and score are continuous, so we trend their monthly median.
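The two building blocks above, an appearance rate for multivalued dimensions and a 3-month rolling average, can be sketched as follows. The function names are hypothetical, and the trailing (rather than centered) window and the averaging of monthly rates (rather than pooling games across months) are assumptions.

```python
def appearance_rate(month_games, value):
    """Fraction of one month's games that feature `value`.

    month_games: list of sets, one per game (e.g. the investigators at the table).
    """
    if not month_games:
        return None
    return sum(1 for members in month_games if value in members) / len(month_games)


def rolling_mean(series, window=3):
    """Trailing rolling average; early points use however many months exist."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

Because a game counts once for every investigator or expansion it features, appearance rates across all values in a month can add up to well over 100%.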
"Rising & Falling" compares the most recent six months with the six before them, ranked by the change in share (percentage points), tie-broken by relative change. To keep the list honest, an entity must clear at least 10 games in both windows to qualify — a one-off surge from two games to six can't reach the board. The windows are anchored to the latest month with submitted games, not today's date, so a reporting lag never opens an empty "recent" window. Score is reported in fewer than half of all games; its trend is flagged as a rough signal.
Source
The underlying spreadsheet is maintained by the Eldritch Horror community. We fetch the raw submissions tab once a day, normalize, and rebuild.