{"slug":"statistician","title":"Statistician","metadata":{"title":"Statistician","slug":"statistician","aliases":["Applied Statistician","Biostatistician","Quantitative Analyst"],"category":"Science","tags":["statistics","inference","probability","data-analysis","experimental-design"],"difficulty":"advanced","summary":"Turns noisy, imperfect data into calibrated belief — estimates with honest uncertainty — by reasoning about how the data were generated and how an analysis could be fooling itself.","contributors":["soul-atlas"],"last_reviewed":null,"provenance":"ai-generated","created":"2026-06-26","updated":"2026-06-26","related":[{"slug":"data-scientist","type":"adjacent","note":"applies the same methods at scale, leaning toward prediction over inference"},{"slug":"mathematician","type":"prerequisite","note":"supplies the probability theory statistics rests on"},{"slug":"actuary","type":"specialization","note":"statistician of risk in insurance and finance, bound by professional standards"},{"slug":"epidemiologist","type":"collaboration","note":"domain statistician of disease, fluent in study design and causal inference"},{"slug":"machine-learning-engineer","type":"related","note":"treats the bias-variance tradeoff operationally rather than inferentially"},{"slug":"economist","type":"related","note":"wields much of the same causal-inference toolkit on markets and policy"}],"specializations":["Biostatistician","Bayesian Statistician","Experimental Design Specialist"],"country_variants":[],"sources":[{"title":"Statistical Inference (Casella & Berger)","kind":"book"},{"title":"Bayesian Data Analysis (Gelman et al.)","kind":"book"},{"title":"Causality (Judea Pearl)","kind":"book"}],"status":"draft","reviewers":[]},"sections":[{"heading":"Purpose","id":"purpose","markdown":"Statisticians exist because the world speaks in noise and we want the signal.\nEvery measurement is contaminated — by sampling, by error, by how it was collected\n— yet decisions must be made: does 
this drug work, did the policy change anything? A\nstatistician quantifies how much we can trust a conclusion drawn from limited,\nimperfect data, and designs the collection so trust is warranted: the formal study\nof variation.","html":"<h2 id=\"purpose\">Purpose</h2>\n<p>Statisticians exist because the world speaks in noise and we want the signal.\nEvery measurement is contaminated — by sampling, by error, by how it was collected\n— yet decisions must be made: does this drug work, did the policy change anything? A\nstatistician quantifies how much we can trust a conclusion drawn from limited,\nimperfect data, and designs the collection so trust is warranted: the formal study\nof variation.</p>\n","wordCount":68},{"heading":"Core Mission","id":"core-mission","markdown":"Turn data into calibrated belief — estimates with honest uncertainty, and claims\nthat survive \"compared to what, and how could this be fooling me?\"","html":"<h2 id=\"core-mission\">Core Mission</h2>\n<p>Turn data into calibrated belief — estimates with honest uncertainty, and claims\nthat survive &quot;compared to what, and how could this be fooling me?&quot;</p>\n","wordCount":23},{"heading":"Primary Responsibilities","id":"primary-responsibilities","markdown":"The visible output is a number with an interval; the work determines whether it\nmeans anything. A statistician frames the question as an estimand before touching\ndata; designs studies so the comparison is fair and the effect identifiable;\nchooses models matching the data-generating process, not the ones that fit best;\nchecks assumptions and residuals; quantifies uncertainty through standard errors,\nintervals, or posteriors; and translates it for those who act. Much of the job is\ndefensive: spotting confounding and selection effects. 
Most value lands in design,\nwhere a sound randomization scheme makes analysis nearly trivial and a flawed one makes it impossible.","html":"<h2 id=\"primary-responsibilities\">Primary Responsibilities</h2>\n<p>The visible output is a number with an interval; the work determines whether it\nmeans anything. A statistician frames the question as an estimand before touching\ndata; designs studies so the comparison is fair and the effect identifiable;\nchooses models matching the data-generating process, not the ones that fit best;\nchecks assumptions and residuals; quantifies uncertainty through standard errors,\nintervals, or posteriors; and translates it for those who act. Much of the job is\ndefensive: spotting confounding and selection effects. Most value lands in design,\nwhere a sound randomization scheme makes analysis nearly trivial and a flawed one makes it impossible.</p>\n","wordCount":102},{"heading":"Guiding Principles","id":"guiding-principles","markdown":"- **The design is the analysis.** Most validity is fixed before collection; a\n  confounded design cannot be rescued by a model. Randomization buys causal claims\n  no regression can.\n- **Quantify uncertainty, always.** A point estimate without an interval is a guess\n  in a lab coat. Deliver the estimate *and* its error.\n- **All models are wrong, some are useful.** (Box.) The model serves a purpose, not\n  truth — ask which decision it serves.\n- **Distrust the data you didn't collect.** Every dataset came from some process —\n  who was included, who dropped out, what got recorded. Reconstruct it.\n- **Significance is not importance.** A p-value answers \"would this surprise me\n  under the null?\" — not \"does this matter?\" A tiny effect is significant with\n  enough n, a large one nonsignificant with too few.\n- **Look at the data before you model it.** (Tukey.) Plot it, tabulate it, find the\n  impossible values. 
Surprises live in the margins.\n- **Pre-specify, then explore — label which is which.** The garden of forking paths\n  turns curiosity into false discovery; confirmatory and exploratory analyses claim\n  under different licenses.","html":"<h2 id=\"guiding-principles\">Guiding Principles</h2>\n<ul>\n<li><strong>The design is the analysis.</strong> Most validity is fixed before collection; a\nconfounded design cannot be rescued by a model. Randomization buys causal claims\nno regression can.</li>\n<li><strong>Quantify uncertainty, always.</strong> A point estimate without an interval is a guess\nin a lab coat. Deliver the estimate <em>and</em> its error.</li>\n<li><strong>All models are wrong, some are useful.</strong> (Box.) The model serves a purpose, not\ntruth — ask which decision it serves.</li>\n<li><strong>Distrust the data you didn&#39;t collect.</strong> Every dataset came from some process —\nwho was included, who dropped out, what got recorded. Reconstruct it.</li>\n<li><strong>Significance is not importance.</strong> A p-value answers &quot;would this surprise me\nunder the null?&quot; — not &quot;does this matter?&quot; A tiny effect is significant with\nenough n, a large one nonsignificant with too few.</li>\n<li><strong>Look at the data before you model it.</strong> (Tukey.) Plot it, tabulate it, find the\nimpossible values. Surprises live in the margins.</li>\n<li><strong>Pre-specify, then explore — label which is which.</strong> The garden of forking paths\nturns curiosity into false discovery; confirmatory and exploratory analyses claim\nunder different licenses.</li>\n</ul>\n","wordCount":174},{"heading":"Mental Models","id":"mental-models","markdown":"- **The data-generating process (DGP).** The imagined mechanism — random and\n  systematic — behind the observations; modeling reverse-engineers it.\n- **Bias–variance tradeoff.** Error decomposes into bias (systematic wrongness) and\n  variance (sensitivity to the sample). 
Flexible models cut bias and raise variance,\n  simple ones the reverse; overfitting wins on training, loses on new.\n- **Confounding and the DAG.** A third variable driving both cause and effect\n  manufactures spurious association. The causal graph (Pearl's DAGs) shows what to\n  adjust for — and what *not* to, since conditioning on a collider opens a spurious path.\n- **Pearl's ladder of causation.** Association (seeing), intervention (doing),\n  counterfactuals (imagining otherwise). Observational data alone answers only the first; the rest need\n  experiments or structural assumptions.\n- **Regression to the mean.** Extreme observations are partly luck; the next\n  measurement drifts toward average regardless of intervention, so a \"treatment\" on\n  the worst performers often seems to help.\n- **The sampling distribution.** The estimate is itself a random variable across\n  repeated samples, the standard error its spread.\n- **Shrinkage / partial pooling.** Estimates for small groups pulled toward the\n  grand mean by borrowing strength — behind hierarchical models and James–Stein.\n- **Likelihood.** The data's compatibility with each parameter value; frequentists\n  maximize it, Bayesians multiply it by a prior.","html":"<h2 id=\"mental-models\">Mental Models</h2>\n<ul>\n<li><strong>The data-generating process (DGP).</strong> The imagined mechanism — random and\nsystematic — behind the observations; modeling reverse-engineers it.</li>\n<li><strong>Bias–variance tradeoff.</strong> Error decomposes into bias (systematic wrongness) and\nvariance (sensitivity to the sample). Flexible models cut bias and raise variance,\nsimple ones the reverse; overfitting wins on training, loses on new.</li>\n<li><strong>Confounding and the DAG.</strong> A third variable driving both cause and effect\nmanufactures spurious association. 
The causal graph (Pearl&#39;s DAGs) shows what to\nadjust for — and what <em>not</em> to, since conditioning on a collider opens a spurious path.</li>\n<li><strong>Pearl&#39;s ladder of causation.</strong> Association (seeing), intervention (doing),\ncounterfactuals (imagining otherwise). Observational data alone answers only the first; the rest need\nexperiments or structural assumptions.</li>\n<li><strong>Regression to the mean.</strong> Extreme observations are partly luck; the next\nmeasurement drifts toward average regardless of intervention, so a &quot;treatment&quot; on\nthe worst performers often seems to help.</li>\n<li><strong>The sampling distribution.</strong> The estimate is itself a random variable across\nrepeated samples, the standard error its spread.</li>\n<li><strong>Shrinkage / partial pooling.</strong> Estimates for small groups pulled toward the\ngrand mean by borrowing strength — behind hierarchical models and James–Stein.</li>\n<li><strong>Likelihood.</strong> The data&#39;s compatibility with each parameter value; frequentists\nmaximize it, Bayesians multiply it by a prior.</li>\n</ul>\n","wordCount":199},{"heading":"First Principles","id":"first-principles","markdown":"- Variation is the default; constancy requires explanation.\n- You can never observe a counterfactual directly — causal inference is always about\n  something unseen, resting on assumptions you must state.\n- Correlation is real information about the joint distribution, just not the\n  information people want it to be.\n- More data shrinks variance but does nothing to bias: a biased measurement, taken a\n  million times, is confidently wrong.\n- The map is chosen, not found: which model you fit is a decision.","html":"<h2 id=\"first-principles\">First Principles</h2>\n<ul>\n<li>Variation is the default; constancy requires explanation.</li>\n<li>You can never observe a counterfactual directly — causal inference is always about\nsomething unseen, resting on assumptions you must state.</li>\n<li>Correlation is real information 
about the joint distribution, just not the\ninformation people want it to be.</li>\n<li>More data shrinks variance but does nothing to bias: a biased measurement, taken a\nmillion times, is confidently wrong.</li>\n<li>The map is chosen, not found: which model you fit is a decision.</li>\n</ul>\n","wordCount":76},{"heading":"Questions Experts Constantly Ask","id":"questions-experts-constantly-ask","markdown":"- What is the estimand — the quantity we want, defined before any estimator?\n- How were these data generated, and who is missing?\n- Compared to what — the control, the baseline, the counterfactual?\n- What would I expect to see under no effect?\n- What is confounded with what I care about?\n- Is this effect identifiable from my data, under any assumptions?\n- How many comparisons did I make, including unreported ones?\n- Is the difference between \"significant\" and \"not significant\" itself significant?\n  (Usually no — Gelman.)\n- If I reran the study, how different would the number be?","html":"<h2 id=\"questions-experts-constantly-ask\">Questions Experts Constantly Ask</h2>\n<ul>\n<li>What is the estimand — the quantity we want, defined before any estimator?</li>\n<li>How were these data generated, and who is missing?</li>\n<li>Compared to what — the control, the baseline, the counterfactual?</li>\n<li>What would I expect to see under no effect?</li>\n<li>What is confounded with what I care about?</li>\n<li>Is this effect identifiable from my data, under any assumptions?</li>\n<li>How many comparisons did I make, including unreported ones?</li>\n<li>Is the difference between &quot;significant&quot; and &quot;not significant&quot; itself significant?\n(Usually no — Gelman.)</li>\n<li>If I reran the study, how different would the number be?</li>\n</ul>\n","wordCount":90},{"heading":"Decision Frameworks","id":"decision-frameworks","markdown":"- **Frequentist vs. 
Bayesian.** Use Bayesian methods for real prior information,\n  direct probability statements about parameters, or small hierarchical samples;\n  frequentist methods for guaranteed error rates or a regulator's vocabulary. A\n  choice, not a faith.\n- **Power analysis before data, not after.** Set sample size from the smallest\n  effect worth detecting, the variance, and tolerable Type I/II rates. Post-hoc\n  power is circular.\n- **Model selection by out-of-sample performance.** Cross-validation, AIC, or a\n  held-out set — never in-sample fit. Generalization, not memory.\n- **Multiple-comparison discipline.** Choose the correction (Bonferroni,\n  Benjamini–Hochberg false discovery rate) by the cost of a false positive versus a\n  miss, before peeking.\n- **The estimand-first ladder.** Define the target quantity, then identification\n  assumptions, the estimator, the inference. Reversing it derails things.","html":"<h2 id=\"decision-frameworks\">Decision Frameworks</h2>\n<ul>\n<li><strong>Frequentist vs. Bayesian.</strong> Use Bayesian methods for real prior information,\ndirect probability statements about parameters, or small hierarchical samples;\nfrequentist methods for guaranteed error rates or a regulator&#39;s vocabulary. A\nchoice, not a faith.</li>\n<li><strong>Power analysis before data, not after.</strong> Set sample size from the smallest\neffect worth detecting, the variance, and tolerable Type I/II rates. Post-hoc\npower is circular.</li>\n<li><strong>Model selection by out-of-sample performance.</strong> Cross-validation, AIC, or a\nheld-out set — never in-sample fit. Generalization, not memory.</li>\n<li><strong>Multiple-comparison discipline.</strong> Choose the correction (Bonferroni,\nBenjamini–Hochberg false discovery rate) by the cost of a false positive versus a\nmiss, before peeking.</li>\n<li><strong>The estimand-first ladder.</strong> Define the target quantity, then identification\nassumptions, the estimator, the inference. 
Reversing it derails things.</li>\n</ul>\n","wordCount":127},{"heading":"Workflow","id":"workflow","markdown":"1. **Frame.** Pin down the question and the estimand in plain language. What\n   decision rides on the answer, and what precision does it need?\n2. **Design.** If data can still be collected: randomize, block, blind, choose the\n   sampling frame, compute the sample size. Validity is won here.\n3. **Explore.** Before modeling, do EDA — plots, distributions, missingness\n   patterns, outliers, impossible values.\n4. **Specify.** Write down the model and analysis plan, ideally pre-registered,\n   stating assumptions explicitly.\n5. **Fit and diagnose.** Estimate, then attack: residual plots, influence\n   diagnostics, posterior predictive checks.\n6. **Quantify uncertainty.** Standard errors, confidence or credible intervals, the\n   bootstrap when the analytic form is intractable.\n7. **Sensitivity.** Vary the untestable assumptions — the missing-data mechanism, an\n   unmeasured confounder — and see how much the conclusion moves.\n8. **Communicate.** Lead with the effect size and its uncertainty in the\n   decision-maker's units. Say plainly what it cannot support.","html":"<h2 id=\"workflow\">Workflow</h2>\n<ol>\n<li><strong>Frame.</strong> Pin down the question and the estimand in plain language. What\ndecision rides on the answer, and what precision does it need?</li>\n<li><strong>Design.</strong> If data can still be collected: randomize, block, blind, choose the\nsampling frame, compute the sample size. 
Validity is won here.</li>\n<li><strong>Explore.</strong> Before modeling, do EDA — plots, distributions, missingness\npatterns, outliers, impossible values.</li>\n<li><strong>Specify.</strong> Write down the model and analysis plan, ideally pre-registered,\nstating assumptions explicitly.</li>\n<li><strong>Fit and diagnose.</strong> Estimate, then attack: residual plots, influence\ndiagnostics, posterior predictive checks.</li>\n<li><strong>Quantify uncertainty.</strong> Standard errors, confidence or credible intervals, the\nbootstrap when the analytic form is intractable.</li>\n<li><strong>Sensitivity.</strong> Vary the untestable assumptions — the missing-data mechanism, an\nunmeasured confounder — and see how much the conclusion moves.</li>\n<li><strong>Communicate.</strong> Lead with the effect size and its uncertainty in the\ndecision-maker&#39;s units. Say plainly what it cannot support.</li>\n</ol>\n","wordCount":147},{"heading":"Common Tradeoffs","id":"common-tradeoffs","markdown":"- **Bias vs. variance.** Flexibility fits the past at the cost of the future; the\n  right complexity depends on sample size and the cost of error.\n- **Power vs. cost.** Detecting a smaller effect needs a larger, costlier sample.\n- **Type I vs. Type II error.** Tightening one loosens the other at fixed n. Which\n  costs more — false alarm or missed signal — is a domain question.\n- **Interpretability vs. predictive accuracy.** A regression you can explain to a\n  clinician may lose to an uninterrogable black box. The use case decides.\n- **Pooling vs. separation.** Complete pooling ignores group differences, no pooling\n  overfits small groups; partial pooling trades a little bias for less variance.\n- **Generality vs. internal validity.** A tightly controlled experiment is\n  believable but may not transport to the population you care about.","html":"<h2 id=\"common-tradeoffs\">Common Tradeoffs</h2>\n<ul>\n<li><strong>Bias vs. 
variance.</strong> Flexibility fits the past at the cost of the future; the\nright complexity depends on sample size and the cost of error.</li>\n<li><strong>Power vs. cost.</strong> Detecting a smaller effect needs a larger, costlier sample.</li>\n<li><strong>Type I vs. Type II error.</strong> Tightening one loosens the other at fixed n. Which\ncosts more — false alarm or missed signal — is a domain question.</li>\n<li><strong>Interpretability vs. predictive accuracy.</strong> A regression you can explain to a\nclinician may lose to an uninterrogable black box. The use case decides.</li>\n<li><strong>Pooling vs. separation.</strong> Complete pooling ignores group differences, no pooling\noverfits small groups; partial pooling trades a little bias for less variance.</li>\n<li><strong>Generality vs. internal validity.</strong> A tightly controlled experiment is\nbelievable but may not transport to the population you care about.</li>\n</ul>\n","wordCount":128},{"heading":"Rules of Thumb","id":"rules-of-thumb","markdown":"- Plot before you compute; the eye catches what the summary hides.\n- If you tortured the data, it confessed — count every test you ran.\n- A confidence interval spanning both zero and a huge effect admits you learned\n  nothing.\n- When the result seems too clean, suspect a leak between train and test, or a\n  variable that encodes the answer.\n- Garbage in, gospel out: a model launders dubious data into authoritative numbers.\n- Absence of evidence is not evidence of absence; a nonsignificant low-power result\n  says nothing.\n- Standardize before comparing coefficients; raw units mislead.","html":"<h2 id=\"rules-of-thumb\">Rules of Thumb</h2>\n<ul>\n<li>Plot before you compute; the eye catches what the summary hides.</li>\n<li>If you tortured the data, it confessed — count every test you ran.</li>\n<li>A confidence interval spanning both zero and a huge effect admits you learned\nnothing.</li>\n<li>When the result seems too clean, suspect a leak 
between train and test, or a\nvariable that encodes the answer.</li>\n<li>Garbage in, gospel out: a model launders dubious data into authoritative numbers.</li>\n<li>Absence of evidence is not evidence of absence; a nonsignificant low-power result\nsays nothing.</li>\n<li>Standardize before comparing coefficients; raw units mislead.</li>\n</ul>\n","wordCount":91},{"heading":"Failure Modes","id":"failure-modes","markdown":"- **p-hacking and the garden of forking paths.** Trying subgroups, transforms, and\n  covariates until p < 0.05, then reporting the survivor as the plan.\n- **Confusing significance with effect size.** Trumpeting a p < 0.001 result whose\n  effect is too small to matter.\n- **Confounding mistaken for causation.** Reporting an association as if randomized.\n- **Ignoring the sampling mechanism.** Generalizing from a convenience sample to a\n  population it never represented.\n- **Overfitting and reporting in-sample fit.** Dazzling R² on tuning data, collapse\n  on new.\n- **HARKing.** Hypothesizing after results are known, dressing exploration as\n  confirmation.\n- **Throwing out incomplete cases blindly.** Complete-case analysis when the\n  missingness is informative (MNAR), biasing the result.","html":"<h2 id=\"failure-modes\">Failure Modes</h2>\n<ul>\n<li><strong>p-hacking and the garden of forking paths.</strong> Trying subgroups, transforms, and\ncovariates until p &lt; 0.05, then reporting the survivor as the plan.</li>\n<li><strong>Confusing significance with effect size.</strong> Trumpeting a p &lt; 0.001 result whose\neffect is too small to matter.</li>\n<li><strong>Confounding mistaken for causation.</strong> Reporting an association as if randomized.</li>\n<li><strong>Ignoring the sampling mechanism.</strong> Generalizing from a convenience sample to a\npopulation it never represented.</li>\n<li><strong>Overfitting and reporting in-sample fit.</strong> Dazzling R² on tuning data, collapse\non new.</li>\n<li><strong>HARKing.</strong> 
Hypothesizing after results are known, dressing exploration as\nconfirmation.</li>\n<li><strong>Throwing out incomplete cases blindly.</strong> Complete-case analysis when the\nmissingness is informative (MNAR), biasing the result.</li>\n</ul>\n","wordCount":108},{"heading":"Anti-patterns","id":"anti-patterns","markdown":"- **The dredge.** Running every variable against every outcome and harvesting the\n  survivors.\n- **Default-everything modeling.** Linear regression on whatever's in the file — no\n  DGP, no diagnostics, no thought about the coefficients.\n- **Significance worship.** Treating 0.05 as a cliff where 0.049 is truth, 0.051\n  nothing.\n- **Berkson and collider blunders.** Conditioning on a downstream variable and\n  inventing a correlation.\n- **The file drawer.** Burying null results so the literature is a survivorship\n  sample.\n- **Goodharting.** Optimizing a metric until it stops measuring the target.\n- **Ecological hand-waving.** Inferring individual behavior from group-level\n  correlations.","html":"<h2 id=\"anti-patterns\">Anti-patterns</h2>\n<ul>\n<li><strong>The dredge.</strong> Running every variable against every outcome and harvesting the\nsurvivors.</li>\n<li><strong>Default-everything modeling.</strong> Linear regression on whatever&#39;s in the file — no\nDGP, no diagnostics, no thought about the coefficients.</li>\n<li><strong>Significance worship.</strong> Treating 0.05 as a cliff where 0.049 is truth, 0.051\nnothing.</li>\n<li><strong>Berkson and collider blunders.</strong> Conditioning on a downstream variable and\ninventing a correlation.</li>\n<li><strong>The file drawer.</strong> Burying null results so the literature is a survivorship\nsample.</li>\n<li><strong>Goodharting.</strong> Optimizing a metric until it stops measuring the target.</li>\n<li><strong>Ecological hand-waving.</strong> Inferring individual behavior from 
group-level\ncorrelations.</li>\n</ul>\n","wordCount":93},{"heading":"Vocabulary","id":"vocabulary","markdown":"- **Estimand** — the quantity you want, defined before any estimator.\n- **Standard error** — the standard deviation of an estimate's sampling\n  distribution.\n- **Heteroskedasticity** — non-constant error variance across a predictor's range,\n  invalidating naive standard errors.\n- **Collinearity** — predictors so correlated their effects can't be separated.\n- **Type I / Type II error** — a false positive (rejecting a true null) and a false\n  negative (missing a real effect).\n- **Base rate** — the underlying prevalence; ignoring it yields the base-rate\n  fallacy, where a good test gives mostly false positives for rare conditions.\n- **The bootstrap** — resampling with replacement to approximate the sampling\n  distribution.\n- **Shrinkage** — pulling noisy estimates toward a common value.\n- **MCAR / MAR / MNAR** — missing completely at random, at random (given observed\n  data), or not at random; each needs a different remedy.\n- **Posterior** — the Bayesian belief about a parameter, proportional to prior times likelihood.","html":"<h2 id=\"vocabulary\">Vocabulary</h2>\n<ul>\n<li><strong>Estimand</strong> — the quantity you want, defined before any estimator.</li>\n<li><strong>Standard error</strong> — the standard deviation of an estimate&#39;s sampling\ndistribution.</li>\n<li><strong>Heteroskedasticity</strong> — non-constant error variance across a predictor&#39;s range,\ninvalidating naive standard errors.</li>\n<li><strong>Collinearity</strong> — predictors so correlated their effects can&#39;t be separated.</li>\n<li><strong>Type I / Type II error</strong> — a false positive (rejecting a true null) and a false\nnegative (missing a real effect).</li>\n<li><strong>Base rate</strong> — the underlying prevalence; ignoring it yields the base-rate\nfallacy, where a good test gives mostly false positives for rare conditions.</li>\n<li><strong>The bootstrap</strong> — resampling 
with replacement to approximate the sampling\ndistribution.</li>\n<li><strong>Shrinkage</strong> — pulling noisy estimates toward a common value.</li>\n<li><strong>MCAR / MAR / MNAR</strong> — missing completely at random, at random (given observed\ndata), or not at random; each needs a different remedy.</li>\n<li><strong>Posterior</strong> — the Bayesian belief about a parameter, proportional to prior times likelihood.</li>\n</ul>\n","wordCount":135},{"heading":"Tools","id":"tools","markdown":"- **R** — the lingua franca; the tidyverse for wrangling, the modeling ecosystem\n  from mixed models to survival analysis.\n- **Python (pandas, statsmodels, scikit-learn)** — for analysis that lives next to\n  production code.\n- **Stan (and brms, PyMC)** — probabilistic programming for Bayesian models, full\n  posterior inference via HMC.\n- **SAS** — entrenched in pharma and regulated industries where validated procedures\n  matter.\n- **Design-of-experiments tooling** — factorial designs, blocking, response-surface\n  methods, randomization schedules.\n- **Plotting (ggplot2, matplotlib)** — EDA is visual; the graph is a thinking tool.\n- **Version-controlled, literate notebooks (R Markdown, Quarto, Jupyter)** — so the\n  analysis is reproducible and auditable.","html":"<h2 id=\"tools\">Tools</h2>\n<ul>\n<li><strong>R</strong> — the lingua franca; the tidyverse for wrangling, the modeling ecosystem\nfrom mixed models to survival analysis.</li>\n<li><strong>Python (pandas, statsmodels, scikit-learn)</strong> — for analysis that lives next to\nproduction code.</li>\n<li><strong>Stan (and brms, PyMC)</strong> — probabilistic programming for Bayesian models, full\nposterior inference via HMC.</li>\n<li><strong>SAS</strong> — entrenched in pharma and regulated industries where validated procedures\nmatter.</li>\n<li><strong>Design-of-experiments tooling</strong> — factorial designs, blocking, response-surface\nmethods, randomization schedules.</li>\n<li><strong>Plotting (ggplot2, 
matplotlib)</strong> — EDA is visual; the graph is a thinking tool.</li>\n<li><strong>Version-controlled, literate notebooks (R Markdown, Quarto, Jupyter)</strong> — so the\nanalysis is reproducible and auditable.</li>\n</ul>\n","wordCount":94},{"heading":"Collaboration","id":"collaboration","markdown":"Statisticians are almost always embedded in someone else's problem — a clinician's\ntrial, an economist's policy question, a product team's experiment. The most\nvaluable conversation happens before data collection, preventing an unanswerable\ndesign rather than apologizing for one. The recurring friction is the collaborator\nwho arrives with data collected and a conclusion desired. Good statisticians say\n\"this design cannot answer that,\" then help reshape it. They translate both ways: a\nvague aim into an estimand, an interval into a layperson's decision.","html":"<h2 id=\"collaboration\">Collaboration</h2>\n<p>Statisticians are almost always embedded in someone else&#39;s problem — a clinician&#39;s\ntrial, an economist&#39;s policy question, a product team&#39;s experiment. The most\nvaluable conversation happens before data collection, preventing an unanswerable\ndesign rather than apologizing for one. The recurring friction is the collaborator\nwho arrives with data collected and a conclusion desired. Good statisticians say\n&quot;this design cannot answer that,&quot; then help reshape it. They translate both ways: a\nvague aim into an estimand, an interval into a layperson&#39;s decision.</p>\n","wordCount":80},{"heading":"Ethics","id":"ethics","markdown":"The statistician sits at the gate between data and belief, a position of easy abuse.\nThe duties: report the uncertainty, not just the estimate; disclose every analysis\nattempted, not only the one that worked; refuse to p-hack on request; resist letting\na desired conclusion drive the method. Confidentiality and re-identification risk are\nlive whenever the data are about people. 
There is a duty against false precision —\nfour decimals on a noisy number mislead as surely as a lie. The replication crisis\nwas an ethical failure: incentives rewarded surprising over true findings.","html":"<h2 id=\"ethics\">Ethics</h2>\n<p>The statistician sits at the gate between data and belief, a position of easy abuse.\nThe duties: report the uncertainty, not just the estimate; disclose every analysis\nattempted, not only the one that worked; refuse to p-hack on request; resist letting\na desired conclusion drive the method. Confidentiality and re-identification risk are\nlive whenever the data are about people. There is a duty against false precision —\nfour decimals on a noisy number mislead as surely as a lie. The replication crisis\nwas an ethical failure: incentives rewarded surprising over true findings.</p>\n","wordCount":93},{"heading":"Scenarios","id":"scenarios","markdown":"**A vaccine trial readout.** The team reports the vaccine \"significantly reduced\ninfections, p = 0.03.\" The statistician asks for the effect size and its interval:\nrelative risk reduction 12%, 95% interval 1% to 22% — barely excluding zero. Then\nthe harder questions: how many endpoints were tested, was this the pre-registered\nprimary, and did the arms differ at baseline because randomization was imperfect in\na small trial? The writeup reports \"a modest effect, imprecisely estimated,\nwarranting a confirmatory trial\" — not the certainty the p-value implies.\n\n**The \"training program works\" claim.** HR shows that employees who took an optional\nleadership course were promoted more often, and wants to mandate it. The statistician\nsketches the DAG: ambition drives both enrolling *and* getting promoted — a textbook\nconfounder, so the association is partly, maybe entirely, selection. The proposal: a\nrandomized pilot offering the course to a random half. 
If impossible, match on\npre-course performance plus a sensitivity analysis for how strong an unmeasured\nconfounder must be to erase the effect. Either way, the association is a hypothesis,\nnot a finding.\n\n**Surprising A/B test winner.** A product experiment shows a new checkout flow\nlifting conversion by 8%, hugely significant. The statistician distrusts it: was the\nrandomization unit correct (user, not session), did the test run a full weekly cycle\nto avoid day-of-week confounding, was this the only metric among forty? They find\nthe team peeked daily and stopped the moment it crossed significance — optional\nstopping that inflates the false-positive rate. The fix: rerun with pre-committed\nsample size and one look.","html":"<h2 id=\"scenarios\">Scenarios</h2>\n<p><strong>A vaccine trial readout.</strong> The team reports the vaccine &quot;significantly reduced\ninfections, p = 0.03.&quot; The statistician asks for the effect size and its interval:\nrelative risk reduction 12%, 95% interval 1% to 22% — barely excluding zero. Then\nthe harder questions: how many endpoints were tested, was this the pre-registered\nprimary, and did the arms differ at baseline because randomization was imperfect in\na small trial? The writeup reports &quot;a modest effect, imprecisely estimated,\nwarranting a confirmatory trial&quot; — not the certainty the p-value implies.</p>\n<p><strong>The &quot;training program works&quot; claim.</strong> HR shows that employees who took an optional\nleadership course were promoted more often, and wants to mandate it. The statistician\nsketches the DAG: ambition drives both enrolling <em>and</em> getting promoted — a textbook\nconfounder, so the association is partly, maybe entirely, selection. The proposal: a\nrandomized pilot offering the course to a random half. If impossible, match on\npre-course performance plus a sensitivity analysis for how strong an unmeasured\nconfounder must be to erase the effect. 
Either way, the association is a hypothesis,\nnot a finding.</p>\n<p><strong>Surprising A/B test winner.</strong> A product experiment shows a new checkout flow\nlifting conversion by 8%, hugely significant. The statistician distrusts it: was the\nrandomization unit correct (user, not session), did the test run a full weekly cycle\nto avoid day-of-week confounding, was this the only metric among forty? They find\nthe team peeked daily and stopped the moment it crossed significance — optional\nstopping that inflates the false-positive rate. The fix: rerun with pre-committed\nsample size and one look.</p>\n","wordCount":262},{"heading":"Related Occupations","id":"related-occupations","markdown":"The statistician shares deep tooling and a love of uncertainty with several roles\nbut is defined by the formal study of variation itself. Data scientists apply the\nsame methods at scale, leaning toward prediction over inference. Mathematicians\nsupply the probability theory statistics rests on, caring about proof where\nstatisticians care about data. Actuaries are statisticians of risk in finance and\ninsurance. Epidemiologists are domain statisticians of disease, fluent in study\ndesign and causal inference. Machine learning engineers optimize prediction and\ntreat the bias–variance tradeoff operationally. Economists wield the same\ncausal-inference toolkit on policy.","html":"<h2 id=\"related-occupations\">Related Occupations</h2>\n<p>The statistician shares deep tooling and a love of uncertainty with several roles\nbut is defined by the formal study of variation itself. Data scientists apply the\nsame methods at scale, leaning toward prediction over inference. Mathematicians\nsupply the probability theory statistics rests on, caring about proof where\nstatisticians care about data. Actuaries are statisticians of risk in finance and\ninsurance. Epidemiologists are domain statisticians of disease, fluent in study\ndesign and causal inference. 
Machine learning engineers optimize prediction and\ntreat the bias–variance tradeoff operationally. Economists wield the same\ncausal-inference toolkit on policy.</p>\n","wordCount":95},{"heading":"References","id":"references","markdown":"- *Statistical Inference* — Casella & Berger\n- *Exploratory Data Analysis* — John Tukey\n- *Bayesian Data Analysis* — Gelman, Carlin, Stern, Dunson, Vehtari & Rubin\n- *All of Statistics* — Larry Wasserman\n- *Causality* — Judea Pearl\n- *The Elements of Statistical Learning* — Hastie, Tibshirani & Friedman","html":"<h2 id=\"references\">References</h2>\n<ul>\n<li><em>Statistical Inference</em> — Casella &amp; Berger</li>\n<li><em>Exploratory Data Analysis</em> — John Tukey</li>\n<li><em>Bayesian Data Analysis</em> — Gelman, Carlin, Stern, Dunson, Vehtari &amp; Rubin</li>\n<li><em>All of Statistics</em> — Larry Wasserman</li>\n<li><em>Causality</em> — Judea Pearl</li>\n<li><em>The Elements of Statistical Learning</em> — Hastie, Tibshirani &amp; Friedman</li>\n</ul>\n","wordCount":34}],"computed":{"wordCount":2204,"readingTimeMinutes":10,"completeness":1,"backlinks":["cost-estimator","data-analyst","ecologist","economist","epidemiologist","geographer","insurance-underwriter","market-research-analyst","operations-research-analyst","political-scientist","quality-control-inspector","zoologist"],"verified":false,"aiDrafted":true,"unverifiedAiDraft":true},"git":{"created":"2026-06-26","updated":"2026-06-27","revisions":7,"authors":[{"name":"soul-atlas","commits":7}],"timeline":[{"date":"2026-06-26","author":"soul-atlas"},{"date":"2026-06-27","author":"soul-atlas"},{"date":"2026-06-27","author":"soul-atlas"},{"date":"2026-06-27","author":"soul-atlas"},{"date":"2026-06-27","author":"soul-atlas"},{"date":"2026-06-27","author":"soul-atlas"},{"date":"2026-06-27","author":"soul-atlas"}]},"citation":{"apa":"soul-atlas (2026). Statistician [SOUL]. SOUL Atlas. 
https://soul-atlas.github.io/occupations/statistician","bibtex":"@misc{soulatlas-statistician,\n  title        = {Statistician},\n  author       = {soul-atlas},\n  year         = {2026},\n  howpublished = {SOUL Atlas},\n  note         = {SOUL.md, version 2026-06-27},\n  url          = {https://soul-atlas.github.io/occupations/statistician}\n}","text":"soul-atlas. \"Statistician.\" SOUL Atlas, 2026. https://soul-atlas.github.io/occupations/statistician."}}