title: Statistician
slug: statistician
aliases:
  - Applied Statistician
  - Biostatistician
  - Quantitative Analyst
category: Science
tags:
  - statistics
  - inference
  - probability
  - data-analysis
  - experimental-design
difficulty: advanced
summary: >-
  Turns noisy, imperfect data into calibrated belief — estimates with honest
  uncertainty — by reasoning about how the data were generated and how an
  analysis could be fooling itself.
contributors:
  - soul-atlas
last_reviewed: null
provenance: ai-generated
created: '2026-06-26'
updated: '2026-06-26'
related:
  - slug: data-scientist
    type: adjacent
    note: >-
      applies the same methods at scale, leaning toward prediction over
      inference
  - slug: mathematician
    type: prerequisite
    note: supplies the probability theory statistics rests on
  - slug: actuary
    type: specialization
    note: >-
      statistician of risk in insurance and finance, bound by professional
      standards
  - slug: epidemiologist
    type: collaboration
    note: >-
      domain statistician of disease, fluent in study design and causal
      inference
  - slug: machine-learning-engineer
    type: related
    note: treats the bias-variance tradeoff operationally rather than inferentially
  - slug: economist
    type: related
    note: wields much of the same causal-inference toolkit on markets and policy
specializations:
  - Biostatistician
  - Bayesian Statistician
  - Experimental Design Specialist
country_variants: []
sources:
  - title: Statistical Inference (Casella & Berger)
    kind: book
  - title: Bayesian Data Analysis (Gelman et al.)
    kind: book
  - title: Causality (Judea Pearl)
    kind: book
status: draft
reviewers: []
sections:
  - heading: Purpose
    markdown: >-
      Statisticians exist because the world speaks in noise and we want the
      signal.

      Every measurement is contaminated — by sampling, by error, by how it was
      collected

      — yet decisions must be made: does this drug work, did the policy change
      anything? A

      statistician quantifies how much we can trust a conclusion drawn from
      limited,

      imperfect data, and designs the collection so that trust is warranted.

      At bottom, statistics is the formal study of variation.
  - heading: Core Mission
    markdown: >-
      Turn data into calibrated belief — estimates with honest uncertainty, and
      claims

      that survive "compared to what, and how could this be fooling me?"
  - heading: Primary Responsibilities
    markdown: >-
      The visible output is a number with an interval; the work determines
      whether it

      means anything. A statistician frames the question as an estimand before
      touching

      data; designs studies so the comparison is fair and the effect
      identifiable;

      chooses models matching the data-generating process, not the ones that fit
      best;

      checks assumptions and residuals; quantifies uncertainty through standard
      errors,

      intervals, or posteriors; and translates it for those who act. Much of the
      job is

      defensive: spotting confounding and selection effects. Most value lands in
      design,

      where good randomization makes the analysis nearly trivial and a bad
      design makes it impossible.
  - heading: Guiding Principles
    markdown: >-
      - **The design is the analysis.** Most validity is fixed before
      collection; a
        confounded design cannot be rescued by a model. Randomization buys causal claims
        no regression can.
      - **Quantify uncertainty, always.** A point estimate without an interval
      is a guess
        in a lab coat. Deliver the estimate *and* its error.
      - **All models are wrong, some are useful.** (Box.) The model serves a
      purpose, not
        truth — ask which decision it serves.
      - **Distrust the data you didn't collect.** Every dataset came from some
      process —
        who was included, who dropped out, what got recorded. Reconstruct it.
      - **Significance is not importance.** A p-value answers "would this
      surprise me
        under the null?" — not "does this matter?" A tiny effect is significant with
        a large enough n; a large one is nonsignificant when n is too small.
      - **Look at the data before you model it.** (Tukey.) Plot it, tabulate it,
      find the
        impossible values. Surprises live in the margins.
      - **Pre-specify, then explore — label which is which.** The garden of
      forking paths
        turns curiosity into false discovery; confirmatory and exploratory analyses make
        claims under different licenses.
  - heading: Mental Models
    markdown: >-
      - **The data-generating process (DGP).** The imagined mechanism — random
      and
        systematic — behind the observations; modeling reverse-engineers it.
      - **Bias–variance tradeoff.** Expected squared error decomposes into
      squared bias
        (systematic wrongness), variance (sensitivity to the sample), and irreducible
        noise. Flexible models cut bias and raise variance, simple ones the reverse; an
        overfit model wins on training data and loses on new data.
      - **Confounding and the DAG.** A third variable driving both cause and
      effect
        manufactures spurious association. The causal graph (Pearl's DAGs) shows what to
        adjust for, and what *not* to, since conditioning on a collider opens a
        non-causal path.
      - **Pearl's ladder of causation.** Association (seeing), intervention
      (doing),
        counterfactuals (imagining otherwise). Observational data alone answer only the
        first; the higher rungs need experiments or structural assumptions.
      - **Regression to the mean.** Extreme observations are partly luck; the
      next
        measurement drifts toward average regardless of intervention, so a "treatment" on
        the worst performers tends to look effective even when it does nothing.
      - **The sampling distribution.** The estimate is itself a random variable
      across
        repeated samples, the standard error its spread.
      - **Shrinkage / partial pooling.** Estimates for small groups are pulled
      toward the
        grand mean by borrowing strength, the idea behind hierarchical models and
        James–Stein.
      - **Likelihood.** The data's compatibility with each parameter value;
      frequentists
        maximize it, Bayesians multiply it by a prior.
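
      Regression to the mean can be seen in a short simulation. The sketch below
      assumes a hypothetical setup (the names `skill`, `first`, and `second` are
      illustrative, not from any real study): each person has a stable skill plus
      independent luck on two tests, and nothing happens between the tests.

        ```python
        import numpy as np

        rng = np.random.default_rng(0)
        n = 10_000

        # Hypothetical model: stable skill plus independent luck on each test.
        skill = rng.normal(0.0, 1.0, n)
        first = skill + rng.normal(0.0, 1.0, n)
        second = skill + rng.normal(0.0, 1.0, n)  # no intervention in between

        # Select the worst 10% on the first test, as a naive "treatment" study would.
        worst = first <= np.quantile(first, 0.10)

        mean_first = first[worst].mean()
        mean_second = second[worst].mean()
        print(f"worst decile, first test:  {mean_first:.2f}")
        print(f"same people, second test:  {mean_second:.2f}")
        ```

      Under these assumptions the selected group scores closer to average on the
      second test although nothing was done to it, which is why an untreated
      comparison group is needed before crediting any intervention.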
  - heading: First Principles
    markdown: >-
      - Variation is the default; constancy requires explanation.

      - You can never observe a counterfactual directly — causal inference is
      always about
        something unseen, resting on assumptions you must state.
      - Correlation is real information about the joint distribution, just not
      the
        information people want it to be.
      - More data shrinks variance but does nothing to bias: a biased
      measurement, taken a
        million times, is confidently wrong.
      - The map is chosen, not found: which model you fit is a decision.
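

      The point about bias and sample size can be checked numerically. This is a
      minimal sketch with made-up constants (`TRUE_VALUE`, `BIAS`, and `NOISE_SD`
      are assumptions for illustration): a miscalibrated instrument is read
      repeatedly and a naive 95% interval is formed around the sample mean.

        ```python
        import numpy as np

        TRUE_VALUE = 10.0
        BIAS = 0.5        # hypothetical miscalibration of the instrument
        NOISE_SD = 2.0    # hypothetical measurement noise

        def biased_interval(n, rng):
            """Naive 95% interval for the mean of n biased, noisy measurements."""
            x = TRUE_VALUE + BIAS + rng.normal(0.0, NOISE_SD, n)
            est = x.mean()
            se = x.std(ddof=1) / np.sqrt(n)
            return est - 1.96 * se, est + 1.96 * se

        rng = np.random.default_rng(1)
        for n in (100, 10_000, 1_000_000):
            lo, hi = biased_interval(n, rng)
            covers = lo <= TRUE_VALUE <= hi
            print(f"n={n:>9,}  interval=({lo:.3f}, {hi:.3f})  covers truth: {covers}")
        ```

      As n grows the interval narrows around the biased value, so at large n it
      excludes the truth with great apparent confidence.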
  - heading: Questions Experts Constantly Ask
    markdown: >-
      - What is the estimand — the quantity we want, defined before any
      estimator?

      - How were these data generated, and who is missing?

      - Compared to what — the control, the baseline, the counterfactual?

      - What would I expect to see under no effect?

      - What is confounded with what I care about?

      - Is this effect identifiable from my data, and under which assumptions?

      - How many comparisons did I make, including unreported ones?

      - Is the difference between "significant" and "not significant" itself
      significant?
        (Usually no — Gelman.)
      - If I reran the study, how different would the number be?
  - heading: Decision Frameworks
    markdown: >-
      - **Frequentist vs. Bayesian.** Use Bayesian methods for real prior
      information,
        direct probability statements about parameters, or small hierarchical samples;
        frequentist methods for guaranteed error rates or a regulator's vocabulary. A
        choice, not a faith.
      - **Power analysis before data, not after.** Set sample size from the
      smallest
        effect worth detecting, the variance, and tolerable Type I/II rates. Post-hoc
        power is circular: it is a function of the observed p-value and adds nothing.
      - **Model selection by out-of-sample performance.** Cross-validation, AIC,
      or a
        held-out set — never in-sample fit. Generalization, not memory.
      - **Multiple-comparison discipline.** Choose the correction (Bonferroni,
        Benjamini–Hochberg false discovery rate) by the cost of a false positive versus a
        miss, before peeking.
      - **The estimand-first ladder.** Define the target quantity, then
      identification
        assumptions, the estimator, the inference. Starting from the estimator leaves
        the target undefined.
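
      To make the multiple-comparison choice concrete, here is a minimal sketch
      of the Benjamini–Hochberg step-up procedure next to Bonferroni. The
      function name `benjamini_hochberg` and the p-values are invented for
      illustration; a maintained library routine would normally be used in
      practice.

        ```python
        import numpy as np

        def benjamini_hochberg(pvals, q=0.05):
            """Boolean array: True where the hypothesis is rejected at FDR level q."""
            p = np.asarray(pvals, dtype=float)
            m = p.size
            order = np.argsort(p)
            thresholds = q * np.arange(1, m + 1) / m   # q * rank / m
            below = p[order] <= thresholds
            reject = np.zeros(m, dtype=bool)
            if below.any():
                k = np.nonzero(below)[0].max()         # largest rank meeting its threshold
                reject[order[: k + 1]] = True          # reject everything up to that rank
            return reject

        # Hypothetical p-values from eight tests.
        pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
        bonferroni = np.asarray(pvals) <= 0.05 / len(pvals)
        bh = benjamini_hochberg(pvals, q=0.05)
        print("Bonferroni rejections:", int(bonferroni.sum()))
        print("BH rejections:        ", int(bh.sum()))
        ```

      On these made-up values Bonferroni rejects one hypothesis and
      Benjamini–Hochberg rejects two. The two methods control different error
      rates, so the choice follows the relative cost of a false positive and a
      miss.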
  - heading: Workflow
    markdown: >-
      1. **Frame.** Pin down the question and the estimand in plain language.
      What
         decision rides on the answer, and what precision does it need?
      2. **Design.** If data can still be collected: randomize, block, blind,
      choose the
         sampling frame, compute the sample size. Validity is won here.
      3. **Explore.** Before modeling, do EDA — plots, distributions,
      missingness
         patterns, outliers, impossible values.
      4. **Specify.** Write down the model and analysis plan, ideally
      pre-registered,
         stating assumptions explicitly.
      5. **Fit and diagnose.** Estimate, then attack: residual plots, influence
         diagnostics, posterior predictive checks.
      6. **Quantify uncertainty.** Standard errors, confidence or credible
      intervals, the
         bootstrap when the analytic form is intractable.
      7. **Sensitivity.** Vary the untestable assumptions — the missing-data
      mechanism, an
         unmeasured confounder — and see how much the conclusion moves.
      8. **Communicate.** Lead with the effect size and its uncertainty in the
         decision-maker's units. Say plainly what it cannot support.
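
      Step 6 mentions the bootstrap; a percentile-bootstrap interval for a median
      might be sketched as follows. The helper `bootstrap_ci` and the lognormal
      sample are assumptions made for illustration, and the percentile method is
      only one of several bootstrap intervals.

        ```python
        import numpy as np

        def bootstrap_ci(sample, stat=np.median, n_boot=5000, level=0.95, seed=0):
            """Percentile bootstrap interval for stat(sample)."""
            rng = np.random.default_rng(seed)
            sample = np.asarray(sample)
            n = sample.size
            # Each row holds the indices of one resample drawn with replacement.
            idx = rng.integers(0, n, size=(n_boot, n))
            reps = stat(sample[idx], axis=1)
            alpha = 1.0 - level
            lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
            return lo, hi

        # Hypothetical skewed data standing in for a real sample.
        data = np.random.default_rng(42).lognormal(mean=0.0, sigma=1.0, size=200)
        lo, hi = bootstrap_ci(data)
        print(f"median = {np.median(data):.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
        ```

      The statistic passed as `stat` must accept an `axis` argument, as
      `np.median` and `np.mean` do.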
  - heading: Common Tradeoffs
    markdown: >-
      - **Bias vs. variance.** Flexibility fits the past at the cost of the
      future; the
        right complexity depends on sample size and the cost of error.
      - **Power vs. cost.** Detecting a smaller effect needs a larger, costlier
      sample.

      - **Type I vs. Type II error.** Tightening one loosens the other at fixed
      n. Which
        costs more — false alarm or missed signal — is a domain question.
      - **Interpretability vs. predictive accuracy.** A regression you can
      explain to a
        clinician may lose to an uninterrogable black box. The use case decides.
      - **Pooling vs. separation.** Complete pooling ignores group differences,
      no pooling
        overfits small groups; partial pooling trades a little bias for less variance.
      - **Generality vs. internal validity.** A tightly controlled experiment is
        believable but may not transport to the population you care about.
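
      The power-versus-cost tradeoff has a simple arithmetic core. The sketch
      below uses the standard normal-approximation formula for comparing two
      means with equal group sizes, a common known standard deviation, and a
      two-sided test; the function name `n_per_group` and the example numbers are
      assumptions.

        ```python
        from math import ceil
        from statistics import NormalDist

        def n_per_group(delta, sd, alpha=0.05, power=0.80):
            """Approximate per-group n to detect a mean difference delta."""
            z = NormalDist()
            z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
            z_beta = z.inv_cdf(power)
            return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

        for delta in (0.5, 0.25):
            print(f"delta={delta}: about {n_per_group(delta, sd=1.0)} per group")
        ```

      Because n scales with 1/delta squared, halving the smallest effect worth
      detecting roughly quadruples the required sample.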
  - heading: Rules of Thumb
    markdown: >-
      - Plot before you compute; the eye catches what the summary hides.

      - If you tortured the data, it confessed — count every test you ran.

      - A confidence interval spanning both zero and a huge effect admits you
      learned
        nothing.
      - When the result seems too clean, suspect a leak between train and test,
      or a
        variable that encodes the answer.
      - Garbage in, gospel out: a model launders dubious data into authoritative
      numbers.

      - Absence of evidence is not evidence of absence; a nonsignificant
      low-power result
        says nothing.
      - Standardize before comparing coefficients; raw units mislead.
  - heading: Failure Modes
    markdown: >-
      - **p-hacking and the garden of forking paths.** Trying subgroups,
      transforms, and
        covariates until p < 0.05, then reporting the survivor as the plan.
      - **Confusing significance with effect size.** Trumpeting a p < 0.001
      result whose
        effect is too small to matter.
      - **Confounding mistaken for causation.** Reporting an association as if
      randomized.

      - **Ignoring the sampling mechanism.** Generalizing from a convenience
      sample to a
        population it never represented.
      - **Overfitting and reporting in-sample fit.** Dazzling R² on training data,
      collapse
        on new data.
      - **HARKing.** Hypothesizing after results are known, dressing exploration
      as
        confirmation.
      - **Throwing out incomplete cases blindly.** Complete-case analysis when
      the
        missingness is not completely at random (MAR or MNAR), which can bias the result.
  - heading: Anti-patterns
    markdown: >-
      - **The dredge.** Running every variable against every outcome and
      harvesting the
        survivors.
      - **Default-everything modeling.** Linear regression on whatever's in the
      file — no
        DGP, no diagnostics, no thought about the coefficients.
      - **Significance worship.** Treating 0.05 as a cliff where 0.049 is truth,
      0.051
        nothing.
      - **Berkson and collider blunders.** Conditioning on a common effect of two
      variables and
        inventing a correlation between them.
      - **The file drawer.** Burying null results so the literature is a
      survivorship
        sample.
      - **Goodharting.** Optimizing a metric until it stops measuring the
      target.

      - **Ecological hand-waving.** Inferring individual behavior from
      group-level
        correlations.
  - heading: Vocabulary
    markdown: >-
      - **Estimand** — the quantity you want, defined before any estimator.

      - **Standard error** — the standard deviation of an estimate's sampling
        distribution.
      - **Heteroskedasticity** — non-constant error variance across a
      predictor's range,
        invalidating naive standard errors.
      - **Collinearity** — predictors so correlated their effects can't be
      separated.

      - **Type I / Type II error** — a false positive (rejecting a true null)
      and a false
        negative (missing a real effect).
      - **Base rate** — the underlying prevalence; ignoring it yields the
      base-rate
        fallacy, where even an accurate test's positives are mostly false when the condition is rare.
      - **The bootstrap** — resampling with replacement to approximate the
      sampling
        distribution.
      - **Shrinkage** — pulling noisy estimates toward a common value.

      - **MCAR / MAR / MNAR** — missing completely at random, at random (given
      observed
        data), or not at random; each needs a different remedy.
      - **Posterior** — the Bayesian belief about a parameter after seeing the
      data, proportional to prior times likelihood.
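

      The base-rate entry above is easiest to see with numbers. A minimal sketch
      using Bayes' rule, with a hypothetical prevalence and test accuracy chosen
      only for illustration:

        ```python
        def positive_predictive_value(prevalence, sensitivity, specificity):
            """P(condition | positive test) by Bayes' rule."""
            true_pos = prevalence * sensitivity
            false_pos = (1 - prevalence) * (1 - specificity)
            return true_pos / (true_pos + false_pos)

        # Hypothetical: 1 in 1,000 have the condition; the test is 99% sensitive
        # and 99% specific.
        ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.99, specificity=0.99)
        print(f"P(condition | positive) = {ppv:.3f}")
        ```

      With these assumed inputs only about 9% of positives are true positives,
      because the healthy group is so much larger than the affected one.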
  - heading: Tools
    markdown: >-
      - **R** — the lingua franca; the tidyverse for wrangling, the modeling
      ecosystem
        from mixed models to survival analysis.
      - **Python (pandas, statsmodels, scikit-learn)** — for analysis that lives
      next to
        production code.
      - **Stan (and brms, PyMC)** — probabilistic programming for Bayesian
      models, full
        posterior inference via HMC.
      - **SAS** — entrenched in pharma and regulated industries where validated
      procedures
        matter.
      - **Design-of-experiments tooling** — factorial designs, blocking,
      response-surface
        methods, randomization schedules.
      - **Plotting (ggplot2, matplotlib)** — EDA is visual; the graph is a
      thinking tool.

      - **Version-controlled, literate notebooks (R Markdown, Quarto, Jupyter)**
      — so the
        analysis is reproducible and auditable.
  - heading: Collaboration
    markdown: >-
      Statisticians are almost always embedded in someone else's problem — a
      clinician's

      trial, an economist's policy question, a product team's experiment. The
      most

      valuable conversation happens before data collection, preventing an
      unanswerable

      design rather than apologizing for one. The recurring friction is the
      collaborator

      who arrives with data collected and a conclusion desired. Good
      statisticians say

      "this design cannot answer that," then help reshape it. They translate
      both ways: a

      vague aim into an estimand, an interval into a layperson's decision.
  - heading: Ethics
    markdown: >-
      The statistician sits at the gate between data and belief, a position of
      easy abuse.

      The duties: report the uncertainty, not just the estimate; disclose every
      analysis

      attempted, not only the one that worked; refuse to p-hack on request;
      resist letting

      a desired conclusion drive the method. Confidentiality and
      re-identification risk are

      live whenever the data are about people. There is a duty against false
      precision —

      four decimals on a noisy number mislead as surely as a lie. The
      replication crisis

      was an ethical failure: incentives rewarded surprising over true findings.
  - heading: Scenarios
    markdown: >-
      **A vaccine trial readout.** The team reports the vaccine "significantly
      reduced

      infections, p = 0.03." The statistician asks for the effect size and its
      interval:

      relative risk reduction 12%, 95% interval 1% to 22% — barely excluding
      zero. Then

      the harder questions: how many endpoints were tested, was this the
      pre-registered

      primary, and did the arms differ at baseline through chance imbalance in

      a small trial? The writeup reports "a modest effect, imprecisely
      estimated,

      warranting a confirmatory trial" — not the certainty the p-value implies.


      **The "training program works" claim.** HR shows that employees who took
      an optional

      leadership course were promoted more often, and wants to mandate it. The
      statistician

      sketches the DAG: ambition drives both enrolling *and* getting promoted —
      a textbook

      confounder, so the association is partly, maybe entirely, selection. The
      proposal: a

      randomized pilot offering the course to a random half. If impossible,
      match on

      pre-course performance plus a sensitivity analysis for how strong an
      unmeasured

      confounder must be to erase the effect. Either way, the association is a
      hypothesis,

      not a finding.


      **Surprising A/B test winner.** A product experiment shows a new checkout
      flow

      lifting conversion by 8%, hugely significant. The statistician distrusts
      it: was the

      randomization unit correct (user, not session), did the test run a full
      weekly cycle

      to avoid day-of-week confounding, was this the only metric among forty?
      They find

      the team peeked daily and stopped the moment it crossed significance —
      optional

      stopping that inflates the false-positive rate. The fix: rerun with
      pre-committed

      sample size and one look.
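

      The effect of peeking in the last scenario can be simulated. The sketch
      below is an illustration under stated assumptions, not the team's actual
      analysis: both arms are drawn from the same unit-variance normal
      distribution, a z-test is run at each look, and the experiment stops at the
      first |z| > 1.96. The function name and the look schedule are hypothetical.

        ```python
        import numpy as np

        def false_positive_rate(n_looks, n_per_look=100, n_sims=2000, seed=0):
            """Share of null experiments declared significant at any of n_looks peeks."""
            rng = np.random.default_rng(seed)
            total = n_looks * n_per_look
            hits = 0
            for _ in range(n_sims):
                # Both arms share one distribution, so any "win" is a false positive.
                a = rng.normal(0.0, 1.0, total)
                b = rng.normal(0.0, 1.0, total)
                for look in range(1, n_looks + 1):
                    n = look * n_per_look
                    # z-statistic for a difference in means with known unit variance.
                    z = (a[:n].mean() - b[:n].mean()) / np.sqrt(2.0 / n)
                    if abs(z) > 1.96:
                        hits += 1
                        break
            return hits / n_sims

        one_look = false_positive_rate(n_looks=1)
        daily_peeks = false_positive_rate(n_looks=14)
        print(f"single look: {one_look:.3f}   14 daily looks: {daily_peeks:.3f}")
        ```

      With a single look the rate should sit near the nominal 5%; with repeated
      looks and stopping at the first crossing it climbs well above that, which
      is the inflation the pre-committed sample size is meant to prevent.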
  - heading: Related Occupations
    markdown: >-
      The statistician shares deep tooling and a love of uncertainty with
      several roles

      but is defined by the formal study of variation itself. Data scientists
      apply the

      same methods at scale, leaning toward prediction over inference.
      Mathematicians

      supply the probability theory statistics rests on, caring about proof
      where

      statisticians care about data. Actuaries are statisticians of risk in
      finance and

      insurance. Epidemiologists are domain statisticians of disease, fluent in
      study

      design and causal inference. Machine learning engineers optimize
      prediction and

      treat the bias–variance tradeoff operationally. Economists wield the same

      causal-inference toolkit on policy.
  - heading: References
    markdown: >-
      - *Statistical Inference* — Casella & Berger

      - *Exploratory Data Analysis* — John Tukey

      - *Bayesian Data Analysis* — Gelman, Carlin, Stern, Dunson, Vehtari &
      Rubin

      - *All of Statistics* — Larry Wasserman

      - *Causality* — Judea Pearl

      - *The Elements of Statistical Learning* — Hastie, Tibshirani & Friedman
