CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk

Ilia Azizi1*  ·  Juraj Bodik1,2*  ·  Jakob Heiss2*  ·  Bin Yu2,3

1HEC, University of Lausanne   2Dept. of Statistics, UC Berkeley   3EECS, UC Berkeley

*Equal contribution

Abstract

Accurate uncertainty quantification is critical for reliable predictive modeling. Existing methods typically address either aleatoric uncertainty due to measurement noise or epistemic uncertainty resulting from limited data, but not both in a balanced manner. We propose CLEAR, a calibration method with two distinct parameters, $\gamma_1$ and $\gamma_2$, to combine the two uncertainty components and improve the conditional coverage of predictive intervals for regression tasks. CLEAR is compatible with any pair of aleatoric and epistemic estimators; we show how it can be used with (i) quantile regression for aleatoric uncertainty and (ii) ensembles drawn from the Predictability–Computability–Stability (PCS) framework for epistemic uncertainty. Across 17 diverse real-world datasets, CLEAR achieves an average improvement of 28.2% and 17.4% in the interval width compared to the two individually calibrated baselines while maintaining nominal coverage. Similar improvements are observed when applying CLEAR to Deep Ensembles (epistemic) and Simultaneous Quantile Regression (aleatoric). The benefits are especially evident in scenarios dominated by high aleatoric or epistemic uncertainty.

Figure: Aleatoric vs. epistemic uncertainty and CLEAR prediction intervals. Left: aleatoric uncertainty (blue) reflects irreducible data noise; epistemic uncertainty (red) is large in extrapolation regions with limited training data. Right: CLEAR combines both sources in a data-driven manner, yielding tighter and better-calibrated prediction intervals.

Method

CLEAR constructs prediction intervals by adaptively combining aleatoric (data noise) and epistemic (model) uncertainty through two calibration parameters, $\gamma_1$ and $\lambda$.

$$C(x) = \left[\,\hat{f}(x)\;\pm\;\Bigl(\underbrace{\gamma_1}_{\substack{\text{coverage}\\\text{parameter}}} \cdot \underbrace{\mathrm{ale}_{\pm}(x)}_{\text{data noise}} \;+\; \underbrace{\lambda\gamma_1}_{\substack{\text{balance}\\\text{parameter}}} \cdot \underbrace{\mathrm{epi}_{\pm}(x)}_{\text{model unc.}}\Bigr)\,\right]$$
  • $\gamma_1$: calibration parameter; ensures marginal $(1-\alpha)$ coverage for any finite sample via a conformal guarantee.
  • $\lambda = 0$: the interval reduces to aleatoric uncertainty only.
  • $\lambda \to \infty$: the interval reduces to epistemic uncertainty only.
  • $\lambda^\star$: data-driven optimal trade-off, selected by minimizing quantile loss on $\mathcal{D}_\mathrm{val}$.
$\gamma_1$ — Coverage

Ensures marginal $(1-\alpha)$ coverage for any finite sample via a conformal guarantee. Calibrated on a held-out set.

$\lambda$ — Balance

Balances aleatoric vs. epistemic contribution. Selected on a validation set by minimizing quantile loss.

Model-agnostic

Works with any pair of aleatoric and epistemic estimators: CQR + PCS, SQR + Deep Ensembles, and more.
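The combination rule itself is simple to implement once the two uncertainty estimates are available. Below is a minimal NumPy sketch of the interval formula above (illustrative only, not the clear-uq package API), where `ale_lo`/`ale_hi` and `epi_lo`/`epi_hi` are assumed to be per-point one-sided widths from any aleatoric and epistemic estimator pair.

```python
import numpy as np

def clear_interval(f_hat, ale_lo, ale_hi, epi_lo, epi_hi, gamma1, lam):
    """CLEAR-style interval C(x) = [f_hat -/+ (gamma1*ale + lam*gamma1*epi)].

    Inputs are 1-D arrays of point predictions and nonnegative one-sided
    aleatoric/epistemic widths; `lam` is the balance parameter lambda.
    Illustrative sketch -- not the clear-uq API.
    """
    lower = f_hat - gamma1 * (ale_lo + lam * epi_lo)
    upper = f_hat + gamma1 * (ale_hi + lam * epi_hi)
    return lower, upper

# lam = 0 keeps only the aleatoric term; a large lam is epistemic-dominated.
f_hat = np.array([1.0, 2.0, 3.0])
ale = np.array([0.5, 0.4, 0.6])   # data-noise widths (e.g. from quantile regression)
epi = np.array([0.1, 0.1, 0.8])   # model-uncertainty widths (e.g. from a PCS ensemble)
lower, upper = clear_interval(f_hat, ale, ale, epi, epi, gamma1=1.1, lam=0.5)
```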

Algorithm (4 steps)

  1. Estimate epistemic uncertainty on $\mathcal{D}_\text{train}$ — PCS ensembles yield upper/lower bounds.
  2. Estimate aleatoric uncertainty on $\mathcal{D}_\text{train}$ — quantile regression on residuals.
  3. Calibrate on $\mathcal{D}_\text{cal}$ — find smallest $\gamma_1$ for each $\lambda$ achieving $\geq(1-\alpha)$ coverage.
  4. Select $\lambda^\star$ on $\mathcal{D}_\text{val}$ — pick $\lambda$ minimizing quantile loss (tightest valid intervals), as sketched below.
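A minimal sketch of steps 3 and 4, assuming per-point lower/upper widths from both estimators; the conformity score, the grid of candidate $\lambda$ values, and the data layout are illustrative choices, not necessarily those of the paper.

```python
import numpy as np

def conformal_gamma1(y, f_hat, ale_lo, ale_hi, epi_lo, epi_hi, lam, alpha=0.05, eps=1e-12):
    """Step 3 (sketch): smallest gamma1 with >= (1 - alpha) coverage on D_cal.

    Each score says by how much the combined width ale + lam*epi must be
    scaled for the interval to cover y_i; the conformal quantile of the
    scores is gamma1. The paper's exact conformity score may differ.
    """
    resid = y - f_hat
    scores = np.where(resid >= 0,
                      resid / (ale_hi + lam * epi_hi + eps),
                      -resid / (ale_lo + lam * epi_lo + eps))
    n = len(scores)
    k = min(int(np.ceil((1 - alpha) * (n + 1))), n)  # conformal rank
    return np.sort(scores)[k - 1]

def pinball(y, q, tau):
    """Quantile (pinball) loss of predicted quantile q at level tau."""
    diff = y - q
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def select_lambda(cal, val, lambdas, alpha=0.05):
    """Steps 3-4 (sketch): calibrate gamma1 for each candidate lambda on D_cal,
    then keep the lambda whose intervals minimize quantile loss on D_val.
    `cal` and `val` are dicts with arrays y, f_hat, ale_lo, ale_hi, epi_lo, epi_hi."""
    best = None
    for lam in lambdas:
        g1 = conformal_gamma1(cal["y"], cal["f_hat"], cal["ale_lo"], cal["ale_hi"],
                              cal["epi_lo"], cal["epi_hi"], lam, alpha)
        lower = val["f_hat"] - g1 * (val["ale_lo"] + lam * val["epi_lo"])
        upper = val["f_hat"] + g1 * (val["ale_hi"] + lam * val["epi_hi"])
        loss = pinball(val["y"], lower, alpha / 2) + pinball(val["y"], upper, 1 - alpha / 2)
        if best is None or loss < best[0]:
            best = (loss, lam, g1)
    return best[1], best[2]  # (lambda_star, gamma1_star)
```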

Contributions

  1. We introduce two calibration parameters, $\gamma_1$ and $\gamma_2 = \lambda\gamma_1$, that jointly guarantee marginal coverage and adaptively balance the scale of aleatoric and epistemic uncertainty — the first method to do so.
  2. We show that fitting quantile regression on model residuals $Y_i - \hat{f}(X_i)$, rather than directly on targets, yields substantially more stable aleatoric uncertainty estimates (see the sketch after this list).
  3. We combine PCS ensemble perturbation intervals with conformalized quantile regression (CQR) for the first time, and demonstrate the strength of this pairing across diverse settings.
  4. We conduct large-scale uncertainty quantification benchmarks on 17 real-world regression datasets over 10 random seeds, with multiple model variants and comparisons to state-of-the-art baselines.
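The residual-based aleatoric estimator in contribution 2 can be sketched with off-the-shelf tools; here scikit-learn's gradient-boosted quantile regression stands in for the paper's estimator, and `f_hat_train` denotes (ideally out-of-sample) mean predictions on the training inputs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def residual_quantile_bands(X_train, y_train, f_hat_train, X_new, alpha=0.05):
    """Aleatoric bands via quantile regression on residuals (sketch).

    Fit the alpha/2 and 1 - alpha/2 quantiles of r = y - f_hat(X) instead of
    the quantiles of y itself; the aleatoric interval around a new point is
    then f_hat(x) + [q_lo(x), q_hi(x)]. The estimator choice is illustrative.
    """
    residuals = y_train - f_hat_train
    q_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, residuals)
    q_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, residuals)
    return q_lo.predict(X_new), q_hi.predict(X_new)
```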

Results

17 Real-world Datasets

We benchmark across 17 real-world regression datasets, 10 random splits (60%/20%/20%), and multiple model and estimator variants, at 95% nominal coverage. CLEAR is compared against two individually calibrated baselines: an aleatoric-only approach using Conformalized Quantile Regression (CQR) and an epistemic-only PCS ensemble. Results are reported in Normalized Calibrated Interval Width (NCIW) and quantile loss; lower is better in both.
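The raw ingredients of these metrics are simple to compute from calibrated intervals; the sketch below reports empirical coverage and mean interval width (NCIW additionally normalizes the width so it is comparable across datasets, which is omitted here).

```python
import numpy as np

def coverage_and_width(y, lower, upper):
    """Empirical coverage and mean interval width on a held-out set.

    NCIW applies a further normalization (from the paper) that is not
    reproduced in this sketch.
    """
    covered = (y >= lower) & (y <= upper)
    return float(covered.mean()), float((upper - lower).mean())
```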

  • 28.2% narrower intervals vs. the aleatoric-only baseline
  • 17.4% narrower intervals vs. the epistemic-only baseline
  • 15/17 datasets where CLEAR is the top method
Figure: CLEAR benchmark. Quantile loss and NCIW across 17 real-world datasets, averaged over 10 seeds and normalized relative to CLEAR (= 1.0 baseline). Lower is better; error bars show ±1σ. Inset boxplot: average % relative increase over CLEAR. EPISTEMIC = PCS ensemble; ALEATORIC = bootstrapped CQR; ALEATORIC-R = CQR on residuals.

Results also hold with Deep Ensembles + Simultaneous Quantile Regression: CLEAR reduces interval width (NCIW) by 28.6% and 13.4% relative to the two baselines, confirming generalizability beyond PCS + CQR.

Case Study

Ames Housing: Adaptive $\lambda$

Data processing can significantly affect uncertainty estimates. We take the Ames Housing dataset and deliberately vary the feature set to engineer controlled uncertainty regimes. Restricting the model to the top 2 predictors (out of the original 80) deprives it of information, inducing high aleatoric uncertainty. Using all features with richer data processing (the PCS pipeline) reduces aleatoric uncertainty and shifts the dominant source to epistemic uncertainty. CLEAR's $\lambda$ correctly identifies the dominant source in each regime and adjusts accordingly.

Key insight
  • 2 features → aleatoric dominates ($\lambda \approx 0.64$)
  • All features → epistemic dominates ($\lambda \approx 14.5$)
  • $\lambda$ adapts automatically to the data regime
| Setting | Method | Width ($) | Coverage |
|---|---|---|---|
| 2 features (high aleatoric) | PCS | 107,880 | 87% |
| | CQR | 104,741 | 90% |
| | CLEAR | 95,177 | 89% |
| All features (high epistemic) | PCS | 57,594 | 89% |
| | CQR | 62,398 | 88% |
| | CLEAR | 55,910 | 88% |

Target: 90% coverage. CLEAR achieves the best or near-best interval width in both regimes.

Get Started

Quick Install

$ pip install clear-uq

Citation

BibTeX

@inproceedings{azizi2026clear,
  title     = {{CLEAR}: Calibrated Learning for Epistemic and Aleatoric Risk},
  author    = {Ilia Azizi and Juraj Bodik and Jakob Heiss and Bin Yu},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=RY4IHaDLik}
}