10. Visualization

The mlquantify.visualization module provides a small collection of plotting helpers that follow the scikit-learn *Display convention: each class is built through class-method constructors (from_predictions, from_estimator or from_protocol, depending on the display; ConfidenceRegionDisplay uses from_estimates and from_region), has a plot method that returns the display, and stores the matplotlib ax_ and figure_ it drew on. Any extra keyword arguments are forwarded to the underlying matplotlib artist, so you can restyle a plot (color, alpha, line width, …) straight from the constructor. Pass your own ax= to compose several plots in one figure.

Every example below is self-contained, and each one passes a matplotlib styling keyword to show how the plots are customised.
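The convention described above can be sketched with plain matplotlib. ToyDiagonalDisplay below is a hypothetical class written for illustration only; it is not part of mlquantify and makes no claim about how the library's displays are implemented. It only shows the general pattern: a class-method constructor, a plot method that returns the display, stored ax_ and figure_ attributes, and extra keywords forwarded to the matplotlib artist.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a GUI
import matplotlib.pyplot as plt
import numpy as np


class ToyDiagonalDisplay:
    """Hypothetical display for illustration; NOT part of mlquantify."""

    def __init__(self, true_prev, pred_prev):
        self.true_prev = np.asarray(true_prev, dtype=float)
        self.pred_prev = np.asarray(pred_prev, dtype=float)

    def plot(self, ax=None, **kwargs):
        # Reuse the caller's axes if given, otherwise create a new figure.
        if ax is None:
            _, ax = plt.subplots()
        # Extra keywords go straight to the matplotlib artist (here ax.scatter).
        ax.scatter(self.true_prev, self.pred_prev, **kwargs)
        ax.plot([0, 1], [0, 1], linestyle="--", color="gray")  # y = x reference
        ax.set_xlabel("True prevalence")
        ax.set_ylabel("Predicted prevalence")
        # Store what was drawn on, then return the display itself.
        self.ax_ = ax
        self.figure_ = ax.figure
        return self

    @classmethod
    def from_predictions(cls, true_prev, pred_prev, *, ax=None, **kwargs):
        return cls(true_prev, pred_prev).plot(ax=ax, **kwargs)


fig, ax = plt.subplots()
disp = ToyDiagonalDisplay.from_predictions(
    [0.1, 0.5, 0.9], [0.15, 0.45, 0.92], ax=ax, color="#2a9d8f", alpha=0.6, s=20,
)
```

Because the constructor returns the display, the axes remain reachable afterwards through disp.ax_, which is what the later examples rely on when they set a title.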

The displays fall into two groups:

Multiple-sample displays summarise a whole evaluation protocol run (many test samples with varying prevalences) — DiagonalDisplay, BiasDisplay, ErrorByShiftDisplay.
Single-sample displays inspect one prediction — PrevalenceDisplay, ConfidenceRegionDisplay.

Running import mlquantify does not import the subpackage, which keeps matplotlib off the top-level import path; import it explicitly:

from mlquantify.visualization import DiagonalDisplay

10.1. Multiple-sample displays

10.1.1. Diagonal plot

The diagonal plot is the signature quantification diagnostic: it shows predicted prevalence against true prevalence for every protocol sample, together with the \(y = x\) reference line. Points above the diagonal are over-estimates and points below are under-estimates; tight clustering around the line marks a good quantifier. Styling keywords such as color, alpha and s are forwarded to ax.scatter.

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

from mlquantify.counting import ACC
from mlquantify.model_selection import apply_protocol
from mlquantify.visualization import DiagonalDisplay

X, y = make_classification(n_samples=2000, weights=[0.6, 0.4], random_state=0)

# Artificial Prevalence Protocol: fit once, predict on many test samples.
results = apply_protocol(
    ACC(LogisticRegression(max_iter=1000)), X, y, protocol="app",
    n_prevalences=21, repeats=5, batch_size=100, random_state=0,
)

disp = DiagonalDisplay.from_predictions(
    results["true_prevalences"], results["predicted_prevalences"],
    color="#2a9d8f", alpha=0.6, s=20,
)
disp.ax_.set_title("ACC — diagonal plot")

Note

from_protocol runs the protocol for you in a single call:

DiagonalDisplay.from_protocol(ACC(LogisticRegression()), X, y,
                              protocol="app", n_prevalences=21)
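To make the idea behind the Artificial Prevalence Protocol more concrete, here is a simplified numpy-only sketch of drawing test batches at controlled prevalences. It is an illustration of the general technique for a binary problem, not the implementation behind apply_protocol. The helper sample_at_prevalence is hypothetical, the labels are made up, and the evenly spaced grid of 21 prevalences is an assumption about what n_prevalences=21 means.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)  # made-up binary labels


def sample_at_prevalence(y, prevalence, batch_size, rng):
    """Hypothetical helper: indices of a batch with the requested positive share."""
    n_pos = int(round(prevalence * batch_size))
    pos_idx = rng.choice(np.flatnonzero(y == 1), size=n_pos, replace=True)
    neg_idx = rng.choice(np.flatnonzero(y == 0), size=batch_size - n_pos, replace=True)
    return np.concatenate([pos_idx, neg_idx])


# Assumed grid: 21 evenly spaced target prevalences from 0 to 1.
prevalence_grid = np.linspace(0.0, 1.0, 21)
batches = [sample_at_prevalence(y, p, 100, rng) for p in prevalence_grid]

# Each batch realises its target prevalence, whatever the prevalence of y itself.
realised = np.array([y[idx].mean() for idx in batches])
```

Each such batch yields one (true prevalence, predicted prevalence) pair once a fitted quantifier is applied to it, and each pair is one point on the diagonal plot.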

10.1.2. Bias boxplots

BiasDisplay shows the signed error (predicted - true). With bins set, the samples are grouped into bins of the true prevalence, exposing how the bias drifts along the range — a box consistently above zero reveals systematic over-estimation. Extra keywords are forwarded to ax.boxplot (here patch_artist / boxprops to colour the boxes).

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

from mlquantify.counting import ACC
from mlquantify.model_selection import apply_protocol
from mlquantify.visualization import BiasDisplay

X, y = make_classification(n_samples=2000, weights=[0.6, 0.4], random_state=0)

results = apply_protocol(
    ACC(LogisticRegression(max_iter=1000)), X, y, protocol="app",
    n_prevalences=21, repeats=5, batch_size=100, random_state=0,
)

disp = BiasDisplay.from_predictions(
    results["true_prevalences"], results["predicted_prevalences"],
    bins=5, patch_artist=True,
    boxprops=dict(facecolor="#e9c46a", alpha=0.8),
    medianprops=dict(color="#264653", linewidth=2),
)
disp.ax_.set_title("ACC — bias by true prevalence")
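The quantity behind each box can be reproduced by hand. The sketch below uses made-up prevalences and numpy only; it groups the signed error into five equal-width bins of the true prevalence. This is one plausible binning scheme chosen for illustration, and BiasDisplay may bin differently.

```python
import numpy as np

rng = np.random.default_rng(0)
true_prev = rng.uniform(0.0, 1.0, size=200)  # made-up true prevalences
# Toy predictions with a small positive bias, clipped to the valid range.
pred_prev = np.clip(true_prev + rng.normal(0.03, 0.05, size=200), 0.0, 1.0)

signed_error = pred_prev - true_prev  # > 0 means over-estimation

n_bins = 5
edges = np.linspace(0.0, 1.0, n_bins + 1)  # 0.0, 0.2, ..., 1.0
# Use interior edges only, so every value in [0, 1] maps to an index 0..n_bins-1.
bin_index = np.digitize(true_prev, edges[1:-1])

for b in range(n_bins):
    errs = signed_error[bin_index == b]
    print(f"[{edges[b]:.1f}, {edges[b + 1]:.1f})  n={errs.size:3d}  "
          f"median={np.median(errs):+.3f}")
```

A median that stays on one side of zero across bins points to systematic bias, while a median that changes sign along the range points to bias that depends on the true prevalence.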

10.1.3. Error by prior-probability shift

ErrorByShiftDisplay plots a quantification error metric as a function of how far the test prevalence drifts from the training prevalence, with a ±std band — the standard way to read a quantifier’s robustness to distribution shift. Keywords such as color, marker and linewidth are forwarded to ax.plot.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

from mlquantify.counting import ACC
from mlquantify.model_selection import apply_protocol
from mlquantify.visualization import ErrorByShiftDisplay

X, y = make_classification(n_samples=2000, weights=[0.6, 0.4], random_state=0)

results = apply_protocol(
    ACC(LogisticRegression(max_iter=1000)), X, y, protocol="upp",
    n_prevalences=200, batch_size=100, random_state=0,
)
_, counts = np.unique(y, return_counts=True)
train_prevalence = counts / counts.sum()

ErrorByShiftDisplay.from_predictions(
    results["true_prevalences"], results["predicted_prevalences"],
    train_prevalence=train_prevalence, error_metric="ae", name="ACC",
    color="#e76f51", marker="s", linewidth=2,
)
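As a rough sketch of what such a curve summarises, the numpy-only example below computes the mean and standard deviation of the absolute error per shift bin for a binary problem. The data are made up, and the shift is taken here as the absolute difference between the test and training positive-class prevalence; that definition and the number of bins are assumptions for illustration, and ErrorByShiftDisplay may define and bin shift differently.

```python
import numpy as np

rng = np.random.default_rng(0)
train_prev = 0.4                             # made-up training prevalence (positive class)
true_prev = rng.uniform(0.0, 1.0, size=500)  # made-up test prevalences

# Toy quantifier whose noise grows with the distance from the training prevalence.
noise_scale = 0.02 + 0.10 * np.abs(true_prev - train_prev)
pred_prev = np.clip(true_prev + rng.normal(0.0, noise_scale), 0.0, 1.0)

shift = np.abs(true_prev - train_prev)     # assumed definition of shift
abs_error = np.abs(pred_prev - true_prev)  # absolute error per test sample

n_bins = 6
edges = np.linspace(0.0, shift.max(), n_bins + 1)
# digitize returns 1-based bin numbers; clip puts the maximum into the last bin.
bin_index = np.clip(np.digitize(shift, edges) - 1, 0, n_bins - 1)

mean_err = np.array([abs_error[bin_index == b].mean() for b in range(n_bins)])
std_err = np.array([abs_error[bin_index == b].std() for b in range(n_bins)])
```

Plotting mean_err against the bin centres, with a band of mean_err ± std_err, gives a curve of the kind described above.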

10.2. Single-sample displays

10.2.1. Prevalence bars

For a single test sample, PrevalenceDisplay draws the predicted per-class prevalence, optionally next to the ground truth. The color keyword (and any other ax.bar keyword) styles the predicted bars.

from mlquantify.visualization import PrevalenceDisplay

PrevalenceDisplay.from_predictions(
    [0.18, 0.55, 0.27],
    true_prevalence=[0.20, 0.50, 0.30],
    class_names=["setosa", "versicolor", "virginica"],
    color="#457b9d",
)

Note

from_estimator predicts with a fitted quantifier for you:

PrevalenceDisplay.from_estimator(fitted_quantifier, X_sample)

10.2.2. Confidence regions

ConfidenceRegionDisplay visualises the uncertainty of a single prediction from a set of bootstrap prevalence estimates (for instance from AggregativeBootstrap, or via construct_confidence_region). For a 3-class problem it draws a confidence ellipse on the probability simplex; for any other number of classes it falls back to per-class intervals. The color / alpha keywords style the bootstrap point cloud.

import numpy as np
from mlquantify.visualization import ConfidenceRegionDisplay

# 500 bootstrap prevalence estimates for one 3-class prediction.
rng = np.random.default_rng(0)
prev_estims = rng.dirichlet([40, 25, 35], size=500)

ConfidenceRegionDisplay.from_estimates(
    prev_estims, confidence_level=0.95,
    class_names=["A", "B", "C"], true_prevalence=[0.45, 0.25, 0.30],
    color="#1d3557", alpha=0.25,
)

If you already have a fitted region object, use from_region instead:

from mlquantify.confidence import construct_confidence_region

# Reuses prev_estims from the example above.
region = construct_confidence_region(prev_estims, method="ellipse")
ConfidenceRegionDisplay.from_region(region, class_names=["A", "B", "C"])
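For intuition about what a confidence ellipse over bootstrap estimates means, here is a numpy-only sketch under a Gaussian approximation. It is a generic textbook construction and not necessarily what construct_confidence_region computes. Because the three prevalences sum to 1, two coordinates are enough; a point is inside the 95% ellipse when its squared Mahalanobis distance to the mean is at most the chi-square quantile with 2 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(0)
prev_estims = rng.dirichlet([40, 25, 35], size=500)  # toy bootstrap cloud

# Prevalences sum to 1, so the first two coordinates carry all the information.
pts = prev_estims[:, :2]
mean = pts.mean(axis=0)
cov = np.cov(pts, rowvar=False)

# With 2 degrees of freedom the chi-square quantile has a closed form.
confidence_level = 0.95
radius_sq = -2.0 * np.log(1.0 - confidence_level)  # about 5.99

# Squared Mahalanobis distance of every bootstrap estimate to the mean.
diff = pts - mean
d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

# Fraction of estimates that fall inside the ellipse.
coverage = float(np.mean(d2 <= radius_sq))
```

If the Gaussian approximation is reasonable, coverage should come out close to the requested confidence level.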

10.3. Combining plots on one figure

Because every display accepts an ax, plots compose like any other matplotlib artist — pass your own axes to draw several side by side:

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

from mlquantify.counting import ACC
from mlquantify.model_selection import apply_protocol
from mlquantify.visualization import DiagonalDisplay, BiasDisplay

X, y = make_classification(n_samples=2000, weights=[0.6, 0.4], random_state=0)
results = apply_protocol(
    ACC(LogisticRegression(max_iter=1000)), X, y, protocol="app",
    n_prevalences=21, repeats=5, batch_size=100, random_state=0,
)
true_prev = results["true_prevalences"]
pred_prev = results["predicted_prevalences"]

fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))

DiagonalDisplay.from_predictions(
    true_prev, pred_prev, ax=axes[0], color="#2a9d8f", alpha=0.6, s=20,
)
axes[0].set_title("Diagonal")

BiasDisplay.from_predictions(true_prev, pred_prev, ax=axes[1], bins=5)
axes[1].set_title("Bias by true prevalence")

fig.tight_layout()