.. _cython_acceleration: .. currentmodule:: mlquantify.matching ==================== Cython Acceleration ==================== The inner loop of histogram distribution matching (:class:`DyS`, :class:`HDy`, :class:`HDx`) is accelerated by an **optional compiled Cython kernel**. It is a pure performance optimisation: the estimates are identical with or without it, and if the compiled extension is not available the library transparently falls back to a pure-Python implementation. .. contents:: Contents :local: :depth: 2 ---- What is the "histogram sweep"? ============================== A histogram matching quantifier represents each class by a normalised histogram of classifier scores. For a binary problem it stores the positive and negative class histograms :math:`H_+` and :math:`H_-`, and estimates the positive prevalence :math:`\alpha` as the **mixing proportion** whose mixed histogram is closest to the test histogram :math:`H_U`: .. math:: \hat{\alpha} = \arg\min_{\alpha \in [0, 1]} \; D\!\left(\, \alpha H_+ + (1 - \alpha) H_- ,\; H_U \,\right) Finding that :math:`\alpha` means evaluating the distance :math:`D` at many candidate values of :math:`\alpha` — a **sweep** over the interval (a grid search, or a ternary search since the objective is unimodal). Each evaluation: 1. builds the mixture ``alpha * Hpos + (1 - alpha) * Hneg``, 2. normalises it, 3. computes a distance (Hellinger / Topsøe / prob-symm) against the test histogram. For :class:`HDy`/:class:`DyS` this sweep is repeated over **several bin counts** and the per-bin estimates are aggregated by their median. So a single ``predict`` performs **hundreds to thousands of these tiny distance evaluations**, each on short histogram vectors (typically 8–110 bins). ---- What Cython does here ===================== Those thousands of evaluations are the hot path — and because each one works on a *tiny* array, the cost is dominated by Python/numpy **per-call overhead** (``np.asarray``, ufunc dispatch, temporary allocation), not by the actual arithmetic. The Cython kernel (``mlquantify.matching._histogram_sweep``) collapses the whole ``mixture → normalise → distance → search`` into a **single compiled C function**: - typed, contiguous ``float64`` memory views (no Python objects in the loop); - the unimodal search (ternary or grid) runs **inside the C function**, so the thousands of distance evaluations never cross back into Python; - the inner loop is ``nogil`` — pure C, no interpreter overhead. The result is bit-for-bit the same :math:`\alpha`, computed without the per-iteration Python overhead. ---- Where it is more efficient — speed and memory ============================================= **Speed.** The kernel removes the Python/numpy call overhead that dominates the sweep over tiny arrays, and keeps the search in compiled C. In practice it roughly **halves** ``DyS``/``HDy`` ``predict`` time relative to the generic Python solver (more on a fine bin grid or large test sets), and the gap grows with the number of bin counts swept. **Memory.** This is where the kernel helps most relative to the pure-Python path: .. list-table:: :widths: 30 70 :header-rows: 1 * - Path - Allocation behaviour * - generic Python solver - Allocates a **fresh** mixture array (and normalised copies) on **every** :math:`\alpha` evaluation — thousands of small temporary arrays per ``predict``, all churned through the allocator and garbage collector. * - Cython sweep kernel - Allocates **one** scratch buffer of length ``n_bins`` and **reuses it** for the entire sweep — ``O(n_bins)`` working memory, zero per-iteration allocation, cache-friendly contiguous access. So the kernel turns a stream of thousands of tiny allocations into a single reused buffer, which both cuts allocator/GC pressure and keeps the working set in cache. ---- Enabling, checking and toggling =============================== The kernel is used **automatically** when the compiled extension is present (it ships in the binary wheels on PyPI). Everything still works without it — just slower — via the pure-Python fallback. .. code-block:: python import mlquantify.matching._histogram as H H._HAS_FAST_KERNEL # True if the compiled extension is built/imported H.USE_SWEEP_KERNEL # runtime on/off switch (default True) # force the generic Python solver (e.g. for debugging / benchmarking) H.USE_SWEEP_KERNEL = False # ... and back on H.USE_SWEEP_KERNEL = True If the compiled kernel is not built, the first histogram-matching ``predict`` emits a one-time :class:`PerformanceWarning`. To build it from a source checkout:: pip install -e . # compiles mlquantify/matching/_histogram_sweep.pyx .. note:: The kernel only changes *speed and memory*, never the result: the estimated prevalence is identical (to floating-point tolerance) whether it runs the compiled kernel, the pure-Python sweep, or the generic solver. .. seealso:: :ref:`distribution_matching` for the methods that use the sweep, and :ref:`representations` for the :class:`~mlquantify.representations.HistogramRepresentation` it operates on.