seqme.metrics.ClippedCoverage

class seqme.metrics.ClippedCoverage(n_neighbors, reference, embedder, *, batch_size=256, device='cpu', strict=True, name='Clipped coverage')

Evaluates how well the reference data is covered by the synthetic samples.

This class computes the Clipped Coverage metric [1], which measures the proportion of reference samples that are covered by the synthetic samples, i.e., the degree to which the synthetic data represents the full reference distribution.

Clipped Coverage is designed to be robust to outliers, and its value ranges from 0 to 1, representing the fraction of reference embeddings that are effectively covered by the synthetic data.

Clipped Coverage improves upon the Coverage metric [2].
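For orientation, the baseline Coverage metric [2] can be sketched in a few lines of NumPy. This is a minimal illustration of the baseline only, not the clipped variant from [1] and not seqme's implementation; the function name `baseline_coverage` and the toy data are assumptions made for this example.

```python
import numpy as np


def baseline_coverage(reference: np.ndarray, synthetic: np.ndarray, n_neighbors: int) -> float:
    """Fraction of reference points whose k-NN ball contains a synthetic point."""
    # Pairwise distances within the reference set, shape (n_ref, n_ref).
    ref_dists = np.linalg.norm(reference[:, None, :] - reference[None, :, :], axis=-1)
    # After sorting, column 0 is each point's distance to itself (0), so
    # column n_neighbors holds the distance to the n_neighbors-th neighbor.
    radii = np.sort(ref_dists, axis=1)[:, n_neighbors]
    # Distances from every reference point to every synthetic point.
    cross = np.linalg.norm(reference[:, None, :] - synthetic[None, :, :], axis=-1)
    # A reference point counts as covered if any synthetic point lies in its ball.
    covered = (cross <= radii[:, None]).any(axis=1)
    return float(covered.mean())


# Toy data (hypothetical): two nearby reference points and one isolated point.
reference = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
synthetic = np.array([[0.5, 0.0]])
score = baseline_coverage(reference, synthetic, n_neighbors=1)
```

In this toy example the isolated reference point at (10, 0) receives a ball of radius 9, which shows how a single outlier can produce a very large ball in the baseline formulation.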

References

[1] Salvy et al., “Enhanced Generative Model Evaluation with Clipped Density and Coverage”, 2025 (https://arxiv.org/abs/2507.01761)

[2] Naeem et al., “Reliable Fidelity and Diversity Metrics for Generative Models”, 2020 (https://arxiv.org/abs/2002.09797)

__init__(n_neighbors, reference, embedder, *, batch_size=256, device='cpu', strict=True, name='Clipped coverage')

Initialize the ClippedCoverage metric.

Constructs the reference manifold using the provided sequences and prepares the metric for evaluation. The reference manifold is approximated using nearest-neighbor balls, with radii determined by the specified number of neighbors.
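The nearest-neighbor radii described above can be illustrated with a short NumPy sketch. It assumes Euclidean distance and a brute-force pairwise computation; the helper name `knn_radii` and the upper-bound check on `n_neighbors` are assumptions for this example, and seqme's internals may differ.

```python
import numpy as np


def knn_radii(embeddings: np.ndarray, n_neighbors: int) -> np.ndarray:
    """Distance from each point to its n_neighbors-th nearest neighbor."""
    if n_neighbors < 1:
        raise ValueError("n_neighbors must be at least 1")
    # Assumption for this sketch: each point needs n_neighbors other points.
    if n_neighbors >= len(embeddings):
        raise ValueError("n_neighbors must be smaller than the number of points")
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    # Column 0 of the sorted rows is the self-distance (0); skip it.
    return np.sort(dists, axis=1)[:, n_neighbors]


# Four 1-D points at 0, 1, 3, and 7 (hypothetical data).
points = np.array([[0.0], [1.0], [3.0], [7.0]])
radii_k1 = knn_radii(points, 1)  # radii using the nearest neighbor
radii_k2 = knn_radii(points, 2)  # radii using the second-nearest neighbor
```

Comparing `radii_k1` with `radii_k2` shows each ball growing, or at least not shrinking, as `n_neighbors` increases.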

Parameters:
  • n_neighbors (int) – Number of nearest neighbors used to define the radii of the nearest-neighbor balls. More neighbors result in larger radii.

  • reference (list[str]) – List of reference sequences used to build the reference manifold.

  • embedder (Callable[[list[str]], ndarray]) – Function mapping sequences to embeddings.

  • batch_size (int) – Number of samples per batch when computing distances.

  • device (str) – Compute device, e.g., "cpu" or "cuda".

  • strict (bool) – If True, enforces an equal number of evaluation and reference samples.

  • name (str) – Metric name.

Raises:
  • ValueError – If n_neighbors < 1.

  • ValueError – If reference contains no sequences after embedding.
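As an illustration of the `embedder` parameter, the sketch below defines a toy callable with the documented type `Callable[[list[str]], ndarray]`. It assumes the sequences are proteins over the 20 standard amino acids and embeds each one as its residue-frequency vector; the name `composition_embedder` and the example sequences are hypothetical, and a real workflow would more likely use a learned embedding model.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_embedder(sequences: list[str]) -> np.ndarray:
    """Hypothetical embedder: one amino-acid frequency vector per sequence."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    embeddings = np.zeros((len(sequences), len(AMINO_ACIDS)), dtype=np.float64)
    for row, seq in enumerate(sequences):
        for char in seq:
            # Characters outside the 20-letter alphabet are ignored.
            if char in index:
                embeddings[row, index[char]] += 1.0
        if seq:
            embeddings[row] /= len(seq)
    return embeddings


reference = ["ACDK", "KKLM", "GGAV"]
reference_embeddings = composition_embedder(reference)

# With seqme installed, the callable could be passed to the constructor using
# the signature documented above (not executed in this sketch):
# metric = seqme.metrics.ClippedCoverage(
#     n_neighbors=2, reference=reference, embedder=composition_embedder
# )
# result = metric(["ACDK", "KLMA", "GGAV"])
```

The commented call evaluates three sequences against three reference sequences, which matches the equal-count requirement that `strict=True` enforces.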

__call__(sequences)

Compute the Clipped Coverage of the given sequences.

Evaluates how well the reference embeddings are covered by the provided sequences, producing a score between 0 and 1 that reflects the fraction of reference data effectively represented by the synthetic sequences.

Parameters:
  • sequences (list[str]) – List of sequences to evaluate.

Returns:

The Clipped Coverage score, representing the fraction of reference embeddings covered by the sequences.

Return type:

MetricResult

Methods
  • __init__(n_neighbors, reference, embedder, *) – Initialize the ClippedCoverage metric.

  • __call__(sequences) – Compute the Clipped Coverage of the given sequences.

Attributes
  • name – Name of the metric.

  • objective – Whether lower or higher scores indicate better performance.