seqme.metrics.ClippedDensity

class seqme.metrics.ClippedDensity(n_neighbors, reference, embedder, *, batch_size=256, device='cpu', strict=True, name='Clipped density')

Evaluates how realistic synthetic samples are compared to reference data.

Computes the Clipped Density metric [1], which measures the realism of synthetic samples by assessing how closely each sample aligns with the reference data in the embedding space. This metric quantifies fidelity, i.e., the degree to which synthetic samples resemble the true data distribution. To achieve this, the reference manifold is approximated using nearest-neighbor balls, with radii chosen to be robust to outliers.

The score ranges from 0 to 1 and represents the fraction of synthetic samples that lie on the reference manifold and are therefore considered realistic.

Clipped Density improves upon the Density metric [2].
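The nearest-neighbor-ball idea described above can be illustrated with a small NumPy sketch. This is not seqme's implementation: the function name `clipped_density_sketch` is hypothetical, the two clipping rules (radii capped at their median, per-sample contribution capped at 1) are a simplified reading of [1], and any final normalization step is omitted.

```python
import numpy as np


def clipped_density_sketch(reference, synthetic, n_neighbors):
    """Simplified, unnormalized illustration (hypothetical helper, not seqme's code)."""
    reference = np.asarray(reference, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    if not 1 <= n_neighbors < len(reference):
        raise ValueError("n_neighbors must be in [1, len(reference) - 1]")

    # Pairwise Euclidean distances among the reference embeddings.
    ref_dists = np.linalg.norm(reference[:, None, :] - reference[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbor; index 0 of each sorted row is the point itself.
    radii = np.sort(ref_dists, axis=1)[:, n_neighbors]
    # Assumed clipping rule: cap radii at their median so outliers cannot claim large balls.
    radii = np.minimum(radii, np.median(radii))

    # For each synthetic point, count the reference balls that contain it.
    cross = np.linalg.norm(synthetic[:, None, :] - reference[None, :, :], axis=-1)
    counts = (cross <= radii[None, :]).sum(axis=1)
    # Assumed clipping rule: cap each sample's contribution at 1.
    per_sample = np.minimum(counts / n_neighbors, 1.0)
    return float(per_sample.mean())
```

Under these assumptions, synthetic points that coincide with reference points score close to 1, while points far from every reference ball contribute 0.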

References

[1] Salvy et al., “Enhanced Generative Model Evaluation with Clipped Density and Coverage”, 2025 (https://arxiv.org/abs/2507.01761)

[2] Naeem et al., “Reliable Fidelity and Diversity Metrics for Generative Models”, 2020 (https://arxiv.org/abs/2002.09797)

__init__(n_neighbors, reference, embedder, *, batch_size=256, device='cpu', strict=True, name='Clipped density')

Initialize the ClippedDensity metric.

Constructs the reference manifold using the provided sequences and prepares the metric for evaluation. The reference manifold is approximated using nearest-neighbor balls, with radii determined by the specified number of neighbors.

Parameters:
  • n_neighbors (int) – Number of nearest neighbors used to define the radii of the nearest-neighbor balls. More neighbors result in larger radii.

  • reference (list[str]) – List of reference sequences used to build the reference manifold.

  • embedder (Callable[[list[str]], ndarray]) – Function mapping sequences to embeddings.

  • batch_size (int) – Number of samples per batch when computing distances.

  • device (str) – Compute device, e.g., "cpu" or "cuda".

  • strict (bool) – If True, enforces an equal number of evaluation and reference samples.

  • name (str) – Metric name.

Raises:
  • ValueError – If n_neighbors < 1.

  • ValueError – If reference contains no sequences after embedding.
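The batch_size parameter trades memory for speed when computing distances. How seqme batches internally is not documented here; the sketch below only shows the general pattern of chunked pairwise distances in NumPy, with a hypothetical helper name `nearest_reference_distance`.

```python
import numpy as np


def nearest_reference_distance(queries, reference, batch_size=256):
    """Distance from each query to its closest reference point, computed in chunks.

    Hypothetical helper for illustration; not part of seqme.
    """
    queries = np.asarray(queries, dtype=float)
    reference = np.asarray(reference, dtype=float)
    out = np.empty(len(queries))
    for start in range(0, len(queries), batch_size):
        chunk = queries[start:start + batch_size]
        # Only a (len(chunk), len(reference)) distance matrix is held in memory at a time.
        dists = np.linalg.norm(chunk[:, None, :] - reference[None, :, :], axis=-1)
        out[start:start + batch_size] = dists.min(axis=1)
    return out
```

The result does not depend on batch_size; a smaller value only lowers peak memory at the cost of more loop iterations.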

__call__(sequences)

Compute the Clipped Density of the given sequences.

Evaluates how many of the provided sequences lie on or near the reference manifold, producing a score between 0 and 1 that reflects their realism relative to the reference data.

Parameters:
  • sequences (list[str]) – List of sequences to evaluate.

Returns:
  The Clipped Density score, representing the fraction of realistic sequences.

Return type:
  MetricResult
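A minimal usage sketch, based only on the signatures documented on this page. The embedder `composition_embedder`, the 20-letter amino-acid alphabet, and the wrapper `score_sequences` are hypothetical and chosen for illustration; the sketch also assumes the embedder should return a 2-D array with one row per sequence, which this page does not state explicitly.

```python
from collections import Counter

import numpy as np

# Assumption: protein sequences over the 20 standard amino acids.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def composition_embedder(sequences):
    """Hypothetical toy embedder: amino-acid frequency vectors, one row per sequence."""
    rows = []
    for seq in sequences:
        counts = Counter(seq)
        rows.append([counts.get(a, 0) / max(len(seq), 1) for a in ALPHABET])
    return np.asarray(rows, dtype=float)


def score_sequences(reference, generated, n_neighbors=5):
    """Build the metric from reference sequences and evaluate generated ones."""
    # Imported lazily so the toy embedder can be tried without seqme installed.
    from seqme.metrics import ClippedDensity

    metric = ClippedDensity(
        n_neighbors=n_neighbors,
        reference=reference,
        embedder=composition_embedder,
        strict=False,  # allow a different number of evaluation and reference samples
    )
    # The call returns a MetricResult, as documented above.
    return metric(generated)
```

In practice a learned sequence embedding would likely replace the toy composition vectors; the frequency embedder is used here only to keep the example self-contained.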

Methods

  • __init__(n_neighbors, reference, embedder, *) – Initialize the ClippedDensity metric.

  • __call__(sequences) – Compute the Clipped Density of the given sequences.

Attributes

  • name – Name of the metric.

  • objective – Whether lower or higher scores indicate better performance.