seqme.metrics.FKEA

seqme.metrics.FKEA#

class seqme.metrics.FKEA(embedder, bandwidth, *, alpha=2, n_random_fourier_features=2048, batch_size=256, device='cpu', seed=0, strict=True, name='FKEA')[source]#

Fourier-based Kernel Entropy Approximation (FKEA) approximates the VENDI-score and RKE-score using random Fourier features.

This is a reference-free method to estimate diversity in a set of generated sequences. It is positively correlated with the number of distinct modes or clusters in the embedding space, without requiring access to real/reference data.

The method works by projecting embeddings into a randomized Fourier feature space, approximating the Gaussian kernel, and computing the α-norm of the normalized kernel eigenvalues.

  • If alpha=2, this corresponds to the RKE-score.

  • If alpha≠2, this corresponds to the VENDI-α score.

References

[1] Friedman et al., The Vendi Score: A Diversity Evaluation Metric for Machine Learning, 2023

(https://arxiv.org/abs/2210.02410)

[2] Ospanov, Zhang, Jalali et al., “Towards a Scalable Reference-Free Evaluation of Generative Models”, 2024

(https://arxiv.org/pdf/2407.02961)

__init__(embedder, bandwidth, *, alpha=2, n_random_fourier_features=2048, batch_size=256, device='cpu', seed=0, strict=True, name='FKEA')[source]#

Initialize the metric with an embedding function and kernel bandwidth.

Parameters:
  • embedder (Callable[[list[str]], ndarray]) – A function that maps a list of sequences to a 2D NumPy array of embeddings.

  • bandwidth (float) – Bandwidth parameter for the Gaussian kernel.

  • alpha (float | int) – alpha-norm of the normalized kernels eigenvalues. If alpha=2 then it corresponds to the RKE-score otherwise VENDI-alpha.

  • n_random_fourier_features (int | None) – Number of random Fourier features. Used to approximate the kernel function. Consider increasing this to get a better approximation. If None, use the exact kernel covariance matrix.

  • batch_size (int) – Number of samples per batch when computing the kernel.

  • device (str) – Compute device, e.g., "cpu" or "cuda".

  • seed (int) – Seed for deterministic sampling of Fourier features.

  • strict (bool) – Enforce equal number of samples for computation.

  • name (str) – Metric name.

__call__(sequences)[source]#

Computes FKEA of the input sequences.

Parameters:

sequences (list[str]) – Sequences to evaluate.

Returns:

FKEA score.

Return type:

MetricResult

Methods

__init__(embedder, bandwidth, *[, alpha, ...])

Initialize the metric with an embedding function and kernel bandwidth.

__call__(sequences)

Computes FKEA of the input sequences.

Attributes

name

Name of the metric.

objective

Whether lower or higher scores indicate better performance.