seqme.metrics.FKEA
- class seqme.metrics.FKEA(embedder, bandwidth, *, alpha=2, n_random_fourier_features=2048, batch_size=256, device='cpu', seed=0, strict=True, name='FKEA')
Fourier-based Kernel Entropy Approximation (FKEA) [2] approximates the VENDI-score [1] and the RKE-score using random Fourier features.
It is a reference-free measure of the diversity of a set of generated sequences: it needs no real/reference data and is positively correlated with the number of distinct modes or clusters in the embedding space.
The method works by projecting embeddings into a randomized Fourier feature space, approximating the Gaussian kernel, and computing the α-norm of the normalized kernel eigenvalues.
If alpha=2, this corresponds to the RKE-score.
If alpha≠2, this corresponds to the VENDI-α score.
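For reference, the standard definitions behind these two cases are given below. They follow the Vendi and RKE literature rather than seqme's source, and are written for the nonnegative eigenvalues λ_1, …, λ_m of the normalized kernel (covariance) matrix, which sum to 1.

```latex
\mathrm{VENDI}_\alpha = \Big(\sum_{i=1}^{m} \lambda_i^{\alpha}\Big)^{\frac{1}{1-\alpha}}
  = \lVert \lambda \rVert_\alpha^{\frac{\alpha}{1-\alpha}},
\qquad
\mathrm{RKE} = \frac{1}{\sum_{i=1}^{m} \lambda_i^{2}} = \frac{1}{\lVert \lambda \rVert_2^{2}} = \mathrm{VENDI}_2 .
```

For α = 1 the definition is taken as the limit exp(−Σ λ_i log λ_i). If the mass is spread evenly over k eigenvalues (λ_i = 1/k), every order gives exactly k, which is why the score reads as an effective number of modes.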
References
- [1] Friedman et al., “The Vendi Score: A Diversity Evaluation Metric for Machine Learning”, 2023
- [2] Ospanov, Zhang, Jalali et al., “Towards a Scalable Reference-Free Evaluation of Generative Models”, 2024
- __init__(embedder, bandwidth, *, alpha=2, n_random_fourier_features=2048, batch_size=256, device='cpu', seed=0, strict=True, name='FKEA')
Initialize the metric with an embedding function and kernel bandwidth.
- Parameters:
  - embedder (Callable[[list[str]], ndarray]) – A function that maps a list of sequences to a 2D NumPy array of embeddings.
  - bandwidth (float) – Bandwidth parameter of the Gaussian kernel.
  - alpha (float | int) – Order α of the norm applied to the normalized kernel eigenvalues. If alpha=2, the score corresponds to the RKE-score; otherwise, it corresponds to the VENDI-α score.
  - n_random_fourier_features (int | None) – Number of random Fourier features used to approximate the kernel function. Larger values give a better approximation at a higher computational cost. If None, the exact kernel covariance matrix is used.
  - batch_size (int) – Number of samples per batch when computing the kernel.
  - device (str) – Compute device, e.g., "cpu" or "cuda".
  - seed (int) – Seed for deterministic sampling of the Fourier features.
  - strict (bool) – Enforce an equal number of samples for computation.
  - name (str) – Metric name.
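The roles of `bandwidth`, `n_random_fourier_features` (including `None`), `batch_size`, `seed`, and `alpha` can be illustrated with a minimal NumPy sketch. It is not seqme's implementation: it ignores `device` and `strict`, assumes the kernel k(x, y) = exp(−‖x − y‖² / (2·bandwidth²)) (seqme's exact bandwidth convention is an assumption here), and uses cos/sin random Fourier features. The helper names `gaussian_kernel_eigenvalues` and `vendi_alpha` are hypothetical.

```python
from typing import Optional

import numpy as np


def gaussian_kernel_eigenvalues(
    embeddings: np.ndarray,
    bandwidth: float,
    n_features: Optional[int] = 2048,
    batch_size: int = 256,
    seed: int = 0,
) -> np.ndarray:
    """Eigenvalues of the normalized Gaussian kernel: exact if n_features is None, else via RFF."""
    n, d = embeddings.shape
    if n_features is None:
        # Exact path: eigenvalues of K / n (assumed convention: exp(-||x - y||^2 / (2 * bandwidth^2))).
        sq = np.sum(embeddings**2, axis=1)
        dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T, 0.0)
        return np.linalg.eigvalsh(np.exp(-dist2 / (2.0 * bandwidth**2)) / n)

    # Frequencies sampled once (seeded) from the kernel's spectral density N(0, I / bandwidth^2).
    rng = np.random.default_rng(seed)
    omega = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))

    # Accumulate the (2 * n_features) x (2 * n_features) feature covariance batch by batch.
    cov = np.zeros((2 * n_features, 2 * n_features))
    for start in range(0, n, batch_size):
        proj = embeddings[start : start + batch_size] @ omega
        # cos/sin features: z(a) @ z(b) estimates the kernel without bias; each row has unit norm.
        z = np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features)
        cov += z.T @ z
    # Nonzero eigenvalues of Z^T Z / n equal those of Z Z^T / n, which approximates K / n.
    return np.linalg.eigvalsh(cov / n)


def vendi_alpha(eigenvalues: np.ndarray, alpha: float) -> float:
    """(sum_i lambda_i^alpha)^(1 / (1 - alpha)); alpha=2 is the RKE score, alpha=1 the entropy limit."""
    lam = np.clip(eigenvalues, 0.0, None)  # drop tiny negative values from round-off
    lam = lam / lam.sum()
    if alpha == 1:
        nz = lam[lam > 0]
        return float(np.exp(-np.sum(nz * np.log(nz))))
    return float(np.sum(lam**alpha) ** (1.0 / (1.0 - alpha)))


# Four well-separated clusters of 50 points each in 8 dimensions.
rng = np.random.default_rng(0)
centers = 20.0 * np.eye(4, 8)
points = np.repeat(centers, 50, axis=0) + 0.01 * rng.normal(size=(200, 8))

exact = vendi_alpha(gaussian_kernel_eigenvalues(points, 1.0, n_features=None), alpha=2)
approx = vendi_alpha(gaussian_kernel_eigenvalues(points, 1.0, n_features=512, batch_size=64), alpha=2)
print(f"exact RKE ~ {exact:.3f}, RFF RKE ~ {approx:.3f}")  # both close to 4
```

On four well-separated clusters, both paths give a score close to 4. The covariance form is what makes the approximation scale: the matrix being decomposed has size 2·n_features regardless of how many sequences are evaluated, whereas the exact path grows with the number of sequences.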
- __call__(sequences)
Compute the FKEA score of the input sequences.
- Parameters:
  - sequences (list[str]) – Sequences to evaluate.
- Returns:
FKEA score.
- Return type:
Attributes
- Name of the metric.
- Whether lower or higher scores indicate better performance.