Metrics#

seqme provides a unified framework for evaluating sequences across three metric spaces — sequence, embedding, and property — along with a few general-purpose utilities.

Sequence-based Metrics#

Metrics that operate directly on the raw sequences.

seqme.metrics.Diversity

Measures the diversity of synthetic sequences using normalized pairwise Levenshtein distance.
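As a rough illustration of what a normalized pairwise Levenshtein diversity can look like, here is a standalone sketch in plain Python. It does not use seqme, and the function names are hypothetical. Normalizing each distance by the length of the longer sequence and averaging over all unordered pairs are assumptions; the library may normalize or aggregate differently.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def pairwise_diversity(sequences: list[str]) -> float:
    """Mean normalized Levenshtein distance over all unordered pairs."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        longest = max(len(a), len(b))
        # Assumption: normalize by the longer sequence; two empty strings count as identical.
        total += levenshtein(a, b) / longest if longest else 0.0
    return total / len(pairs)
```

Under these assumptions, a set of identical sequences scores 0 and a set of sequences sharing no characters in any position scores 1.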

seqme.metrics.Uniqueness

Fraction of unique sequences within a provided list of sequences.

seqme.metrics.Novelty

Fraction of sequences not in the reference set.
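The two set-based fractions described above (Uniqueness and Novelty) can be sketched in a few lines of plain Python. The function names are hypothetical and this is not seqme's code. In particular, whether duplicates among the generated sequences are counted once or several times for novelty is an assumption here (each occurrence is counted).

```python
def uniqueness(sequences: list[str]) -> float:
    """Fraction of distinct sequences in the list."""
    if not sequences:
        raise ValueError("need at least one sequence")
    return len(set(sequences)) / len(sequences)


def novelty(sequences: list[str], reference: list[str]) -> float:
    """Fraction of sequences that do not appear in the reference set."""
    if not sequences:
        raise ValueError("need at least one sequence")
    reference_set = set(reference)  # set lookup keeps each membership test O(1)
    # Assumption: every occurrence of a repeated sequence is counted.
    return sum(s not in reference_set for s in sequences) / len(sequences)
```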

seqme.metrics.NGramJaccardSimilarity

Average Jaccard similarity between each generated sequence and a reference corpus, computed as |A ∩ R| / |A ∪ R|, where A is the set of n-grams (of size n) in a generated sequence and R is the set of n-grams in the reference corpus.
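A minimal sketch of the |A ∩ R| / |A ∪ R| computation, written in plain Python without seqme. It assumes that A is the n-gram set of a single generated sequence, that R is the union of n-gram sets over the whole reference corpus, and that the final score is the mean over generated sequences. The names `ngrams` and `ngram_jaccard` are hypothetical.

```python
def ngrams(sequence: str, n: int) -> set[str]:
    """All contiguous substrings of length n (empty if the sequence is shorter than n)."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def ngram_jaccard(sequences: list[str], reference: list[str], n: int) -> float:
    """Mean Jaccard similarity between each sequence's n-grams and the corpus n-grams."""
    if not sequences:
        raise ValueError("need at least one sequence")
    # Assumption: R pools the n-grams of every reference sequence.
    ref = set().union(*(ngrams(r, n) for r in reference))
    scores = []
    for s in sequences:
        a = ngrams(s, n)
        union = a | ref
        # Define the similarity as 0 when both sets are empty.
        scores.append(len(a & ref) / len(union) if union else 0.0)
    return sum(scores) / len(scores)
```

Because R pools the whole corpus, a generated sequence only reaches 1.0 under this reading if it contains every n-gram found in the reference.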

Embedding-based Metrics#

Metrics that compare or assess distributions in an embedding (vector) space.

seqme.metrics.FBD

Fréchet Biological Distance (FBD) between a set of generated sequences and a reference dataset based on their embeddings.
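For orientation, Fréchet-style distances on embeddings are commonly computed by fitting a Gaussian to each set of embeddings and comparing the two Gaussians. Assuming FBD follows this standard construction (the symbols below are introduced here for illustration and are not taken from the seqme documentation), with mean and covariance $(\mu_g, \Sigma_g)$ for the generated embeddings and $(\mu_r, \Sigma_r)$ for the reference embeddings, the squared distance is:

```latex
d^2\big((\mu_g, \Sigma_g), (\mu_r, \Sigma_r)\big)
  = \lVert \mu_g - \mu_r \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_g + \Sigma_r - 2\,(\Sigma_g \Sigma_r)^{1/2} \right)
```

The first term penalizes a shift in the mean embedding and the second a mismatch in covariance; the distance is zero when both Gaussians coincide.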

seqme.metrics.MMD

Maximum Mean Discrepancy (MMD) metric using a Gaussian kernel.
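The following standalone sketch shows one common way to estimate squared MMD with a Gaussian kernel. It is plain Python operating on lists of embedding vectors, not seqme's implementation. The biased (V-statistic) estimator and the fixed bandwidth `sigma` are assumptions; the library may use an unbiased estimator or choose the bandwidth differently.

```python
import math


def gaussian_kernel(x: list[float], y: list[float], sigma: float) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def mmd_squared(xs: list[list[float]], ys: list[list[float]], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two sets of vectors."""
    def mean_kernel(us, vs):
        # Average kernel value over every pair (u, v), including u == v.
        return sum(gaussian_kernel(u, v, sigma) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)
```

With this estimator, two identical sets give a value of 0, and two far-apart single points give a value close to 2.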

seqme.metrics.KID

Kernel Inception Distance (KID).

seqme.metrics.Precision

Evaluates how realistic synthetic samples are compared to reference data.

seqme.metrics.Recall

Evaluates how well the reference data is covered by the generated sequences.

seqme.metrics.ClippedDensity

Evaluates how realistic synthetic samples are compared to reference data.

seqme.metrics.ClippedCoverage

Evaluates how well the reference data is covered by the synthetic samples.

seqme.metrics.AuthPct

Proportion of authentic generated samples.

seqme.metrics.FKEA

Fourier-based Kernel Entropy Approximation (FKEA) approximates the VENDI-score and RKE-score using random Fourier features.

Property-based Metrics#

Metrics computed on derived physicochemical or predicted properties.

seqme.metrics.ID

Applies a user-provided predictor to a list of sequences and returns the mean and standard error of the predictor's outputs.
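The mean and standard error of a predictor's outputs can be sketched as follows in plain Python. This is not seqme's code: the function name is hypothetical, and using the sample standard deviation (n - 1 in the denominator) is an assumption.

```python
import math
import statistics
from typing import Callable


def mean_and_standard_error(
    sequences: list[str], predictor: Callable[[str], float]
) -> tuple[float, float]:
    """Apply `predictor` to each sequence; return (mean, standard error of the mean)."""
    values = [predictor(s) for s in sequences]
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation; statistics.stdev raises for fewer than two values.
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error
```

Any callable that maps a sequence to a number works as the predictor in this sketch, for example `len` or a function that computes net charge.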

seqme.metrics.Threshold

Fraction of sequences whose property value lies within a user-defined range [min, max].
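A short sketch of such a range check in plain Python, independent of seqme. The function and parameter names are hypothetical, and treating both bounds as inclusive is an assumption based on the closed-interval notation [min, max].

```python
import math
from typing import Callable


def fraction_within(
    sequences: list[str],
    predictor: Callable[[str], float],
    min_value: float = -math.inf,
    max_value: float = math.inf,
) -> float:
    """Fraction of sequences whose predicted property lies in [min_value, max_value]."""
    if not sequences:
        raise ValueError("need at least one sequence")
    values = [predictor(s) for s in sequences]
    # Assumption: both bounds are inclusive; leaving one at its default makes the range one-sided.
    return sum(min_value <= v <= max_value for v in values) / len(values)
```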

seqme.metrics.HitRate

Fraction of sequences that satisfy a user-defined condition.

seqme.metrics.Hypervolume

Computes the Hypervolume metric for multi-objective optimization.
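To make the idea concrete, the sketch below computes the hypervolume for the two-objective case, where it reduces to the area dominated by the points and bounded by a reference point. It is plain Python and not seqme's implementation. Maximizing both objectives, the default reference point, and the function name are all assumptions made for illustration.

```python
def hypervolume_2d(
    points: list[tuple[float, float]],
    reference: tuple[float, float] = (0.0, 0.0),
) -> float:
    """Area dominated by `points` and bounded below by `reference` (both objectives maximized)."""
    rx, ry = reference
    # Only points that beat the reference in both objectives contribute area.
    candidates = [(x, y) for x, y in points if x > rx and y > ry]
    # Sweep from the largest x to the smallest; each point adds a strip for the y it gains.
    candidates.sort(key=lambda p: p[0], reverse=True)
    area, best_y = 0.0, ry
    for x, y in candidates:
        if y > best_y:
            area += (x - rx) * (y - best_y)
            best_y = y
    return area
```

Dominated points add nothing in this sweep, so the result depends only on the Pareto front of the input.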

seqme.metrics.ConformityScore

Distributional conformity score.

seqme.metrics.KLDivergence

KL-divergence between samples and reference for a single property.
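One simple way to estimate a KL-divergence from two samples of a scalar property is to bin both samples on a shared grid and compare the resulting histograms. The sketch below does that in plain Python. It is not seqme's estimator: the histogram approach, the direction KL(samples || reference), the number of bins, and the pseudocount used to avoid empty bins are all assumptions, and the function name is hypothetical.

```python
import math


def kl_divergence(
    samples: list[float],
    reference: list[float],
    bins: int = 10,
    pseudocount: float = 1.0,
) -> float:
    """Histogram estimate of KL(samples || reference) for a single property."""
    lo = min(min(samples), min(reference))
    hi = max(max(samples), max(reference))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when every value is identical

    def probabilities(values):
        # The pseudocount keeps every bin non-empty so the log ratio stays finite.
        counts = [pseudocount] * bins
        for v in values:
            index = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
            counts[index] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = probabilities(samples), probabilities(reference)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The estimate is sensitive to the bin count and pseudocount, so values from this sketch are only comparable when those settings are held fixed.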

Miscellaneous#

General or utility metrics that don’t fit into the main categories.

seqme.metrics.Fold

A wrapper for any metric that splits the sequences into non-overlapping subsets, computes the metric on each subset, and aggregates the results.
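The split, evaluate, and aggregate pattern can be sketched as a small higher-order function in plain Python. This is an illustration rather than seqme's wrapper: the function name and signature are hypothetical, and aggregating with the mean and population standard deviation is an assumption. A real wrapper would likely shuffle the sequences before splitting.

```python
import statistics
from typing import Callable


def fold(
    metric: Callable[[list[str]], float],
    sequences: list[str],
    n_splits: int,
) -> tuple[float, float]:
    """Evaluate `metric` on `n_splits` non-overlapping subsets; return (mean, std) of the values."""
    if n_splits < 1 or not sequences:
        raise ValueError("need n_splits >= 1 and at least one sequence")
    # Strided slicing yields disjoint subsets whose sizes differ by at most one.
    splits = [sequences[i::n_splits] for i in range(n_splits)]
    # Skip empty subsets, which occur when n_splits exceeds the number of sequences.
    values = [metric(split) for split in splits if split]
    return statistics.fmean(values), statistics.pstdev(values)
```

The spread across subsets gives a rough sense of how stable the wrapped metric is for the given sample size.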

seqme.metrics.Subset

A wrapper to approximate expensive metrics by evaluating a subset of the sequences in a group.

seqme.metrics.Count

Number of sequences.

seqme.metrics.Length

Average sequence length.

Supported sequence types#

At-a-glance matrix of all metrics and supported sequence types.

✓ — supported, ✗ — not supported