seqme.metrics.AuthPct

class seqme.metrics.AuthPct(train_set, embedder, *, name='Authenticity')

Proportion of authentic generated samples.

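A generated sequence counts as authentic when its nearest training neighbor is closer to some other training sample than to the generated sequence, i.e. the sequence is not a near-copy of a single training sample. The authenticity score is the fraction of generated sequences that are authentic, with distances measured between the embeddings produced by embedder.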
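The definition above can be sketched in NumPy as follows. This is a minimal illustration, not the seqme implementation: the function name authenticity_fraction, the use of Euclidean distance, and the strict inequality for ties are all assumptions.

```python
import numpy as np


def authenticity_fraction(train_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """Fraction of generated embeddings judged authentic.

    Assumes Euclidean distance and a strict inequality for ties;
    the library may make different choices.
    """
    # Pairwise distances between training points, with self-distances masked out.
    train_d = np.linalg.norm(train_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    np.fill_diagonal(train_d, np.inf)
    # Distance from each training point to its nearest *other* training point.
    train_nn_dist = train_d.min(axis=1)

    # Distance from each generated point to every training point.
    gen_d = np.linalg.norm(gen_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    nearest_idx = gen_d.argmin(axis=1)
    gen_nn_dist = gen_d[np.arange(len(gen_emb)), nearest_idx]

    # Authentic: the nearest training neighbor is closer to another
    # training sample than to the generated sample.
    authentic = train_nn_dist[nearest_idx] < gen_nn_dist
    return float(authentic.mean())


# Toy data: the first generated point nearly copies a training point,
# the second sits far from all of them.
train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]])
generated = np.array([[0.1, 0.0], [3.0, 3.0]])
score = authenticity_fraction(train, generated)
print(score)  # 0.5
```

In the toy data, the point at (0.1, 0.0) lies 0.1 away from a training point whose own nearest training neighbor is 1.0 away, so it is flagged as a likely copy. The point at (3.0, 3.0) is farther from its nearest training neighbor than that neighbor is from the rest of the training set, so it counts as authentic.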

References

[1] Alaa et al., “How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models” (2022). https://arxiv.org/abs/2102.08921

__init__(train_set, embedder, *, name='Authenticity')

Initialize the metric.

Parameters:
  • train_set (list[str]) – List of sequences used to train the generative model.

  • embedder (Callable[[list[str]], ndarray]) – A function mapping a list of sequences to a 2D NumPy array of embeddings.

  • name (str) – Metric name.
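As an illustration of the embedder contract (a list of sequences in, a 2D NumPy array out, one row per sequence), here is a toy composition-based embedder. The name composition_embedder and the 20-letter amino acid alphabet are hypothetical choices for this sketch; in practice a learned embedding model would more likely be used.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino acids.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def composition_embedder(sequences: list[str]) -> np.ndarray:
    """Map each sequence to its normalized character-frequency vector."""
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    out = np.zeros((len(sequences), len(ALPHABET)), dtype=float)
    for row, seq in enumerate(sequences):
        for ch in seq:
            if ch in index:  # characters outside the alphabet are ignored
                out[row, index[ch]] += 1.0
        if seq:
            out[row] /= len(seq)  # normalize counts by sequence length
    return out


emb = composition_embedder(["ACD", "AAAA"])
print(emb.shape)  # (2, 20)
```

Any callable with this shape contract should be usable as the embedder argument, for example AuthPct(train_set, composition_embedder).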

__call__(sequences)

Compute the authenticity score based on the embeddings of the input sequences and the train set.

Parameters:
  • sequences (list[str]) – Sequences to evaluate.

Returns:
  Authenticity score.

Return type:
  MetricResult

Methods

  • __init__(train_set, embedder, *[, name]) – Initialize the metric.

  • __call__(sequences) – Compute the authenticity score based on the embeddings of the input sequences and the train set.

Attributes

  • name – Name of the metric.

  • objective – Whether lower or higher scores indicate better performance.