seqme.metrics.KLDivergence

class seqme.metrics.KLDivergence(reference, predictor, *, n_draws=10000, kde_bandwidth='silverman', seed=0, name='KL-divergence')

KL-divergence between samples and reference for a single property.

This metric measures how much the empirical distribution of a property \(f(x)\) in the generated samples deviates from the corresponding reference distribution.

The KL-divergence is defined as:

\[\mathrm{KL}\big(p_{f(\mathrm{ref})} \,\|\, p_{f(\mathrm{gen})}\big) = \int p_{f(\mathrm{ref})}(y) \log \frac{p_{f(\mathrm{ref})}(y)}{p_{f(\mathrm{gen})}(y)} \, dy,\]

where \(p_{f(\mathrm{ref})}\) denotes the reference distribution and \(p_{f(\mathrm{gen})}\) denotes the generated distribution.

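The integral is approximated using Monte-Carlo sampling: n_draws samples are drawn from the reference distribution, and densities are estimated with a Gaussian KDE (see kde_bandwidth).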
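
As a sketch of what such an approximation typically looks like (the exact estimator used by the library is not stated here, so treat this as an assumption): with \(N\) draws \(y_1, \dots, y_N\) from a density estimate \(\hat{p}_{f(\mathrm{ref})}\) of the reference, and \(\hat{p}_{f(\mathrm{gen})}\) the corresponding estimate for the generated samples, a standard Monte-Carlo estimator of the integral above is

\[\widehat{\mathrm{KL}} = \frac{1}{N} \sum_{i=1}^{N} \log \frac{\hat{p}_{f(\mathrm{ref})}(y_i)}{\hat{p}_{f(\mathrm{gen})}(y_i)}, \qquad y_i \sim \hat{p}_{f(\mathrm{ref})}.\]

Under the same assumption, the standard error would be \(s / \sqrt{N}\), where \(s\) is the sample standard deviation of the \(N\) log-ratios.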

__init__(reference, predictor, *, n_draws=10000, kde_bandwidth='silverman', seed=0, name='KL-divergence')

Initialize the metric.

Parameters:

reference (list[str]) – Reference sequences assumed to represent the target distribution.
predictor (Callable[[list[str]], ndarray]) – Predictor function that returns a 1D NumPy array with one value per sequence.
n_draws (int) – Number of Monte-Carlo samples to draw from the reference distribution.
kde_bandwidth (Union[float, Literal['scott', 'silverman']]) – Bandwidth parameter for the Gaussian KDE: either a float or one of the rules 'scott' or 'silverman'.
seed (int) – Seed for KL-divergence Monte-Carlo sampling.
name (str) – Metric name.
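
To illustrate how n_draws, kde_bandwidth and seed could interact, here is a minimal, dependency-free sketch of a KDE-based Monte-Carlo KL estimate on already-predicted property values. It is not the seqme implementation: the helper names (`silverman_bandwidth`, `kde_log_density`, `kl_monte_carlo`) are hypothetical, the bandwidth uses one common form of Silverman's rule, and it assumes both densities are estimated with a Gaussian KDE.

```python
import math
import random
import statistics


def silverman_bandwidth(values):
    """One common form of Silverman's rule for a 1D Gaussian KDE (assumption)."""
    n = len(values)
    return statistics.stdev(values) * (n * 3.0 / 4.0) ** (-1.0 / 5.0)


def kde_log_density(y, values, bandwidth):
    """Log-density at `y` of a Gaussian KDE built on `values`."""
    norm = len(values) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((y - v) / bandwidth) ** 2) for v in values)
    # Floor the kernel sum so log() stays finite if every kernel underflows.
    return math.log(max(total, 1e-300)) - math.log(norm)


def kl_monte_carlo(ref_values, gen_values, n_draws=2000, seed=0):
    """Estimate KL(ref || gen) and its standard error from 1D property values."""
    rng = random.Random(seed)
    h_ref = silverman_bandwidth(ref_values)
    h_gen = silverman_bandwidth(gen_values)
    log_ratios = []
    for _ in range(n_draws):
        # Draw from the reference KDE: pick a data point, add kernel noise.
        y = rng.choice(ref_values) + rng.gauss(0.0, h_ref)
        log_ratios.append(
            kde_log_density(y, ref_values, h_ref)
            - kde_log_density(y, gen_values, h_gen)
        )
    estimate = statistics.fmean(log_ratios)
    std_error = statistics.stdev(log_ratios) / math.sqrt(n_draws)
    return estimate, std_error


# Toy property values standing in for predictor outputs.
data_rng = random.Random(1)
ref = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
same = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [data_rng.gauss(2.0, 1.0) for _ in range(200)]

kl_same, se_same = kl_monte_carlo(ref, same)
kl_shifted, se_shifted = kl_monte_carlo(ref, shifted)
print(f"same distribution:    {kl_same:.3f} +/- {se_same:.3f}")
print(f"shifted distribution: {kl_shifted:.3f} +/- {se_shifted:.3f}")
```

In this sketch a fixed seed makes the estimate reproducible, and a larger n_draws shrinks the standard error at the cost of more density evaluations.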

__call__(sequences)

Compute the KL-divergence between the reference distribution and the distribution of predicted property values for the given sequences.

Parameters: sequences (list[str]) – Sequences to evaluate.
Returns: KL-divergence and standard error.
Return type: MetricResult

Methods

`__init__`(reference, predictor, *[, n_draws, ...])	Initialize the metric.
`__call__`(sequences)	Compute the KL-divergence between the reference and the given sequences.

Attributes

`name`	Name of the metric.
`objective`	Whether lower or higher scores indicate better performance.
