seqme.metrics.KLDivergence
- class seqme.metrics.KLDivergence(reference, predictor, *, n_draws=10000, kde_bandwidth='silverman', seed=0, name='KL-divergence')
KL-divergence between generated samples and reference sequences for a single property.
This metric measures how much the empirical distribution of a property \(f(x)\) in the generated samples deviates from the corresponding reference distribution.
The KL-divergence is defined as:
\[\mathrm{KL}\big(p_{f(\mathrm{ref})} \,\|\, p_{f(\mathrm{gen})}\big) = \int p_{f(\mathrm{ref})}(y) \log \frac{p_{f(\mathrm{ref})}(y)}{p_{f(\mathrm{gen})}(y)} \, dy,\]
where \(p_{f(\mathrm{ref})}\) and \(p_{f(\mathrm{gen})}\) denote the distributions of the property values \(f(x)\) over the reference and generated sequences, respectively.
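The densities are estimated with a Gaussian kernel density estimate (KDE), and the integral is approximated by Monte Carlo sampling with n_draws samples drawn from the reference distribution.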
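Written out, a Monte Carlo estimate of this kind takes the form below. This is a sketch under stated assumptions, not necessarily the exact estimator used here: hats denote Gaussian KDE estimates of the two densities (as the kde_bandwidth parameter suggests), \(N\) is n_draws, and the standard error shown is the usual Monte Carlo standard error of a sample mean.

\[\widehat{\mathrm{KL}} = \frac{1}{N}\sum_{i=1}^{N} \ell_i, \qquad \ell_i = \log \frac{\hat p_{f(\mathrm{ref})}(y_i)}{\hat p_{f(\mathrm{gen})}(y_i)}, \qquad y_i \sim \hat p_{f(\mathrm{ref})},\]
\[\widehat{\mathrm{SE}} = \frac{1}{\sqrt{N}} \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} \big(\ell_i - \widehat{\mathrm{KL}}\big)^2}.\]

Because the draws \(y_i\) come from the reference density, the mean of the log-ratios \(\ell_i\) is an unbiased estimate of the integral above with \(\hat p\) in place of \(p\).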
- __init__(reference, predictor, *, n_draws=10000, kde_bandwidth='silverman', seed=0, name='KL-divergence')
Initialize the metric.
- Parameters:
  - reference (list[str]) – Reference sequences assumed to represent the target distribution.
  - predictor (Callable[[list[str]], ndarray]) – Predictor function that returns a 1D NumPy array with one value per sequence.
  - n_draws (int) – Number of Monte Carlo samples to draw from the reference distribution.
  - kde_bandwidth (Union[float, Literal['scott', 'silverman']]) – Bandwidth parameter for the Gaussian KDE.
  - seed (int) – Seed for the Monte Carlo sampling used to estimate the KL-divergence.
  - name (str) – Metric name.
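The predictor argument can be any callable that maps a list of sequences to a 1D NumPy array with one value per sequence. A minimal sketch of such a callable, using sequence length as a hypothetical stand-in for a real property predictor (the function name sequence_length is illustrative, not part of seqme):

```python
import numpy as np


def sequence_length(sequences: list[str]) -> np.ndarray:
    """Hypothetical property predictor: the length of each sequence.

    Returns a 1D float array with exactly one value per input sequence,
    which is the contract the predictor argument expects.
    """
    return np.array([len(seq) for seq in sequences], dtype=float)
```

Such a function could then be passed as the predictor argument, for example KLDivergence(reference_sequences, predictor=sequence_length), where reference_sequences is a hypothetical list[str].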
- __call__(sequences)
Compute the KL-divergence between the predictor values of the reference sequences and those of the given sequences.
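- Parameters:
  - sequences (list[str]) – Generated sequences whose property values are compared against the reference.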
- Returns:
KL-divergence and standard error.
- Return type:
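One way the two returned quantities could be computed from predictor values is sketched below using only NumPy. It is not seqme's implementation: the helper names (kl_divergence_mc, _bandwidth, _kde_logpdf) are hypothetical, the 'scott' and 'silverman' rules are assumed to follow SciPy's gaussian_kde conventions, a float bandwidth is assumed to act as a factor on the sample standard deviation, and the standard error is assumed to be the Monte Carlo standard error of the mean log-ratio.

```python
from typing import Union

import numpy as np


def _bandwidth(values: np.ndarray, rule: Union[float, str]) -> float:
    """Kernel standard deviation for a 1D Gaussian KDE.

    Assumption: mirrors SciPy's gaussian_kde factors for 1D data,
    scott = n**(-1/5), silverman = (3n/4)**(-1/5), times the sample std.
    """
    n = values.size
    if rule == "scott":
        factor = n ** (-1.0 / 5.0)
    elif rule == "silverman":
        factor = (n * 3.0 / 4.0) ** (-1.0 / 5.0)
    else:
        factor = float(rule)  # assumption: a float is a factor on the std
    return factor * values.std(ddof=1)


def _kde_logpdf(points: np.ndarray, data: np.ndarray, h: float) -> np.ndarray:
    """Log-density of a Gaussian KDE with kernel std h, evaluated at points."""
    z = (points[:, None] - data[None, :]) / h
    log_kernels = -0.5 * z**2
    # Log-sum-exp over data points for numerical stability in the tails.
    m = log_kernels.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(log_kernels - m).sum(axis=1))
    return lse - np.log(data.size) - np.log(h * np.sqrt(2.0 * np.pi))


def kl_divergence_mc(
    ref_values,
    gen_values,
    n_draws: int = 10_000,
    bandwidth: Union[float, str] = "silverman",
    seed: int = 0,
) -> tuple[float, float]:
    """Monte Carlo estimate of KL(p_ref || p_gen) and its standard error."""
    ref = np.asarray(ref_values, dtype=float)
    gen = np.asarray(gen_values, dtype=float)
    h_ref = _bandwidth(ref, bandwidth)
    h_gen = _bandwidth(gen, bandwidth)
    rng = np.random.default_rng(seed)
    # Sample from the reference KDE: pick a data point, add kernel noise.
    centers = rng.choice(ref, size=n_draws, replace=True)
    draws = centers + h_ref * rng.standard_normal(n_draws)
    # Per-draw log-ratio log p_ref(y) - log p_gen(y).
    log_ratio = _kde_logpdf(draws, ref, h_ref) - _kde_logpdf(draws, gen, h_gen)
    kl = log_ratio.mean()
    se = log_ratio.std(ddof=1) / np.sqrt(n_draws)
    return float(kl), float(se)
```

For example, kl_divergence_mc(predictor(reference), predictor(sequences)) would return a (value, standard error) pair; with identical inputs the log-ratios are all zero, so it returns (0.0, 0.0). Memory grows with n_draws times the number of sequences, which a production implementation might avoid by evaluating the KDE in chunks.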
Attributes
- Name of the metric.
- Whether lower or higher scores indicate better performance.