seqme.metrics.NGramJaccardSimilarity

class seqme.metrics.NGramJaccardSimilarity(reference, n, *, objective='minimize', name='Jaccard-similarity')

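Average Jaccard similarity between each generated sequence and a reference corpus, computed over n-grams of size n as |A ∩ R| / |A ∪ R|, where A is the n-gram set of a generated sequence and R is the n-gram set of the reference corpus.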

Set the objective parameter to 'minimize' to reward novelty (low overlap with the reference) or to 'maximize' to reward overlap.
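The formula above can be illustrated with a minimal, standalone sketch. It does not reproduce seqme's implementation: the helper names (ngrams, jaccard, average_ngram_jaccard) are hypothetical, and the sketch assumes character-level n-grams, a reference set formed as the union of n-grams over all reference strings, and a score of 0.0 when both sets are empty.

```python
from __future__ import annotations


def ngrams(sequence: str, n: int) -> set[str]:
    """Return the set of contiguous character n-grams in `sequence`."""
    # Sequences shorter than n yield an empty set.
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def jaccard(a: set[str], r: set[str]) -> float:
    """Compute |A ∩ R| / |A ∪ R|."""
    union = a | r
    # Assumption: treat the undefined 0/0 case as 0.0.
    return len(a & r) / len(union) if union else 0.0


def average_ngram_jaccard(sequences: list[str], reference: list[str], n: int) -> float:
    """Average the per-sequence Jaccard similarity against the reference set."""
    if not sequences:
        raise ValueError("sequences must not be empty")
    # Assumption: R is the union of n-grams over the whole reference corpus.
    ref_set = set().union(*(ngrams(s, n) for s in reference))
    scores = [jaccard(ngrams(s, n), ref_set) for s in sequences]
    return sum(scores) / len(scores)


# Reference "ABCD" gives R = {AB, BC, CD}.
# "ABCE" gives A = {AB, BC, CE}: 2 shared of 4 total, so 0.5.
# "WXYZ" shares nothing with R, so 0.0.
score = average_ngram_jaccard(["ABCE", "WXYZ"], reference=["ABCD"], n=2)
print(score)  # 0.25
```

Under the default objective='minimize', a lower value such as this one would count as better, since it indicates that the generated sequences share few n-grams with the reference.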

__init__(reference, n, *, objective='minimize', name='Jaccard-similarity')

Initialize the metric.

Parameters:
  • reference (list[str]) – Strings from which the reference n-gram set is built.

  • n (int) – Size of the n-grams.

  • objective (Literal['minimize', 'maximize']) – "minimize" to reward novelty, "maximize" to reward overlap.

  • name (str) – Metric name.

__call__(sequences)

Compute the average Jaccard similarity between each generated sequence and a reference corpus, based on n-grams of size n.

Parameters:

sequences (list[str]) – Sequences to evaluate.

Returns:

Average Jaccard similarity over the evaluated sequences.

Return type:

MetricResult

Methods

__init__(reference, n, *[, objective, name])

Initialize the metric.

__call__(sequences)

Compute the average Jaccard similarity between each generated sequence and a reference corpus, based on n-grams of size n.

Attributes

name

Name of the metric.

objective

Whether lower or higher scores indicate better performance.