seqme.models.ESM2


class seqme.models.ESM2(model_name, *, device=None, batch_size=256, cache_dir=None, verbose=False)

Wrapper for the ESM2 protein/peptide embedding model.

Computes sequence-level embeddings by averaging the per-token embeddings, excluding the [CLS] and [EOS] tokens.
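The pooling step can be sketched in plain Python. This is a minimal illustration, not seqme's implementation: it assumes the per-token embeddings of one sequence arrive as a list of vectors with the [CLS] token first and the [EOS] token last, and the helper name `mean_pool_excluding_special` is hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool_excluding_special(token_embeddings: List[Vector]) -> Vector:
    """Average token vectors, skipping the first ([CLS]) and last ([EOS]) positions.

    Hypothetical helper: assumes the layout [CLS], residue_1, ..., residue_L, [EOS].
    """
    residues = token_embeddings[1:-1]  # drop [CLS] and [EOS]
    if not residues:
        raise ValueError("sequence has no residue tokens")
    dim = len(residues[0])
    # Sum each embedding dimension across residue positions, then divide by L.
    return [sum(vec[d] for vec in residues) / len(residues) for d in range(dim)]


# Toy example: 3 residues with 2-dimensional embeddings plus special tokens.
tokens = [
    [100.0, 100.0],  # [CLS] (ignored)
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [-100.0, -100.0],  # [EOS] (ignored)
]
print(mean_pool_excluding_special(tokens))  # [3.0, 4.0]
```

In a padded batch, padding positions would also have to be excluded from the average, which this single-sequence sketch does not handle.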

Installation: pip install "seqme[esm2]"

Reference:

Lin et al., “Language models of protein sequences at the scale of evolution enable accurate structure prediction” (https://www.biorxiv.org/content/10.1101/2022.07.20.500902v3)

__init__(model_name, *, device=None, batch_size=256, cache_dir=None, verbose=False)

Initialize the model.

Parameters:
  • model_name (ESM2Checkpoint | str) – Model checkpoint, given as an ESM2Checkpoint enum member or as a checkpoint name string.

  • device (Optional[str]) – Device to run inference on, e.g., "cuda" or "cpu".

  • batch_size (int) – Number of sequences to process per batch.

  • cache_dir (Optional[str]) – Directory to cache the model.

  • verbose (bool) – Whether to display a progress bar.
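As a rough picture of what `batch_size` controls, the sketch below splits the input list into consecutive chunks of at most `batch_size` sequences, each of which would be passed through the model together. The helper `batched` is hypothetical and not part of seqme; it only illustrates the chunking, not the model call.

```python
from typing import Iterator, List, Sequence


def batched(sequences: Sequence[str], batch_size: int = 256) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size sequences (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(sequences), batch_size):
        yield list(sequences[start : start + batch_size])


peptides = ["MKT", "GLF", "KKLL", "AAW", "RRV"]
print([len(b) for b in batched(peptides, batch_size=2)])  # [2, 2, 1]
```

Larger batches generally need more device memory per forward pass; on Python 3.12+, `itertools.batched` performs the same chunking but yields tuples.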

__call__(sequences)

Compute sequence-level embeddings for sequences.

Return type:

ndarray

Methods

  • __init__(model_name, *[, device, ...]) – Initialize the model.

  • __call__(sequences) – Compute sequence-level embeddings for sequences.

  • compute_pseudo_perplexity(sequences[, mask_size]) – Compute pseudo-perplexity for a list of sequences, masking mask_size positions per pass.

  • embed(sequences[, layer]) – Compute embeddings of amino acid sequences.
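Pseudo-perplexity is commonly defined as the exponential of the mean negative log-probability that a masked language model assigns to each residue when that residue is hidden; lower values mean the model finds the sequence more plausible. The sketch below illustrates how a mask_size parameter can reduce the number of forward passes from L to about L / mask_size, at the cost of hiding more context in each pass. It is an illustration under stated assumptions, not seqme's code: it assumes positions are masked in consecutive chunks, and `MaskedScorer`, `pseudo_perplexity`, and `uniform_scorer` are hypothetical names, with the scorer standing in for an ESM2 forward pass.

```python
import math
from typing import Callable, Dict, List

# Hypothetical scoring callback: given a sequence and the positions masked in one
# forward pass, return log p(true residue at i | unmasked context) for each masked i.
MaskedScorer = Callable[[str, List[int]], Dict[int, float]]


def pseudo_perplexity(sequence: str, score: MaskedScorer, mask_size: int = 1) -> float:
    """exp of the mean negative log-likelihood of each residue under masking.

    Assumes positions are masked in consecutive chunks of mask_size; mask_size=1
    gives the standard one-position-per-pass pseudo-perplexity.
    """
    if mask_size < 1:
        raise ValueError("mask_size must be positive")
    if not sequence:
        raise ValueError("sequence must be non-empty")
    total_log_prob = 0.0
    for start in range(0, len(sequence), mask_size):
        positions = list(range(start, min(start + mask_size, len(sequence))))
        log_probs = score(sequence, positions)  # one forward pass per chunk
        total_log_prob += sum(log_probs[i] for i in positions)
    return math.exp(-total_log_prob / len(sequence))


def uniform_scorer(sequence: str, positions: List[int]) -> Dict[int, float]:
    # Toy model: every position gets probability 1/20 (20 standard amino acids).
    return {i: math.log(1 / 20) for i in positions}


print(round(pseudo_perplexity("MKTAYIAK", uniform_scorer, mask_size=3), 6))  # 20.0
```

A model that is uniform over the 20 standard amino acids scores a pseudo-perplexity of 20 regardless of mask_size, which makes it a convenient sanity check; a real model's value will depend on mask_size because each masked residue sees less context.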