seqme.models.ESM2

class seqme.models.ESM2(model_name, *, device=None, batch_size=256, cache_dir=None, verbose=False)

Wrapper for the ESM2 protein/peptide embedding model.

Computes one sequence-level embedding per input by averaging its per-residue token embeddings, excluding the [CLS] and [EOS] special tokens.
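The pooling step can be illustrated in plain Python. This is a minimal sketch, not seqme's implementation. It assumes one token per residue, a [CLS] token at position 0, and an [EOS] token immediately after the last residue, and it ignores any trailing padding. The helper name `mean_pool_residues` is hypothetical.

```python
from typing import List, Sequence


def mean_pool_residues(token_embeddings: Sequence[Sequence[float]], seq_len: int) -> List[float]:
    """Average per-residue embeddings, skipping [CLS] (index 0) and [EOS] (index seq_len + 1).

    Hypothetical helper; assumes the token layout [CLS], r_1, ..., r_L, [EOS], [PAD]...
    """
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    if len(token_embeddings) < seq_len + 2:
        raise ValueError("expected at least seq_len + 2 token embeddings ([CLS] + residues + [EOS])")
    residues = token_embeddings[1 : seq_len + 1]  # drop [CLS]; stop before [EOS] and padding
    dim = len(residues[0])
    return [sum(vec[d] for vec in residues) / seq_len for d in range(dim)]


# Toy example: 3 residues with 2-dimensional embeddings, plus [CLS], [EOS], and one [PAD].
tokens = [
    [100.0, 100.0],    # [CLS], excluded
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [-100.0, -100.0],  # [EOS], excluded
    [0.0, 0.0],        # [PAD], excluded
]
print(mean_pool_residues(tokens, seq_len=3))  # [3.0, 4.0]
```

Excluding the special tokens keeps the pooled vector a function of the residues alone; otherwise the [CLS] and [EOS] vectors would contribute a larger share of the mean for short peptides.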

Installation: pip install "seqme[esm2]"

Reference: Lin et al., “Language models of protein sequences at the scale of evolution enable accurate structure prediction” (https://www.biorxiv.org/content/10.1101/2022.07.20.500902v3)

__init__(model_name, *, device=None, batch_size=256, cache_dir=None, verbose=False)

Initialize the model.

Parameters:

model_name (ESM2Checkpoint | str) – Model checkpoint name or enum.
device (Optional[str]) – Device to run inference on, e.g., "cuda" or "cpu".
batch_size (int) – Number of sequences to process per batch.
cache_dir (Optional[str]) – Directory to cache the model.
verbose (bool) – Whether to display a progress bar.
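The `batch_size` parameter bounds how many sequences are sent through the model at once. A minimal sketch of that chunking follows. It assumes sequences are processed in input order and that the final batch may be smaller. `iter_batches` is a hypothetical helper, not part of seqme.

```python
from typing import Iterator, List, Sequence


def iter_batches(sequences: Sequence[str], batch_size: int = 256) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` sequences, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(sequences), batch_size):
        yield list(sequences[start : start + batch_size])


peptides = ["KLWK", "GIGKFLHSAKKFGKAFVGEIMNS", "ACDE", "MKT", "RRWW"]
print([len(batch) for batch in iter_batches(peptides, batch_size=2)])  # [2, 2, 1]
```

Larger batches generally improve GPU throughput at the cost of memory, so lowering `batch_size` is the usual first step if inference runs out of memory.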

__call__(sequences)

Call self as a function.

Methods

`__init__`(model_name, *[, device, ...])	Initialize the model.
`__call__`(sequences)	Call self as a function.
`compute_pseudo_perplexity`(sequences[, mask_size])	Compute pseudo-perplexity for a list of sequences, masking `mask_size` positions per pass.
`embed`(sequences[, layer])	Compute embeddings of amino acid sequences.
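Pseudo-perplexity is commonly defined as exp(-(1/L) Σᵢ log p(xᵢ | x with position i masked)) over the L residues of a sequence. A perfect predictor gives 1, and a uniform guess over the 20 standard amino acids gives 20. The sketch below shows how masking `mask_size` positions per pass reduces the number of forward passes to ceil(L / mask_size). With `mask_size > 1`, each residue is scored while its neighbours in the same group are also hidden, so the result approximates the one-position-at-a-time value. This is an illustration under stated assumptions, not seqme's implementation. Positions are assumed to be grouped contiguously, and `log_prob_fn` is a hypothetical stand-in for a masked language model that returns the log-probability of the true residue at each masked position.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical model interface: (sequence, masked positions) -> {position: log p(true residue)}.
LogProbFn = Callable[[str, Sequence[int]], Dict[int, float]]


def pseudo_perplexity(sequence: str, log_prob_fn: LogProbFn, mask_size: int = 1) -> float:
    """exp of the negative mean log-probability of each residue, scored while masked."""
    if mask_size < 1:
        raise ValueError("mask_size must be a positive integer")
    if not sequence:
        raise ValueError("sequence must be non-empty")
    positions = list(range(len(sequence)))
    total_log_prob = 0.0
    # One "forward pass" per contiguous group of mask_size positions (assumed grouping).
    for start in range(0, len(positions), mask_size):
        masked = positions[start : start + mask_size]
        scores = log_prob_fn(sequence, masked)
        total_log_prob += sum(scores[p] for p in masked)
    return math.exp(-total_log_prob / len(sequence))


def uniform_model(sequence: str, masked: Sequence[int]) -> Dict[int, float]:
    # Toy stand-in: every masked residue gets probability 1/20.
    return {p: math.log(1 / 20) for p in masked}


print(round(pseudo_perplexity("MKTAYIAKQR", uniform_model, mask_size=3), 6))  # 20.0
```

Larger `mask_size` values trade fidelity for speed: the 10-residue example above needs 4 model calls instead of 10.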
