seqme.models.GENALM

class seqme.models.GENALM(model_name, *, device=None, batch_size=256, cache_dir=None, verbose=False)

GENA-LM is a family of open-source foundational models for long DNA sequences, trained on human DNA.

Computes sequence-level embeddings by averaging token embeddings.
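
As an illustration of what "averaging token embeddings" means, here is a minimal sketch of mean pooling for a single sequence. It is not the library's implementation: the helper name mean_pool, the use of plain Python lists, and the choice to skip padding positions through an attention mask are all assumptions made for this example.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sequence.

    Hypothetical helper for illustration only. Positions whose mask
    value is 0 (assumed here to mark padding) are left out of the average.
    """
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("sequence has no unmasked tokens")
    dim = len(kept[0])
    # Average each embedding dimension across the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors of dimension 2; the last position is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The result is one fixed-size vector per sequence, regardless of how many tokens the sequence was split into.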

Installation: pip install "seqme[genalm]"

Reference:

Fishman et al., “GENA-LM: a family of open-source foundational DNA language models for long sequences” (https://academic.oup.com/nar/article/53/2/gkae1310/7954523)

__init__(model_name, *, device=None, batch_size=256, cache_dir=None, verbose=False)

Initialize model.

Parameters:
  • model_name (GENALMCheckpoint) – Model checkpoint name.

  • device (Optional[str]) – Device to run inference on, e.g., "cuda" or "cpu".

  • batch_size (int) – Number of sequences to process per batch.

  • cache_dir (Optional[str]) – Directory to cache the model.

  • verbose (bool) – Whether to display a progress bar.
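
To make the batch_size parameter concrete, the sketch below shows one common way a list of sequences can be split into batches before inference. How GENALM batches internally is not documented here, so the helper iter_batches and its behavior are assumptions for illustration only.

```python
def iter_batches(sequences, batch_size=256):
    """Yield consecutive slices of at most batch_size sequences.

    Hypothetical helper; not part of seqme.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(sequences), batch_size):
        yield sequences[start:start + batch_size]


sequences = ["ACGT", "GGCA", "TTAG", "CCGA", "ATAT"]
batches = list(iter_batches(sequences, batch_size=2))
print([len(batch) for batch in batches])
```

A smaller batch_size generally lowers peak memory use on the device, at the cost of more forward passes.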

__call__(sequences)

Call self as a function.

Return type:

ndarray

Methods

  • __init__(model_name, *[, device, ...]) – Initialize model.

  • __call__(sequences) – Call self as a function.

  • classify(sequences) – Classify a list of sequences.

  • embed(sequences[, layer]) – Compute embeddings for a list of sequences.