seqme.models.RNAFM


class seqme.models.RNAFM(*, model_name='mRNA', device=None, batch_size=256, verbose=False)

A language model trained on RNA sequences. It computes a sequence-level embedding for each input by averaging that sequence's token embeddings.
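The averaging step can be illustrated with a minimal, library-free sketch. It is not seqme's implementation: the helper name `mean_pool` and the `keep_mask` argument are hypothetical, and whether seqme excludes padding or special tokens from the average is an assumption made here for illustration only.

```python
def mean_pool(token_embeddings, keep_mask):
    """Average the token vectors whose mask entry is True.

    token_embeddings: list of per-token vectors (each a list of floats).
    keep_mask: list of bools, e.g. False for padding tokens (assumed convention).
    """
    kept = [vec for vec, keep in zip(token_embeddings, keep_mask) if keep]
    if not kept:
        raise ValueError("no tokens selected for pooling")
    dim = len(kept[0])
    # One averaged value per embedding dimension.
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Three tokens with 2-dimensional embeddings; the last one is masked out.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [True, True, False]
pooled = mean_pool(tokens, mask)
```

In a real model the same operation would typically run on arrays or tensors, and the resulting vector would have the checkpoint's embedding dimension.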

Two checkpoints are available:
  • mRNA: 239M parameters, 12 layers, embedding dim 1280, trained on 45 million mRNA coding sequences (CDS). Input sequences must be codon-aligned.

  • ncRNA: 99M parameters, 12 layers, embedding dim 640, trained on 23.7 million non-coding RNA (ncRNA) sequences.
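Before using the mRNA checkpoint, it may help to check inputs for codon alignment. The sketch below assumes that "codon aligned" means the sequence is read in frame from its first nucleotide and has a length divisible by three; the helpers `is_codon_aligned` and `split_codons` are hypothetical and are not part of seqme, and whether seqme performs its own validation is not stated here.

```python
def is_codon_aligned(seq):
    """Return True if the sequence is non-empty and its length is a multiple of 3."""
    return len(seq) > 0 and len(seq) % 3 == 0


def split_codons(seq):
    """Split a codon-aligned sequence into consecutive 3-nucleotide codons."""
    if not is_codon_aligned(seq):
        raise ValueError(f"sequence of length {len(seq)} is not codon-aligned")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


codons = split_codons("AUGGCCUAA")
```

Sequences that fail the check could be filtered out or trimmed before embedding, depending on the analysis.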

Installation: pip install "seqme[rnafm]"

Reference:

Chen et al., “Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions” (https://arxiv.org/pdf/2204.00300)

__init__(*, model_name='mRNA', device=None, batch_size=256, verbose=False)

Initialize model.

Parameters:
  • model_name (Literal['mRNA', 'ncRNA']) – Checkpoint to load, either 'mRNA' or 'ncRNA'.

  • device (Optional[str]) – Device to run inference on, e.g., "cuda" or "cpu".

  • batch_size (int) – Number of sequences to process per batch.

  • verbose (bool) – Whether to display a progress bar.
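To make the effect of batch_size concrete, here is a generic sketch of splitting a list of sequences into fixed-size batches. How seqme batches internally is not documented on this page, so the `batched` helper is a hypothetical illustration of the general idea rather than the library's code.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sequences = ["AUG", "GCC", "UAA", "AUGGCC", "GGG"]
batches = list(batched(sequences, 2))
```

A smaller batch size generally lowers peak memory use at the cost of more forward passes; the last batch may be shorter than the others.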

__call__(sequences)

Call self as a function.

Return type:

ndarray
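A possible usage pattern, based only on the signatures listed on this page, is sketched below. The wrapper `embed_sequences` is hypothetical. It assumes that calling the model returns the sequence-level embeddings with one row per input sequence, which this page does not state explicitly, and it needs the optional dependency installed via pip install "seqme[rnafm]".

```python
def embed_sequences(sequences, model_name="ncRNA", device="cpu"):
    """Load an RNAFM checkpoint and call it on a list of RNA sequences.

    The import is deferred so that this module can be loaded without seqme installed.
    """
    from seqme.models import RNAFM  # requires: pip install "seqme[rnafm]"

    model = RNAFM(
        model_name=model_name,  # 'mRNA' or 'ncRNA'
        device=device,          # e.g. "cuda" or "cpu"
        batch_size=64,
        verbose=True,           # show a progress bar
    )
    # Documented return type is ndarray; the shape is an assumption (see above).
    return model(sequences)
```

For example, `embed_sequences(["GGGAAACCC", "AUGGCCUAA"])` would be expected to return an array with two rows, assuming the row-per-sequence layout holds.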

Methods

__init__(*[, model_name, device, ...])

Initialize model.

__call__(sequences)

Call self as a function.

embed(sequences[, layer])

Compute embeddings for the RNA sequences.