Third-party models

In this notebook, we show how to integrate “external” models into seqme.

The full list of external models is available in the seqme-thirdparty repository.

from functools import partial

import seqme as sm

Some models of interest are not distributed through package hubs such as PyPI or Hugging Face; often only a git repository is available. Here we show how to run such models in seqme.

An external model is compatible with seqme if it is set up as a uv project (with a lockfile and a pinned Python version) and defines an entry point, i.e., the function that seqme calls.

Set up a project using:

uv init --package hello-model

Let’s use a toy model hosted in a GitHub repository that satisfies all of these requirements. To do so, we need to specify the function entry point, the repository URL, and the local path where the repository is stored.

If uv or git is not on the PATH of your Jupyter kernel (common when using a virtual environment or running the notebook from an IDE), you can specify their paths explicitly:

UV_PATH = "/Users/rasmus.larsen/.local/bin/uv"  # replace with the uv executable on your machine
GIT_PATH = "/usr/bin/git"  # replace with the git executable on your machine

hello_model = sm.models.ThirdPartyModel(
    entry_point="hello_model.model:embed",
    path="../thirdparty/hello-model",
    url="https://github.com/szczurek-lab/seqme-thirdparty",
    branch="main",
    uv=UV_PATH,
    git=GIT_PATH,
)
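The `entry_point` string follows the `module:function` convention known from Python packaging entry points: `hello_model.model:embed` names the function `embed` in the module `hello_model.model`. The sketch below shows how such a string can be resolved with `importlib`. It illustrates the convention only and is not necessarily how seqme resolves it; seqme presumably runs the function inside the model's own uv environment rather than the notebook's.

```python
import importlib
from typing import Callable


def resolve_entry_point(spec: str) -> Callable:
    """Resolve a 'package.module:function' string to the callable it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Allow dotted attributes after the colon, e.g. 'pkg.mod:Class.method'.
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj
```

For example, `resolve_entry_point("math:sqrt")` returns `math.sqrt`.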

You can access the entry point’s documentation directly:

hello_model.help()

embed(sequences: list[str], batch_size: int = None) -> numpy.ndarray
File: /Users/rasmus.larsen/work/hackathon-2025/seqme/docs/thirdparty/hello-model/src/hello_model/model.py

Embed sequences into fixed-size vectors using precomputed weights.

Args:
    sequences: List of string sequences to embed.
    batch_size: Unused. Reserved for future batched processing.

Returns:
    Array of shape (len(sequences), embedding_dim) where each row is
    the embedding for the corresponding input sequence.

The first time the model is used, ThirdPartyModel clones the model repository and installs its dependencies.

Assuming everything went well, we can now embed sequences with this model and then use it to compute a metric.

hello_model(["MKQW", "RKSPL"], batch_size=32)

array([[44.,  8., 12., 32.],
       [55., 10., 15., 40.]])

sequences = {
    "HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
    "hyformer": ["MKQW", "RKSPL"],
    "Random": ["KKKKK", "PLQ", "RKSPL"],
}

metrics = [sm.metrics.FBD(reference=sequences["Random"], embedder=hello_model)]
df = sm.evaluate(sequences, metrics)

sm.show(df)

100%|██████████| 3/3 [00:00<00:00,  9.91it/s, data=Random, metric=FBD]  

	FBD↓
HydrAMP	119.24
hyformer	45.17
Random	0.00
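FBD compares the distribution of a set's embeddings with that of the reference set. Here the reference is Random, which therefore scores 0. A common way to make this comparison, familiar from FID for images, is the Fréchet distance between Gaussians fitted to the two embedding sets:

d^2 = ||mu_x - mu_y||^2 + Tr(S_x + S_y - 2 (S_x S_y)^(1/2))

The sketch below assumes FBD follows this formula; it is an illustration, not seqme's code. Instead of a matrix square root, it uses the fact that Tr((S_x S_y)^(1/2)) equals the sum of the square roots of the eigenvalues of S_x S_y. This stays stable when the covariances are singular, as they are with only a few sequences.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to embeddings x (n, d) and y (m, d)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    # Sample covariances (rows are observations); each set needs at least 2 rows.
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr((cov_x @ cov_y)^(1/2)) via eigenvalues; clip tiny negative round-off.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)
```

As a check, assume each embedding is the sequence length times the vector [11, 2, 3, 8]. This is consistent with the two hello_model outputs shown earlier but unverified for the other sequences. Under that assumption, the formula with sample covariances gives about 119.24 for HydrAMP and 45.17 for hyformer against Random, the values in the table.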

Hyformer

Let’s also use Hyformer to predict the antimicrobial activity of peptides.

hyformer = sm.models.ThirdPartyModel(
    entry_point="hyformer.inference:predict",
    path="../thirdparty/hyformer-peptide",
    url="https://github.com/szczurek-lab/hyformer",
    branch="v2.0",
    uv=UV_PATH,
    git=GIT_PATH,
)

hyformer.help()

predict(sequences: list[str], checkpoint: str, batch_size: int = 32, device: Optional[str] = None) -> numpy.ndarray
File: /Users/rasmus.larsen/work/hackathon-2025/seqme/docs/thirdparty/hyformer-peptide/hyformer/inference.py

Return property predictions, shape (len(sequences), num_properties).

Args:
    checkpoint: Must be a checkpoint fine-tuned for property prediction
        (e.g. ``SzczurekLab/hyformer_peptides_34M_MIC``). Base generative
        checkpoints do not have a prediction head — use :func:`embed` or
        :func:`compute_perplexity` instead.

hyformer(["RKSPL", "MKQW"], batch_size=16, checkpoint="SzczurekLab/hyformer_peptides_34M_mic", device="cpu")

array([[2.214258],
       [1.890324]], dtype=float32)

AMPlify

Let’s also use AMPlify, an antimicrobial peptide (AMP) classifier that outputs the probability that a peptide has antimicrobial properties.

Let’s set up the model.

amplify = sm.models.ThirdPartyModel(
    entry_point="amplify.predict:predict",
    path="../thirdparty/amplify",
    url="https://github.com/szczurek-lab/seqme-amplify",
    uv=UV_PATH,
    git=GIT_PATH,
)

amplify.help()

predict(sequences: list, model_type: Literal['balanced', 'imbalanced'] = 'balanced', n_ensembles: int = 5, batch_size: int = 128) -> numpy.ndarray
File: /Users/rasmus.larsen/work/hackathon-2025/seqme/docs/thirdparty/amplify/src/amplify/predict.py

Assuming everything went well, we can now score sequences with this predictive model and then use it to compute a metric.

amplify(["MKQW", "RKSPL"], model_type="imbalanced", batch_size=128, n_ensembles=2)

array([0.00635906, 0.49806994], dtype=float32)

sequences = {
    "HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
    "hyformer": ["MKQW", "RKSPL"],
    "Random": ["KKKKK", "PLQ", "RKSPL"],
}

metrics = [
    sm.metrics.ID(
        predictor=partial(amplify, model_type="balanced", n_ensembles=5, batch_size=128),
        name="p_AMP (AMPlify)",
        objective="maximize",
    )
]
df = sm.evaluate(sequences, metrics)

sm.show(df)

100%|██████████| 3/3 [00:16<00:00,  5.49s/it, data=Random, metric=p_AMP (AMPlify)]  

	p_AMP (AMPlify)↑
HydrAMP	0.26±0.10
hyformer	0.24±0.25
Random	0.40±0.22
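The p_AMP (AMPlify) column summarizes each set's per-sequence probabilities as a single value with a spread. The sketch below shows that aggregation under an assumption: the reported numbers are the mean and the population standard deviation (ddof=0) across sequences. The exact estimator seqme uses is not shown here. The helper `summarize` and the stand-in `fake_amplify` are hypothetical; `functools.partial` binds keyword arguments the same way as in the metric definition above.

```python
from functools import partial

import numpy as np


def summarize(predictor, sequences: dict[str, list[str]]) -> dict[str, tuple[float, float]]:
    """Return (mean, std) of the predictor's per-sequence outputs for each set."""
    summary = {}
    for name, seqs in sequences.items():
        values = np.asarray(predictor(seqs), dtype=float)
        summary[name] = (float(values.mean()), float(values.std()))
    return summary


# Hypothetical stand-in for amplify: "probability" = fraction of K/R residues.
def fake_amplify(sequences, model_type="balanced", n_ensembles=5, batch_size=128):
    return np.array([sum(aa in "KR" for aa in s) / len(s) for s in sequences])


predictor = partial(fake_amplify, model_type="balanced", n_ensembles=5, batch_size=128)
for name, (mean, std) in summarize(predictor, {"toy": ["KKKKK", "PLQ"]}).items():
    print(f"{name}\t{mean:.2f}±{std:.2f}")
```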

amPEPpy

Let’s also use amPEPpy, another antimicrobial peptide (AMP) classifier that outputs the probability that a peptide has antimicrobial properties.