Third-party models
In this notebook, we show how to integrate “external” models into seqme.
The full list of external models can be found here: seqme-thirdparty.
from functools import partial
import seqme as sm
Some models of interest are not available through package indexes or model hubs such as PyPI or Hugging Face; only a git repository may be available. Here we show how to run such models in seqme.
An external model is compatible with seqme if it is set up with uv (with a lockfile and a defined Python version) and defines an entry point (a function).
Set up a project using:
uv init --package hello-model
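As a rough illustration of what an entry point can look like, here is a hypothetical `embed` function of the kind that might live in `src/hello_model/model.py`. This is not the actual hello-model code: the signature (a list of sequences plus a `batch_size` keyword) is inferred from the calls later in this notebook, the features are invented, and plain lists are returned to keep the sketch dependency-free, whereas the outputs in this notebook are NumPy arrays.

```python
# Hypothetical entry-point module (NOT the real hello-model implementation).
from typing import Iterator, List, Sequence


def _batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def _embed_one(sequence: str) -> List[float]:
    """Toy 4-dimensional embedding built from simple composition counts."""
    return [
        float(len(sequence)),         # sequence length
        float(sequence.count("K")),   # number of lysines
        float(sequence.count("R")),   # number of arginines
        float(len(set(sequence))),    # number of distinct residues
    ]


def embed(sequences: Sequence[str], batch_size: int = 32) -> List[List[float]]:
    """Entry point: return one embedding row per input sequence."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    embeddings: List[List[float]] = []
    for batch in _batches(sequences, batch_size):
        embeddings.extend(_embed_one(seq) for seq in batch)
    return embeddings


print(embed(["MKQW", "RKSPL"], batch_size=1))
```

The point of the sketch is the shape of the contract: sequences in, one row per sequence out, with any extra options passed as keyword arguments.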
Let’s use a toy model hosted in a GitHub repository that satisfies all the requirements. To do so, we need to specify the entry-point function, the repository URL, and the local path where the repository is stored.
If uv or git are not on the PATH of your Jupyter kernel (common when using a virtual environment or running the notebook from an IDE), you can specify their paths explicitly:
UV_PATH = "/Users/rasmus.larsen/.local/bin/uv"
GIT_PATH = "/usr/bin/git"
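Instead of hard-coding the paths, one option is to look the executables up at runtime. The sketch below uses `shutil.which` and falls back to a list of candidate locations. The helper name `resolve_executable` and the fallback paths are assumptions for illustration, not part of seqme.

```python
import os
import shutil
from pathlib import Path
from typing import Optional, Sequence


def resolve_executable(name: str, fallbacks: Sequence[str] = ()) -> Optional[str]:
    """Return a path to `name`, or None if it cannot be found."""
    # First choice: whatever the current process's PATH resolves to.
    found = shutil.which(name)
    if found is not None:
        return found
    # Otherwise try the candidate locations in order (assumed, adjust as needed).
    for candidate in fallbacks:
        path = Path(candidate).expanduser()
        if path.is_file() and os.access(path, os.X_OK):
            return str(path)
    return None


uv_path = resolve_executable("uv", fallbacks=["~/.local/bin/uv"])
git_path = resolve_executable("git", fallbacks=["/usr/bin/git"])
print(f"uv:  {uv_path}")
print(f"git: {git_path}")
```

A `None` result means the executable was not found in any of the places checked, and the path has to be set by hand.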
hello_model = sm.models.ThirdPartyModel(
entry_point="hello_model.model:embed",
path="../thirdparty/hello-model",
url="https://github.com/szczurek-lab/seqme-thirdparty",
branch="main",
uv=UV_PATH,
git=GIT_PATH,
)
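The `entry_point` string above follows the common `package.module:function` convention. How seqme resolves it internally is not shown in this notebook, so the following is only a sketch of the convention itself, using a hypothetical helper `load_entry_point` and a standard-library target so that it runs without the model installed.

```python
import importlib
from typing import Any, Callable


def load_entry_point(spec: str) -> Callable[..., Any]:
    """Resolve a 'package.module:function' string to the callable it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    # Import the module part, then walk the attribute part (may be dotted).
    obj: Any = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    if not callable(obj):
        raise TypeError(f"{spec!r} does not name a callable")
    return obj


# "json:dumps" plays the role that "hello_model.model:embed" plays above.
dumps = load_entry_point("json:dumps")
print(dumps({"sequence": "MKQW"}))
```

Read this way, `hello_model.model:embed` names the function `embed` in the module `hello_model.model`.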
ThirdPartyModel clones the model repository and installs its dependencies the first time the model is run.
Assuming everything went well, let’s now compute a metric using this embedding model.
hello_model(["MKQW", "RKSPL"], batch_size=32)
array([[44., 8., 12., 32.],
[55., 10., 15., 40.]])
sequences = {
"HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
"hyformer": ["MKQW", "RKSPL"],
"Random": ["KKKKK", "PLQ", "RKSPL"],
}
metrics = [sm.metrics.FBD(reference=sequences["Random"], embedder=hello_model)]
df = sm.evaluate(sequences, metrics)
sm.show(df)
100%|██████████| 3/3 [00:00<00:00, 9.91it/s, data=Random, metric=FBD]
| | FBD↓ |
|---|---|
| HydrAMP | 119.24 |
| hyformer | 45.17 |
| Random | 0.00 |
Hyformer
Let’s also use Hyformer to predict whether a peptide has antimicrobial properties.
hyformer = sm.models.ThirdPartyModel(
entry_point="hyformer.inference:predict",
path="../thirdparty/hyformer-peptide",
url="https://github.com/szczurek-lab/hyformer",
branch="v2.0",
uv=UV_PATH,
git=GIT_PATH,
)
hyformer(["RKSPL", "MKQW"], batch_size=16, checkpoint="SzczurekLab/hyformer_peptides_34M_mic", device="cpu")
array([[2.214258],
[1.890324]], dtype=float32)
AMPlify
Let’s also use AMPlify, an antimicrobial peptide (AMP) classifier that outputs the probability that a peptide has antimicrobial properties.
Let’s set up the model.
amplify = sm.models.ThirdPartyModel(
entry_point="amplify.predict:predict",
path="../thirdparty/amplify",
url="https://github.com/szczurek-lab/seqme-amplify",
uv=UV_PATH,
git=GIT_PATH,
)
Cloning into '/Users/rasmus.larsen/work/hackathon-2025/seqme/docs/thirdparty/amplify'...
Assuming everything went well, let’s now compute a metric using this predictive model.
amplify(["MKQW", "RKSPL"], model_type="imbalanced", batch_size=128, n_ensembles=2)
array([0.00635906, 0.49806994], dtype=float32)
sequences = {
"HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
"hyformer": ["MKQW", "RKSPL"],
"Random": ["KKKKK", "PLQ", "RKSPL"],
}
metrics = [
sm.metrics.ID(
predictor=partial(amplify, model_type="balanced", n_ensembles=5, batch_size=128),
name="p_AMP (AMPlify)",
objective="maximize",
)
]
df = sm.evaluate(sequences, metrics)
sm.show(df)
100%|██████████| 3/3 [00:16<00:00, 5.49s/it, data=Random, metric=p_AMP (AMPlify)]
| | p_AMP (AMPlify)↑ |
|---|---|
| HydrAMP | 0.26±0.10 |
| hyformer | 0.24±0.25 |
| Random | 0.40±0.22 |
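The `partial` call above pre-binds keyword arguments so that the resulting predictor can be called with the sequences alone. That `sm.metrics.ID` calls its predictor this way is an assumption based on the examples in this notebook. A minimal sketch with a hypothetical stand-in predictor, not AMPlify itself:

```python
from functools import partial
from typing import List, Sequence


def toy_predict(sequences: Sequence[str], residues: str = "KR", scale: float = 1.0) -> List[float]:
    """Hypothetical stand-in predictor: scaled fraction of the given residues."""
    scores = []
    for seq in sequences:
        hits = sum(seq.count(residue) for residue in residues)
        # Guard against empty sequences to avoid dividing by zero.
        scores.append(scale * hits / len(seq) if seq else 0.0)
    return scores


# Fix the keyword arguments once; only the sequences remain to be supplied.
predictor = partial(toy_predict, residues="K", scale=0.5)

# Equivalent to toy_predict(["KKKK", "MKQW"], residues="K", scale=0.5).
print(predictor(["KKKK", "MKQW"]))
```

Binding the options up front keeps the metric definition independent of model-specific arguments such as `model_type` or `n_ensembles`.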
amPEPpy
Let’s also use amPEPpy, an antimicrobial peptide (AMP) classifier that outputs the probability that a peptide has antimicrobial properties.
Let’s set up the model.
ampeppy = sm.models.ThirdPartyModel(
entry_point="ampeppy.predict:predict",
path="../thirdparty/ampeppy",
url="https://github.com/szczurek-lab/seqme-amPEPpy",
uv=UV_PATH,
git=GIT_PATH,
)
Cloning into '/Users/rasmus.larsen/work/hackathon-2025/seqme/docs/thirdparty/ampeppy'...
Assuming everything went well, let’s now compute a metric using this predictive model.
ampeppy(["MKQW", "RKSPL"])
array([0.49427083, 0.28333333])
sequences = {
"HydrAMP": ["MMRK", "RKSPL", "RRLSK", "RRLSK"],
"hyformer": ["MKQW", "RKSPL"],
"Random": ["KKKKK", "PLQ", "RKSPL"],
}
metrics = [sm.metrics.ID(predictor=ampeppy, name="p_AMP (amPEPpy)", objective="maximize")]
df = sm.evaluate(sequences, metrics)
sm.show(df)
100%|██████████| 3/3 [00:04<00:00, 1.40s/it, data=Random, metric=p_AMP (amPEPpy)]
| | p_AMP (amPEPpy)↑ |
|---|---|
| HydrAMP | 0.41±0.08 |
| hyformer | 0.39±0.11 |
| Random | 0.39±0.09 |