ESM-2 Zero-Shot Mutation Scoring

ESM-2 Zero-Shot Mutation Scoring takes a wild type protein sequence and a list of single point mutations, and gives each mutation a score for how favorable the ESM-2 protein language model thinks it is. The wild type sequence is the starting, unmutated protein. A point mutation swaps one amino acid at one position for a different one.

Zero shot means the model gives these scores straight away, without being shown any examples or training data for your specific protein. ESM-2 has learned from millions of natural proteins what amino acids tend to appear where, so it can judge whether a proposed swap looks natural or unusual just from what it already knows.

Each mutation is written in the form <original><position><new>, with positions counted from 1. For example, M1A means the amino acid M at position 1 is changed to A. The tool checks each mutation against the wild type sequence and rejects any where the position is out of range or the stated original amino acid does not match the sequence.
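If you want to catch bad entries before submitting, the same two checks can be sketched locally. This is a minimal illustration, not the tool's own validation code. The function name `check_mutation` is hypothetical, and it assumes amino acids are single uppercase letters.

```python
import re

# Pattern for "<original><position><new>", for example "M1A".
# Assumption: single uppercase letters for amino acids, 1-based positions.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def check_mutation(sequence: str, mutant: str) -> tuple[str, int, str]:
    """Return (original, position, new) or raise ValueError if the mutation is invalid."""
    match = MUTATION_RE.match(mutant)
    if match is None:
        raise ValueError(f"{mutant!r} is not in <original><position><new> form")
    original, position, new = match.group(1), int(match.group(2)), match.group(3)
    # Positions count from 1, so the valid range is 1..len(sequence).
    if not 1 <= position <= len(sequence):
        raise ValueError(f"{mutant!r}: position {position} is out of range")
    # The stated original residue must match the wild type at that position.
    if sequence[position - 1] != original:
        raise ValueError(
            f"{mutant!r}: wild type has {sequence[position - 1]} at position {position}"
        )
    return original, position, new
```

Calling `check_mutation("MKI", "K2R")` returns `("K", 2, "R")`, while `"K1R"` or `"M9A"` on the same sequence raises `ValueError`.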

When to use it

Use it when you have a protein and a list of candidate single point mutations and you want a quick, cheap ranking of which ones the model considers favorable, before committing to slower or more expensive methods. A higher score suggests the mutation looks natural to the model, which can help you shortlist mutations worth studying further.

Inputs

Input	Required	What it is
`sequence`	yes	The wild type protein sequence as a one letter amino acid string, between 1 and 2048 amino acids long.
`mutants`	yes	A list of point mutations, each written as `<original><position><new>`, for example `M1A`. Each is checked against the wild type sequence.
`model_variant`	no, default `650M`	Which ESM-2 model to use, `650M` or the larger `3B`.
`method`	no, default `masked_marginal`	How each mutation is scored. `masked_marginal` hides one position at a time and asks the model what belongs there (more accurate, slower). `wt_marginal` uses a single pass over the whole sequence (faster, less finely calibrated).
`fp16`	no, default `true`	Runs the model in half precision (16 bit floating point) on the GPU, which is faster at slightly lower numeric precision.
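The constraints in the table can be checked before you submit. The sketch below assembles an `input_data` dictionary using the input names and defaults listed above. The helper `build_input_data` is hypothetical and not part of the SDK, and the service may apply further checks of its own.

```python
# Allowed values taken from the inputs table above.
ALLOWED_VARIANTS = {"650M", "3B"}
ALLOWED_METHODS = {"masked_marginal", "wt_marginal"}
MAX_LENGTH = 2048


def build_input_data(sequence, mutants, model_variant="650M",
                     method="masked_marginal", fp16=True):
    """Hypothetical helper: validate inputs locally and return the payload dict."""
    if not 1 <= len(sequence) <= MAX_LENGTH:
        raise ValueError(f"sequence must be 1 to {MAX_LENGTH} amino acids long")
    if not mutants:
        raise ValueError("mutants must contain at least one mutation")
    if model_variant not in ALLOWED_VARIANTS:
        raise ValueError(f"model_variant must be one of {sorted(ALLOWED_VARIANTS)}")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be one of {sorted(ALLOWED_METHODS)}")
    return {
        "sequence": sequence,
        "mutants": list(mutants),
        "model_variant": model_variant,
        "method": method,
        "fp16": bool(fp16),
    }
```

The returned dictionary has the same shape as the `input_data` argument used in the SDK example on this page.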

How to run it

Submit your sequence and mutations from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready made example first.

In Azulene Studio

Open ESM-2 Zero-Shot Mutation Scoring from the tools list. On the Inputs and Parameters step, enter the wild type sequence and your list of mutations as a JSON list, and change the model variant or method if you do not want the defaults. Then choose Review and Submit.

From the Python SDK

from opal import jobs

result = jobs.submit(
    job_type="esm2_mutation_score",
    input_data={
        "sequence": "MKIEELKKWVEEFDKKLAEIFKFDFGGYRELADKVAEAVGKKVDEKQKKIVEIFEKVEAEA",
        "mutants": ["M1A", "K2R", "I3V"],
        "method": "masked_marginal",
    },
)

From the CLI

Pass the inputs as a JSON string.

opal jobs submit --job-type esm2_mutation_score \
  --input-data '{"sequence": "MKIEELKKWVEEFDKKLAEIFKFDFGGYRELADKVAEAVGKKVDEKQKKIVEIFEKVEAEA", "mutants": ["M1A", "K2R", "I3V"], "method": "masked_marginal"}'
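
Hand-writing the JSON string gets error prone with long mutation lists. One option is to generate the command from Python, as in the sketch below. It uses only the `--job-type` and `--input-data` flags shown above. The helper name `cli_command` is hypothetical, and `shlex.quote` assumes a POSIX shell.

```python
import json
import shlex


def cli_command(input_data: dict) -> str:
    """Hypothetical helper: build the opal CLI command for a given payload."""
    # Serialize the payload, then quote it so the shell passes it as one argument.
    payload = json.dumps(input_data)
    return (
        "opal jobs submit --job-type esm2_mutation_score "
        "--input-data " + shlex.quote(payload)
    )


command = cli_command({
    "sequence": "MKIEELKKWVEEFDKKLAEIFKFDFGGYRELADKVAEAVGKKVDEKQKKIVEIFEKVEAEA",
    "mutants": ["M1A", "K2R", "I3V"],
    "method": "masked_marginal",
})
print(command)
```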

Reading the result

The main field is scores, a list with one entry per mutation you asked about. Each entry has these fields:

mutant: the mutation it scored, for example M1A.
log_likelihood_ratio: the score. It compares how likely the model thinks the new amino acid is at that position against the original one. A positive value means the model finds the mutated amino acid more likely than the wild type at that spot, and a negative value means less likely. Rank your mutations by this number, highest first, to find the ones the model likes best.
wt_log_prob: the log probability the model assigns to the original (wild type) amino acid at that position.
mt_log_prob: the log probability the model assigns to the mutated amino acid at that position. The log_likelihood_ratio is mt_log_prob minus wt_log_prob.

The result also reports method, the scoring method that was used, model_used, the exact model, and a small scalars block with mean_llr, the average score across your mutations, and n_mutants. In Azulene Studio these per mutation results are shown as raw result data.
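
As a minimal sketch of working with these fields, the snippet below ranks mutations by score and checks that each score equals `mt_log_prob` minus `wt_log_prob`. It assumes the result is available as a Python dictionary with the field names described above. The numbers are invented placeholders for illustration, not real model output, and `rank_mutations` is a hypothetical helper.

```python
# Placeholder result with invented values, shaped like the fields described above.
example_result = {
    "scores": [
        {"mutant": "M1A", "wt_log_prob": -0.5, "mt_log_prob": -2.0,
         "log_likelihood_ratio": -1.5},
        {"mutant": "K2R", "wt_log_prob": -1.25, "mt_log_prob": -1.0,
         "log_likelihood_ratio": 0.25},
        {"mutant": "I3V", "wt_log_prob": -1.0, "mt_log_prob": -1.5,
         "log_likelihood_ratio": -0.5},
    ],
}


def rank_mutations(result: dict) -> list[dict]:
    """Return the score entries sorted from most to least favorable."""
    return sorted(
        result["scores"],
        key=lambda entry: entry["log_likelihood_ratio"],
        reverse=True,
    )


ranked = rank_mutations(example_result)
for entry in ranked:
    # The score is the mutant log probability minus the wild type log probability.
    difference = entry["mt_log_prob"] - entry["wt_log_prob"]
    assert abs(difference - entry["log_likelihood_ratio"]) < 1e-9
    print(entry["mutant"], entry["log_likelihood_ratio"])
```

With these placeholder values the order is K2R, I3V, M1A. Only K2R has a positive score, meaning the model would rate R above the wild type K at position 2.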

Notes

The masked_marginal method is more accurate but runs one pass per mutated position, so it is slower. Choose wt_marginal for a faster, single pass estimate when you have many mutations and want a quick first cut. The scores are best used to rank mutations relative to each other, not as absolute measurements.
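
To get a rough sense of the cost difference, you can count model passes for a given mutation list. The sketch below follows the description above: one pass per mutated position for `masked_marginal`, one pass in total for `wt_marginal`. It assumes that mutations sharing a position share a pass, which this page does not confirm, and `estimated_passes` is a hypothetical helper.

```python
def estimated_passes(mutants: list[str], method: str) -> int:
    """Rough estimate of model forward passes needed for a mutation list."""
    if method == "wt_marginal":
        # A single pass over the whole sequence covers every mutation.
        return 1
    if method == "masked_marginal":
        # Assumption: one pass per distinct mutated position.
        # The position is the digits between the first and last character.
        positions = {int(mutant[1:-1]) for mutant in mutants}
        return len(positions)
    raise ValueError(f"unknown method: {method!r}")


mutants = ["M1A", "M1G", "K2R", "I3V"]
print(estimated_passes(mutants, "masked_marginal"))
print(estimated_passes(mutants, "wt_marginal"))
```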