
ESM-2 Zero-Shot Mutation Scoring

Rank protein point mutations by how favorable the ESM-2 language model thinks they are, with no training data needed.

ESM-2 Zero-Shot Mutation Scoring takes a wild-type protein sequence and a list of single point mutations and gives each mutation a score for how favorable the ESM-2 protein language model considers it. The wild-type sequence is the starting, unmutated protein. A point mutation swaps the amino acid at one position for a different one.

Zero-shot means the model produces these scores directly, without being shown any examples or training data for your specific protein. ESM-2 has learned from millions of natural protein sequences which amino acids tend to appear in which contexts, so it can judge whether a proposed swap looks natural or unusual from what it already knows.

Each mutation is written as <original><position><new>, with positions counted from 1. For example, M1A means the amino acid M at position 1 is changed to A. The tool checks each mutation against the wild-type sequence and rejects any whose position is out of range or whose stated original amino acid does not match the residue at that position.
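The checks described above can be reproduced client-side before you submit, so malformed mutations are caught early. This is a minimal sketch, not the tool's own validator: the helper name parse_mutation and the regular expression (one uppercase letter, digits, one uppercase letter) are assumptions made for illustration.

```python
import re

# Hypothetical pattern for <original><position><new>, e.g. "M1A".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation: str, wild_type: str) -> tuple[str, int, str]:
    """Hypothetical helper: return (original, position, new).

    Positions are 1-based, matching the M1A example. Raises ValueError for
    a malformed string, an out-of-range position, or an original residue
    that does not match the wild-type sequence.
    """
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"{mutation!r} is not in <original><position><new> form")
    original, pos_text, new = match.groups()
    position = int(pos_text)
    if not 1 <= position <= len(wild_type):
        raise ValueError(f"{mutation!r}: position {position} is out of range")
    if wild_type[position - 1] != original:
        raise ValueError(
            f"{mutation!r}: wild type has {wild_type[position - 1]} at position {position}"
        )
    return original, position, new


wt = "MKIEELKKWV"
print(parse_mutation("M1A", wt))  # ('M', 1, 'A')
```

With this wild type, "A1M" would be rejected because position 1 holds M, and "M11A" because the sequence has only 10 residues.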

Use it when you have a protein and a list of candidate single point mutations and want a quick, cheap ranking of which ones the model considers favorable before committing to slower or more expensive methods. A higher score suggests the mutation looks natural to the model, which can help you shortlist mutations worth studying further.

| Input | Required | What it is |
|---|---|---|
| sequence | yes | The wild-type protein sequence as a one-letter amino acid string, between 1 and 2048 amino acids long. |
| mutants | yes | A list of point mutations, each written as <original><position><new>, for example M1A. Each is checked against the wild-type sequence. |
| model_variant | no, default 650M | Which ESM-2 model to use: 650M or the larger 3B. |
| method | no, default masked_marginal | How each mutation is scored. masked_marginal hides one position at a time and asks the model what belongs there (more accurate, slower). wt_marginal uses a single pass over the whole sequence (faster, less finely calibrated). |
| fp16 | no, default true | Runs the model in a faster, lower-precision (16-bit) number format on the GPU. |
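For a single mutation at position i from wild-type residue a to mutant residue b, the two methods correspond to the standard masked-marginal and wild-type-marginal formulations used for zero-shot scoring with ESM-style models. The formulas below assume the tool follows them; x is the wild-type sequence and x_{\setminus i} is x with position i masked. In both cases the two terms come from the same model pass, so the score equals mt_log_prob minus wt_log_prob.

```latex
\begin{aligned}
\text{masked\_marginal:}\quad \mathrm{LLR} &= \log p\left(x_i = b \mid x_{\setminus i}\right) - \log p\left(x_i = a \mid x_{\setminus i}\right) \\
\text{wt\_marginal:}\quad \mathrm{LLR} &= \log p\left(x_i = b \mid x\right) - \log p\left(x_i = a \mid x\right)
\end{aligned}
```

The only difference is the conditioning context: masked_marginal hides the residue being scored, while wt_marginal lets the model see it, which is why one unmasked pass can score every position at once.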

Submit your sequence and mutations from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

Open ESM-2 Zero-Shot Mutation Scoring from the tools list. On the Inputs and Parameters step, enter the wild-type sequence and your mutations as a JSON list (for example ["M1A", "K2R", "I3V"]), change the model variant and method if you do not want the defaults, then Review and Submit.

```python
from opal import jobs

# Submit an ESM-2 zero-shot mutation scoring job through the Python SDK.
result = jobs.submit(
    job_type="esm2_mutation_score",
    input_data={
        "sequence": "MKIEELKKWVEEFDKKLAEIFKFDFGGYRELADKVAEAVGKKVDEKQKKIVEIFEKVEAEA",
        "mutants": ["M1A", "K2R", "I3V"],
        "method": "masked_marginal",
    },
)
```
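Writing every mutant by hand gets tedious when you want to try all substitutions at a few positions. The sketch below builds such a "saturation" list in the <original><position><new> format. The helper name saturation_mutants is hypothetical, and the alphabet of 20 standard amino acids is an assumption about which residues you want to try.

```python
# Assumed alphabet: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_mutants(sequence: str, positions: list[int]) -> list[str]:
    """Hypothetical helper: every substitution at each 1-based position."""
    mutants = []
    for position in positions:
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the sequence")
        original = sequence[position - 1]
        for new in AMINO_ACIDS:
            if new != original:  # skip the no-op "substitution"
                mutants.append(f"{original}{position}{new}")
    return mutants


sequence = "MKIEELKKWVEEFDKKLAEIFKFDFGGYRELADKVAEAVGKKVDEKQKKIVEIFEKVEAEA"
input_data = {
    "sequence": sequence,
    "mutants": saturation_mutants(sequence, [1, 2, 3]),
    "method": "masked_marginal",
}
print(len(input_data["mutants"]))  # 57
```

The resulting input_data dictionary has the same shape as the one passed to jobs.submit in the SDK example.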

With the CLI, pass the inputs as a single JSON string to --input-data.

```shell
opal jobs submit --job-type esm2_mutation_score \
  --input-data '{"sequence": "MKIEELKKWVEEFDKKLAEIFKFDFGGYRELADKVAEAVGKKVDEKQKKIVEIFEKVEAEA", "mutants": ["M1A", "K2R", "I3V"], "method": "masked_marginal"}'
```
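Hand-writing the JSON string is error-prone once the mutant list grows. A minimal sketch of generating it instead: json.dumps produces valid JSON and shlex.quote wraps it for a POSIX shell such as bash or zsh (quoting rules differ on Windows shells). The command text is copied from the example; only the generation approach is new.

```python
import json
import shlex

input_data = {
    "sequence": "MKIEELKKWVEEFDKKLAEIFKFDFGGYRELADKVAEAVGKKVDEKQKKIVEIFEKVEAEA",
    "mutants": ["M1A", "K2R", "I3V"],
    "method": "masked_marginal",
}

# Serialize to JSON, then shell-quote so spaces and double quotes survive.
payload = json.dumps(input_data)
command = (
    "opal jobs submit --job-type esm2_mutation_score "
    f"--input-data {shlex.quote(payload)}"
)
print(command)
```

Paste the printed command into a terminal, or build the mutants list programmatically before serializing it.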

The main output field is scores, a list with one entry per mutation you submitted. Each entry has these fields:

  • mutant: the mutation it scored, for example M1A.
  • log_likelihood_ratio: the score. It compares how likely the model thinks the new amino acid is at that position against the original one. A higher (more positive) value means the model finds the mutation more favorable than the wild type at that spot, and a lower (more negative) value means less favorable. Rank your mutations by this number to find the ones the model likes best.
  • wt_log_prob: the log probability the model assigns to the original (wild-type) amino acid at that position.
  • mt_log_prob: the log probability the model assigns to the mutated amino acid at that position. The log_likelihood_ratio is mt_log_prob minus wt_log_prob.
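Ranking mutations is a sort on log_likelihood_ratio in descending order. This sketch assumes the result can be read as a dictionary with the fields listed above; the numbers are made up for illustration and are not real ESM-2 output.

```python
# Illustrative result shaped like the documented fields (values are invented).
result = {
    "scores": [
        {"mutant": "M1A", "log_likelihood_ratio": -2.1, "wt_log_prob": -0.4, "mt_log_prob": -2.5},
        {"mutant": "K2R", "log_likelihood_ratio": 0.3, "wt_log_prob": -1.2, "mt_log_prob": -0.9},
        {"mutant": "I3V", "log_likelihood_ratio": -0.5, "wt_log_prob": -0.8, "mt_log_prob": -1.3},
    ]
}

# Highest ratio first: the mutations the model finds most favorable.
ranked = sorted(result["scores"], key=lambda s: s["log_likelihood_ratio"], reverse=True)

for entry in ranked:
    # Consistency check: the ratio should equal mt_log_prob - wt_log_prob.
    diff = entry["mt_log_prob"] - entry["wt_log_prob"]
    assert abs(entry["log_likelihood_ratio"] - diff) < 1e-6
    print(entry["mutant"], entry["log_likelihood_ratio"])
```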

The result also reports method (the scoring method used), model_used (the exact model), and a small scalars block containing mean_llr, the average log_likelihood_ratio across your mutations, and n_mutants, the number of mutations scored. In Azulene Studio, these per-mutation results are shown as raw result data.
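If you filter the scores list, for example to mutations in one region, you may want the same summary for the subset. A minimal sketch that recomputes the two scalars from per-mutation entries; the helper name summarize is hypothetical and the scores are invented for illustration.

```python
from statistics import fmean


def summarize(scores: list[dict]) -> dict:
    """Hypothetical helper: mean_llr and n_mutants for a list of entries."""
    llrs = [entry["log_likelihood_ratio"] for entry in scores]
    if not llrs:
        raise ValueError("no mutations to summarize")
    return {"mean_llr": fmean(llrs), "n_mutants": len(llrs)}


scores = [
    {"mutant": "M1A", "log_likelihood_ratio": -2.1},
    {"mutant": "K2R", "log_likelihood_ratio": 0.3},
    {"mutant": "I3V", "log_likelihood_ratio": -0.5},
]
print(summarize(scores))
```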

The masked_marginal method is more accurate but runs one pass per mutated position, so it is slower. Choose wt_marginal for a faster, single-pass estimate when you have many mutations and want a quick first cut. The scores are best used to rank mutations relative to each other, not as absolute measurements.
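To judge which method fits a mutant list, you can count the model passes each one needs. This sketch assumes that mutations sharing a position (such as M1A and M1V) share one masked pass, which is a reading of "one pass per mutated position" rather than something the tool documents; the helper name forward_passes is hypothetical.

```python
def forward_passes(mutants: list[str], method: str) -> int:
    """Hypothetical estimate of model passes for a list of mutants."""
    if method == "wt_marginal":
        return 1  # one pass over the unmasked wild-type sequence
    if method == "masked_marginal":
        # The position is the digits between the first and last letters.
        return len({int(m[1:-1]) for m in mutants})
    raise ValueError(f"unknown method: {method!r}")


mutants = ["M1A", "M1V", "K2R", "I3V"]
print(forward_passes(mutants, "masked_marginal"))  # 3
print(forward_passes(mutants, "wt_marginal"))  # 1
```

A saturation scan over many positions makes the gap large: masked_marginal scales with the number of distinct positions, while wt_marginal stays at one pass.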