ESMFold Single-Chain Structure Prediction

ESMFold predicts the 3D structure of one protein chain directly from its amino acid sequence. It uses a protein language model, so it does not need a multiple sequence alignment (MSA), the set of related sequences that alignment-based folding models depend on. That makes it fast and simple: you give it one sequence and it gives you a structure.

Because there is no alignment step to wait on, ESMFold is a good choice for a quick first look at a single protein. The trade-off is that for some sequences an alignment-based model can be more accurate.

The only required input is the protein sequence, which can be up to 1024 residues long. By default the output structure is written as a PDB file.

When to use it

Use it when you want a fast structure prediction for a single protein chain and you do not want to wait on an alignment. It is ideal for a quick look at one protein, or for screening many proteins where speed matters. If you need to fold more than one chain, a complex, or a protein with a ligand or nucleic acid, use OpenFold2, OpenFold3, Chai-1, or Boltz-2 instead.

Inputs

Input	Required	What it is
`sequence`	yes	The protein sequence as a one-letter amino acid string, from 1 to 1024 residues long.
`output_format`	no, default `pdb`	The structure file format, either `pdb` or `cif`. The `cif` (mmCIF) output is generated from the model’s native PDB output.
`chunk_size`	no	An advanced memory setting. Lower values use less GPU memory at the cost of speed. Leave it unset to use the model default.
`num_recycles`	no, default `4`	How many times the model refines its own prediction, from 0 to 8. More recycles can improve the structure. Note that synthetic or unusual sequences may stay low confidence no matter how many recycles you use.
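
The limits in the table can be checked locally before a job is submitted. The following is a minimal sketch, not part of the SDK: `build_input_data` is a hypothetical helper, and the set of accepted residue letters is an assumption (the 20 standard amino acids), since the table only says "one-letter amino acid string".

```python
# Assumption: only the 20 standard one-letter amino acid codes are accepted.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_RESIDUES = 1024               # documented sequence cap
OUTPUT_FORMATS = {"pdb", "cif"}   # documented output formats
MIN_RECYCLES, MAX_RECYCLES = 0, 8  # documented num_recycles range


def build_input_data(sequence, output_format="pdb", num_recycles=4, chunk_size=None):
    """Hypothetical helper: return an input_data dict or raise ValueError."""
    # Drop spaces and line breaks, then normalize to upper case.
    seq = "".join(sequence.split()).upper()
    if not 1 <= len(seq) <= MAX_RESIDUES:
        raise ValueError(f"sequence length {len(seq)} is outside 1 to {MAX_RESIDUES}")
    unknown = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {''.join(unknown)}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
    if not isinstance(num_recycles, int) or not MIN_RECYCLES <= num_recycles <= MAX_RECYCLES:
        raise ValueError(f"num_recycles must be an integer from {MIN_RECYCLES} to {MAX_RECYCLES}")

    data = {"sequence": seq, "output_format": output_format, "num_recycles": num_recycles}
    # chunk_size is only sent when set, so the model default applies otherwise.
    if chunk_size is not None:
        data["chunk_size"] = chunk_size
    return data
```

The returned dictionary has the same keys as the `input_data` used in the submission examples, so it could be passed along unchanged.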

How to run it

Submit your protein sequence from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

In Azulene Studio

Open ESMFold Single-Chain Structure Prediction from the tools list. On the Inputs and Parameters step, paste your protein sequence and, if you want, choose the output format and adjust the number of recycles. Then continue to Review and Submit.

From the Python SDK

from opal import jobs

# Submit one ESMFold job; only "sequence" is required.
result = jobs.submit(
    job_type="esmfold_predict",
    input_data={
        "sequence": "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG",
        "output_format": "pdb",  # or "cif"
        "num_recycles": 4,       # 0 to 8, default 4
    },
)
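
For screening several proteins, the same call can be repeated once per sequence. The sketch below is illustrative only: `submit_many` is a hypothetical helper, and it takes the submit function as an argument so it can be tried without the SDK installed. What `jobs.submit` returns, and whether the service limits how many jobs you can queue, is not covered here.

```python
def submit_many(sequences, submit, output_format="pdb", num_recycles=4):
    """Hypothetical helper: submit one ESMFold job per named sequence.

    sequences: dict mapping a label of your choice to a one-letter sequence.
    submit: a callable taking job_type and input_data keyword arguments,
            for example jobs.submit from the SDK example above.
    Returns a dict mapping each label to whatever submit returned.
    """
    results = {}
    for name, sequence in sequences.items():
        results[name] = submit(
            job_type="esmfold_predict",
            input_data={
                "sequence": sequence,
                "output_format": output_format,
                "num_recycles": num_recycles,
            },
        )
    return results
```

With the SDK imported as in the example above, `submit_many({"ubiquitin": "MQIF..."}, jobs.submit)` would send one job per entry.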

From the CLI

Pass the inputs as a JSON string.

opal jobs submit --job-type esmfold_predict \
  --input-data '{"sequence": "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG", "output_format": "pdb", "num_recycles": 4}'
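
Hand-written JSON inside shell quotes is easy to get wrong. One way to avoid that is to generate the command from a Python dictionary, as in this minimal sketch. `build_cli_command` is a hypothetical helper; it uses only the `--job-type` and `--input-data` flags shown above and the standard library.

```python
import json
import shlex


def build_cli_command(input_data):
    """Hypothetical helper: return a shell-safe `opal jobs submit` command line."""
    payload = json.dumps(input_data)  # serialize the inputs as a JSON string
    # shlex.join quotes each argument so the JSON survives the shell intact.
    return shlex.join(
        ["opal", "jobs", "submit",
         "--job-type", "esmfold_predict",
         "--input-data", payload]
    )


if __name__ == "__main__":
    print(build_cli_command({"sequence": "MQIFVKTLTGK", "output_format": "cif", "num_recycles": 4}))
```

The printed line can be pasted into a POSIX shell. Quoting rules differ on other shells, so treat the output as a starting point there.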

Reading the result

The main output is the predicted 3D structure of your protein, returned as PDB text by default, or as mmCIF text if you chose that format. Both are plain text formats that list every atom and its position. The structure comes back in the result and can be downloaded and opened in any molecular viewer.
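
Because PDB is a fixed-column text format, the returned structure can be inspected with a few lines of Python. The sketch below pulls out the alpha carbon of each residue. It assumes the returned text follows the standard PDB `ATOM` record layout; `read_ca_atoms` is a hypothetical helper, and how you get the PDB text out of the result object is not shown.

```python
def read_ca_atoms(pdb_text):
    """Hypothetical helper: return (residue_name, residue_number, x, y, z) per alpha carbon."""
    atoms = []
    for line in pdb_text.splitlines():
        # Standard PDB columns (1-based): atom name 13-16, residue name 18-20,
        # residue number 23-26, x 31-38, y 39-46, z 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            atoms.append(
                (
                    line[17:20].strip(),
                    int(line[22:26]),
                    float(line[30:38]),
                    float(line[38:46]),
                    float(line[46:54]),
                )
            )
    return atoms
```

For a single-chain prediction, the length of the returned list should match the number of residues in the submitted sequence, which makes a quick sanity check.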

ESMFold also reports confidence scores that tell you how much to trust the prediction:

A per-residue confidence score, pLDDT, reported for each residue of the chain on a 0 to 1 scale, where higher means the model is more confident about that residue’s position. High-confidence regions are usually well folded, while low-confidence regions are often flexible loops or parts the model is unsure about. Synthetic or out-of-distribution sequences may have low pLDDT regardless of how many recycles you run.
An overall structure confidence score, pTM, a single number that summarizes how confident the model is about the whole fold, where higher is better.
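
To see which parts of the chain are low confidence, the per-residue scores can be grouped into contiguous stretches. This is a minimal sketch that assumes you already have the pLDDT values as a plain list of floats on the 0 to 1 scale. `low_confidence_segments` is a hypothetical helper, and the 0.7 cutoff is an illustrative assumption, not a threshold defined by ESMFold or by this tool.

```python
def low_confidence_segments(plddt, threshold=0.7):
    """Hypothetical helper: return 1-based inclusive (start, end) ranges with pLDDT below threshold."""
    segments = []
    start = None  # first residue of the stretch currently being tracked
    for position, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = position
        elif start is not None:
            # The stretch ended at the previous residue.
            segments.append((start, position - 1))
            start = None
    if start is not None:
        # The chain ends inside a low-confidence stretch.
        segments.append((start, len(plddt)))
    return segments
```

Long stretches near the ends of the chain are often flexible tails, while a low-confidence stretch in the middle of an otherwise confident fold may be a loop worth a closer look.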

The full result, including the structure text and the scores, can be downloaded from the result. ESMFold predicts a single chain, so there is no interface or chain-to-chain score.

Notes

A single sequence is all you need to run it. ESMFold runs on a GPU, and runtime grows with the length of the sequence. Sequences are capped at 1024 residues because the memory the model needs grows quickly with length. For a longer protein, or for a complex of several chains, use one of the alignment-based models listed under When to use it.