ESMFold Single-Chain Structure Prediction
ESMFold predicts the 3D structure of one protein chain straight from its amino acid sequence. It uses a protein language model, so it does not need a multiple sequence alignment, the set of related sequences that older folding models depend on. That makes it fast and simple: you give it one sequence and it gives you a structure.
Because there is no alignment step to wait on, ESMFold is a good choice for a quick first look at a single protein. The trade-off is that for some sequences an alignment-based model can be more accurate.
The only required input is the protein sequence. By default the output structure is written as a PDB file, and the prediction allows sequences of up to 1024 residues.
When to use it
Use it when you want a fast structure prediction for a single protein chain and you do not want to wait on an alignment. It is ideal for a quick look at one protein, or for screening many proteins where speed matters. If you need to fold more than one chain, a complex, or a protein with a ligand or nucleic acid, use OpenFold2, OpenFold3, Chai-1, or Boltz-2 instead.
Inputs
| Input | Required | What it is |
|---|---|---|
| `sequence` | yes | The protein sequence as a one-letter amino acid string, from 1 to 1024 residues long. |
| `output_format` | no, default `pdb` | The structure file format, either `pdb` (default) or `cif`. The mmCIF form is generated from the model’s native PDB output. |
| `chunk_size` | no | An advanced memory setting. Lower values use less GPU memory at the cost of speed. Leave it unset to use the model default. |
| `num_recycles` | no, default 4 | How many times the model refines its own prediction, from 0 to 8. More recycles can improve the structure. Note that synthetic or unusual sequences may stay low confidence no matter how many recycles you use. |
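The limits in the table can be checked locally before a job is submitted. The following is a minimal sketch, not part of the SDK: `build_esmfold_input` is a hypothetical helper, and restricting the sequence to the 20 standard one-letter codes is an assumption, since the accepted alphabet is not stated here.

```python
# Assumption: only the 20 standard amino acid codes are accepted.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 1024                 # documented sequence cap
OUTPUT_FORMATS = ("pdb", "cif")   # documented output formats


def build_esmfold_input(sequence, output_format="pdb", num_recycles=4, chunk_size=None):
    """Hypothetical helper: return an input_data dict or raise ValueError."""
    # Drop whitespace and line breaks, then normalize to upper case.
    seq = "".join(sequence.split()).upper()
    if not 1 <= len(seq) <= MAX_LENGTH:
        raise ValueError(f"sequence must be 1 to {MAX_LENGTH} residues, got {len(seq)}")
    unknown = sorted(set(seq) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected characters in sequence: {unknown}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    if not 0 <= num_recycles <= 8:
        raise ValueError("num_recycles must be between 0 and 8")
    payload = {"sequence": seq, "output_format": output_format, "num_recycles": num_recycles}
    # chunk_size is left out unless set, so the model default applies.
    if chunk_size is not None:
        payload["chunk_size"] = chunk_size
    return payload
```

The returned dictionary has the same shape as the `input_data` argument used in the submission examples on this page.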
How to run it
Submit your protein sequence from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready made example first.
In Azulene Studio
Open ESMFold Single-Chain Structure Prediction from the tools list. On the Inputs and Parameters step, paste your protein sequence, optionally choose the output format and adjust the number of recycles, then Review and Submit.
From the Python SDK
```python
from opal import jobs

result = jobs.submit(
    job_type="esmfold_predict",
    input_data={
        "sequence": "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG",
        "output_format": "pdb",
        "num_recycles": 4,
    },
)
```
From the CLI
Pass the inputs as a JSON string.
```shell
opal jobs submit --job-type esmfold_predict \
  --input-data '{"sequence": "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG", "output_format": "pdb", "num_recycles": 4}'
```
Reading the result
The main output is the predicted 3D structure of your protein, returned as PDB text by default, or as mmCIF text if you chose that format. Both are plain text formats that list every atom and its position. The structure comes back in the result and can be downloaded and opened in any molecular viewer.
ESMFold also reports confidence scores that tell you how much to trust the prediction:
- A per-residue confidence score, pLDDT, reported for each residue of the chain on a 0 to 1 scale, where higher means the model is more confident about that residue’s position. High-confidence regions are usually well folded, while low-confidence regions are often flexible loops or parts the model is unsure about. Synthetic or out-of-distribution sequences may have low pLDDT regardless of how many recycles you run.
- An overall structure confidence score, pTM, a single number that summarizes how confident the model is about the whole fold, where higher is better.
The full result, including the structure text and the scores, can be downloaded from the result. ESMFold predicts a single chain, so there is no interface or chain to chain score.
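As an illustration of reading the per-residue scores, the sketch below computes the mean pLDDT and lists the stretches that fall below a cutoff. It takes a plain list of floats on the 0 to 1 scale. How the scores are named inside the result is not described here, so extracting that list is left to you. `summarize_plddt` is a hypothetical helper, and the 0.7 cutoff is an assumed rule of thumb, not a value defined by the tool.

```python
def summarize_plddt(plddt, threshold=0.7):
    """Return (mean pLDDT, low-confidence stretches as 1-based inclusive ranges).

    threshold=0.7 is an assumed cutoff; tune it for your own use.
    """
    if not plddt:
        raise ValueError("plddt must contain at least one score")
    mean = sum(plddt) / len(plddt)
    regions = []
    start = None  # first residue of the current low-confidence stretch
    for position, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = position
        elif start is not None:
            regions.append((start, position - 1))
            start = None
    if start is not None:
        # The chain ended inside a low-confidence stretch.
        regions.append((start, len(plddt)))
    return mean, regions


# Made-up scores for an eight-residue chain, for illustration only.
scores = [0.91, 0.88, 0.55, 0.48, 0.62, 0.90, 0.93, 0.40]
mean, low = summarize_plddt(scores)
print(f"mean pLDDT: {mean:.2f}")
print("low-confidence residues:", low)
```

For the made-up scores above, residues 3 to 5 and residue 8 are flagged, which is the kind of region worth inspecting in a molecular viewer before trusting the fold.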
Just one sequence is enough to run it. ESMFold runs on a GPU, and runtime grows with the length of the sequence. Sequences are capped at 1024 residues because the memory the model needs grows quickly with length. For a longer protein, or for a complex of several chains, use one of the alignment-based models linked above.
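When screening many proteins, it can help to set aside the sequences that exceed the cap before submitting anything. The sketch below parses FASTA text and splits the records by length. The helper names `read_fasta` and `split_by_length` are hypothetical and the example records are made up. Nothing here calls the service.

```python
MAX_LENGTH = 1024  # documented sequence cap


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def split_by_length(records, max_length=MAX_LENGTH):
    """Split records into those within the cap and those skipped (empty or too long)."""
    fits, skipped = [], []
    for name, seq in records:
        if 1 <= len(seq) <= max_length:
            fits.append((name, seq))
        else:
            skipped.append((name, seq))
    return fits, skipped


# Made-up input: one short record and one that exceeds the cap.
fasta_text = ">short_example\nMQIFVKTLTG\nKTITLEVEPS\n>long_example\n" + "A" * 1500
fits, skipped = split_by_length(read_fasta(fasta_text))
print("within cap:", [name for name, _ in fits])
print("skipped:", [name for name, _ in skipped])
```

Each record in `fits` can then be submitted as its own single-chain job, while the skipped records are candidates for one of the other models.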