Boltz-2 Structure + Affinity Prediction

Co-fold proteins, cofactors, and a ligand into a 3D structure, and predict binding affinity in one run.

Boltz-2 predicts the 3D structure of a complex from sequence. You give it one or more protein chains, and optionally a small molecule ligand and cofactors, and it folds them together into a single predicted structure. This is called co-folding, because the protein and the ligand are placed in 3D at the same time, in the same calculation.

When you include a ligand, Boltz-2 also runs its affinity head, which predicts how strongly that ligand binds the protein. So a single Boltz-2 run can give you both the shape of the complex and an estimate of binding strength.

You can fold up to 12 protein chains, up to 8 cofactors, and 1 ligand in one run. The minimal input is one protein sequence. Chain identifiers are filled in for you automatically, so proteins become A, B, C, and so on, and the ligand becomes Z. By default the protein alignment step uses the public Boltz alignment server, and the output structure is written as a CIF file.
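As a rough illustration of the labelling described above, the sketch below previews which identifiers a run would be expected to carry. The helper `preview_chain_ids` is hypothetical and not part of the SDK; it only mirrors the stated rules (proteins lettered from A in list order, ligand labelled Z, at most 12 chains).

```python
from string import ascii_uppercase

MAX_PROTEINS = 12  # limit stated in the text above
LIGAND_ID = "Z"    # ligand identifier stated in the text above


def preview_chain_ids(n_proteins: int, has_ligand: bool = False) -> list[str]:
    """Hypothetical helper: list the identifiers a run is expected to use."""
    if not 1 <= n_proteins <= MAX_PROTEINS:
        raise ValueError(f"expected 1 to {MAX_PROTEINS} protein chains, got {n_proteins}")
    ids = list(ascii_uppercase[:n_proteins])  # A, B, C, ...
    if has_ligand:
        ids.append(LIGAND_ID)
    return ids


# A homotrimer is the same sequence listed three times.
sequence = "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK"
proteins = [{"sequence": sequence} for _ in range(3)]
print(preview_chain_ids(len(proteins), has_ligand=True))  # ['A', 'B', 'C', 'Z']
```

Cofactors are left out of the preview because each cofactor carries the id you give it.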

Use it when you want a 3D model of a protein on its own, a protein with several copies of itself, a protein with cofactors, or a protein with a bound small molecule. It is a particularly good fit when you want both the predicted shape of a protein-ligand complex and a quick estimate of how tightly the ligand binds, because the affinity prediction runs in the same job whenever a ligand is present.

| Input | Required | What it is |
| --- | --- | --- |
| proteins | yes | A list of 1 to 12 protein chains. The smallest entry is {"sequence": "<one-letter amino acid string>"}. Chain identifiers (A, B, C, and so on) are assigned for you. To fold several copies of the same protein, list it several times. |
| ligand | no | A single small molecule, given as {"smiles": "<SMILES>"}. Provide exactly one of smiles or ccd (a chemical component dictionary code). The ligand identifier is set to Z for you. When a ligand is present, the affinity prediction runs. |
| cofactors | no | A list of 0 to 8 cofactor molecules. Each one needs its own id plus exactly one of ccd (preferred) or smiles. |
| templates | no | A list of known structures to guide the prediction. Each one points at an uploaded .pdb or .cif file and can name the chain and a few options. |
| constraints | no | Optional {pockets, bonds}. Use pockets to say a ligand should sit near certain residues, and bonds to declare explicit covalent links such as disulfide bridges. |
| properties | no | Toggles for the prediction heads. {affinity: true or false}. Defaults to true when a ligand is present and false otherwise. |
| runtime | no | Run settings. Defaults are use_msa_server=true, use_potentials=false, no_kernels=false, diffusion_samples=1, and output_format="cif". |
| mode | no | "json" (default) for the normal input shown here, or "raw_yaml" for the power user route that hands Boltz a complete YAML file. |
| raw_yaml_url | no | A link to a complete Boltz YAML file. Only used when mode is "raw_yaml". |
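The limits in the table can be checked locally before a job is submitted. The following is a minimal sketch, not part of the SDK: `validate_inputs` is a hypothetical helper that covers only the rules stated above (chain and cofactor counts, and the "exactly one of smiles or ccd" rule), and the cofactor id `"NAD1"` is a placeholder, since the expected id format is not described here.

```python
def validate_inputs(input_data: dict) -> list[str]:
    """Hypothetical pre-flight check; an empty list means no problems were found."""
    problems = []

    proteins = input_data.get("proteins", [])
    if not 1 <= len(proteins) <= 12:
        problems.append("proteins must list 1 to 12 chains")
    for i, chain in enumerate(proteins):
        if not chain.get("sequence"):
            problems.append(f"proteins[{i}] needs a non-empty sequence")

    # The ligand is optional, but if present it takes smiles or ccd, not both.
    ligand = input_data.get("ligand")
    if ligand is not None and ("smiles" in ligand) == ("ccd" in ligand):
        problems.append("ligand needs exactly one of smiles or ccd")

    cofactors = input_data.get("cofactors", [])
    if len(cofactors) > 8:
        problems.append("cofactors may list at most 8 molecules")
    for i, cofactor in enumerate(cofactors):
        if not cofactor.get("id"):
            problems.append(f"cofactors[{i}] needs an id")
        if ("smiles" in cofactor) == ("ccd" in cofactor):
            problems.append(f"cofactors[{i}] needs exactly one of ccd or smiles")

    return problems


example = {
    "proteins": [{"sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK"}],
    "ligand": {"smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    "cofactors": [{"id": "NAD1", "ccd": "NAD"}],  # placeholder id
}
print(validate_inputs(example))  # []
```

The service performs its own validation, so a local check like this is only a convenience for catching obvious mistakes early.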

Submit your protein (and optional ligand) from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

In Azulene Studio, open Boltz-2 Structure + Affinity Prediction from the tools list. On the Inputs and Parameters step, enter your protein sequence (add more chains if you have a complex), and optionally add a ligand SMILES and any cofactors. Leave the run settings at their defaults for a first run, then choose Review and Submit.

# Python SDK: submit one protein chain plus a ligand, with affinity prediction on.
from opal import jobs

result = jobs.submit(
    job_type="boltz_prediction",
    input_data={
        "proteins": [{"sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK"}],
        "ligand": {"smiles": "CC(=O)Oc1ccccc1C(=O)O"},
        "properties": {"affinity": True},
    },
)

From the CLI, pass the inputs as a single JSON string.

opal jobs submit --job-type boltz_prediction \
  --input-data '{"proteins": [{"sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK"}], "ligand": {"smiles": "CC(=O)Oc1ccccc1C(=O)O"}, "properties": {"affinity": true}}'
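Writing the JSON string by hand is error-prone, in particular because Python's `True` must become JSON's `true`. One way to avoid that is to build the string from the same dictionary used with the SDK. The sketch below relies only on the standard library; it prints the command rather than running it, and uses only the `--job-type` and `--input-data` flags shown above.

```python
import json
import shlex

input_data = {
    "proteins": [{"sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK"}],
    "ligand": {"smiles": "CC(=O)Oc1ccccc1C(=O)O"},
    "properties": {"affinity": True},
}

# json.dumps turns Python True/False/None into JSON true/false/null.
payload = json.dumps(input_data)

# shlex.quote wraps the payload so a POSIX shell passes it as one argument.
command = (
    "opal jobs submit --job-type boltz_prediction --input-data "
    + shlex.quote(payload)
)
print(command)
```

The quoting produced by `shlex.quote` targets POSIX shells; other shells may need different escaping.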

The main output is the predicted 3D structure of your complex, written by default as a CIF file (a plain text format that lists every atom and its position). You can download it from the Files tab of the result and open it in any molecular viewer.

Boltz-2 also reports confidence scores that tell you how much to trust the prediction. In plain terms:

  • A per-residue confidence score, usually called pLDDT, says how sure the model is about the position of each part of the chain. Higher means more confident. Regions with high confidence are usually well folded, while low-confidence regions are often flexible loops or parts the model is unsure about.
  • A predicted aligned error, usually called PAE, says how confident the model is about the position of one part of the structure relative to another. Lower error means the relative placement of two regions is more trustworthy. This matters most for telling whether two chains, or a protein and its ligand, are placed correctly with respect to each other.
  • An overall and an interface confidence score, usually called pTM and ipTM, summarize the whole structure and the part where chains meet, each on a 0 to 1 scale where higher is better.
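To make the per-residue score concrete, the sketch below groups consecutive low-confidence residues into segments, which is often how flexible loops show up. It is an illustration only: the function name is hypothetical, the scores are made-up values on an assumed 0 to 1 scale, and the 0.7 cutoff is an arbitrary choice for the demo rather than a documented threshold. How to extract the per-residue values from the downloaded files is not covered here.

```python
def low_confidence_segments(plddt, threshold):
    """Return (start, end) index pairs, 0-based and inclusive, where plddt < threshold."""
    segments = []
    start = None  # index where the current low-confidence run began, if any
    for i, score in enumerate(plddt):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:  # a run that extends to the last residue
        segments.append((start, len(plddt) - 1))
    return segments


# Made-up per-residue scores for an 8-residue chain.
scores = [0.91, 0.88, 0.52, 0.48, 0.55, 0.90, 0.93, 0.61]
print(low_confidence_segments(scores, threshold=0.7))  # [(2, 4), (7, 7)]
```

If the scores you download are on a 0 to 100 scale, pass a threshold on that same scale.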

When you include a ligand, Boltz-2 adds a predicted binding affinity, an estimate of how strongly that ligand binds the protein. Use it as a fast first-pass ranking of ligands. For a more rigorous, physics-based estimate of binding strength, follow up with Absolute Binding Free Energy or Relative Binding Free Energy.
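Because each run takes a single ligand, a first-pass ranking means one job per candidate. The sketch below builds one input dictionary per ligand and then orders ligands by a score you have collected from the results. Both helpers are hypothetical, the affinity numbers are placeholders rather than real predictions, and the sort direction is left to the caller because the name and sign convention of the affinity value come from Boltz-2 itself.

```python
def build_screen(sequence: str, smiles_by_name: dict) -> dict:
    """One input_data dict per ligand, all against the same protein chain."""
    return {
        name: {
            "proteins": [{"sequence": sequence}],
            "ligand": {"smiles": smiles},
            "properties": {"affinity": True},
        }
        for name, smiles in smiles_by_name.items()
    }


def rank_ligands(affinity_by_name: dict, lower_is_stronger: bool) -> list:
    """Order ligand names from strongest to weakest predicted binder."""
    return sorted(
        affinity_by_name,
        key=affinity_by_name.get,
        reverse=not lower_is_stronger,
    )


ligands = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "paracetamol": "CC(=O)Nc1ccc(O)cc1",
}
payloads = build_screen("MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK", ligands)
# Each payload can be passed as input_data to jobs.submit(job_type="boltz_prediction", ...).

placeholder_scores = {"aspirin": 1.2, "paracetamol": 0.4}  # not real predictions
print(rank_ligands(placeholder_scores, lower_is_stronger=True))  # ['paracetamol', 'aspirin']
```

Check the meaning of the downloaded affinity value before choosing `lower_is_stronger`.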

The full set of scores and any extra files are available to download from the result. The exact names of each score in the downloaded data come from Boltz-2 itself.

A single protein sequence is enough for a first run. Add a ligand to also get an affinity estimate. Boltz-2 runs on a GPU, and runtime grows with the number and length of the chains and the number of diffusion samples. By default the alignment step uses the public Boltz alignment server, so very long or unusual sequences may take longer at that step.