
Chai-1 Structure Prediction


Co-fold proteins, nucleic acids, and small molecules into one 3D structure with Chai-1.

Chai-1 predicts the full 3D structure of a molecular complex from sequence. It is an all-atom model: it places every atom in 3D rather than just the protein backbone. A single run can include several kinds of chains (proteins, RNA, DNA, and small molecules), which are folded together into one structure.

Chai-1 can run without a multiple sequence alignment by using built-in protein embeddings (a learned numeric summary of each protein), or it can use an alignment if you provide one. The constraint and template inputs follow the same pattern as the AlphaFold3 family of models.

You describe what you want to fold by listing chains inside a single request object. Each chain says what kind of molecule it is and gives its sequence. By default the output structure is written as a CIF file.

Use it when you want a 3D model of a complex that mixes molecule types, for example a protein bound to a strand of RNA or DNA, or a protein with a small molecule. It is also a good choice when you want a quick structure prediction without setting up an alignment, since it can fold proteins directly from their built-in embeddings.

Input | Required | What it is
request | yes | A single object describing what to fold. It holds a list of chains, plus optional templates, constraints, and runtime settings. Each chain has an id, a type (protein, rna, dna, or a small molecule), and a sequence. The runtime block controls options such as use_esm_embeddings (fold from built-in embeddings instead of an alignment), num_diffusion_samples (how many candidate structures to generate), and output_format (cif by default). For the power-user route, set mode: "raw_fasta" and pass a file directly.
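
As a rough illustration of the request shape described above, the sketch below assembles a request for a protein plus RNA complex as a plain Python dictionary. The helper name build_request and its validation rules are assumptions made for this example and are not part of the SDK; only the field names (mode, chains, id, type, sequence, and the runtime options) come from this page. The exact type string for small molecules is not given here, so the sketch sticks to protein, rna, and dna, and the RNA sequence is made up.

```python
# Hypothetical helper: not part of the opal SDK. It only builds a dict.
# The small-molecule type name is not shown on this page, so it is left out.
SEQUENCE_TYPES = {"protein", "rna", "dna"}


def build_request(chains, num_diffusion_samples=5, use_esm_embeddings=True):
    """Assemble a request dict from (id, type, sequence) tuples."""
    seen_ids = set()
    chain_list = []
    for chain_id, chain_type, sequence in chains:
        if chain_type not in SEQUENCE_TYPES:
            raise ValueError(f"unsupported chain type in this sketch: {chain_type!r}")
        if chain_id in seen_ids:
            raise ValueError(f"duplicate chain id: {chain_id!r}")
        if not sequence:
            raise ValueError(f"chain {chain_id!r} has an empty sequence")
        seen_ids.add(chain_id)
        chain_list.append({"id": chain_id, "type": chain_type, "sequence": sequence})
    return {
        "mode": "json",
        "chains": chain_list,
        "runtime": {
            "use_esm_embeddings": use_esm_embeddings,
            "num_diffusion_samples": num_diffusion_samples,
            "output_format": "cif",
        },
    }


request = build_request([
    ("A", "protein", "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK"),
    ("B", "rna", "GGGAAACUUCGGUUUCCC"),  # made-up RNA sequence for illustration
])
```

The resulting dictionary would go under the "request" key of the job's input data. Checking for duplicate ids and empty sequences locally is a design choice for this sketch, intended to catch simple mistakes before a GPU job is queued.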

Submit your sequences from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

In Azulene Studio, open Chai-1 Structure Prediction from the tools list. On the Inputs and Parameters step, add each chain you want to fold by choosing its type and pasting its sequence. Adjust the run settings if needed, then Review and Submit.

from opal import jobs

result = jobs.submit(
    job_type="chai_prediction",
    input_data={
        "request": {
            "mode": "json",
            "chains": [
                {
                    "id": "A",
                    "type": "protein",
                    "sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK",
                }
            ],
            "runtime": {
                "use_esm_embeddings": True,
                "num_diffusion_samples": 5,
                "output_format": "cif",
            },
        }
    },
)

With the CLI, pass the inputs as a JSON string.

opal jobs submit --job-type chai_prediction \
--input-data '{"request": {"mode": "json", "chains": [{"id": "A", "type": "protein", "sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK"}], "runtime": {"use_esm_embeddings": true, "num_diffusion_samples": 5, "output_format": "cif"}}}'
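
Hand-writing a long JSON string inside shell quotes is easy to get wrong. One possible approach, sketched here, is to build the payload in Python and let the standard library handle the JSON encoding and shell quoting. The sketch only prints the command; it uses the same subcommand and flags shown above and does not assume anything else about the CLI.

```python
import json
import shlex

# Same single-chain request as in the examples on this page.
input_data = {
    "request": {
        "mode": "json",
        "chains": [
            {
                "id": "A",
                "type": "protein",
                "sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK",
            }
        ],
        "runtime": {
            "use_esm_embeddings": True,  # json.dumps writes this as JSON true
            "num_diffusion_samples": 5,
            "output_format": "cif",
        },
    }
}

# Serialize to JSON, then quote the result so a POSIX shell passes it as one argument.
payload = json.dumps(input_data)
command = "opal jobs submit --job-type chai_prediction --input-data " + shlex.quote(payload)
print(command)
```

The printed line can be pasted into a POSIX shell. shlex.quote targets POSIX shells, so other shells may need different quoting.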

The main output is the predicted 3D structure of your complex, written by default as a CIF file (a plain text format that lists every atom and its position). You can download it from the Files tab of the result and open it in any molecular viewer. If you asked for several diffusion samples, the run produces several candidate structures.

Chai-1 also reports confidence scores that tell you how much to trust the prediction. In plain words:

  • A per-residue confidence score, usually called pLDDT, says how sure the model is about the position of each part of a chain. Higher means more confident. High-confidence regions are usually well folded, while low-confidence regions are often flexible or uncertain.
  • A predicted aligned error, usually called PAE, says how confident the model is about the position of one part of the structure relative to another. Lower error means the relative placement of two regions, for example two chains, is more trustworthy.
  • An overall and an interface confidence score, usually called pTM and ipTM, summarize the whole structure and the part where chains meet, each on a 0 to 1 scale where higher is better.
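
To make the per-residue score more concrete, the sketch below groups pLDDT values into the four confidence bands commonly used when displaying AlphaFold predictions (90 and above, 70 to 90, 50 to 70, below 50). Both the thresholds and the 0 to 100 scale are assumptions borrowed from that convention rather than something this page specifies for Chai-1, and the example values are made up.

```python
def plddt_band(score):
    """Map one pLDDT value (assumed 0-100 scale) to a confidence band."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def summarize_plddt(scores):
    """Count how many residues fall into each band."""
    counts = {"very high": 0, "confident": 0, "low": 0, "very low": 0}
    for score in scores:
        counts[plddt_band(score)] += 1
    return counts


# Made-up per-residue values, for illustration only.
example_scores = [95.2, 91.0, 88.4, 72.5, 65.0, 48.3, 30.1]
summary = summarize_plddt(example_scores)
print(summary)
```

A chain where most residues land in the two lower bands is a hint that the region may be flexible or poorly predicted, in line with the description above.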

The full set of scores and any extra files are available to download from the result. The exact names of each score in the downloaded data come from Chai-1 itself.
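
When a run produces several candidate structures, one simple way to compare them is to combine the overall and interface scores into a single number. The sketch below uses the 0.8 × ipTM + 0.2 × pTM weighting introduced for ranking AlphaFold-Multimer predictions; whether Chai-1's own ranking matches this is not stated on this page. The dictionary keys ("name", "ptm", "iptm") and the score values are hypothetical, so map them to whatever names appear in your downloaded data.

```python
def combined_confidence(ptm, iptm):
    """Weighted score that favors the interface (AlphaFold-Multimer style)."""
    return 0.8 * iptm + 0.2 * ptm


# Made-up scores for three candidate structures; key names are hypothetical.
samples = [
    {"name": "sample_0", "ptm": 0.81, "iptm": 0.62},
    {"name": "sample_1", "ptm": 0.78, "iptm": 0.74},
    {"name": "sample_2", "ptm": 0.85, "iptm": 0.55},
]

# Sort from most to least confident according to the combined score.
ranked = sorted(
    samples,
    key=lambda s: combined_confidence(s["ptm"], s["iptm"]),
    reverse=True,
)
best = ranked[0]
print(best["name"])
```

With these example numbers the candidate with the highest pTM is not the one ranked first, because the weighting puts most of the emphasis on the interface between chains.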

A single protein chain is enough for a first run. Add more chains, of any supported type, to fold a complex. Chai-1 runs on a GPU, and runtime grows with the number and length of the chains and the number of diffusion samples you ask for.