OpenFold2 Structure Prediction
OpenFold2 predicts the 3D structure of a protein from its sequence. It is an open source, faithful reproduction of AlphaFold2. It handles both single chains and multimers, which are complexes made of more than one protein chain.
OpenFold2 works best with a multiple sequence alignment, a set of related sequences from other organisms that helps the model see which parts of the protein are conserved. You can let the alignment step run on the public alignment server, upload your own alignment, or run from a single sequence with no alignment at all. You describe what you want to fold by listing chains inside a single request object. By default the output structure is written as a CIF file.
When to use it
Use it when you want a structure prediction for a protein or a protein complex and you want a well established AlphaFold2 style model. It is a strong default for ordinary protein folding. If your system also needs RNA, DNA, or a small molecule ligand folded in, use OpenFold3 or Chai-1 instead, since those handle mixed molecule types.
Inputs
| Input | Required | What it is |
|---|---|---|
| `request` | yes | A single object describing what to fold. It holds a list of `chains`, plus optional `templates` and `runtime` settings. Each chain has an `id`, a `sequence`, and an `msa` block that picks the alignment mode, for example `{"mode": "server"}` to use the public alignment server. The `runtime` block controls options such as `use_msa_server` (whether to fetch an alignment automatically) and `output_format` (`cif` by default). For the power user route, set `mode: "raw_fasta"` and pass a file directly. |
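As a rough sketch of how the fields in the table fit together, the helper below assembles a request for a two chain complex. The function name `build_request` and the second sequence are made up for illustration and are not part of the SDK. The field names and the `server` alignment mode are the ones listed above, and `"mode": "json"` mirrors the submit examples on this page.

```python
def build_request(chains):
    """Assemble a request dict from {chain_id: sequence} pairs.

    Hypothetical helper for illustration; not part of the opal SDK.
    Every chain uses the public alignment server.
    """
    return {
        "mode": "json",
        "chains": [
            {"id": chain_id, "sequence": sequence, "msa": {"mode": "server"}}
            for chain_id, sequence in chains.items()
        ],
        "runtime": {
            "use_msa_server": True,
            "output_format": "cif",
        },
    }


request = build_request({
    "A": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK",
    "B": "MKTAYIAKQRQISFVKSHFSRQ",  # placeholder sequence, for illustration only
})
```

The resulting dict is what would go under the `request` key of the input data. Building it in one place keeps the chain list and the runtime settings consistent when you add or remove chains.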
How to run it
Submit your protein sequence from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready made example first.
In Azulene Studio
Open OpenFold2 Structure Prediction from the tools list, then on the Inputs and Parameters step add each protein chain, paste its sequence, choose how the alignment should be built, adjust the run settings if you want, then Review and Submit.
From the Python SDK
    from opal import jobs

    result = jobs.submit(
        job_type="openfold2_prediction",
        input_data={
            "request": {
                "mode": "json",
                "chains": [
                    {
                        "id": "A",
                        "sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK",
                        "msa": {"mode": "server"},
                    }
                ],
                "runtime": {
                    "use_msa_server": True,
                    "output_format": "cif",
                },
            }
        },
    )

From the CLI
Pass the inputs as a JSON string.
    opal jobs submit --job-type openfold2_prediction \
      --input-data '{"request": {"mode": "json", "chains": [{"id": "A", "sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK", "msa": {"mode": "server"}}], "runtime": {"use_msa_server": true, "output_format": "cif"}}}'

Reading the result
The main output is the predicted 3D structure of your protein, written by default as a CIF file (a plain text format that lists every atom and its position). You can download it from the Files tab of the result and open it in any molecular viewer.
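To make "lists every atom and its position" concrete, here is a minimal sketch that pulls the C-alpha coordinates out of CIF text using only the standard library. It assumes the file carries the usual mmCIF `_atom_site` table with the standard column names (`label_atom_id`, `label_asym_id`, `label_seq_id`, `Cartn_x`, `Cartn_y`, `Cartn_z`) and that no value contains spaces, which normally holds for protein atoms. The function name `read_ca_coords` and the sample text are illustrative; a dedicated mmCIF library is a better choice for real work.

```python
def read_ca_coords(cif_text):
    """Return [(chain, residue_number, x, y, z)] for each C-alpha atom."""
    columns = []       # _atom_site column names, in file order
    rows_open = False  # True while we are inside the atom table
    coords = []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.startswith("_atom_site."):
            columns.append(line.split(".", 1)[1])
            rows_open = True
            continue
        if not rows_open:
            continue
        if not line or line.startswith(("#", "loop_", "_", "data_")):
            rows_open = False  # the atom table has ended
            continue
        fields = dict(zip(columns, line.split()))
        is_protein_atom = fields.get("group_PDB", "ATOM") == "ATOM"
        if is_protein_atom and fields.get("label_atom_id") == "CA":
            coords.append((
                fields["label_asym_id"],
                int(fields["label_seq_id"]),
                float(fields["Cartn_x"]),
                float(fields["Cartn_y"]),
                float(fields["Cartn_z"]),
            ))
    return coords


# Hand-written sample in mmCIF layout, not real model output.
SAMPLE = """data_example
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N MET A 1 1.000 2.000 3.000
ATOM 2 CA MET A 1 1.500 2.500 3.500
ATOM 3 CA ASN A 2 4.000 5.000 6.000
#
"""

ca_atoms = read_ca_coords(SAMPLE)
```

For a downloaded result you would pass the file contents instead of `SAMPLE`, for example `read_ca_coords(open(path).read())`.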
OpenFold2 also reports confidence scores that tell you how much to trust the prediction. In plain words:
- A per residue confidence score, usually called pLDDT, says how sure the model is about the position of each part of the chain. Higher means more confident. High confidence regions are usually well folded, while low confidence regions are often flexible loops or parts the model is unsure about.
- A predicted aligned error, usually called PAE, says how confident the model is about the position of one part of the structure relative to another. Lower error means the relative placement of two regions is more trustworthy. For a multimer, this is what tells you whether the chains are placed correctly with respect to each other.
- An overall confidence score, usually called pTM, summarizes the whole structure. For multimers, an interface confidence score, usually called ipTM, summarizes the part where chains meet. Both are on a 0 to 1 scale where higher is better.
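The scores above can be turned into quick summaries. The sketch below is one way to do it, under stated assumptions: pLDDT arrives as one value per residue on a 0 to 100 scale, and PAE arrives as a square matrix with residues in chain order. How the downloaded data is actually laid out depends on OpenFold2, so adapt the loading step to what you find there. The band names and cut-offs (90, 70, 50) follow the convention used by the AlphaFold Protein Structure Database and are not defined by this page. Both function names are illustrative.

```python
def plddt_band(score):
    """Map a pLDDT value (assumed 0 to 100) to an AlphaFold DB style band."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def mean_interchain_pae(pae, chain_lengths):
    """Average PAE over residue pairs that sit in different chains.

    pae is a square list of lists; chain_lengths gives the number of
    residues per chain, in the same order as the matrix rows.
    """
    if len(chain_lengths) < 2:
        raise ValueError("inter-chain PAE needs at least two chains")
    # owner[i] is the index of the chain that residue i belongs to
    owner = [idx for idx, n in enumerate(chain_lengths) for _ in range(n)]
    values = [
        pae[i][j]
        for i in range(len(owner))
        for j in range(len(owner))
        if owner[i] != owner[j]
    ]
    return sum(values) / len(values)


# Toy numbers, not real model output.
plddt = [95.2, 88.1, 64.0, 41.5]
bands = [plddt_band(s) for s in plddt]

pae = [
    [0.0, 1.0, 10.0],
    [1.0, 0.0, 12.0],
    [8.0, 6.0, 0.0],
]
interface_error = mean_interchain_pae(pae, chain_lengths=[2, 1])
```

A low inter-chain average suggests the chains are placed confidently relative to each other, while a high one means the interface should be treated with caution even if each chain looks well folded on its own.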
The full set of scores and any extra files are available to download from the result. The exact names of each score in the downloaded data come from OpenFold2 itself.
A single protein sequence with the alignment server enabled is enough for a first run. Add more chains for a multimer. OpenFold2 runs on a GPU, and runtime grows with the number and length of the chains. The alignment step adds time at the start, so very long or unusual sequences may take longer there. Running from a single sequence with no alignment is faster but usually less accurate.
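Because runtime grows with the number and length of the chains, it can help to sanity check a chain list before submitting. The sketch below is a hypothetical local check and not something the service requires: it flags duplicate chain ids and letters outside the 20 standard amino acid codes, and reports the total residue count. Whether the service accepts other letters (such as X) is not stated on this page, so treat the warnings as hints only.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def check_chains(chains):
    """Return (problems, total_residues) for a list of chain dicts."""
    problems = []
    seen_ids = set()
    total = 0
    for chain in chains:
        chain_id = chain["id"]
        sequence = chain["sequence"].upper()
        if chain_id in seen_ids:
            problems.append(f"duplicate chain id {chain_id!r}")
        seen_ids.add(chain_id)
        unknown = sorted(set(sequence) - STANDARD_RESIDUES)
        if unknown:
            problems.append(
                f"chain {chain_id!r} has non-standard letters: {''.join(unknown)}"
            )
        total += len(sequence)
    return problems, total


problems, total_residues = check_chains([
    {"id": "A", "sequence": "MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAK"},
])
```

An empty `problems` list means the chains passed these basic checks; `total_residues` gives a rough feel for how large the job is.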