MPNN Sequence Design (Inverse Folding)
This tool does inverse folding. Normal structure prediction goes from a sequence to a 3D shape; inverse folding goes the other way, from a 3D backbone shape to sequences that would fold into it. You upload a structure, and the tool returns several candidate amino acid sequences predicted to adopt that same backbone.
You choose which model fits your goal. ProteinMPNN is the general-purpose choice for protein-only inputs. LigandMPNN is for structures that include ligands, nucleic acids, or metal ions that the design should account for. SolubleMPNN is tuned to favor sequences that are more likely to be soluble.
The minimal input is the structure plus the mode, which is always design for this tool. By default, the tool returns 8 sequences per structure.
When to use it
Use it when you have a backbone you want to keep but a sequence you want to change, for example to stabilize a fold, redesign a surface, or generate diverse sequences that all adopt the same shape. You can hold specific positions fixed so the tool only redesigns the rest, which is useful when you want to preserve an active site or a known motif while varying everything around it.
Inputs
| Input | Required | What it is |
|---|---|---|
| mode | yes | Always design for this tool. It is set for you. |
| pdb_file | one of | The backbone structure to design onto, uploaded as a PDB file. Provide this or pdb. |
| pdb | one of | The same structure supplied inline as a JSON object instead of a file. Provide this or pdb_file. |
| model_type | no | Which model to use: protein_mpnn (the default, general purpose), ligand_mpnn (respects ligands, nucleic acids, and metals), or soluble_mpnn (improves solubility). |
| chains_to_design | no | A list of which chains to redesign, for example ["A"]. Leave it out to design all chains. |
| fixed_positions | no | Positions to keep exactly as they are while the rest is redesigned, given per chain. |
| redesign_positions | no | The opposite of fixed_positions: name only the positions you want changed and the rest is kept. |
| num_seq_per_target | no | How many candidate sequences to generate. Defaults to 8. |
| sampling_temp | no | How adventurous the design is, from near 0 (conservative, close to the most likely sequence) upward (more diverse). Defaults to 0.1. |
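The relationship between fixed_positions and redesign_positions can be illustrated with a small sketch. The per-chain dictionary of residue numbers used here is an assumption made for illustration, since the exact format the tool expects is not spelled out in the table, and complement_positions is a hypothetical helper rather than part of the SDK.

```python
def complement_positions(chain_residues, selected):
    """Return, per chain, the residue numbers that are NOT in `selected`.

    chain_residues: dict mapping chain ID -> list of all residue numbers
    selected: dict mapping chain ID -> residue numbers picked by the user
    Both layouts are assumptions made for this illustration.
    """
    result = {}
    for chain, residues in chain_residues.items():
        chosen = set(selected.get(chain, []))
        result[chain] = [r for r in residues if r not in chosen]
    return result


# Toy example: a 10-residue chain A with an assumed active site at 3, 4 and 7.
chain_residues = {"A": list(range(1, 11))}
fixed = {"A": [3, 4, 7]}

# Fixing 3, 4 and 7 describes the same design as redesigning everything else.
redesign = complement_positions(chain_residues, fixed)
print(redesign)  # {'A': [1, 2, 5, 6, 8, 9, 10]}
```

Whichever field is shorter to write for your case is usually the more convenient one to supply.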
A few further optional fields tune the sampling, such as bias_aa and omit_aa to encourage or forbid certain amino acids, batch_size, seed, and checkpoint. Leave them at their defaults for a first run.
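As a rough sketch of how the rules in the table fit together, the helper below assembles an input dictionary and checks it before submission. build_mpnn_inputs is a hypothetical function written for this page, not part of the SDK, and the checks on num_seq_per_target and sampling_temp are assumptions about sensible ranges rather than documented limits.

```python
ALLOWED_MODELS = {"protein_mpnn", "ligand_mpnn", "soluble_mpnn"}


def build_mpnn_inputs(pdb_file=None, pdb=None, model_type="protein_mpnn",
                      num_seq_per_target=8, sampling_temp=0.1,
                      chains_to_design=None):
    """Assemble and sanity-check an input dictionary (hypothetical helper)."""
    # Exactly one of pdb_file / pdb must be supplied, never both or neither.
    if (pdb_file is None) == (pdb is None):
        raise ValueError("Provide exactly one of pdb_file or pdb.")
    if model_type not in ALLOWED_MODELS:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    # Assumed ranges: at least one sequence, and a temperature above zero.
    if num_seq_per_target < 1:
        raise ValueError("num_seq_per_target must be at least 1.")
    if sampling_temp <= 0:
        raise ValueError("sampling_temp must be greater than 0.")

    inputs = {
        "mode": "design",  # always "design" for this tool
        "model_type": model_type,
        "num_seq_per_target": num_seq_per_target,
        "sampling_temp": sampling_temp,
    }
    if pdb_file is not None:
        inputs["pdb_file"] = pdb_file
    else:
        inputs["pdb"] = pdb
    # Leaving chains_to_design out means all chains are designed.
    if chains_to_design is not None:
        inputs["chains_to_design"] = list(chains_to_design)
    return inputs


inputs = build_mpnn_inputs(pdb_file="backbone.pdb", chains_to_design=["A"])
print(inputs)
```

The resulting dictionary has the same shape as the input_data used when submitting a job.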
How to run it
Submit your structure from Azulene Studio, the Python SDK, or the CLI. Local file paths are uploaded for you automatically. New here? The Get started page walks through installing, logging in, and running a ready made example first.
In Azulene Studio
Open MPNN Sequence Design from the tools list. On the Inputs and Parameters step, upload your backbone PDB, pick the model that fits your case (ProteinMPNN is the default), and optionally set how many sequences you want. Then continue to Review and Submit.
From the Python SDK

    from opal import jobs

    result = jobs.submit(
        job_type="mpnn_design",
        input_data={
            "mode": "design",
            "pdb_file": "backbone.pdb",
            "model_type": "protein_mpnn",
            "num_seq_per_target": 8,
        },
    )

From the CLI
Pass the inputs as a JSON string. File paths are uploaded automatically.

    opal jobs submit --job-type mpnn_design \
      --input-data '{"mode": "design", "pdb_file": "backbone.pdb", "model_type": "protein_mpnn", "num_seq_per_target": 8}'

Reading the result
You get back the candidate sequences, one set per structure you submitted. Each candidate comes with two numbers that help you choose between them.
- A confidence score that says how sure the model is about the sequence it produced. Higher means more confident.
- A native recovery value: the fraction of positions where the designed sequence matches the original sequence of the structure. It tells you how far each design strays from the starting sequence: high recovery stays close, low recovery is more novel.
Use these to rank candidates and pick a handful to take forward, for example into a structure prediction to check the design folds as intended.
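The two numbers described above can be made concrete with a short sketch. It computes native recovery as the fraction of matching positions over the whole sequence and ranks candidates by confidence. The candidate dictionaries, their key names, the toy sequences, and the scores are all invented for illustration; the actual result format is not described here, and the tool may compute recovery over designed positions only.

```python
def native_recovery(designed, native):
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native):
        raise ValueError("Sequences must have the same length.")
    if not native:
        raise ValueError("Sequences must not be empty.")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Toy data: an 8-residue native sequence and three made-up candidates.
native = "MKTAYIAK"
candidates = [
    {"sequence": "MKTAYIAK", "confidence": 0.62},
    {"sequence": "MRSAYLAK", "confidence": 0.81},
    {"sequence": "MKSAYIAR", "confidence": 0.74},
]

for c in candidates:
    c["native_recovery"] = native_recovery(c["sequence"], native)

# Rank by confidence, highest first, then inspect recovery alongside it.
ranked = sorted(candidates, key=lambda c: c["confidence"], reverse=True)
for c in ranked:
    print(c["sequence"], c["confidence"], c["native_recovery"])
# MRSAYLAK 0.81 0.625
# MKSAYIAR 0.74 0.75
# MKTAYIAK 0.62 1.0
```

Sorting on confidence alone is only one option. If you want novel designs, you might instead filter for low recovery first and then rank what remains by confidence.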
- Supply either pdb_file or pdb, not both.
- Raising sampling_temp and num_seq_per_target gives more candidates, and more diverse ones, at the cost of longer runs.
- Use fixed_positions when you need to preserve an active site or motif while redesigning everything around it.
- The model runs on a GPU.
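To see why a higher sampling_temp yields more diverse candidates, the sketch below applies a temperature-scaled softmax to a few made-up scores. It assumes the tool samples each residue from such a distribution, which is common for models of this kind but is not confirmed on this page; the function name and the scores are illustrative only.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0.")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three amino acid options at one position.
logits = [2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.1)  # the documented default
warm = softmax_with_temperature(logits, 1.0)  # a much higher setting

# At low temperature nearly all probability sits on the top option;
# at higher temperature the alternatives get a real chance of being sampled.
print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
```

Under this assumption, a value near 0 almost always picks the most likely residue at each position, which is why the default of 0.1 produces conservative designs.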