MPNN Sequence Design (Inverse Folding)

Start from a backbone shape and get amino-acid sequences predicted to fold into it.

This tool does inverse folding. Normal structure prediction goes from a sequence to a 3D shape; inverse folding goes the other way, from a 3D backbone shape to sequences that would fold into it. You upload a structure, and the tool returns several candidate amino acid sequences predicted to adopt that same backbone.

You choose which model fits your goal. ProteinMPNN is the general-purpose choice for protein-only inputs. LigandMPNN is for structures that include ligands, nucleic acids, or metal ions that the design should account for. SolubleMPNN is tuned to favor more soluble designs.

The minimal input is the structure plus mode, which is always design for this tool. By default the tool returns 8 sequences per structure.

Use it when you have a backbone you want to keep but a sequence you want to change, for example to stabilize a fold, redesign a surface, or generate diverse sequences that all adopt the same shape. You can hold specific positions fixed so the tool only redesigns the rest, which is useful when you want to preserve an active site or a known motif while varying everything around it.

| Input | Required | What it is |
| --- | --- | --- |
| mode | yes | Always design for this tool. It is set for you. |
| pdb_file | one of | The backbone structure to design onto, uploaded as a PDB file. Provide this or pdb. |
| pdb | one of | The same structure supplied inline as a JSON object instead of a file. Provide this or pdb_file. |
| model_type | no | Which model to use: protein_mpnn (the default, general purpose), ligand_mpnn (respects ligands, nucleic acids, and metals), or soluble_mpnn (improves solubility). |
| chains_to_design | no | A list of which chains to redesign, for example ["A"]. Leave it out to design all chains. |
| fixed_positions | no | Positions to keep exactly as they are while the rest is redesigned, given per chain. |
| redesign_positions | no | The opposite of fixed_positions: name only the positions you want changed and the rest is kept. |
| num_seq_per_target | no | How many candidate sequences to generate. Defaults to 8. |
| sampling_temp | no | How adventurous the design is, from near 0 (conservative, close to the most likely sequence) upward (more diverse). Defaults to 0.1. |

A few further optional fields tune the sampling, such as bias_aa and omit_aa to encourage or forbid certain amino acids, batch_size, seed, and checkpoint. Leave them at their defaults for a first run.
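As a sketch of how these fields fit together, the helper below assembles an input_data dictionary and enforces the one rule the table states outright: exactly one of pdb_file or pdb. The name build_mpnn_inputs is hypothetical and not part of the SDK. Optional fields are passed through untouched, since the exact shape of values such as fixed_positions is not spelled out here.

```python
def build_mpnn_inputs(pdb_file=None, pdb=None, **options):
    """Assemble an input_data dict for an mpnn_design job (hypothetical helper)."""
    # The inputs table requires exactly one structure source: pdb_file or pdb.
    if (pdb_file is None) == (pdb is None):
        raise ValueError("Provide exactly one of pdb_file or pdb")

    inputs = {"mode": "design"}  # mode is always "design" for this tool
    if pdb_file is not None:
        inputs["pdb_file"] = pdb_file
    else:
        inputs["pdb"] = pdb

    # Skip options left as None so the service falls back to its defaults.
    inputs.update({k: v for k, v in options.items() if v is not None})
    return inputs


inputs = build_mpnn_inputs(
    pdb_file="backbone.pdb",
    model_type="ligand_mpnn",
    chains_to_design=["A"],
    num_seq_per_target=16,
)
```

The resulting dictionary can be passed as input_data when submitting the job.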

Submit your structure from Azulene Studio, the Python SDK, or the CLI. Local file paths are uploaded for you automatically. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

In Azulene Studio, open MPNN Sequence Design from the tools list. On the Inputs and Parameters step, upload your backbone PDB, pick the model that fits your case (ProteinMPNN is the default), and optionally set how many sequences you want. Then choose Review and Submit.

# Submit an MPNN design job through the Python SDK.
from opal import jobs

result = jobs.submit(
    job_type="mpnn_design",
    input_data={
        "mode": "design",
        "pdb_file": "backbone.pdb",  # local path, uploaded automatically
        "model_type": "protein_mpnn",
        "num_seq_per_target": 8,
    },
)

With the CLI, pass the inputs as a JSON string. File paths are uploaded automatically.

opal jobs submit --job-type mpnn_design \
  --input-data '{"mode": "design", "pdb_file": "backbone.pdb", "model_type": "protein_mpnn", "num_seq_per_target": 8}'
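Hand-writing the JSON string gets error-prone once the inputs grow. One way around that, sketched here, is to build the dictionary in Python and let json.dumps and shlex.quote produce the shell-safe argument. Only the opal jobs submit command and its two flags come from the example above; the rest is standard library.

```python
import json
import shlex

input_data = {
    "mode": "design",
    "pdb_file": "backbone.pdb",
    "model_type": "protein_mpnn",
    "num_seq_per_target": 8,
}

# json.dumps serializes the inputs; shlex.quote wraps the result so the
# shell passes it to the CLI as a single argument.
command = (
    "opal jobs submit --job-type mpnn_design "
    f"--input-data {shlex.quote(json.dumps(input_data))}"
)
print(command)
```

The printed line can be pasted into a terminal as is.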

You get back the candidate sequences, one set per structure you submitted. Each candidate comes with two numbers that help you choose between them.

  • A confidence score that says how sure the model is about the sequence it produced. Higher means more confident.
  • A native recovery value: the fraction of positions where the designed sequence matches the original sequence the structure came from. It tells you how far each design strays from the starting sequence: high recovery stays close, low recovery is more novel.
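To make the recovery definition concrete, here is a minimal sketch that computes it for two aligned sequences of equal length. It assumes every position counts; the tool itself may restrict the calculation, for example to redesigned positions only. The two sequences are made up for illustration.

```python
def native_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if not native or len(designed) != len(native):
        raise ValueError("Sequences must be non-empty and the same length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


native = "MKTAYIAK"    # illustrative starting sequence
designed = "MKSAYLAK"  # illustrative design, differs at positions 3 and 6
print(native_recovery(designed, native))  # 0.75
```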

Use these to rank candidates and pick a handful to take forward, for example into a structure prediction to check the design folds as intended.
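A ranking pass along these lines might look like the sketch below. The record layout (sequence, confidence, recovery keys) is an assumption made for illustration, not the documented result format, and the values are placeholders rather than real output. The function name shortlist is also hypothetical.

```python
def shortlist(candidates, top_n=3, max_recovery=1.0):
    """Keep candidates at or below max_recovery, then take the most confident."""
    eligible = [c for c in candidates if c["recovery"] <= max_recovery]
    return sorted(eligible, key=lambda c: c["confidence"], reverse=True)[:top_n]


# Placeholder records; adapt the keys to whatever the job actually returns.
candidates = [
    {"sequence": "SEQ_A", "confidence": 0.71, "recovery": 0.62},
    {"sequence": "SEQ_B", "confidence": 0.84, "recovery": 0.48},
    {"sequence": "SEQ_C", "confidence": 0.79, "recovery": 0.35},
    {"sequence": "SEQ_D", "confidence": 0.66, "recovery": 0.30},
]

# Favor novelty: ignore designs that recover more than half the native sequence.
picks = shortlist(candidates, top_n=2, max_recovery=0.5)
print([c["sequence"] for c in picks])  # ['SEQ_B', 'SEQ_C']
```

Lowering max_recovery pushes the shortlist toward more novel designs; leaving it at 1.0 ranks on confidence alone.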

Supply either pdb_file or pdb, not both. Raising sampling_temp gives more diverse candidates and raising num_seq_per_target gives more of them, at the cost of longer runs. Use fixed_positions when you need to preserve an active site or motif while redesigning everything around it. The model runs on a GPU.
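For intuition on why a higher sampling_temp yields more diverse sequences, the sketch below applies a temperature-scaled softmax to a set of scores. This assumes the tool samples residues the way autoregressive samplers commonly do, which this page does not confirm. The three logits are made up and stand in for the model's scores for three candidate amino acids at one position.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.0, 0.5]  # illustrative scores, not model output
sharp = softmax_with_temperature(logits, 0.1)  # the default sampling_temp
flat = softmax_with_temperature(logits, 1.0)

print(round(sharp[0], 4), round(flat[0], 4))
```

At 0.1 nearly all probability sits on the top-scoring residue, so repeated samples look alike. At 1.0 the alternatives keep a real share, so samples vary more.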