Protein Mutation ΔΔG Fold

Predict how a single amino acid mutation changes a protein's folding stability.

Protein Mutation ΔΔG Fold predicts how much a single amino acid change stabilises or destabilises a protein, reported as the change in folding free energy, ΔΔG_fold, in kcal/mol.

The sign convention is that a negative ΔΔG_fold means the mutation makes the fold more stable, and a positive value means it makes the fold less stable.

It works through a two-step cycle. The same mutation is simulated twice: once in the fully folded protein, and once in a short, stretched-out reference peptide that stands in for the unfolded state. That reference peptide is built from the local sequence around the mutation site, by default the three residues on each side. ΔΔG_fold is the free energy change in the folded protein minus the change in the reference peptide. Both legs use the same simulation engine and force field so that the comparison is fair.
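As a rough illustration of how the reference peptide window could be picked from the local sequence, here is a minimal sketch. The function name reference_window, the one-letter sequence string, the 1-based site index, and the clipping at the chain ends are all assumptions made for this example. The tool's own handling of termini and capping groups is not described here and may differ.

```python
def reference_window(sequence: str, site: int, flank: int = 3) -> str:
    """Return the residues around a 1-based site, with `flank` on each side.

    Hypothetical helper: it only slices a one-letter sequence string and
    does not add capping groups or run any simulation.
    """
    if flank < 0:
        raise ValueError("flank must be non-negative")
    if not 1 <= site <= len(sequence):
        raise ValueError("site is outside the sequence")
    start = max(0, site - 1 - flank)        # clip at the start of the chain
    end = min(len(sequence), site + flank)  # clip at the end of the chain
    return sequence[start:end]


# Made-up sequence, used only to show the window sizes.
seq = "MKTAYIAKQRQISFVKSHFSRQ"
default_window = reference_window(seq, site=6)          # 3 + 1 + 3 residues
single_residue = reference_window(seq, site=6, flank=0)  # mutation site only
print(default_window, len(default_window))
print(single_residue, len(single_residue))
```

With a flank of 3 the window holds seven residues when the site is far enough from both ends, and a flank of 0 leaves only the mutated residue.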

Alongside the natural twenty amino acids you can also mutate to any of fourteen unnatural amino acids: Aib, Sar, dA, dF, dW, dY, hPhe, Hyp, mePhe, meS, meT, Nle, Orn, and Phe_4F.

Use it when you want to know whether a point mutation will stabilise or destabilise a protein, for example when engineering a more stable variant of an enzyme or antibody, or when you are exploring the effect of an unnatural amino acid at a specific site. Your input protein must be a single chain in version one of this tool. Multi-chain structures where the mutation is ambiguous are rejected.

| Input | Required | What it is |
| --- | --- | --- |
| protein_pdb | yes | The input protein structure as a PDB file. Single chain only in this version. |
| mutations | no | A comma-separated list of mutations, each written as CHAIN:RESID:TARGET, for example A:78:V. An unnatural amino acid target goes in square brackets, for example A:78:[Aib]. Provide either this or mutant_chain_helm, not both. |
| mutant_chain_helm | no | The full mutated chain written in HELM2 notation. The tool finds the mutations by lining this sequence up against the input PDB chain position by position. Provide either this or mutations, not both. |
| mutant_chain_id | no, default A | Which chain in the PDB the mutant_chain_helm sequence matches. Ignored when you use mutations directly. |
| unfolded_flank_size | no, default 3 | How many residues to keep on each side of the mutation site when building the unfolded reference peptide. The default of 3 gives a 7-residue window. 0 collapses to a single capped residue. |
| unfolded_engine | no, default feflow | The engine for the unfolded leg. feflow is recommended and shares its method with the folded leg. rfe_legacy is the older path, kept as a fallback. |
| unfolded_relax_ns | no, default 0.005 ns | How long to relax each reference peptide before the free energy ramp, in nanoseconds. This lets the peptide settle into a random coil shape. |
| unfolded_relax_implicit_solvent | no, default true | If true, relaxes the reference peptide in a fast simplified water model. If false, uses explicit water, which is slower but matches the published quantitative protocol. |
| unfolded_relax_restrained_nvt_ns | no, default 0.0 ns | Length of an extra relaxation phase that holds the backbone fixed while the rest settles, in nanoseconds. Skipped by default. |
| unfolded_relax_unrestrained_nvt_ns | no, default 0.0 ns | Length of a further relaxation phase with the backbone free, run after the restrained phase, in nanoseconds. Skipped by default. |
| mode_preset | no, default smoke | A single knob that sets several fields at once. smoke keeps your own values for a quick test. aldeghi_2019_quantitative switches everything to the longer published protocol for accurate numbers. |
| random_seed | no, default 42 | A fixed seed so a rerun gives the same result. |
| equil_length_ns | no, default 5.0 ns | Equilibration length per endpoint, in nanoseconds. The small value 0.005 is used for quick plumbing tests. |
| n_neq_switches_per_direction | no, default 50 | How many switching runs to do in each direction. More switches give a better estimate. |
| protocol_repeats | no, default 3, minimum 1 | Number of independent repeats used for the uncertainty estimate. More repeats give a smaller uncertainty. |
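To catch typos before submitting, a mutations string can be checked locally. The sketch below is one way to do that and is not part of the tool: parse_mutations is a hypothetical helper, and it assumes single-character chain IDs, positive integer residue numbers, and one-letter codes for natural targets (as in the A:78:V example). The unnatural names are the fourteen listed on this page.

```python
import re

# One-letter codes for the twenty natural amino acids (assumed target format).
NATURAL = set("ACDEFGHIKLMNPQRSTVWY")
# The fourteen unnatural amino acids named in this page.
UNNATURAL = {"Aib", "Sar", "dA", "dF", "dW", "dY", "hPhe", "Hyp",
             "mePhe", "meS", "meT", "Nle", "Orn", "Phe_4F"}

# CHAIN:RESID:TARGET, where TARGET is either X or [Name].
_PATTERN = re.compile(
    r"^([A-Za-z0-9]):(\d+):(?:([A-Z])|\[([A-Za-z0-9_]+)\])$"
)


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Split a comma-separated mutations string into (chain, resid, target)."""
    parsed = []
    for item in spec.split(","):
        match = _PATTERN.match(item.strip())
        if match is None:
            raise ValueError(f"not in CHAIN:RESID:TARGET form: {item!r}")
        chain, resid, natural, unnatural = match.groups()
        target = natural or unnatural
        allowed = NATURAL if natural else UNNATURAL
        if target not in allowed:
            raise ValueError(f"unknown target amino acid: {target!r}")
        parsed.append((chain, int(resid), target))
    return parsed


print(parse_mutations("A:78:V, A:78:[Aib]"))
```

The check is purely syntactic. It does not confirm that the residue exists in your PDB or that the chain ID matches.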

Longer simulation lengths and more repeats and switches give more reliable numbers, but cost more runtime and credits.

Submit your protein and mutation from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

Open Protein Mutation ΔΔG Fold from the tools list. On the Inputs and Parameters step, upload your protein PDB and enter the mutation you want, for example A:6:F. Adjust the simulation settings or pick a mode_preset if you want, then Review and Submit.

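from opal import jobs

result = jobs.submit(
    job_type="protein_mutation_ddg_fold",
    input_data={
        "protein_pdb": "/path/to/your/protein.pdb",
        # One mutation in CHAIN:RESID:TARGET form.
        "mutations": "A:6:F",
        # Independent repeats used for the uncertainty estimate.
        "protocol_repeats": 3,
    },
)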

Pass the inputs as a JSON string. The protein PDB path is uploaded for you.

opal jobs submit --job-type protein_mutation_ddg_fold \
--input-data '{"protein_pdb": "/path/to/your/protein.pdb", "mutations": "A:6:F", "protocol_repeats": 3}'
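Hand-writing the JSON string is easy to get wrong once you add more fields. The sketch below uses only the Python standard library to build and shell-quote the same payload as the command above. It only prints the command and does not call the CLI. The variable names are illustrative.

```python
import json
import shlex

# Same inputs as the CLI example above.
input_data = {
    "protein_pdb": "/path/to/your/protein.pdb",
    "mutations": "A:6:F",
    "protocol_repeats": 3,
}

# Serialise to JSON, then quote it so the shell passes it as one argument.
payload = json.dumps(input_data)
command = " ".join([
    "opal jobs submit",
    "--job-type protein_mutation_ddg_fold",
    "--input-data " + shlex.quote(payload),
])
print(command)
```

Because the quoting is done by shlex.quote, values that contain spaces or quote characters still arrive as a single argument on POSIX shells.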

The main output is ddG_fold, the change in folding free energy on mutation, in kcal/mol, with its error estimate in sigma_total. A negative ddG_fold means the mutation makes the protein more stable, and a positive value means it makes the protein less stable.

The result also reports the two halves of the cycle separately. dG_folded, with its error sigma_folded and unit dG_folded_unit, is the free energy change for the mutation inside the folded protein. dG_unfolded, with its error sigma_unfolded and unit dG_unfolded_unit, is the same change measured on the short reference peptide that stands in for the unfolded state. The ddG_fold is the folded value minus the unfolded value.
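The arithmetic behind these fields can be sketched in a few lines. The numbers below are made up for illustration, and holding the result in a plain dict is an assumption about its shape. Combining the two errors in quadrature assumes the legs are independent. That is a common convention, not a statement of how the tool computes sigma_total.

```python
import math

# Illustrative values only, in kcal/mol. Not output from a real run.
result = {
    "dG_folded": 1.8, "sigma_folded": 0.3,
    "dG_unfolded": 0.6, "sigma_unfolded": 0.4,
}

# ddG_fold is the folded value minus the unfolded value.
ddg_fold = result["dG_folded"] - result["dG_unfolded"]

# ASSUMPTION: independent errors, so they add in quadrature.
sigma_estimate = math.hypot(result["sigma_folded"], result["sigma_unfolded"])

# Sign convention: negative is more stable, positive is less stable.
if ddg_fold < 0:
    verdict = "more stable"
elif ddg_fold > 0:
    verdict = "less stable"
else:
    verdict = "unchanged"

print(f"ddG_fold = {ddg_fold:.2f} +/- {sigma_estimate:.2f} kcal/mol ({verdict})")
```

When the magnitude of ddG_fold is smaller than its error estimate, the sign on its own says little about whether the mutation stabilises or destabilises the fold.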

Keep the default smoke preset and short simulation lengths for a quick first run. For accurate numbers you can compare against published data, pick the aldeghi_2019_quantitative preset, which sets longer simulations, more switches, and an explicit water model for the reference peptide. This tool runs on a GPU, and runtime grows with the simulation lengths, the number of switches, and the number of repeats. In this version the input protein must be a single chain.