Protein Mutation ΔΔG Fold

Protein Mutation ΔΔG Fold predicts how much a single amino acid change stabilises or destabilises a protein, reported as a change in folding free energy, ΔΔG_fold, in kcal/mol.

The sign convention is that a negative ΔΔG_fold means the mutation makes the fold more stable, and a positive value means it makes the fold less stable.

It works through a two-step cycle. The same mutation is simulated twice: once in the fully folded protein, and once in a short, stretched-out reference peptide that stands in for the unfolded state. That reference peptide is built from the local sequence around the mutation site, by default the three residues on each side. ΔΔG_fold is the free energy change in the folded protein minus the free energy change in the reference peptide. Both legs use the same simulation engine and force field so that the comparison is fair.
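
The window described above can be sketched in a few lines of Python. This is only an illustration of the flank arithmetic, not the tool's own code: `reference_window` is a hypothetical helper, the example sequence is made up, and clipping the window at the chain termini is an assumption about how sites near the ends are handled.

```python
def reference_window(sequence: str, site: int, flank: int = 3) -> str:
    """Return the local sequence window around a 0-based mutation site.

    Hypothetical helper. The window is clipped at the chain termini, which is
    an assumption and may differ from what the tool does near the ends.
    """
    if not 0 <= site < len(sequence):
        raise IndexError("site is outside the sequence")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    start = max(0, site - flank)
    end = min(len(sequence), site + flank + 1)
    return sequence[start:end]


seq = "MKTAYIAKQRQISFVK"  # made-up sequence for illustration
window = reference_window(seq, site=5)           # three residues on each side
single = reference_window(seq, site=5, flank=0)  # only the mutated residue
print(window, len(window), single)
```

With a flank of 3 the window spans 2 × 3 + 1 = 7 residues, and a flank of 0 leaves only the mutated residue itself.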

In addition to the twenty natural amino acids, you can mutate to any of fourteen unnatural amino acids: Aib, Sar, dA, dF, dW, dY, hPhe, Hyp, mePhe, meS, meT, Nle, Orn, and Phe_4F.

When to use it

Use it when you want to know whether a point mutation will stabilise or destabilise a protein, for example when engineering a more stable variant of an enzyme or antibody, or when you are exploring the effect of an unnatural amino acid at a specific site. In version one of this tool, your input protein must be a single chain. Multi-chain structures where the mutation is ambiguous are rejected.
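
Because of the single-chain requirement, it can help to check a structure locally before submitting. The sketch below counts the distinct chain IDs in the `ATOM` records of a PDB file, relying on the fixed-column PDB format, where the chain ID sits in column 22. The function names and the `atom_line` demo helper are hypothetical, and the tool's own validation may apply different rules.

```python
def protein_chain_ids(pdb_text: str) -> list[str]:
    """Return the distinct chain IDs found in ATOM records, in file order."""
    seen: list[str] = []
    for line in pdb_text.splitlines():
        # In the fixed-column PDB format the chain ID is column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen


def is_single_chain(pdb_text: str) -> bool:
    """True when exactly one chain ID appears in the ATOM records."""
    return len(protein_chain_ids(pdb_text)) == 1


def atom_line(serial: int, name: str, res: str, chain: str, resseq: int) -> str:
    """Hypothetical demo helper: format the leading columns of an ATOM record."""
    return f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}"


mono = "\n".join([atom_line(1, "N", "MET", "A", 1), atom_line(2, "CA", "MET", "A", 1)])
dimer = mono + "\n" + atom_line(3, "N", "GLY", "B", 1)
print(is_single_chain(mono), is_single_chain(dimer))
```

For a real file, pass the contents of the PDB (for example `Path("protein.pdb").read_text()`) instead of the made-up records.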

Inputs

Input	Required	What it is
`protein_pdb`	yes	The input protein structure as a PDB file. Single chain only in this version.
`mutations`	no	A comma-separated list of mutations, each written as `CHAIN:RESID:TARGET`, for example `A:78:V`. An unnatural amino acid target goes in square brackets, for example `A:78:[Aib]`. Provide either this or `mutant_chain_helm`, not both.
`mutant_chain_helm`	no	The full mutated chain written in HELM2 notation. The tool finds the mutations by lining this sequence up against the input PDB chain position by position. Provide either this or `mutations`, not both.
`mutant_chain_id`	no, default `A`	Which chain in the PDB the `mutant_chain_helm` sequence matches. Ignored when you use `mutations` directly.
`unfolded_flank_size`	no, default `3`	How many residues to keep on each side of the mutation site when building the unfolded reference peptide. The default of `3` gives a 7-residue window. `0` collapses to a single capped residue.
`unfolded_engine`	no, default `feflow`	The engine for the unfolded leg. `feflow` is recommended and shares its method with the folded leg. `rfe_legacy` is the older path, kept as a fallback.
`unfolded_relax_ns`	no, default `0.005` ns	How long to relax each reference peptide before the free energy ramp, in nanoseconds. This lets the peptide settle into a random coil shape.
`unfolded_relax_implicit_solvent`	no, default `true`	If `true`, relaxes the reference peptide in a fast simplified water model. If `false`, uses explicit water, which is slower but matches the published quantitative protocol.
`unfolded_relax_restrained_nvt_ns`	no, default `0.0` ns	Length of an extra relaxation phase that holds the backbone fixed while the rest settles, in nanoseconds. Skipped by default.
`unfolded_relax_unrestrained_nvt_ns`	no, default `0.0` ns	Length of a further relaxation phase with the backbone free, run after the restrained phase, in nanoseconds. Skipped by default.
`mode_preset`	no, default `smoke`	A single knob that sets several fields at once. `smoke` keeps your own values for a quick test. `aldeghi_2019_quantitative` switches everything to the longer published protocol for accurate numbers.
`random_seed`	no, default `42`	A fixed seed so that a rerun gives the same result.
`equil_length_ns`	no, default `5.0` ns	Equilibration length per endpoint, in nanoseconds. The small value `0.005` is used for quick plumbing tests.
`n_neq_switches_per_direction`	no, default `50`	How many switching runs to do in each direction. More switches give a better estimate.
`protocol_repeats`	no, default `3`, minimum `1`	Number of independent repeats used for the uncertainty estimate. More repeats give a smaller uncertainty.

Longer simulation lengths and more repeats and switches give more reliable numbers, but cost more runtime and credits.
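
To catch typos in a `mutations` string before submitting, the format from the table above can be checked locally. The parser below is a minimal sketch, not the tool's own validation: `parse_mutations` is a hypothetical helper, it assumes natural targets are written as one-letter codes (as in `A:78:V`), and it is strict about case, which the tool itself may not be.

```python
import re

# One-letter codes for the twenty natural amino acids (assumed target form).
NATURAL = set("ACDEFGHIKLMNPQRSTVWY")
# The fourteen unnatural amino acids listed on this page.
UNNATURAL = {"Aib", "Sar", "dA", "dF", "dW", "dY", "hPhe", "Hyp",
             "mePhe", "meS", "meT", "Nle", "Orn", "Phe_4F"}

# CHAIN:RESID:TARGET, where TARGET is one letter or a bracketed name.
_PATTERN = re.compile(
    r"^(?P<chain>[^:\s]+):(?P<resid>-?\d+):(?P<target>\[[^\[\]]+\]|[A-Za-z])$"
)


def parse_mutations(spec: str) -> list[tuple[str, int, str]]:
    """Split a comma-separated mutation string into (chain, resid, target)."""
    parsed = []
    for item in spec.split(","):
        item = item.strip()
        match = _PATTERN.match(item)
        if match is None:
            raise ValueError(f"not in CHAIN:RESID:TARGET form: {item!r}")
        target = match["target"]
        if target.startswith("["):
            name = target[1:-1]  # drop the square brackets
            if name not in UNNATURAL:
                raise ValueError(f"unknown unnatural amino acid: {name!r}")
        else:
            name = target
            if name not in NATURAL:
                raise ValueError(f"unknown amino acid code: {name!r}")
        parsed.append((match["chain"], int(match["resid"]), name))
    return parsed


print(parse_mutations("A:78:V, A:6:[Aib]"))
```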

How to run it

Submit your protein and mutation from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

In Azulene Studio

Open Protein Mutation ΔΔG Fold from the tools list. On the Inputs and Parameters step, upload your protein PDB and enter the mutation you want, for example `A:6:F`. Adjust the simulation settings or pick a `mode_preset` if you want, then Review and Submit.

From the Python SDK

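from opal import jobs

# Submit one point mutation: chain A, residue 6, mutated to phenylalanine.
result = jobs.submit(
    job_type="protein_mutation_ddg_fold",
    input_data={
        "protein_pdb": "/path/to/your/protein.pdb",  # path to the input PDB
        "mutations": "A:6:F",  # CHAIN:RESID:TARGET
        "protocol_repeats": 3,  # independent repeats for the uncertainty estimate
    },
)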

From the CLI

Pass the inputs as a JSON string. The file at the `protein_pdb` path is uploaded for you.

opal jobs submit --job-type protein_mutation_ddg_fold \
  --input-data '{"protein_pdb": "/path/to/your/protein.pdb", "mutations": "A:6:F", "protocol_repeats": 3}'
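
Hand-writing JSON inside shell quotes is easy to get wrong, so one option is to build the command string from a Python dict. The sketch below uses only the standard library and the command shape shown above. `build_submit_command` is a hypothetical helper, and it only builds the string; it does not run anything.

```python
import json
import shlex


def build_submit_command(input_data: dict) -> str:
    """Build the CLI submit command with the JSON payload safely shell-quoted."""
    payload = json.dumps(input_data)
    return " ".join([
        "opal", "jobs", "submit",
        "--job-type", "protein_mutation_ddg_fold",
        "--input-data", shlex.quote(payload),  # quoting for POSIX shells
    ])


inputs = {
    "protein_pdb": "/path/to/your/protein.pdb",
    "mutations": "A:6:[Aib]",
    "protocol_repeats": 3,
}
cmd = build_submit_command(inputs)
print(cmd)
```

`shlex.quote` targets POSIX shells, so a different quoting approach would be needed on Windows `cmd.exe`.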

Reading the result

The main output is `ddG_fold`, the change in folding free energy on mutation, in kcal/mol, with its error estimate in `sigma_total`. A negative `ddG_fold` means the mutation makes the protein more stable, and a positive value means it makes the protein less stable.

The result also reports the two halves of the cycle separately. `dG_folded`, with its error `sigma_folded` and unit `dG_folded_unit`, is the free energy change for the mutation inside the folded protein. `dG_unfolded`, with its error `sigma_unfolded` and unit `dG_unfolded_unit`, is the same change measured on the short reference peptide that stands in for the unfolded state. `ddG_fold` is the folded value minus the unfolded value.
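
The arithmetic behind these fields can be sketched as follows. The numbers are made up for illustration, `combine_legs` is a hypothetical helper, and it works on a plain dict rather than on any particular SDK result object. Combining the two leg errors in quadrature is an assumption made here for independent legs; this page does not state how `sigma_total` is actually computed.

```python
import math


def combine_legs(result: dict) -> dict:
    """Recompute ddG_fold from the two legs and label the sign."""
    # Folded value minus unfolded value, as described above.
    ddg = result["dG_folded"] - result["dG_unfolded"]
    # ASSUMPTION: independent legs, so the errors add in quadrature.
    sigma = math.hypot(result["sigma_folded"], result["sigma_unfolded"])
    if ddg < 0:
        effect = "stabilising"
    elif ddg > 0:
        effect = "destabilising"
    else:
        effect = "neutral"
    return {"ddG_fold": ddg, "sigma_quadrature": sigma, "effect": effect}


# Made-up values in kcal/mol, for illustration only.
example = {
    "dG_folded": 12.4, "sigma_folded": 0.3,
    "dG_unfolded": 11.1, "sigma_unfolded": 0.4,
}
summary = combine_legs(example)
print(round(summary["ddG_fold"], 2), round(summary["sigma_quadrature"], 2), summary["effect"])
```

A useful reading habit is to compare the size of `ddG_fold` with its error: a value smaller than its uncertainty does not clearly indicate either stabilisation or destabilisation.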

Notes

Keep the default `smoke` preset and short simulation lengths for a quick first run. For accurate numbers you can compare against published data, pick the `aldeghi_2019_quantitative` preset, which sets longer simulations, more switches, and an explicit water model for the reference peptide. This tool runs on a GPU, and runtime grows with the simulation lengths, the number of switches, and the number of repeats. In this version the input protein must be a single chain.