Protein Property Prediction

Score a protein or peptide on several developability properties straight from its sequence.

This tool predicts a panel of properties for a protein or peptide from its sequence alone, without needing a structure. These properties determine whether a candidate is practical to make and use, and they are often grouped under the term developability.

The panel covers solubility (how well it stays dissolved), aggregation (how likely it is to clump), subcellular localization (where in a cell it tends to go), intrinsic disorder (which regions have no fixed shape), toxicity, melting temperature or Tm (the temperature at which it unfolds, a measure of thermal stability), and MHC binding (how likely pieces of it are to be displayed to the immune system, for both class I and class II).

Provide exactly one input: either a plain amino acid sequence or a HELM string. By default, the tool runs the whole panel.

Use it to triage or rank candidate sequences before committing to wet lab work. When you have many designs and limited capacity to test them, this gives a fast, sequence-only read on which ones look developable and which carry red flags such as high aggregation or strong predicted immune visibility. You can run the full panel or just the properties you care about.

| Input | Required | What it is |
| --- | --- | --- |
| sequence | one of | The protein or peptide as a plain string of the 20 standard amino acids, for example MKTAYIAKQRQ.... Provide this or helm. Sequences over 1022 residues are truncated, and over 2044 are rejected. |
| helm | one of | The peptide written in HELM2 notation, for example PEPTIDE1{A.G.F.K.L}$$$$V2.0. Use this when your peptide has explicit linkers or modified ends that a plain string cannot capture. Provide this or sequence. Not available in batch mode. |
| properties | no | A list choosing which properties to predict, from solubility, aggregation, disorder, localization, toxicity, tm, and mhc. Leave it empty to run the full panel. |
| properties_options | no | Per-property settings, for example {"mhc": {"peptide_length": 15, "mhc_class": "II"}} to run MHC binding for class II at a given length. Leave empty for sensible defaults. |
| model_variant | no | Model size: 650M (the default, fast and accurate for most uses) or 3B (larger, slightly higher accuracy at higher cost). |
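The length limits in the table can be checked locally before you submit. The following is a minimal sketch, not part of the SDK: `precheck_sequence` is a hypothetical helper, and flagging letters outside the 20 standard amino acids is an assumption about what is worth catching early, since the page does not say how the service treats them.

```python
# Hypothetical local pre-check; only the two length limits come from the table above.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
TRUNCATE_OVER = 1022  # sequences longer than this are truncated
REJECT_OVER = 2044    # sequences longer than this are rejected


def precheck_sequence(sequence: str) -> dict:
    """Report how a plain sequence compares with the documented limits."""
    # Characters outside the 20 standard one-letter codes (assumed worth flagging).
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if len(sequence) > REJECT_OVER:
        status = "rejected"
    elif len(sequence) > TRUNCATE_OVER:
        status = "truncated"
    else:
        status = "ok"
    return {
        "length": len(sequence),
        "status": status,
        "unexpected_characters": unexpected,
    }


print(precheck_sequence("MKTAYIAKQRQ"))
```

A "truncated" status is a useful warning in triage: scores for such a sequence reflect only the part the model saw.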

Two further optional flags return extra detail: return_per_residue adds per-residue scores where available (currently disorder and aggregation hotspots) so you can see which regions drive a prediction, and return_embedding adds the sequence’s numeric fingerprint for downstream similarity search. Both are off by default.
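The inputs and flags described above can be assembled into one dictionary before submission. This is a sketch under stated assumptions: `build_input` is a hypothetical helper, the two flags are assumed to be boolean keys placed alongside the other inputs, and model_variant is assumed to take the strings "650M" and "3B" as written in the table.

```python
# Hypothetical payload builder; field names are taken from the inputs described above.
ALLOWED_PROPERTIES = {
    "solubility", "aggregation", "disorder", "localization", "toxicity", "tm", "mhc",
}
ALLOWED_VARIANTS = {"650M", "3B"}  # assumed to be passed as strings


def build_input(sequence=None, helm=None, properties=None, properties_options=None,
                model_variant=None, return_per_residue=False, return_embedding=False):
    """Build an input dictionary, enforcing 'exactly one of sequence or helm'."""
    if (sequence is None) == (helm is None):
        raise ValueError("Provide exactly one of sequence or helm")
    unknown = set(properties or []) - ALLOWED_PROPERTIES
    if unknown:
        raise ValueError(f"Unknown properties: {sorted(unknown)}")
    if model_variant is not None and model_variant not in ALLOWED_VARIANTS:
        raise ValueError(f"Unknown model_variant: {model_variant}")

    payload = {"sequence": sequence} if sequence is not None else {"helm": helm}
    if properties:  # leaving this out requests the full panel
        payload["properties"] = list(properties)
    if properties_options:
        payload["properties_options"] = properties_options
    if model_variant is not None:
        payload["model_variant"] = model_variant
    # Both flags are off by default, so they are only included when switched on.
    if return_per_residue:
        payload["return_per_residue"] = True
    if return_embedding:
        payload["return_embedding"] = True
    return payload


payload = build_input(
    sequence="MKTAYIAKQRQ",
    properties=["mhc", "disorder"],
    properties_options={"mhc": {"peptide_length": 15, "mhc_class": "II"}},
    return_per_residue=True,
)
print(payload)
```

Catching a missing or doubled input locally avoids submitting a job that cannot run.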

Submit your sequence from Azulene Studio, the Python SDK, or the CLI. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

Open Protein Property Prediction from the tools list. On the Inputs and Parameters step, paste your sequence and optionally pick just the properties you want (or leave the selection empty to run the full panel), then continue to Review and Submit.

from opal import jobs

result = jobs.submit(
    job_type="predict_protein_properties",
    input_data={
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKR",
        "properties": ["solubility", "aggregation", "tm"],
    },
)

Pass the inputs as a JSON string.

opal jobs submit --job-type predict_protein_properties \
  --input-data '{"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKR", "properties": ["solubility", "aggregation", "tm"]}'
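Writing the JSON string by hand gets error-prone once the inputs include nested options. One way around that, sketched here with only the Python standard library, is to serialize a dictionary with `json.dumps` and let `shlex.join` handle the shell quoting. The command and its two flags are the ones shown above; the short sequence is just the example from the inputs table.

```python
import json
import shlex

# The same kind of inputs as the CLI example, held as a Python dictionary.
input_data = {
    "sequence": "MKTAYIAKQRQ",
    "properties": ["solubility", "aggregation", "tm"],
}

# Serialize once; this is the value passed to --input-data.
input_json = json.dumps(input_data)

command = [
    "opal", "jobs", "submit",
    "--job-type", "predict_protein_properties",
    "--input-data", input_json,
]

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

To run the command directly rather than print it, the same `command` list can be passed to `subprocess.run` without any shell quoting.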

You get back one value per requested property. Solubility, aggregation, and toxicity come as scores you can use to rank candidates against each other. Localization tells you the predicted cellular compartment. Disorder flags which parts of the sequence have no fixed shape. Tm is reported as a temperature, where higher means more thermally stable. MHC binding reports how likely fragments of the sequence are to be displayed to the immune system.
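Ranking candidates on one property can be sketched as below. The result layout is an assumption: this page does not specify the output schema, so the example uses a plain dictionary of made-up placeholder values per candidate, and `rank_by` is a hypothetical helper. Only the direction for Tm (higher means more thermally stable) is stated above; for the other scores, confirm which direction is favorable before choosing `higher_is_better`.

```python
def rank_by(results, prop, higher_is_better):
    """Sort candidates by one property, skipping those without a value for it."""
    scored = [
        (name, values[prop])
        for name, values in results.items()
        if prop in values
    ]
    return sorted(scored, key=lambda item: item[1], reverse=higher_is_better)


# Placeholder numbers for illustration only; these are not tool output.
example = {
    "design_a": {"tm": 61.5, "aggregation": 0.42},
    "design_b": {"tm": 48.0, "aggregation": 0.10},
    "design_c": {"tm": 70.2},
}

# Tm: higher means more thermally stable, so sort descending.
for name, tm in rank_by(example, "tm", higher_is_better=True):
    print(name, tm)
```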

If you asked for per-residue scores, the result also shows where along the sequence the disorder and aggregation signals come from, so you can pinpoint the regions to redesign. The full numeric output is available to download from the result, and large results are delivered as a downloadable attachment.
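Turning per-residue scores into candidate regions for redesign can be sketched as follows. It assumes the scores arrive as one number per residue in sequence order, which this page does not confirm, and the 0.5 cutoff is an arbitrary illustration rather than a recommended threshold. `regions_above` is a hypothetical helper.

```python
def regions_above(scores, threshold):
    """Return 1-based inclusive (start, end) spans where scores exceed the threshold."""
    regions = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position  # a new region opens here
        elif start is not None:
            regions.append((start, position - 1))  # the region closed at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # the region runs to the end of the sequence
    return regions


# Placeholder scores for illustration only; these are not tool output.
scores = [0.1, 0.8, 0.9, 0.2, 0.7]
print(regions_above(scores, threshold=0.5))
```

Positions are reported 1-based so they line up with the usual residue numbering of a sequence.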

Provide exactly one of sequence or helm. Leaving properties empty runs the full panel, which is the simplest way to start. The 650M model is a good default; switch to 3B only when you want a little more accuracy and can accept the higher cost. The model runs on a GPU. For small-molecule properties rather than protein properties, see Predict ADMET Properties and Predict Aqueous Solubility.