Predict Aqueous Solubility

Predict how well a molecule dissolves in water, from its SMILES string.

Predict Aqueous Solubility estimates how readily a molecule dissolves in water, reported as a logS value. LogS is the base ten logarithm of the molecule’s solubility measured in moles per litre. A higher (less negative) logS means the molecule dissolves more easily, and a lower (more negative) logS means it stays mostly undissolved.
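
Because logS is a base ten logarithm, each step of one unit corresponds to a tenfold change in solubility. The short sketch below only illustrates that arithmetic; the function name is hypothetical and is not part of the tool or its SDK.

```python
def logs_to_molar(log_s: float) -> float:
    """Convert logS (log10 of solubility in mol/L) back to mol/L."""
    return 10.0 ** log_s


# Each drop of one logS unit is a tenfold drop in solubility.
for log_s in (0.0, -2.0, -4.0, -6.0):
    print(f"logS = {log_s:4.1f} -> {logs_to_molar(log_s):.0e} mol/L")
```

For example, a logS of -2 corresponds to 0.01 moles per litre, and a logS of -4 to 0.0001 moles per litre.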

The prediction comes from a machine learning model (an XGBoost model) trained on AqSolDB, a public collection of measured solubility values. You give it a SMILES string, which is the plain text way of writing a molecule’s structure, and it returns the predicted logS along with a plain language solubility class.

Solubility is one of the first things people check on a candidate molecule, because a drug that will not dissolve in water is hard to dose and hard to test. This tool gives you a fast first estimate before you commit to slower experiments or heavier simulations.

Use it when you want a quick read on whether a molecule is likely to dissolve in water, for example when triaging a list of candidates, filtering a screening library, or sanity checking a structure before running more expensive work. It needs only a SMILES string, so you can run it the moment you have drawn or typed a molecule.

Input     Required    What it is
smiles    yes         SMILES notation of the molecule, the plain text way of writing its structure.

Submit your molecule from Azulene Studio, the Python SDK, or the CLI. You can send a single SMILES or a batch of many at once, since this tool is marked batch-capable in the catalog. New here? The Get started page walks through installing, logging in, and running a ready-made example first.

In Azulene Studio, open Predict Aqueous Solubility from the tools list. On the Inputs and Parameters step, enter the SMILES of your molecule (or paste a list, or upload a CSV or SDF for a batch), then Review and Submit.

from opal import jobs

# Submit a single molecule (CCO is ethanol) for a solubility prediction.
result = jobs.submit(
    job_type="predict_solubility",
    input_data={
        "smiles": "CCO",
    },
)

To score many molecules in one go, submit a batch of SMILES instead of a single string.

From the CLI, pass the inputs as a JSON string.

opal jobs submit --job-type predict_solubility \
    --input-data '{"smiles": "CCO"}'
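
SMILES strings often contain characters such as parentheses, `#`, `/`, and `\` that need care in both JSON and the shell. The following is a minimal sketch of building the same command from Python so the escaping is handled for you. It reuses only the command and flags shown above; the helper name `build_submit_command` is hypothetical, and the sketch prints the command rather than running it.

```python
import json
import shlex


def build_submit_command(smiles: str) -> list[str]:
    """Build the argument list for the CLI call shown above."""
    # json.dumps escapes backslashes and quotes inside the SMILES.
    payload = json.dumps({"smiles": smiles})
    return [
        "opal", "jobs", "submit",
        "--job-type", "predict_solubility",
        "--input-data", payload,
    ]


# A SMILES with a backslash (cis/trans bond notation) as a stress case.
cmd = build_submit_command("F/C=C\\F")

# shlex.join produces a shell-safe string you can paste into a terminal.
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids shell quoting altogether, since each argument is handed to the program unchanged.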

You can also submit a batch of many SMILES in one job from the CLI.

For each molecule the result reports:

  • predicted_logS, the predicted logS value, the base ten logarithm of the solubility in moles per litre. A higher (less negative) number means the molecule dissolves more easily.
  • solubility_class, a plain language label derived from the logS value: highly soluble (logS at or above 0), soluble (logS between -2 and 0), moderate (logS between -4 and -2), poorly soluble (logS between -6 and -4), and insoluble (logS below -6).
  • smiles, the molecule you submitted, echoed back.
  • descriptors, the molecular descriptors the model computed from the structure and used to make the prediction.
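
The class thresholds above can be sketched as a small lookup. This is an illustration of the listed ranges, not the tool's own code. The text does not say which class a value exactly on -2 or -4 falls into, so the sketch assumes a boundary value goes to the more soluble class, which matches the stated rules for 0 and -6. The exact label strings the tool returns may also differ.

```python
def solubility_class(log_s: float) -> str:
    """Map a logS value to the plain language classes listed above.

    Assumption: values exactly on -2 or -4 go to the more soluble class.
    """
    if log_s >= 0:
        return "highly soluble"
    if log_s >= -2:
        return "soluble"
    if log_s >= -4:
        return "moderate"
    if log_s >= -6:
        return "poorly soluble"
    return "insoluble"


for value in (0.3, -1.5, -3.0, -5.2, -7.0):
    print(f"{value:5.1f} -> {solubility_class(value)}")
```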

To rank a set of molecules from most to least soluble, sort them by predicted_logS from high to low. If a SMILES cannot be read, the result for that molecule carries an error message instead of a prediction.
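
As a rough sketch of that ranking step, the snippet below sorts a batch of results and sets aside any molecule that failed to parse. It assumes each result is a dictionary using the field names listed above, and that a failed molecule carries an `error` key instead of `predicted_logS`; that key name and the overall result shape are assumptions. The numbers are made-up placeholders, not real predictions.

```python
# Placeholder results; values are illustrative only.
results = [
    {"smiles": "CCO", "predicted_logS": 0.5},
    {"smiles": "c1ccccc1", "predicted_logS": -1.8},
    {"smiles": "not-a-smiles", "error": "could not parse SMILES"},
    {"smiles": "CCCCCCCCCCCC", "predicted_logS": -6.5},
]


def rank_by_solubility(results):
    """Return (ranked, failed): ranked is most to least soluble."""
    scored = [r for r in results if "predicted_logS" in r]
    failed = [r for r in results if "predicted_logS" not in r]
    ranked = sorted(scored, key=lambda r: r["predicted_logS"], reverse=True)
    return ranked, failed


ranked, failed = rank_by_solubility(results)
for r in ranked:
    print(f'{r["smiles"]:>14}  {r["predicted_logS"]:6.2f}')
for r in failed:
    print(f'{r["smiles"]:>14}  skipped: {r["error"]}')
```

Keeping the failed entries in a separate list makes it easy to fix and resubmit those structures without losing track of them.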

This is a fast machine learning estimate, not a measured value, so treat it as a first pass screen rather than a final answer. The model was trained on AqSolDB, so molecules that look very different from anything in that dataset will have less reliable predictions. Because the tool is batch capable, you can score a large library in a single job.