Releases: chemprop/chemprop
v2.1.2
What's Changed
Important changes
- CLI implementation of RIGR as an option in `--multi-hot-atom-featurizer-mode` by @akshatzalte in #1172

A new featurization scheme, RIGR (Resonance Invariant Graph Representation), is now available. To access it via the CLI, use `--multi-hot-atom-featurizer-mode rigr`. This featurizer uses only resonance-invariant features, so it treats all resonance structures of a molecule identically. It uses a subset of the atom and bond features from the default v2 featurizer. With 60% fewer features, RIGR has shown comparable or superior performance across a variety of property prediction tasks in a forthcoming manuscript. An example Jupyter notebook is also provided.
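A minimal training command using the new mode might look like the following. Only `--multi-hot-atom-featurizer-mode rigr` comes from this release; the data path and the other flags shown are illustrative assumptions based on the v2 `chemprop train` CLI:

```shell
# Train a regression model with the RIGR featurizer instead of the
# default v2 featurizer ("my_data.csv" is a placeholder dataset).
chemprop train \
    --data-path my_data.csv \
    --task-type regression \
    --multi-hot-atom-featurizer-mode rigr
```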
Other changes
- Apply task_weights to default loss function in CLI by @craabreu in #1170
- Check if dropout prop needs to be restored by @KnathanM in #1178
- Message Passing Error Message Fix by @twinbrian in #1161
- Fix metrics problems - Cuda-> CPU, no _defaults by @KnathanM in #1179
- Update convert script for v1.4 by @KnathanM in #1176
Full Changelog: v2.1.1...v2.1.2
v2.1.1
Notable changes
In #1090, we started the process of integrating `logging` into the core code. This will make it easier for users to control what information Chemprop prints to output. It will also make it easier for developers to include more informational output for potential debugging.
SciPy 1.15 subtly changed how `logit` works, which caused some of our tests to fail (the values reported were slightly different than before). The expected test values have been updated. #1142
A new example notebook has been added which demonstrates how to adapt Chemprop to work with Shapley value analysis. This is another method to lend some interpretability to Chemprop models by highlighting which atom/bond features are most impactful to the final prediction value. #938
We continue to try to make Chemprop easy to use. In #1091 and #1124 we added better warnings and error messages. And in #1151 we made it easy to open the example notebooks in Google Colab. This allows people reading the docs to immediately jump in and try Chemprop without needing to set up a Python environment.
Bug Fixes
In #1097, we fixed a bug where the transforms for scaling extra features/descriptors were turned off during validation. This caused models trained with these extra inputs to report inaccurate metrics during training, which is a problem when the "best" model is selected instead of the last model, as is done in hyperparameter optimization. Training a model and using the last model was unaffected, as was inference.
#1084 fixed a bug where `R2Score` did not have the attribute `task_weights`. This attribute is not used but is needed for compatibility with other metrics.
In v2.1 we transitioned to using `torchmetrics` for our metrics and loss functions, in part because it takes care of training across multiple nodes (DDP) automatically. Our custom metric for the Matthews correlation coefficient, however, was not set up the way `torchmetrics` expected. This was fixed in #1131.
What's Changed
- splits file is json by @KnathanM in #1083
- add more helpful warnings about the splitting api change by @JacksonBurns in #1091
- Fix: Splits file can have multiple splitting schemes by @KnathanM in #1086
- Set all transforms to train during validation by @KnathanM in #1097
- updated warning to logger by @twinbrian in #1090
- Add task weights to r2score by @KnathanM in #1084
- Fix `tracking_metric` overwrite issue by @shihchengli in #1105
- Fix `save_individual_predictions` with ensembling by @shihchengli in #1110
- Add a helpful warning when invalid SMILES are passed by @JacksonBurns in #1124
- Fix batch size calculation for multicomponent by @KnathanM in #1098
- Do not use `transform_variance` for unscaled targets by @shihchengli in #1108
- Add output size to attentive hparams by @KnathanM in #1133
- Fix test failure due to scipy logit by @KnathanM in #1142
- fix docs about extra atom descriptors by @KnathanM in #1139
- Fix MCC for DDP and multitask by @KnathanM in #1131
- V2: Add Shapley Value notebook for interpretability by @oscarwumit in #938
- add notebooks to colab and docs by @KnathanM in #1151
Full Changelog: v2.1.0...v2.1.1
v2.1.0
The v2.1 release adds the uncertainty quantification modules, including estimation, calibration, and evaluation (#937). For more details on uncertainty quantification in Chemprop, please refer to the documentation and the example notebook. Additionally, we switched the loss functions and metrics to `torchmetrics` (#1022). With this change we also changed the reported "val_loss" to be calculated the same as the training loss to make them comparable (#1020). We also changed Chemprop to use replicates instead of cross validation (#994), and batch normalization is now disabled by default (#1058).
Core code changes
- The `validation_loss_function` is removed in #1023.
- The batch norm is disabled by default in #1058.
- A new predictor, `QuantileFFN`, is added in #963.
- `BinaryDirichletLoss` and `MulticlassDirichletLoss` are integrated into `DirichletLoss` in #1066.
- The split types `CV` and `CV_NO_VAL` are removed in #994.
- A model's list of metrics is now registered as child modules in #1020.
CLI changes
- Batch norm is disabled by default; it can be turned back on with `--batch-norm`. #1058
- Many CLI flags related to uncertainty quantification are added. #1010
- Quantile regression is now supported via `-t regression-quantile`. #963
- Cross validation (CV) is replaced with replicates. The number of replicates can be specified via `--num-replicates`, and the flag `--num-folds` is deprecated. #994
- `--tracking-metric` is added, which sets the metric to track for early stopping and checkpointing. #1020
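Taken together, a hypothetical training command using the new replicate and tracking-metric flags might look like this (the data path, task type, and metric value are illustrative, not prescribed by this release):

```shell
# Train on 5 replicate splits (replacing the old CV folds) and track
# validation RMSE for early stopping and checkpointing.
chemprop train \
    --data-path my_data.csv \
    --task-type regression \
    --num-replicates 5 \
    --tracking-metric rmse
```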
New notebooks
- A notebook showing interoperability of the Chemprop featurizer with other libraries (DGL and PyG) #1063
- Active learning #910
- Uncertainty quantification #1071
CI/CD
Backwards Compatibility Note
Models trained with v2.0 will not load properly in v2.1 due to the loss functions file being moved. A conversion script is provided to convert a v2.0 model to one compatible with v2.1. Its usage is `python chemprop/utils/v2_0_to_v2_1.py <v2_0.pt> <v2_1.pt>`.
`data.make_split_indices` now always returns a nested list. Previously it would only return a nested list for cross validation. We encourage you to use `data.make_split_indices(num_replicates=X)`, where `X` is some number greater than 1, to train on multiple splits of your data to get a better idea of the performance of your architecture. If you do use only one replicate, you will need to unnest the list like so:
```python
train_indices, val_indices, test_indices = data.make_split_indices(mols)
train_data, val_data, test_data = data.split_data_by_indices(
    all_data, train_indices, val_indices, test_indices
)
train_data, val_data, test_data = train_data[0], val_data[0], test_data[0]
```
What's Changed
- change installed torch version on windows actions again by @shihchengli in #1062
- .pt instead of .ckpt by @twinbrian in #1060
- add ModelCheckpointing to training.ipynb so best model is used automatically by @donerancl in #1059
- Add ray to tests on python 3.12 by @KnathanM in #1064
- Feature: Replicates Instead of Cross Validation Folds by @JacksonBurns in #994
- disable libuv with env var rather than avoiding latest torch by @JacksonBurns in #1065
- Add new example notebook for active learning by @joelnkn in #910
- Fix: splits column is a string not a list by @KnathanM in #1074
- Update chemprop to v2.1 in #1038
- This PR included the following PRs:
- Rerun notebooks for v2.1 by @KnathanM in #1067
- Refactor with torchmetrics by @KnathanM in #1022
- update train docs for v2.1 by @KnathanM in #1069
- Disable batch norm by default by @jonwzheng in #1058
- Add notebook showing interoperability of Chemprop featurizer w/other libraries by @jonwzheng in #1063
- Add tracking metric options; make metrics ModuleList; other improvements by @KnathanM in #1020
- Remove old validate-loss-function function by @KnathanM in #1023
- V2: Uncertainty implementation in #1058
- This PR included the following PRs:
- Improve the docstring for uncertainty modules by @shihchengli in #986
- Add Platt calibrator by @KnathanM in #961
- Add dropout and ensemble predictors by @joelnkn in #970
- Add NLL and Spearman Uncertainty Evaluators by @am2145 in #984
- Add quantile regression by @shihchengli in #963
- Add miscalibration area and ence evaluators by @shihchengli in #1012
- Add isotonic calibrators by @KnathanM in #1053
- V2 conformal calibrators by @shihchengli in #989
- V2 conformal evaluators by @shihchengli in #1005
- Uncertainty regression calibrators (non-conformal) by @shihchengli in #1055
- Adding Evidential, MVE, and Binary Dirichlet Uncertainty Predictors by @akshatzalte in #1061
- Cleanup the uncertainty modules by @shihchengli in #1072
- Multiclass dirichlet give uncertainty by @KnathanM in #1066
- Rename uncertainty estimator by @KnathanM in #1070
- Update uncertainty notebook by @shihchengli in #1071
- Add uncertainty quantification to the predict CLI by @shihchengli in #1010
Full Changelog: v2.0.5...v2.1.0
v2.0.5
We continue to enhance and improve the functionality and usability of Chemprop. If there are things you'd like to see addressed in a future update, please open an issue or PR.
Core code changes
We discovered that our Noam learning rate scheduler does not match what was originally proposed. The current scheduler does work well though, so it was decided not to change the definition. Instead, the scheduler was renamed and refactored to be clearer. By @shihchengli in #975
Work on uncertainty quantification methods revealed that our previous prediction tensor return dimensions would cause difficulty down the line. Now we have placed uncertainty into a separate dimension. By @hwpang in #959
The `BinaryDirichletFFN` and `MulticlassDirichletFFN` predictors were added early in v2 development but were not tested. They have now been tested and corrected. By @shihchengli in #1017
The RDKit 2D molecular featurizer was added back by popular demand. The versions used in v1 are available, as well as a version that uses all available molecular features in `rdkit.Chem.Descriptors`. By @KnathanM in #877
CLI changes
- Log statistical summary of training, validation, and test datasets by @donerancl in #882
- Change the default verbose level to INFO by @shihchengli in #953
- Save both probabilities and class label for multiclass classification by @shihchengli in #987
- Add `--remove-checkpoints` flag to opt out of saving checkpoints by @shihchengli in #1014
- Add `--class-balance` flag to `train` CLI by @shihchengli in #1011
- Save target column names in model for use at inference by @hwpang in #935
- Fix `save-smiles-splits` not working with rxn. columns as column header by @jonwzheng in #998
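As an illustration, the two new flags above could be combined in a single classification run (the file name and other flag values are placeholders, not from this release):

```shell
# Balance classes while training a binary classifier and discard the
# intermediate checkpoints once training finishes.
chemprop train \
    --data-path binary_cls.csv \
    --task-type classification \
    --class-balance \
    --remove-checkpoints
```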
Transfer learning
- Add new example notebook for transfer learning by @joelnkn in #904
- Use pre-train output scaler to scale training data in CLI by @KnathanM in #1051
- Add `--checkpoint` and `--freeze-encoder` flags in train CLI for transfer learning by @shihchengli in #1007
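A sketch of a transfer-learning run using the two new flags follows; the paths and task type are hypothetical:

```shell
# Fine-tune from a previously trained model while keeping the
# message-passing encoder frozen.
chemprop train \
    --data-path new_targets.csv \
    --task-type regression \
    --checkpoint pretrained/model.pt \
    --freeze-encoder
```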
Documentation
- Fixed typos in CLI reference and standardized formatting by @donerancl in #880
- Example Notebook for Classification by @twinbrian in #1047
- Improve frzn-ffn-layers description and update doc for transfer learning by @oscarwumit in #993
- add transform tests by @KnathanM in #955
- Add documentation for how to use a separate splits file (CLI) by @KnathanM in #1041
Other small bug fixes
- Convert v1 models trained on GPU by @KnathanM in #978
- Fix `hpopting` Notebook and CLI for Windows by @JacksonBurns in #1034
- Update multiclass data to be compatible with rdkit 2024.09.1 by @jonwzheng in #1037
- Define `task_weights` if it is `None` in `MulticlassClassificationFFN` by @shihchengli in #988
- change installed torch version on windows actions again by @KnathanM in #1016
- Update batch norm freezing to freeze running stats by @joelnkn in #952
- Pass `map_location` through `load_submodules()` to `torch.load()` by @shihchengli in #1029
- fix no-header-rows in predict command error by @sunhwan in #1001
New Contributors
- @sunhwan made their first contribution in #1001
- @twinbrian made their first contribution in #1047
Full Changelog: v2.0.4...v2.0.5
v2.0.4
Enhancements and New Features
This release introduces several enhancements and new features to Chemprop. A notable addition is a new notebook demonstrating Monte Carlo Tree Search for model interpretability (see here). Enhancements have been made to the output transformation and prediction saving mechanisms for `MveFFN` and `EvidentialFFN`. Additionally, users can now perform predictions on CPU even if the models were trained on GPU. Users are now also warned when not using the TensorBoard logger, helping them be aware of available logging tools for better monitoring.
Bug Fixes
Several bugs have been fixed in this release, including issues related to Matthews Correlation Coefficient (MCC) metrics and loss calculations, and the behavior of the CGR featurizer when the bond features matrix is empty. The `task_weights` parameter has been standardized across all loss functions and moved to the correct device for MCC metrics, preventing device mismatch errors.
What's Changed
- Standardize `task_weights` in `LossFunction` across all loss functions by @shihchengli in #941
- Improve output transformation and prediction saving for `MveFFN` and `EvidentialFFN` by @shihchengli in #943
- Enable CPU prediction for GPU-trained models by @snaeppi in #950
- Fix Issues in MCC Metrics and Loss Calculations by @shihchengli in #942
- Fix docs building by pinning sphinx-argparse by @jonwzheng in #964
- Add Monte Carlo Tree search notebook for interpretability by @hwpang in #924
- Fix CGR featurizer behavior when bond features matrix is empty by @jonwzheng in #958
- Fix Failing CI for `torch==2.4.0` on Windows `ray[tune]` Tests by @JacksonBurns in #971
- warn users when not using tensorboard logger by @JacksonBurns in #967
- Bug: Move `task_weights` to `device` for MCC metrics by @YoochanMyung in #973
New Contributors
- @snaeppi made their first contribution in #950
- @YoochanMyung made their first contribution in #973
Full Changelog: v2.0.3...v2.0.4
v2.0.3
Notable changes
The `mfs` argument of `MoleculeDatapoint` was removed in #876. This argument accepted functions which generated molecular features to use as extra datapoint descriptors. When using Chemprop in a notebook, users should first manually generate their molecule features and pass them into the datapoints using `x_d`, which stands for (extra) datapoint descriptors. This is demonstrated in the `extra_features_descriptors.ipynb` notebook under examples. CLI users will see no change, as the CLI will still automatically calculate molecule features using user-specified featurizers. The `--features-generators` flag has been deprecated, though, in favor of the more descriptive `--molecule-featurizers`. Available molecule features can be found in the help text generated by `chemprop train -h`.
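For example, a run that previously used the deprecated flag might be updated as below. The featurizer name shown is an assumption; check `chemprop train -h` for the names actually available:

```shell
# Before (deprecated): --features-generators ...
# After:
chemprop train \
    --data-path my_data.csv \
    --task-type regression \
    --molecule-featurizers morgan_binary
```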
The default aggregation was changed to norm in #946. This was meant to change in version 2.0.0 but was missed. Norm aggregation was used in all the benchmarking of version 1, as it performs better than mean aggregation when predicting properties that are extensive in the number of atoms.
More documentation for the CLI `hpopt` and `fingerprint` commands has been added and can be viewed here and here.
The predictions of an ensemble of models are now automatically averaged, and the individual predictions are saved in a separate file. #919
What's Changed
- Change the installed numpy version in pyproject by @shihchengli in #922
- Explicitly double save scalers/criterion by @KnathanM in #898
- Add `--show-individual-scores` CLI flag by @shihchengli in #920
- Set Ray Train's trainer resources to 0 by @hwpang in #928
- Save individual and average predictions into different files by @shihchengli in #919
- Add CLI pages for hpopt and fingerprint by @jonwzheng in #914
- Make fingerprint CLI consistent with predict CLI by @hwpang in #927
- Fix issue related to target column for fingerprint by @hwpang in #939
- build molecule featurizer in parsing by @KnathanM in #875
- Remove featurizing from datapoint by @KnathanM in #876
- change aggregation default to norm by @KnathanM in #946
- Use mol.GetBonds() instead of for loop by @KnathanM in #931
Full Changelog: v2.0.2...v2.0.3
v2.0.2 Adding Document Modules and hpopt Enhancement
In this release, we have included numerous notebooks to document modules. Chemprop may be used in Python scripts, allowing for greater flexibility and control than the CLI. We recommend first looking through some of the worked examples to get an overview of the workflow. Further details about the creation, customization, and use of Chemprop modules can be found in the module tutorials.
New CLI Features
Improved `--model-path` CLI
Previously `--model-path` could take either a single model file or a directory containing model files. Now it can take any combination of checkpoint files (.ckpt), model files (.pt), and directories containing model files. Directories are recursively searched for model files (.pt). Chemprop will use all models given and found to make predictions (#731).
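An illustrative prediction command mixing the three accepted forms (all paths are placeholders):

```shell
# A checkpoint file, a model file, and a directory (searched
# recursively for .pt files) can now be passed together; predictions
# are made with every model given or found.
chemprop predict \
    --test-path test.csv \
    --model-path best.ckpt final_model.pt trained_models/
```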
Improvements for hpopt CLI
Some flags related to Ray Tune (i.e., `--raytune-temp-dir`, `--raytune-num-cpus`, `--raytune-num-gpus`, and `--raytune-max-concurrent-trials`) have been added. You can use the CLI to initiate your Ray instance using these flags. (#918)
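A hypothetical hpopt invocation using these flags (the resource numbers and paths are illustrative):

```shell
# Constrain the Ray instance that the hpopt CLI starts.
chemprop hpopt \
    --data-path my_data.csv \
    --task-type regression \
    --raytune-num-cpus 8 \
    --raytune-num-gpus 1 \
    --raytune-max-concurrent-trials 4 \
    --raytune-temp-dir /tmp/ray
```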
Bug fix
An incorrect max learning rate was used when writing the config file after hyperparameter optimization. This is now fixed (#913).
What's Changed
- Fix typos in docstrings and .rst files that led to rendering errors by @jonwzheng in #901
- Add CLI transition guide link to RTD by @kevingreenman in #907
- Add meaningful warning for warm up epoch search space by @hwpang in #909
- Fixing small bug in hpopt for learning rate by @akshatzalte in #913
- Add notebooks to document modules by @KnathanM in #834
- V2: consolidate `--checkpoint` CLI by @hwpang in #731
- Improvements for hpopt cli by @hwpang in #918
Full Changelog: v2.0.1...v2.0.2
v2.0.1 First Patch
New CLI Features
Caching in CLI
`MolGraph`s are now created (by featurizing molecules) and cached at the beginning of training by default in the CLI. If you wish to disable caching, you can use the `--no-cache` flag, which will featurize molecules on the fly instead. (#903)
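For instance (the data path and task type are placeholders):

```shell
# Skip up-front MolGraph caching and featurize on the fly, e.g. when a
# very large dataset would not fit in memory once featurized.
chemprop train --data-path big_data.csv --task-type regression --no-cache
```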
Change the default trial scheduler in HPO
We changed the default trial scheduler for HPO from AsyncHyperBand to FIFO, as it is the default in Ray and was used in version 1. You can switch the trial scheduler back to AsyncHyperBand by using `--raytune-trial-scheduler AsyncHyperBand` if needed. (#896)
Support Optuna in HPO
You can use Optuna as an HPO search algorithm via `--raytune-search-algorithm optuna`. (#888)
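For example (the other flags are placeholders):

```shell
# Run hyperparameter optimization with Optuna as the search algorithm.
chemprop hpopt \
    --data-path my_data.csv \
    --task-type regression \
    --raytune-search-algorithm optuna
```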
CLI Bug Fixes
HPO-related bugs
In #873, we changed the search space for the initial and final learning rate ratios and `max_lr` to avoid very small (~10^-10) learning rates and also ensured that some hyperparameters are saved as integers instead of floating-point numbers (e.g., `batch_size`). In #881, we addressed the bug concerning the incompatibility of the saved config file with the training config. In #836, we shut down Ray processes after HPO completion to avoid zombie processes. For those encountering issues with Ray processes, we suggest you start Ray outside of the Python process.
DDP-related bugs
In #884, we resolved the issue where metrics were not synchronized across processes and disabled the distributed sampler during testing in DDP.
Backwards incompatibility note
In #883, we fixed the bug related to unused parameters in DDP. Models created via the CLI in v2.0.0 without additional atomic descriptors cannot be used via the CLI in v2.0.1. You will need to first remove `message_passing.W_d.weight` and `message_passing.W_d.bias` from the model file's state_dict to make it compatible with the current version.
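A one-off fix along these lines could be scripted as below. This is a sketch, not an official tool: it assumes the saved model is a Lightning-style checkpoint whose weights live under a `state_dict` key, and the file names are placeholders:

```shell
python - <<'EOF'
# Strip the unused W_d parameters so a v2.0.0 CLI model loads in v2.0.1.
import torch

ckpt = torch.load("v2_0_0_model.pt", map_location="cpu")
for key in ("message_passing.W_d.weight", "message_passing.W_d.bias"):
    ckpt["state_dict"].pop(key, None)  # remove the key if present
torch.save(ckpt, "v2_0_1_compatible_model.pt")
EOF
```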
What's Changed
- update v2 installation instructions page in docs by @kevingreenman in #831
- Remove Ray zombie processes by @shihchengli in #836
- Docker images for v2 by @JacksonBurns in #841
- Change Docker syntax for MyBinder compatibility by @JacksonBurns in #872
- [V2] Fix featurizer cli by @hwpang in #865
- Fix hyperparameter predictorbase by @c-w-feldmann in #832
- V2: Add all notebooks to test by @hwpang in #840
- Fix small bugs in hpopt by @akshatzalte in #873
- Add pip setup step to environment.yml install instructions by @cjmcgill in #889
- Avoid scrambling target column name order by @JacksonBurns in #893
- Fix unused parameters issue in DDP by @shihchengli in #883
- Fix the inference issue related to the target columns by @shihchengli in #895
- Change the default trial scheduler to `FIFOScheduler` by @shihchengli in #896
- Add Optuna support for HPO by @shihchengli in #888
- Fix Circular Import with `isort` by @JacksonBurns in #887
- make LookupAction work with ConfigArgParse by @KnathanM in #900
- V2: Fix typo in hpopt installation instruction by @hwpang in #897
- V2: Make hpopt config compatible with training config by @hwpang in #881
- Fix DDP prediction and checkpoint Issues by @shihchengli in #884
- Add simple cache to CLI by @KnathanM in #903
- V2: Fix small hpopt bugs and add example notebook by @hwpang in #842
New Contributors
- @akshatzalte made their first contribution in #873
Full Changelog: v2.0.0...v2.0.1
v2.0.0 Stable Release
This is the first stable release of Chemprop v2.0.0, with updates since the v2.0.0-rc.1 release candidate in early March.
The primary objectives of v2.0.0 are making Chemprop more usable from within Python scripts, more modular, easier to maintain and develop, more compute/memory efficient, and usable with PyTorch Lightning. Some features will not be migrated from v1 to v2 (e.g. web, sklearn). Some v1 features will be added in later versions of v2 (v2.1+) (e.g. uncertainty, interpret, atom- and bond-targets); see milestones here. The new version also has substantially faster featurization speeds and much higher unit test coverage, enables training on multiple GPUs, and works on Windows (in addition to Linux and Mac). Finally, the incorporation of a batch normalization layer is expected to result in smoother training and improved predictions. We encourage all Chemprop users to try using v2.0.0 to see how it can improve their workflows.
v2 documentation can be found here.
There are v2 tutorial notebooks in the examples/
directory.
A helpful transition guide from Chemprop v1 to v2 can be found here. This includes a side-by-side comparison of CLI argument options, a list of which arguments will be implemented in later versions of v2, and a list of changes to default hyperparameters.
Note that if you install from source, the primary branch of our repository has been renamed from `master` to `main`.
Due to development team bandwidth, Chemprop v1 will no longer be actively developed, so that we can focus our efforts on v2. Bug reports and questions about v1 are still welcome to benefit users who haven't yet made the switch to v2, but bug reports will not be fixed by the development team.
Please let us know of any bugs you find, questions you have, or enhancements you want in Chemprop v2 by opening an issue.
v1.7.1 Final Patch for Version 1
This is the final release of chemprop v1. All future development will be done on chemprop v2. The development team is still happy to answer questions about v1, but no new feature requests or PRs for v1 will be accepted. Users who identify bugs in v1 are still encouraged to open issues to report them - they will be tagged as `v1-wontfix` to signify that we won't be publishing fixes for them in official chemprop releases, but the bugs can still be open to community discussion.
We encourage all users to try migrating their workflows over to chemprop v2 (available now as a release candidate, stable version planned to be released within the next week) and let us know of any issues you encounter. All v1 releases will remain available on PyPI, and the v1 source code will remain available in this GitHub organization.
What's Changed
- fix the `uncal_vars` for atom/bond property prediction by @shihchengli in #712
- [v1]: Add Docker Image Building Action and Official Images to DockerHub by @JacksonBurns in #718
- remove macos and windows from v1 ci by @JacksonBurns in #720
- update docker build `if` to use correct upstream branch name by @JacksonBurns in #723
- fix the task names by @shihchengli in #725
- Fixed typo in README.md by @willspag in #745
Full Changelog: v1.7.0...v1.7.1