Replica-Exchange Enveloping Distribution Sampling: Calculation of Relative Free Energies in GROMOS

: Free-energy calculations based on molecular dynamics (MD) simulations are playing an increasingly important role for computer-aided drug design and material discovery in recent years. Free-energy differences between pairs of end-states can be estimated using well-established methods such as thermodynamic integration (TI) or Bennett’s acceptance ratio (BAR). An attractive alternative is the recently developed replica-exchange enveloping distribution sampling (RE-EDS) method, which enables estimating relative free-energy differences between multiple molecules from a single simulation. Here, we provide an introduction to the principles underlying RE-EDS and give an overview of the RE-EDS pipeline. In addition, we provide a description of the two complementary tools RestraintMaker and amber2gromos. We briefly discuss the findings of three recent applications of RE-EDS to calculate relative binding or hydration free energies. In all three studies, good agreement was found between the results obtained using RE-EDS and experimental values as well as values obtained using other free-energy methods.

Salomé R. Rieder obtained her Bachelor's and Master's degree in Computational Science and Engineering with a specialization in fluid dynamics and computational chemistry from ETH Zurich. For her Master's thesis she was awarded the ETH Medal in 2019. She has been working as a PhD candidate in the CSMS and CCG research groups since 2018. Her research is focused on the CombiFF force-field parameterization project and on the development and application of the RE-EDS multistate free-energy method.

Introduction
Molecular dynamics (MD) simulation is a well-established tool to investigate the properties of molecular systems in silico. In classical MD simulations, Newton's equations of motion are integrated numerically to propagate the particles of a system in time. The accuracy of MD calculations relies critically on the quality of the underlying force field, i.e. the functional form and parameters that determine the covalent and non-bonded interactions between the particles. Free-energy calculations are an important and challenging task in the field of MD simulation, specifically for computer-aided drug design and material discovery. The free energy of a state determines the probability that the system will adopt this specific state. [1] For example, the (relative) binding free energy calculated in simulations is directly related to the experimental binding affinities of ligands to a protein. [2] Persistent methodological, software and hardware advances continue to make the calculation of binding free energies more efficient and accurate. Furthermore, the automation of free-energy pipelines renders the setup of such calculations less error prone. However, they can still be quite timeconsuming. Owing to much smaller system sizes, the calculation of solvation/hydration free energies of small organic molecules is much faster, making them a good test case to compare different free-energy methods and force fields.
Here, we provide an overview of the replica-exchange enveloping distribution sampling (RE-EDS) [3,4] pipeline, [5] as well as of the two complementary tools RestraintMaker [6] and amber2gromos. [7] We also briefly discuss the results of three recent studies

RE-EDS Pipeline
Recently, Ries et al. [5] published an improved RE-EDS pipeline consisting of three main phases: parameter exploration, parameter optimization, and production. During the parameter exploration, a favorable configuration is generated for each end-state to use as initial coordinates for a starting state mixing (SSM) approach. Further, a lower bound for the s-values is determined and initial energy offsets are estimated. The second phase involves optimizing the s-distribution (number and location of replicas) and rebalancing of the energy offsets such that all end-states are sampled roughly equally at s = 1. Finally, the relative free-energy differences of all end-state pairs in the system are calculated from the production run. The workflow is controlled by the Python3 reeds module, freely available at https://github.com/rinikerlab/reeds/.

RestraintMaker
In free-energy calculations, the end-states can be represented as 'single', 'hybrid', or 'dual topology'. [6] In (RE-)EDS, a dual topology representation is typically used, i.e. the coordinates of all ligands are present separately in the system, independent from each other. Consequently, the molecules might drift away from each other during a simulation. One strategy to prevent this drift is the application of (atomic) distance restraints to the common core of the molecules. As the number of end-states in the system and/or the size of the ligands grows, the choice investigating the performance of RE-EDS, RestraintMaker and am-ber2gromos to calculate relative binding or hydration free energies.

Calculation of Free-energy Differences
In the thermodynamic cycle, two equivalent pathways can be taken to obtain the relative hydration free energy of two end-states A and B, either via absolute hydration free energies or via pairwise free-energy differences in water and in vacuum (Eqn. (1)) Thermodynamic integration (TI) [8] and (multistate) Bennett's acceptance ratio (MBAR) [9,10] are two well-established pathwaydependent methods to estimate free-energy differences from MD simulations using a coupling parameter λ. They can be used, for example, to estimate the free-energy difference of a molecule upon a change in the environment, or the relative free-energy difference of a pair of molecules. Enveloping distribution sampling (EDS), [11,12] on the other hand, is a pathway-independent multistate free-energy method, i.e. the free-energy difference of any end-state pair in the system can be calculated from a single simulation. In EDS, a reference state V R combining N end-states is defined in Eqn. (2) (Fig. 1) where s > 0 is the smoothness parameter, E R is a set of energy offsets, and β = 1/(k B T) with k B being the Boltzmann constant and T the absolute temperature. The energy offsets govern the contribution of the end-states to the reference potential energy. At high s-values, the reference state is dominated by the end-state with the lowest value of . As the smoothness parameter decreases towards zero, it flattens the potential-energy landscape such that each end-state contributes to the reference state simultaneously (Fig. 1). The free-energy difference between any pair of end-states in the system can then be calculated as, To obtain accurate free-energy estimates with EDS, an optimal choice of the s-value and of the energy offsets is essential in practice. To mitigate the necessity for an optimal s-value, replicaexchange EDS (RE-EDS) was introduced, [3,4] combining EDS with Hamiltonian replica exchange. [13,14] In RE-EDS, the sampling of the end-states is enhanced by simulating multiple replicas with different s-values and attempting replica exchanges at fixed intervals (Fig. 2). Three 'sampling regions' can be distinguished: physical sampling (s ≈ 1) with typically one dominant end-state, an intermediate region where more end-states start to contribute to the sampling, and a so-called 'undersampling' region with all end-states contributing roughly equally. [5,15]  with FEP+ [23] and QligFEP [24] (Fig. 3, left). With an RMSE of 3.3 kJ/mol against the experimental values, the accuracy of the results obtained with RE-EDS SSM was considerably better than with RE-EDS 1SS (RMSE = 4.8 kJ/mol). The accuracy of RE-EDS SSM was found to be comparable to that of FEP+ and QligFEP. With a total simulation time of about 50 ns, QligFEP was the most efficient of the three approaches, followed by the RE-EDS pipeline with about 160 ns, requiring only about a quarter of the simulation time of FEP+. One of the main advantages of RE-EDS is the fact that all transformations are sampled explicitly, avoiding the error propagation that occurs when only selected pairs are calculated with pathway dependent methods.

Validation of RestraintMaker: Relative Hydration Free Energies of Small Molecules
In another study, [6] relative hydration free energies were calculated for two sets of small molecules in order to validate the quality of the distance restraints selected by RestraintMaker. The first set (labeled A) consisted of six molecules and the second set (labeled B) contained ten molecules. The selected molecules contained a small ring core and involved R-group changes and core-hopping transformations, e.g. benzene to cyclohexane. Dual topology TI and RE-EDS simulations were carried out in vacuum/water in GROMOS to estimate the relative hydration free energies. RestraintMaker was used to select atomic distance restraints. For the multistate RE-EDS calculations, each molecule was restrained to two other molecules, forming a cyclic chain of pairwise restrained molecules. The obtained relative hydration free energies were compared to experimental values as well as to the results obtained from TI calculations as reported on the AT B server [24] (Fig. 3, center). For a reasonable force constant (5000 kJ mol -1 nm -2 ), the molecules remained well-aligned during the simulations and the sampling behavior was not (or only negligibly) affected by the employed distance restraints. For both selected sets, the results obtained with the different free energy estimators agreed well with each other and with the experimental values. The simulation time required to obtain converged freeenergy estimates with RE-EDS was shorter than for the dual topology TI calculations by about a factor 5 for set A and a factor 3.5 for set B.
of appropriate distance restraints becomes increasingly nontrivial. To address this, the Python3 program RestraintMaker was developed by Ries et al., [6] which is freely available at https://github.com/rinikerlab/restraintmaker/. Given a set of pre-aligned molecules, RestraintMaker employs a greedy algorithm to select locally optimal distance restraints for dual topology free-energy calculations, e.g. with RE-EDS or TI.

amber2gromos
A popular small molecule force field is the generalized AMBER force field (GAFF). [16] One of the advantages of GAFF is the availability of antechamber, [17] a tool to automatically parameterize molecular systems. GAFF was originally developed for the AMBER MD engine. [18] There are several differences between AMBER and GROMOS force fields, such as the file formats, the units, and the functional form. [19] As RE-EDS is currently implemented in the GROMOS MD engine, [20] the C++ program amber2gromos [7] was developed to convert GAFF (or generally AMBER) topologies into a format compatible with the GROMOS MD engine, facilitating the preparation of new systems of interest for (RE-EDS) simulations in GROMOS. amber2gromos will be part of the next scheduled release of the GROMOS++ package, [21] freely available at http://gromos.net/.

RE-EDS Case Studies
The performance of RE-EDS has been assessed in several studies. [3][4][5][6][7] In this section, we would like to highlight three recent publications.

Validation of the RE-EDS Pipeline: Relative Binding Free Energies of CHK1 Inhibitors
In a study by Ries et al., [5] the improved RE-EDS pipeline was tested by estimating the relative binding free energies of five inhibitors of human checkpoint kinase 1 (CHK1). [22] The five selected ligands contained R-group modifications as well as different core-hopping transformations such as ring size change and ring opening/closing. Two RE-EDS pipelines were compared, one with a single set of starting coordinates (labeled 1SS) and one with SSM. The relative binding free energies obtained were then compared to experimental values [22] and to reported results obtained Fig. 3. Comparison of relative binding free energies (left) or hydration free energies (middle, right) obtained with different free-energy methods to experimental values. The underlying datasets and simulation setups are discussed in detail in refs. [5][6][7]. The gray line corresponds to perfect agreement with ±4.184 kJ/mol (±1 kcal/mol). The reported metrics (RMSE, MAE, r Spearman ) compare the RE-EDS results to the experimental values. Overall, there is a good agreement between RE-EDS and both experiment and other free-energy estimators. (Left): relative binding free energies of five CHK1 inhibitors obtained with FEP+ with/without cycle closure (CC), [23] QligFEP, [24] and RE-EDS (SSM and 1SS) [5] compared to experimental results. [22] (Middle): relative hydration free energies of two sets of small molecules with a ring core obtained with TI [25] (direct via absolute free-energy calculations), TI [6] (indirect via relative free-energy calculations), and RE-EDS [6] compared to experimental results. [25] (Right): relative hydration free energies for two sets of benzene derivatives obtained with MBAR [26,27] and RE-EDS [7] compared to experimental results. [26,27]

License and Terms
This is an Open Access article under the terms of the Creative Commons Attribution License CC BY 4.0. The material may not be used for commercial purposes. The license is subject to the CHIMIA terms and conditions: (https://chimia.ch/chimia/about).
The definitive version of this article is the electronic one that can be found at https://doi.org/10.2533/chimia.2022.327

Validation of amber2gromos: Relative Hydration Free Energies of Small Benzene Derivatives
A third study [7] focused on the validation of the GAFF topologies converted with amber2gromos for RE-EDS simulations in GROMOS. Two sets of benzene derivatives were selected from the FreeSolv [26,27] database. A small set (labeled A) containing six molecules was used as an initial sanity check. To assess the accuracy of the relative free-energy differences in vacuum/water, complementary single topology TI simulations were carried out with the GROMACS [28] MD engine for set A. In order to explore the performance of multistate RE-EDS, a larger set (labeled B) with 28 molecules was selected. To investigate whether the performance is adversely affected by the larger number of end-states, set B was additionally divided into two subsets (Ba and Bb, with one molecule in common). For all sets, amber2gromos was used to generate the topologies from the input GAFF topologies, RestraintMaker to generate appropriate distance restraints, and the RE-EDS pipeline to carry out the simulations. The results were compared to the experimental and calculated (using MBAR) values reported in FreeSolv (Fig. 3, right). With an overall RMSE of 2.6 kJ/mol against experimental data and 1.1 kJ/mol against the MBAR results, the agreement was excellent. The relative hydration free energies obtained with RE-EDS from the two combined subsets Ba and Bb were slightly more accurate than the results for the full set B (RMSE of 2.5 kJ/mol versus 2.6 kJ/mol). As for the previously described study, the required simulation time was considerably shorter for RE-EDS than for TI (at least a factor 6 for set A) and MBAR (about a factor 5.5 for set A, 4.5 for set B).

Conclusions
We provided an overview of the RE-EDS pipeline and the complementary tools RestraintMaker and amber2gromos. Combined, they provide a convenient, robust, and largely automated workflow for the setup and execution of free-energy calculations with RE-EDS in GROMOS.
The results from three applications of RE-EDS to calculate relative binding or hydration free energies were briefly discussed. In all three studies, the relative free energies obtained with RE-EDS accurately reproduced experimental values, as well as results obtained with different free energy methods, even for a large number (e.g. 28) of end-states. A major advantage of the RE-EDS pipeline is the comparably small amount of total simulation time required.