CP-MLR/PLS directed quantitative structure-activity relationship study on the histamine H3 receptor binding affinity: The cyclohexylamine based series

The histamine H3 receptor binding affinities of cyclohexylamine derivatives have been analysed with the topological and molecular features from Dragon software. Analysis of the structural features in conjunction with the biological endpoints in the combinatorial protocol in multiple linear regression (CP-MLR) led to the identification of 26 descriptors for modelling the activity. The study suggested the role of atomic properties such as mass, electronegativity or charge content, polarizability and atomic van der Waals volume, of the average valence connectivity index chi-5, and of the absence of acceptor atoms for H-bonds (N, O, F) in optimising the histamine H3 receptor binding affinity of the titled compounds. The models developed and the participating descriptors suggest that the substituent groups of the cyclohexylamine moiety hold scope for further modification in the optimization of the H3 receptor binding affinity. Analysis of these descriptors in partial least squares (PLS) highlighted their relative significance in modulating the biological response. The selected descriptors are richer in information corresponding to the activity than the remaining ones. Applicability domain analysis revealed that the suggested model meets the quality criteria, with good fitting power and the capability of assessing external data; all of the compounds were within the applicability domain of the proposed model and were evaluated correctly.


Introduction
The histamine H3 receptor (H3R), a G-protein-coupled receptor, was discovered by Arrang et al. [1] in 1983 and cloned by Lovenberg et al. [2] in 1999. Extensive studies and reviews on H3R revealed that this receptor mostly controls histamine biosynthesis and release via a negative-feedback process [3][4][5][6][7][8][9][10]. The release of several other neurotransmitters, such as acetylcholine [11,12], noradrenaline [13], dopamine [14] and serotonin [15], is also regulated by H3R. In the brain, the release of histamine, which is associated with waking and pro-cognitive action, is triggered by inverse agonists rather than neutral antagonists of H3R [16]. Based on these observations, H3R inverse agonists might be useful in several CNS-related disorders, including narcolepsy, attention deficit hyperactivity disorder (ADHD), schizophrenia, Alzheimer's disease, and excessive daytime sleepiness in obstructive sleep apnea or Parkinson's disease.
In search of potent H3R inverse agonists devoid of hERG and CYP450 interactions and with favorable CNS penetration and pharmacokinetic profile, cyclohexylamine based derivatives have been reported by Labeeuw and coworkers [17]. The aim of the present communication is to establish quantitative relationships between the reported binding affinities and molecular descriptors that capture the substitutional changes in the titled compounds.

Data-set
For the present work, the sixty reported cyclohexylamine based derivatives have been considered as the data set [17]. The general structure of these compounds is represented in Figure 1 and the structural variations are given in Table 1.

Figure 1 General structure of cyclohexylamine based derivatives
These compounds were evaluated for their histamine H3 binding affinity by displacement of [125I]-iodoproxyfan (IPX) binding to membranes of stably transfected HEK-293 cells [18]. The binding affinity is also reported in Table 1 [17]. It is expressed as pKi on a molar basis and is considered as the dependent variable for the present quantitative analysis. The data set was sub-divided into a training set to develop models and a test set to validate the models externally. The test set compounds, which were selected using an in-house written randomization program, are also indicated in Table 1.
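To make the dependent variable concrete, the following minimal sketch converts an inhibition constant to pKi on a molar basis, using the standard definition pKi = -log10(Ki in mol/L). The function name and the choice of nM as the input unit are illustrative assumptions, not details taken from the source.

```python
import math

def pki_from_ki_nm(ki_nm):
    """Convert Ki given in nM (assumed input unit) to pKi = -log10(Ki in mol/L)."""
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    # -log10(ki_nm * 1e-9) rewritten as 9 - log10(ki_nm)
    return 9.0 - math.log10(ki_nm)

pki_example = pki_from_ki_nm(10.0)  # hypothetical 10 nM ligand
print(pki_example)
```

A tenfold drop in Ki raises pKi by one unit, so higher pKi values correspond to tighter binding.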

Molecular descriptors
The structures of the compounds under study (Table 1) have been drawn in 2D ChemDraw [19] and converted into 3D objects using the default conversion procedure implemented in CS Chem3D Ultra. The generated 3D structures of the compounds were subjected to energy minimization in the MOPAC module, using the AM1 procedure for closed shell systems, as implemented in CS Chem3D Ultra. This ensures a well defined conformer relationship across the compounds of the study. All these energy minimized structures have been ported to DRAGON software [20] for computing the descriptors corresponding to the 0D-, 1D-, and 2D-classes.

Development and validation of model
The combinatorial protocol in multiple linear regression (CP-MLR) [21][22][23][24][25] and partial least squares (PLS) [26][27][28] procedures have been used in the present work for developing QSAR models. CP-MLR is a "filter"-based variable selection procedure, which employs a combinatorial strategy with MLR to produce selected subset regressions for the extraction of diverse structure-activity models, each having a unique combination of descriptors from the generated dataset of the compounds under study. The embedded filters make the variable selection process efficient and lead to unique solutions. The risk of "chance correlations" exists wherever large descriptor pools are used in multilinear QSAR/QSPR studies [29,30]. Therefore, each cross-validated model recognized in CP-MLR has been put to a randomization test [31,32], in which the activity is repeatedly randomized to detect any chance correlation associated with the model. For this, every model has been subjected to 100 simulation runs with scrambled activity. The scrambled activity models with regression statistics better than or equal to those of the original activity model have been counted to express the percent chance correlation of the model under scrutiny.
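The randomization test described above can be sketched as follows. This is a minimal illustration and not the CP-MLR implementation: it assumes that r² of an ordinary least-squares fit is the regression statistic being compared, and it runs on synthetic stand-in data. The function names are hypothetical.

```python
import numpy as np

def fit_r2(X, y):
    """Squared multiple correlation coefficient of an ordinary least-squares fit."""
    A = np.column_stack([np.ones(len(y)), X])      # intercept column + descriptors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def percent_chance_correlation(X, y, n_runs=100, seed=0):
    """Percentage of scrambled-activity models whose r2 matches or beats the original."""
    rng = np.random.default_rng(seed)
    r2_original = fit_r2(X, y)
    hits = 0
    for _ in range(n_runs):
        y_scrambled = rng.permutation(y)           # break the structure-activity link
        if fit_r2(X, y_scrambled) >= r2_original:
            hits += 1
    return 100.0 * hits / n_runs

# Synthetic stand-in data (45 compounds, 4 descriptors); NOT the paper's data.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.random((45, 4))
y_demo = X_demo @ np.array([1.5, -1.0, 0.8, 0.5]) + rng_demo.normal(0.0, 0.1, 45)
chance_pct = percent_chance_correlation(X_demo, y_demo)
print(chance_pct)
```

A value near zero indicates that scrambled activities rarely reproduce the fit of the real model, which is the outcome expected of a model free of chance correlation.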
Validation of the derived model is necessary to test its prediction and generalization within the study domain. For each model, derived from n data points, a number of statistical parameters such as r (the multiple correlation coefficient), s (the standard deviation), F (the F ratio between the variances of calculated and observed activities), and Q²LOO (the cross-validated index from the leave-one-out procedure) have been obtained to assess its overall statistical significance. In internal validation, Q²LOO is used as a criterion of both the robustness and the predictive ability of the model. A Q² value greater than 0.5 suggests a statistically significant model. The predictive power of the derived model is judged on the test set compounds. The model obtained from the training set has reliable predictive power if the value of r²Test (the squared correlation coefficient between the observed and predicted values of the test set compounds) is greater than 0.5.
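The two validation indices can be illustrated with a short sketch. It assumes the commonly used definition Q²LOO = 1 - PRESS/SS_total, which the text does not spell out, and computes r²Test as the squared correlation between observed and predicted test-set values as stated above. The data are synthetic and all function names are hypothetical.

```python
import numpy as np

def _design(X):
    """Prepend an intercept column to the descriptor matrix."""
    return np.column_stack([np.ones(X.shape[0]), X])

def ols_fit(X, y):
    coef, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return coef

def ols_predict(coef, X):
    return _design(X) @ coef

def q2_loo(X, y):
    """Q2_LOO = 1 - PRESS / SS_total; each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = ols_fit(X[keep], y[keep])
        press += (y[i] - ols_predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def r2_test(coef, X_test, y_test):
    """Squared correlation coefficient between observed and predicted test-set values."""
    return np.corrcoef(y_test, ols_predict(coef, X_test))[0, 1] ** 2

# Synthetic stand-in data: 60 compounds, 4 descriptors, split 45/15; NOT the paper's data.
rng = np.random.default_rng(7)
X_all = rng.random((60, 4))
y_all = X_all @ np.array([2.0, -1.0, 1.0, 0.5]) + rng.normal(0.0, 0.15, 60)
X_tr, y_tr, X_te, y_te = X_all[:45], y_all[:45], X_all[45:], y_all[45:]

q2_value = q2_loo(X_tr, y_tr)
r2_test_value = r2_test(ols_fit(X_tr, y_tr), X_te, y_te)
print(q2_value, r2_test_value)
```

Both indices would then be compared against the 0.5 thresholds quoted in the text.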

Applicability Domain
The utility of a QSAR model is based on its ability to predict accurately for new compounds. A model is valid only within its training domain, and new compounds must be assessed as belonging to that domain before the model is applied. The applicability domain is assessed by the leverage value of each compound [33]. The Williams plot (the plot of standardized residuals versus leverage values, h) can then be used for an immediate and simple graphical detection of both the response outliers (Y outliers) and structurally influential chemicals (X outliers) in the model. In this plot, the applicability domain is established inside a squared area within ±x standard deviations (s.d.) and a leverage threshold h*. The threshold h* is generally fixed at 3(k + 1)/n (n is the number of training-set compounds and k is the number of model parameters), whereas x = 2 or 3. Predictions must be considered unreliable for compounds with a high leverage value (h > h*). On the other hand, when the leverage value of a compound is lower than the threshold value, the probability of accordance between predicted and observed values is as high as that for the training-set compounds.
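The leverage-based domain check can be sketched as follows. The sketch assumes the usual hat-matrix definition h = x (XᵀX)⁻¹ xᵀ with an intercept column in X, so that the training leverages sum to k + 1; the source gives only the threshold formula. Data are synthetic and the function names are hypothetical.

```python
import numpy as np

def leverages(X_train, X_query):
    """h_i = x_i (X'X)^-1 x_i' for each query row, with X built from the training set."""
    A = np.column_stack([np.ones(X_train.shape[0]), X_train])
    Q = np.column_stack([np.ones(X_query.shape[0]), X_query])
    xtx_inv = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)

def leverage_threshold(n_train, k):
    """Warning leverage h* = 3(k + 1)/n."""
    return 3.0 * (k + 1) / n_train

def in_domain(h, std_resid, h_star, x=3.0):
    """True where a compound lies inside the Williams-plot square."""
    return (np.abs(std_resid) <= x) & (h <= h_star)

# Synthetic stand-in descriptors: 45 training and 15 test compounds, 4 descriptors.
rng = np.random.default_rng(3)
X_train = rng.random((45, 4))
X_test = rng.random((15, 4))

h_star = leverage_threshold(n_train=45, k=4)
h_train = leverages(X_train, X_train)
h_test = leverages(X_train, X_test)
print(h_star, h_train.max(), h_test.max())
```

Test-set leverages are computed against the training-set matrix, because the domain is defined by the compounds used to build the model.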

QSAR results
For the compounds in Table 1, a total of 501 descriptors belonging to the 0D- to 2D-classes of DRAGON have been computed. Prior to model development, all descriptors that were inter-correlated beyond 0.90, and those showing a correlation of less than 0.1 with the biological endpoint (descriptor versus activity, r < 0.1), were excluded. This procedure reduced the total number of descriptors from 501 to 120 relevant ones to explain the biological actions of the titled compounds, and these were subjected to CP-MLR analysis with the default "filters" set in it. The descriptors have been scaled to the interval 0 to 1 [34] to ensure that a descriptor does not dominate simply because it has larger or smaller pre-scaled values than the other descriptors. In this way, the scaled descriptors have equal potential to influence the QSAR models. In a multi-descriptor class environment, exploring for the best model equation(s) along the descriptor classes provides an opportunity to unravel the phenomenon under investigation. In other words, the concepts embedded in the descriptor classes relate to the biological actions shown by the compounds.
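The descriptor pre-treatment described above can be sketched in plain Python. The text does not say which member of an inter-correlated pair is retained; the sketch assumes that the descriptor more strongly correlated with the activity is kept. The toy values and function names are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant vector."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def prefilter(descriptors, activity, min_r_activity=0.1, max_r_inter=0.90):
    """descriptors: dict name -> list of values. Returns the names retained."""
    # Step 1: drop descriptors with |r| < 0.1 against the activity.
    r_act = {name: abs(pearson(vals, activity)) for name, vals in descriptors.items()}
    r_act = {name: r for name, r in r_act.items() if r >= min_r_activity}
    # Step 2: visit in order of decreasing |r| and skip any descriptor that is
    # inter-correlated beyond the limit with one already kept (assumed tie-break rule).
    kept = []
    for name in sorted(r_act, key=r_act.get, reverse=True):
        if all(abs(pearson(descriptors[name], descriptors[k])) <= max_r_inter for k in kept):
            kept.append(name)
    return kept

def minmax_scale(values):
    """Scale values linearly to the interval 0 to 1."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

# Toy data for six hypothetical compounds.
activity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy = {
    "d1": [1.0, 2.1, 2.9, 4.2, 5.1, 5.8],     # strongly related to the activity
    "d2": [1.0, 2.5, 2.5, 4.5, 4.5, 6.5],     # redundant with d1 (r > 0.90)
    "d3": [1.0, -1.0, -1.0, -1.0, -1.0, 1.0], # unrelated to the activity
    "d4": [3.0, 1.0, 4.0, 2.0, 6.0, 5.0],     # moderately related, not redundant
}
kept = prefilter(toy, activity)
scaled = {name: minmax_scale(toy[name]) for name in kept}
print(kept)
```

Scaling is applied only after filtering, so the correlation screens operate on the raw descriptor values; correlations are unaffected by linear scaling in any case.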
The 60 compounds were divided into a training-set and a test-set. Fifteen compounds (25% of the total population) were selected for the test-set. The identified test-set was then used for external validation of the models derived from the remaining forty-five compounds in the training-set. The squared correlation coefficient between the observed and predicted values of the test-set compounds, r²Test, was calculated to express the fraction of explained variance in the test-set, which is not part of the regression/model derivation. It is a measure of the goodness of the derived model equation. A high r²Test value is always desirable, but considering the stringency of test-set procedures, r²Test values in the range of 0.5 to 0.6 are often regarded as acceptable. Following the strategy of exploring only predictive models, CP-MLR resulted in 2, 54 and 55 models in one, two and three descriptors, respectively, all having r²Test > 0.5, for the histamine H3 receptor binding affinity. The selected models are given in Table 2.
The signs of the regression coefficients indicate the direction of influence of the explanatory variables in the above models. A positive regression coefficient associated with a descriptor augments the activity profile of a compound, while a negative coefficient has a detrimental effect on it.
Table 3 footnotes: a The descriptors are identified from the four parameter models for activity that emerged from the CP-MLR protocol with filter-1 as 0.79, filter-2 as 2.0, filter-3 as 0.896 and filter-4 as 0.3 ≤ q² ≤ 1.0, with a training set of 45 compounds. b The average regression coefficient of the descriptor corresponding to all models and the total number of its incidence. The arithmetic sign of the coefficient represents the actual sign of the regression coefficient in the models.
Considering the number of observations in the dataset, models with up to four descriptors were explored. This resulted in 59 four-parameter models with test set r² > 0.50. These models (from the pool of 120 descriptors) were identified in CP-MLR by successively incrementing filter-3 with the increasing number of descriptors per equation. For this, the optimum rbar value of the preceding level model has been used as the new threshold of filter-3 for the next generation. These models share 26 descriptors among them. All 26 descriptors, along with their brief meaning, average regression coefficients and total incidence, are listed in Table 3, which serves as a measure of their estimate across these models.
Following are the selected four-descriptor models for the histamine H3 receptor binding affinities that emerged through CP-MLR. These models have accounted for nearly 85% of the variance in the observed activities. In the randomization study (100 simulations per model), none of the identified models has shown any chance correlation. The Q² index values greater than 0.5 are in accordance with a reasonably robust QSAR model. The pKi values of the training set compounds calculated using Eqs. (4) to (7) have been included in Table 1. The models (4) to (7) are validated with an external test set of 15 compounds listed in Table 1. The predictions for the test set compounds are found to be satisfactory, as reflected in the test set r² (r²Test) values, and the same is reported in Table 1. The plot showing goodness of fit between observed and calculated activities for the training and test set compounds is given in Figure 2.
The descriptors newly appearing in the above models are BELp7 (BCUT class), Me (constitutional class), GATS8p, GATS5e and MATS2v (2D autocorrelation class) and X5Av (topological class). The descriptors BELp7, GATS8p and MATS2v have shown positive correlation to the activity, whereas the descriptors Me, X5Av and GATS5e have correlated negatively to it. The signs of the regression coefficients indicate that higher values of the lowest eigenvalue n.7 of the Burden matrix weighted by atomic polarizabilities (descriptor BELp7), the Geary autocorrelation of lag 8 weighted by atomic polarizabilities (descriptor GATS8p) and the Moran autocorrelation of lag 2 weighted by atomic van der Waals volumes (descriptor MATS2v), together with lower values of the mean atomic Sanderson electronegativity scaled on the carbon atom (descriptor Me), the average valence connectivity index chi-5 (descriptor X5Av) and the Geary autocorrelation of lag 5 weighted by atomic Sanderson electronegativities (descriptor GATS5e), would be helpful to augment the histamine H3 receptor binding affinity.
A partial least squares (PLS) analysis has been carried out on the 26 CP-MLR identified descriptors listed in Table 3 to facilitate the development of a "single window" structure-activity model. For the purpose of PLS, the descriptors have been autoscaled (zero mean and unit s.d.) to give each of them equal weight in the analysis. In the PLS cross-validation, four components were found to be the optimum for these 26 descriptors, and they explained 88.17% of the variance in the activity. The MLR-like PLS coefficients of these 26 descriptors are given in Table 4. For the sake of comparison, the plot showing goodness of fit between observed and calculated activities (through PLS analysis) for the training and test set compounds is also given in Figure 2. Figure 3 shows a plot of the fraction contribution of the normalized regression coefficients of these descriptors to the activity.

Figure 3 Fraction contribution of the normalized regression coefficients of the descriptors (Table 3) associated with histamine H3 receptor binding affinity of cyclohexylamine derivatives

The PLS analysis has suggested BELp5 as the most determining descriptor for modeling the activity of the compounds (descriptor S. No. 10 in Table 4; Figure 3). The other nine descriptors, in decreasing order of significance, are MLOGP, Me, BELp7, MATS6m, GGI1, MATS2e, nHAcc, GATS3e and JGI6. The descriptors BELp5, BELp7, Me, nHAcc and JGI6 are part of Eqs. (1) to (7) and convey the same inference in the PLS model as well. The positive influence of the descriptors MLOGP (Moriguchi octanol-water partition coefficient, logP), MATS6m (Moran autocorrelation of lag 6 weighted by atomic masses), GGI1 (topological charge index of order 1) and GATS3e (Geary autocorrelation of lag 3 weighted by atomic Sanderson electronegativities) indicates that higher values of these descriptors would be beneficial to the binding affinity, whereas a lower value of the descriptor MATS2e (Moran autocorrelation of lag 2 weighted by atomic Sanderson electronegativities) would be helpful for improved activity. It is also observed that a PLS model built from the descriptor pool without the 26 CP-MLR identified descriptors (Table 3) is inferior in explaining the activity of the analogues.

Applicability domain
On analyzing the applicability domain (AD) for the H3 receptor binding affinity in the Williams plot (Figure 4) of the model based on the whole data set (Table 5), no compound was identified as an obvious outlier when the limit of normal values for the Y outliers (response outliers) was set at 3×(standard deviation) units. None of the compounds had a leverage (h) value greater than the threshold leverage (h* = 0.333). For both the training-set and the test-set, the suggested model meets the quality criteria, with good fitting power and the capability of assessing external data. Furthermore, all of the compounds were within the applicability domain of the proposed model and were evaluated correctly.
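As a quick check of the quoted threshold, the definition h* = 3(k + 1)/n can be evaluated directly. The inputs k = 4 (a four-descriptor model) and n = 45 (the training set) are assumptions, chosen because they reproduce the stated value; the text does not spell them out at this point.

```latex
h^{*} = \frac{3(k+1)}{n} = \frac{3(4+1)}{45} = \frac{15}{45} \approx 0.333
```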

Conclusion
The histamine H3 receptor binding affinities of cyclohexylamine derivatives have been analysed with the topological and molecular features from Dragon software. Analysis of the structural features in conjunction with the biological endpoints in the combinatorial protocol in multiple linear regression (CP-MLR) led to the identification of 26 descriptors for modelling the activity. The study suggested the role of atomic properties such as mass, electronegativity or charge content, polarizability and atomic van der Waals volume, of the average valence connectivity index chi-5, and of the absence of acceptor atoms for H-bonds (N, O, F) in optimising the histamine H3 receptor binding affinity of the titled compounds. The models developed and the participating descriptors suggest that the substituent groups of the cyclohexylamine moiety hold scope for further modification in the optimization of the H3 receptor binding affinity.
Analysis of these descriptors in partial least squares (PLS) highlighted their relative significance in modulating the biological response. The selected descriptors are richer in information corresponding to the activity than the remaining ones. Applicability domain analysis revealed that the suggested model meets the quality criteria, with good fitting power and the capability of assessing external data; all of the compounds were within the applicability domain of the proposed model and were evaluated correctly.

Figure 2 Plot of observed versus calculated pKi values for training- and test-set compounds for histamine H3 receptor binding affinity

Figure 4 Williams plot for the training-set and test-set for H3R binding affinity of compounds in Table 1. The horizontal dotted line refers to the residual limit (±3×standard deviation) and the vertical dotted line represents the threshold leverage h* (= 0.333)

Table 2 Significant models in one, two and three parameters derived for the training set through CP-MLR for histamine H3 receptor binding affinity

In the above model Eqs. (1) to (3), the descriptor BELp5 is a BCUT class descriptor. The other participating descriptors are GATS7e (2D autocorrelation class), JGI6 (Galvez topological charge index) and nHAcc (functional group). The positive signs of the regression coefficients of the descriptors BELp5 (lowest eigenvalue n.5 of the Burden matrix weighted by atomic polarizabilities), GATS7e (Geary autocorrelation of lag 7 weighted by atomic Sanderson electronegativities) and JGI6 (mean topological charge index of order 6) suggest that higher values of these descriptors would be beneficial to augment the H3R binding affinity. On the other hand, the absence of acceptor atoms for H-bonds (N, O, F; descriptor nHAcc) in a molecule would be supportive of the H3R binding affinity.

Table 4 PLS and MLR-like PLS models from the 26 descriptors of four parameter CP-MLR models for histamine H3 receptor binding affinity
a Regression coefficient of PLS factor and its standard error. b Coefficients of MLR-like PLS equation in terms of descriptors for their original values. c f.c. is the fraction contribution of the regression coefficient, computed from the normalized regression coefficients obtained from the autoscaled (zero mean and unit s.d.) data.

Table 5 Models derived for the whole data set (n = 60) in descriptors identified through CP-MLR for histamine H3 receptor binding affinity