QSAR rationales for the PPARα/γ agonistic activity of 4,4-Dimethyl-1,2,3,4-tetrahydroquinoline derivatives

The PPARγ binding affinity and the hPPARα and hPPARγ transactivation profiles of tetrahydroquinoline derivatives have been analyzed quantitatively in terms of topological 0D-, 1D- and 2D-descriptors based on molecular graph theory. Statistically sound models relating the biological actions to various DRAGON descriptors have been obtained through the combinatorial protocol-multiple linear regression (CP-MLR) computational procedure. Of the large number of models derived in this way, only the most significant ones are discussed in order to draw meaningful conclusions. The statistically significant models indicate that the modes of action of the title compounds differ among the hPPARα transactivation, hPPARγ transactivation and PPARγ binding profiles. Applicability domain analysis carried out for PPARγ binding affinity revealed that the suggested model meets high quality parameters, with good fitting power and the capability of assessing external data; all of the compounds were within the applicability domain of the proposed model and were evaluated correctly.


Introduction
Type-2 diabetes (T2D) is a complex metabolic disease that is associated with a pancreatic β-cell defect and is characterized by insulin resistance in the liver and peripheral tissues [1]. Driven by physical inactivity and excessive food intake, T2D is expected to reach epidemic proportions [2]. Current treatment of T2D aims to reduce hyperglycemia either by improving insulin secretion or by reducing the insulin resistance of peripheral tissues. Most of the commonly used therapies of this kind were developed without a defined therapeutic target. Attempts have therefore been made to identify more suitable therapeutic strategies based on better insight into the pathogenesis of the disease [3]. Peroxisome proliferator-activated receptors (PPARs), which belong to the nuclear receptor family, are ligand-activated transcription factors [4]. Three subtypes, namely PPARα, PPARγ and PPARβ/(δ), have been identified since the discovery of these receptors in 1990 by Issemann and Green [5]. These receptors are extensively involved in glucose and lipid homeostasis [6-8]. A number of agonists in this class have progressed to the clinical phase and have been marketed as anti-diabetic drugs [9,10]. The hypolipidemic fibrates and the glitazone class of insulin sensitizers, full agonists of PPARα [4] and PPARγ, respectively, have motivated pharmaceutical companies to focus on developing more potent and dual-acting agonists of these two subtypes. Dual-acting PPARα/γ agonists such as Tesaglitazar and Muraglitazar are regarded as a very attractive option for the treatment of dyslipidemic T2D [6,10,13-18]. These compounds may also circumvent or reduce the main side effects, such as weight gain or edema, induced by full PPARγ agonists like the thiazolidinediones (TZDs) [19]. The ligand-protein interactions of typical PPAR agonists reveal that the acidic head group of the ligand, usually a carboxylic acid, is involved in up to four hydrogen bonds with the receptor, which is crucial for the activation of PPAR.
The central aromatic moiety is located in a hydrophobic pocket, while the cyclic tail tolerates more polar substituents [20]. Based on the typical topology of synthetic PPAR agonists, 4,4-dimethyl-1,2,3,4-tetrahydroquinoline has been considered as a novel cyclic tail for designing PPARγ-selective agonists and/or dual PPARα/γ agonists [21]. A new series of 4,4-dimethyl-1,2,3,4-tetrahydroquinoline-based compounds has been reported as effective PPARγ-selective agonists and dual-acting agonists of PPARα and PPARγ [22,23]. The aim of the present communication is to establish quantitative relationships between the reported activities and molecular descriptors that capture the substitutional changes in the title compounds.

Biological actions and theoretical molecular descriptors
The eighteen reported tetrahydroquinoline derivatives form the data set for this study [22,23]. The structures of these analogues are given in Table 1. These derivatives were evaluated for binding affinity to human PPARγ using a competitive binding assay with [3H]rosiglitazone. Functional activity was determined in a transient transfection assay using pGAL4hPPARα and pGAL4hPPARγ. The reported binding affinity, expressed as pKi (M), and transactivation activity, expressed as pEC50 (M), of these congeners are presented in Table 3 and Table 6, respectively. For modeling purposes the data set has been subdivided into a training set (for model development) and a test set (for external prediction or validation). The test set compounds were selected using an in-house randomization program. The test and training set compounds are indicated in Table 3.
The structures of all the compounds (listed in Table 1) were drawn in 2D ChemDraw [24], converted into 3D models and subjected to energy minimization in MOPAC using the AM1 procedure for closed-shell systems. The energy minimization was carried out to obtain a well-defined conformer relationship among the congeners under study. The molecular descriptors of the title compounds were computed using the DRAGON software [25]. This software offers a large number of descriptors corresponding to ten different classes of 0D- to 2D-descriptor modules. The descriptor classes include the constitutional, topological, molecular walk counts, BCUT descriptors, Galvez topological charge indices, 2D-autocorrelations, functional groups, atom-centered fragments, empirical descriptors and the properties describing descriptors. These descriptors characterize the molecules in a multi-descriptor environment. A total of 486 descriptors, belonging to the 0D- to 2D-modules, have been computed to obtain the most appropriate models describing the biological activity.
Descriptors that were inter-correlated beyond 0.9 (descriptor vs. descriptor, r > 0.9) or poorly correlated with the biological actions (descriptor vs. activity, r < 0.1) were excluded before the CP-MLR procedure was applied. The reduced descriptor data sets contained 55, 39 and 67 relevant descriptors for the PPARγ binding, hPPARα transactivation and hPPARγ transactivation activities, respectively. The descriptors have been scaled to the interval 0 to 1 [33] so that no descriptor dominates simply because its pre-scaled values are larger or smaller than those of the other descriptors; the scaled descriptors thus have equal potential to influence the QSAR models.
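The pre-filtering and scaling steps described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function names are hypothetical, absolute correlation values are assumed, and the rule for choosing which member of an inter-correlated pair to keep (here, the one better correlated with activity) is an assumption, since the text does not specify it.

```python
import numpy as np

def prefilter_descriptors(X, y, inter_max=0.9, act_min=0.1):
    """Return the column indices kept after the two correlation filters."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Constant columns have no defined correlation, so they are dropped first.
    cols = [j for j in range(X.shape[1]) if X[:, j].max() > X[:, j].min()]
    # Filter 1: drop descriptors poorly correlated with the activity.
    r_act = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in cols}
    cols = [j for j in cols if r_act[j] >= act_min]
    # Filter 2 (assumed greedy rule): visit descriptors by decreasing
    # activity correlation and keep one only if it is not inter-correlated
    # beyond inter_max with any descriptor already kept.
    kept = []
    for j in sorted(cols, key=lambda c: -r_act[c]):
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= inter_max
               for k in kept):
            kept.append(j)
    return sorted(kept)

def minmax_scale(X):
    """Scale every (non-constant) descriptor column to the interval 0 to 1."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)
```

In use, `prefilter_descriptors` would be applied once per biological endpoint, which is why the reduced pools differ in size between the binding and transactivation activities.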
The training set and the test set have been used, respectively, for model development and external prediction. The goodness of fit of the models was assessed by examining the multiple correlation coefficient (r), the standard deviation (s) and the F-ratio between the variances of calculated and observed activities (F). The internal validation of each derived model was ascertained through the cross-validated index, Q², from the leave-one-out (Q²LOO) and leave-three-out (Q²L3O) procedures. The LOO method creates a number of modified data sets by removing one compound at a time from the parent data set, such that each observation is removed exactly once. One model is then developed for each reduced data set, and the response values of the deleted observations are predicted from these models.
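The fit statistics and the leave-one-out procedure described above can be illustrated with ordinary least squares. This is a sketch under stated assumptions rather than the CP-MLR implementation: the function names are hypothetical, and Q² is taken as 1 − PRESS/SS with SS computed about the mean of the full training response, which is one common convention that the text does not spell out.

```python
import numpy as np

def _design(X):
    """Add an intercept column to a 2D descriptor matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])

def fit_stats(X, y):
    """Return (r, s, F) for an MLR model with k descriptors."""
    y = np.asarray(y, dtype=float)
    A = _design(X)
    n, k = A.shape[0], A.shape[1] - 1
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))   # residual sum of squares
    tss = float(np.sum((y - y.mean()) ** 2))   # total sum of squares
    r = np.sqrt(1.0 - rss / tss)
    s = np.sqrt(rss / (n - k - 1))
    F = ((tss - rss) / k) / (rss / (n - k - 1))
    return r, s, F

def q2_loo(X, y):
    """Leave-one-out cross-validated Q2 (assumed: 1 - PRESS / SS)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i               # every compound left out once
        coef, *_ = np.linalg.lstsq(_design(X[mask]), y[mask], rcond=None)
        pred = (_design(X[i:i + 1]) @ coef)[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))
```

A leave-three-out variant would follow the same pattern, removing groups of three compounds instead of single compounds; how the groups are formed is not stated in the text.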
The external validation, or predictive power, of a derived model is based on the test set compounds. The index r²Test, the squared correlation coefficient between the observed and predicted data of the test set, has been used for this purpose. A value of r²Test greater than 0.5 suggests that the model obtained from the training set has reliable predictive power. Chance correlations, if any, associated with the CP-MLR models were explored through a randomization test [34,35] by repeated scrambling of the biological response. Every model has been subjected to 100 such simulation runs, and the outcome has been used to express the percent chance correlation of the model under scrutiny.
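The two checks described above can be sketched as follows. The function names are hypothetical, and the way the 100 scrambled runs are summarized (the percentage of runs whose r² reaches that of the unscrambled model) is an assumption, because the text does not define the percent chance correlation explicitly.

```python
import numpy as np

def r2_test(y_obs, y_pred):
    """Squared correlation between observed and predicted test-set values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1] ** 2)

def r2_fit(X, y):
    """r2 of an MLR model (with intercept) fitted to X and y."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

def percent_chance_correlation(X, y, n_runs=100, seed=0):
    """Assumed summary: share of scrambled-response runs with r2 >= original."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    r2_ref = r2_fit(X, y)
    hits = 0
    for _ in range(n_runs):
        # Scramble the biological response while keeping descriptors fixed.
        if r2_fit(X, rng.permutation(y)) >= r2_ref:
            hits += 1
    return 100.0 * hits / n_runs
```

A model with a genuine structure-activity relationship should give a value near zero, because scrambling destroys the link between descriptors and response.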
To support the findings, a partial least squares (PLS) analysis has been carried out on the descriptors identified through CP-MLR. The PLS analysis facilitates the development of a 'single window' structure-activity model and helps to rank the identified descriptors by their ability to explain the PPARγ binding profiles of the compounds. It also gives an opportunity to compare the relative significance of the descriptors. The fraction contributions obtained from the normalized regression coefficients of the descriptors allow this comparison within the modeled activity.
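One way to compute fraction contributions from regression coefficients is sketched below. The exact normalization is not given in the text, so the definition used here (each coefficient on autoscaled descriptors divided by the sum of absolute coefficients, sign retained) is an assumption, and both function names are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Scale each descriptor column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def fraction_contributions(coefs):
    """Signed share of each coefficient in the sum of absolute coefficients."""
    c = np.asarray(coefs, dtype=float)
    return c / np.sum(np.abs(c))
```

With this definition the absolute contributions sum to one, so descriptors can be compared directly on a common scale.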

Applicability domain
The utility of a QSAR model rests on its ability to predict accurately for new congeners. A model is valid only within its training domain, and new compounds must be assessed as belonging to this domain before the model is applied. The applicability domain is assessed through the leverage value of each compound [36,37]. A Williams plot, the plot of standardized residuals versus leverage values (h), can then be used for an immediate and simple graphical detection of both the response outliers (Y outliers) and the structurally influential chemicals (X outliers) in the model. In this plot, the applicability domain is the rectangular area bounded by ±β standard deviations of the residuals and by a leverage threshold h*. The threshold h* is generally fixed at 3(k+1)/n, where n is the number of compounds in the training set and k is the number of independent descriptors in the model, whereas β = 2 or 3. Predictions must be considered unreliable for compounds with a high leverage value (h > h*). On the other hand, when the leverage value of a compound is lower than the threshold, the probability of agreement between predicted and observed values is as high as that for the training set compounds.
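The leverage calculation behind a Williams plot can be sketched as follows. The function names are hypothetical, and including the intercept column in the design matrix is an assumption (it is consistent with the k+1 term in the threshold, since the training-set leverages then sum to k+1).

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage h = x (X'X)^-1 x' of each query compound w.r.t. the training set."""
    Xt = np.column_stack([np.ones(len(X_train)), np.asarray(X_train, dtype=float)])
    Xq = np.column_stack([np.ones(len(X_query)), np.asarray(X_query, dtype=float)])
    core = np.linalg.inv(Xt.T @ Xt)
    # Row-wise quadratic form: h_i = sum_jk Xq[i, j] * core[j, k] * Xq[i, k]
    return np.einsum("ij,jk,ik->i", Xq, core, Xq)

def leverage_threshold(n, k):
    """Warning leverage h* = 3(k + 1)/n."""
    return 3.0 * (k + 1) / n

def outside_domain(h, std_resid, h_star, beta=3.0):
    """Flag compounds that are X outliers (h > h*) or Y outliers (|residual| > beta)."""
    h = np.asarray(h, dtype=float)
    std_resid = np.asarray(std_resid, dtype=float)
    return (h > h_star) | (np.abs(std_resid) > beta)
```

As an illustration, a two-descriptor model built on 15 compounds gives `leverage_threshold(15, 2)` = 3 × 3 / 15 = 0.6.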

QSAR results
Initially, the pEC50 values for hPPARα and hPPARγ transactivation were correlated with the pKi values for PPARγ binding, and the pEC50 values for hPPARα and hPPARγ transactivation were correlated with each other, for all active congeners, in order to examine the diversity between the binding and transactivation activities and between the hPPARα and hPPARγ transactivations. The derived correlations are given in Equations (1) to (3), where n, r, s and F represent, respectively, the number of data points, the multiple correlation coefficient, the standard deviation and the F-ratio between the variances of calculated and observed activities. None of these equations yielded statistically significant parameters. The absence of correlation between the EC50 values from the PPAR transactivation tests and the Ki values from the binding tests suggests that these derivatives may have a binding site different from the Rosiglitazone binding site. It also indicates that the biological actions, in terms of binding and/or transactivation, are independent of one another. We have therefore treated each biological endpoint as a separate dependent variable in the subsequent parametric analysis.
The PPARγ binding activity of the title compounds was investigated with 55 relevant 0D-, 1D- and 2D-descriptors. A training set of 11 compounds was used to develop the QSAR models, and a test set of 4 compounds (nearly one-fourth of the total) was used for external validation of the significant models. CP-MLR yielded one one-parameter model and ten two-parameter models having r²Test > 0.5. These models shared 12 descriptors, which are listed in Table 2 along with their physical meaning, average regression coefficient and total incidence. The sign of a regression coefficient indicates the direction of influence of the explanatory variable in these models: a descriptor with a positive regression coefficient augments the activity profile of a compound, whereas one with a negative coefficient is detrimental to it.
Table 2 Identified descriptors (a), along with their physical meaning, average regression coefficient and incidence (b), in modeling the binding and transactivation activity.

Descriptor; average regression coefficient and (incidence) in the analysis of: binding activity; transactivation activity.
The data within the parentheses are the standard errors associated with the regression coefficients. The descriptors participating in the above models belong to the constitutional (AMW), topological (MAXDP and X2A), functional group (nHDon and nROR) and atom-centered fragment (O-060) classes. Constitutional class descriptors are 0D descriptors that are independent of molecular connectivity and conformation. The constitutional descriptor AMW (average molecular weight) correlated positively with activity, so a high average molecular weight favors elevated binding activity. Topological class descriptors are based on a graph representation of the molecule. They are numerical quantifiers of molecular topology obtained by applying algebraic operators to matrices representing molecular graphs, and their values are independent of vertex numbering or labeling. They can be sensitive to one or more structural features of the molecule, such as size, shape, symmetry, branching and cyclicity, and can also encode chemical information concerning atom type and bond multiplicity. The negative contribution of descriptor MAXDP (maximal electrotopological positive variation) and the positive contribution of descriptor X2A (average connectivity index, chi-2) suggest that a lower value of MAXDP and a higher value of X2A would be supportive of the activity (Table 2).
In Equations (4) to (7) above, the F-values are significant at the 99% level. Values greater than 0.5 for both indices Q²LOO and Q²L3O show the internal robustness of the models, whereas the suitability of the selected test set for external validation is reflected in the r²Test values (> 0.5). These models are able to account for up to 88.36 percent of the variance in the observed activity of the compounds. Because the statistical parameters of these four two-parameter models are significant, the models were used to calculate the PPARγ binding activity profiles of all the compounds; the calculated values are included in Table 3 for comparison with the observed ones. A close agreement between them has been observed. Additionally, a plot of observed versus calculated activities is given in Figure 1 to show the goodness of fit of each of these four models.

Figure 1 Plot of observed and calculated pKi values for training- and test-set compounds.
A PLS analysis has also been carried out on the 12 descriptors identified through CP-MLR to support the study. The results of the PLS analysis are given in Table 4. For this purpose, the descriptors have been autoscaled (zero mean and unit s.d.) to give each of them equal weight in the analysis. In the PLS cross-validation, two components were found to be the optimum for these 12 descriptors, and they explained 91.4 percent of the variance in the activity (r² = 0.914). The MLR-like PLS coefficients of these 12 descriptors are given in Table 4. The calculated activity values of the training- and test-set compounds are in close agreement with the observed ones and are listed in Table 3. For comparison, the plot of observed versus calculated activities (through PLS analysis) for the training- and test-set compounds is given in Figure 1. Figure 2 shows a plot of the fraction contribution of the normalized regression coefficients of these descriptors to the activity (Table 4). (Table 4) remained inferior in explaining the activity of the analogues.
QSAR rationales, with the same test set used earlier for the analysis of PPARγ binding activity, have also been obtained for the other reported activity profiles, namely hPPARα and hPPARγ transactivation. Descriptor pools of 39 and 67 relevant descriptors for hPPARα and hPPARγ transactivation, respectively, were subjected to CP-MLR analysis. CP-MLR yielded a total of 8 two-parameter models sharing 9 descriptors for the hPPARα activity. For the hPPARγ activity, 15 three-parameter models sharing 18 descriptors were obtained. The shared descriptors, along with their physical meaning, average regression coefficient and total incidence for both analyses, are given in Table 2.
The selected models that emerged from CP-MLR are given below.
The newly appearing descriptors IC1 and T(N..N) are topological class descriptors, whereas descriptor MATS7m belongs to the 2D-autocorrelations (2D-AUTO) class. The 2D-AUTO descriptors ATSke, GATSke and MATSke have their origin in the autocorrelation of topological structure of Broto-Moreau, of Geary and of Moran, respectively. The computation of these descriptors involves the summation of different autocorrelation functions corresponding to different fragment lengths, which leads to different autocorrelation vectors corresponding to the lengths of the structural fragments. A weighting component in terms of a physicochemical property is also embedded in these descriptors. As a result, these descriptors address the topology of the structure, or parts thereof, in association with a selected physicochemical property. In the nomenclature of these descriptors, the penultimate character, a number, indicates the number of consecutively connected edges considered in the computation and is called the lag k of the autocorrelation vector (corresponding to the number of edges in the unit fragment). The last character indicates the physicochemical property used as the weighting component: m for atomic mass, e for atomic Sanderson electronegativity and p for atomic polarizability.
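For orientation, the three autocorrelation functions named above are commonly written as follows. These are the textbook forms and are given here as an assumption; the DRAGON version used in the study may apply additional transformations (for example a logarithm on the Broto-Moreau values). Here w_i is the chosen atomic property of atom i, w̄ its mean over the n atoms, d_ij the topological distance between atoms i and j, δ_ij equals 1 when d_ij = k and 0 otherwise, and Δ_k is the number of atom pairs at distance k (the counting convention for ordered versus unordered pairs varies between sources).

```latex
\begin{align}
\mathrm{ATS}_k  &= \sum_{i=1}^{n}\sum_{j=1}^{n} \delta_{ij}\, w_i\, w_j
  && \text{(Broto-Moreau)} \\
\mathrm{MATS}_k &= \frac{\dfrac{1}{\Delta_k}\displaystyle\sum_{i=1}^{n}\sum_{j=1}^{n}
  \delta_{ij}\,(w_i-\bar{w})(w_j-\bar{w})}
  {\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(w_i-\bar{w})^{2}}
  && \text{(Moran)} \\
\mathrm{GATS}_k &= \frac{\dfrac{1}{2\Delta_k}\displaystyle\sum_{i=1}^{n}\sum_{j=1}^{n}
  \delta_{ij}\,(w_i-w_j)^{2}}
  {\dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n}(w_i-\bar{w})^{2}}
  && \text{(Geary)}
\end{align}
```

In this notation MATS7m is the Moran coefficient with k = 7 and w taken as atomic mass.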
All the descriptors participating in Eqs. (8) to (11) correlate negatively with activity, as shown by the signs of their regression coefficients. Lower values of the information content index of 1st order neighborhood symmetry (descriptor IC1), the sum of topological distances between N..N (descriptor T(N..N)), the maximal electrotopological positive variation (descriptor MAXDP), the average molecular weight (descriptor AMW) and the Moran autocorrelation of lag 7 weighted by atomic masses (descriptor MATS7m) would therefore be beneficial to the hPPARα activity.
The derived statistical parameters show that these models are statistically significant. Values greater than 0.5 for the indices Q²LOO and Q²L3O account for the internal robustness of the models, and r²Test values greater than 0.5 account for their external validation.
These models are able to account for up to 89.36 percent of the variance in the observed activity of the compounds. They were therefore used to calculate the activity profiles of all the compounds, and the calculated values are included in Table 5 for comparison with the observed ones. A close agreement between them has been observed. Considering the number of observations in the data set for the hPPARγ transactivation profile, models with up to three descriptors were explored. The following are the selected three-descriptor models, obtained from CP-MLR, for hPPARγ transactivation.
In all of the above equations (12) to (15), the F-values remained significant at the 99% level. The values greater than 0.5 obtained for the indices Q²LOO, Q²L3O and r²Test confirm the internal robustness and external validation of the models.
These models are able to explain up to 90.40 percent of the variance in the observed activity of the compounds, and the derived statistical parameters are statistically significant. The activity profiles of all the compounds calculated using these equations are in close agreement with the observed ones and are included in Table 5. Descriptor HNar, the Narumi harmonic index, is a topological class descriptor. Its positive contribution suggests that a higher value of HNar would be supportive of the activity. The other participating descriptors are nCt (from the functional group class) and C-006 and C-008 (from the atom-centered fragments class). The number of total tertiary C(sp3) atoms (descriptor nCt) and the CHR2X type atom-centered fragment (descriptor C-008) correlate positively with activity, suggesting that higher values of these will augment the activity. On the other hand, the negative correlation of descriptor C-006 indicates that CH2RX type structural fragments would be detrimental to the activity.

Applicability domain
On analyzing the applicability domain (AD) in the Williams plot (Figure 3) of the model based on the whole data set (Table 7), none of the compounds was identified as an obvious outlier for the PPARγ binding activity when the limit of normal values for the Y outliers (response outliers) was set at 3×(standard deviation) units.
Figure 3 Williams plot for the training set and test set for the PPARγ binding affinity of the compounds in Table 1. The horizontal dotted lines refer to the residual limits (±3×standard deviation) and the vertical dotted line represents the threshold leverage h* (= 0.6).
Compounds 2 and 17 were found to have leverage (h) values greater than the threshold leverage (h*), which marks them as structurally influential compounds. For both the training set and the test set, the suggested model meets high quality parameters, with good fitting power and the capability of assessing external data. Furthermore, all of the compounds were within the applicability domain of the proposed model and were evaluated correctly.

Conclusion
This study has provided a rational approach for the development of tetrahydroquinoline derivatives as PPARα/γ agonists. The statistically significant models derived for hPPARα transactivation activity revealed that lower values of the information content index of 1st order neighborhood symmetry (descriptor IC1), the sum of topological distances between N..N (descriptor T(N..N)), the maximal electrotopological positive variation (descriptor MAXDP), the average molecular weight (descriptor AMW) and the Moran autocorrelation of lag 7 weighted by atomic masses (descriptor MATS7m) would be beneficial to the hPPARα activity. The role of atomic van der Waals volumes and electronegativities in explaining the hPPARγ transactivation activity is evident from the participation of descriptors MATS5v, MATS8e, GATS6e and Me. Additionally, a higher value of the Narumi harmonic index (HNar), a higher number of total tertiary C(sp3) atoms (descriptor nCt), the presence of the CHR2X type atom-centered fragment (descriptor C-008) and the absence of CH2RX type structural fragments (descriptor C-006) will augment the hPPARγ transactivation activity.