Statistical modeling of antifungal activity of substituted benzo-1-thia- and selenium diazoles

Statistical methods revealed a close relationship between antifungal activity (suppression of mycelium growth in the fungi Venturia inaequalis, Aspergillus niger and Fusarium moniliforme) and the molecular structure of substituted benzo-1-thiadiazoles and benzo-1-seleniumdiazoles. The molecular pseudopotential is used as the explanatory variable in the regression equations. A threshold dependence of the antifungal activity on the pseudopotential of the molecule was established. When the explanatory variable exceeds its threshold value, bioactivity increases rapidly and linearly for all three species of fungi. Below the threshold value of the molecular pseudopotential, antifungal activity is low and shows no significant relationship with variations in the molecular structure of the chemical compounds.


Introduction
The antifungal activity of benzo-1-thiadiazoles, which has been detected in various test systems, is well known [1][2][3]. However, there is no unified method for the primary testing of antifungal drugs [4]. This makes it difficult to identify the quantitative relationship between the chemical structure of compounds and their biological effects. This article uses statistical methods to analyze the antifungal activity of a series of benzo-1-thia- and selenium diazoles (suppression of mycelium growth in the fungi Venturia inaequalis (Ve), Aspergillus niger (As) and Fusarium moniliforme (Fu)) [5]. The analysis makes it possible to identify significant molecular parameters (explanatory variables) of chemical compounds for the purpose of predicting new drugs within the framework of the heterocycles under study (Fig. 1). In addition, it allows some assumptions to be made about the mechanism of biological action of the drugs. The search for the connection between the molecular structure of a chemical compound and its bioactivity is based on the idea that the objects of study possess an effective electrostatic molecular potential, approximated by the pseudopotential (1) and (2). The molecular potential can affect the biological system by interfering with the mechanisms that regulate the processes of vital activity; that is, it can determine the biological activity of a chemical compound.

Material and methods
Calculating the real molecular potential requires complex quantum chemical calculations, which greatly complicates the construction of a practically convenient mathematical model. At the same time, the pseudopotential method reliably reproduces many properties of condensed media. The molecular pseudopotential is determined by the sum of the model potentials of the atoms forming the molecule [6].
To relate the biological action of chemical compounds to their molecular structure, we propose to use the average number of electrons on the outer shells of the atoms in a molecule:
$$Z = \frac{1}{N}\sum_i n_i Z_i, \qquad N = \sum_i n_i, \qquad (1)$$
where $n_i$ is the number of atoms of the i-th kind, $Z_i$ is the number of electrons in the outer electron shell of an atom of that kind, the summation covers all atoms in the molecule, and $N$ is the total number of atoms. The model pseudopotential [6] is written as equation (2), in which Z is determined by equation (1); $f(r)$ and $F(r)$ are corrections [6] to the Coulomb potential that depend on the distance r between the core of the molecule and the electron; e is the electron charge; and RM is the radius of the scattering center. It can be shown that the parameter Z is a common factor of the pseudopotential (2) [7,8].
The method of the model molecular pseudopotential takes into account only the electrons on the outer (valence) shell of the scattering center. It is well known that the chemical properties of molecules are determined by a small group of outer-shell electrons. The remaining electrons have almost no impact on the chemical processes in which the molecule participates. This approximation is sometimes referred to as the "frozen core approach". Valence and core electrons are well separated not only on the energy scale but also spatially. Moreover, changes in one electronic subsystem have little effect on the other. In this approximation, the outer electrons move not in the real Hartree-Fock force field of the molecule but in the much weaker field of the pseudopotential. The parameter that describes the variation of the potential from molecule to molecule is the average number of valence electrons per atom in the molecule. This inference is used throughout our research.
In accordance with the pseudopotential model, the average number Z of electrons on the outer electron shells of atoms in a molecule will be used as a molecular trait.
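The definition of Z as the average number of outer-shell electrons per atom can be illustrated with a minimal Python sketch. The element table `VALENCE_ELECTRONS` and the function name `mean_valence_electrons` are assumptions introduced here for illustration; they are not part of the original study. Applied to the NH2 and OH groups, the sketch gives 2.33 and 3.50, in agreement with the Zsub values quoted for these substituents later in the article.

```python
# Outer-shell (valence) electron counts for main-group elements.
# This lookup table is an assumption of the sketch, not data from the article.
VALENCE_ELECTRONS = {
    "H": 1, "C": 4, "N": 5, "O": 6, "S": 6, "Se": 6,
    "F": 7, "Cl": 7, "Br": 7,
}


def mean_valence_electrons(composition):
    """Return Z = sum(n_i * Z_i) / sum(n_i) for a dict {element: n_i}."""
    total_atoms = sum(composition.values())
    if total_atoms == 0:
        raise ValueError("composition must contain at least one atom")
    total_valence = sum(
        count * VALENCE_ELECTRONS[element]
        for element, count in composition.items()
    )
    return total_valence / total_atoms


# Substituent-level values (Zsub) for two groups discussed in the article.
z_sub_nh2 = mean_valence_electrons({"N": 1, "H": 2})  # (5 + 2*1) / 3
z_sub_oh = mean_valence_electrons({"O": 1, "H": 1})   # (6 + 1) / 2
```

The same function can be applied to a whole molecule by passing its full elemental composition, which is how Z, as opposed to Zsub, would be obtained under these assumptions.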
Let us analyze the group of drugs presented in Table 1. This group includes chemical compounds having only one substituent, in the R1 position of the benzene ring.
In the homogeneity test (3), S is the standard deviation; $Fu_{min}$ and $Fu_{max}$, $Z_{min}$ and $Z_{max}$ are the minimum and maximum values of the attributes in the samples; $Fu_{av}$ is the mean value of the Fu activity; $Z_{av}$ is the mean value of the explanatory variable; τ is the Grubbs criterion; and f is the number of degrees of freedom. From inequalities (3) it follows that the sets Fu and Z are homogeneous at the 95% confidence level. Using the attribute Z as the explanatory variable, the linear regression (4) was obtained [11], with RMSE1 = 8.94. From statistics (4) it follows that the regression coefficients are statistically significant at the 95% confidence level. The correlation coefficient and the Fisher criterion cannot be regarded as random deviations from zero. Similar statistical relationships can be obtained for the Ve and As bioactivities, which correlate with the Fu bioactivity (relationships (5)). Relationships (4) and (5) suggest that the intensity with which the chemical compounds suppress mycelium growth in Venturia inaequalis, Aspergillus niger and Fusarium moniliforme depends on the magnitude of the explanatory variable Z and, apparently, that the suppression proceeds in the same way for the Fu, Ve and As bioactivities.
The explanatory variable Z characterizes the molecule as a whole, whereas the active region of the drug molecule, which presumably participates in the interaction with the biosystem, is not known in advance. It is therefore of interest to analyze a regression equation in which the variable Zsub is used as the explanatory variable. Zsub is calculated by formula (1) for the electrons of the substituent only. For example, the linear regression (6) was obtained for the Fu bioactivity. The population of Zsub values is homogeneous and normally distributed. Regression (6), like regression (4), is statistically significant. The corresponding regressions can be obtained in the same way for the Ve and As bioactivities. This result indicates that the active region of the molecules is the substituent at the R1 position of the benzene ring; apparently, it is this substituent that interacts with the biosystem. Since Zsub characterizes the average number of valence electrons of the substituent, the region of the biosystem with which the drug interacts most likely carries a positive charge.
Two explanatory variables, Z and Zsub, can be included in regression (6) simultaneously. The analysis showed that this improves the quality of the regression. However, these variables are collinear for the studied series of chemical compounds (Table 1): the correlation coefficient between them is r = 0.98. It is known [12] that one of the variables should be excluded from the regression equation if the correlation coefficient r is greater than 0.8. Alternatively, one of the variables can be replaced by the difference Z - Zsub, a transformation that sometimes removes collinearity. For the chemical compounds of Table 1, however, collinearity is retained. For example, the Farrar-Glauber criterion [13] indicates the presence of collinearity between the explanatory variables (m = 2), where f is the number of degrees of freedom and m is the number of explanatory variables. This criterion can be applied only when the residuals are normally distributed (W = 0.902 exceeds the critical value). If the explanatory variables are collinear, the estimates of the regression coefficients can be, first, unreliable and, second, sensitive to the sample data.
For sulfur-containing compounds, the variation of the bioactivities with the magnitude of the explanatory variable Z can be approximated by a nonlinear regression. Figure 2.A shows the nonlinear dependence of the Ve bioactivity on the explanatory variable Z. The nature of this nonlinear change indicates the possible existence of a threshold in the biological action of the drugs. As Z increases from 3.0 to about 3.65, which can be taken as the threshold value Zthr, the bioactivity of the chemical compounds remains low (~10%) and the Ve, As and Fu bioactivities vary only weakly.
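The collinearity check described above for Z and Zsub can be sketched in Python. The sketch uses the standard chi-square form of the Farrar-Glauber statistic, $\chi^2 = -[N - 1 - (2m + 5)/6]\ln\det R$ with $m(m-1)/2$ degrees of freedom; whether the article uses exactly this form is an assumption, as is the sample size N = 9 (taken from the sulfur-containing sample mentioned later in the text). The function name is hypothetical.

```python
import math


def farrar_glauber_chi2(corr_det, n_obs, n_vars):
    """Chi-square statistic -[N - 1 - (2m + 5)/6] * ln(det R)."""
    if not 0.0 < corr_det <= 1.0:
        raise ValueError("determinant of a correlation matrix must lie in (0, 1]")
    return -(n_obs - 1 - (2 * n_vars + 5) / 6.0) * math.log(corr_det)


r = 0.98                    # correlation between Z and Zsub quoted in the text
det_r = 1.0 - r ** 2        # determinant of the 2x2 matrix [[1, r], [r, 1]]
chi2 = farrar_glauber_chi2(det_r, n_obs=9, n_vars=2)  # N = 9 is assumed

# Critical chi-square value at the 95% level for m(m-1)/2 = 1 degree of freedom.
CHI2_CRIT_95_DOF1 = 3.841
collinear = chi2 > CHI2_CRIT_95_DOF1
```

Under these assumptions the statistic is far above the critical value, which is consistent with the conclusion that Z and Zsub should not enter the same regression.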
At the same time, once the explanatory variable Z exceeds its threshold value, an intense linear increase in bioactivity is observed (Figures 2.B, 2.C and 2.D). A linear regression equation indicates a close relationship between bioactivity and the molecular factor Z in the range Z ≥ Zthr. For small samples (N < 15), the best estimate of the correlation coefficient is given by relation (11) [14]. For example, relation (11) gives R* = 0.999 for regression (10). All substituents for which Z ≥ Zthr are electron acceptors (sulfur-containing compounds).
Using Zsub as the explanatory variable does not change the significance of the detected linear dependences.
This result does not contradict the conclusion drawn from regression (6) that the active center of the molecule is the atom or group of atoms at the R1 position of the benzene ring. Similar linear regressions can be written for the Ve and As activities. For all three bioactivities, an intense linear growth of activity begins once the factor Z reaches its boundary (or threshold) value, $Z = Z_{sub}^{thr}$.
Let us check whether the regression dependences Ve(Z) (8) and Fu(Z) (9) differ significantly by comparing the two regressions. First, we compare the residual variances: $F_{Ve/Fu} = 4.21^2/3.92^2 = 1.15$, which is less than the critical value of the Fisher criterion. That is, the residual variances do not differ at the 95% confidence level. Next, we compare the regression coefficients that characterize the slopes of the straight lines. Here we used a pooled estimate of the residual variances, and the standard deviation SZ = 0.156 was used for the explanatory variable Z. It follows from inequalities (13) and (14) that the slopes of the regression lines are statistically indistinguishable. Therefore, the variability of the Ve bioactivity is the same as that of the Fu bioactivity. The regressions Ve(Z) and As(Z), and Fu(Z) and As(Z), can be compared quantitatively in the same way. Apparently, the mechanism by which benzothiadiazoles suppress mycelium growth in Venturia inaequalis, Aspergillus niger and Fusarium moniliforme can be assumed to be identical. The differences in the parameters of the regression equations can be attributed to random fluctuations of the sample data. For forecasting purposes, the linear function is preferable to a nonlinear dependence in the region Z ≥ Zthr.
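The first step of this comparison, the ratio of residual variances, is simple enough to reproduce directly. The sketch below is illustrative only: the function name is hypothetical, and the critical value of the Fisher criterion is not computed because the sample sizes of these two regressions are not stated in the text.

```python
def variance_ratio(s_a, s_b):
    """Return the F ratio of two residual standard deviations.

    The larger variance is placed in the numerator, so the result is >= 1.
    """
    larger, smaller = max(s_a, s_b), min(s_a, s_b)
    return (larger / smaller) ** 2


# Residual standard deviations of the Ve(Z) and Fu(Z) regressions from the text.
f_ve_fu = variance_ratio(4.21, 3.92)
```

The resulting ratio rounds to the value 1.15 reported above; it would then be compared with the tabulated F value for the appropriate degrees of freedom.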
Thus, from the inequalities (13) ... (17). We now compare regressions (4) and (17) to find out how much the regression changes when the sample is expanded by adding selenium-containing chemical compounds (sample size N2 = 11) compared with the regression for sulfur-containing chemical compounds alone (sample size N1 = 9). We first compare the residual variances: according to (18), the null hypothesis of equal residual variances is accepted at the 95% confidence level. Consequently, both regression lines are characterized by the same random variance, that is, by a similar scatter of points around the lines. At the second stage of the test, we compare the slopes of the regressions, which are determined by the regression coefficients (SZ1 = 0.254 (3) and SZ2 = 0.449 (16)). For this purpose, we use relations (14) and (15). The pooled estimate of the residual variances is calculated with S1 = 8.94 (4), S2 = 7.28 (17), N1 = 9 and N2 = 11. Since inequality (19) holds, the difference between the regression coefficients is insignificant at the 95% confidence level. We also compare the correlation coefficients R1 = 0.85 and R2 = 0.96. To do this, we use the Fisher normalizing transform $z = 0.5\ln((1 + R)/(1 - R))$ and the λ-criterion [12]. Inequality (21) implies the absence of a significant difference between the correlation coefficients. That is, the regression does not improve significantly after the sample size is increased. A similar analysis can be performed for the Ve and As bioactivities.
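The numerical part of this comparison can be sketched in Python from the values quoted in the text. Two assumptions are made here and should be checked against [12]: the pooled residual variance is taken in the usual form for two simple linear regressions, with N - 2 degrees of freedom each, and the λ-criterion is taken to be the standard normal test on the difference of Fisher-transformed correlation coefficients. All function names are hypothetical.

```python
import math


def pooled_residual_sd(s1, n1, s2, n2):
    """Pooled residual standard deviation of two simple linear regressions."""
    pooled_var = ((n1 - 2) * s1 ** 2 + (n2 - 2) * s2 ** 2) / (n1 + n2 - 4)
    return math.sqrt(pooled_var)


def fisher_z(r):
    """Fisher normalizing transform z = 0.5 * ln((1 + R) / (1 - R))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def correlation_difference_statistic(r1, n1, r2, n2):
    """|z1 - z2| divided by its standard error sqrt(1/(N1-3) + 1/(N2-3))."""
    std_err = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return abs(fisher_z(r1) - fisher_z(r2)) / std_err


# Values quoted in the text: S1 = 8.94, S2 = 7.28, N1 = 9, N2 = 11.
s_pooled = pooled_residual_sd(8.94, 9, 7.28, 11)

# Correlation coefficients R1 = 0.85 and R2 = 0.96 with the same sample sizes.
lam = correlation_difference_statistic(0.85, 9, 0.96, 11)
significant = lam > 1.96  # two-sided normal quantile at the 95% level
```

Under these assumptions the statistic stays below 1.96, in line with the stated conclusion that the two correlation coefficients do not differ significantly.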
For samples containing both sulfur-containing and selenium-containing chemical compounds (homogeneous populations), the significant linear regressions (22)-(24) were obtained. The regressions were then compared statistically, for example regressions (12) and (23) for the Fu bioactivity. In accordance with definitions (14) and (15), comparing the regression coefficients leads to an inequality, and the difference between the correlation coefficients of (12) and (23) is tested in the same way. It follows from inequalities (25) and (26) that regressions (12) and (23) differ insignificantly. However, it should be noted that the share of explained variation, determined by the coefficient of determination R² (regressions (22)-(24)), is systematically lower than for samples containing only sulfur-containing diazoles. Moreover, adding only two selenium-containing chemical compounds to the samples leads to a significant difference in the residual variances for the Ve activity (regressions (22) and (8)) and for the As activity (regressions (24) and (10)). Adding selenium-containing chemical compounds to the sample significantly increases the residual variance (or RMSE). Further comparison of the regressions is difficult because accurate statistical criteria are lacking when conditions (28) hold. Apparently, the condition Z > Zthr is necessary but not sufficient for high bioactivity of a chemical compound. It is also important that the substituent is an electron acceptor.

Results and discussion
A comparative analysis of the relationship between the explanatory variable Z and the activity of sulfur-containing and selenium-containing chemical compounds leads to the conclusion that benzothiadiazoles and benzoseleniumdiazoles apparently belong to different general populations. The feature common to the analyzed drugs and to the Ve, As and Fu activities is a significant trend: an increase in the explanatory variable Z (or in the pseudopotential of the molecule) is accompanied by an increase in the bioactivity of the chemical compound.
An important property of the substituents can be noted which, at a qualitative level, is apparently associated with the manifestation of drug activity. The highest activity values are characteristic of a particular series of substituents (Table 1), and this series correlates with the Ve, As and Fu bioactivities. As is known, the acceptor properties of substituents are characterized by the position of the lowest unoccupied molecular orbital (MO). The lower the position of this level on the energy scale (relative to a molecule without a substituent), the stronger the acceptor properties of the substituent. The donor properties of the substituents are determined by the position of the energy level of the highest occupied MO. The substituents can be arranged in the following series according to the magnitude of their donor influence: NH2 (Z = 3.33; Zsub = 2.33; μ = +1.5 D) > OH (Z = 3.57; Zsub = 3.50; μ = +1.6 D) > SH (Z = 3.57; Zsub = 3.50; μ = +0.7 D). Apparently, the condition Z > Zthr is necessary but not sufficient for high bioactivity of the chemical compounds. For high bioactivity of the drug, it is also important that the substituent is an electron acceptor. This result indicates that the active region in the diazole molecules is the substituent at position R1 of the benzene ring, so the substituent R1 can be regarded as the active center of the molecule. Since Zsub characterizes the average number of valence electrons of the substituent, the region of the biosystem with which the drug interacts most likely has a positive charge. This is also indicated by the direction of the dipole moment of the chemical bond C-R1 (see the remark to Table 1). Table 1 also lists additional parameters of the substituent: the dipole moment μ of the C-R1 bond [10] and the molar refraction MR [9], which characterizes the volume of the substituent R1.
The analysis showed that using these molecular parameters does not improve the quality of the regression equations. This may be due to an insufficient sample size. However, it can be noted, for example, that drugs have a relatively high activity if the three-dimensional size of the R1 substituent is close to MRopt ~ 7-9. This range of MR values can be taken as the optimal three-dimensional size of the substituent for the interaction of the drug molecule with the biophase region. A weakening of the Ve, As and Fu activities is usually accompanied either by a decrease in the molar refraction of the substituent or by a marked excess over the MRopt size (Table 1). Such a change in bioactivity may be due to the complementarity between the size of the substituent R1 and the local region of the biosystem.

Conclusion
A new, effective approach is proposed for analyzing the relationship between the antifungal activity of benzo-1-thia- and selenium diazoles and their molecular structure. The method makes it possible to predict the biological activity of chemical compounds when no experimental physicochemical information on their properties is available. It allows researchers to perform a rapid express analysis of the potential biological activity of new chemical compounds.