
J. Med. Chem. 2001, 44, 2432-2437
Discriminating between Drugs and Nondrugs by Prediction of Activity Spectra
for Substances (PASS)

Soheila Anzali,*,† Gerhard Barnickel,† Bertram Cezanne,† Michael Krug,† Dmitrii Filimonov,‡ and Vladimir Poroikov‡

Bio- and Chemoinformatics Department, Merck KGaA, Darmstadt D-64271, Germany, and Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Pogodinskaya Street, 10, Moscow 119832, Russia

Using the computer system PASS (prediction of activity spectra for substances), which predicts simultaneously several hundreds of biological activities, a training set for discriminating between drugs and nondrugs is created. For the training set, two subsets of databases of drugs and nondrugs (a subset of the World Drug Index, WDI, vs the Available Chemicals Directory, ACD) are used. The high value of prediction accuracy shows that the chemical descriptors and algorithms used in PASS provide highly robust structure-activity relationships and reliable predictions. Compared to other methods applied in this field, the direct benchmark undertaken with this paper showed that the results obtained with PASS are in good accordance with these approaches. In addition, it has been shown that the more specific drug information used in the training set of PASS, the more specific discrimination between drug and nondrug can be obtained.
* Correspondence: Soheila Anzali, Ph.D., Merck KGaA, Bio- and Chemoinformatics Department, Frankfurter Str. 250, D-64271 Darmstadt, Germany. Tel: +49-6151-724863. Fax: +49-6151-7233299.
‡ Institute of Biomedical Chemistry of Russian Academy of Medical Sciences.

Journal of Medicinal Chemistry, 2001, Vol. 44, No. 15

In the past decade the drug discovery process has changed dramatically. The challenge of identifying novel leads has driven the need for automated systems that can rapidly select compounds at the beginning of the drug discovery process, namely in the analysis and the extension of the high-throughput screening (HTS) pool. The number of discovered hits depends on the cutoff level, e.g., 10 µM. First of all, the activity has to be confirmed, followed by selectivity and functional assays.

An important task is the rejection of false hits and the focus on the promising molecules. The lead molecule plays the pivotal role in the initiation of a lead optimization project. A promising lead compound with a desired pharmacological activity may have undesirable side effects, characteristics that limit its bioavailability, or structural features which adversely influence its metabolism and excretion from the body.

Therefore biological activity has to be balanced with “drug-like” properties, and the closer we get to a candidate compound, the more important drug-likeness becomes. Despite the many attempts1-11 to classify compounds into the “drug” and “nondrug” categories, there is no unambiguous definition of drug and nondrug. In particular, it may vary depending on the indications or diseases considered.12 Reagent databases such as ACD,13 for example, are often used as model databases for nondrug compounds, while CMC,14 WDI,15 and MDDR16 could be seen as databases for drugs. Certainly, if one could consider the fate of some compounds in the ACD database, they may become drugs in the future, whereas a few compounds from MDDR and WDI will [...]

Because of the lack of discrimination among structural features for drug and nondrug compounds, different approaches have to be applied to compensate. As concluded by Walters et al.,17 “future work is likely to include additional approaches and more robust attempts [...]

The PASS program,18-22 which is based on a regression approach applied to noncongeneric chemical series, provides highly robust predictions for more than 500 biological activities. Since PASS is trained to recognize drugs with activities on various targets, the approach may have potential use in discriminating drugs from nondrugs. The purpose of this work is to evaluate the ability of the PASS approach to discriminate between drugs and nondrugs.

Materials and Methods

PASS Approach. The computer system PASS (prediction of activity spectra for substances)18-21 predicts several hundreds of biological activities (pharmacological main and side effects, mechanisms of action, mutagenicity, carcinogenicity, teratogenicity, and embryotoxicity).

Biological activity results from the interaction of chemical compounds with biological entities. In clinical studies, the biological entity is the whole human organism. In preclinical testing it is the experimental animal (in vivo) and/or the experimental model (in vitro). Biological activity depends on the peculiarities of the compound (structure and physicochemical properties), the biological entity (species, gender, age, etc.), and the mode of treatment (dose, route of administration, etc.).

The majority of biologically active compounds often reveal a wide spectrum of different effects. Some of them are useful in the treatment of definite diseases; others cause various side and toxic effects. The whole complex of activities caused by the compound in biological entities is called the “biological activity spectrum”.

The biological activity spectrum of a compound presents all its activities despite the difference in essential conditions of their experimental determination. If the difference in species, gender, age, dose, route, etc., is neglected, the biological activity can be identified only qualitatively. Thus, “the biological activity spectrum” is defined as the “intrinsic” property of a compound depending only on its structure and physicochemical properties.

The prediction of this spectrum by PASS is based on SAR analysis of a training set containing more than 30 000 compounds which reveal more than 500 kinds of biological activities. Therefore, PASS once trained is able to predict for a test compound all likely biological activities, which are [...]

It was shown that the mean accuracy of prediction with PASS is about 86% in leave-one-out cross-validation.21 The PASS prediction accuracy exceeds by more than three times the expert's guesswork for an independent set of 33 different compounds studied as pharmacological agents, which are not included in the PASS training set.22 Recently PASS was tested in a blind mode by nine scientists from eight countries on a heterogeneous set of 118 compounds having 138 activities, and the mean accuracy of prediction was shown to be 82.6%.23 The PASS prediction is relatively successful even in the case of rather new compounds which have nontraditional structures and/or belong to new chemical classes. Like any other ligand-based design approach, PASS cannot predict the affinity for new targets, but even in that case PASS points to possible side effects which may also prevent the application of a drug [...]

Table 1. Functional Groups Describing Nondrug Compounds (a)
(a) Minimum frequency of a certain functional group is indicated in parentheses; in all other cases it is 1. Compounds with MW < 150 were also classified as nondrugs.

Figure 1.
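As an illustration of the multilevel neighborhoods of atoms (MNA) descriptors discussed under Chemical Structure Description, the recursive definition (a zero-level descriptor is the atom itself; each next-level descriptor is A(D1D2...) built from the previous-level descriptors of the neighbors) can be sketched in Python. This is a minimal sketch and not the PASS implementation: the function name `mna_descriptors`, the graph encoding, the neopentane test molecule, and the alphabetical ordering of neighbor descriptors are assumptions made here for illustration. The ordering rule used by PASS is not stated in this paper.

```python
from collections import defaultdict


def mna_descriptors(atoms, bonds, max_level):
    """Return a list of dicts: levels[k][atom_id] is the level-k descriptor.

    atoms: dict mapping atom id -> element symbol (hydrogens listed explicitly).
    bonds: iterable of (atom_id, atom_id) pairs; bond types are ignored.
    """
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Level 0: the descriptor is the atom itself.
    levels = [dict(atoms)]
    for _ in range(max_level):
        previous = levels[-1]
        current = {}
        for atom_id, symbol in atoms.items():
            # Assumption: sort neighbor descriptors so the string does not
            # depend on the order in which the bonds were listed.
            parts = sorted(previous[n] for n in neighbors[atom_id])
            current[atom_id] = f"{symbol}({''.join(parts)})"
        levels.append(current)
    return levels


# Hypothetical example: a carbon bonded to four methyl groups (neopentane).
atoms = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C"}
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
next_id = 5
for methyl in (1, 2, 3, 4):
    for _ in range(3):
        atoms[next_id] = "H"
        bonds.append((methyl, next_id))
        next_id += 1

levels = mna_descriptors(atoms, bonds, max_level=2)
print(levels[0][0])  # C
print(levels[1][0])  # C(CCCC)
print(levels[2][0])  # C(C(CHHH)C(CHHH)C(CHHH)C(CHHH))
```

For the central carbon the first two descriptors take the same “C” and “C(CCCC)” forms quoted in the text. The order of symbols inside the parentheses follows the sorting assumption above and may differ from the notation of the paper, whose example lists H before C.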
Besides this SAR-base available in PASS, it is also possible to create other SAR-bases or to enlarge it.

Activities Description. In this work, the investigated activity is “drug”, so the compounds from WDI and the Cipsline DB were described as drugs and the compounds from the other [...]

Chemical Structure Description. We described in detail the substructure descriptors called “multilevel neighborhoods of atoms” (MNA) in a recently published paper.24 MNA descriptors of a molecule are based on the 2D representation of its structure. According to the valences and partial charges of the atoms, hydrogens are included, whereas bond types are not explicitly specified. An MNA descriptor set is subdivided into levels and generated recursively. A zero-level MNA descriptor describes the atom itself. Any next-level MNA descriptor is the substructure notation A(D1D2...), where A is the atom descriptor and Di is the previous-level MNA descriptor of the ith neighbor atom of atom A. For example, for carbon(3) in Figure 1, the MNA descriptors are as follows: first, “C”; second, “C(CCCC)”; third, “C(C(HHHC)C(HHHC)C(HHCN)C(HH- [...]

Different stereoisomers of a molecule have identical MNA descriptors and are considered equivalent molecules in PASS. The use of MNA descriptors in PASS for prediction is described in the Appendix. In the present version of PASS, MNA descriptors up to the second level are used.

Databases Used for the Training and Evaluation of PASS. To compare the ability of PASS to discriminate drug-like compounds from nondrugs with the recently published results of Sadowski and Kubinyi,3 we used the same subsets of WDI and ACD compounds for the training of PASS. These subsets include 5000 compounds from WDI (“drugs”) and 5000 compounds from ACD (“nondrugs”). This data set was also [...]

To evaluate the method we prepared several test sets. As a sample of drug compounds we extracted two data sets from the Cipsline database,25 which is a subset of MDDR.16 The first subset includes all launched, registered, and investigated compounds (LRID). At the second stage, in order to focus on real drug compounds, we extracted the subset of Cipsline with only launched and registered compounds (LR). If the PASS approach could provide a reasonable discrimination between drugs/nondrugs, the expected results should be better for the [...]

As an example of a nondrug data set, we prepared 9737 compounds (ND) from a supplier database of approximately 57 000 commercially available compounds. A compound was identified as a nondrug by the analysis of 60 different functional groups/fragments. Most of them are reactive groups, which are unfavorable for drugs. Some examples of such groups are [...] In addition, all compounds with a molecular weight of less than 150 Da were classified as nondrugs.

As an independent evaluation set of drugs (TOP-100), we use a list of top-100 prescription pharmaceuticals26 (Table 2). Twelve of these entries are biopolymers and were not included [...]

Table 2. Evaluation Set Based on the List of Top-100 Drugs (a)
(a) Pa scores representing the probability of belonging to this therapeutic class.

Computation Time. The calculation time on a PC (Pentium 2; 300 MHz; 128 MB RAM) for the prediction of one compound is 4 ms, which demonstrates the ability of PASS to handle huge data sets, as they are used, for example, in the analysis of virtual libraries or supplier databases.

Results and Discussion

Training of PASS. The results of a leave-one-out (LOO) cross-validation, which characterizes the quality of the obtained structure-property relationships, are shown in Table 4, no. 1. The quality of the prediction is described by the percentage of false classifications. During model building (including LOO cross-validation), the quality is expressed as the maximal error of prediction (MEP). The mean accuracy of prediction in the LOO cross-validation is about 80%, which is slightly less than in the current version of PASS applied to the prediction of biological activity spectra, but which is still satisfactory for discriminating between drugs and nondrugs. Such accuracy of prediction is comparable to the results obtained by Sadowski and Kubinyi.3
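The error measure used during training can be illustrated with a small sketch. According to the Appendix, a cutting point CPj* is chosen for each activity so that the estimates of the first and second kind of error are equal, and MEP is reported at that point. The Python below is a simplified illustration under stated assumptions, not the PASS code: the function names, the toy scores, and the reading of a second-kind error as "inactive but scored at or above the cutting point" are assumptions, since the corresponding Appendix formulas are incomplete in this text.

```python
def error_rates(active_scores, inactive_scores, cp):
    """Fractions of first- and second-kind errors at cutting point cp."""
    # First kind: the compound is actually active but scores below cp.
    ef = sum(s < cp for s in active_scores) / len(active_scores)
    # Second kind (assumed): the compound is inactive but scores at or above cp.
    es = sum(s >= cp for s in inactive_scores) / len(inactive_scores)
    return ef, es


def balanced_cutting_point(active_scores, inactive_scores):
    """Scan the observed scores; return (cp, ef, es) where |ef - es| is smallest."""
    candidates = sorted(set(active_scores) | set(inactive_scores))

    def imbalance(cp):
        ef, es = error_rates(active_scores, inactive_scores, cp)
        return abs(ef - es)

    best = min(candidates, key=imbalance)
    ef, es = error_rates(active_scores, inactive_scores, best)
    return best, ef, es


# Hypothetical leave-one-out scores for four drugs and four nondrugs.
drug_scores = [0.9, 0.8, 0.7, 0.4]
nondrug_scores = [0.1, 0.2, 0.3, 0.6]

cp, ef, es = balanced_cutting_point(drug_scores, nondrug_scores)
print(cp, ef, es)  # 0.6 0.25 0.25
```

On real data an exact equality may not exist among the observed scores, which is why the sketch minimizes the difference between the two error fractions instead of solving for it.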
Evaluation of PASS vs “Drugs”. Formally the first test set (LRID) includes 7468 presumed drug compounds. Their structures were checked for presence in the training set, yielding 632 compounds. These compounds were eliminated from the test set, as were 688 compounds which had no connection table fields or had errors in structural formulas (invalid compounds). After filtering, the final test set contained 6148 compounds. A total of 4514 (73.4%) compounds were predicted as drugs and 1634 (26.6%) compounds as nondrugs (Table 4; no. 2).

Figure 2. Distribution of predicted scores Pa for drugs (black) and nondrugs (white): a, WDI/ACD training set and LRID test set (Table 4, no. 2); b, WDI/ACD training set and LR test set (Table 4, no. 3); c, WDI/ACD training set and ND test set (Table 4, no. 4); d, WDI/ACD training set and TOP-100 test set (Table 4, no. 5); e, LR/ND training set and TOP-100 test set (Table 4, no.

Table 3. Entries Excluded from Evaluation (Biologicals)

Table 4. Quality of Discriminating between Drugs and Nondrugs by Different Methods
(a) LOO c-v: leave-one-out cross-validation. (b) MEP: maximal error of prediction in LOO cross-validation.

There exist no independent criteria to be sure that some compounds predicted as nondrugs will not become drugs in the future; therefore we eliminated all the investigated compounds from the LRID set. The remaining 1184 compounds were launched and registered
compounds (LR) and represent real drugs. Their molecular structures were again checked for presence in the training set (111 compounds), and 208 compounds were removed as being invalid. A total of 864 structures were calculated, and 678 (78.5%) compounds were predicted as drugs and 186 (21.5%) as nondrugs (Table 4; no. 3). It is obvious that the fraction of compounds classified as drugs is higher in comparison with the first test set. This can be explained by a more objective definition of drug and nondrug for the second test set, which provides better recognition of real drugs from the [...]

Evaluation of PASS vs “Nondrugs”. The third evaluation set (ND) included 9737 compounds from different sources carefully selected as nondrugs according to the criteria discussed above. After the same filtering procedure, 9484 compounds were left for prediction. A total of 7950 compounds (83.8%) were predicted as nondrugs and 1534 (16.2%) compounds as drugs (Table 4; no. 4). These results show that cleaning of the test set gave a higher prediction accuracy.

Evaluation of PASS vs Drugs from the Top-100 List. As we supposed that most of the drugs from this list may also be included in the WDI set, all predictions were carried out with the equivalent compounds excluded from the training set. For the 88 compounds remaining from the list of top-100 prescription pharmaceuticals, 77 compounds (87.5%) were predicted as drugs and 11 (12.5%) were predicted as nondrugs (Table 4; no. 5).

Evaluation of PASS with the Cleaned Training Set. It was interesting to see if the cleaning of the training set could also increase the accuracy of the PASS prediction. Therefore, we trained PASS with a new drug/nondrug SAR-base represented by the test sets LR and ND. The results of the LOO cross-validation are listed in Table 4; no. 10. It is obvious that the accuracy of prediction is about 90%. That is significantly higher than in the WDI/ACD training procedure used in the [...]

The results of prediction for the 88 compounds from the list of top-100 prescription pharmaceuticals were even better than in the LOO cross-validation. A total of 84 compounds (95.5%) were predicted as drugs, while only four compounds (4.5%) were predicted as nondrugs.

In Figure 2 the distributions of the numbers of drugs/nondrugs predicted with different training and test sets are presented versus the value of the PASS score Pa, which represents the estimated probability of a compound belonging to the class of “drugs”. It is clear that the discriminating ability of PASS is significantly higher in the case of the cleaned training set, as was clearly demonstrated for the test set of the top-100 prescription pharmaceuticals.

Conclusions

The discrimination between drug and nondrug faces three problems: (i) databases that are not well-defined, (ii) the choice of a method to discriminate, and (iii) the selection of appropriate descriptors.

The widely used databases for the discrimination between drugs and nondrugs are relatively noisy: some compounds assigned as drugs are nondrugs in reality and vice versa. Since this problem lies in the nature of the complex term “drug-likeness”, there seems to be no simple way to overcome the underlying problem.

Our experiments provide evidence that information-guided selection of the data sets gives higher accuracy in discrimination between the classes of drug-like compounds and nondrugs. The high value of prediction accuracy shows that the chemical descriptors and algorithms used in PASS provide highly robust structure-activity relationships and reliable predictions on this basis. Compared to other methods applied in the field, the direct benchmark undertaken with this paper showed that the results obtained with PASS are in good accordance with these approaches.

Since no specific adaptation of the prediction scheme implemented in the PASS program was required, the advantage of the PASS approach lies in the fact that only two annotated data pools for drug and nondrug cases are necessary to allow a reliable prediction of discrimination of given features. So the PASS methodology opens the door to including more specific drug information in order to get a more specific discrimination. This may also be extended to physical-chemical properties as well as the interplay of those properties with dedicated pharmacological properties.

Acknowledgment. We are sincerely grateful to Jens Sadowski (AstraZeneca) for providing us with subsets from WDI and ACD, which were used as an initial [...]

Appendix: Mathematical Method

n is the total amount of compounds in the training set.
n_i is the amount of compounds containing descriptor i.
n_j is the amount of compounds revealing activity j.
n_ij is the amount of compounds containing descriptor i and revealing activity j.
n_j/n is the estimate of the a priori probability of the activity j.
n_ij/n_i is the estimate of the conditional probability of the activity j for the descriptor i.
m is the number of descriptors for the compound under prediction.
[...] 0.5/m) is the regulating factor.
Pr_j is the initial estimate of the probability of the activity j for the compound under prediction.

LOO is the leave-one-out procedure: for each compound in the training set, the values n, n_i, n_j, n_ij are changed to n - 1, n_i - 1, and n_j - 1, n_ij - 1 when the compound is active, and the estimates Pr_j are calculated.

For the compound under prediction, the structure descriptors are generated. For each activity, the following values are calculated:

[...] s_0j)/(1 - s_j s_0j))/2

For each compound in the training set, the LOO [...]

EF_j(CP) is the estimate of the first kind of error [...]
ES_j(CP) is the estimate of the second kind of error [...]

The first kind of error is fixed when the compound under prediction actually is active but Pr < [...]
The second kind of error is fixed when the compound [...]

For each activity, the estimates of EF_j(CP) and ES_j(CP) [...]

The cutting points CP_j*, which give the equality EF_j(CP_j*) = ES_j(CP_j*), are calculated.

The maximal error of prediction MEP is as follows: [...] EF_j(CP_j*) = ES_j(CP_j*) [...]

The probability to be active is Pa_j [...]
The probability to be inactive is Pi_j [...]

Pa (Pi) can be considered as the probability of the first (second) kind of errors for the compound under prediction or as the probability of the compound belonging to classes of active (inactive) compounds, [...]

References

(1) Cummins, D. J.; Andrews, C. W.; Bentley, J. A.; Cory, M. Molecular Diversity in Chemical Databases: Comparison of Medicinal Chemistry Knowledge Bases and Databases of Commercially Available Compounds. J. Chem. Inf. Comput. Sci. 1996, 36, 750-763.
(2) Ajay; Walters, W. P.; Murcko, M. A. Can We Learn To Distinguish between “Drug-like” and “Nondrug-like” Molecules? J. Med. Chem. 1998, 41, 3314-3324.
(3) Sadowski, J.; Kubinyi, H. A Scoring Scheme for Discriminating between Drugs and Nondrugs. J. Med. Chem. 1998, 41, 3325-
(4) Gillett, V. J.; Willett, P.; Bradshaw, J. Identification of Biological Activity Profiles Using Substructural Analysis and Genetic Algorithms. J. Chem. Inf. Comput. Sci. 1998, 38, 165-179.
(5) Ghose, A. K.; Viswanadhan, V. N.; Wendolowski, J. J. A Knowledge-Based Approach in Designing Combinatorial and Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases. J. Comb. Chem. 1999, 1, 55-68.
(6) Blake, J. F. Chemoinformatics - predicting the physicochemical properties of “drug-like” molecules. Curr. Opin. Biotechnol. 2000,
(7) Teague, S. J.; Davis, A. M.; Leeson, P. D.; Oprea, T. The design of leadlike combinatorial libraries. Angew. Chem., Int. Ed. 1999,
(8) Oprea, T. I. Property distribution of drug-related chemical databases. J. Comput.-Aided Mol. Des. 2000, 14, 251-264.
(9) Wagener, M.; van Geerestein, V. J. Potential drugs and nondrugs: Prediction and identification of important structural features. J. Chem. Inf. Comput. Sci. 2000, 40, 280-292.
(10) Clark, D. E.; Picket, S. D. Computational methods for the prediction of “drug-likeness”. Drug Discovery Today 2000, 5, 49-
(11) Frimurer, Th.; Bywater, R.; Naerum, L.; Lauritsen, L. N.; Brunak, S. Improving the odds in discriminating “drug-like” from “non drug-like” compounds. J. Chem. Inf. Comput. Sci. 2000, 40, 1315-1324.
(12) Ajay; Bemis, G. W.; Murcko, M. A. Designing Libraries with CNS Activity. J. Med. Chem. 1999, 42, 4942-4951.
(13) ACD: Available Chemicals Directory, Version 2/97, MDL Infor-
(14) CMC: Comprehensive Medicinal Chemistry, Version 1/97, MDL
(15) WDI: World Drug Index, Version 2/96; Derwent Information,
(16) MDDR: MDL Drug Report, Version 2/97; MDL Information
(17) Walters, W. P.; Ajay; Murcko, M. A. Recognizing Molecules with Drug-Like Properties. Curr. Opin. Chem. Biol. 1999, 3, 384-
(18) Filimonov, D. A.; Poroikov, V. V.; Karaicheva, E. I.; Kazaryan, R. K.; Boudunova, A. P.; Mikhailovsky, E. M.; Rudnitskih, A. V.; Goncharenko, L. V.; Burov, Yu. V. Computer-Aided Prediction of Biological Activity Spectra of Chemical Substances on the Basis of Their Structural Formulae: Computerized System PASS. Exp. Clin. Pharmacol. (Rus) 1995, 58, 56-62.
(19) Filimonov, D. A.; Poroikov, V. V. PASS: Computerized prediction of biological activity spectra for chemical substances. In Bioactive Compound Design: Possibilities for Industrial Use; BIOS Scientific Publishers: Oxford, 1996, 47-56.
(20) Poroikov, V. V.; Filimonov, D. A.; Stepanchikova, A. V.; Boudunova, A. P.; Shilova, E. V.; Rudnitskih, A. V.; Selezneva, T. M.; Goncharenko, L. V. Optimization of synthesis and pharmacological testing of new compounds based on computerized prediction of their biological activity spectra. Chim.-Pharm. J. (Rus) 1996, 30, 20-23.
(21) Web site:
(22) Poroikov, V. V.; Filimonov, D. A.; Boudunova, A. P. Comparison of the Results of Prediction of the Spectra of Biological Activity of Chemical Compounds by Experts and the PASS System. Autom. Doc. Math. Linguist. 1993, 27, 40-43.
(23) Website:
(24) Filimonov, D. A.; Poroikov, V. V.; Borodina, Y.; Gloriozova, T. Chemical Similarity Assessment through Multilevel Neighborhoods of Atoms: Definition and Comparison with the Other Descriptors. J. Chem. Inf. Comput. Sci. 1999, 39, 666-670.
(25) Cipsline, Correlates in Pharmacostructures Online, Version
(26) Pharma Business 1996, July/August, 18-53.

