The Nicholls paper [1] examined how to evaluate two essential aspects of virtual screening: experimental design and performance metrics. First, the author claimed that any prediction study should be evaluated on four elements:
- whether it is well designed
- what metrics are used for evaluation
- significance, or error analysis
- whether the results are particular or general
He pointed out that virtual screening research has not adequately considered any of these issues, but he expected the situation to improve as researchers become better educated about them.

The author then discussed the evaluation of experimental design. He first addressed how to choose decoys, an intensive property of the experiment, when designing a retrospective screen. He listed four types of decoy sets: universal, drug-like, mimetic, and modeled. Universal decoys are any compounds available to be physically screened; drug-like decoys are restricted to drug-like compounds; mimetic decoys are chosen to resemble the known actives; and modeled decoys are specific to the methods and models under test. He suggested using a decoy set that contains all of these categories, with each decoy carefully labeled by its intent.

For extensive properties, the author made several points based on quantitative analysis:
- when calculating properties of a single system, the number of actives is important, but the number of inactives is not
- when comparing one method against others at a 95% confidence level, the number of systems required is very large
- the number of actives per target does not need to be large
He then discussed the importance of studying correlations between methods.

He discussed several metrics for evaluating performance: the ROC curve, its AUC, and BEDROC. He showed that none of these metrics perfectly solves the early recognition problem. However, he also pointed out that BEDROC is a better metric than the others.

1. Nicholls, A. What do we know and when do we know it? Journal of Computer-Aided Molecular Design 22, 239-255 (2008).
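
To make the AUC and BEDROC metrics mentioned in the summary concrete, the following is a minimal sketch in plain Python. It is not taken from the paper: the function names `roc_auc` and `bedroc` are hypothetical, the BEDROC normalization follows the commonly used formulation of Truchon and Bayly, and the default `alpha=20.0` is only an assumed, conventional choice of the early-recognition weight. Higher scores are assumed to mean "more likely active".

```python
import math


def roc_auc(scores, labels):
    """AUC as the probability that a random active outscores a random decoy.

    Ties between an active and a decoy count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def bedroc(scores, labels, alpha=20.0):
    """BEDROC: exponentially weighted early-recognition score, scaled to [0, 1].

    alpha controls how strongly top ranks are weighted (assumed default: 20).
    """
    # Rank compounds from best to worst score (rank 1 = top of the list).
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_total = len(ranked)
    n_act = sum(1 for _, y in ranked if y)
    ratio = n_act / n_total

    # Robust initial enhancement (RIE): weighted sum over the actives' ranks,
    # divided by its expectation for a random ordering.
    sum_exp = sum(
        math.exp(-alpha * (i + 1) / n_total)
        for i, (_, y) in enumerate(ranked)
        if y
    )
    random_mean = (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1) / n_total
    rie = sum_exp / (n_act * random_mean)

    # Rescale RIE between its worst-case and best-case values.
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


# Toy example: six compounds, two actives (label 1), four decoys (label 0).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
toy_bedroc = bedroc(toy_scores, toy_labels)
```

Note that `bedroc` carries a free parameter (`alpha`) while `roc_auc` does not, so BEDROC values are only comparable between studies that use the same `alpha`.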