X-Informatics

Finding Structures in the Unstructured

Archive for the ‘Uncategorized’ Category

Experience from mothers can be passed to offspring

A mother’s experience before pregnancy can be passed on to her offspring, researchers have found in mice. Basically, the researchers knocked out certain genes (Ras-GRF-2) to give the mice a memory defect. They then put the mice in an “enriched environment”, where they received a shock to their feet. The offspring of these enriched knockout mice were found to associate the electric shock with the cage.

More details at: “Can experiences be passed on to offspring?”, New Scientist, 06 February 2009

Written by djiao

February 6, 2009 at 11:54 am

Posted in Uncategorized

Paper summary: "Do Structurally Similar Molecules Have Similar Biological Activity?"

The Martin paper [1] discusses whether similarity of chemical structure implies similarity of biological activity. The paper first reviews the authors’ earlier findings, which had led them to conclude that, when designing diverse combinatorial libraries, they would not consider a compound that is >= 0.85 similar to a compound already in their collection. However, other studies produced results that contradict this. In this study, the authors used different methods to show how frequently molecules that are similar in structure have similar biological activities, and whether their previous conclusions should be revised. They used an IC50 dataset and the MAO dataset to quantify the fraction of structurally similar compounds that also have similar biological activities. They also considered biases in their data, such as not including all active compounds and the number of bits in the fingerprints. The results showed that at >= 0.85 Tanimoto similarity on Daylight fingerprints, only 30% of compounds similar to an active compound are themselves active. They concluded that strategies for compound acquisition need to be redesigned: not all compounds with a similarity over 0.85 should be excluded, and a small portion of them should be included in the libraries. The paper reviews a lot of related research and clearly demonstrates that perfect results cannot be expected given the statistical nature of library design. However, it does not quantify what percentage of the compounds with >= 0.85 similarity should be considered, or how they should be selected.

1. Martin, Y.C. et al, “Do Structurally Similar Molecules Have Similar Biological Activity?”, J. Med. Chem., 2002, 45, 4350-4358
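To make the 0.85 cutoff concrete, here is a minimal sketch of the Tanimoto coefficient on binary fingerprints. Each fingerprint is represented as a Python set of “on” bit positions. The bit positions below are invented for illustration and are not real Daylight fingerprints, and the function name `tanimoto` is just a label chosen for this example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical fingerprints (bit positions invented for this example).
fp_query = set(range(20))
fp_close = set(range(18))   # shares 18 of the 20 query bits
fp_far = {0, 1, 30, 31}     # shares only 2 bits with the query

THRESHOLD = 0.85  # the similarity cutoff discussed in the paper

for name, fp in [("close", fp_close), ("far", fp_far)]:
    sim = tanimoto(fp_query, fp)
    label = "similar" if sim >= THRESHOLD else "not similar"
    print(name, round(sim, 3), label)
```

Under the paper's finding, a compound that passes this cutoff against an active compound would still only have roughly a 30% chance of being active itself.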

Written by djiao

January 16, 2009 at 1:20 pm

Posted in Uncategorized

Paper review: "Using Reaction Mechanism to Measure Enzyme Similarity"

The O’Boyle paper [1] described novel quantitative methods for measuring the similarity of reactions based on their mechanisms. The first method was based on fingerprints (FP) calculated from a list of 58 features that describe the mechanism steps. After the steps were encoded using these features and normalized, they could be compared using a similarity measure. The second method was based on the bond changes (BC) that occur in a mechanism step, with similarity measured by the Tanimoto coefficient. The similarity between two reactions was calculated by pairwise alignment of the two mechanisms based on their steps. The two methods showed reasonable overlap in finding the most similar enzyme reactions when applied to the MACiE dataset. Compared to the Enzyme Classification (EC), these methods were able to find similarities between reaction mechanisms that appear in different EC classes, as illustrated by a few examples given in the paper. The paper clearly discussed the implications of both the FP and BC methods, explained the features used in the FP method, and described how they can classify chemical reaction mechanisms, as well as which features are specific to enzyme-catalyzed reactions. I think one interesting discussion could be whether the number of steps (or the complexity) of a mechanism plays a role in the similarity measured by the two methods. Another extension of this paper might be to use these two methods, or especially the features in the FP method, to cluster the enzymes in the MACiE database and compare the result to the EC classification.

1. O’Boyle, N.M. et al, “Using Reaction Mechanism to Measure Enzyme Similarity”, J. Mol. Bio., 2007, 368, 1484-1499 (10.1016/j.jmb.2007.02.065)
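As a rough illustration of the bond-change (BC) idea, the sketch below scores two mechanism steps with the Tanimoto coefficient over sets of bond changes, then aligns two mechanisms step by step with a simple global dynamic-programming alignment. The bond-change labels, the zero score for skipping a step, and the normalization by the longer mechanism are all assumptions made for this example, not the scoring scheme used in the paper.

```python
def step_similarity(step_a, step_b):
    """Tanimoto coefficient between two steps, each a set of bond-change labels."""
    union = len(step_a | step_b)
    return len(step_a & step_b) / union if union else 0.0


def mechanism_similarity(mech_a, mech_b):
    """Align two mechanisms (lists of steps) and return a score in [0, 1].

    Global alignment by dynamic programming; skipping a step scores 0
    (an assumption made for this sketch).
    """
    n, m = len(mech_a), len(mech_b)
    if n == 0 or m == 0:
        return 0.0
    # best[i][j] = best total similarity when aligning the first i and first j steps
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + step_similarity(mech_a[i - 1], mech_b[j - 1])
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])
    return best[n][m] / max(n, m)


# Hypothetical mechanisms: each step is a set of invented bond-change labels.
mech_1 = [{"C-O formed", "O-H cleaved"}, {"C-N cleaved"}]
mech_2 = [{"C-O formed", "O-H cleaved"}, {"N-H formed"}, {"C-N cleaved"}]

print(round(mechanism_similarity(mech_1, mech_2), 3))
```

In this sketch the normalization decides how strongly a difference in the number of steps is penalized, which is exactly the question raised above about mechanism complexity.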

Written by djiao

January 16, 2009 at 1:16 pm

Posted in Uncategorized

Paper Summary: "Discriminating between drugs and nondrugs by prediction of activity spectra for substances (PASS)"

The Anzali paper [1] uses the PASS (prediction of activity spectra for substances) system to discriminate between drugs and nondrugs. They used a subset of the World Drug Index (WDI) and the Available Chemicals Directory (ACD) as their training set. They first built biological activity spectra based on SAR analysis of a large dataset (> 30k compounds). They then used 5k compounds from WDI and 5k compounds from ACD to train the prediction model based on the biospectra. The method was evaluated using a test set from MDDR. The accuracy of their prediction model is 80% when tested with leave-one-out (LOO). When tested with a dataset of all drugs, the accuracy is 73.4%, or 78.5% when the drugs being investigated are removed from the testing set. When tested with nondrugs, the accuracy is 83.8%. The authors also showed that, when tested with drugs from the top-100 prescription pharmaceuticals, 87.5% were correctly predicted.

1. Anzali, S. et al. Discriminating between Drugs and Nondrugs by Prediction of Activity Spectra for Substances (PASS). J. Med. Chem. 44, 2432-2437 (2001).

Written by djiao

November 11, 2008 at 12:31 pm

Posted in Uncategorized

Paper summary: "Biological spectra analysis: Linking biological activity profiles to molecular structure"

The Fliri paper [1] discussed a molecular property descriptor that uses biological activity profiles. The descriptor is obtained from the percent inhibition values at a single drug concentration in bioassays that represent a cross section of the proteome, or, as the authors name it, a biological spectrum. The authors used the percent inhibition values of 1567 structurally diverse compounds in 92 ligand-binding assays from the BioPrint database to construct the spectra. They then used cosine correlation to calculate the biospectra similarities between compounds. Using the biospectra similarities, they searched the dataset with clotrimazole and tioconazole as entry points. The two compounds were chosen because their target was not among the 92 assays in the biospectra. Both searches returned structurally similar compounds when the similarity ranking was over 0.8. They also compared spectra by hierarchically clustering the dataset using confidence in cluster similarity (CCS). This also identified structurally similar compounds when CCS values were over 0.8. The authors also used the structural information of molecules to predict the biospectra. They added four compounds that were not in the 1567-compound dataset and reclustered the new dataset of 1571 compounds. By comparing the structures of the compounds in the clusters where the four new compounds were placed, the authors showed that biospectra can be very precise in describing molecular properties and are capable of differentiating molecules on the basis of single atom pair differences. I think activity cliffs could also be identified by comparing the biospectra of compounds. However, I am not very clear on how to build such biospectra for new compounds without the assay data.

1. Fliri, A. F., et al., “Biological Spectra Analysis: Linking Biological Activity Profiles to Molecular Structure”, Proc. Natl. Acad. Sci., 2005, 102, 261-266
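A small sketch of the biospectra comparison may help. It treats a biospectrum as a plain list of percent inhibition values, one per assay, and reads “cosine correlation” as the ordinary cosine of the angle between two such vectors, which is an assumption on my part. The five-assay spectra below are invented numbers, not BioPrint data.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two equal-length biospectra."""
    if len(spec_a) != len(spec_b):
        raise ValueError("biospectra must cover the same assays")
    dot = sum(x * y for x, y in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(x * x for x in spec_a))
    norm_b = math.sqrt(sum(y * y for y in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero spectrum
    return dot / (norm_a * norm_b)


# Hypothetical percent inhibition values across five assays.
query = [90, 10, 5, 80, 0]
library = {
    "compound_b": [85, 15, 0, 75, 5],
    "compound_c": [0, 95, 60, 5, 70],
}

# Rank library compounds by biospectra similarity to the query.
ranked = sorted(library, key=lambda name: cosine_similarity(query, library[name]), reverse=True)
for name in ranked:
    print(name, round(cosine_similarity(query, library[name]), 3))
```

With the 0.8 cutoff mentioned above, only the first compound in this toy library would be returned as a hit for the query.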

Written by djiao

November 10, 2008 at 5:34 pm

Posted in Uncategorized

Fibonacci numbers (Recognizing patterns from data)

Looking at this picture, what do you see? A plant?

[Picture from reference 1]

How about now? A tree with the number of branches at each level: 1, 2, 3, 5, 8, 13, ...

[Picture from reference 1]

Those are Fibonacci numbers. Can we model it differently? How about a tree structure [2]?


b
|
a
_|_
a b
_| \
a b a
_| | |_
a b a a b
_| | |_ |_ \
a b a a b a b a

Or, more precisely, a model described by the following [2]:


Initial State: b
b -> a
a -> ab
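The rewrite rules can be run directly. The sketch below applies them to every symbol of the current string in parallel and prints the length of the string at each generation. The names `RULES` and `grow` are just labels picked for this example.

```python
RULES = {"b": "a", "a": "ab"}  # the two rewrite rules of the model


def grow(state, generations):
    """Yield the string at each generation, starting from `state`."""
    for _ in range(generations):
        yield state
        # Rewrite every symbol at once to get the next generation.
        state = "".join(RULES[symbol] for symbol in state)


levels = list(grow("b", 8))
print(levels[:5])                        # ['b', 'a', 'ab', 'aba', 'abaab']
print([len(level) for level in levels])  # [1, 1, 2, 3, 5, 8, 13, 21]
```

Each string has as many symbols as the tree has branches at that level, so the lengths follow the Fibonacci sequence even though nothing in the rules mentions it.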

At what level exactly did we extract knowledge? I think the picture is our data, and the numbers, or the tree, are information. When we built the model, or when we recognized that the numbers are Fibonacci numbers, that is when we extracted knowledge: “the number of branches at any stage of the sneezewort plant's growth is a Fibonacci number, and can be represented by such a model”. However, even when we built the model, we might still not know that it generates Fibonacci numbers. That, as I understand it, shows that knowledge has different levels. The Fibonacci numbers may represent something we haven’t found out yet, or there may be patterns hidden in these numbers, similar to what we saw in the movie Pi. By the way, the plant does exist in nature; it’s called sneezewort.

Reference:
1. Fibonacci Numbers in Nature. at http://britton.disted.camosun.bc.ca/fibslide/jbfibslide.htm
2. i501_lecture7_slides.pdf. at http://www.informatics.indiana.edu/rocha/i501/pdfs/i501_lecture7_slides.pdf

Written by djiao

November 10, 2008 at 2:08 pm

Posted in Uncategorized

Paper Summary: "Image-based multivariate profiling of drug responses from single cells"

The Loo paper [1] presented a drug response profiling method that addresses three challenges in the data analysis of high-content screening (HCS): 1. transforming HCS measurement data into multivariate profiles that are interpretable; 2. feature selection and reduction; 3. determining the dosage range and multiphasic drug responses. In their approach, they used a support vector machine (SVM) with a linear kernel function to determine a hyperplane between the treated and control distributions of the measured data extracted from images. The unit vector and the hyperplanes are used as compound dosage profiles. They then used an SVM recursive feature elimination (SVMRFE) algorithm to remove features that are redundant and less interpretable. Next they extracted dosage profiles (d-profiles) based on titration clustering, which can reveal multiphasic drug effects. Finally, they used the d-profiles in applications such as drug screening, phenotypic change detection and category prediction.

Written by djiao

November 5, 2008 at 12:11 am

Posted in Uncategorized

Paper Summary: "Integrating high-content screening and ligand-target prediction to identify mechanism of action"

The Young paper [1] integrated high-content screening (HCS) predictions with similarity-based prediction methods to investigate SARs. They first addressed the following problem: how to reduce the large amount of data created by HCS and interpret the meaning of the measurements. The method they used is called factor analysis. It is based on the supposition that, in a multivariate dataset, highly correlated variables are likely to be measuring a common trait, or “factor”. They applied this method to high-content image data created by an HCS assay designed to identify compounds that affect cell proliferation. The factor analysis reduced the 36 cytological features in the dataset to 6 factors, each of which can be interpreted biologically. The authors also used these factors to create phenotypic profiles of the active ligands in the screen. They then compared clustering based on the phenotypic factors with clustering based on structural similarity, and found the correlation between the two to be statistically significant, which indicates that the molecular similarity principle can be applied to their phenotypic compound profiling. The authors then used the comparison to find activity cliffs. They compared Tanimoto similarities with phenotypic distances for each pair of compounds, and identified 4% of their active compound set as structurally similar but dissimilar in biological activity (activity cliffs). The authors then discussed a few examples of phenotypic SARs. They also looked at the correlation between the phenotypic similarity matrix and the compound target similarity matrix, which was generated from predictive models based on the WOMBAT database, and found it to be twice as statistically significant as the correlation between the phenotypic similarity matrix and the chemical structural similarity matrix. This shows the potential of phenotypic measures for target prediction.

1. Young, D.W. et al. Integrating high-content screening and ligand-target prediction to identify mechanism of action. Nat Chem Biol 4, 59-68 (2008).
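The activity-cliff comparison can be sketched as follows: for every pair of compounds, compute a structural Tanimoto similarity and a phenotypic distance, and flag the pairs that are close in structure but far apart in phenotype. The fingerprints, factor scores, the Euclidean distance, and both thresholds are assumptions made for this example and are not taken from the paper.

```python
import math
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def find_activity_cliffs(compounds, min_similarity=0.7, min_distance=2.0):
    """Return pairs of names that are structurally similar but phenotypically distant.

    `compounds` maps a name to (fingerprint set, phenotypic profile list).
    Both thresholds are hypothetical values chosen for this sketch.
    """
    cliffs = []
    for (name_a, (fp_a, prof_a)), (name_b, (fp_b, prof_b)) in combinations(compounds.items(), 2):
        similar = tanimoto(fp_a, fp_b) >= min_similarity
        distant = math.dist(prof_a, prof_b) >= min_distance
        if similar and distant:
            cliffs.append((name_a, name_b))
    return cliffs


# Hypothetical compounds: invented fingerprints and three phenotypic factor scores each.
compounds = {
    "cpd_1": ({1, 2, 3, 4, 5, 6}, [0.1, 0.2, 0.0]),
    "cpd_2": ({1, 2, 3, 4, 5, 7}, [2.5, -1.8, 0.3]),
    "cpd_3": ({10, 11, 12}, [0.2, 0.1, 0.1]),
}

print(find_activity_cliffs(compounds))
```

Here only the first two compounds form a cliff: they share most of their fingerprint bits but have very different phenotypic profiles.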

Written by djiao

November 4, 2008 at 6:46 pm

Posted in Uncategorized

Paper Summary: "Image-based chemical screening"

The Carpenter paper [1] is a review of the overall methods and individual steps for conducting a high-content screen (HCS). The author first briefly introduced the key concepts of visual assays and their development into HCS. In visual assays, cells are labeled fluorescently so that changes in their phenotype can be observed under a microscope. The recent development of automated microscopes and cell image analysis software has scaled visual assays up to a level that is compatible with HTS. The author then discussed the following HCS components in detail:
1. Assay development: One goal of assay development is to label proteins with fluorescent protein in order to probe the phenotype of interest.
2. Image acquisition and storage: Microscopy instruments have become increasingly automated, but attention is still needed for critical issues such as control of the experimental environment and image format. Image storage is another critical issue, as a large image set could require terabytes of space. The author listed a few image repository software packages, and services at image screening centers, as solutions to this issue.
3. Image analysis: This is required to extract quantitative measurements from images. Currently available commercial software can handle cell counting, protein expression, translocation and neurite outgrowth, but the results can be less reliable in certain circumstances. Open-source software is also being developed to address these issues generically, but it requires long-term development. Finally, the author pointed out that data quality, speed and flexibility, cost, and ease of use are the parameters to consider when choosing image analysis software.
4. Data analysis and exploration: The quantitative measurements can be used for statistical analysis. Two major areas still need to be addressed: how to correct biases in the data caused by experimental variation, and how to select hits from the screen. Another open issue is how to explore the data and discover the full spectrum of information, as much potentially useful data is currently discarded or left out of the analysis. Better analysis and exploration tools are needed, and methods need to be developed and improved to make the most of the useful information from HCS.

1. Carpenter, A.E. Image-based chemical screening. Nat Chem Biol 3, 461-465(2007).

Written by djiao

November 4, 2008 at 3:38 am

Posted in Uncategorized

Paper Summary: "What do we know and when do we know it?"

The Nicholls paper [1] studied how to evaluate two essential aspects of virtual screening: experimental design and performance metrics. First, the author claimed that four elements need to be evaluated for any prediction study:

  1. whether it is well designed
  2. what metrics are used for evaluation
  3. significance, or error analysis
  4. whether the results are particular or general

He pointed out that virtual screening research lacks adequate consideration of any of these issues, but he expected the situation to improve as people become more educated about them. The author then discussed the evaluation of experimental design. First he discussed how to choose decoys, which are a type of intensive property, when designing a retrospective screen. He listed four types of decoy sets: universal, drug-like, mimetic and modeled. Universal decoys are any compounds available to be physically screened; drug-like decoys are drug-like compounds; mimetic decoys are decoys chosen to resemble the actives; and modeled decoys are specific to methods and models. He suggested using a decoy set that contains all of the above categories, with careful labeling of individual intent. For extensive properties, the author first made several points based on quantitative analysis:

  1. when calculating the properties of a single system, the number of actives is important but the number of inactives is not
  2. when testing a method against other methods at a 95% confidence level, the number of systems required is very large
  3. the number of actives per target doesn’t need to be large

He then discussed the importance of studying the correlations between methods. He discussed several metrics for evaluating the performance of an experimental design: AUC, ROC, BEDROC. He showed that none of these metrics could perfectly solve the recognition problem. However, he also pointed out that BEDROC is a better metric than the others.

1. Nicholls, A. What do we know and when do we know it? Journal of Computer-Aided Molecular Design 22, 239-255 (2008).
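For readers who have not computed one of these metrics by hand, here is a minimal sketch of the area under the ROC curve (AUC) for a retrospective screen. It uses the rank interpretation of AUC: the probability that a randomly chosen active is scored higher than a randomly chosen decoy, with ties counted as half. The scores are invented, and the function name `roc_auc` is just a label for this example.

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy (ties count half)."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical screening scores (higher means "more likely active").
actives = [0.9, 0.8, 0.4]
decoys = [0.7, 0.5, 0.3, 0.2]

print(round(roc_auc(actives, decoys), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every active being ranked above every decoy. This pairwise form is quadratic in the number of compounds, which is fine for a sketch but not for a large screen.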

Written by djiao

October 29, 2008 at 12:12 pm

Posted in Uncategorized
