The great tree of life

A wonderful visualization of the tree of life

A lightweight JavaCC Newick parser

I could not easily find a lightweight Newick parser written in Java. The Newick parsers included in bioinformatics tools seem to parse into customized, complicated tree structures, so I decided to spend a few hours writing one myself.

With JavaCC, the parser takes just a few lines of code. It builds a very simple tree structure that can easily be converted into a more complex one.
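The post does not reproduce the JavaCC grammar. To illustrate what "a very simple tree structure" might look like, here is a hand-written recursive-descent sketch in plain Java that does not use JavaCC. The class names `NewickNode` and `SimpleNewickReader` are hypothetical. The sketch ignores quoted labels and `[...]` comments, so treat it as an illustration and not as the parser in the jar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written Newick reader (recursive descent, no JavaCC). */
class SimpleNewickReader {
    private final String s;
    private int pos;

    private SimpleNewickReader(String s) { this.s = s; }

    static NewickNode parse(String text) {
        SimpleNewickReader r = new SimpleNewickReader(text.trim());
        NewickNode root = r.subtree();
        r.expect(';');
        return root;
    }

    // subtree := [ "(" subtree { "," subtree } ")" ] [name] [":" length]
    private NewickNode subtree() {
        NewickNode node = new NewickNode();
        if (peek() == '(') {
            pos++;
            node.children.add(subtree());
            while (peek() == ',') {
                pos++;
                node.children.add(subtree());
            }
            expect(')');
        }
        node.name = token();
        if (peek() == ':') {
            pos++;
            node.branchLength = Double.parseDouble(token());
        }
        return node;
    }

    // Reads up to the next Newick delimiter; quoted labels are not supported.
    private String token() {
        int start = pos;
        while (pos < s.length() && "(),:;".indexOf(s.charAt(pos)) < 0) pos++;
        return s.substring(start, pos).trim();
    }

    // Skips whitespace and returns the next character ('\0' at end of input).
    private char peek() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
        return pos < s.length() ? s.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
        pos++;
    }

    public static void main(String[] args) {
        NewickNode root = parse("((A:0.1,B:0.2)AB:0.3,C:0.4)root;");
        System.out.println(root.name + " has " + root.children.size() + " children");
    }
}

/** A minimal tree node: a (possibly empty) name, an optional branch length, and children. */
class NewickNode {
    String name = "";
    Double branchLength;                 // null when ":length" is absent
    final List<NewickNode> children = new ArrayList<>();
}
```

A node holding only a name, a length and a child list can be mapped onto a richer tree class by a single recursive walk.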

A binary jar file is available here: https://bitbucket.org/djiao/a-lightweight-javacc-newick-parser/downloads/newick-1.0-SNAPSHOT.jar. I tested it with a Newick file (downloaded from http://users-birc.au.dk/zxr/phyloprof/taxomonicTreeNCBI_bin.nwk) and it worked pretty well. To run it, simply do:

> cat taxomonicTreeNCBI_bin.nwk | java -cp newick-1.0-SNAPSHOT.jar newick.NewickParser

Sources are here: https://bitbucket.org/djiao/a-lightweight-javacc-newick-parser/src

List of COGs (another example of how powerful sed is)

To get a table of COGs (Clusters of Orthologous Groups) with their category codes and names, a one-line sed command does the job:


> wget ftp://ftp.ncbi.nih.gov/pub/COG/COG/whog
> sed -n 's/\[\(.*\)\] \(COG[0-9]*\) \(.*\)/\1\t\2\t\3/p' whog > cog.tab
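The same extraction can be written in Java by reusing the sed regular expression almost verbatim. Header lines of the form `[category] COGnnnn name` are turned into tab-separated rows, and every other line is dropped (the job of `-n` plus `p`). This is a sketch, and the class name `CogTable` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CogTable {
    // Same pattern as the sed one-liner: "[category] COGnnnn name"
    private static final Pattern COG_LINE = Pattern.compile("\\[(.*)\\] (COG[0-9]*) (.*)");

    /** Returns "category<TAB>COG id<TAB>name" for a COG header line, or empty for any other line. */
    static Optional<String> toRow(String line) {
        Matcher m = COG_LINE.matcher(line);
        if (!m.find()) return Optional.empty();
        return Optional.of(m.group(1) + "\t" + m.group(2) + "\t" + m.group(3));
    }

    // Usage: java CogTable.java < whog > cog.tab
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            toRow(line).ifPresent(System.out::println);
        }
    }
}
```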

Download the table: cog.tab

KEGG organism code to NCBI taxonomy id mapping

KEGG organisms are assigned three-letter codes, e.g., “hsa” for Homo sapiens. To map these codes to NCBI taxonomy IDs, a one-line sed command is sufficient:

First, copy and paste the contents of the KEGG taxonomy page and save them to a text file, kegg_taxonomy.txt.

Then run sed:

sed -n 's/.*\[TAX\:\([0-9]*\)\] \[GN\:\([a-z]\{3\}\)\].*/\2 \1/p' kegg_taxonomy.txt > kegg2taxonomy.txt
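If the mapping is needed inside a Java program rather than as a file, the same regular expression can fill a lookup table. This is a sketch. The class name `Kegg2Taxonomy` is hypothetical, and the sample line in the test was constructed to match the regex, not copied from the KEGG page. Like the sed command, it only matches three-letter codes (widen `{3}` if your copy contains longer ones). It keeps the ID as a string because `[0-9]*` can match an empty string.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Kegg2Taxonomy {
    // Same pattern as the sed command; the leading greedy ".*" picks the last match on a line, as sed does
    private static final Pattern ENTRY = Pattern.compile(".*\\[TAX:([0-9]*)\\] \\[GN:([a-z]{3})\\]");

    /** Maps each KEGG organism code to its NCBI taxonomy ID (kept as a string). */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> codeToTaxId = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (m.find()) {
                codeToTaxId.put(m.group(2), m.group(1));
            }
        }
        return codeToTaxId;
    }

    // Usage: java Kegg2Taxonomy.java kegg_taxonomy.txt > kegg2taxonomy.txt
    public static void main(String[] args) throws IOException {
        Map<String, String> map = parse(Files.readAllLines(Path.of(args[0])));
        map.forEach((code, taxId) -> System.out.println(code + " " + taxId));
    }
}
```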

Download kegg2taxonomy.txt

Install RMySQL on Windows 7 (64bit)

RMySQL 0.9.3; MySQL 5.5.28; R 2.15.1; RTools: 2.15; Windows 7 64bit

This may not be as hard as it sounds, but I did spend a couple of hours installing RMySQL. Here are the steps:

  1. Install RTools if you haven’t already. Add “C:\Rtools\bin;C:\Rtools\gcc-4.6.3\bin” to your PATH (Environment Variables under Advanced system settings).
  2. Add “MYSQL_HOME=C:/Program Files/MySQL/MySQL Server 5.5” to your Environment Variables.
  3. Copy libmysql.dll from C:/Program Files/MySQL/MySQL Server 5.5/lib to C:/Program Files/MySQL/MySQL Server 5.5/bin.
  4. Download RMySQL: http://cran.r-project.org/web/packages/RMySQL/index.html
  5. From the command line (Windows CMD, not MSYS or Cygwin, which may use a different gcc than the one bundled with RTools), run “R CMD INSTALL RMySQL_0.9-3.tar.gz”.

FBA in context of absolute gene expression data

Flux balance analysis (FBA) [1] is based on the stoichiometric constraints of the metabolic reaction network and estimates reaction fluxes by maximizing a biological objective function, such as biomass production. However, such objectives are not always valid, since cells do not always seek to maximize their own growth, especially cells in multicellular organisms. In a recent paper published in BMC Systems Biology, Lee et al. integrate absolute gene expression data into metabolic flux prediction and improve the prediction of experimentally measured fluxes [2].

Recall that FBA is formulated as follows:
\max_{\mathbf{v}} Z \quad \text{s.t.} \quad Z = \mathbf{c}^T \cdot \mathbf{v}, \quad \mathbf{S} \cdot \mathbf{v} = \mathbf{0}, \quad \mathbf{v_l} \le \mathbf{v} \le \mathbf{v_u}
where \mathbf{S} is the stoichiometric matrix of the reconstructed metabolic network, and \mathbf{v} is the flux vector. \mathbf{v_l} and \mathbf{v_u} are the lower and upper bounds of the fluxes. Z = \mathbf{c}^T \cdot \mathbf{v} defines the objective function, where \mathbf{c} is a vector of weights selecting the reactions to optimize (e.g., the biomass reaction).
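To make the formulation concrete, here is a toy instance on a hypothetical three-reaction network: v1 (uptake, -> A), v2 (A -> B) and v3 (B -> biomass). Real FBA solves this linear program with an LP solver over continuous fluxes. This sketch instead enumerates every integer flux vector within the bounds, which only works for a tiny network. It does, however, apply exactly the constraints above: it keeps vectors with S · v = 0 and v_l ≤ v ≤ v_u, and it maximizes Z = c^T · v.

```java
import java.util.Arrays;

class ToyFba {
    // Stoichiometric matrix S: rows are metabolites (A, B), columns are reactions (v1, v2, v3).
    //   v1: -> A (uptake)    v2: A -> B    v3: B -> (biomass drain)
    static final int[][] S = { { 1, -1, 0 }, { 0, 1, -1 } };
    static final int[] C = { 0, 0, 1 };          // objective Z = c^T v picks out v3
    static final int[] LOWER = { 0, 0, 0 };      // v_l
    static final int[] UPPER = { 10, 20, 20 };   // v_u (uptake capped at 10)

    /** True if S * v = 0, i.e. every metabolite is at steady state. */
    static boolean steadyState(int[] v) {
        for (int[] row : S) {
            int net = 0;
            for (int j = 0; j < v.length; j++) net += row[j] * v[j];
            if (net != 0) return false;
        }
        return true;
    }

    /** Tries every integer v with v_l <= v <= v_u; returns the feasible v with the largest Z. */
    static int[] solve() {
        int[] v = LOWER.clone();
        int[] best = null;
        int bestZ = Integer.MIN_VALUE;
        while (true) {
            if (steadyState(v)) {
                int z = 0;
                for (int j = 0; j < v.length; j++) z += C[j] * v[j];
                if (z > bestZ) { bestZ = z; best = v.clone(); }
            }
            // Advance v like an odometer; stop after the last combination.
            int j = 0;
            while (j < v.length && v[j] == UPPER[j]) { v[j] = LOWER[j]; j++; }
            if (j == v.length) return best;
            v[j]++;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve()));
    }
}
```

Steady state forces v1 = v2 = v3, so the optimum is limited by the uptake bound: v = (10, 10, 10) with Z = 10.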

The improved FBA model proposed by Lee et al. takes absolute gene expression (measured by RNA-seq) into account. Instead of maximizing a biological objective function, it tries to make the predicted fluxes agree as closely as possible with the absolute gene expression measurements. In the revised model, the objective is to minimize:
Z = \sum_i{\frac{1}{\sigma_i}\left| v_i-d_i \right|}
where v_i is the flux of reaction i, and d_i is the reaction data obtained by mapping the gene expression data to reaction i. \sigma_i is the error in data point i, as calculated in the gene-protein-reaction mapping process. In other words, Z is a sum of absolute deviations, each weighted by 1/\sigma_i, i.e., by the confidence in the reaction data estimated from gene expression.
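The objective itself is simple to evaluate for a given flux vector. The sketch below only computes Z; it does not perform the constrained minimization. The class name and all numbers in it are made up for illustration.

```java
class ExpressionFitObjective {
    /**
     * Z = sum_i |v_i - d_i| / sigma_i: absolute deviation between predicted flux v_i and
     * expression-derived reaction data d_i, weighted by the confidence 1/sigma_i.
     */
    static double objective(double[] v, double[] d, double[] sigma) {
        if (v.length != d.length || v.length != sigma.length) {
            throw new IllegalArgumentException("v, d and sigma must have the same length");
        }
        double z = 0.0;
        for (int i = 0; i < v.length; i++) {
            z += Math.abs(v[i] - d[i]) / sigma[i];
        }
        return z;
    }

    public static void main(String[] args) {
        double[] v = { 2.0, 5.0, 1.0 };     // made-up predicted fluxes
        double[] d = { 3.0, 5.0, 0.0 };     // made-up reaction data from expression
        double[] sigma = { 0.5, 1.0, 2.0 }; // made-up errors
        System.out.println(objective(v, d, sigma)); // |2-3|/0.5 + 0 + |1-0|/2 = 2.5
    }
}
```

A small sigma (high confidence) makes a deviation expensive, so the optimizer tracks well-supported reaction data more closely. To hand this objective to an LP solver, one standard approach is to introduce an auxiliary variable t_i with t_i ≥ v_i - d_i and t_i ≥ d_i - v_i and minimize the sum of t_i / sigma_i. That is a general technique, not necessarily the exact formulation used in the paper.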

The authors showed that their method outperforms standard FBA as well as gene-expression-based FBA extensions (GIMME [4] and iMAT [3]).

References:

1. Orth, J.D., Thiele, I. & Palsson, B.Ø. (2010). What is flux balance analysis? Nature Biotechnology, 28, 245-248.

2. Lee, D., Smallbone, K., Dunn, W.B., Murabito, E., Winder, C.L., Kell, D.B., Mendes, P. & Swainston, N. (2012). Improving metabolic flux predictions using absolute gene expression data. BMC Systems Biology, 6, 73.

3. Shlomi, T., Cabili, M.N., Herrgård, M.J., Palsson, B.Ø. & Ruppin, E. (2008). Network-based prediction of human tissue-specific metabolism. Nature Biotechnology, 26, 1003-1010.

4. Becker, S.A. & Palsson, B.Ø. (2008). Context-specific metabolic networks are consistent with experiments. PLoS Computational Biology, 4, e1000082.

Import tabular data into a new Cytoscape 3 CyTable

There are several ways for a Cytoscape 3 (Cy3) app to import tabular data. Data can be imported into the default table, which works the same way as in Cytoscape 2.x. A nice feature of Cy3 is that it supports an arbitrary number of tables. Apart from the default tables, these tables do not need to use nodes or edges as their keys (although it often makes sense to do so).

Here is how to create a new table with the tabular data:

1. The CyActivator needs to pass a few useful CyTable-related services to the app class.

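import org.cytoscape.application.CyApplicationManager;
import org.cytoscape.model.CyTableFactory;
import org.cytoscape.model.CyTableManager;
import org.cytoscape.service.util.AbstractCyActivator;
import org.osgi.framework.BundleContext;

public class CyActivator extends AbstractCyActivator {
    @Override
    public void start(BundleContext bc) throws Exception {
        ...
        CyApplicationManager manager = getService(bc, CyApplicationManager.class);
        // CyTableFactory creates tables; CyTableManager registers them with Cytoscape
        CyTableFactory tableFactory = getService(bc, CyTableFactory.class);
        CyTableManager tableManager = getService(bc, CyTableManager.class);
        MyApp app = new MyApp(..., tableFactory, tableManager);
        ...
    }
}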

2. In the app class, use CyTableFactory to create a table, add your data, and then register the table with Cy3 using CyTableManager.

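import org.cytoscape.model.CyTable;
import org.cytoscape.model.CyTableFactory;
import org.cytoscape.model.CyTableManager;

public class MyApp {
    private final CyTableFactory tableFactory;
    private final CyTableManager tableManager;

    public MyApp(..., CyTableFactory tableFactory, CyTableManager tableManager) {
        this.tableFactory = tableFactory;
        this.tableManager = tableManager;
    }

    public void createTable(...) {
        ...
        // Arguments: title, primary key column name, primary key type, public?, mutable?
        CyTable table = tableFactory.createTable(...);
        // populate your table. See http://wiki.cytoscape.org/Cytoscape_3/AppDeveloper/Cytoscape_3_App_Cookbook#How_to_load_attribute_data.3F
        ...

        // Register the table with Cytoscape
        tableManager.addTable(table);
    }
}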

A nicer way is to encapsulate the table creation in a Task. Cy3 has a few classes in org.cytoscape.task.edit that can serve as good examples of using Tasks with CyTable creation and data import.
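Before the table can be populated, the app has to read the tabular data from somewhere. The post does not say which file format is used. The sketch below assumes a tab-delimited text with a header row whose first column holds the key. `TabularData` and `read` are hypothetical names, and the Cytoscape calls are deliberately left out. Each parsed row could then be copied into the CyTable, e.g. by creating columns with `CyTable.createColumn` and setting values through `table.getRow(key).set(column, value)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

class TabularData {
    /**
     * Parses tab-delimited text whose first line is a header. The first column is the row key;
     * the result maps key -> (column name -> value), preserving file order.
     */
    static Map<String, Map<String, String>> read(String text) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        String[] lines = text.split("\\R");
        if (lines.length == 0 || lines[0].isEmpty()) return rows;
        String[] header = lines[0].split("\t", -1);
        for (int r = 1; r < lines.length; r++) {
            if (lines[r].isEmpty()) continue;               // skip blank lines
            String[] fields = lines[r].split("\t", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 1; c < header.length && c < fields.length; c++) {
                row.put(header[c], fields[c]);
            }
            rows.put(fields[0], row);                       // first column is the key
        }
        return rows;
    }

    // Usage: java TabularData.java data.tsv
    public static void main(String[] args) throws IOException {
        String text = args.length > 0 ? Files.readString(Path.of(args[0]))
                                      : "gene\tscore\ngeneA\t0.5\ngeneB\t1.2\n";
        read(text).forEach((key, row) -> System.out.println(key + " -> " + row));
    }
}
```

Keeping parsing separate from table creation also makes it easy to move the parsing step into a Task later, as suggested above.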