I could not easily find a lightweight Newick parser written in Java. The Newick parsers bundled with bioinformatics tools tend to parse into their own complicated, tool-specific tree structures, so I decided to spend a few hours writing one myself.

With JavaCC, the parser is just a few lines of code. It builds a very simple tree structure which could be easily converted to any more complex tree structure.
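
To give a rough idea of what "a very simple tree structure" can look like, here is a minimal hand-written sketch. It is not the JavaCC grammar from the repository: it is a plain recursive-descent stand-in, and the names `NewickSketch` and `Node` are hypothetical, chosen only for illustration. It assumes unquoted labels and no bracketed comments.

```java
import java.util.ArrayList;
import java.util.List;

public class NewickSketch {
    /** Hypothetical minimal node: a name, a branch length, and children. */
    static class Node {
        String name = "";
        double length = Double.NaN; // NaN when no branch length is given
        final List<Node> children = new ArrayList<>();
    }

    private final String text;
    private int pos = 0;

    private NewickSketch(String text) {
        this.text = text;
    }

    /** Parses one Newick tree terminated by ';'. */
    static Node parse(String newick) {
        NewickSketch p = new NewickSketch(newick);
        Node root = p.subtree();
        p.expect(';');
        return root;
    }

    // subtree := [ '(' subtree (',' subtree)* ')' ] [name] [ ':' length ]
    private Node subtree() {
        Node node = new Node();
        if (peek() == '(') {
            pos++;
            node.children.add(subtree());
            while (peek() == ',') {
                pos++;
                node.children.add(subtree());
            }
            expect(')');
        }
        node.name = token();
        if (peek() == ':') {
            pos++;
            node.length = Double.parseDouble(token());
        }
        return node;
    }

    // Reads characters up to the next structural character.
    private String token() {
        int start = pos;
        while (pos < text.length() && "(),:;".indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.substring(start, pos).trim();
    }

    // Skips whitespace, then returns the next character without consuming it.
    private char peek() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        Node root = parse("((A:0.1,B:0.2)AB:0.3,C:0.4)root;");
        System.out.println(root.name + " has " + root.children.size() + " children");
    }
}
```

Because each node holds only a name, a length, and a list of children, converting it into a richer tree class is a single recursive walk.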

A binary jar file is here: https://bitbucket.org/djiao/a-lightweight-javacc-newick-parser/downloads/newick-1.0-SNAPSHOT.jar. I tested it with a Newick file (downloaded from http://users-birc.au.dk/zxr/phyloprof/taxomonicTreeNCBI_bin.nwk) and it worked well. To run it, pipe the file into the parser on standard input:

> cat taxomonicTreeNCBI_bin.nwk | java -cp newick-1.0-SNAPSHOT.jar newick.NewickParser

Sources are here: https://bitbucket.org/djiao/a-lightweight-javacc-newick-parser/src
