Blossoc

About Blossoc

Blossoc is a linkage disequilibrium association mapping tool. It attempts to build (perfect) genealogies for each site in the input, scores them according to non-random clustering of affected individuals, and flags high-scoring regions as likely candidates for containing disease-affecting variation.

The local genealogy trees are built using a number of heuristics. These are not guaranteed to produce the true trees, but they are extremely fast. Blossoc can therefore handle much larger datasets than more sophisticated tools, at the cost of some accuracy.
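
To give a feel for what "perfect" means here, the sketch below implements the standard four-gamete test: two binary sites can sit on the same perfect phylogeny (no recombination or recurrent mutation between them) only if at most three of the four allele combinations 00, 01, 10 and 11 occur among the haplotypes. This is a textbook condition, not Blossoc's actual tree-building code; the function names and the toy haplotypes are made up for illustration.

```python
from itertools import combinations


def sites_compatible(site_a, site_b):
    """Four-gamete test for two binary sites (one allele per haplotype)."""
    gametes = set(zip(site_a, site_b))
    return len(gametes) < 4


def incompatible_pairs(haplotypes):
    """Return index pairs of sites that fail the four-gamete test.

    `haplotypes` is a list of equal-length rows of 0/1 alleles,
    one row per haplotype and one column per site.
    """
    sites = list(zip(*haplotypes))  # transpose rows into per-site columns
    return [
        (i, j)
        for i, j in combinations(range(len(sites)), 2)
        if not sites_compatible(sites[i], sites[j])
    ]


# Toy data (hypothetical): 4 haplotypes, 3 sites.
toy_haplotypes = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
]
print(incompatible_pairs(toy_haplotypes))
```

In the toy data, sites 0 and 2 show all four gametes and are therefore reported as incompatible, while the other two pairs pass the test.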

The mapping method implemented in Blossoc is described in:

Whole genome association mapping by incompatibilities and local perfect phylogenies
T. Mailund, S. Besenbacher and M.H. Schierup
BMC Bioinformatics 2006, 7:454. doi:10.1186/1471-2105-7-454

Local phylogeny mapping of quantitative traits: Higher accuracy and better ranking than single marker association in genomewide scans
S. Besenbacher, T. Mailund and M.H. Schierup
Genetics 2009, 181:747-753. doi:10.1534/genetics.108.092643

See also

Comparison of analyses of the QTLMAS XII common dataset. II: genome-wide association and fine mapping
L. Crooks, G. Sahana, D-J de Koning, M.S. Lund and O. Carlborg
BMC Proceedings 2009, 3(Suppl 1):S2

Data modeling as a main source of discrepancies in single and multiple marker association methods
M.C. Ledur, N. Navarro and M. Perez-Enciso
BMC Proceedings 2009, 3(Suppl 1):S2

Blossoc is released under the GNU General Public License.

Installation

Blossoc is written in C++ and is available as source code (under the GNU General Public License, GPL) and as binary packages for Linux (RPM and Debian files). The source code has been successfully compiled on various Linux and UNIX systems. As I have only limited access to other architectures, I cannot at present provide binary distributions for other platforms, but if anyone is willing to build them I will be more than happy to put them on this site.

Blossoc requires the Boost libraries and the GNU Scientific Library (GSL) to be installed. The graphical user interface version additionally needs Qt 3.3, and SNPfile support needs the SNPfile library.

The most recent versions can be downloaded below; older versions are available from here.

Binary Distributions

The RPM files were built on Fedora Core 5 and the Debian files on Ubuntu Feisty Fawn. If you have any problems installing them on other Red Hat or Debian-based systems, please let me know.

Source Code Distributions

To build the source files, first uncompress and untar the file, then run 'configure' and finally 'make'. To test that the build was successful, run 'make check'. To install the program, run 'make install'.

    $ tar zxf blossoc-version.tar.gz
    $ cd blossoc-version
    $ ./configure
    $ make
    $ make check
    $ make install

Usage

You can use Blossoc in two different ways: as a command-line program or through a graphical user interface.

The command-line version

The command-line version of Blossoc is started with the command

    $ blossoc

It takes, as input, a file containing the marker positions and a file containing the genotype (or haplotype) data.

The positions file should consist of an ordered sequence of space-separated integers.

The genotypes file should contain one line per individual for unphased data and two lines per individual for phased data. Each line is a list of space-separated alleles: 0 and 1 for homozygotes and 2 for heterozygotes (2 is only allowed for unphased data). The first column is a 'pseudo'-allele used for the case/control dichotomy: a 0 in the first column means that the individual is a control and a 1 means that the individual is a case.
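
As an illustration of the two input formats, here is a minimal sketch that parses and sanity-checks them. It is not part of Blossoc: the function names and the example data are invented, and the check that each line has one allele column per marker position is an assumption of this sketch, as is reading "ordered" as ascending.

```python
def parse_positions(text):
    """Parse marker positions: space-separated integers in ascending order."""
    positions = [int(token) for token in text.split()]
    if any(a > b for a, b in zip(positions, positions[1:])):
        raise ValueError("marker positions must be in ascending order")
    return positions


def parse_genotypes(text, n_markers, phased):
    """Parse genotype lines into (status, alleles) tuples.

    The first column is the case/control status (0 = control, 1 = case);
    the remaining columns are alleles. Code 2 (heterozygote) is rejected
    when `phased` is True.
    """
    allowed = {0, 1} if phased else {0, 1, 2}
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines (an assumption of this sketch)
        fields = [int(token) for token in line.split()]
        status, alleles = fields[0], fields[1:]
        if status not in (0, 1):
            raise ValueError(f"line {line_number}: status must be 0 or 1")
        # Assumption: one allele column per marker position.
        if len(alleles) != n_markers:
            raise ValueError(f"line {line_number}: expected {n_markers} alleles")
        if not set(alleles) <= allowed:
            raise ValueError(f"line {line_number}: allele code not allowed")
        rows.append((status, alleles))
    return rows


# Made-up example: four markers, one case and one control, unphased.
example_positions = "100 250 400 900"
example_genotypes = "1 0 2 1 0\n0 1 1 2 0\n"

positions = parse_positions(example_positions)
rows = parse_genotypes(example_genotypes, len(positions), phased=False)
print(len(positions), "markers;", sum(status for status, _ in rows), "case(s)")
```

Passing phased=True for the same data raises a ValueError, because the example lines contain the heterozygote code 2.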

Run blossoc --help to get a complete list of command-line options accepted by Blossoc.

If SNPfile support is enabled, you can use the command line tool

    $ snpfile_blossoc

to analyse SNPfiles generated with the text2snpfile tool.

Run snpfile_blossoc --help to get a complete list of command-line options accepted.

The GUI version

The GUI version of Blossoc is started from your applications menu or using the command

    $ Blossoc-Qt

See the help files of the GUI version for documentation.

Example files

Simulated test examples can be downloaded below. The positions-*.*.txt files contain the marker positions, and the haplotypes-*.*.txt files contain the haplotypes.

10 cM region (or ρ = 4000)

Two datasets with 200 markers in a region corresponding to recombination rate ρ = 4000 or about 10 cM. Each contains 1000 affected and 1000 unaffected haplotypes; the disease risk for a mutant is 10%, the risk for a wildtype is 5%.

0.1 cM region (or ρ = 40)

Three datasets with 200 markers in a region corresponding to recombination rate ρ = 40 or about 0.1 cM. Each contains 1000 affected and 1000 unaffected haplotypes; the disease risk for a mutant is 10%, the risk for a wildtype is 5%.
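
The ρ values and genetic distances quoted for these datasets are consistent with the usual population-genetic scaling ρ = 4·Ne·r, where r is the recombination fraction across the region in Morgans. The effective population size is not stated here; the sketch below assumes Ne = 10,000, which reproduces both figures. The function name and the default value are illustrative only.

```python
def rho_to_centimorgans(rho, effective_population_size=10_000):
    """Convert a scaled recombination rate to centimorgans.

    Assumes rho = 4 * Ne * r with r in Morgans; Ne = 10,000 is an
    assumption of this sketch, not a value given with the datasets.
    """
    return 100 * rho / (4 * effective_population_size)


for rho in (4000, 40):
    print(f"rho = {rho}: about {rho_to_centimorgans(rho):g} cM")
```

With a different effective population size the same ρ corresponds to a different genetic distance, so the cM values should be read as approximate.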

Contact

For bug reports or feature requests, please use our bug-tracking software.

For comments or questions, please contact Thomas Mailund <mailund@birc.au.dk>, Bioinformatics Research Center (BiRC), University of Aarhus, Høegh-Guldbergsgade 10, DK-8000 Århus C.
