About

COTRASIF (conservation-aided transcription factor binding site finder) is a free tool for genome-wide detection of putative transcription factor binding sites (TFBS) in the promoters of eukaryotic genes. (Each run currently accepts only a single TFBS matrix or set of sequences.)

PFM and PWM

TFBS consensus sequence motifs are usually represented either as a consensus string in IUPAC (International Union of Pure and Applied Chemistry) nomenclature or as matrices. The two most common matrix types are the PFM (position frequency matrix, also known as position count matrix) and the PWM (position weight matrix, or nucleotide weight matrix). A PFM holds the nucleotide counts at each position of the identified binding site. PFMs were first used to characterize DNA-binding site specificity in 1982-1986. Later, position weight matrices were introduced, allowing quantitative discrimination of sites by calculated site scores. As a pattern definition, a weight matrix is superior to a simple IUPAC consensus sequence because it captures the nucleotide occurrence probabilities at every position. It also makes it possible to quantify the similarity between the weight matrix and a potential TFBS detected in the target sequence. A PWM score can be viewed as an estimate of the binding energy of the transcription factor to its specific binding site.
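
As a rough illustration of the relationship between the two matrix types, the sketch below converts a PFM into a log-odds PWM and scores candidate sites. The pseudocount, the uniform background of 0.25, the function names, and the toy counts are all assumptions made for the example; this is not necessarily how COTRASIF computes its weights.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert per-position nucleotide counts into log2-odds weights.

    pfm: list of dicts (one per site position) mapping base -> count.
    pseudocount and background are illustrative assumptions.
    """
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_site(pwm, site):
    """Sum the weights of the bases observed at each position of the site."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix width")
    return sum(column[base] for column, base in zip(pwm, site))

# Toy 4-position PFM from 10 hypothetical aligned sites (not real data).
toy_pfm = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]
toy_pwm = pfm_to_pwm(toy_pfm)
best = score_site(toy_pwm, "ACGT")   # matches the most frequent base everywhere
worst = score_site(toy_pwm, "TATA")  # matches a rare base at every position
```

The site that follows the most frequent base at each position receives the higher score, which is the sense in which a weight matrix quantifies similarity to the motif.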

HMM

The use of HMMs (hidden Markov models) for TFBS search rests on the fact that the Markov chains underlying the model hold more information about the experimentally detected binding sites than a PWM does. Where a PWM only records how often each nucleotide occurs at each position, a Markov model also accounts for the neighborhood of each nucleotide. An HMM-based method therefore increases the specificity of the search, though sensitivity may be lower than with a PWM search.
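
To make the "neighborhood" idea concrete, here is a minimal sketch of a position-specific first-order Markov chain trained on aligned sites. It is a deliberate simplification and not a full HMM, and it is not claimed to be the model COTRASIF uses; the function names, the pseudocount, and the two-base toy sites are assumptions.

```python
import math

BASES = "ACGT"

def train_markov(sites, pseudocount=0.5):
    """Estimate start and per-position transition probabilities.

    sites: equal-length aligned binding-site strings.
    Returns (start, transitions), where transitions[i][prev][cur] is the
    probability of base `cur` at position i+1 given `prev` at position i.
    """
    length = len(sites[0])
    first = {b: pseudocount for b in BASES}
    for site in sites:
        first[site[0]] += 1
    total = sum(first.values())
    start = {b: first[b] / total for b in BASES}

    transitions = []
    for i in range(1, length):
        counts = {p: {b: pseudocount for b in BASES} for p in BASES}
        for site in sites:
            counts[site[i - 1]][site[i]] += 1
        transitions.append({
            p: {b: counts[p][b] / sum(counts[p].values()) for b in BASES}
            for p in BASES
        })
    return start, transitions

def markov_log_prob(model, site):
    """log2 probability of a candidate site under the trained chain."""
    start, transitions = model
    log_p = math.log2(start[site[0]])
    for i, table in enumerate(transitions, start=1):
        log_p += math.log2(table[site[i - 1]][site[i]])
    return log_p

# Hypothetical sites in which A is always followed by C, and G by T.
model = train_markov(["AC", "AC", "GT", "GT"])
seen_pair = markov_log_prob(model, "AC")
mixed_pair = markov_log_prob(model, "AT")
```

A position-independent matrix built from the same four sites would score "AC" and "AT" identically, because A, C, and T are each equally frequent at their positions. The chain scores "AT" lower because that pair never occurs together, which illustrates both the gain in specificity and the risk of rejecting genuine but unseen combinations.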

Problem

Computational TFBS prediction gives reliable results when applied to prokaryotes and yeast. In higher eukaryotes, however, accurate and reliable TFBS prediction remains an outstanding challenge.

Online applications such as MatInspector, MATCH, and ConSite have been built to predict transcription factor binding sites embedded in promoter sequences. However, a TFBS search only identifies sites where the transcription factor could bind, not sites where it necessarily will bind.

Solution

When applying PWM-based methods, the matrix-site similarity score threshold can be raised to increase specificity (fewer false positives) at the cost of sensitivity (fewer true positives). To reduce the number of false-positive binding site predictions without losing sensitivity, additional analysis can be applied: looking for paired TFBS or TFBS motifs, using gene orthology information or microarray-derived gene co-expression data, applying learning algorithms trained on known transcription factor target genes, etc.
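
The threshold trade-off can be sketched as follows. The example scans a sequence with a small weight matrix and keeps windows whose score, rescaled to the range 0 to 1, reaches a cut-off. The min-max rescaling, the hand-set toy weights, the function names, and the test sequence are assumptions for illustration only; the scoring formula actually used by COTRASIF is not described here.

```python
def relative_score(pwm, site):
    """Rescale the raw matrix score to [0, 1] between the worst and best site."""
    raw = sum(column[base] for column, base in zip(pwm, site))
    lowest = sum(min(column.values()) for column in pwm)
    highest = sum(max(column.values()) for column in pwm)
    return (raw - lowest) / (highest - lowest)

def scan(pwm, sequence, cutoff):
    """Return (start, window, score) for every window scoring >= cutoff."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = relative_score(pwm, window)
        if score >= cutoff:
            hits.append((start, window, score))
    return hits

# Hypothetical weight matrix whose best-scoring site is ACGT.
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]
sequence = "TTACGTGGACGATT"
loose = scan(toy_pwm, sequence, cutoff=0.75)   # keeps ACGT and the near-match ACGA
strict = scan(toy_pwm, sequence, cutoff=0.90)  # keeps only the exact match ACGT
```

Raising the cut-off from 0.75 to 0.90 drops the one-mismatch site ACGA. Whether that is a removed false positive or a lost true positive cannot be decided from the score alone, which is why additional evidence is useful.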

In COTRASIF, an additional analysis of TFBS evolutionary conservation is proposed to filter out biologically insignificant binding sites. As of the COTRASIF launch, the analysis is performed in the following steps (using the ISRE PFM as an example):

  1. the COTRASIF user runs a PWM search with the ISRE PFM on rat, with the cut-off set to 0.75;
  2. the user then runs a PWM search with the ISRE PFM on mouse, with the cut-off set to 0.75;
  3. finally, the user starts the Conservation filter and receives a list of rat and mouse Ensembl gene identifiers; these are the genes that have an ISRE in their promoter (as detected by the PWM method with a 0.75 relative similarity cut-off) and that are orthologs between the two organisms.
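
The last step can be pictured as an intersection over ortholog pairs. The sketch below is only a guess at the logic, not COTRASIF's actual implementation: the function name is hypothetical, and the gene identifiers are made-up placeholders rather than real Ensembl IDs.

```python
def conservation_filter(hits_a, hits_b, ortholog_pairs):
    """Keep ortholog pairs in which both genes have a predicted site.

    hits_a, hits_b: gene identifiers with at least one hit in each organism.
    ortholog_pairs: iterable of (gene_in_a, gene_in_b) tuples.
    """
    set_a, set_b = set(hits_a), set(hits_b)
    return [(a, b) for a, b in ortholog_pairs if a in set_a and b in set_b]

# Placeholder identifiers standing in for the results of the two PWM searches.
rat_hits = ["RAT_GENE_1", "RAT_GENE_2", "RAT_GENE_3"]
mouse_hits = ["MOUSE_GENE_1", "MOUSE_GENE_3", "MOUSE_GENE_4"]
orthologs = [
    ("RAT_GENE_1", "MOUSE_GENE_1"),
    ("RAT_GENE_2", "MOUSE_GENE_2"),
    ("RAT_GENE_3", "MOUSE_GENE_3"),
    ("RAT_GENE_4", "MOUSE_GENE_4"),
]
conserved = conservation_filter(rat_hits, mouse_hits, orthologs)
```

Only pairs 1 and 3 survive: pair 2 has a hit in rat but not in mouse, and pair 4 has a hit in mouse but not in rat.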

Gene orthology data

In COTRASIF, orthology data is taken from Ensembl's Compara. Two genes are considered orthologs (for the purposes of the Conservation filter tool), if:

Promoter definition

Gene promoters are defined as the 2000 bp upstream of the transcription start site (TSS), plus the first 5' UTR. Promoters are fetched from Ensembl into the COTRASIF database by a specially designed pipeline.
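
One way to read this definition in terms of coordinates is sketched below. It assumes 1-based inclusive positions, a strand given as 1 or -1, the TSS as the first transcribed base, and a 5' UTR treated as a contiguous stretch starting at the TSS. These conventions, the function name, and the example numbers are assumptions, not a description of the actual pipeline.

```python
def promoter_interval(tss, strand, utr5_length=0, upstream=2000):
    """Return (start, end) of the promoter region on the chromosome.

    The region covers `upstream` bases before the TSS plus `utr5_length`
    bases of 5' UTR beginning at the TSS, following the gene's strand.
    Clamping to the chromosome end is omitted for brevity.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if strand == 1:
        start = max(1, tss - upstream)   # upstream lies at lower coordinates
        end = tss + utr5_length - 1
    else:
        start = max(1, tss - utr5_length + 1)
        end = tss + upstream             # upstream lies at higher coordinates
    return start, end

# Hypothetical genes with a TSS at position 10000 and a 150 bp 5' UTR.
forward = promoter_interval(10000, 1, utr5_length=150)
reverse = promoter_interval(10000, -1, utr5_length=150)
```

On the forward strand this gives positions 8000 to 10149, and on the reverse strand 9851 to 12000; both intervals span 2150 bp. A sequence taken from a reverse-strand interval would additionally need to be reverse-complemented before scanning.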

Contacting COTRASIF team

Until COTRASIF gets its dedicated contact form, please use this contact page for any suggestions, feature requests, questions, criticism, and support offers.

© Bogdan Tokovenko (2006 - 2011) and Rostyslav Golda (2008 - 2009)
Portions © Oleksiy Protas (2009)