Text-based over-representation analysis of microarray gene lists with annotation bias - A major challenge in microarray data analysis is the functional interpretation of gene lists.
TARGeT: a web-based pipeline for retrieving and characterizing gene and transposable element families from genomic sequences - Gene families compose a large proportion of eukaryotic genomes. The rapidly expanding genomic sequence database provides a good opportunity to study gene family evolution and function.
Textual data compression in computational biology: a synopsis - Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage.
Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner - Motivation: The most accurate way to determine the intron–exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines.
Approximate Bayesian feature selection on a large meta-dataset offers novel insights on factors that effect siRNA potency - Motivation: Short interfering RNA (siRNA)-induced RNA interference is an endogenous pathway in sequence-specific gene silencing.
Augmented training of hidden Markov models to recognize remote homologs via simulated evolution - Motivation: While profile hidden Markov models (HMMs) are successful and powerful methods to recognize homologous proteins, they can break down when homology becomes too distant due to lack of sufficient …
A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays - Motivation: High-throughput sequencing technologies place ever increasing demands on existing algorithms for sequence analysis.
Assessment of the optimization of affinity and specificity at protein-DNA interfaces - The biological functions of DNA-binding proteins often require that they interact with their targets with high affinity and/or high specificity.
Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes - Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate exper
Constrained mixture estimation for analysis and robust classification of clinical time series - Motivation: Personalized medicine based on molecular aspects of diseases, such as gene expression profiling, has become increasingly popular.