Textual data compression in computational biology: a synopsis - Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage.
Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner - Motivation: The most accurate way to determine the intron–exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines.
Approximate Bayesian feature selection on a large meta-dataset offers novel insights on factors that effect siRNA potency - Motivation: Short interfering RNA (siRNA)-induced RNA interference is an endogenous pathway in sequence-specific gene silencing.
Augmented training of hidden Markov models to recognize remote homologs via simulated evolution - Motivation: While profile hidden Markov models (HMMs) are successful and powerful methods to recognize homologous proteins, they can break down when homology becomes too distant due to lack of sufficien
A practical algorithm for finding maximal exact matches in large sequence datasets using sparse suffix arrays - Motivation: High-throughput sequencing technologies place ever-increasing demands on existing algorithms for sequence analysis.