Efficient computation of all perfect repeats in genomic sequences of up to half a gigabyte, with a case study on the human genome - Motivation: There is significant ongoing research to identify the number and types of repetitive DNA sequences.
ESG: extended similarity group method for automated protein function prediction - Motivation: The importance of accurate automatic protein function prediction is ever increasing in the face of the large number of newly sequenced genomes and proteomics data awaiting biological interpretation.
Data structures and compression algorithms for genomic sequence data - Motivation: The continuing exponential accumulation of full genome data, including full diploid human genomes, creates new challenges not only for understanding genomic structure, function and evolution, but also for the storage,
SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences - Motivation: One of the first steps in metagenomic analysis is the assignment of reads/contigs obtained from various sequencing technologies to their correct taxonomic bins.
Hierarchical hidden Markov model with application to joint analysis of ChIP-chip and ChIP-seq data - Motivation: Chromatin immunoprecipitation (ChIP) experiments followed by array hybridization, or ChIP-chip, is a powerful approach for identifying transcription factor binding sites (TFBS) and has be