Bioinformatics


Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes

Sat, 2018-09-08 02:00
Motivation
In recent years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization. These methods commonly compute the similarity between a disease’s (or patient’s) phenotypes and a database of gene-to-phenotype associations to find the most phenotypically similar match. A key limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse.
Results
We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprised of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.
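At its core, comparing two entities by their phenotype vector representations reduces to aggregating per-phenotype embeddings and computing a vector similarity. The sketch below illustrates this with a plain cosine similarity over averaged toy vectors; the embedding values and phenotype identifiers are hypothetical, and SmuDGE itself learns both the representations and a trainable similarity rather than using a fixed cosine.

```python
from math import sqrt

def mean_vector(vectors):
    """Average a list of equal-length phenotype embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Toy embeddings for phenotype terms (in SmuDGE these come from feature learning).
emb = {
    "HP:0001": [1.0, 0.0, 0.2],
    "HP:0002": [0.9, 0.1, 0.0],
    "HP:0003": [0.0, 1.0, 0.1],
}

disease_phenotypes = ["HP:0001", "HP:0002"]
gene_phenotypes = ["HP:0001", "HP:0003"]

score = cosine(mean_vector([emb[p] for p in disease_phenotypes]),
               mean_vector([emb[p] for p in gene_phenotypes]))
```

For genes with no annotated phenotypes, the same comparison becomes possible once a vector is propagated to them through the interaction network, which is the key contribution of the method.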
Availability and implementation
https://github.com/bio-ontology-research-group/SmuDGE
Categories: Bioinformatics, Journals

An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets

Sat, 2018-09-08 02:00
Motivation
International consortia such as the Genotype-Tissue Expression (GTEx) project, The Cancer Genome Atlas (TCGA) or the International Human Epigenetics Consortium (IHEC) have produced a wealth of genomic datasets with the goal of advancing our understanding of cell differentiation and disease mechanisms. However, utilizing all of these data effectively through integrative analysis is hampered by batch effects, large cell type heterogeneity and low replicate numbers. To study if batch effects across datasets can be observed and adjusted for, we analyze RNA-seq data of 215 samples from ENCODE, Roadmap, BLUEPRINT and DEEP as well as 1336 samples from GTEx and TCGA. While batch effects are a considerable issue, it is non-trivial to determine if batch adjustment leads to an improvement in data quality, especially in cases of low replicate numbers.
Results
We present a novel method for assessing the performance of batch effect adjustment methods on heterogeneous data. Our method borrows information from the Cell Ontology to establish if batch adjustment leads to a better agreement between observed pairwise similarity and similarity of cell types inferred from the ontology. A comparison of state-of-the art batch effect adjustment methods suggests that batch effects in heterogeneous datasets with low replicate numbers cannot be adequately adjusted. Better methods need to be developed, which can be assessed objectively in the framework presented here.
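The evaluation idea above can be sketched as a correlation between two pairwise-similarity profiles: one observed from expression data (before or after adjustment) and one inferred from the ontology. The numbers below are invented toy values, and the real method uses Cell Ontology structure rather than hand-assigned similarities; this only illustrates the agreement score.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Ontology-derived similarity for three sample pairs (hypothetical values),
# e.g. based on shared ancestors of their cell types in the Cell Ontology.
onto_sim = [1.0, 0.2, 0.1]

# Observed pairwise expression correlations before and after batch adjustment.
before = [0.3, 0.6, 0.5]   # dominated by batch, disagrees with the ontology
after = [0.9, 0.3, 0.2]    # agrees with the ontology after adjustment

score_before = pearson(onto_sim, before)
score_after = pearson(onto_sim, after)
```

A higher correlation after adjustment indicates that the adjustment moved observed similarities toward the biologically expected cell-type structure.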
Availability and implementation
Our method is available online at https://github.com/SchulzLab/OntologyEval.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Computational enhancement of single-cell sequences for inferring tumor evolution

Sat, 2018-09-08 02:00
Motivation
Tumor sequencing has entered an exciting phase with the advent of single-cell techniques that are revolutionizing the assessment of single nucleotide variation (SNV) at the highest cellular resolution. However, state-of-the-art single-cell sequencing technologies produce data with many missing bases (MBs) and incorrect base designations that lead to false-positive (FP) and false-negative (FN) detection of somatic mutations. While computational methods are available to make biological inferences in the presence of these errors, the accuracy of the imputed MBs and corrected FPs and FNs remains unknown.
Results
Using computer simulated datasets, we assessed the robustness performance of four existing methods (OncoNEM, SCG, SCITE and SiFit) and one new method (BEAM). BEAM is a Bayesian evolution-aware method that improves the quality of single-cell sequences by using the intrinsic evolutionary information in the single-cell data in a molecular phylogenetic framework. Overall, BEAM and SCITE performed the best. Most of the methods imputed MBs with high accuracy, but effective detection and correction of FPs and FNs is a challenge, especially for small datasets. Analysis of an empirical dataset shows that computational methods can improve both the quality of tumor single-cell sequences and their utility for biological inference. In conclusion, tumor cells descend from pre-existing cells, which creates evolutionary continuity in single-cell sequencing datasets. This information enables BEAM and other methods to correctly impute missing data and incorrect base assignments, but correction of FPs and FNs remains challenging when the number of SNVs sampled is small relative to the number of cells sequenced.
Availability and implementation
BEAM is available on the web at https://github.com/SayakaMiura/BEAM.
Categories: Bioinformatics, Journals

Boolean network inference from time-series gene expression data using a genetic algorithm

Sat, 2018-09-08 02:00
Motivation
Inferring a gene regulatory network from time-series gene expression data is a fundamental problem in systems biology, and many methods have been proposed. However, most of them are not efficient at inferring regulatory relations involving a large number of genes, because they either limit the number of regulatory genes or compute only an approximate reliability of multivariate relations. Therefore, an improved method is needed to search efficiently for more general and scalable regulatory relations.
Results
In this study, we propose a genetic algorithm-based Boolean network inference (GABNI) method that can search for an optimal Boolean regulatory function over a large number of regulatory genes. For an efficient search, it solves the problem in two stages. GABNI first applies an existing method, mutual information-based Boolean network inference (MIBNI), because it can quickly find an optimal solution for small-scale inference problems. When MIBNI fails to find an optimal solution, a genetic algorithm (GA) is applied to search for an optimal set of regulatory genes in a wider solution space. In particular, we modified a typical GA framework to reduce the search space efficiently. We compared GABNI with four well-known inference methods through extensive simulations on both artificial and real gene expression datasets. Our results demonstrate that GABNI significantly outperforms them in both structural and dynamic accuracy.
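The GA stage can be illustrated in miniature: encode a candidate regulator set as a bit mask, score it by how consistently some Boolean function of those regulators explains the observed state transitions, and evolve the population with elitism. This is a toy reimplementation of the general idea, not GABNI's actual operators or fitness; the parsimony penalty and the synthetic AND-gate data are illustrative assumptions.

```python
import random

def consistency(series, target, regs):
    """Fraction of transitions explained by the best Boolean function of regs
    (majority next-state per observed input pattern)."""
    buckets = {}
    for t in range(len(series) - 1):
        key = tuple(series[t][g] for g in regs)
        buckets.setdefault(key, []).append(series[t + 1][target])
    hits = sum(max(v.count(0), v.count(1)) for v in buckets.values())
    return hits / (len(series) - 1)

def ga_infer(series, target, n_genes, pop=10, gens=50, seed=1):
    """Tiny elitist GA over regulator sets encoded as bit masks, with a small
    parsimony penalty so smaller regulator sets win ties."""
    rng = random.Random(seed)
    def fitness(mask):
        regs = [g for g in range(n_genes) if mask >> g & 1]
        return consistency(series, target, regs) - 0.01 * len(regs)
    population = [rng.randrange(1, 1 << n_genes) for _ in range(pop)]
    for _ in range(gens):
        kids = [p ^ (1 << rng.randrange(n_genes)) for p in population]
        kids.append(rng.randrange(1, 1 << n_genes))  # random immigrant
        population = sorted(set(population + kids), key=fitness, reverse=True)[:pop]
    best = population[0]
    return best, fitness(best)

# Synthetic data: genes 0-2 enumerate all input patterns; gene 3 follows
# the Boolean rule g3(t+1) = g0(t) AND g1(t).
series = []
for t in range(17):
    bits = [t % 8 >> i & 1 for i in range(3)]
    prev = [(t - 1) % 8 >> i & 1 for i in range(3)]
    series.append(bits + [prev[0] & prev[1] if t else 0])

best_mask, best_fit = ga_infer(series, target=3, n_genes=4)
```

On this toy series the true regulator set {0, 1} explains every transition, so any mask containing both genes scores near 1.0 and the penalty steers the GA toward the minimal set.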
Conclusion
The proposed method is an efficient and scalable tool to infer a Boolean network from time-series gene expression data.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Scalable and exhaustive screening of metabolic functions carried out by microbial consortia

Sat, 2018-09-08 02:00
Motivation
The selection of species exhibiting metabolic behaviors of interest is a challenging step when switching from the investigation of a large microbiota to the study of the effectiveness of its functions. Approaches based on a compartmentalized framework are not scalable, while the output of scalable approaches based on non-compartmentalized modeling may be so large that it has so far been neither explored nor handled.
Results
We present the Miscoto tool to facilitate the selection of a community optimizing a desired function in a microbiome by reporting several possibilities, which can then be sorted according to biological criteria. Communities are exhaustively identified using logical programming and by combining the non-compartmentalized and compartmentalized frameworks. A benchmark of 4.9 million metabolic functions associated with the Human Microbiome Project shows that Miscoto is suited to screening and classifying metabolic producibility in terms of feasibility, functional redundancy and the cooperation processes involved. As an illustration of a host–microbial system, screening the Recon 2.2 human metabolism highlights the role of different consortia within a family of 773 intestinal bacteria.
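The exhaustive community-selection problem can be pictured as enumerating all minimum-size species subsets whose combined capabilities cover a set of target functions. The sketch below brute-forces this on invented species and metabolite names; Miscoto itself solves the problem at scale with Answer Set Programming over genome-scale metabolic networks, not by subset enumeration.

```python
from itertools import combinations

# Hypothetical producible-metabolite sets per species (in Miscoto these are
# derived from metabolic networks and a producibility semantics).
producers = {
    "sp1": {"butyrate", "acetate"},
    "sp2": {"propionate"},
    "sp3": {"butyrate"},
    "sp4": {"acetate"},
}
targets = {"butyrate", "propionate"}

def minimal_communities(producers, targets):
    """Exhaustively enumerate all minimum-size communities covering targets."""
    species = sorted(producers)
    for size in range(1, len(species) + 1):
        found = [set(c) for c in combinations(species, size)
                 if targets <= set().union(*(producers[s] for s in c))]
        if found:
            return found
    return []

communities = minimal_communities(producers, targets)
```

Reporting all minimal solutions, rather than a single one, is what allows the downstream sorting by biological criteria mentioned above.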
Availability and implementation
Miscoto source code, instructions for use and examples are available at: https://github.com/cfrioux/miscoto.
Categories: Bioinformatics, Journals

Higher-order molecular organization as a source of biological function

Sat, 2018-09-08 02:00
Motivation
Molecular interactions have widely been modelled as networks. The local wiring patterns around molecules in molecular networks are linked with their biological functions. However, networks model only pairwise interactions between molecules and cannot explicitly and directly capture the higher-order molecular organization, such as protein complexes and pathways. Hence, we ask if hypergraphs (hypernetworks), that directly capture entire complexes and pathways along with protein–protein interactions (PPIs), carry additional functional information beyond what can be uncovered from networks of pairwise molecular interactions. The mathematical formalism of a hypergraph has long been known, but not often used in studying molecular networks due to the lack of sophisticated algorithms for mining the underlying biological information hidden in the wiring patterns of molecular systems modelled as hypernetworks.
Results
We propose a new, multi-scale, protein interaction hypernetwork model that utilizes hypergraphs to capture different scales of protein organization, including PPIs, protein complexes and pathways. In analogy to graphlets, we introduce hypergraphlets, small, connected, non-isomorphic, induced sub-hypergraphs of a hypergraph, to quantify the local wiring patterns of these multi-scale molecular hypergraphs and to mine them for new biological information. We apply them to model the multi-scale protein networks of baker's yeast and human and show that the higher-order molecular organization captured by these hypergraphs is strongly related to the underlying biology. Importantly, we demonstrate that our new models and data mining tools reveal different but complementary biological information compared with classical PPI networks. We apply our hypergraphlets to successfully predict biological functions of uncharacterized proteins.
Availability and implementation
Code and data are available online at http://www0.cs.ucl.ac.uk/staff/natasa/hypergraphlets.
Categories: Bioinformatics, Journals

FLYCOP: metabolic modeling-based analysis and engineering of microbial communities

Sat, 2018-09-08 02:00
Motivation
Synthetic microbial communities are beginning to be considered promising multicellular biocatalysts with great potential to replace engineered single strains in biotechnology applications across the pharmaceutical, chemical and living-architecture sectors. In contrast to single-strain engineering, the effective and high-throughput analysis and engineering of microbial consortia face a lack of knowledge, tools and well-defined workflows. This manuscript helps to fill this important gap with a framework, called FLYCOP (FLexible sYnthetic Consortium OPtimization), which contributes to microbial consortia modeling and engineering while improving our knowledge of how these communities work. FLYCOP selects, in a flexible way, the best consortium configuration to optimize a given goal among multiple and diverse configurations, taking temporal changes in metabolite concentrations into account.
Results
In contrast to previous systems for optimizing microbial consortia, FLYCOP has novel characteristics that allow it to address new problems, represent additional features and analyze events influencing consortium behavior. In this manuscript, FLYCOP optimizes a Synechococcus elongatus–Pseudomonas putida consortium to produce the maximum amount of bioplastic (PHA, polyhydroxyalkanoate), and highlights the influence of metabolite exchange dynamics in a consortium of four auxotrophic Escherichia coli strains growing in parallel. FLYCOP can also guide evolutionary engineering endeavors by explaining why and how heterogeneous populations emerge from monoclonal ones.
Availability and implementation
Code reproducing the case studies described in this manuscript is available online: https://github.com/beatrizgj/FLYCOP
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Single cell network analysis with a mixture of Nested Effects Models

Sat, 2018-09-08 02:00
Motivation
New technologies allow for the elaborate measurement of different traits of single cells under genetic perturbations. These interventional data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous.
Results
We developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular subpopulations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq.
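The alternating assignment scheme can be sketched with a simpler stand-in: a mixture of Bernoulli effect profiles fitted by EM, where the E-step computes each cell's responsibility for each component and the M-step re-estimates mixture weights and profiles. In M&NEM each component is a full nested effects model rather than an independent-Bernoulli profile, so this is only an illustration of the EM skeleton; all data values are invented.

```python
from math import log, exp

def em_mixture(cells, K, iters=30):
    """EM for a mixture of Bernoulli effect profiles over binary cell data."""
    D = len(cells[0])
    # Deterministic init: seed each component's profile from one cell.
    theta = [[0.8 if cells[k * (len(cells) - 1) // max(K - 1, 1)][d] else 0.2
              for d in range(D)] for k in range(K)]
    pi = [1.0 / K] * K
    for _ in range(iters):
        # E-step: responsibility of each component for each cell.
        resp = []
        for x in cells:
            logp = [log(pi[k]) + sum(log(theta[k][d] if x[d] else 1 - theta[k][d])
                                     for d in range(D)) for k in range(K)]
            m = max(logp)
            w = [exp(l - m) for l in logp]
            s = sum(w)
            resp.append([v / s for v in w])
        # M-step: update mixture weights and profiles (with light smoothing).
        for k in range(K):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(cells)
            theta[k] = [(sum(r[k] * x[d] for r, x in zip(resp, cells)) + 0.5)
                        / (nk + 1.0) for d in range(D)]
    return resp, pi, theta

# Two clearly distinct cell subpopulations with opposite effect patterns.
cells = [[1, 1, 0, 0]] * 4 + [[0, 0, 1, 1]] * 4
resp, pi, theta = em_mixture(cells, K=2)
```

After convergence, the responsibilities effectively partition the cells into subpopulations, mirroring how M&NEM assigns each cell to its most likely causal network.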
Availability and implementation
The mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbg-ethz/mnem/.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Hierarchical HotNet: identifying hierarchies of altered subnetworks

Sat, 2018-09-08 02:00
Motivation
The analysis of high-dimensional ‘omics data is often informed by the use of biological interaction networks. For example, protein–protein interaction networks have been used to analyze gene expression data, to prioritize germline variants, and to identify somatic driver mutations in cancer. In these and other applications, the underlying computational problem is to identify altered subnetworks containing genes that are both highly altered in an ‘omics dataset and are topologically close (e.g. connected) on an interaction network.
Results
We introduce Hierarchical HotNet, an algorithm that finds a hierarchy of altered subnetworks. Hierarchical HotNet assesses the statistical significance of the resulting subnetworks over a range of biological scales and explicitly controls for ascertainment bias in the network. We evaluate the performance of Hierarchical HotNet and several other algorithms that identify altered subnetworks on the problem of predicting cancer genes and significantly mutated subnetworks. On somatic mutation data from The Cancer Genome Atlas, Hierarchical HotNet outperforms other methods and identifies significantly mutated subnetworks containing both well-known cancer genes and candidate cancer genes that are rarely mutated in the cohort. Hierarchical HotNet is a robust algorithm for identifying altered subnetworks across different ‘omics datasets.
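The notion of a hierarchy of subnetworks can be illustrated with a simplified construction: sweep a decreasing edge-weight threshold over a similarity network and record the connected components at each level, which nest into one another as the threshold drops. Hierarchical HotNet builds its hierarchy from heat-diffusion scores and strongly connected components with statistical testing, so the toy weights and thresholds below are illustrative only.

```python
def components(nodes, edges):
    """Connected components via simple traversal."""
    adj = {v: [] for v in nodes}
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(adj[u])
        comps.append(comp)
    return comps

def hierarchy(nodes, weighted_edges, thresholds):
    """For each decreasing threshold, keep edges at least that strong and
    report the connected components; components nest as the threshold drops."""
    return {d: components(nodes, [e for e in weighted_edges if e[2] >= d])
            for d in thresholds}

nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 0.9), ("B", "C", 0.5), ("C", "D", 0.2)]
levels = hierarchy(nodes, edges, [0.8, 0.4, 0.1])
```

Reading the levels from strict to permissive gives small, tightly altered cores that merge into progressively larger subnetworks, which is the "range of biological scales" examined above.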
Availability and implementation
http://github.com/raphael-group/hierarchical-hotnet.
Supplementary information
Supplementary material is available at Bioinformatics online.
Categories: Bioinformatics, Journals

Understanding the evolution of functional redundancy in metabolic networks

Sat, 2018-09-08 02:00
Motivation
Metabolic networks have evolved to reduce the disruption of key metabolic pathways through the establishment of redundant genes/reactions. Synthetic lethals in metabolic networks provide a window into these functional redundancies. While synthetic lethals have previously been studied in different organisms, there has been no study of how synthetic lethals are shaped during adaptation and evolution.
Results
To understand the adaptive functional redundancies that exist in metabolic networks, we here explore a vast space of ‘random’ metabolic networks evolved on a glucose environment. We examine essential and synthetic lethal reactions in these random metabolic networks, evaluating over 39 billion phenotypes using an efficient algorithm previously developed in our lab, Fast-SL. We establish that nature tends to harbour higher levels of functional redundancy than random networks. We then examine the propensity of different reactions to compensate for one another and show that certain key metabolic reactions that are necessary for growth in a particular growth medium show much higher redundancy and can partner with hundreds of different reactions across the metabolic networks that we studied. We also observe that some redundancies are unique to particular environments, while others are observed in all environments. Interestingly, even very diverse reactions, such as those belonging to distant pathways, show synthetic lethality, illustrating the distributed nature of robustness in metabolism. Our study paves the way for understanding the evolution of redundancy in metabolic networks, and sheds light on the varied compensation mechanisms that serve to enhance robustness.
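A synthetic lethal screen can be sketched on a toy network: define growth as the producibility of a biomass metabolite from the medium via forward chaining, then test every single and double reaction deletion. The reaction names and topology below are invented, and real screens such as Fast-SL work on flux balance models rather than this reachability caricature.

```python
from itertools import combinations

# Toy reaction network: each reaction converts a set of substrates into products.
reactions = {
    "r1": ({"glc"}, {"g6p"}),
    "r2": ({"g6p"}, {"pyr"}),
    "r3": ({"g6p"}, {"pep"}),
    "r4": ({"pep"}, {"pyr"}),
    "r5": ({"pyr"}, {"biomass"}),
}
medium = {"glc"}

def producible(active, seeds):
    """Forward-chaining closure: metabolites reachable from the medium."""
    met = set(seeds)
    changed = True
    while changed:
        changed = False
        for name in active:
            subs, prods = reactions[name]
            if subs <= met and not prods <= met:
                met |= prods
                changed = True
    return met

def grows(removed):
    active = [r for r in reactions if r not in removed]
    return "biomass" in producible(active, medium)

essential = [r for r in reactions if not grows({r})]
viable = [r for r in reactions if grows({r})]
synthetic_lethal = [(a, b) for a, b in combinations(viable, 2)
                    if not grows({a, b})]
```

Here r2 and the r3/r4 branch form redundant routes to pyruvate: each is dispensable alone, but knocking out r2 together with either branch reaction is lethal, which is exactly the compensation pattern the study quantifies at scale.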
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

iTOP: inferring the topology of omics data

Sat, 2018-09-08 02:00
Motivation
In biology, we are often faced with multiple datasets recorded on the same set of objects, such as multi-omics and phenotypic data of the same tumors. These datasets are typically not independent from each other. For example, methylation may influence gene expression, which may, in turn, influence drug response. Such relationships can strongly affect analyses performed on the data, as we have previously shown for the identification of biomarkers of drug response. Therefore, it is important to be able to chart the relationships between datasets.
Results
We present iTOP, a methodology to infer a topology of relationships between datasets. We base this methodology on the RV coefficient, a measure of matrix correlation, which can be used to determine how much information is shared between two datasets. We extended the RV coefficient for partial matrix correlations, which allows the use of graph reconstruction algorithms, such as the PC algorithm, to infer the topologies. In addition, since multi-omics data often contain binary data (e.g. mutations), we also extended the RV coefficient for binary data. Applying iTOP to pharmacogenomics data, we found that gene expression acts as a mediator between most other datasets and drug response: only proteomics clearly shares information with drug response that is not present in gene expression. Based on this result, we used TANDEM, a method for drug response prediction, to identify which variables predictive of drug response were distinct to either gene expression or proteomics.
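The RV coefficient at the heart of this approach compares the n × n configuration matrices XXᵀ and YYᵀ of two column-centered datasets measured on the same n objects. A minimal pure-Python sketch, with invented toy data (iTOP additionally handles partial correlations and binary data, which this omits):

```python
def center_columns(X):
    """Subtract each column's mean."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def cross(X, Y):
    """n x n configuration matrix X Y^T."""
    return [[sum(a * b for a, b in zip(xi, yj)) for yj in Y] for xi in X]

def rv(X, Y):
    """RV coefficient: matrix correlation between two datasets on the same
    n objects; 0 = no shared information, 1 = identical configurations."""
    Xc, Yc = center_columns(X), center_columns(Y)
    Sx, Sy = cross(Xc, Xc), cross(Yc, Yc)
    n = len(Sx)
    num = sum(Sx[i][j] * Sy[i][j] for i in range(n) for j in range(n))
    den = (sum(v * v for row in Sx for v in row)
           * sum(v * v for row in Sy for v in row)) ** 0.5
    return num / den

X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
Y = [[2.1], [3.9], [6.0], [8.0]]    # roughly a linear transform of X
Z = [[1.0], [-1.0], [1.0], [-1.0]]  # unrelated pattern

rv_xy = rv(X, Y)
rv_xz = rv(X, Z)
```

Because the RV coefficient compares object-by-object configurations rather than matching individual features, it applies even when the two datasets have entirely different feature spaces (e.g. methylation vs. drug response).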
Availability and implementation
An implementation of our methodology is available in the R package iTOP on CRAN. Additionally, an R Markdown document with code to reproduce all figures is provided as Supplementary Material.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Comparative Network Reconstruction using mixed integer programming

Sat, 2018-09-08 02:00
Motivation
Signal-transduction networks are often aberrated in cancer cells, and new anti-cancer drugs that specifically target oncogenes involved in signaling show great clinical promise. However, the effectiveness of such targeted treatments is often hampered by innate or acquired resistance due to feedbacks, crosstalks or network adaptations in response to drug treatment. A quantitative understanding of these signaling networks and how they differ between cells with different oncogenic mutations or between sensitive and resistant cells can help in addressing this problem.
Results
Here, we present Comparative Network Reconstruction (CNR), a computational method to reconstruct signaling networks based on possibly incomplete perturbation data, and to identify which edges differ quantitatively between two or more signaling networks. Prior knowledge about network topology is not required but can straightforwardly be incorporated. We extensively tested our approach using simulated data and applied it to perturbation data from a BRAF mutant, PTPN11 KO cell line that developed resistance to BRAF inhibition. Comparing the reconstructed networks of sensitive and resistant cells suggests that the resistance mechanism involves re-establishing wild-type MAPK signaling, possibly through an alternative RAF-isoform.
Availability and implementation
CNR is available as a python module at https://github.com/NKI-CCB/cnr. Additionally, code to reproduce all figures is available at https://github.com/NKI-CCB/CNR-analyses.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

ISMB 2018 PROCEEDINGS PAPERS COMMITTEE

Wed, 2018-06-27 02:00
PROCEEDINGS COMMITTEE
Categories: Bioinformatics, Journals

A graph-based approach to diploid genome assembly

Wed, 2018-06-27 02:00
Abstract
Motivation
Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community.
Results
We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants.
Availability and implementation
https://github.com/whatshap/whatshap
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Strand-seq enables reliable separation of long reads by chromosome via expectation maximization

Wed, 2018-06-27 02:00
Abstract
Motivation
Current sequencing technologies are able to produce reads orders of magnitude longer than ever possible before. Such long reads have sparked a new interest in de novo genome assembly, which removes reference biases inherent to re-sequencing approaches and allows for a direct characterization of complex genomic variants. However, even with the latest algorithmic advances, assembling a mammalian genome from long error-prone reads incurs a significant computational burden and does not preclude occasional misassemblies. Both problems could potentially be mitigated if assembly could commence for each chromosome separately.
Results
To address this, we show how single-cell template strand sequencing (Strand-seq) data can be leveraged for this purpose. We introduce a novel latent variable model and a corresponding Expectation Maximization algorithm, termed SaaRclust, and demonstrate its ability to reliably cluster long reads by chromosome. For each long read, this approach produces a posterior probability distribution over all chromosomes of origin and read directionalities. In this way, it allows us to assess the amount of uncertainty inherent to sparse Strand-seq data at the level of individual reads. Among the reads that our algorithm confidently assigns to a chromosome, we observed more than 99% correct assignments on a subset of Pacific Biosciences reads with 30.1× coverage. To our knowledge, SaaRclust is the first approach for the in silico separation of long reads by chromosome prior to assembly.
Availability and implementation
https://github.com/daewoooo/SaaRclust
Categories: Bioinformatics, Journals

Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge

Wed, 2018-06-27 02:00
Abstract
Motivation
Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge such as bulk RNA-seq information of certain cell types or qualitative marker information is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge.
Results
We find that preprocessing using UNCURL consistently improves performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and presence of prior knowledge. Finally we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells.
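The factorization step can be sketched with the classic multiplicative-update NMF, which approximates a non-negative cell × gene matrix V as W·H while the updates monotonically reduce the squared reconstruction error. UNCURL's actual model fits sampling-distribution-aware factorizations and incorporates prior knowledge, so this plain Lee–Seung sketch on an invented toy matrix is only a stand-in.

```python
import random

def nmf(V, k, iters=200, seed=0):
    """Non-negative matrix factorization V ~ W H by multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9
    def matmul(A, B):
        return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]
    def transpose(A):
        return [list(col) for col in zip(*A)]
    for _ in range(iters):
        WT = transpose(W)
        num, den = matmul(WT, V), matmul(matmul(WT, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        HT = transpose(H)
        num, den = matmul(V, HT), matmul(W, matmul(H, HT))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

def frob_error(V, W, H):
    """Squared Frobenius reconstruction error of V - W H."""
    n, m = len(V), len(V[0])
    WH = [[sum(W[i][t] * H[t][j] for t in range(len(H))) for j in range(m)]
          for i in range(n)]
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(n) for j in range(m))

# Toy count matrix (cells x genes) with two latent cell types.
V = [[5, 4, 0, 0], [4, 5, 1, 0], [0, 0, 5, 4], [0, 1, 4, 5]]
W, H = nmf(V, k=2)
err = frob_error(V, W, H)
```

After fitting, the rows of W give low-dimensional cell representations that downstream clustering or lineage tools consume in place of the raw sparse counts.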
Availability and implementation
Source code is available at https://github.com/yjzhang/uncurl_python.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Asymptotically optimal minimizers schemes

Wed, 2018-06-27 02:00
Abstract
Motivation
The minimizers technique is a method to sample k-mers that is used in many bioinformatics tools to reduce computation, memory usage and run time, and the number of applications using minimizers keeps growing steadily. Despite its many uses, the theoretical understanding of minimizers is still very limited. In many applications, selecting as few k-mers as possible (i.e. having a low density) is beneficial. The density depends strongly on the choice of the order on the k-mers. Different applications use different orders, but none of these orders are optimal. A better understanding of minimizers schemes, and the related local and forward schemes, will allow the design of schemes with lower density, making existing and future bioinformatics tools even more efficient.
Results
From the analysis of the asymptotic behavior of minimizers, forward and local schemes, we show that the previously believed lower bound on minimizers schemes does not hold, and that schemes with density lower than thought possible actually exist. The proof is constructive and leads to an efficient algorithm to compare k-mers. These orders are the first known orders that are asymptotically optimal. Additionally, we give improved bounds on the density achievable by the three types of schemes.
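The scheme itself is simple to state: in every window of w consecutive k-mers, select the smallest one under a chosen order, and the density is the fraction of k-mer positions selected. A minimal sketch with a lexicographic default order (the sequence and parameters are arbitrary; the paper's contribution is constructing far better orders than this):

```python
def minimizers(seq, k, w, order=None):
    """Positions of window minimizers: in every window of w consecutive
    k-mers, select the smallest k-mer under `order` (ties to the left)."""
    order = order or (lambda kmer: kmer)  # default: lexicographic order
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        selected.add(min(window, key=lambda i: (order(kmers[i]), i)))
    return sorted(selected)

seq = "ACGTACGGTACGCTTACG"
k, w = 3, 4
picks = minimizers(seq, k, w)
density = len(picks) / (len(seq) - k + 1)
```

By construction every window contains a selected position, which is what lets two tools agree on sampled k-mers over shared substrings; changing the order changes which positions are picked and hence the density.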
Categories: Bioinformatics, Journals

Predicting CTCF-mediated chromatin loops using CTCF-MP

Wed, 2018-06-27 02:00
Abstract
Motivation
The three dimensional organization of chromosomes within the cell nucleus is highly regulated. It is known that CCCTC-binding factor (CTCF) is an important architectural protein to mediate long-range chromatin loops. Recent studies have shown that the majority of CTCF binding motif pairs at chromatin loop anchor regions are in convergent orientation. However, it remains unknown whether the genomic context at the sequence level can determine if a convergent CTCF motif pair is able to form a chromatin loop.
Results
In this article, we directly ask whether and what sequence-based features (other than the motif itself) may be important to establish CTCF-mediated chromatin loops. We found that motif conservation measured by ‘branch-of-origin’ that accounts for motif turn-over in evolution is an important feature. We developed a new machine learning algorithm called CTCF-MP based on word2vec to demonstrate that sequence-based features alone have the capability to predict if a pair of convergent CTCF motifs would form a loop. Together with functional genomic signals from CTCF ChIP-seq and DNase-seq, CTCF-MP is able to make highly accurate predictions on whether a convergent CTCF motif pair would form a loop in a single cell type and also across different cell types. Our work represents an important step further to understand the sequence determinants that may guide the formation of complex chromatin architectures.
Availability and implementation
The source code of CTCF-MP can be accessed at: https://github.com/ma-compbio/CTCF-MP
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Versatile genome assembly evaluation with QUAST-LG

Wed, 2018-06-27 02:00
Abstract
Motivation
The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes.
Results
In this manuscript, we demonstrate performance of the state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG—a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low coverage regions, we introduce a concept of upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.
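Among the standard contiguity metrics such a tool reports are N50 and NG50: the smallest contig length such that contigs at least that long cover half of the assembly (N50) or half of the reference genome (NG50). A minimal sketch on made-up contig lengths (QUAST-LG additionally computes alignment-corrected variants such as NGA50, which this does not attempt):

```python
def nx50(lengths, total=None):
    """Smallest contig length such that contigs at least that long cover
    half of `total` (assembly size for N50, reference size for NG50)."""
    total = total if total is not None else sum(lengths)
    half, covered = total / 2, 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0

contigs = [100, 60, 40, 20]
n50 = nx50(contigs)              # half of 220 is 110: 100 + 60 covers it
ng50 = nx50(contigs, total=400)  # half of a 400 bp reference is 200
```

Comparing such metrics against those of the upper bound assembly, rather than against the finished reference alone, is what makes the reported gap to the "theoretical optimum" meaningful.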
Availability and implementation
http://cab.spbu.ru/software/quast-lg
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Optimization and profile calculation of ODE models using second order adjoint sensitivity analysis

Wed, 2018-06-27 02:00
Abstract
Motivation
Parameter estimation methods for ordinary differential equation (ODE) models of biological processes can exploit gradients and Hessians of objective functions to achieve convergence and computational efficiency. However, the computational complexity of established methods to evaluate the Hessian scales linearly with the number of state variables and quadratically with the number of parameters. This limits their application to low-dimensional problems.
Results
We introduce second order adjoint sensitivity analysis for the computation of Hessians and a hybrid optimization-integration-based approach for profile likelihood computation. Second order adjoint sensitivity analysis scales linearly with the number of parameters and state variables. The Hessians are effectively exploited by the proposed profile likelihood computation approach. We evaluate our approaches on published biological models with real measurement data. Our study reveals an improved computational efficiency and robustness of optimization compared to established approaches, when using Hessians computed with adjoint sensitivity analysis. The hybrid computation method was more than 2-fold faster than the best competitor. Thus, the proposed methods and implemented algorithms allow for the improvement of parameter estimation for medium and large scale ODE models.
Availability and implementation
The algorithms for second order adjoint sensitivity analysis are implemented in the Advanced MATLAB Interface to CVODES and IDAS (AMICI, https://github.com/ICB-DCM/AMICI/). The algorithm for hybrid profile likelihood computation is implemented in the parameter estimation toolbox (PESTO, https://github.com/ICB-DCM/PESTO/). Both toolboxes are freely available under the BSD license.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals