Bioinformatics

Boosting the extraction of elementary flux modes in genome-scale metabolic networks using the linear programming approach

Fri, 2020-07-10 02:00
Abstract
Motivation
Elementary flux modes (EFMs) are a key tool for analyzing genome-scale metabolic networks, and several methods have been proposed to compute them. Among them, those based on solving linear programming (LP) problems are known to be very efficient if the main interest lies in computing large enough sets of EFMs.
Results
Here, we propose a new method called EFM-Ta that boosts the efficiency rate by analyzing the information provided by the LP solver. We base our method on a further study of the final tableau of the simplex method. By performing additional elementary steps and avoiding trivial solutions consisting of two cycles, we obtain many more EFMs for each LP problem posed, improving the efficiency rate of previously proposed methods by more than one order of magnitude.
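An elementary flux mode is a steady-state flux vector v (S·v = 0, with irreversible reactions carrying non-negative flux) whose support cannot be reduced any further. The minimal Python sketch below checks that definition for a single candidate vector. It uses the standard rank test: v is elementary when the columns of S inside its support have rank equal to the support size minus one. This is not EFM-Ta's LP or simplex-tableau procedure. The toy network and the function names are hypothetical.

```python
from __future__ import annotations
from fractions import Fraction


def matrix_rank(rows) -> int:
    """Rank via Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def is_elementary_flux_mode(S, v, irreversible) -> bool:
    """Check steady state, irreversibility and support-minimality (rank test)."""
    n_rxn = len(v)
    # 1) steady state: every metabolite row of S @ v must be zero
    for row in S:
        if sum(Fraction(row[j]) * Fraction(v[j]) for j in range(n_rxn)) != 0:
            return False
    # 2) irreversible reactions must carry non-negative flux
    if any(v[j] < 0 for j in irreversible):
        return False
    # 3) support-minimality: nullity of the support columns must be exactly 1
    support = [j for j in range(n_rxn) if v[j] != 0]
    if not support:
        return False
    sub = [[row[j] for j in support] for row in S]
    return matrix_rank(sub) == len(support) - 1
```

For example, consider a network with rows A and B and irreversible reactions R1: -> A, R2: A -> B, R3: B -> and R4: A ->. It gives S = [[1, -1, 0, -1], [0, 1, -1, 0]]. In this network, [1, 1, 1, 0] and [1, 0, 0, 1] are elementary, while their sum [2, 1, 1, 1] is a valid flux mode but not an elementary one.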
Availability and implementation
Software is freely available at https://github.com/biogacop/Boost_LP_EFM.
Contact
fguil@um.es
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

FastTargetPred: a program enabling the fast prediction of putative protein targets for input chemical databases

Sat, 2020-06-27 02:00
Abstract
Summary
Several web‐based tools predict the putative targets of a small-molecule query compound by similarity to molecules with known bioactivity data, using molecular fingerprints. In many situations, however, it would be valuable to be able to run such computations on a local computer. We present FastTargetPred, a new program for predicting protein targets of small-molecule queries. Structural similarity computations rely on a large collection of confirmed protein–ligand activities extracted from the curated ChEMBL 25 database. The program can annotate an input chemical library of ∼100k compounds within a few hours on a simple personal computer.
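Fingerprint-based target prediction of this kind typically reduces to a similarity search. The sketch below is a hedged illustration, not FastTargetPred's code. It stores each fingerprint as a set of on-bit indices and scores the query against every reference ligand with the Tanimoto coefficient. For each target it then keeps the best hit, provided that hit reaches a threshold. The 0.5 threshold, the target labels and the function names are assumptions made for illustration.

```python
from __future__ import annotations
from collections import defaultdict


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def predict_targets(query: frozenset, reference, threshold: float = 0.5):
    """Rank targets by their best-scoring known ligand.

    reference: iterable of (fingerprint, target_label) pairs with confirmed activity.
    Returns [(target_label, best_similarity), ...] sorted by decreasing similarity.
    """
    best = defaultdict(float)
    for fingerprint, target in reference:
        best[target] = max(best[target], tanimoto(query, fingerprint))
    hits = [(t, s) for t, s in best.items() if s >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Taking the maximum similarity per target is a nearest-neighbour choice. Averaging over the top few ligands is a common alternative that is less sensitive to a single close analogue.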
Availability and implementation
FastTargetPred is written in Python 3 (≥3.7) and C. The Python code depends only on the Python Standard Library. The program can be run on Linux, MacOS and Windows operating systems. Pre-compiled versions are available at https://github.com/ludovicchaput/FastTargetPred. FastTargetPred is licensed under the GNU GPLv3. The program calls some scripts from the free chemistry toolkit MayaChemTools.
Contact
bruno.villoutreix@inserm.fr
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Capybara: equivalence ClAss enumeration of coPhylogenY event-BAsed ReconciliAtions

Thu, 2020-06-18 02:00
Abstract
Motivation
Phylogenetic tree reconciliation is the method of choice for analyzing host-symbiont systems. Despite the many reconciliation tools proposed in the literature, two main issues remain unresolved: (i) listing suboptimal solutions (i.e. those whose score is ‘close’ to the optimal ones) and (ii) listing only solutions that are biologically different ‘enough’. The first issue arises because the optimal solutions are not always the biologically most significant ones; providing many suboptimal solutions as alternatives to the optimal ones is thus very useful. The second is related to the difficulty of analyzing an often huge number of optimal solutions. In this article, we propose Capybara, which addresses both of these problems efficiently. Furthermore, it includes a tool for visualizing the solutions that significantly helps users analyze the results.
Availability and implementation
The source code, documentation and binaries for all platforms are freely available at https://capybara-doc.readthedocs.io/.
Contact
yishu.wang@univ-lyon1.fr or blerina.sinaimeri@inria.fr
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Higher-order Markov models for metagenomic sequence classification

Tue, 2020-06-09 02:00
Abstract
Motivation
Alignment-free, stochastic models derived from k-mer distributions representing reference genome sequences have a rich history in the classification of DNA sequences. In particular, variants of Markov models have been used extensively. Higher-order Markov models have been used cautiously, perhaps sparingly, primarily because of a lack of sufficient training data and computational power. Advances in sequencing technology and computation have now made it possible to exploit the predictive power of higher-order models. We therefore revisited higher-order Markov models and assessed their performance in classifying metagenomic sequences.
Results
A comparative assessment of higher-order models (HOMs, 9th order or higher) against the interpolated Markov model, the interpolated context model and lower-order models (8th order or lower) was performed on metagenomic datasets constructed from sequenced prokaryotic genomes. Our results show that HOMs outperform the other models in classifying metagenomic fragments as short as 100 nt at all taxonomic ranks, and at lower ranks when the fragment size was increased to 250 nt. HOMs were also found to be significantly more accurate than local alignment, which is widely relied upon for taxonomic classification of metagenomic sequences. A novel software implementation written in C++ performs classification faster than existing Markovian metagenomic classifiers. It can therefore be used as a standalone classifier, or in conjunction with existing taxonomic classifiers, for more robust classification of metagenomic sequences.
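The core of Markov-model classification can be sketched in a few lines. First, train one order-k model per reference genome from its (k+1)-mer counts. Then assign a fragment to whichever model gives it the highest log-likelihood. This Python sketch uses add-one (Laplace) pseudocounts and ignores windows that contain non-ACGT symbols. Both choices, and all names, are assumptions rather than details of the SMM software. An order-k model has 4^k contexts (262,144 for k = 9), which is why the amount of available training data matters so much for HOMs.

```python
from __future__ import annotations
import math
from collections import Counter

ALPHABET = set("ACGT")


class MarkovModel:
    """Order-k Markov chain over DNA, estimated from k-mer and (k+1)-mer counts."""

    def __init__(self, k: int, pseudocount: float = 1.0):
        self.k = k
        self.pseudocount = pseudocount
        self.context_counts = Counter()     # counts of k-mer contexts
        self.transition_counts = Counter()  # counts of (k+1)-mers

    def _windows(self, seq: str):
        seq = seq.upper()
        for i in range(len(seq) - self.k):
            window = seq[i:i + self.k + 1]
            if set(window) <= ALPHABET:  # skip windows containing N or other symbols
                yield window[:-1], window

    def train(self, sequence: str) -> None:
        for context, kmer in self._windows(sequence):
            self.context_counts[context] += 1
            self.transition_counts[kmer] += 1

    def log_likelihood(self, fragment: str) -> float:
        total = 0.0
        for context, kmer in self._windows(fragment):
            num = self.transition_counts[kmer] + self.pseudocount
            den = self.context_counts[context] + self.pseudocount * 4
            total += math.log(num / den)
        return total


def classify(fragment: str, models: dict) -> str:
    """Return the name of the model under which the fragment is most likely."""
    return max(models, key=lambda name: models[name].log_likelihood(fragment))
```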
Availability and implementation
The software has been made available at https://github.com/djburks/SMM.
Contact
Rajeev.Azad@unt.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

HiGwas: how to compute longitudinal GWAS data in population designs

Fri, 2020-06-05 02:00
Abstract
Summary
Genome-wide association studies (GWAS), particularly those designed with many thousands of single-nucleotide polymorphisms (SNPs) (big p) genotyped on tens of thousands of subjects (small n), face a major challenge of p ≫ n. Integrating longitudinal information can significantly enhance a GWAS’s power to reveal the genetic architecture of complex traits and diseases. However, it introduces an additional challenge in the form of an autocorrelative process. We have developed several statistical models that address these two challenges by implementing dimension reduction methods and longitudinal data analysis. To make these models computationally accessible to applied geneticists, we wrote HiGwas, an R package designed to analyze longitudinal GWAS datasets. Functions in the package encompass single-SNP analyses, significance-level adjustment, preconditioning and model selection for a high-dimensional set of SNPs. HiGwas provides estimates of genetic parameters and confidence intervals for these estimates. We demonstrate the features of HiGwas through a real data analysis and the vignette document included in the package.
Availability and implementation
https://github.com/wzhy2000/higwas.
Contact
rwu@phs.psu.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Unipept CLI 2.0: adding support for visualizations and functional annotations

Wed, 2020-06-03 02:00
Abstract
Summary
Unipept is an ecosystem of tools developed for fast metaproteomics data analysis. It consists of a web application, a set of web services (application programming interface, API) and a command-line interface (CLI). After the successful introduction of version 4 of the Unipept web application, we here introduce version 2.0 of the API and CLI. In addition to the existing taxonomic analysis, version 2.0 of the API and CLI provides access to Unipept’s powerful functional analysis for metaproteomics samples. The functional analysis pipeline supports retrieval of Enzyme Commission numbers, Gene Ontology terms and InterPro entries for the individual peptides in a metaproteomics sample. This paves the way for other applications and developers to integrate these new information sources into their data processing pipelines, greatly increasing insight into the functions performed by the organisms in a specific environment. Both the API and CLI have also been expanded with the ability to render interactive visualizations from a list of taxon IDs. These visualizations are automatically made available on a dedicated website and can easily be shared by users.
Availability and implementation
The API is available at http://api.unipept.ugent.be. Information regarding the CLI can be found at https://unipept.ugent.be/clidocs. Both interfaces are freely available and open-source under the MIT license.
Contact
pieter.verschaffelt@ugent.be
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

scTPA: a web tool for single-cell transcriptome analysis of pathway activation signatures

Thu, 2020-05-21 02:00
Abstract
Motivation
At present, a fundamental challenge in single-cell RNA-sequencing data analysis is functional interpretation and annotation of cell clusters. Biological pathways in distinct cell types have different activation patterns, which facilitates the understanding of cell functions using single-cell transcriptomics. However, no effective web tool has been implemented for single-cell transcriptome data analysis based on prior biological pathway knowledge.
Results
Here, we present scTPA, a web-based platform for pathway-based analysis of single-cell RNA-seq data in human and mouse. scTPA incorporates four widely used gene set enrichment methods to estimate the pathway activation scores of single cells. These scores are based on a collection of available biological pathways with different functional and taxonomic classifications. Clustering analysis and identification of cell-type-specific activated pathways are provided for the functional interpretation of cell types from a pathway-oriented perspective. An intuitive interface allows users to conveniently visualize and download single-cell pathway signatures. Overall, scTPA is a comprehensive tool for identifying pathway activation signatures to analyze single-cell heterogeneity.
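Pathway activation scoring can be illustrated with one of the simplest schemes. First, z-score each gene across cells. Then average the z-scores of a pathway's genes within each cell. This is a hedged stand-in for illustration only and is not claimed to be any of the four enrichment methods that scTPA incorporates. The gene and pathway names in the usage are made up.

```python
from __future__ import annotations
from statistics import mean, pstdev


def zscore_by_gene(expr: dict) -> dict:
    """Standardize each gene across cells (mean 0, population SD 1); flat genes -> 0."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return z


def pathway_activation(expr: dict, pathways: dict) -> dict:
    """Per-cell score = mean z-score of the pathway genes present in the matrix.

    expr: {gene: [expression in cell 0, cell 1, ...]}
    pathways: {pathway_name: [gene, ...]}
    """
    z = zscore_by_gene(expr)
    n_cells = len(next(iter(expr.values())))
    scores = {}
    for name, genes in pathways.items():
        present = [g for g in genes if g in z]
        if not present:
            continue  # pathway has no measured genes
        scores[name] = [mean(z[g][c] for g in present) for c in range(n_cells)]
    return scores
```

Standardizing per gene before averaging prevents a few highly expressed genes from dominating a pathway score.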
Availability and implementation
http://sctpa.bio-data.cn/sctpa.
Contact
sujz@wmu.edu.cn or yufulong421@gmail.com or zgj@zjut.edu.cn
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

ThETA: transcriptome-driven efficacy estimates for gene-based TArget discovery

Thu, 2020-05-21 02:00
Abstract
Summary
Estimating the efficacy of gene–target-disease associations is a fundamental step in drug discovery. An important data source for this laborious task is RNA expression, which can provide gene–disease associations on the basis of expression fold change and statistical significance. However, simply using the log-fold change can lead to numerous false-positive associations. On the other hand, more sophisticated methods that use gene co-expression networks do not consider tissue specificity. Here, we introduce Transcriptome-driven Efficacy estimates for gene-based TArget discovery (ThETA), an R package that enables non-expert users to apply novel efficacy scoring methods for drug–target discovery. In particular, ThETA allows users to search for gene perturbations (therapeutics) that reverse disease-gene expression. It also finds genes that are closely related to disease genes in tissue-specific networks. ThETA also provides functions to integrate efficacy evaluations obtained with different approaches and to build an overall efficacy score, which can be used to identify and prioritize gene (target)–disease associations. Finally, ThETA implements visualizations that show tissue-specific interconnections between targets and disease genes, and that indicate biological annotations associated with the top selected genes.
Availability and implementation
ThETA is freely available for academic use at https://github.com/vittoriofortino84/ThETA.
Contact
vittorio.fortino@uef.fi
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

ProteinFishing: a protein complex generator within the ModelX toolsuite

Thu, 2020-05-21 02:00
Abstract
Summary
Accurate 3D modelling of protein–protein interactions (PPI) is essential to compensate for the absence of experimentally determined complex structures. Here, we present a new set of commands within the ModelX toolsuite capable of generating atomic-level protein complexes suitable for interface design. Among these commands, the new tool ProteinFishing proposes known and/or putative alternative 3D PPI for a given protein complex. The algorithm exploits backbone compatibility of protein fragments to generate mutually exclusive protein interfaces, which are quickly evaluated with a knowledge-based statistical force field. The algorithm was tested using interleukin-10-R2 co-crystallized with interferon-lambda-3, together with a database of X-ray structures containing interleukin-10. With these inputs, it generated interleukin-10-R2/interleukin-10 structural models in agreement with experimental data.
Availability and implementation
ProteinFishing is a portable command-line tool included in the ModelX toolsuite, written in C++. It uses an SQL relational database (tested with MySQL and MariaDB) delivered as a template SQL dump called FishXDB. FishXDB contains the empty tables of ModelX fragments and the data used by the embedded statistical force field. ProteinFishing is compiled for Linux-64bit, MacOS-64bit and Windows-32bit operating systems. The software is released under a proprietary license and is distributed as an executable with its corresponding database dumps. It can be downloaded publicly at http://modelx.crg.es/. Licenses are freely available to academic users after registration on the website, and commercial licenses are available for for-profit organizations or companies.
Contact
javier.delgado@crg.eu or luis.serrano@crg.eu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

PLIDflow: an open-source workflow for the online analysis of protein–ligand docking using galaxy

Sat, 2020-05-16 02:00
Abstract
Motivation
Molecular docking aims to predict the conformation of small molecules (ligands) within an identified binding site (BS) in a target protein (receptor). Protein–ligand docking plays an important role in modern drug discovery, and in biochemistry for protein engineering. However, efficient docking analysis of proteins requires prior knowledge of the BS, which is not always available. The process that covers BS identification and protein–ligand docking usually requires combining different programs, each of which needs several input parameters. This is further aggravated by computational demands, such as CPU time. Therefore, these simulation experiments can become a complex process for researchers without a background in computer science.
Results
To overcome these problems, we have designed an automatic computational workflow (WF) to process protein–ligand complexes. It runs from the identification of possible BS positions to the prediction of the experimental binding modes and affinities of the ligand. This open-access WF runs on the Galaxy platform, which integrates public domain software. The results of the proposed method are in close agreement with those of state-of-the-art docking software.
Availability and implementation
Software is available at: https://pistacho.ac.uma.es/galaxy-bitlab.
Contact
euv@uma.es
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Stereo3D: using stereo images to enrich 3D visualization

Sat, 2020-05-16 02:00
Abstract
Summary
Visualization in 3D space is a standard but critical process for examining the complex structure of high-dimensional data. Stereoscopic imaging technology can be adopted to enhance the 3D representation of many types of complex data, especially data consisting of points and lines. We illustrate the simple steps involved and strongly recommend that others implement them when designing visualization software. To facilitate their application, we created a new software tool that converts a regular 3D scatterplot or network figure into a pair of stereo images.
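A stereo pair can be produced by rendering the same 3D points from two slightly rotated viewpoints. The sketch below rotates the point cloud by plus and minus half of a separation angle about the vertical (y) axis. It then projects each copy orthographically onto the x-y plane. The default 6° separation, the orthographic projection and the convention that the viewer sits on the +z axis are all assumptions, not Stereo3D's documented defaults.

```python
import math


def rotate_y(point, angle_rad):
    """Rotate a 3D point about the vertical y axis by angle_rad (right-handed)."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def stereo_pair(points, separation_deg=6.0):
    """Return (left, right) lists of 2D points for a stereo image pair.

    Assumes the viewer sits on the +z axis looking toward the origin, so each
    image is an orthographic projection of the rotated scene onto the x-y plane.
    """
    half = math.radians(separation_deg) / 2.0
    # Rotating the scene by +half is equivalent to moving the eye left by half.
    left = [rotate_y(p, +half)[:2] for p in points]
    right = [rotate_y(p, -half)[:2] for p in points]
    return left, right
```

Plotting the left and right point lists side by side gives the image pair. Swapping the two images switches between parallel (wall-eyed) and cross-eyed viewing.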
Availability and implementation
Stereo3D is freely available as an open-source R package released under the MIT license at https://github.com/bioinfoDZ/Stereo3D. Others can integrate the code and implement the method in academic software.
Contact
deyou.zheng@einsteinmed.org
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

MetaviralSPAdes: assembly of viruses from metagenomic data

Fri, 2020-05-15 02:00
Abstract
Motivation
Although the set of currently known viruses has been steadily expanding, only a tiny fraction of the Earth’s virome has been sequenced so far. Shotgun metagenomic sequencing provides an opportunity to reveal novel viruses but faces the computational challenge of identifying viral genomes that are often difficult to detect in metagenomic assemblies.
Results
We describe MetaviralSPAdes, a tool for identifying viral genomes in metagenomic assembly graphs that is based on analyzing variations in coverage depth between viruses and bacterial chromosomes. We benchmarked MetaviralSPAdes on diverse metagenomic datasets and verified our predictions using a set of virus-specific Hidden Markov Models. We also demonstrated that it improves on state-of-the-art viral identification pipelines.
Availability and implementation
MetaviralSPAdes includes ViralAssembly, ViralVerify and ViralComplete modules that are available as standalone packages: https://github.com/ablab/spades/tree/metaviral_publication, https://github.com/ablab/viralVerify/ and https://github.com/ablab/viralComplete/.
Contact
d.antipov@spbu.ru
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

SOMM4mC: a second-order Markov model for DNA N4-methylcytosine site prediction in six species

Fri, 2020-05-15 02:00
Abstract
Motivation
DNA N4-methylcytosine (4mC) modification is an important epigenetic modification in prokaryotic DNA due to its role in regulating DNA replication and protecting the host DNA against degradation. An efficient algorithm to identify 4mC sites is needed for downstream analyses.
Results
In this study, we propose a new prediction method named SOMM4mC. It is based on a second-order Markov model that uses the transition probabilities between adjacent nucleotides to identify 4mC sites. The results show that both the first-order and second-order Markov models outperform the three existing algorithms in all six species for which benchmark datasets are available (Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii). Moreover, SOMM4mC classifies more accurately than the first-order Markov model. In particular, for E. coli and C. elegans, the overall accuracies of SOMM4mC are 91.8% and 87.6%, which are 8.5% and 6.1% higher, respectively, than those of the latest method, 4mcPred-SVM. This shows that SOMM4mC captures more discriminative sequence information through the dependency between adjacent nucleotides.
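A second-order model conditions each nucleotide on the two nucleotides before it. The sketch below trains one such model on positive (4mC) sequences and another on negative sequences. It then scores a candidate by the log-odds ratio summed over its trinucleotide windows, where a score above 0 favours 4mC. This is a position-independent simplification with add-one pseudocounts, and all names are hypothetical. SOMM4mC's exact formulation may differ.

```python
from __future__ import annotations
import math
from collections import Counter
from itertools import product

NUC = "ACGT"


def train_second_order(seqs, pseudocount: float = 1.0) -> dict:
    """Return log P(x_i | x_{i-2} x_{i-1}) for all 64 trinucleotides."""
    tri = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(2, len(s)):
            tri[s[i - 2:i + 1]] += 1
    logp = {}
    for a, b in product(NUC, repeat=2):
        total = sum(tri[a + b + c] for c in NUC) + pseudocount * 4
        for c in NUC:
            logp[a + b + c] = math.log((tri[a + b + c] + pseudocount) / total)
    return logp


def log_odds(seq: str, pos_model: dict, neg_model: dict) -> float:
    """Sum of log P_pos - log P_neg over trinucleotide windows (>0 favours 4mC)."""
    seq = seq.upper()
    score = 0.0
    for i in range(2, len(seq)):
        t = seq[i - 2:i + 1]
        if t in pos_model:  # skips windows containing non-ACGT symbols
            score += pos_model[t] - neg_model[t]
    return score
```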
Availability and implementation
The web server of SOMM4mC is freely accessible at www.insect-genome.com/SOMM4mC.
Contact
chenyuanyuan@njau.edu.cn or piancong@njau.edu.cn
Categories: Bioinformatics, Journals

BioStructures.jl: read, write and manipulate macromolecular structures in Julia

Thu, 2020-05-14 02:00
Abstract
Summary
Robust, flexible and fast software to read, write and manipulate macromolecular structures is a prerequisite for productively doing structural bioinformatics. We present BioStructures.jl, the first dedicated package in the Julia programming language for dealing with macromolecular structures and the Protein Data Bank. BioStructures.jl builds on the lessons learned with similar packages to provide a large feature set, a flexible object representation and high performance.
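Reading coordinates from the legacy PDB format comes down to fixed-column slicing of ATOM/HETATM records. The Python sketch below extracts a few fields using the column positions from the wwPDB format description. It is not BioStructures.jl's API, which is in Julia. It ignores alternate locations, insertion codes, occupancy, B-factors and element symbols, and the Atom class is a hypothetical container.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_records(lines) -> list:
    """Parse ATOM/HETATM lines using fixed PDB columns (0-based Python slices)."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                serial=int(line[6:11]),        # columns 7-11
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain_id=line[21],             # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms
```

Column slicing is required here, rather than splitting on whitespace, because adjacent fields in PDB files can run together (for example, large negative coordinates).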
Availability and implementation
BioStructures.jl is freely available under the MIT license. Source code and documentation are available at https://github.com/BioJulia/BioStructures.jl. BioStructures.jl is compatible with Julia versions 0.6 and later and is system-independent.
Contact
j.greener@ucl.ac.uk
Categories: Bioinformatics, Journals

iBioProVis: interactive visualization and analysis of compound bioactivity space

Thu, 2020-05-14 02:00
Abstract
Summary
iBioProVis is an interactive tool for visual analysis of the compound bioactivity space in the context of target proteins, drugs and drug candidate compounds. The tool takes target protein identifiers and, optionally, compound SMILES as input. It uses the state-of-the-art non-linear dimensionality reduction method t-Distributed Stochastic Neighbor Embedding (t-SNE) to plot the distribution of compounds embedded in a 2D map. The embedding is based on the similarity of the compounds’ structural properties and takes their cognate targets into account. Similar compounds, which are embedded at nearby points on the 2D map, may bind the same or similar target proteins. Thus, iBioProVis can be used to easily observe how the known ligands of one or two target proteins are distributed structurally on the 2D compound space. Based on this distribution, users can infer new binders of the same protein or new potential targets for a compound of interest. A principal component analysis (PCA) projection of the input compounds is also provided. Hence, the user can interactively observe the same compound, or a group of selected compounds, both as projected by PCA and as embedded by t-SNE. iBioProVis also provides detailed information about drugs and drug candidate compounds through cross-references to widely used, well-known databases, in the form of linked table views. Two use cases are demonstrated. One of them concerns angiotensin-converting enzyme 2 (ACE2), the receptor of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Spike protein. ACE2-binding compounds were embedded close to seven antiviral drugs, two of which have been in clinical trials for coronavirus disease 2019 (COVID-19).
Availability and implementation
iBioProVis and its carefully filtered dataset are available at https://ibpv.kansil.org/ for public use.
Contact
vatalay@metu.edu.tr
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

genozip: a fast and efficient compression tool for VCF files

Thu, 2020-05-14 02:00
Abstract
Motivation
genozip is a new lossless compression tool for Variant Call Format (VCF) files. By applying field-specific algorithms and fully utilizing the available computational hardware, genozip achieves the highest compression ratios amongst existing lossless compression tools known to the authors, at speeds comparable with the fastest multi-threaded compressors.
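The general idea of field-specific compression can be sketched as follows. Split VCF data lines into per-column streams and transform a column where its structure allows it. Here POS is delta-encoded, because sorted positions give small differences. Then compress each stream separately with zlib. This is a hedged toy illustration of the principle, not genozip's algorithms or file format. It assumes that every data line has the same number of columns and that header lines are handled elsewhere.

```python
from __future__ import annotations
import zlib

POS_COLUMN = 1  # 0-based index of POS in a VCF data line


def _delta_encode(values):
    prev, out = 0, []
    for v in map(int, values):
        out.append(str(v - prev))
        prev = v
    return out


def _delta_decode(values):
    total, out = 0, []
    for d in map(int, values):
        total += d
        out.append(str(total))
    return out


def compress_by_field(vcf_text: str, level: int = 9) -> dict:
    """Return {column_index: zlib-compressed stream} for the VCF data lines."""
    columns = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # meta-information and header lines are not handled here
        for i, field in enumerate(line.split("\t")):
            columns.setdefault(i, []).append(field)
    if POS_COLUMN in columns:
        columns[POS_COLUMN] = _delta_encode(columns[POS_COLUMN])
    return {i: zlib.compress("\n".join(vals).encode("utf-8"), level)
            for i, vals in columns.items()}


def decompress_by_field(streams: dict) -> list:
    """Rebuild the tab-separated data lines from the per-column streams."""
    columns = {i: zlib.decompress(s).decode("utf-8").split("\n") for i, s in streams.items()}
    if POS_COLUMN in columns:
        columns[POS_COLUMN] = _delta_decode(columns[POS_COLUMN])
    return ["\t".join(row) for row in zip(*(columns[i] for i in sorted(columns)))]
```

Keeping each column in its own stream groups similar values together, which generally helps a general-purpose compressor. Real tools go further with dedicated codecs per field.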
Availability and implementation
genozip is freely available to non-commercial users. It can be installed via conda-forge, Docker Hub, or downloaded from github.com/divonlan/genozip.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

EasyVS: a user-friendly web-based tool for molecule library selection and structure-based virtual screening

Tue, 2020-05-12 02:00
Abstract
Summary
EasyVS is a web-based platform built to simplify molecule library selection and virtual screening. With an intuitive interface, the tool allows users to go from selecting a protein target with a known structure and tailoring a purchasable molecule library to performing and visualizing docking in a few clicks. Our system also allows users to filter screening libraries based on molecule properties, cluster molecules by similarity and personalize docking parameters.
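Property-based library filtering can be sketched as a set of allowed ranges checked for each molecule. The ranges below are a hypothetical default loosely modelled on Lipinski's rule of five: molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors and at most 10 acceptors. They are not EasyVS's settings. The properties are assumed to have been precomputed by a cheminformatics toolkit, and the molecule records in the usage are made up.

```python
from __future__ import annotations

# Hypothetical default ranges, loosely modelled on Lipinski's rule of five.
DEFAULT_RANGES = {
    "mol_weight": (0.0, 500.0),
    "logp": (float("-inf"), 5.0),
    "h_donors": (0, 5),
    "h_acceptors": (0, 10),
}


def passes(props: dict, ranges=DEFAULT_RANGES, max_violations: int = 0) -> bool:
    """True if the molecule breaks at most max_violations of the ranges."""
    violations = 0
    for key, (lo, hi) in ranges.items():
        value = props.get(key)
        if value is None or not (lo <= value <= hi):
            violations += 1  # missing properties count as violations
    return violations <= max_violations


def filter_library(library, ranges=DEFAULT_RANGES, max_violations: int = 0) -> list:
    """library: iterable of (molecule_id, properties) pairs; returns the ids that pass."""
    return [mol_id for mol_id, props in library if passes(props, ranges, max_violations)]
```

Lipinski's original rule tolerates one violation, which corresponds to max_violations=1 here.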
Availability and implementation
EasyVS is freely available as an easy-to-use web interface at http://biosig.unimelb.edu.au/easyvs.
Contact
douglas.pires@unimelb.edu.au or david.ascher@unimelb.edu.au
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

ipcoal: an interactive Python package for simulating and analyzing genealogies and sequences on a species tree or network

Tue, 2020-05-12 02:00
Abstract
Summary
ipcoal is a free and open source Python package for simulating and analyzing genealogies and sequences. It automates the task of describing complex demographic models (e.g. with divergence times, effective population sizes, migration events) to the msprime coalescent simulator by parsing a user-supplied species tree or network. Genealogies, sequences and metadata are returned in tabular format allowing for easy downstream analyses. ipcoal includes phylogenetic inference tools to automate gene tree inference from simulated sequence data, and visualization tools for analyzing results and verifying model accuracy. The ipcoal package is a powerful tool for posterior predictive data analysis, for methods validation and for teaching coalescent methods in an interactive and visual environment.
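ipcoal hands the actual simulation to msprime. The underlying model, in its simplest single-population form, is the Kingman coalescent. The sketch below uses neither ipcoal's nor msprime's API. While k lineages remain, it draws the waiting time from an exponential distribution with rate k(k-1)/2 per 2Ne generations. It then merges a random pair of lineages and repeats until only one lineage is left. It assumes a constant-size diploid population with no migration or recombination. Under this model, the expected time to the most recent common ancestor is 4Ne(1 - 1/n) generations, which provides a quick sanity check.

```python
from __future__ import annotations
import random


def kingman_coalescent(n: int, ne: float, seed=None):
    """Simulate one genealogy for n sampled gene copies in a diploid population of size ne.

    Returns (tree, times): tree is a nested tuple of sample indices, and times holds
    the cumulative coalescence times in generations (n - 1 mergers in total).
    """
    rng = random.Random(seed)
    lineages = list(range(n))
    t = 0.0
    times = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = (k * (k - 1) / 2) / (2 * ne)  # pairwise coalescence rate per generation
        t += rng.expovariate(rate)
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        a = lineages.pop(i)  # pop the larger index first so j stays valid
        b = lineages.pop(j)
        lineages.append((a, b))
        times.append(t)
    return lineages[0], times
```

Averaging the last entry of times over many replicates with n = 5 and ne = 100 should come out close to 4 × 100 × (1 - 1/5) = 320 generations.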
Availability and implementation
Source code is available from the GitHub repository (https://github.com/pmckenz1/ipcoal/) and is distributed for packaged installation with conda. Complete documentation and interactive notebooks prepared for teaching purposes, including an empirical example, are available at https://ipcoal.readthedocs.io/.
Contact
p.mckenzie@columbia.edu
Categories: Bioinformatics, Journals

rScudo: an R package for classification of molecular profiles using rank-based signatures

Tue, 2020-05-12 02:00
Abstract
Summary
The classification of biological samples by means of their respective molecular profiles is a topic of great interest for its potential diagnostic, prognostic and investigational applications. rScudo is an R package for classifying molecular profiles using a radically new approach: analyzing the similarity of rank-based, sample-specific signatures. rScudo’s unconventional approach has been validated through direct comparison with current methods in the international SBV IMPROVER Diagnostic Signature Challenge. Because of its novelty, there is ample room for conceptual improvements and for exploring additional applications. The rScudo package has been specifically designed to facilitate experimenting with the rank-based signature approach, to test its application to different types of molecular profiles and to simplify direct comparison with existing methods.
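A rank-based signature keeps only the order of the features within a sample. The sketch below takes the n highest-ranked and n lowest-ranked features of each profile as its up and down signature. It compares two signatures by a signed overlap, in which concordant features count +1 and discordant ones count -1. A query is assigned to the class with the highest mean similarity. The signed-overlap score is a simplification chosen for brevity, and rScudo's own similarity measure differs. All names and the toy profiles are hypothetical.

```python
from __future__ import annotations
from statistics import mean


def signature(profile: dict, n: int) -> tuple:
    """Top-n (up) and bottom-n (down) features of one sample; needs 1 <= n <= len/2."""
    ranked = sorted(profile, key=profile.__getitem__)  # ascending by value
    return set(ranked[-n:]), set(ranked[:n])


def signature_similarity(sig_a: tuple, sig_b: tuple) -> float:
    """Signed overlap in [-1, 1]: concordant overlaps count +1, discordant -1."""
    up_a, down_a = sig_a
    up_b, down_b = sig_b
    n = max(len(up_a), 1)
    concordant = len(up_a & up_b) + len(down_a & down_b)
    discordant = len(up_a & down_b) + len(down_a & up_b)
    return (concordant - discordant) / (2 * n)


def classify(query: dict, training, n: int) -> str:
    """training: list of (profile, label); pick the label with highest mean similarity."""
    q = signature(query, n)
    by_label = {}
    for profile, label in training:
        by_label.setdefault(label, []).append(signature_similarity(q, signature(profile, n)))
    return max(by_label, key=lambda lab: mean(by_label[lab]))
```

Because only ranks are used, the signatures are unaffected by any monotone rescaling of a sample, such as a different normalization or a log transform.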
Availability and implementation
The package is available as part of the Bioconductor suite at https://bioconductor.org/packages/rScudo.
Categories: Bioinformatics, Journals