Bioinformatics

JEPEGMIX2: improved gene-level joint analysis of eQTLs in cosmopolitan cohorts

Thu, 2017-09-14 02:00
Abstract
Motivation
To increase detection power, researchers use gene-level analysis methods to aggregate weak marker signals. Because gene expression controls biological processes, researchers have proposed aggregating signals for expression Quantitative Trait Loci (eQTLs). Most gene-level eQTL methods make statistical inferences based on (i) summary statistics from genome-wide association studies (GWAS) and (ii) linkage disequilibrium patterns from a relevant reference panel. While most such tools assume homogeneous cohorts, our Gene-level Joint Analysis of functional SNPs in Cosmopolitan Cohorts (JEPEGMIX) method accommodates cosmopolitan cohorts by using heterogeneous panels. However, JEPEGMIX relies on brain eQTLs from older gene expression studies and does not adjust for background enrichment in GWAS signals.
Results
We propose JEPEGMIX2, an extension of JEPEGMIX. Compared with JEPEGMIX, it uses (i) cis-eQTL SNPs from the latest expression studies and (ii) brain-specific (sub)tissues as well as tissues other than brain. JEPEGMIX2 also (i) adjusts for background enrichment, so that it does not accumulate polygenic information that is only averagely enriched, and (ii) outputs gene q-values based on Holm adjustment of P-values, to avoid inflated false positive rates in studies with numerous highly enriched (above the background) genes.
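Holm adjustment, mentioned above, is a standard step-down multiple-testing correction. The following Python sketch of the generic procedure is illustrative only and is not taken from the JEPEGMIX2 code base; the function name `holm_adjust` is hypothetical.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest P-value (0-based) is multiplied by (m - rank), capped at 1.
        value = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted
```

For example, `holm_adjust([0.01, 0.04, 0.03])` gives approximately `[0.03, 0.06, 0.06]`: the second-smallest P-value is doubled, and the largest inherits that value to keep the sequence monotone.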
Availability and implementation
https://github.com/Chatzinakos/JEPEGMIX2.
Contact
chris.chatzinakos@vcuhealth.org
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

SNPDelScore: combining multiple methods to score deleterious effects of noncoding mutations in the human genome

Thu, 2017-09-14 02:00
Abstract
Summary
Addressing deleterious effects of noncoding mutations is an essential step towards the identification of disease-causal mutations of gene regulatory elements. Several methods for quantifying the deleteriousness of noncoding mutations using artificial intelligence, deep learning and other approaches have been recently proposed. Although most of the proposed methods have demonstrated excellent accuracy on different test sets, there is rarely a consensus among them. In addition, the advanced statistical and machine-learning approaches used by these methods make it difficult to port them outside the labs that developed them. To address these challenges and to transform the methodological advances in predicting deleterious noncoding mutations into a practical resource available for the broader functional genomics and population genetics communities, we developed SNPDelScore, which uses a panel of proposed methods for quantifying deleterious effects of noncoding mutations to precompute and compare the deleteriousness scores of all common SNPs in the human genome in 44 cell lines. The panel of deleteriousness scores of a SNP computed using different methods is supplemented by functional information from the GWAS Catalog, libraries of transcription factor-binding sites, and genic characteristics of mutations. SNPDelScore comes with a genome browser capable of displaying and comparing large sets of SNPs in a genomic locus and of rapidly identifying consensus SNPs with the highest deleteriousness scores, making those prime candidates for phenotype-causal polymorphisms.
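One simple way to find consensus SNPs across a panel of scoring methods is to average each SNP's per-method percentile rank. The sketch below assumes that a larger score always means more deleterious and ranks only SNPs scored by every method; it illustrates the idea of a consensus ranking and is not SNPDelScore's actual procedure. The function and SNP identifiers are hypothetical.

```python
def consensus_ranking(scores_by_method):
    """Rank SNPs by their mean percentile rank across scoring methods.

    scores_by_method maps a method name to {snp_id: score}; a larger score is
    assumed to mean a more deleterious SNP. Only SNPs scored by every method
    are ranked. Returns (snp_id, mean_percentile) pairs, best first.
    """
    shared = set.intersection(*(set(s) for s in scores_by_method.values()))
    totals = dict.fromkeys(shared, 0.0)
    for scores in scores_by_method.values():
        # Sort by name first so that tied scores are ranked deterministically.
        ordered = sorted(sorted(shared), key=lambda snp: scores[snp])
        n = len(ordered)
        for rank, snp in enumerate(ordered):
            # Percentile rank in [0, 1]; the highest-scoring SNP gets 1.0.
            totals[snp] += rank / (n - 1) if n > 1 else 1.0
    k = len(scores_by_method)
    return sorted(((snp, t / k) for snp, t in totals.items()),
                  key=lambda item: item[1], reverse=True)
```

Rank averaging makes scores from methods with very different scales comparable, at the cost of discarding how far apart two SNPs are within a method.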
Availability and implementation
https://www.ncbi.nlm.nih.gov/research/snpdelscore/
Contact
ovcharen@nih.gov
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

SiNoPsis: Single Nucleotide Polymorphisms selection and promoter profiling

Thu, 2017-09-14 02:00
Abstract
Motivation
The selection of a single nucleotide polymorphism (SNP) using bibliographic methods can be a very time-consuming task. Moreover, a SNP selected in this way may not be easily visualized in its genomic context by a standard user hoping to correlate it with other valuable information. Here we propose a web form built on top of Circos that can assist SNP-centered screening, based on their location in the genome and the regulatory modules they can disrupt. Its use may allow researchers to prioritize SNPs in genotyping and disease studies.
Results
SiNoPsis is bundled as a web portal. It focuses on the different structures involved in the genomic expression of a gene, especially those found in the core promoter upstream region. These structures include transcription factor binding sites (for promoter and enhancer signals), histones and promoter flanking regions. Additionally, the tool provides eQTL and linkage disequilibrium (LD) properties for a given SNP query, yielding further clues about other indirectly associated SNPs. Possible disruptions of the aforementioned structures affecting gene transcription are reported using multiple resource databases. SiNoPsis has a simple user-friendly interface, which allows single queries by gene symbol, genomic coordinates, Ensembl gene identifiers, RefSeq transcript identifiers and SNPs. It is the only portal providing useful SNP selection based on regulatory modules and LD with functional variants in both textual and graphic modes (by properly defining the arguments and parameters needed to run Circos).
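SiNoPsis reports LD properties for a queried SNP. A common LD measure is r², computed from allele frequencies in phased haplotypes as D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The sketch below assumes biallelic SNPs with alleles coded 0/1; it is a generic illustration and not how SiNoPsis itself obtains LD values. The function name is hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two equal-length, non-empty haplotype lists")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic SNP carries no LD information; report 0 in that case.
    return 0.0 if denom == 0 else d * d / denom
```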
Availability and implementation
SiNoPsis is freely available at https://compgen.bio.ub.edu/SiNoPsis/
Contact
danielboloc@gmail.com
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Ms2lda.org: web-based topic modelling for substructure discovery in mass spectrometry

Thu, 2017-09-14 02:00
Abstract
Motivation
We recently published MS2LDA, a method for the decomposition of sets of molecular fragment data derived from large metabolomics experiments. To make the method more widely available to the community, here we present ms2lda.org, a web application that allows users to upload their data, run MS2LDA analyses and explore the results through interactive visualizations.
Results
Ms2lda.org takes tandem mass spectrometry data in many standard formats and allows the user to infer the sets of fragment and neutral loss features that co-occur (Mass2Motifs). As an alternative workflow, the user can also decompose a data set onto predefined Mass2Motifs. This can be done through the web interface or programmatically via our web service.
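In MS2LDA-style topic modelling, each spectrum is treated like a document whose "words" are discretized fragment masses and neutral losses (precursor m/z minus fragment m/z). The sketch below shows one way to build such a bag of words; the bin width, word naming and use of summed peak intensities as word weights are assumptions for illustration, not ms2lda.org's actual preprocessing.

```python
def spectrum_to_words(precursor_mz, peaks, bin_width=0.01):
    """Convert one MS/MS spectrum into fragment/loss 'words' with weights.

    peaks: list of (mz, intensity) pairs. bin_width (in Da) is a hypothetical
    default. Each word's weight is the summed intensity of the peaks mapped to it.
    """
    words = {}

    def add(prefix, mass, intensity):
        # Snap the mass to the centre of its bin and use it as the word name.
        centre = round(mass / bin_width) * bin_width
        key = f"{prefix}_{centre:.4f}"
        words[key] = words.get(key, 0.0) + intensity

    for mz, intensity in peaks:
        if intensity <= 0:
            continue
        add("fragment", mz, intensity)
        loss = precursor_mz - mz
        if loss > 0:  # only fragments lighter than the precursor yield a loss
            add("loss", loss, intensity)
    return words
```

A topic model fitted over many such documents would then group words that tend to co-occur, which is the role Mass2Motifs play.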
Availability and implementation
The website can be found at http://ms2lda.org, while the source code is available at https://github.com/sdrogers/ms2ldaviz under the MIT license.
Contact
simon.rogers@glasgow.ac.uk
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

CRISPR-RT: a web application for designing CRISPR-C2c2 crRNA with improved target specificity

Thu, 2017-09-14 02:00
Abstract
Summary
CRISPR-Cas systems have been successfully applied in genome editing. Recently, the CRISPR-C2c2 system has been reported as a tool for RNA editing. Here we describe CRISPR-RT (CRISPR RNA-Targeting), the first web application to help biologists design crRNAs with improved target specificity for the CRISPR-C2c2 system. CRISPR-RT allows users to set up a wide range of parameters, making it highly flexible for current and future research in CRISPR-based RNA editing. CRISPR-RT covers major model organisms and can be easily extended to cover other species. CRISPR-RT will empower researchers in RNA editing.
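A crRNA spacer for C2c2 must be complementary to its target RNA, and for the Leptotrichia shahii enzyme a G immediately 3′ of the protospacer (the protospacer flanking site, PFS) is commonly reported to reduce activity. The sketch below enumerates candidate spacers under those assumptions. The 28-nt default length and the single PFS rule are assumptions for illustration, and the sketch performs none of CRISPR-RT's target-specificity analysis.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def candidate_spacers(target_rna, spacer_len=28, banned_pfs="G"):
    """List (start, protospacer, spacer) candidates on a target RNA.

    target_rna is read 5'->3' (T is converted to U). A window qualifies if all
    its bases are A/C/G/U and the base just 3' of it (the assumed PFS) is not
    in banned_pfs. The crRNA spacer is the reverse complement of the protospacer.
    """
    target = target_rna.upper().replace("T", "U")
    hits = []
    # Stop early enough that every window has a 3' flanking base to check.
    for start in range(len(target) - spacer_len):
        protospacer = target[start:start + spacer_len]
        pfs = target[start + spacer_len]
        if pfs in banned_pfs or any(b not in COMPLEMENT for b in protospacer):
            continue
        spacer = "".join(COMPLEMENT[b] for b in reversed(protospacer))
        hits.append((start, protospacer, spacer))
    return hits
```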
Availability and implementation
Freely available at http://bioinfolab.miamioh.edu/CRISPR-RT.
Contact
liangc@miamioh.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

PySCeSToolbox: a collection of metabolic pathway analysis tools

Thu, 2017-09-14 02:00
Abstract
Summary
PySCeSToolbox is an extension to the Python Simulator for Cellular Systems (PySCeS) that includes tools for performing generalized supply–demand analysis, symbolic metabolic control analysis, and a framework for investigating the kinetic and thermodynamic aspects of enzyme-catalyzed reactions. Each tool addresses a different aspect of metabolic behaviour, control, and regulation; the tools complement each other and can be used in conjunction to better understand higher-level system behaviour.
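Symbolic metabolic control analysis rests on the summation and connectivity theorems. For a hypothetical two-step pathway X0 → S → X1, these give the flux control coefficients C1 = ε2/(ε2 − ε1) and C2 = −ε1/(ε2 − ε1), where εi is the elasticity of step i with respect to S. The numeric sketch below checks this against a finite-difference sensitivity on assumed mass-action rate laws; it does not use the PySCeSToolbox API, and all names are hypothetical.

```python
import math


def flux_control_two_step(eps1_s, eps2_s):
    """Flux control coefficients (C1, C2) for X0 -> S -> X1.

    Solves the summation theorem C1 + C2 = 1 together with the connectivity
    theorem C1*eps1_s + C2*eps2_s = 0.
    """
    denom = eps2_s - eps1_s
    return eps2_s / denom, -eps1_s / denom


def toy_steady_state(e1, e2, a=2.0, b=1.0, c=3.0, x0=1.0):
    """Steady state (J, S) for hypothetical rate laws v1 = e1*(a*x0 - b*S), v2 = e2*c*S."""
    s = e1 * a * x0 / (e1 * b + e2 * c)
    return e2 * c * s, s


def log_sensitivity(f, x, h=1e-6):
    """Central finite-difference estimate of d ln f / d ln x."""
    return (math.log(f(x * (1 + h))) - math.log(f(x * (1 - h)))) / (
        math.log(1 + h) - math.log(1 - h))


# Elasticities at e1 = e2 = 1 (and b = 1): eps1 = -e1*b*S/v1 with v1 = J, eps2 = 1.
J, S = toy_steady_state(1.0, 1.0)
c1, c2 = flux_control_two_step(-S / J, 1.0)
c1_numeric = log_sensitivity(lambda e1: toy_steady_state(e1, 1.0)[0], 1.0)
```

Here `c1` (from the theorems) and `c1_numeric` (from perturbing enzyme 1 in the toy model) should agree at about 0.75.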
Availability and implementation
PySCeSToolbox is available on Linux, Mac OS X and Windows. It is licensed under the BSD 3-clause licence. Code, setup instructions and a link to documentation can be found at https://github.com/PySCeS/PyscesToolbox.
Contact
jr@sun.ac.za
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Interactive network visualization in Jupyter notebooks: visJS2jupyter

Thu, 2017-09-14 02:00
Abstract
Motivation
Network biology is widely used to elucidate mechanisms of disease and biological processes. The ability to interact with biological networks is important for hypothesis generation and to give researchers an intuitive understanding of the data. We present visJS2jupyter, a tool designed to embed interactive networks in Jupyter notebooks to streamline network analysis and to promote reproducible research.
Results
The tool provides functions for performing and visualizing useful network operations in biology, including network overlap, network propagation around a focal set of genes, and co-localization of two sets of seed genes. visJS2jupyter uses the JavaScript library vis.js to create interactive networks displayed within Jupyter notebook cells with features including drag, click, hover, and zoom. We demonstrate the functionality of visJS2jupyter applied to a biological question, by creating a network propagation visualization to prioritize risk-related genes in autism.
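Network propagation around a focal set of genes is often implemented as a random walk with restart: scores repeatedly spread to neighbours while a fraction alpha returns to the seed genes. The pure-Python sketch below illustrates that general technique on an undirected edge list; the restart probability and convergence settings are assumptions, and this is not visJS2jupyter's own implementation.

```python
def propagate(edges, seeds, alpha=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from seed genes on an undirected network.

    edges: iterable of (u, v) pairs. alpha: restart probability (an assumed
    default). Returns a dict of node -> score; scores sum to 1.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    seed_set = {s for s in seeds if s in neighbours}
    if not seed_set:
        raise ValueError("no seed gene is present in the network")
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in neighbours}
    scores = dict(restart)
    for _ in range(max_iter):
        updated = {n: alpha * restart[n] for n in neighbours}
        for n, score in scores.items():
            # Each node passes (1 - alpha) of its score equally to its neighbours.
            share = (1 - alpha) * score / len(neighbours[n])
            for m in neighbours[n]:
                updated[m] += share
        converged = sum(abs(updated[n] - scores[n]) for n in neighbours) < tol
        scores = updated
        if converged:
            break
    return scores
```

Genes that end up with high scores but are not seeds are the candidates such a propagation would prioritize.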
Availability and implementation
The visJS2jupyter package is distributed under the MIT License. The source code, documentation and installation instructions are freely available on GitHub at https://github.com/ucsd-ccbb/visJS2jupyter. The package can be downloaded at https://pypi.python.org/pypi/visJS2jupyter.
Contact
sbrosenthal@ucsd.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Kfits: a software framework for fitting and cleaning outliers in kinetic measurements

Thu, 2017-09-14 02:00
Abstract
Motivation
Kinetic measurements have played an important role in elucidating biochemical and biophysical phenomena for over a century. While many tools for analysing kinetic measurements exist, most require low noise levels in the data, leaving outlier measurements to be cleaned manually. This is particularly true for protein misfolding and aggregation processes, which are extremely noisy and hence difficult to model. Understanding these processes is paramount, as they are associated with diverse physiological processes and disorders, most notably neurodegenerative diseases. Therefore, a better tool for analysing and cleaning protein aggregation traces is required.
Results
Here we introduce Kfits, an intuitive graphical tool for detecting and removing noise caused by outliers in protein aggregation kinetics data. Following its workflow allows the user to quickly and easily clean large quantities of data and receive kinetic parameters for assessment of the results. With minor adjustments, the software can be applied to any type of kinetic measurements, not restricted to protein aggregation.
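The abstract does not spell out Kfits' outlier criterion. As a generic stand-in, the sketch below uses a Hampel filter, which flags points lying more than a few robust standard deviations (scaled median absolute deviation) from a rolling median. The window size and threshold are assumptions, and this is a swapped-in technique rather than Kfits' own method.

```python
from statistics import median


def hampel_outliers(values, half_window=3, n_sigmas=3.0):
    """Return indices of points flagged by a Hampel filter.

    A point is flagged when it lies more than n_sigmas robust standard
    deviations (1.4826 * median absolute deviation) from the median of the
    window centred on it. A window with zero MAD flags nothing.
    """
    flagged = []
    for i, value in enumerate(values):
        window = values[max(0, i - half_window):i + half_window + 1]
        med = median(window)
        mad = 1.4826 * median([abs(x - med) for x in window])
        if mad > 0 and abs(value - med) > n_sigmas * mad:
            flagged.append(i)
    return flagged
```

Points flagged this way could then be dropped before fitting a kinetic model to the remaining trace.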
Availability and implementation
Kfits is implemented in Python and available online at http://kfits.reichmannlab.com, in source at https://github.com/odedrim/kfits/, or by direct installation from PyPI (`pip install kfits`).
Contact
danare@mail.huji.ac.il
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC

Thu, 2017-09-14 02:00
Abstract
Motivation
A promoter is a short region of DNA responsible for initiating transcription of a particular gene. Promoters come in various types with different functions. Owing to their importance in biological processes, it is highly desirable to develop computational tools that identify promoters and their types in a timely manner. This challenge has become particularly critical and urgent given the avalanche of DNA sequences discovered in the postgenomic age. Although some prediction methods have been developed, they can only discriminate a specific type of promoter from non-promoters; none of them can identify the types of promoters. This is because different types of promoters may share quite similar consensus sequence patterns, whereas promoters of the same type may have considerably different consensus sequences.
Results
To overcome this difficulty, we have developed a two-layer seamless predictor named ‘iPromoter-2L’, which uses the multi-window-based PseKNC (pseudo K-tuple nucleotide composition) approach to incorporate short-, middle- and long-range sequence information. The first layer identifies whether a query DNA sequence is a promoter or a non-promoter, and the second layer predicts which of the following six types the identified promoter belongs to: σ24, σ28, σ32, σ38, σ54 and σ70.
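PseKNC features combine k-tuple nucleotide composition with pseudo components that capture sequence-order correlations. The sketch below computes only the k-tuple composition part, over several centred windows of a sequence, as one possible reading of the multi-window idea. The window sizes are hypothetical, the correlation terms are omitted, and this is not the iPromoter-2L feature extractor.

```python
from itertools import product


def kmer_composition(seq, k):
    """Normalized frequencies of all 4**k DNA k-mers, in lexicographic order.

    k-mers containing characters other than A/C/G/T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def multi_window_features(seq, window_sizes=(10, 20, 40), k=2):
    """Concatenate k-mer compositions of centred windows of several sizes.

    The window sizes are hypothetical placeholders for short-, middle- and
    long-range context.
    """
    seq = seq.upper()
    features = []
    for size in window_sizes:
        start = max(0, (len(seq) - size) // 2)
        features.extend(kmer_composition(seq[start:start + size], k))
    return features
```

Each window contributes a block of 4**k frequencies, so the feature vector length is fixed regardless of the sequence content, which is what a downstream classifier needs.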
Availability and implementation
For the convenience of most experimental scientists, a user-friendly and publicly accessible web server for the new predictor has been established at http://bioinformatics.hitsz.edu.cn/iPromoter-2L/. It is anticipated that iPromoter-2L will become a very useful high-throughput tool for genome analysis.
Contact
bliu@hit.edu.cn or dshuang@tongji.edu.cn or kcchou@gordonlifescience.org
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

LinkageMapView—rendering high-resolution linkage and QTL maps

Wed, 2017-09-13 02:00
Abstract
Motivation
Linkage and quantitative trait loci (QTL) maps are critical tools for the study of the genetic basis of complex traits. With the advances in sequencing technology over the past decade, linkage map densities have been increasing dramatically, while visualization tools have not kept pace. LinkageMapView is a free add-on package written in R that produces high-resolution, publication-ready visualizations of linkage and QTL maps. While software is available to generate linkage map graphics, none of it is simultaneously freely available, able to produce publication-quality figures, open source and able to run on all platforms. LinkageMapView can be integrated into map-building pipelines, as it seamlessly incorporates output from R/qtl and also accepts simple text or comma-delimited files. The package offers numerous options to build highly customizable maps, compare linkage groups and annotate QTL regions.
Availability and implementation
https://cran.r-project.org/web/packages/LinkageMapView/
Contact
louellet@uncc.edu
Categories: Bioinformatics, Journals

NavMol 3.0: enabling the representation of metabolic reactions by blind users

Wed, 2017-09-13 02:00
Abstract
Summary
The representation of metabolic reactions strongly relies on visualization, which is a major barrier for blind users. The NavMol software renders the communication and interpretation of molecular structures and reactions accessible by integrating chemoinformatics and assistive technology. NavMol 3.0 provides a molecular editor for metabolic reactions. The user can start with templates of reactions and build from such cores. Atom-to-atom mapping enables changes in the reactants to be reflected in the products (and vice-versa) and the reaction centres to be automatically identified. Blind users can easily interact with the software using the keyboard and text-to-speech technology.
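With an atom-to-atom mapping, the reaction centre can be found by comparing bonds between mapped atoms on both sides: any atom whose bond is formed, broken or changes order belongs to the centre. The sketch below assumes a simple hypothetical representation (bonds keyed by pairs of atom-map numbers, hydrogens omitted) and is not NavMol's internal data model.

```python
def reaction_centre(reactant_bonds, product_bonds):
    """Map numbers of atoms whose bonding changes across a mapped reaction.

    Each argument maps frozenset({map_i, map_j}) -> bond order; atom map
    numbers are assumed to be consistent between reactants and products.
    """
    centre = set()
    for bond in reactant_bonds.keys() | product_bonds.keys():
        # A missing key means the bond does not exist on that side.
        if reactant_bonds.get(bond) != product_bonds.get(bond):
            centre |= bond
    return centre
```

For instance, for ethanol oxidized to acetaldehyde with heavy atoms mapped as C1, C2 and O3, only the C2-O3 bond changes order (single to double), so the centre is atoms 2 and 3.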
Availability and implementation
NavMol 3.0 is free and open source under the GNU general public license (GPLv3), and can be downloaded at http://sourceforge.net/projects/navmol as a JAR file.
Contact
joao@airesdesousa.com
Categories: Bioinformatics, Journals

A utility maximizing and privacy preserving approach for protecting kinship in genomic databases

Tue, 2017-09-12 02:00
Abstract
Motivation
Rapid, low-cost genome sequencing has enabled widespread use of genomic data in research studies and personalized customer applications, in which genomic data are shared in public databases. Although the identities of participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such piece of information is kinship.
Results
We define two routes through which kinship privacy can leak and propose a technique to protect kinship privacy against these risks while maximizing the utility of the shared data. The method systematically identifies minimal portions of genomic data to mask as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem in which the number of masked positions is minimized subject to privacy constraints that ensure familial relationships are not revealed. We evaluate the proposed technique on real genomic data. Results indicate that concurrently sharing data from a parent and an offspring carries high kinship privacy risks, whereas sharing data from more distant relatives together is often safer. We also show that the arrival order of family members has a high impact on the level of privacy risk and on the utility of the shared data.
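As a toy instance of minimizing the number of masked positions subject to a privacy constraint, suppose (hypothetically) that at most a fraction τ of the unmasked positions may carry identical genotypes in two relatives. Masking k of the c concordant positions out of n gives (c − k)/(n − k) ≤ τ, so the minimum is k = ⌈(c − τn)/(1 − τ)⌉ when that is positive. This single constraint and the function below are illustrative assumptions, not the paper's formulation.

```python
import math
from fractions import Fraction


def min_positions_to_mask(n_positions, n_concordant, max_concordance):
    """Fewest concordant positions to mask so that concordance <= max_concordance.

    Toy privacy constraint: (c - k) / (n - k) <= tau, which rearranges to
    k >= (c - tau * n) / (1 - tau). Pass max_concordance as a string or
    Fraction (e.g. "0.6") to keep the arithmetic exact.
    """
    tau = Fraction(max_concordance)
    if not 0 <= tau < 1:
        raise ValueError("max_concordance must be in [0, 1)")
    if not 0 <= n_concordant <= n_positions:
        raise ValueError("need 0 <= n_concordant <= n_positions")
    needed = (n_concordant - tau * n_positions) / (1 - tau)
    return max(0, math.ceil(needed))
```

For example, with 100 positions of which 80 are concordant and τ = 0.6, masking 50 concordant positions leaves 30 of 50 concordant (exactly 0.6), while masking 49 would leave 31 of 51.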
Availability and implementation
https://github.com/tastanlab/Kinship-Privacy
Contact
erman@cs.bilkent.edu.tr or oznur.tastan@cs.bilkent.edu.tr
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

NetProphet 2.0: mapping transcription factor networks by exploiting scalable data resources

Tue, 2017-09-12 02:00
Abstract
Motivation
Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and ‘integrative’ algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types.
Results
We present NetProphet 2.0, a ‘data light’ algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map.
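The first principle, combining several expression-based network mapping approaches, can be illustrated with a simple average-rank ensemble over edge scores. The sketch below is a generic example of such score combination; the rank-averaging rule and the treatment of missing edges are assumptions, and NetProphet 2.0 may combine its constituent methods differently.

```python
def combine_edge_scores(*score_maps):
    """Average-rank ensemble of edge score maps.

    Each map sends (tf, target) -> confidence (higher = more likely a direct
    target). Edges missing from a map are ranked lowest in that map. Returns
    (tf, target) -> mean normalized rank in (0, 1].
    """
    edges = sorted(set().union(*score_maps))
    combined = dict.fromkeys(edges, 0.0)
    for scores in score_maps:
        ranked = sorted(edges, key=lambda e: scores.get(e, float("-inf")))
        for rank, edge in enumerate(ranked, start=1):
            combined[edge] += rank / len(edges)
    return {edge: total / len(score_maps) for edge, total in combined.items()}
```

Working with ranks rather than raw scores lets methods with incomparable score scales contribute equally.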
Availability and implementation
Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0.
Contact
brent@wustl.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Omics AnalySIs System for PRecision Oncology (OASISPRO): a web-based omics analysis tool for clinical phenotype prediction

Tue, 2017-09-12 02:00
Abstract
Summary
Precision oncology is an approach that accounts for individual differences to guide cancer management. Omics signatures have been shown to predict clinical traits for cancer patients. However, the vast amount of omics information poses an informatics challenge in systematically identifying patterns associated with health outcomes, and no general-purpose data mining tool exists for physicians, medical researchers and citizen scientists without significant training in programming and bioinformatics. To bridge this gap, we built the Omics AnalySIs System for PRecision Oncology (OASISPRO), a web-based system to mine the quantitative omics information from The Cancer Genome Atlas (TCGA). This system effectively visualizes patients’ clinical profiles, executes machine-learning algorithms of choice on the omics data and evaluates the prediction performance using held-out test sets. With this tool, we successfully identified genes strongly associated with tumor stage, and accurately predicted patients’ survival outcomes in many cancer types, including adrenocortical carcinoma. By identifying the links between omics and clinical phenotypes, this system will facilitate omics studies on precision cancer medicine and contribute to establishing personalized cancer treatment plans.
Availability and implementation
This web-based tool is available at http://tinyurl.com/oasispro; the source code is available at http://tinyurl.com/oasisproSourceCode.
Contact
khyu@stanford.edu or mpsnyder@stanford.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

biospear: an R package for biomarker selection in penalized Cox regression

Tue, 2017-09-12 02:00
Abstract
Summary
The R package biospear allows users to select the biomarkers with the strongest impact on survival and on the treatment effect in high-dimensional Cox models, and to estimate expected survival probabilities. Most of the implemented approaches are based on penalized regression techniques.
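Penalized Cox regression of the kind biospear builds on minimizes the negative log partial likelihood plus a penalty on the coefficients. The Python sketch below evaluates that objective with a lasso (L1) penalty, assuming no tied event times and no scaling by sample size; it only illustrates the objective and is not biospear's R interface or fitting algorithm. The function name is hypothetical.

```python
import math


def lasso_cox_objective(beta, X, times, events, lam):
    """Negative Cox log partial likelihood plus an L1 penalty.

    X: list of covariate rows; times: follow-up times (assumed untied);
    events: 1 for an observed event, 0 for censoring; lam: penalty weight.
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects only enter risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i]
        loglik += eta[i] - math.log(sum(risk))
    return -loglik + lam * sum(abs(b) for b in beta)
```

Larger values of `lam` push more coefficients towards zero when this objective is minimized, which is what turns the fit into a biomarker selection procedure.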
Availability and implementation
The package is available on CRAN (https://CRAN.R-project.org/package=biospear).
Contact
stefan.michiels@gustaveroussy.fr
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

MAJIQ-SPEL: web-tool to interrogate classical and complex splicing variations from RNA-Seq data

Mon, 2017-09-11 02:00
Abstract
Summary
Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret and experimentally validate. To address these challenges, we developed MAJIQ-SPEL, a web tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of the gene isoforms associated with them. Importantly, MAJIQ-SPEL can handle both classical (binary) and complex (non-binary) splicing variations. Using a matching primer design algorithm, it also suggests possible primers for experimental validation by RT-PCR and displays them, along with the matching protein domains affected by the LSV, in the UCSC Genome Browser for further downstream analysis.
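For an LSV, the percent spliced in (PSI) of each junction can be approximated naively as that junction's share of the LSV's reads, which also covers non-binary LSVs with more than two junctions. MAJIQ itself quantifies LSVs with a more elaborate statistical model, so the sketch below, including its optional pseudocount, only illustrates the quantity being estimated; the function name is hypothetical.

```python
def lsv_psi(junction_reads, pseudocount=0.0):
    """Naive percent-spliced-in for each junction of one LSV.

    junction_reads maps a junction id to its read count. Returns fractions
    that sum to 1 across the LSV's junctions.
    """
    total = sum(junction_reads.values()) + pseudocount * len(junction_reads)
    if total == 0:
        raise ValueError("LSV has no reads")
    return {j: (r + pseudocount) / total for j, r in junction_reads.items()}
```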
Availability and implementation
Program and code will be available at http://majiq.biociphers.org/majiq-spel.
Contact
yosephb@upenn.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals

Enhanced guide-RNA design and targeting analysis for precise CRISPR genome editing of single and consortia of industrially relevant and non-model organisms

Fri, 2017-09-08 02:00
Abstract
Motivation
Genetic diversity of non-model organisms offers a repertoire of unique phenotypic features for exploration and cultivation for synthetic biology and metabolic engineering applications. To realize this enormous potential, it is critical to have an efficient genome editing tool for rapid strain engineering of these organisms to perform novel programmed functions.
Results
To accommodate the use of CRISPR/Cas systems for genome editing across organisms, we have developed a novel method, named CRISPR Associated Software for Pathway Engineering and Research (CASPER), for identifying on- and off-targets with enhanced predictability coupled with an analysis of non-unique (repeated) targets to assist in editing any organism with various endonucleases. Utilizing CASPER, we demonstrated a modest 2.4% and a significant 30.2% improvement (F-test, P < 0.05) over conventional methods for predicting on- and off-target activities, respectively. Further, we used CASPER to develop novel applications in genome editing: multitargeting analysis (i.e. simultaneous modification of multiple sites on a target genome with a single guide RNA) and multispecies population analysis (i.e. guide-RNA design for genome editing across a consortium of organisms). Our analysis of a selection of industrially relevant organisms revealed a number of non-unique target sites associated with genes and transposable elements that can serve as potential sites for multitargeting. The analysis also identified shared and unshared targets that enable genome editing of single or multiple genomes in a consortium of interest. We envision CASPER as a useful platform to enhance precise CRISPR genome editing for metabolic engineering and synthetic biology applications.
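Multitargeting analysis starts from finding target sequences that occur more than once in a genome. The sketch below scans both strands for protospacers followed by an NGG PAM (the SpCas9 convention) and keeps those with several hits. The 20-nt length and NGG PAM are assumptions for illustration; CASPER supports other endonucleases, and no on- or off-target scoring is done here.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_targets(genome, spacer_len=20):
    """Map each protospacer followed by an NGG PAM to its (strand, start) hits.

    Minus-strand starts are coordinates on the reverse-complemented sequence.
    """
    hits = defaultdict(list)
    forward = genome.upper()
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for start in range(len(seq) - spacer_len - 2):
            pam_gg = seq[start + spacer_len + 1:start + spacer_len + 3]
            if pam_gg == "GG":  # the N of NGG can be any base
                hits[seq[start:start + spacer_len]].append((strand, start))
    return hits


def multitargets(genome, spacer_len=20):
    """Keep only protospacers that occur at more than one site (non-unique targets)."""
    return {s: sites for s, sites in find_targets(genome, spacer_len).items()
            if len(sites) > 1}
```

A single guide RNA built from one of these repeated protospacers would, in principle, direct cutting at every listed site.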
Availability and implementation
https://github.com/TrinhLab/CASPER.
Contact
ctrinh@utk.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals