It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen–Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, DJS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas DJS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remainder is a mixture of many short compositionally homogeneous domains and relatively few long ones.
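The core step of a Jensen–Shannon segmentation, choosing the cut that best separates two stretches of differing GC content, can be sketched as follows. This is a minimal illustration only: it treats each base as GC or non-GC, the function names `entropy` and `best_split` are hypothetical, and it does not implement IsoPlotter's dynamic halting criterion or homogeneity test, which the abstract does not specify.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a two-state distribution with P(G or C) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(seq):
    """Return (position, divergence) for the cut that maximizes the
    length-weighted Jensen-Shannon divergence in GC content.
    Returns (None, 0.0) if no cut yields a positive divergence."""
    gc = [1 if base in "GCgc" else 0 for base in seq]
    n = len(gc)
    if n < 2:
        return None, 0.0
    total = sum(gc)
    h_whole = entropy(total / n)
    best_pos, best_d = None, 0.0
    left = 0  # number of G/C bases in seq[:i]
    for i in range(1, n):
        left += gc[i - 1]
        right = total - left
        # Entropy of the whole minus the length-weighted entropies of the parts
        d = (h_whole
             - (i / n) * entropy(left / i)
             - ((n - i) / n) * entropy(right / (n - i)))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d

pos, div = best_split("AAAATTTTGGGGCCCC")
print(pos, div)
```

A recursive segmenter would call `best_split` on each resulting segment in turn and stop according to some halting rule; the choice of that rule is exactly the issue the abstract raises.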
A novel template design for single-molecule sequencing is introduced, a structure we refer to as a SMRTbell™ template. This structure consists of a double-stranded portion, containing the insert of interest, and a single-stranded hairpin loop on either end, which provides a site for primer binding. Structurally, this format resembles a linear double-stranded molecule, and yet it is topologically circular. When placed into a single-molecule sequencing reaction, the SMRTbell template format enables a consensus sequence to be obtained from multiple passes on a single molecule. Furthermore, this consensus sequence is obtained from both the sense and antisense strands of the insert region. In this article, we present a universal method for constructing these templates, as well as an application of their use. We demonstrate the generation of high-quality consensus accuracy from single molecules, as well as the use of SMRTbell templates in the identification of rare sequence variants.
While it has been established that microRNAs (miRNAs) play key roles throughout development and are dysregulated in many human pathologies, the specific processes and pathways regulated by individual miRNAs are mostly unknown. Here, we use computational target predictions to automatically infer the processes affected by human miRNAs. Our approach improves upon standard statistical tools by addressing specific characteristics of miRNA regulation. Our analysis is based on a novel compendium of experimentally verified miRNA-pathway and miRNA-process associations that we constructed, which can be a useful resource by itself. Our method also predicts novel miRNA-regulated pathways, refines the annotation of miRNAs for which only crude functions are known, and assigns differential functions to miRNAs with closely related sequences. Applying our approach to groups of co-expressed genes allows us to identify miRNAs and genomic miRNA clusters with functional importance in specific stages of early human development. A full list of the predicted miRNA functions is available at http://acgt.cs.tau.ac.il/fame/.
We previously demonstrated high-frequency, targeted DNA addition mediated by the homology-directed DNA repair pathway. This method uses a zinc-finger nuclease (ZFN) to create a site-specific double-strand break (DSB) that facilitates copying of genetic information into the chromosome from an exogenous donor molecule. Such donors typically contain two ~750 bp regions of chromosomal sequence required for homology-directed DNA repair. Here, we demonstrate that easily-generated linear donors with extremely short (50 bp) homology regions drive transgene integration into 5–10% of chromosomes. Moreover, we measure the overhangs produced by ZFN cleavage and find that oligonucleotide donors with single-stranded 5' overhangs complementary to those made by ZFNs are efficiently ligated in vivo to the DSB. Greater than 10% of all chromosomes directly incorporate this exogenous DNA via a process that is dependent upon and guided by complementary 5' overhangs on the donor DNA. Finally, we extend this non-homologous end-joining (NHEJ)-based technique by directly inserting donor DNA comprising recombinase sites into large deletions created by the simultaneous action of two separate ZFN pairs. Up to 50% of deletions contained a donor insertion. Targeted DNA addition via NHEJ complements our homology-directed targeted integration approaches, adding versatility to the manipulation of mammalian genomes.
As the field of synthetic biology expands, strategies and tools for the rapid construction of new biochemical pathways will become increasingly valuable. Purely rational design of complex biological pathways is inherently limited by the current state of our knowledge. Selection of optimal arrangements of genetic elements from randomized libraries may well be a useful approach for successful engineering. Here, we propose the construction and optimization of metabolic pathways using the inherent gene shuffling activity of a natural bacterial site-specific recombination system, the integron. As a proof of principle, we constructed and optimized a functional tryptophan biosynthetic operon in Escherichia coli. The trpA-E genes along with ‘regulatory’ elements were delivered as individual recombination cassettes in a synthetic integron platform. Integrase-mediated recombination generated thousands of genetic combinations overnight. We were able to isolate a large number of arrangements displaying varying fitness and tryptophan production capacities. Several assemblages required as many as six recombination events and produced as much as 11-fold more tryptophan than the natural gene order in the same context.
We developed a powerful expression system to produce aptamers and other types of functional RNA in yeast to examine their effects. Utilizing the intron homing process, the aptamer-coding sequences were integrated into hundreds of rRNA genes, and the aptamers were transcribed at high levels by RNA polymerase I without any additional promoter being introduced into the cell. We used this system to express an aptamer against the heat shock factor 1 (HSF1), a conserved transcription factor responsible for mobilizing specific genomic expression programs in response to stressful conditions such as elevated temperature. We observed a temperature sensitive growth retardation phenotype and specific decrease of heat shock gene expression. As HSF1 enables and promotes malignant growth and metastasis in mammals, and this aptamer binds yeast HSF1 and its mammalian ortholog with equal affinity, the results presented here attest to the potential of this aptamer as a specific and effective inhibitor of HSF1 activity.
High-throughput sequencing techniques are becoming attractive to molecular biologists and ecologists as they provide a time- and cost-effective way to explore diversity patterns in environmental samples at an unprecedented resolution. An issue common to many studies is the definition of what fractions of a data set should be considered as rare or dominant. Yet this question has not been satisfactorily addressed, nor has the impact of such a definition on data set structure and interpretation been fully evaluated. Here we propose a strategy, MultiCoLA (Multivariate Cutoff Level Analysis), to systematically assess the impact of various abundance or rarity cutoff levels on the resulting data set structure and on the consistency of the further ecological interpretation. We applied MultiCoLA to a 454 massively parallel tag sequencing data set of V6 ribosomal sequences from marine microbes in temperate coastal sands. Consistent ecological patterns were maintained after removing up to 35–40% rare sequences and similar patterns of beta diversity were observed after denoising the data set by using a preclustering algorithm of 454 flowgrams. This example validates the importance of exploring the impact of the definition of rarity in large community data sets. Future applications can be foreseen for data sets from different types of habitats, e.g. other marine environments, soil and human microbiota.
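The idea of applying a rarity cutoff and then checking how the data set structure changes can be illustrated with a small sketch. The code below is an assumption-laden toy, not the MultiCoLA implementation: it removes the least abundant sequence types until a chosen fraction of all reads has been discarded, and uses Bray-Curtis dissimilarity as one possible measure of beta diversity. The function names and the example table are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den if den else 0.0

def drop_rare(table, rare_fraction):
    """Zero out the least abundant sequence types (columns), rarest first,
    for as long as the removed reads stay within rare_fraction of all reads.
    table: list of samples, each a list of counts per sequence type."""
    n_types = len(table[0])
    totals = [sum(row[j] for row in table) for j in range(n_types)]
    budget = rare_fraction * sum(totals)
    removed, used = set(), 0
    for j in sorted(range(n_types), key=lambda idx: totals[idx]):
        if used + totals[j] > budget:
            break
        used += totals[j]
        removed.add(j)
    return [[0 if j in removed else c for j, c in enumerate(row)]
            for row in table]

def pairwise(table):
    """Bray-Curtis dissimilarity for every pair of samples."""
    n = len(table)
    return [bray_curtis(table[i], table[k])
            for i in range(n) for k in range(i + 1, n)]

# Hypothetical table: 3 samples (rows) x 4 sequence types (columns)
table = [[50, 30, 2, 1], [10, 60, 0, 3], [40, 5, 1, 0]]
truncated = drop_rare(table, 0.05)
print(pairwise(table))
print(pairwise(truncated))
```

Comparing the two lists of pairwise dissimilarities across a range of cutoffs gives a rough sense of how much of the ecological pattern survives the removal of rare types.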
We herein report the design of a dumbbell-shaped DNA probe that integrates target-binding, amplification and signaling within one multifunctional design. The dumbbell probe can initiate rolling circle amplification (D-RCA) in the presence of specific microRNA (miRNA) targets. This D-RCA-based miRNA strategy allows quantification of miRNA from very small quantities of RNA sample. The femtomolar sensitivity of D-RCA compares favorably with other existing technologies. More significantly, the dynamic range of D-RCA is extremely large, covering eight orders of magnitude. We also demonstrate miRNA quantification with this highly sensitive and inexpensive D-RCA strategy in clinical samples.
The availability of high-resolution array comparative genomic hybridization (CGH) platforms has led to increasing complexities in data analysis. Specifically, defining contiguous regions of alterations, or segmentation, can be computationally intensive, and popular algorithms can take hours to days to process arrays comprising hundreds of thousands to millions of elements. Additionally, tumors tend to demonstrate subtle copy number alterations due to heterogeneity, ploidy and hybridization effects. Thus, there is a need for fast, sensitive array CGH segmentation and alteration calling algorithms. Here, we describe Fast Algorithm for Calling After Detection of Edges (FACADE), a highly sensitive and easy-to-use algorithm designed to rapidly segment and call high-resolution array data.
The mutagenic threat of hydrolytic DNA cytosine deamination is met mostly by uracil DNA glycosylases (UDG) initiating base excision repair. However, several sequenced genomes of archaeal organisms are devoid of genes coding for homologues of the otherwise ubiquitous UDG superfamily of proteins. Previously, two possible solutions to this problem were offered by (i) a report of a newly discovered family of uracil DNA glycosylases exemplified by MJ1434, a protein found in the hyperthermophilic archaeon Methanocaldococcus jannaschii, and (ii) the description of TTC0482, an EndoIV homologue from the hyperthermophilic bacterium Thermus thermophilus HB27, as being able to excise uracil from DNA. Sequence homologues of both proteins can be found throughout the archaeal domain of life. Three proteins orthologous to MJ1434 and the family founder itself were tested for but failed to exhibit DNA uracil glycosylase activity when produced in an Ung-deficient Escherichia coli host. Likewise, no DNA uracil processing activity could be detected in association with TTC0482, while the protein was fully active as an AP endonuclease. We propose that the uracil processing activities formerly found were due to contamination with Ung enzyme. Use of Ung-deficient strains as hosts for production of putative DNA-U processing enzymes provides a simple safeguard.
The vascular endothelial growth factor receptor Flt1 is a transmembrane receptor co-expressed with an alternate transcript encoding a secreted form, sFlt1, that functions as a competitive inhibitor of Flt1. Despite shared transcription start sites and upstream regulatory elements, sFlt1 is far more abundant than Flt1 in the human placenta. Phorbol myristic acid and dimethyloxalylglycine differentially stimulate sFlt1 compared to Flt1 expression in vascular endothelial cells and in cytotrophoblasts. An FLT1 minigene construct containing exons 13 and 14 and the intervening region recapitulates mRNA processing when transfected into COS-7 cells, with chimeric intronic sFlt1 transcripts arising by intronic polyadenylation and other Flt1/sFlt1 transcripts by alternate splicing. Inclusion of exon 15 but not 14 had a modest stimulatory effect on the abundance of sFlt1. The intronic region containing the distal poly(A) signal sequences, when transferred to a heterologous minigene construct, inhibited splicing but only when cloned in sense orientation, consistent with the presence of a directional cis-element. Serial deletional and targeted mutational analysis of cis-elements within intron 13 identified intronic poly(A) signal sequences and adjacent cis-elements as the principal determinants of the relative ratio of intronic sFlt1 and spliced Flt1. We conclude that intronic signals reciprocally regulate splicing and polyadenylation and control sFlt1 expression.
Recent studies showed that small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs) in mammalian germ cells play important roles in retrotransposon silencing and gametogenesis. However, the subsequent contribution of those small RNAs to early mammalian development remains poorly understood. We investigated the expression profiles of small RNAs in mouse metaphase II oocytes, 8–16-cell stage embryos, blastocysts and the pluripotent inner cell mass (ICM) using high-throughput pyrosequencing. Here, we show that during pre-implantation development the major small RNA class changes from retrotransposon-derived small RNAs containing siRNAs and piRNAs to zygotically synthesized microRNAs (miRNAs). Some siRNAs and piRNAs are transiently upregulated and directed against specific retrotransposon classes. We also identified miRNA expression profiles characteristic of the ICM and trophectoderm (TE) cells. Taken together, our current study reveals a major reprogramming of functional small RNAs during early mouse development from oocyte to blastocyst.
Despite the critical role of pre-mRNA splicing in generating proteomic diversity and regulating gene expression, the sequence composition and function of intronic splicing regulatory elements (ISREs) have not been well elucidated. Here, we employed a high-throughput in vivo Screening PLatform for Intronic Control Elements (SPLICE) to identify 125 unique ISRE sequences from a random nucleotide library in human cells. Bioinformatic analyses reveal consensus motifs that resemble splicing regulatory elements and binding sites for characterized splicing factors and that are enriched in the introns of naturally occurring spliced genes, supporting their biological relevance. In vivo characterization, including an RNAi silencing study, demonstrates that ISRE sequences can exhibit combinatorial regulatory activity and that multiple trans-acting factors are involved in the regulatory effect of a single ISRE. Our work provides an initial examination into the sequence characteristics and function of ISREs, providing an important contribution to the splicing code.
RNA exosomes are large multisubunit assemblies involved in controlled RNA processing. The archaeal exosome possesses a heterohexameric processing chamber with three RNase-PH-like active sites, capped by Rrp4- or Csl4-type subunits containing RNA-binding domains. RNA degradation by RNA exosomes has not been studied in a quantitative manner because of the complex kinetics involved, and exosome features contributing to efficient RNA degradation remain unclear. Here we derive a quantitative kinetic model for degradation of a model substrate by the archaeal exosome. Markov Chain Monte Carlo methods for parameter estimation allow for the comparison of reaction kinetics between different exosome variants and substrates. We show that long substrates are degraded in a processive manner and short RNAs in a more distributive manner, and that the cap proteins influence degradation speed. Our results, supported by small-angle X-ray scattering, suggest that the Rrp4-type cap efficiently recruits RNA but prevents fast degradation of longer RNAs by molecular friction, likely by RNA contacts to its unique KH-domain. We also show that formation of the RNase-PH-like ring with entrapped RNA is not required for high catalytic efficiency, suggesting that the exosome chamber evolved for controlled processivity, rather than for catalytic chemistry in RNA decay.
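To illustrate what Markov Chain Monte Carlo parameter estimation for a kinetic model involves, the sketch below fits a single rate constant with a random-walk Metropolis sampler. It is a deliberately simplified toy: the first-order decay model, the Gaussian noise level `sigma`, the flat prior and all function names are assumptions made for illustration, and the multi-step exosome model derived in the paper is not reproduced here.

```python
import math
import random

def model(k, times):
    """Fraction of substrate remaining under first-order decay with rate k."""
    return [math.exp(-k * t) for t in times]

def log_likelihood(k, times, observed, sigma):
    """Gaussian log-likelihood (up to a constant) of the data given rate k."""
    predicted = model(k, times)
    return -sum((o - m) ** 2 for o, m in zip(observed, predicted)) / (2 * sigma ** 2)

def metropolis(times, observed, sigma=0.05, k0=1.0, step=0.05,
               n_iter=5000, seed=0):
    """Random-walk Metropolis sampler for k with a flat prior on k > 0."""
    rng = random.Random(seed)
    k = k0
    ll = log_likelihood(k, times, observed, sigma)
    samples = []
    for _ in range(n_iter):
        proposal = k + rng.gauss(0.0, step)
        if proposal > 0:  # proposals outside the prior support are rejected
            ll_new = log_likelihood(proposal, times, observed, sigma)
            # Accept with probability min(1, likelihood ratio)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                k, ll = proposal, ll_new
        samples.append(k)
    return samples

# Synthetic, noise-free observations generated with an assumed rate of 0.5
times = [0, 1, 2, 4, 8]
observed = model(0.5, times)
samples = metropolis(times, observed)
kept = samples[1000:]  # discard burn-in
print(sum(kept) / len(kept))
```

Running the same sampler on data from different enzyme variants or substrates and comparing the resulting posterior distributions is the kind of comparison the abstract describes, although the real model has more parameters than this one.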
Ribosome synthesis involves the concomitance of pre-rRNA processing and ribosomal protein assembly. In eukaryotes, this is a complex process that requires the participation of specific sequences and structures within the pre-rRNAs, at least 200 trans-acting factors and the ribosomal proteins. There is little information on the function of individual 60S ribosomal proteins in ribosome synthesis. Herein, we have analysed the contribution of ribosomal protein L35 in ribosome biogenesis. In vivo depletion of L35 results in a deficit in 60S ribosomal subunits and the appearance of half-mer polysomes. Pulse-chase, northern hybridization and primer extension analyses show that processing of the 27SB to 7S pre-rRNAs is strongly delayed upon L35 depletion. Most likely as a consequence of this, release of pre-60S ribosomal particles from the nucleolus to the nucleoplasm is also blocked. Deletion of RPL35A leads to similar although less pronounced phenotypes. Moreover, we show that L35 assembles in the nucleolus and binds to early pre-60S ribosomal particles. Finally, flow cytometry analysis indicated that L35-depleted cells mildly delay the G1 phase of the cell cycle. We conclude that L35 assembly is a prerequisite for the efficient cleavage of the internal transcribed spacer 2 at site C2.
KSRP is a multi-domain RNA-binding protein that recruits the exosome-containing mRNA degradation complex to mRNAs coding for cellular proliferation and inflammatory response factors. The selectivity of this mRNA degradation mechanism relies on KSRP recognition of AU-rich elements in the mRNA 3'UTR, which is mediated by KSRP’s KH domains. Our structural analysis shows that the inter-domain linker orients the two central KH domains of KSRP, and their RNA-binding surfaces, creating a two-domain unit. We also show that this inter-domain arrangement is important to the interaction with KSRP’s RNA targets.
The lateral stalk of the ribosome is responsible for kingdom-specific binding of translation factors and activation of GTP hydrolysis that drives protein synthesis. In eukaryotes, the stalk is composed of acidic ribosomal proteins P0, P1 and P2 that constitute a pentameric P-complex in a 1:2:2 ratio. We have determined the solution structure of the N-terminal dimerization domain of human P2 (NTD-P2), which provides insights into the structural organization of the eukaryotic stalk. Our structure revealed that eukaryotic stalk protein P2 forms a symmetric homodimer in solution, and is structurally distinct from its bacterial counterpart, the L12 homodimer. The two subunits of NTD-P2 form extensive hydrophobic interactions in the dimeric interface that buries 2400 Å² of solvent-accessible surface area. We have shown that P1 can dissociate the P2 homodimer spontaneously to form a more stable P1/P2 1:1 heterodimer. By homology modelling, we found that three exposed polar residues on helix-3 of P2 are substituted by conserved hydrophobic residues in P1. By mutagenesis, we confirmed that these residues on helix-3 of P1 are not involved in the dimerization of P1/P2, but instead play a vital role in anchoring the P1/P2 heterodimer to P0. Based on our results, models of the eukaryotic stalk complex were proposed.
Electrospray mass spectrometry was used to investigate the mechanism of tetramolecular G-quadruplex formation by the DNA oligonucleotide dTG5T, in ammonium acetate. The intermediates and products were separated according to their mass (number of strands and inner cations) and quantified. The study of the temporal evolution of each species allows us to propose the following formation mechanism. (i) Monomers, dimers and trimers are present at equilibrium already in the absence of ammonium acetate. (ii) The addition of cations promotes the formation of tetramers and pentamers that incorporate ammonium ions and therefore presumably have stacked guanine quartets in their structure. (iii) The pentamers eventually disappear and tetramers become predominant. However, these tetramers do not have their four strands perfectly aligned to give five G-quartets: the structures contain one ammonium ion too few, and ion mobility spectrometry shows that their conformation is more extended. (iv) At 4°C, the rearrangement of the kinetically trapped tetramers with presumably slipped strand(s) into the perfect G-quadruplex structure is extremely slow (not complete after 4 months). We also show that the addition of methanol to the monomer solution significantly accelerates the cation-induced G-quadruplex assembly.
Direct targeting of critical DNA-binding elements of a repressor by its cognate antirepressor is an effective means to sequester the repressor and remove a transcription initiation block. Structural descriptions for this, though often proposed for bacterial and phage repressor–antirepressor systems, are unavailable. Here, we describe the structural and functional basis of how the Myxococcus xanthus CarS antirepressor recognizes and neutralizes its cognate repressors to turn on a photo-inducible promoter. CarA and CarH repress the carB operon in the dark. CarS, produced in the light, physically interacts with the MerR-type winged-helix DNA-binding domain of these repressors leading to activation of carB. The NMR structure of CarS1, a functional CarS variant, reveals a five-stranded, antiparallel β-sheet fold resembling SH3 domains, protein–protein interaction modules prevalent in eukaryotes but rare in prokaryotes. NMR studies and analysis of site-directed mutants in vivo and in vitro unveil a solvent-exposed hydrophobic pocket lined by acidic residues in CarS, where the CarA DNA recognition helix docks with high affinity in an atypical ligand-recognition mode for SH3 domains. Our findings uncover an unprecedented use of the SH3 domain-like fold for protein–protein recognition whereby an antirepressor mimics operator DNA in sequestering the repressor DNA recognition helix to activate transcription.
Antigene RNAs (agRNAs) are small RNA duplexes that target non-coding transcripts rather than mRNA and specifically suppress or activate gene expression in a sequence-dependent manner. For many applications in vivo, it is likely that agRNAs will require chemical modification. We have synthesized agRNAs that contain different classes of chemical modification and have tested their ability to modulate expression of the human progesterone receptor gene. We find that both silencing and activating agRNAs can retain activity after modification. Both guide and passenger strands can be modified and functional agRNAs can contain 2'F-RNA, 2'OMe-RNA, and locked nucleic acid substitutions, or combinations of multiple modifications. The mechanism of agRNA activity appears to be maintained after chemical modification: both native and modified agRNAs modulate recruitment of RNA polymerase II, have the same effect on promoter-derived antisense transcripts, and must be double-stranded. These data demonstrate that agRNA activity is compatible with a wide range of chemical modifications and may facilitate in vivo applications.