X-Seq Story: On Failure of Molecular Biology and Being Fat
Thursday, February 2, 2017 - 09:25

By: Ivan Lukic - Field Application Scientist, Partek Incorporated

I recently had dinner with an old friend, a scientist and former faculty member at a top university who now works in biotechnology entrepreneurship. On our way there he challenged me by voicing his disappointment in modern biology. In his opinion, there has been no significant discovery in biology since the 1960s and the deciphering of the genetic code. While that viewpoint could be considered extreme, it made me wonder what the next epic breakthrough might be. I believe it will come from understanding the regulation of gene expression. However, we may need some specific tools, such as next-generation sequencing (NGS), to make it happen.

While the technology of next-generation sequencing is over a decade old, it has not yet reached its full maturity, particularly in data analysis. One of the first NGS success stories was ChIP-Seq, a method used to analyze interactions between proteins and DNA. Soon, a number of related methods emerged, collectively referred to as X-Seq.

Which now brings me to obesity. There is a gene, the fat mass and obesity-associated gene (FTO), so named because its homozygous variations were initially identified as the strongest genetic factor predisposing to non-monogenic obesity. Take a guess at its biological role. Is it a) an antagonist of insulin, suppressing the uptake of lipids by fat cells and consequently leading to deposits of lipids in other tissues, b) a modifier of leptin receptors (leptin is a protein hormone that acts on the brain to decrease food intake), or c) a simple lipase responsible for the quick breakdown of fat in the gut and fast absorption of lipid molecules?

It turns out that FTO is an enzyme (those of you who thought of a lipase get half a credit), a DNA and RNA demethylase that catalyzes oxidative demethylation of nucleotides. So, what does this have to do with obesity? N6-methyladenosine (m6A) is a common modification of RNA and serves as another mechanism of post-transcriptional regulation of gene expression. It has been detected in the mRNAs of over 7,000 genes and in over 300 non-coding RNAs (Meyer KD et al., 2012). In other words, by controlling the m6A methylation status of RNA, FTO regulates gene expression. Given that homozygous variations in the human FTO gene are present in one billion people and are linked to several clinical conditions in addition to obesity (e.g. attention deficit and alcohol dependence), the growing interest in FTO and m6A in general is hardly a surprise.

The problem is, or was until NGS came along, the lack of technology to study m6A modifications. m6A exhibits the same base pairing preference as unmodified adenosine, so it cannot simply be detected by Sanger sequencing or hybridisation assays. Furthermore, bisulfite treatment, which works well for methylated cytosine (5mC), does not work for m6A. There are some radiolabeling techniques available, but NGS provides an elegant, radiation-free alternative: methylated RNA immunoprecipitation followed by sequencing (MeRIP-Seq).

This assay relies on an antibody that targets m6A. After immunoprecipitation, the methylated RNAs are reverse-transcribed into cDNA and fragmented (fragmentation can take place either before or after reverse transcription). Fragments of suitable size (roughly 200 bases) are then selected and sequenced from one end (single-end) or both ends (paired-end). Using the same analysis principles as in ChIP-Seq, the short sequencing reads (typically 50-100 bases) are aligned to the reference genome and a bioinformatic algorithm, such as Model-based Analysis of ChIP-Seq 2 (MACS2), is applied to identify regions of the genome that harbor an unusually high number of reads (“enriched” regions).
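To make the idea of an “enriched” region more concrete, here is a minimal Python sketch. It is not MACS2, which relies on a more sophisticated statistical model; it only counts aligned reads in fixed windows and compares the immunoprecipitated sample with a background sample. The background (input) sample, the 200-base window, the fold threshold, the pseudocount and all function names are assumptions made for illustration.

```python
from collections import Counter


def window_counts(read_starts, window=200):
    """Count reads per fixed-size window, keyed by window index."""
    return Counter(pos // window for pos in read_starts)


def enriched_windows(ip_starts, input_starts, window=200,
                     min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for windows where the IP sample is enriched.

    ip_starts / input_starts are alignment start positions on one chromosome.
    This is a toy fold-change filter, not the MACS2 model.
    """
    ip = window_counts(ip_starts, window)
    background = window_counts(input_starts, window)
    # Scale the background so both samples have a comparable sequencing depth.
    scale = len(ip_starts) / max(len(input_starts), 1)
    hits = []
    for idx in sorted(ip):
        fold = (ip[idx] + pseudocount) / (background[idx] * scale + pseudocount)
        if fold >= min_fold:
            hits.append((idx * window, (idx + 1) * window, fold))
    return hits


if __name__ == "__main__":
    # Hypothetical reads: a flat background plus a pile-up near position 1050.
    flat = list(range(100, 2000, 200))
    ip_reads = flat + [1050] * 30
    input_reads = flat * 4
    for start, end, fold in enriched_windows(ip_reads, input_reads):
        print(f"enriched window {start}-{end}, fold {fold:.1f}")
```

A real peak caller also has to deal with local background variation, duplicate reads and multiple testing, which is why dedicated tools are used in practice.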

An excellent illustration of MeRIP-Seq is a study by Jens Bruening and his colleagues published in Nature Neuroscience (Hess ME et al., 2013). I downloaded the raw data from GEO (GSE47217) and used Partek® Flow® to analyze it. Figure 1 shows the sequencing peaks within the dopamine receptor D3 (Drd3) gene in Fto-deficient mice (blue) and control mice (red). The elevated peaks in the Fto-deficient animals are a consequence of the lack of Fto activity and the resulting increase in the number of m6A sites in the Drd3 mRNA.

Figure 1: Partek Flow chromosome viewer’s display of MeRIP-seq reads in the mRNA of dopamine receptor D3 (Drd3) gene in a mouse without fat mass and obesity associated gene (Fto) (blue) and a control mouse (red)

Browser views, such as this one, are almost a must for any paper or presentation that includes NGS data. They are, however, inevitably limited by screen resolution, so what seems to be a high peak at low magnification might just be an artifact of binning and disappointingly decompose into a series of small peaks when you zoom in.

Pro tip: if you want a bird’s eye overview of an entire chromosome that shows enriched regions without the resolution bias, use a more informative plot: a Hilbert curve (Figure 2). A Hilbert curve is a continuous fractal space-filling curve, characterized by good preservation of locality. Although the ubiquitous browser views are a straightforward way of plotting sequencing peaks, they work well only for short sections of the genome (e.g. several hundred bases). As soon as the number of bases exceeds the screen resolution, some binning needs to be applied (i.e. bases need to be merged), leading to data distortion and bias. Hilbert curves do not represent the genome linearly, but in a fractalized, convoluted way, which makes it easy to fit an entire chromosome in a relatively confined space. Consequently, no binning is needed (as all bases can be plotted) and sequencing peaks are shown with the help of a heat map, with blue meaning no reads and red meaning a tall stack of reads. In Figure 2, all the heat radiating from the top panel is due to higher sequencing peaks in Fto-deficient mice, which is consistent with the lack of demethylation.

Figure 2: Partek Hilbert curve visualisation of MeRIP-seq reads aligning to an entire chromosome of a mouse without fat mass and obesity associated gene (Fto-deficient) and a control mouse. Stacks of sequencing peaks are depicted by color gradient (blue: low, red: high), thus avoiding screen resolution bias inherent to browser views. The track at the bottom represents mouse chromosome 2 with the highlighted (red) cytoband framed in black in the curve tracks above.
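For the curious, the mapping behind such a plot can be sketched in a few lines of Python. This is the standard textbook conversion from a position along the Hilbert curve to grid coordinates, not necessarily how Partek implements it; the function names and the toy coverage values are assumptions made for illustration, and the sketch assumes one coverage value per grid cell.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order square grid."""
    side = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so the sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_grid(values, order):
    """Lay a linear series (e.g. read depth along a chromosome) onto the curve."""
    side = 1 << order
    if len(values) > side * side:
        raise ValueError("too many values for this curve order")
    grid = [[0] * side for _ in range(side)]
    for d, value in enumerate(values):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    # Hypothetical read depth along 16 consecutive positions, with one peak.
    coverage = [0] * 16
    coverage[5:8] = [3, 9, 4]
    for row in hilbert_grid(coverage, order=2):
        print(row)
```

Because consecutive positions along the curve always land in neighbouring cells, a peak stays together as a compact patch instead of being scattered across the grid. The resulting grid can then be colored as a heat map.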



Since m6A modification is a regulator of gene expression, if you have both MeRIP-Seq data and matching mRNA-Seq data from your study, you can easily integrate them with Partek tools and directly explore biological consequences of mRNA methylation.

But even if you only have MeRIP-Seq data, you can extend your analysis further by annotating the enriched regions with the genes nearest to them. Grouping these genes lets you identify and then visualize the pathways that are affected by RNA methylation.
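As a rough illustration of what annotating enriched regions with the nearest gene involves, consider the Python sketch below. The gene names, the coordinates and the function names are hypothetical, everything is assumed to sit on a single chromosome, and the brute-force search is only meant to show the idea rather than how any particular tool works.

```python
def distance_to_gene(peak, gene):
    """Gap in bases between a peak and a gene; 0 if they overlap."""
    peak_start, peak_end = peak
    _, gene_start, gene_end = gene
    if peak_end < gene_start:
        return gene_start - peak_end
    if gene_end < peak_start:
        return peak_start - gene_end
    return 0


def annotate_peaks(peaks, genes):
    """Pair every peak with the name of its nearest gene."""
    return [
        (peak, min(genes, key=lambda gene: distance_to_gene(peak, gene))[0])
        for peak in peaks
    ]


def group_by_gene(annotated):
    """Collect the peaks assigned to each gene."""
    grouped = {}
    for peak, name in annotated:
        grouped.setdefault(name, []).append(peak)
    return grouped


if __name__ == "__main__":
    # Hypothetical genes as (name, start, end) and peaks as (start, end).
    genes = [("geneA", 1000, 5000), ("geneB", 9000, 12000)]
    peaks = [(1200, 1400), (6000, 6200), (8500, 8700)]
    for name, assigned in group_by_gene(annotate_peaks(peaks, genes)).items():
        print(name, assigned)
```

The resulting gene list is what you would then hand over to a pathway analysis.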

In their manuscript, Hess and his colleagues revealed increased adenosine methylation in a subset of mRNAs important for neuronal signaling (including many involved in dopaminergic signaling, like Drd3), so I chose to show you the KEGG Neuroactive Ligand-Receptor Signaling pathway in Figure 3. Without going into details, you can immediately see a pattern by comparing the number of m6A-methylated mRNAs (yellow boxes) in Fto-deficient mice with that in the control mice.

Figure 3: Partek Pathway visualization of the KEGG Neuroactive Ligand-Receptor Signaling Pathway. Each gene in the pathway is a box. Yellow boxes are genes overlapping methylation peaks in a MeRIP-seq experiment on a mouse without fat mass and obesity associated gene (Fto-deficient) and a control mouse. Black boxes are genes in the pathway, but not overlapping the sequencing peaks.



Interestingly, there is an alternative strategy for analysis of MeRIP-Seq data. As the sequencing peaks align to coding regions of the genome, MeRIP-Seq data can be handled as bona fide RNA-Seq data, another application supported by Partek Flow. For more on this topic I suggest a recent review by Zambelli and Pavesi.

Coming back to the discussion I had with my friend, I maintain that great deeds require great tools (Myron did need that chisel, didn’t he?) and that we are witnessing the moment of their development. Although I am a firm believer in technology, let’s not forget that some discoveries can be made by common sense alone. For instance, if I told you that I am fat and have the attention span of a fruit fly, would you really need MeRIP-Seq to tell me about my FTO activity?