Researcher identified a novel cancer prognostic marker using 6,260 small non-coding RNA samples
Friday, April 28, 2017 - 12:13
Victor Martinez, Ph.D.


We recently caught up with researcher Victor Martinez, Ph.D.
of BC Cancer Agency to discuss his work with piRNAs using
Partek Flow software, which allowed him to efficiently analyze
one of the world’s biggest cohorts of sncRNAs.

Customer Interview by Cherry Ignacio - Field Application Scientist, Partek Incorporated
Paraphrased for clarity

What is the focus of your work and the research you’re trying to do?
Basically, we’re investigating the role of small non-coding RNAs (sncRNAs) in the development of different tumor types. Usually, people look for microRNAs, but there are a number of other classes of sncRNAs that haven’t been studied as deeply as microRNAs. We are focused on a specific class of RNA called PIWI-interacting RNAs, or piRNAs. When we started to design this study, we realized that all the different species of small non-coding RNAs should be present in RNA-Seq data. So we decided to search for data available in the public domain and see if we were able to find piRNAs and other classes of sncRNAs. That’s how we started to explore The Cancer Genome Atlas (TCGA) project. It is one of the largest sequencing data repositories in the world, with data for multiple tumor types. That’s also where Partek comes into the picture.
At first, we were thinking of doing a small project using a few hundred samples from TCGA and our own cohort, which was around 200 samples. At that point, I was just learning how to code and use terminal-based tools, so it was a major challenge. Then, during our brainstorming sessions, we asked ourselves if we could do this bigger. Bigger meaning, could we analyze different tissue types and thousands of samples? We decided to take that idea and use all the samples available from the TCGA cohort at that time (early 2015). We had to apply to gain access to the raw data and use it in this project. Then we started testing tools suitable for the analysis of this number of samples. At that time, we had a new server for analysis. I remember it was around January 2015 when I first got a test version of Partek Flow software. The learning curve was short; it was pretty intuitive to work with. We tested other platforms too, but Partek Flow gave us the possibility to customize a lot. For example, we could customize the annotation files containing the chromosomal locations that we needed for piRNAs. This was key to quantifying the samples against those locations.
Using Partek Flow, we were able to reprocess everything from the actual fastq files, filter by read quality, remap to the human genome, and select by specific size within the small reads, among other things. After we generated almost 7,000 piRNA transcriptomes and characterized their expression patterns in different human tissues, the main goal was to unlock their clinical potential. piRNAs, like any other small RNA, are usually stable in body fluids, making them very good candidates for biomarkers. They can be found not only in biopsies but also in blood, saliva, and any tissue that can be used for clinical tests. We started collecting and merging all the expression data we had with the clinical data available from TCGA and other projects, exploring whether piRNAs are predictors of patient survival, either overall survival or disease-free survival. Currently we are trying to refine those findings.
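Editor's illustration: the read-quality and size-selection steps mentioned above can be sketched in a few lines of Python. This is not Partek Flow's implementation and not the pipeline Dr. Martinez ran. The function names, the mean-quality threshold of 20, and the 24 to 32 nucleotide length window are all assumptions chosen for illustration, and the sketch assumes Phred+33 quality encoding with four lines per FASTQ record.

```python
from io import StringIO


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records, min_quality=20.0, min_len=24, max_len=32):
    """Keep reads inside the length window whose mean quality is high enough.

    The thresholds are illustrative assumptions, not values from the study.
    """
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len and mean_phred(qual) >= min_quality:
            yield header, seq, qual


# Toy input: three made-up reads (name, sequence, quality string).
toy_reads = [
    ("@read1", "A" * 26, "I" * 26),  # 26 nt, Phred 40: kept
    ("@read2", "C" * 18, "I" * 18),  # 18 nt: too short
    ("@read3", "G" * 28, "#" * 28),  # 28 nt, Phred 2: low quality
]
fastq_text = "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in toy_reads)

kept = [h for h, _, _ in filter_reads(parse_fastq(StringIO(fastq_text)))]
print(kept)
```

In this toy input only `@read1` passes both filters. A real workflow would read compressed files and would normally rely on an established trimming and alignment tool rather than hand-written parsing.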
Before using Partek Flow, were you having problems finding software that would take annotations from piRNAs?
Yes. We were clear that piRNAs were one of the targets for this analysis. The other platforms we tested could do mostly the same, but only with known microRNAs. In this case we were trying to collect information from different, uncharacterized species. We needed to customize chromosomal coordinates (extracted and compiled from the public domain) for piRNAs and quantify them.
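Editor's illustration: quantifying reads against custom chromosomal coordinates amounts to counting the aligned reads that overlap each annotated interval. The sketch below shows that idea on made-up data. It is not how Partek Flow quantifies reads. The BED-like four-column layout (chromosome, start, end, name, with 0-based half-open coordinates), the function names, and the `piR-demo-*` feature names are all hypothetical.

```python
from collections import Counter, defaultdict


def load_annotation(lines):
    """Parse BED-like lines (chrom, start, end, name) into a per-chromosome index."""
    features = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, name = line.split()[:4]
        features[chrom].append((int(start), int(end), name))
    return features


def count_reads(features, alignments):
    """Count alignments overlapping each feature by at least one base.

    `alignments` is an iterable of (chrom, start, end) tuples using the same
    0-based, half-open convention as the annotation. The scan over features
    is linear per read, which is fine for a sketch but slow at real scale.
    """
    counts = Counter()
    for chrom, start, end in alignments:
        for f_start, f_end, name in features.get(chrom, []):
            if start < f_end and f_start < end:  # the intervals overlap
                counts[name] += 1
    return counts


# Hypothetical custom annotation; names and coordinates are invented.
annotation_lines = [
    "# chrom\tstart\tend\tname",
    "chr1\t100\t130\tpiR-demo-1",
    "chr1\t500\t528\tpiR-demo-2",
    "chr2\t40\t70\tpiR-demo-3",
]

# Hypothetical aligned reads as (chrom, start, end).
alignments = [
    ("chr1", 102, 128),  # overlaps piR-demo-1
    ("chr1", 105, 131),  # overlaps piR-demo-1
    ("chr1", 495, 520),  # overlaps piR-demo-2
    ("chr2", 200, 226),  # overlaps no feature
    ("chr3", 10, 36),    # chromosome absent from the annotation
]

counts = count_reads(load_annotation(annotation_lines), alignments)
for name in ("piR-demo-1", "piR-demo-2", "piR-demo-3"):
    print(name, counts[name])
```

Because the annotation is ordinary data, swapping in coordinates for another sncRNA class requires no change to the counting logic, which is the kind of flexibility described in the answer above.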
What made Partek Flow a good fit for you?
First, I’m not a bioinformatician. I like to perform computational analyses, but I am more interested in the biological meaning of the analysis. So one of the things I like is an intuitive interface. With Partek Flow you can experiment with different things within your data. The contextual menu that appears when you click on a data node suggests different tasks that you could perform on that node. The second thing is the customization. You can test different parameters and see an explanation of what a parameter is doing on a specific test. Third, it allows you to schedule and perform multiple tasks so that things will happen even when you’re not in front of the computer. It’s for those reasons we chose Partek Flow. When we started with Partek Flow, we were like, “This is cool, we can work with this. It will probably take some time, but it will be ok to analyze these 7,000 samples.”
Did you say you analyzed over 7,000 samples using Partek Flow?
Yes, yes. If you look at your records from two years ago, you’ll see I was calling Partek every week. We were just starting, so we had problems…a lot of problems. Usually, the source of the problem was the project library, because it was so huge. Every time I called Partek, you guys would ask me why I was processing such a large amount of data. But overall, it was able to handle those 7,000 samples. I was learning the computational basis of the analysis and testing a lot of things in a pilot period before running the whole set of samples, which we could not do previously. We were having a lot of fun, and yes, it worked, which is the most important part.
You mentioned clinical applications. Moving forward, can you tell us what you’re doing now?
We are working on specific tissue types such as lung cancer, focusing on the most relevant clinical questions. We are validating our findings in different cohorts (including ours), so it’s not just TCGA. We are currently searching for markers that will allow us to identify which tumors are more aggressive and hopefully prioritize treatment for specific patients based on the characteristics of their tumors. In non-small cell lung cancer, nodules can be identified by a CT scan. However, the main challenge is to identify which nodules will turn into aggressive tumors. If we can find molecular patterns or signatures based on a small amount of RNA from the biopsy, other tissues, or blood, that could indicate whether that nodule will actually progress to an aggressive tumor, allowing the patient to be treated or monitored accordingly. That is the main area we are focusing on right now. We have an advantage in lung cancer because we have our own cohort of well-characterized samples and access to more samples from collaborators. We also have projects in head and neck, prostate, gastric, and liver cancer, where we are trying to address specific clinical questions.
Are you planning to use Partek Flow in this project?
Yes. Actually, in addition to the known core functions of Partek Flow, we are actively exploring new features, such as user-added tasks, so we can integrate our own scripts into the pipeline. We aim to identify all classes of small non-coding RNAs, so we can decipher the role of the whole RNA class in cancer development. Right now, we’re using different publicly available algorithms to generate some of the results. All the post-processing and analysis, including normalization and differential expression analysis, is performed in Partek Flow. The goal is to put everything in a single pipeline powered by Partek Flow.
If you had to pick just one thing, what do you like best about Partek Flow?
One of the most powerful characteristics of Partek Flow is the high degree of customization you’re allowed. If you are studying things that are already well-established, there are a lot of platforms where you can process your data. We are interested in exploring things that people haven’t yet given much attention. We need a lot of freedom to tweak things like the annotation file or the standard mapping algorithm for the quantification. Partek Flow lets us do that. That’s what I like the most in comparison to other platforms. The user interface is nice, but to be honest, all the platforms have worked on that. However, they don’t have the degree of customization we were looking for, so that’s why we have decided to stay with Partek Flow for the last few years. And of course, I also need to mention the customer support. If I ever have a problem I know I can contact your tech support and have, on the same day, a phone call or web meeting and the problem will be solved. I have sometimes been on the phone for two hours with Partek support, but the problem always gets solved. That’s a huge plus.
Is there anything else you would like to add?
Basically I’d like to say, if you aren’t a bioinformatics lab, don’t be afraid to take on this type of challenge (e.g., analyzing 10,000 samples), because you can do it. You can keep the focus on the biology and let Partek Flow perform most of your bioinformatics tasks. Believe me, when we were brainstorming this project there was a lot of concern regarding our computational capacity. Of course, it would have been easier to start the analysis with a few dozen samples, but we were ambitious, and fortunately, we succeeded. There is so much public data available that the problem is not generating data, but how to analyze it for your own purposes. That is probably one of the main advantages for people who are just starting projects mixing bioinformatics and biology. We want to unlock the clinical potential for a broader spectrum of small non-coding RNAs, mimicking what an actual cell does. The cell is not using just one type of sncRNA or the other, but rather the whole machinery available for a given biological process. Our goal is to integrate all the data we have generated, the already available public data, and any other source, and translate it into clinical benefits for cancer patients.

To learn more about Dr. Martinez’s work, watch our webinar where he discusses it in detail.
