GREIN is an interactive web platform that provides user-friendly options to explore and analyze GEO RNA-seq data. GREIN is powered by a back-end computational pipeline for uniform processing of RNA-seq data and by a large collection (>6,000) of already processed datasets. These datasets were retrieved from GEO and reprocessed consistently by the back-end GEO RNA-seq experiments processing pipeline (GREP2).
GREP2 is available as an R package on CRAN. GREP2 and GREIN run simultaneously in separate Docker containers. The pipeline workflow can be summarized as follows:
Retrieve metadata for a given GEO series accession using the Bioconductor package GEOquery.
Download the associated run files for each sample from the SRA database using the ascp utility of Aspera Connect.
Generate FASTQ files from each SRA file using the SRA Toolkit.
Remove adapter sequences, if necessary, using Trimmomatic.
Generate quality control (QC) reports for each FASTQ file using FastQC.
Run Salmon to quantify transcript abundances for each sample. These transcript-level estimates are then summarized to the gene level using tximport. We use the lengthScaledTPM option in the summarization step, which gives estimated counts scaled up to library size while accounting for transcript length. We obtained gene annotations for Homo sapiens (GRCh38), Mus musculus (GRCm38), and Rattus norvegicus (Rnor_6.0) from Ensembl (release-91).
Compile FastQC reports and Salmon log files into a single interactive HTML report using MultiQC.
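To make the lengthScaledTPM option in the Salmon/tximport step more concrete, the following is a minimal sketch of the idea in plain Python. It is not the GREP2 or tximport code: the function name `length_scaled_tpm` and the toy numbers are hypothetical, and the sketch assumes the scaling works in two stages, first weighting each feature's TPM by its average length across samples and then rescaling each sample to its original library size.

```python
def length_scaled_tpm(counts, tpm, lengths):
    """Turn TPM into count-like values (inputs are features x samples lists).

    Hypothetical helper illustrating the lengthScaledTPM idea:
    1. multiply each feature's TPM by its mean length across samples;
    2. rescale each sample so its column total equals the original
       estimated-count total (the library size).
    """
    n_features = len(tpm)
    n_samples = len(tpm[0])

    # Step 1: weight TPM by the feature's average length over all samples.
    mean_len = [sum(row) / n_samples for row in lengths]
    scaled = [[tpm[i][j] * mean_len[i] for j in range(n_samples)]
              for i in range(n_features)]

    # Step 2: per-sample factor that restores the original library size.
    result = [[0.0] * n_samples for _ in range(n_features)]
    for j in range(n_samples):
        library_size = sum(counts[i][j] for i in range(n_features))
        new_total = sum(scaled[i][j] for i in range(n_features))
        factor = library_size / new_total if new_total > 0 else 0.0
        for i in range(n_features):
            result[i][j] = scaled[i][j] * factor
    return result


# Toy data (made up for illustration): 3 features x 2 samples.
counts = [[100.0, 50.0], [300.0, 150.0], [600.0, 800.0]]
tpm = [[200000.0, 100000.0], [300000.0, 200000.0], [500000.0, 700000.0]]
lengths = [[500.0, 700.0], [1000.0, 1000.0], [1800.0, 2200.0]]

scaled_counts = length_scaled_tpm(counts, tpm, lengths)
for j in range(2):
    # Each sample's total matches its original library size.
    print(sum(row[j] for row in scaled_counts))
```

In this sketch the per-sample totals are preserved, so the values remain comparable to raw estimated counts, while the length weighting reduces the influence of sample-to-sample differences in effective transcript length.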
If you use GREIN, please cite:
Al Mahi, N., Najafabadi, M. F., Pilarczyk, M., Kouril, M., & Medvedovic, M. (2019). GREIN: An interactive web platform for re-analyzing GEO RNA-seq data. Scientific Reports, 9(1), 7580. https://doi.org/10.1038/s41598-019-43935-8
The source code of GREIN is available on GitHub. You can post any comments, suggestions, or bug reports here.