Guo Lab at VUMC - Research

Dr. Guo received an NIH grant to conduct a large transcriptome-wide association study (TWAS) of colorectal cancer (CRC) (R37CA227130, with funding support over 10 years). Our group has developed a TWAS analysis pipeline to systematically investigate the transcriptome for disease risk. This funded study uses existing GWAS data from two large genetic consortia conducted in European- and Asian-ancestry populations. In a recent analysis, we identified 25 genes whose genetically predicted expression was associated with CRC risk in European populations. This work was published in Gastroenterology in 2020 (PMID: 33058866). In follow-up work, we reported inaugural results from a large CRC TWAS among 23,572 CRC cases and 48,700 controls of East Asian ancestry from the Asia Colorectal Cancer Consortium (ACCC). We genotyped DNA samples from 364 East Asian CRC patients and conducted RNA sequencing on their tumor-adjacent normal colon tissues to build statistical models of genetically predicted gene expression. We have submitted this work for publication.
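As a rough illustration of what "genetically predicted expression" means in a TWAS, the sketch below computes a predicted expression value as a weighted sum of genotype dosages. The variant IDs, weights, dosages, and the function name are made up for illustration; this is not the lab's pipeline, which builds its prediction models from RNA-sequencing and genotype data.

```python
def predict_expression(weights, dosages, intercept=0.0):
    """Return the weighted sum of allele dosages (0, 1, or 2) for one person.

    weights: {variant_id: model weight}, hypothetical values here.
    dosages: {variant_id: allele dosage}; variants absent from the
             person's genotype data are skipped.
    """
    return intercept + sum(
        w * dosages[v] for v, w in weights.items() if v in dosages
    )


# Hypothetical two-variant prediction model for a single gene.
weights = {"rs_example_1": 0.5, "rs_example_2": -0.25}
person = {"rs_example_1": 2, "rs_example_2": 1}

predicted = predict_expression(weights, person)
print(predicted)
```

In a real TWAS, predicted values like this one (or their summary-statistic equivalents) would then be tested for association with disease status.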

Statistical/Deep learning Software Development Activities
TF mixed linear model: statistical approach to identify disease risk transcription factors (He et al., Nature Communications, 2022; PMID: 36402776)

sTF-TWAS: statistical tool to improve disease risk gene discovery (He et al., Nature Communications, 2022; PMID: 36402776)

trans TF-TWAS: statistical tool to improve disease risk gene discovery (He et al., Nucleic Acids Research, 2024; PMID: 37873299)

Tissue-specific Transfer Learning: ENFORMER-based deep learning for omics data (doi: https://doi.org/10.1101/2023.09.11.557208)

For code and pipelines, please see our GitHub at Xingyi Guo's Lab

Our group has established a cloud-based computational platform and developed several bioinformatics/statistical approaches for genomics and genetics studies. We have established an AWS platform to manage a cluster of Elastic Compute Cloud (EC2) instances that process data stored in the Simple Storage Service (S3) (vumc-research1, with Dr. Guo as the administrator). This platform is critical for large-scale genetic epidemiology studies and has contributed significantly to a large breast cancer genetic consortium in African-ancestry women (PI: Dr. Zheng, R01CA202981; handling over 5,000 deep WGS datasets). We have developed extensive bioinformatic pipelines and methodologies to analyze WGS, WES, RNA-seq, chromatin immunoprecipitation with sequencing (ChIP-seq), and assay for transposase-accessible chromatin with sequencing (ATAC-seq) data.

Using these approaches, we have identified rare coding and structural variants associated with breast cancer risk (Human Molecular Genetics, PMID: 29325031; Cancer Epidemiology Biomarkers Prevention, PMID: 31160347; International Journal of Cancer, PMID: 31837001) and illustrated the underlying mechanisms of transcriptional and epigenetic dysregulation caused by cancer driver mutations (Molecular Cancer, PMID: 25890285).

Currently, we have established a structural variant (SV) analysis pipeline to identify SV deletions from large WGS data using six SV callers: GenomeSTRiP, LUMPY, DELLY, Manta, Pindel, and Canvas. We developed our own analytical strategy to merge and compare SVs from different callers. To generate a consensus call across callers, the initial merging step was implemented with the SURVIVOR tool using recommended parameters: 1) a maximum distance of 1 kb between breakpoints, 2) detection by at least two SV callers, 3) calls on the same strand, and 4) a minimum length of 50 bp for an SV deletion. For the consensus set of SV deletions in each sample, we further performed re-genotyping by analyzing informative reads (i.e., split reads) from the analysis-ready BAM file using the SVTyper tool. We filtered out deletions with fewer than seven mapped reads around the deletion. We also calculated the ratio of mapped reads in the deletion region relative to its flanking 1 kb region using the duphold tool and filtered out deletions with a ratio > 0.7. We then merged the remaining SV deletions across all samples using SURVIVOR with the recommended parameters. In the end, we generated a Variant Call Format (VCF) file of the high-confidence SV deletions for the study. We are preparing this work for publication.
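The merging and filtering criteria above can be restated as a minimal Python sketch. It only encodes the thresholds given in the text (1 kb breakpoint distance, at least two callers, same strand, 50 bp minimum length, at least seven reads, depth ratio no greater than 0.7). The greedy grouping is a simplification for illustration and is not SURVIVOR's actual algorithm, and the code does not call SVTyper or duphold; the `Deletion` record and all function names are hypothetical.

```python
from dataclasses import dataclass

MAX_BREAKPOINT_DISTANCE = 1000  # 1 kb between matching breakpoints
MIN_CALLERS = 2                 # deletion must be seen by >= 2 callers
MIN_LENGTH = 50                 # minimum deletion length in bp
MIN_SUPPORTING_READS = 7        # drop deletions with fewer mapped reads
MAX_DEPTH_RATIO = 0.7           # drop deletions with ratio > 0.7


@dataclass(frozen=True)
class Deletion:
    """Hypothetical minimal record for one deletion call."""
    caller: str
    chrom: str
    start: int
    end: int
    strand: str = "+"

    @property
    def length(self):
        return self.end - self.start


def same_event(a, b):
    """True if two calls agree on chromosome, strand, and both breakpoints."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and abs(a.start - b.start) <= MAX_BREAKPOINT_DISTANCE
        and abs(a.end - b.end) <= MAX_BREAKPOINT_DISTANCE
    )


def consensus_deletions(calls):
    """Greedily group calls by their first matching group seed, then keep
    groups supported by at least MIN_CALLERS distinct callers."""
    groups = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        if call.length < MIN_LENGTH:
            continue  # too short to count as an SV deletion
        for group in groups:
            if same_event(group[0], call):
                group.append(call)
                break
        else:
            groups.append([call])
    return [g for g in groups if len({c.caller for c in g}) >= MIN_CALLERS]


def passes_filters(supporting_reads, depth_ratio):
    """Apply the read-count and depth-ratio filters described in the text."""
    return (
        supporting_reads >= MIN_SUPPORTING_READS
        and depth_ratio <= MAX_DEPTH_RATIO
    )


# Made-up calls: two callers agree on one deletion, one call is too short,
# and one deletion is reported by a single caller only.
calls = [
    Deletion("manta", "chr1", 10000, 12000),
    Deletion("delly", "chr1", 10200, 12100),
    Deletion("lumpy", "chr1", 50000, 50030),
    Deletion("pindel", "chr2", 7000, 9000),
]
consensus = consensus_deletions(calls)
print(len(consensus), sorted(c.caller for c in consensus[0]))
```

In this toy input only the chr1 deletion reported by both Manta and DELLY survives the consensus step; it would then still need to pass `passes_filters` using read counts and depth ratios obtained from re-genotyping.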

Genome-wide association studies (GWAS) have identified approximately 1,000 genetic variants associated with risk of human cancers. Approximately 90% of these GWAS-identified risk variants are located in noncoding regions. However, the target genes and biological mechanisms for these associations remain largely unknown. We have developed an expression quantitative trait loci (eQTL) analysis pipeline that uses transcriptome and genotype data to identify putative susceptibility genes for GWAS-identified breast cancer risk variants. We identified 101 putative susceptibility genes, more than half of which had not been previously reported. This work was published in 2018 in the American Journal of Human Genetics (AJHG, PMID: 29727689).
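At its simplest, an eQTL test asks whether expression of a gene changes with the number of risk alleles a person carries. The sketch below estimates that per-allele effect with an ordinary least-squares slope. The data are invented, the function name is hypothetical, and a real eQTL pipeline would also adjust for covariates and assess statistical significance, so this should be read as an illustration of the idea only.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1, or 2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression)
    )
    return sxy / sxx


# Made-up data: expression rises by 0.5 units per copy of the allele.
dosages = [0, 1, 2, 0, 1, 2]
expression = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]

slope = eqtl_slope(dosages, expression)
print(slope)
```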

We have developed an innovative meta-analysis strategy to identify putative susceptibility genes for six other major cancer types: colorectal, lung, ovary, prostate, pancreas, and melanoma. Using this approach, we identified 264 putative cancer susceptibility genes, including 107 that were not reported previously. This work was selected as an oral platform presentation at the American Society of Human Genetics (ASHG) Annual Meeting in 2018 and was published in AJHG in 2019 (PMID: 31402092).

We conducted a multi-omics analysis using transcriptome and/or DNA methylation data from the Genotype-Tissue Expression (GTEx), The Cancer Genome Atlas (TCGA), and the Colonomics projects. We identified 116 putative target genes for 45 GWAS-identified variants. Using the summary-data-based Mendelian randomization (SMR) approach, we demonstrated that the CRC susceptibility for 29 of the 45 CRC variants may be mediated by cis-effects on gene regulation. At a Bonferroni-corrected SMR P value cutoff of < 0.05, we determined 66 putative susceptibility genes, including 39 genes that had not been previously reported. This work was published in Human Molecular Genetics in 2021 (PMID: 33481017).
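For readers unfamiliar with the Bonferroni-corrected cutoff mentioned above, the sketch below shows the arithmetic: each P value is multiplied by the number of tests (capped at 1) and compared with 0.05. The gene names and P values are placeholders rather than results from the study, and the function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the entries whose Bonferroni-corrected P value is below alpha.

    p_values: {gene: raw P value}; the number of tests is taken to be the
    number of entries, which is an assumption of this sketch.
    """
    n_tests = len(p_values)
    return {
        gene: p
        for gene, p in p_values.items()
        if min(p * n_tests, 1.0) < alpha
    }


# Placeholder P values for four hypothetical genes.
p_smr = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.4, "GENE_D": 0.012}

hits = bonferroni_significant(p_smr)
print(sorted(hits))
```

With four tests, a raw P value must be below 0.05 / 4 = 0.0125 to pass, so GENE_A and GENE_D are retained while GENE_B and GENE_C are not.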

Project I: Spectrum of Somatic Cancer Gene Variations Among Adults With Appendiceal Cancer by Age at Disease Onset. This work was published in JAMA Netw Open (collaboration with Dr. Andreana N Holowatyj; PMID: 33295976).

Project II: The mutational landscape of early- and typical-onset oral tongue squamous cell carcinoma. This work was published in Cancer, 2020 (collaboration with Drs. Lang Kuhs KA and Campbell BR; PMID: 33146897).

Project I: Mediation analysis approach applied in human cancers (published in BMC Cancer, PMID: 32928150)
"From tobacco smoking to cancer mutational signature: a mediation analysis strategy to explore the role of epigenetic changes"

Project II: Computational epigenetics and statistical approach to improve susceptibility gene discovery (under review in Nature Communications)

Project III: Structural variant (SV) analysis pipeline
