Dr. Guo received an NIH grant to conduct a large transcriptome-wide association study (TWAS) of colorectal cancer (CRC) (R37CA227130). Our group has developed a TWAS analysis pipeline
to systematically investigate the transcriptome for disease risk. This funded study
uses existing GWAS data from two large genetic consortia conducted in European-
and Asian-ancestry populations. In a recent analysis, we identified 25 genes
whose genetically predicted expression was associated with CRC risk in European populations. This
work was published in Gastroenterology in 2020 (PMID: 33058866).
In
follow-up work, we reported inaugural results from a large CRC TWAS among
23,572 CRC cases and 48,700 controls of East Asian ancestry from the Asia
Colorectal Cancer Consortium (ACCC). We genotyped DNA samples from 364 East Asian
CRC patients and conducted RNA sequencing on their tumor-adjacent normal colon
tissues to build statistical models of genetically predicted gene expression.
We then applied these prediction models to GWAS summary statistics from East Asian
participants (23,572 cases and 48,700 controls) to investigate associations of
predicted gene expression with CRC risk. We have submitted this work for publication.
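The step of combining prediction models with GWAS summary statistics can be sketched as follows. This is a minimal illustration that assumes an S-PrediXcan-style statistic; the text above does not state which association statistic our pipeline uses, and the function names (`twas_z`, `two_sided_p`) and the toy inputs are hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld_cov):
    """Gene-level z-score from per-SNP GWAS z-scores (assumed S-PrediXcan-style form).

    weights : per-SNP weights from the expression prediction model
    gwas_z  : per-SNP GWAS z-scores, in the same SNP order
    ld_cov  : SNP-by-SNP genotype covariance matrix from a reference panel
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld_cov) != n:
        raise ValueError("weights, gwas_z and ld_cov must describe the same SNPs")
    # Variance of predicted expression: w' * Cov * w
    var_g = sum(
        weights[i] * ld_cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    if var_g <= 0:
        raise ValueError("predicted expression has no variance")
    # Weighted sum of SNP z-scores, each scaled by the SNP standard deviation
    numerator = sum(
        weights[i] * math.sqrt(ld_cov[i][i]) * gwas_z[i] for i in range(n)
    )
    return numerator / math.sqrt(var_g)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Toy example: two uncorrelated SNPs with unit variance (illustrative values only)
z = twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
print(round(z, 3), two_sided_p(z))
```

For a single-SNP model with a positive weight, this statistic reduces to the GWAS z-score of that SNP, which is a useful sanity check.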
Improved TWAS
using putative regulatory genetic variants: We built gene-expression prediction
models from GTEx transcriptome data using only the putative regulatory genetic variants (flanking 1 Mb region) located in the
binding sites of risk-associated transcription factors (TFs) at p < 0.05 as reported by BCAC (n = 30k SNPs).
Although we used only the 30k SNPs occupied by the
22 selected TFs that showed nominally significant associations with breast
cancer risk, we were able to predict gene expression at R² > 0.01
for 7,538 genes, only slightly fewer than the 9,109
genes predicted using all genetic variants in our previous study. We further focused on genes that could be
predicted by the same set of local genetic variants in either TCGA
or the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) at R² > 0.01. We identified 76 genes whose predicted expression
was associated with breast cancer risk at P < 5 × 10⁻⁶,
the Bonferroni-corrected significance level used in our previous study, in which we identified 48 genes with regular TWAS.
Specifically,
we identified 22 genes located in regions not yet identified by GWAS
(1 Mb away from known GWAS loci). In addition, we uncovered 23 putative breast cancer risk genes in known GWAS
loci that had not been previously reported (Wen et al., Nature Communications, 2021).
Our group has established a cloud-based
computational platform and developed several bioinformatics/statistical
approaches for genomics and genetics studies. We have established an Amazon Web Services (AWS) platform that manages a cluster of Elastic Compute Cloud (EC2)
instances to process data stored in the Simple Storage Service (S3)
(vumc-research1, Dr. Guo as the administrator). This platform is critical for large-scale genetic epidemiology studies and has contributed
significantly to a large breast cancer genetic consortium in African-ancestry
women (PI: Dr. Zheng, R01CA202981; handling over 5,000 deep WGS datasets). We have developed extensive bioinformatic
pipelines and methodologies to analyze WGS, WES, RNA-seq, chromatin
immunoprecipitation sequencing (ChIP-seq), and assay for transposase-accessible
chromatin using sequencing (ATAC-seq) data.
Using these approaches, we have identified
rare coding and structural variants associated with breast cancer risk (Human Molecular
Genetics, PMID: 29325031; Cancer
Epidemiology, Biomarkers & Prevention, PMID: 31160347; International Journal of Cancer, PMID: 31837001) and illustrated the underlying mechanisms of transcriptional
and epigenetic dysregulation caused by cancer driver mutations (Molecular Cancer, PMID: 25890285).
Currently, we have established a structural variant (SV) analysis pipeline to identify SV deletions from large WGS data using six SV callers: GenomeSTRiP, LUMPY, DELLY, Manta, Pindel, and Canvas. We developed an analytical strategy to merge and compare SVs from the different callers. To generate a consensus call set, the initial merging step was implemented with the SURVIVOR tool using the recommended parameters: 1) a maximum distance of 1 kb between breakpoints, 2) detection by at least two SV callers, 3) location on the same strand, and 4) a minimum length of 50 bp for SV deletions. For the consensus set of SV deletions in each sample, we then performed re-genotyping by analyzing informative reads (i.e., split reads) from the analysis-ready BAM file using the SVTyper tool. We filtered out deletions with fewer than seven mapped reads around the deletion. We also used the duphold tool to calculate the ratio of mapped reads in the deletion region relative to its flanking 1 kb region, and filtered out deletions with a ratio > 0.7. We then merged the remaining SV deletions across all samples using SURVIVOR with the recommended parameters. Finally, we generated a Variant Call Format (VCF) file of high-confidence SV deletions for the study. We are preparing this work for publication.
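The merging and filtering rules described above can be illustrated with a small, self-contained sketch. This is not SURVIVOR, SVTyper, or duphold code and does not reproduce their algorithms; it is a toy greedy clustering that only encodes the thresholds stated in the text. The `Deletion` class, the function names, and the default strand label are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the pipeline description above
MAX_BREAKPOINT_DIST = 1000  # bp, maximum distance between breakpoints
MIN_CALLERS = 2             # deletion must be detected by at least two callers
MIN_LENGTH = 50             # bp, minimum deletion length
MIN_SUPPORT_READS = 7       # deletions with fewer mapped reads are removed
MAX_DEPTH_RATIO = 0.7       # deletions with a depth ratio above this are removed


@dataclass(frozen=True)
class Deletion:
    caller: str
    chrom: str
    start: int
    end: int
    strand: str = "+-"  # placeholder strand label (assumption)


def same_event(a, b):
    """True if two calls share chromosome and strand and both breakpoints agree within 1 kb."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and abs(a.start - b.start) <= MAX_BREAKPOINT_DIST
        and abs(a.end - b.end) <= MAX_BREAKPOINT_DIST
    )


def consensus(calls):
    """Greedy clustering of calls; keep clusters supported by >= MIN_CALLERS distinct callers."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        if call.end - call.start < MIN_LENGTH:
            continue  # too short to count as an SV deletion
        for cluster in clusters:
            if same_event(cluster[0], call):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x.caller for x in c}) >= MIN_CALLERS]


def passes_filters(support_reads, depth_ratio):
    """Apply the read-support and depth-ratio filters to a re-genotyped deletion."""
    return support_reads >= MIN_SUPPORT_READS and depth_ratio <= MAX_DEPTH_RATIO


calls = [
    Deletion("LUMPY", "chr1", 10000, 12000),
    Deletion("DELLY", "chr1", 10200, 12100),
    Deletion("Manta", "chr1", 50000, 50030),   # shorter than 50 bp
    Deletion("Pindel", "chr2", 10000, 12000),  # only one caller
]
print(len(consensus(calls)), passes_filters(9, 0.35))
```

In this toy input, only the chr1 deletion seen by both LUMPY and DELLY survives the consensus step.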
Genome-wide association studies (GWAS) have identified approximately 1,000 genetic variants associated with risk of human cancers. Approximately 90% of these GWAS-identified risk variants are located in noncoding regions. However, the target genes and biological mechanisms underlying these associations remain largely unknown. We have developed an expression quantitative trait loci (eQTL) analysis pipeline that uses transcriptome and genotype data to identify putative susceptibility genes for GWAS-identified breast cancer risk variants. We identified 101 putative susceptibility genes, more than half of which had not been previously reported. This work was published in 2018 in the American Journal of Human Genetics (AJHG, PMID: 29727689).
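At its core, an eQTL test asks whether expression of a gene changes with the number of risk alleles a person carries. The sketch below shows the simplest version, an ordinary least-squares regression of expression on genotype dosage. It is an assumption for illustration only: the text does not specify the model or covariates used in our pipeline, and `eqtl_regression` and the toy data are hypothetical.

```python
import math


def eqtl_regression(dosages, expression):
    """Regress expression on allele dosage (0, 1, 2); return (beta, se, t)."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched dosage/expression values for at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx                    # change in expression per risk allele
    alpha = mean_y - beta * mean_x      # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t = beta / se if se > 0 else math.inf
    return beta, se, t


# Toy data: six samples, expression rising with allele count (illustrative values only)
beta, se, t = eqtl_regression([0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.9, 2.1, 3.0, 3.2])
print(beta, se, t)
```

A real analysis would typically adjust for covariates such as ancestry and technical factors, which this sketch omits.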
We have developed an innovative meta-analysis strategy to identify putative susceptibility genes for six other major cancer types: colorectal, lung, ovarian, prostate, and pancreatic cancers and melanoma. Using this approach, we identified 264 putative cancer susceptibility genes, including 107 that had not been reported previously. This work was selected for an oral platform presentation at the American Society of Human Genetics (ASHG) Annual Meeting in 2018 and was published in AJHG in 2019 (PMID: 31402092).
We conducted a multi-omics analysis using transcriptome and/or DNA methylation data from the Genotype-Tissue Expression (GTEx), The Cancer Genome Atlas (TCGA), and the Colonomics projects. We identified 116 putative target genes for 45 GWAS-identified CRC risk variants. Using the summary-data-based
Mendelian randomization (SMR) approach, we demonstrated that CRC susceptibility for 29 of the 45 variants may be mediated by cis-effects on gene regulation. At a Bonferroni-corrected cutoff of P_SMR < 0.05, we identified 66 putative susceptibility genes, including 39 that had not been previously reported. This work was published in Human Molecular Genetics in 2021 (PMID: 33481017).
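For readers unfamiliar with SMR, the sketch below shows how the test combines the GWAS and eQTL effects of a single variant. It uses the standard approximate SMR statistic, T = z_GWAS² z_eQTL² / (z_GWAS² + z_eQTL²), compared against a chi-square distribution with one degree of freedom. The function names and input values are hypothetical, and the Bonferroni helper is an assumed reading of the "Bonferroni-corrected P_SMR" cutoff rather than a description of the published analysis.

```python
import math


def smr_test(beta_gwas, se_gwas, beta_eqtl, se_eqtl):
    """Return (b_xy, t_smr, p_smr) for one variant-gene pair."""
    if beta_eqtl == 0:
        raise ValueError("eQTL effect must be non-zero")
    z_gwas = beta_gwas / se_gwas
    z_eqtl = beta_eqtl / se_eqtl
    # Estimated effect of gene expression on the trait
    b_xy = beta_gwas / beta_eqtl
    # Approximate SMR test statistic
    t_smr = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_smr = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p_smr


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Illustrative inputs: GWAS z = 5, eQTL z = 10
b_xy, t_smr, p_smr = smr_test(0.10, 0.02, 0.50, 0.05)
print(b_xy, t_smr, p_smr, bonferroni(p_smr, 116))
```

Because the statistic is bounded by the weaker of the two z-scores, a strong eQTL signal alone cannot produce a significant SMR result.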
Project I: Spectrum of Somatic Cancer Gene Variations Among Adults With Appendiceal Cancer by Age at Disease Onset. This work was published in JAMA Network Open (collaboration with Dr. Andreana N Holowatyj, PMID: 33295976).
Project II: The mutational landscape of early- and typical-onset oral tongue squamous cell carcinoma. This work was published in Cancer, 2020 (collaboration with Drs. Lang Kuhs KA and Campbell BR, PMID: 33146897).
Project I: Mediation analysis approach applied in human cancers (published in BMC Cancer, PMID: 32928150)
"From tobacco smoking to cancer mutational signature: a mediation analysis strategy to explore the role of epigenetic changes"
Project II: Computational epigenetics and statistical approach to improve susceptibility gene discovery (under review in Nature Communications)
Project III: Structural variant (SV) analysis pipeline
For code and pipelines, please see our GitHub at Xingyi Guo's Lab