Fig. 6.1
Typical primary analysis pipeline of WES (a) and WTS (b). Briefly, data processing proceeds as follows: data cleansing, mapping to the reference genome, and variant calling/read counting and annotation. Various methods are used to detect somatic mutations or differentially expressed genes in the resulting sequence reads. Representative tools are described.
Because base substitution patterns and mutant allele frequencies differ among cancer types and individual subjects, the above software for detection of single nucleotide polymorphisms (SNPs) has limitations in detecting somatic mutations correctly. Tools developed exclusively for detection of somatic mutations, such as MuTect [34] or VarScan [35], are therefore often used for somatic mutation calling in tumor/normal-paired samples. Briefly, MuTect has high sensitivity and is good at detecting mutations with low allele fractions, while VarScan has high specificity in detecting somatic mutations [36].
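To make the value of a tumor/normal pair concrete, the sketch below compares variant-supporting read counts in the two samples with a one-sided Fisher's exact test. It is an illustration only and is not the algorithm of MuTect, VarScan, or any other tool named here; the function names, the read counts, and the 0.05 threshold are assumptions chosen for the example.

```python
from math import comb


def fisher_one_sided(tumor_alt: int, tumor_ref: int,
                     normal_alt: int, normal_ref: int) -> float:
    """One-sided Fisher's exact test on a 2x2 table of read counts.

    Returns the probability of observing at least `tumor_alt` variant reads
    in the tumor if the variant reads were distributed at random between
    the tumor and the matched normal sample.
    """
    n_tumor = tumor_alt + tumor_ref
    n_alt = tumor_alt + normal_alt
    n_total = n_tumor + normal_alt + normal_ref
    denominator = comb(n_total, n_tumor)
    numerator = 0
    # Sum hypergeometric probabilities for k = tumor_alt .. max possible.
    for k in range(tumor_alt, min(n_tumor, n_alt) + 1):
        numerator += comb(n_alt, k) * comb(n_total - n_alt, n_tumor - k)
    return numerator / denominator


def enriched_in_tumor(tumor_alt: int, tumor_ref: int,
                      normal_alt: int, normal_ref: int,
                      alpha: float = 0.05) -> bool:
    """Hypothetical decision rule: variant reads are enriched in the tumor."""
    return fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref) < alpha


if __name__ == "__main__":
    # Invented counts: 8 of 50 tumor reads carry the variant, 0 of 50 normal reads.
    print(enriched_in_tumor(8, 42, 0, 50))
    # Invented counts: the same variant fraction in both samples (germ line-like).
    print(enriched_in_tumor(5, 45, 5, 45))
```

A site where the variant fraction is the same in both samples is not flagged, which is the behavior expected for a germ line polymorphism. Real callers additionally model sequencing error, tumor purity, and strand bias.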
Usuyama et al. have recently developed HapMuC, a novel somatic mutation caller that uses heterozygous germ line variants near candidate mutations. The algorithm had superior specificity and sensitivity compared with previous methods [37].
Gene polymorphism databases such as dbSNP and the 1000 Genomes Project are usually used to remove known SNPs from the called SNVs and/or short indels. A panel of normal samples is used as a further filter to remove false-positive somatic mutations caused by sequencing errors observed in normal samples.
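The two filtering steps can be sketched as simple set lookups. This is a minimal illustration under stated assumptions: the `Variant` record, the function name, and all coordinates are invented for the example, and real pipelines read these sets from VCF files rather than building them by hand.

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """Minimal variant record (hypothetical; real data would come from a VCF)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_candidates(candidates: Iterable[Variant],
                      known_polymorphisms: Set[Variant],
                      panel_of_normals: Set[Variant]) -> List[Variant]:
    """Keep candidates absent from both the polymorphism set and the normal panel."""
    kept = []
    for variant in candidates:
        if variant in known_polymorphisms:
            continue  # likely a germ line polymorphism
        if variant in panel_of_normals:
            continue  # recurs in normal samples, likely a technical artifact
        kept.append(variant)
    return kept


if __name__ == "__main__":
    # All coordinates below are made up for illustration.
    calls = [
        Variant("chr1", 1000, "A", "G"),
        Variant("chr2", 2000, "C", "T"),
        Variant("chr3", 3000, "G", "A"),
    ]
    polymorphisms = {Variant("chr1", 1000, "A", "G")}
    normals = {Variant("chr2", 2000, "C", "T")}
    print(filter_candidates(calls, polymorphisms, normals))
```

Matching on chromosome, position, and both alleles means that a different substitution at a known polymorphic position is kept as a candidate.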
Hundreds to thousands of somatic mutations are often detected in a cancer genome. To identify driver genes among these, several tools, such as MutSigCV, OncodriveFM, and OncodriveCLUST, have been proposed.
MutSigCV allows researchers to calculate the significance of genomic mutational status for cancer association by using not only background mutation rate but also DNA replication timing and transcriptional activity of the gene [38].
OncodriveFM assesses functional impact using three well-known methods (SIFT, PolyPhen2, and MutationAssessor) [39]. It is based on the assumption that any bias toward the accumulation of variants with high functional impact is an indication of positive selection and can thus be used to detect candidate driver genes or gene modules.
OncodriveCLUST (http://bg.upf.edu/oncodrive-clust) is a method to identify genes in which mutations accumulate within specific regions of the protein, because such clustering denotes events that have been selected for their effect on the tumor [40]. It computes a score measuring the mutation clustering of a gene across the protein sequence and then compares it with a background model.
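The idea of scoring positional clustering against a background can be illustrated with a toy example. The sketch below is not the OncodriveCLUST scoring scheme: the score (largest fraction of mutations within a short window), the uniform background, the window size, and the mutation positions are all assumptions made for illustration.

```python
import random
from typing import Sequence


def clustering_score(positions: Sequence[int], window: int = 5) -> float:
    """Largest fraction of mutations that falls within `window` residues."""
    if not positions:
        return 0.0
    ordered = sorted(positions)
    best = 0
    left = 0
    for right in range(len(ordered)):
        # Shrink the window until it spans fewer than `window` residues.
        while ordered[right] - ordered[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best / len(ordered)


def empirical_p_value(positions: Sequence[int], protein_length: int,
                      window: int = 5, n_sim: int = 2000, seed: int = 0) -> float:
    """Compare the observed score with mutations placed uniformly at random."""
    rng = random.Random(seed)
    observed = clustering_score(positions, window)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.randint(1, protein_length) for _ in positions]
        if clustering_score(simulated, window) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Invented residue positions in a hypothetical 189-residue protein.
    hotspot = [12, 12, 13, 12, 13, 61, 12, 13, 12, 146]
    spread = [10, 30, 50, 70, 90, 110, 130, 150, 170, 185]
    print(clustering_score(hotspot), empirical_p_value(hotspot, 189))
    print(clustering_score(spread), empirical_p_value(spread, 189))
```

A gene whose mutations pile up at one hotspot receives a high score and a small empirical p-value, whereas evenly spread mutations are indistinguishable from the background.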
6.2.2 Transcriptome Sequencing
Whole transcriptome sequencing (WTS/RNA-seq) enables RNA expression profiling and can replace microarray methods. WTS also allows scientists to look at alternatively spliced transcripts, posttranscriptional modifications, gene fusions, mutations/SNPs, and changes in gene expression [41]. RNA expression of each gene or transcript is often measured by RPKM/FPKM (reads/fragments per kilobase per million mapped reads) or TPM (transcripts per million). RPKM/FPKM is the mapped read/fragment count normalized by transcript length and by the total number of mapped reads/fragments.
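The two units can be written out in a few lines. The sketch below follows the standard definitions of RPKM and TPM; the gene names, counts, and transcript lengths are invented for illustration, and the function names are not taken from any particular package.

```python
from typing import Dict


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_reads)
        for gene in counts
    }


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1e3) for gene in counts}
    scale = sum(per_kb.values())
    if scale == 0:
        raise ValueError("no mapped reads")
    return {gene: per_kb[gene] * 1e6 / scale for gene in counts}


if __name__ == "__main__":
    # Invented example: three genes, 1,000 mapped reads in total.
    counts = {"geneA": 100, "geneB": 300, "geneC": 600}
    lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}
    print(rpkm(counts, lengths))
    print(tpm(counts, lengths))
```

Because TPM divides by the sum of the length-normalized values, the TPM values of a sample always add up to one million, which makes them easier to compare across samples than RPKM/FPKM.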
Bowtie or BWA is used for mapping of WTS reads. Unlike exome analysis, WTS analysis requires alignment that takes splicing variants into account. Therefore, in WTS analysis, TopHat software in conjunction with Bowtie is used for mapping, for detecting fusion genes, and for calling SNVs (Fig. 6.1b) [42]. HISAT will be the core of the next version of TopHat (http://nextgenseek.com/2015/03/hisat-a-fast-and-memory-lean-rna-seq-aligner/). Cufflinks yields not only a transcriptome assembly that includes splicing variants, built with a genome annotation file (usually in GTF format), and FPKM values, but also differentially expressed genes (DEGs) between two specified groups.
The STAR software package performs this task with higher levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR can discover more complex RNA sequence arrangements, such as chimeric and circular RNA [43].
Strategies for normalizing tag count data are updated continually and are distributed as R packages. TCC (an acronym for Tag Count Comparison) is an R/Bioconductor package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multistep normalization methods to remove potential DEGs before performing data normalization. TCC provides a simple unified interface that can perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq [44].
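The reasoning behind removing potential DEGs before normalization can be shown with a toy two-sample example. The sketch below is written in Python for illustration and is not the TCC implementation: the library-size factors, the log fold-change cutoff, the pseudocount, and the counts are all assumptions.

```python
from math import log2
from typing import Dict, List, Sequence, Tuple

Counts = Dict[str, Tuple[int, int]]  # gene -> (count in sample A, count in sample B)


def size_factors(counts: Counts, genes: Sequence[str]) -> List[float]:
    """Library-size factors computed from `genes` only, scaled to a mean of 1."""
    totals = [sum(counts[g][i] for g in genes) for i in range(2)]
    mean_total = sum(totals) / 2
    return [t / mean_total for t in totals]


def multistep_size_factors(counts: Counts, lfc_cutoff: float = 1.0,
                           pseudocount: float = 1.0) -> List[float]:
    """Normalize, drop genes that look differentially expressed, normalize again."""
    genes = list(counts)
    first = size_factors(counts, genes)
    kept = []
    for g in genes:
        a = counts[g][0] / first[0] + pseudocount
        b = counts[g][1] / first[1] + pseudocount
        if abs(log2(a / b)) < lfc_cutoff:
            kept.append(g)  # not a potential DEG under this crude rule
    if not kept:
        return first
    return size_factors(counts, kept)


if __name__ == "__main__":
    # Invented counts: four unchanged genes and one strongly induced gene.
    counts = {
        "gene1": (100, 100),
        "gene2": (200, 200),
        "gene3": (50, 50),
        "gene4": (400, 400),
        "induced": (10, 500),
    }
    print(size_factors(counts, list(counts)))  # skewed by the induced gene
    print(multistep_size_factors(counts))      # balanced after removing it
```

In this example a single highly expressed DEG inflates the library size of sample B, so one-step normalization makes the unchanged genes look downregulated; recomputing the factors without the potential DEG removes that bias.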
6.3 Public Databases and Tools for NGS Analysis
After identification of genomic/transcriptome alterations, several databases can be used to search for cancer biomarkers or therapeutic targets (Table 6.1).
Table 6.1
Public databases and tools for NGS analysis
Database | Contents | URL |
---|---|---|
COSMIC | SNV, insertion, deletion, gene fusion, genomic rearrangement, copy number, and differential expression data | |
cBioPortal | Mutations, CNV, RNA/protein expression, clinical data, and their correlations | |
DAVID | Characteristics of involved gene set using GSEA | |
DGIdb | Drug-gene interactions | |
Genomics of Drug Sensitivity in Cancer | Cell line drug sensitivity data | |
Mitelman Database | Fusion genes or chromosomal aberrations | |
RefEx | Expression profile of gene of interest in each normal organ and cell line | |
PrognoScan | Meta-analysis of the prognostic value of genes | |
6.3.1 COSMIC
The Catalogue of Somatic Mutations in Cancer (COSMIC) is the most popular database. It includes SNV, insertion, deletion, gene fusion, genomic rearrangement, copy number, and differential expression data from over one million cancer genomes [45]. Comparing detected alterations with the COSMIC database shows whether they are known somatic mutations. The mutational frequency and mutational status of genes in the tumors/cell lines of the dataset are also available.
6.3.2 cBioPortal
The cBioPortal for Cancer Genomics enables visualization and analysis of large-scale cancer genomic datasets, covering not only mutations, CNV, RNA/protein expression, and clinical data but also their correlations. As of October 2015, the database contained data from 105 cancer genomic studies [46, 47].
6.3.3 DAVID
The Database for Annotation, Visualization and Integrated Discovery (DAVID) software helps characterize a gene set of interest using the GSEA (Gene Set Enrichment Analysis) method [48]. DAVID interprets the data with annotations from OMIM, gene ontology, and pathways.
6.3.4 DGIdb
The DGIdb is used to look at drug-gene interactions and potentially “druggable” genes [49]. Information on clinical trial status is also available.
6.3.5 Genomics of Drug Sensitivity in Cancer
This database provides cell line drug sensitivity data for 140 drugs representing >48,000 cell line-drug interactions [50]. Drug sensitivity data have been correlated with mutations in cancer genes in order to identify genetic factors associated with drug sensitivity or resistance.
6.3.6 Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer
This database can be used to confirm the frequency of detected fusion genes or chromosomal aberrations. In total, 10,026 gene fusions from 65,975 cases had been registered as of 2015 (Mitelman F, Johansson B, and Mertens F (Eds.), Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer (2015). http://cgap.nci.nih.gov/Chromosomes/Mitelman).
6.3.7 RefEx
RefEx (Reference Expression dataset; http://refex.dbcls.jp/) aims to provide a reference for mammalian tissue gene expression data obtained by various methods, such as expressed sequence tag (EST), microarray (GeneChip), CAGE (cap analysis gene expression), and WTS/RNA-seq. This database is useful for finding the expression profiles of genes of interest in each normal organ, and the recent update of RefEx in collaboration with the FANTOM5 project enables us to browse gene expression profiles in cell lines, primary cultures, and adult and fetal tissues from human and mouse [51].
6.3.8 PrognoScan
PrognoScan is an online biomarker validation tool for meta-analysis of the prognostic value of genes [52]. A total of 8626 cases of 14 cancer types in 74 datasets are registered in this database.
6.4 Identification of Novel Therapeutic Targets for Lung Cancer by NGS
Recent studies using NGS platforms have revealed novel driver genes involved in lung carcinogenesis and have suggested new therapeutic targets for suppressing oncogene addiction (Table 6.2).
Table 6.2
Recent studies using NGS in lung cancer
Author | Histology | No. of patients | Methods using NGS | Summary of novel findings | References |
---|---|---|---|---|---|
Kohno | ADC | 30 | WTS | KIF5B-RET fusion gene | [53] |
Imielinski | ADC | 183 | WGS, WES | Mutations of U2AF1, RBM10, ARID1A genes. Structural variants of EGFR, SIK2 genes | [58] |
Seo | ADC | 87 | WES, WTS | Mutations of LMTK2, ARID1A, NOTCH2, SMARCA4 genes. Fusion of ALK, RET, ROS1, FGFR2, AXL, PDGFRA genes | [59] |
TCGA | ADC | 230 | WGS, WES, WTS | NF1, RIT1 mutations. MGA mutations that occur mutually exclusive of MYC amplification | [60] |
Fernandez-Cuesta | ADC | 25 | WTS | CD74-NRG1 gene fusion in mucinous subtype | [61] |
Jang | ADC | 153 | WTS | SND1-BRAF fusion gene | [62] |
TCGA | SQC | 178 | WGS, WES | Significantly altered pathways included NFE2L2 and KEAP1 and/or deletion or mutation of CUL3. Amplification of FGFR1 and WHSC1L1 | [63] |
Kim | SQC | 104 | WES | Similar spectrum of alterations between Korean and North American lung squamous cell carcinoma. FGFR3-TACC3 fusion gene | [64] |
Peifer | SCLC | 29 | WGS, WES, WTS | TP53, RB1 inactivation in all cases. Frequent mutation of CREBBP, EP300, MLL, PTEN, SLIT2, EPHA7 genes. FGFR1 gene amplification | [67] |
Rudin | SCLC | 53 | WGS, WES, WTS | 22 significantly mutated genes. Frequent amplification of SOX2 gene | [68] |
George | SCLC | 110 | WGS, WTS | Frequent mutation in TP73 and NOTCH family genes. Chromothripsis affecting tumors with wild-type RB1 | [69] |
Govindan | NSCLC (16 ADC, 1 LCC) | 17 | WGS, WTS | Ten times higher mutation frequency in smokers than in never-smokers. EGFR and KRAS mutations play an initiating role in lung carcinogenesis in both smokers and never-smokers | [70] |
6.4.1 Lung Adenocarcinoma
The KIF5B-RET fusion gene was identified as a new driver gene by WTS and/or whole-genome sequencing (WGS) of lung adenocarcinoma (LADC) patients in 2012 [53–56]. RET fusion was found in 1–2 % of LADC patients from both Asia and Europe, corresponding to approximately 12,000 lung cancer patients per year worldwide. RET fusion occurred more frequently in young patients and was specific to LADC [57].
Imielinski et al. identified U2AF1, RBM10, and ARID1A as novel driver genes of lung cancer using WES of 183 surgically resected lung adenocarcinomas. Genomic rearrangements in the EGFR and SIK2 genes were discovered by WGS analysis of 24 lung adenocarcinomas [58].
Seo et al. identified novel driver mutations in LMTK2, ARID1A, NOTCH2, and SMARCA4 genes using WES of surgical specimens from 76 lung adenocarcinoma patients. WTS of 77 cases revealed fusion genes involving tyrosine kinase genes such as FGFR2, AXL, and PDGFRA in addition to ALK, RET, and ROS1, previously known fusion genes in lung adenocarcinomas [59].
TCGA researchers investigated the omics landscape of 230 resected lung adenocarcinomas. They identified 18 significantly mutated genes, including RIT1 activating mutations and newly described loss-of-function MGA mutations, which are mutually exclusive with focal MYC amplification. Aberrations in NF1, MET, ERBB2, and RIT1 occurred in 13 % of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in some tumors. MAPK and PI(3)K pathway activity at the protein level was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation [60].
Fernandez-Cuesta et al. discovered the CD74-NRG1 fusion gene using WTS of 25 lung adenocarcinomas from never-smokers. Including an additional 102 pan-negative lung adenocarcinomas from never-smokers, five cases carried the CD74-NRG1 fusion gene. All positive cases were invasive mucinous adenocarcinomas in female patients [61].
Jang et al. identified the SND1-BRAF fusion in 5/153 never-smoker lung adenocarcinoma patients by using WTS. Ectopic expression of SND1-BRAF in H1299 cells increased MEK/ERK phosphorylation, cell proliferation, and spheroid formation compared with the parental mock-transfected control [62].
6.4.2 Squamous Cell Lung Cancer
TCGA researchers revealed a comprehensive genomic landscape of squamous cell lung cancer in 2012. In total, 178 squamous cell lung cancers were analyzed using WES and WTS. They detected novel loss-of-function mutations in the HLA-A class I major histocompatibility gene. Significantly altered pathways included NFE2L2 and KEAP1 and/or deletion or mutation of CUL3 in 34 % of tumors. They identified actionable alterations for therapeutic targets in most tumors [63].
Kim et al. found a similar spectrum of alterations between Korean and North American lung squamous cell carcinoma, in contrast to the differences seen in lung adenocarcinoma. They also identified a recurrent, therapeutically actionable FGFR3-TACC3 fusion in lung squamous cell carcinoma [64].
The FGFR1 gene is amplified in up to 20 % of squamous cell lung cancer patients. Clinical trials with FGFR inhibitors are currently underway [65, 66] (My Cancer Genome http://www.mycancergenome.org/content/disease/lung-cancer/fgfr1/58/ (Updated November 15)). Precision medicine targeting the FGFR pathway is expected to improve the prognoses of patients with lung squamous cell carcinoma.
6.4.3 Small-Cell Lung Cancer
Peifer et al. analyzed 29 SCLCs by WES, WTS, and/or WGS. All cases showed signatures of inactivation of p53 and Rb. In addition to frequent mutations in CREBBP, EP300, and MLL, which encode histone-modifying proteins, frequent mutations in the PTEN, SLIT2, and EPHA7 genes and focal amplification of the FGFR1 gene were observed [67].
Rudin et al. identified 22 candidate driver genes by WES and WTS analysis of 36 primary SCLCs and 17 SCLC cell lines. SOX2 amplification was detected in ~27 % of SCLC tumors, and its utility as a therapeutic target for SCLC was confirmed [68].
We identified frequent mutation in TP73 and NOTCH family genes by WGS and WTS of 110 SCLCs; rearrangement of TP73 gene induced oncogenic transcript TP73Δex2/3; and WGS revealed that chromothripsis on chromosomes 3 and 11 affects tumors with wild-type RB1 [69].
6.4.4 Tobacco Smoking and Lung Cancer Genome
Govindan et al. sequenced the entire genome and transcriptome of 17 paired tumor and adjacent normal samples from non-small cell lung cancer (NSCLC) patients. The mutation frequency was ten times higher in smokers than in never-smokers. Deep digital sequencing detected EGFR and KRAS mutations in the founder clones of both smokers and never-smokers, suggesting that EGFR and KRAS mutations play initiating roles in lung carcinogenesis. In addition, 54 genes were identified as carrying targetable mutations for therapy [70].
Gou et al. analyzed several NGS study datasets comprising a total of 739 lung cancer tumors (390 adenocarcinomas, 282 squamous cell carcinomas, and 67 small cell carcinomas). They also demonstrated that smokers have many more somatic mutations than nonsmokers (nonsmokers ADC, 0.98/Mb; smokers ADC, 12.67/Mb; SQC, 8.75/Mb; SCLC, 15.87/Mb). The cancer genomes of smokers were more complex than those of nonsmokers [71].
6.5 NGS for Precision Medicine
6.5.1 Cancer Immunotherapy
Rizvi et al. performed WES on 34 NSCLC cases treated with pembrolizumab, an antibody targeting programmed cell death-1 (PD-1). The analysis revealed that a higher nonsynonymous mutation burden in tumors was associated with improved objective response, durable clinical benefit, and progression-free survival [72].
6.5.2 Clinical Sequencing
Clinical sequencing has been used in lung cancer to design personalized treatment strategies for patients. Mutations of EGFR and ALK have been widely used in screening for administration of EGFR-TKI and crizotinib, respectively. WES or targeted sequencing of hundreds of mutations using NGS is rapidly becoming a common method of clinical sequencing. Actionable mutations are used not only as targets of molecular therapy but also as markers to achieve better stratification for clinical trials. In the USA, extensive clinical trials (master protocol) using >1000 squamous cell lung cancers have begun [73]. The “Foundation One” platform is an NGS-based mutational screening test used to determine eligibility for five clinical trials, including FGFR-TKI and anti-PDL1 antibody treatment.