The Cancer Genome





Yardena Samuels

Alberto Bardelli

Carlos López-Otín



There is a broad consensus that cancer is, in essence, a genetic disease, and that accumulation of molecular alterations in the genome of somatic cells is the basis of cancer progression (Fig. 1.1).1 In the past 5 years, the availability of the human genome sequence and progress in DNA sequencing technologies have dramatically improved knowledge of this disease. These new insights are transforming the field of oncology at multiple levels:



  • The genomic maps are redesigning the tumor taxonomy by moving it from a histologic- to a genetic-based level.


  • The success of cancer drugs designed to target the molecular alterations underlying tumorigenesis has proven that somatic genetic alterations are legitimate targets for therapy.


  • Tumor genotyping is helping clinicians to individualize treatments by matching patients with the best treatment for their tumors.


  • Tumor-specific DNA alterations represent highly sensitive biomarkers for disease detection and monitoring.


  • Finally, the ongoing analyses of multiple cancer genomes will identify additional targets, whose pharmacological exploitation will undoubtedly result in new therapeutic approaches.

This chapter will review the progress that has been made in understanding the genetic basis of sporadic cancers. The topic of familial cancer is covered in Chapter 12. The emphasis of this chapter is an introduction to novel integrated genomic approaches that allow a comprehensive and systematic evaluation of genetic alterations that occur during the progression of cancer. Using these powerful tools, cancer research, diagnosis, and treatment are poised for a transformation in the next decade.


CANCER GENES AND THEIR MUTATIONS

Cancer genes are broadly grouped into oncogenes and tumor suppressor genes. Using a classical analogy, oncogenes can be likened to a car's accelerator, so that a mutation in an oncogene is the equivalent of having the accelerator continuously pressed.2 Tumor suppressor genes, in contrast, act as “brakes,”2 so that when they are not mutated they function to inhibit tumorigenesis. Oncogenes and tumor suppressor genes may be classified by the nature of their somatic mutations in tumors.1 Mutations in oncogenes typically occur at specific hotspots, often affecting the same codon or clustering at neighboring codons in different tumors. Furthermore, mutations in oncogenes are almost always missense, and they usually affect only one allele, making them heterozygous. In contrast, tumor suppressor genes are usually mutated throughout the gene; a large number of the mutations truncate the encoded protein, and they generally affect both alleles, causing loss of heterozygosity. Major types of somatic mutations present in malignant tumors include nucleotide substitutions, small insertions and deletions (indels), chromosomal rearrangements, and copy number alterations (further described in Chapter 2).


IDENTIFICATION OF CANCER GENES

The completion of the Human Genome Project has marked a new era in biomedical sciences.3 Knowledge of the sequence and organization of the human genome allows the systematic analysis of the genetic alterations underlying the origin and evolution of tumors. Before elucidation of the human genome, several cancer genes, such as KRAS, TP53, and APC, were successfully discovered using approaches based on oncovirus analysis, linkage studies, loss of heterozygosity, and cytogenetics.4,5 The completion of the Human Genome Project in 2004,3 which provided a sequence-based map of the normal human genome, together with the construction of the HapMap, which catalogs single nucleotide polymorphisms (SNPs) and the underlying genomic structure of natural human genomic variation,6,7 allowed an extraordinary throughput in cataloging somatic mutations in cancer. These projects now offer an unprecedented opportunity: the identification of all the genetic changes associated with a human cancer. This ambitious goal is for the first time within reach of the scientific community. Already a number of studies have demonstrated the usefulness of strategies aimed at the systematic identification of somatic mutations associated with cancer progression. Notably, the Human Genome Project, the HapMap project, and the candidate and family gene approaches described below all utilized capillary-based DNA sequencing (first-generation sequencing, also known as Sanger sequencing).8 Figure 1.2 illustrates the developments in the search for cancer genes, its increased pace, and the most relevant findings in this field.







FIGURE 1.1 Schematic representation of the genomic and histopathological steps associated with tumor progression: from the occurrence of the initiating mutation in the founder cell to metastasis formation. It has been convincingly shown that the genomic landscape of solid tumors such as pancreatic and colorectal cancers requires the accumulation of many genetic events, a process that takes decades to complete. This timeline offers an incredible window of opportunity for the early detection of this disease, which is often associated with an excellent prognosis.






FIGURE 1.2 Timeline of seminal hypotheses, research discoveries, and research initiatives that have led to an improved understanding of the genetic etiology of human tumorigenesis within the past century. The consensus cancer gene data were obtained from the Wellcome Trust Sanger Institute Cancer Genome Project website (http://www.sanger.ac.uk/genetics/CGP). Redrawn from ref. 80.



CANCER GENOME INVESTIGATION: TOOLS AND QUALITY CONTROLS

In order to perform mutational analysis of cancer genomes it is imperative to acquire high-quality reagents and to perform several quality controls to verify that the derived data are reliable. To detect somatic (i.e., tumor-specific) mutations in cancer both the tumor DNA and the germline DNA from the same individual are required, especially because knowledge of the variations in the normal human genome is as yet incomplete. Normal genomic DNA from the same individual may be derived either from blood or from tumor neighboring tissue in cases where solid tumors are investigated.
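The tumor-versus-germline comparison described above can be sketched as a simple set subtraction. This is a minimal illustration only, not a variant-calling pipeline: the `Variant` record, the `somatic_variants` helper, and the toy coordinates are all hypothetical, and real analyses must also account for sequencing depth and error in both samples.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position (toy values below, not real coordinates)
    ref: str   # reference allele
    alt: str   # observed allele


def somatic_variants(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Return variants seen in the tumor but absent from the matched normal."""
    return tumor - normal


# Hypothetical calls from one patient, for illustration only.
tumor_calls = {
    Variant("chr7", 1000, "A", "T"),    # also in normal DNA: germline variant
    Variant("chr12", 2000, "G", "A"),   # tumor only: candidate somatic mutation
}
normal_calls = {Variant("chr7", 1000, "A", "T")}

candidates = somatic_variants(tumor_calls, normal_calls)
print(candidates)
```

Anything shared with the matched normal sample is treated as inherited variation, which is why the normal DNA must come from the same individual.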

A cancer sample (of either bioptic or surgical origin) typically contains both malignant and nonmalignant (stromal) cells. Most genomic analyses require that samples are highly enriched for tumor tissue. Such samples can be generated by deriving early passage tumor cell lines or mouse xenografts, or through pathologist-guided selective macro- or microdissection of neoplastic tissue. This allows the isolation of tumor-derived genomic DNA and sensitive detection of somatic mutations that would otherwise be masked by contamination with normal tissue. Importantly, the quality of the derived genomic DNA may be affected by its source. Surgical resection specimens are usually large and therefore appropriate for these studies. However, biopsies from patients usually contain few cells, thus reducing the quantity of genomic DNA available. Although whole-genome amplification may be a possibility when only small amounts of genomic DNA are available, this method can give rise to artifactual genetic alterations.9 Another factor that negatively affects the quality of genomic DNA is that cancer samples (for example, liver metastases) often contain significant numbers of necrotic or apoptotic cells. These issues might also be resolved by increased genetic coverage utilizing second-generation sequencing approaches,10 as detailed below.

Prior to genomic analysis, multiple key quality controls should be applied to the tumor and normal tissues. These include verification that the tumor sample contains at least 75% cancer cells, a threshold that allows the identification of homozygous and hemizygous deletions, copy-neutral loss of heterozygosity, duplication, and amplification.11,12,13 To unequivocally assess the somatic tumor-specific nature of sequence changes, genotyping of SNPs in the tumor and normal tissue is also required to prove that both are derived from the same individual.
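The effect of normal-cell contamination on mutation detection can be made concrete with a small calculation. The sketch below is an idealized model, not a method taken from the cited studies: the function name `expected_vaf` and its parameters are hypothetical, and the model assumes that every tumor cell carries the mutation and that normal cells are diploid.

```python
def expected_vaf(purity: float, mutant_copies: int = 1,
                 tumor_copies: int = 2, normal_copies: int = 2) -> float:
    """Expected fraction of sequenced alleles that carry a somatic mutation.

    purity        -- fraction of cells in the sample that are cancer cells
    mutant_copies -- mutated copies of the locus per tumor cell
    tumor_copies  -- total copies of the locus per tumor cell
    normal_copies -- copies of the locus per normal cell (assumed diploid)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mutant_alleles = purity * mutant_copies
    total_alleles = purity * tumor_copies + (1.0 - purity) * normal_copies
    if total_alleles == 0:
        raise ValueError("no copies of the locus in the sample")
    return mutant_alleles / total_alleles


# A heterozygous mutation at several purities, including the 75% threshold.
for purity in (1.0, 0.75, 0.2):
    print(purity, expected_vaf(purity))
```

Under these assumptions a heterozygous mutation appears in half of the alleles of a pure tumor, but in a much smaller fraction of a heavily contaminated sample, where it is easily lost in the background signal.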



Cancer Gene Discovery by Sequencing Candidate Gene Families

The availability of the human genome sequence provides new opportunities to comprehensively search for somatic mutations in cancer on a larger scale than previously possible. Progress in the field has been closely linked to improvements in the throughput of DNA analysis and the continuous reduction in sequencing costs. Below some of the achievements in this research area are described, as well as how they affected knowledge of the cancer genome.

A seminal work in the field was the systematic mutational profiling of the genes involved in the RAF-RAS pathway in multiple tumors. This candidate gene approach led to the discovery that BRAF is frequently mutated in melanomas and is mutated at a lower frequency in other tumor types.14 Follow-up studies quickly revealed that mutations in BRAF are mutually exclusive with alterations in KRAS,14,15 genetically emphasizing that these genes function in the same pathway, a concept that had been previously demonstrated in lower organisms such as Caenorhabditis elegans and Drosophila melanogaster.16,17
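One common way to quantify mutual exclusivity between two genes is a one-sided Fisher's exact test on a 2 × 2 table of tumors. The sketch below illustrates the idea and is not the analysis performed in the cited studies; the function name and the tumor counts are hypothetical.

```python
from math import comb


def mutual_exclusivity_p(both: int, only_a: int, only_b: int, neither: int) -> float:
    """One-sided Fisher's exact test for depletion of co-occurring mutations.

    Returns the probability of observing `both` or fewer double-mutant tumors
    if mutations in gene A and gene B occurred independently.
    """
    a_mutated = both + only_a        # tumors with gene A mutated
    a_wildtype = only_b + neither    # tumors with gene A wild type
    b_mutated = both + only_b        # tumors with gene B mutated
    total = a_mutated + a_wildtype
    lowest = max(0, b_mutated - a_wildtype)  # smallest feasible overlap
    favorable = sum(
        comb(a_mutated, k) * comb(a_wildtype, b_mutated - k)
        for k in range(lowest, both + 1)
    )
    return favorable / comb(total, b_mutated)


# Hypothetical cohort of 100 tumors: 20 mutated in gene A only,
# 30 mutated in gene B only, none mutated in both.
p_exclusive = mutual_exclusivity_p(both=0, only_a=20, only_b=30, neither=50)
print(p_exclusive)
```

A small probability indicates that double-mutant tumors are rarer than chance would predict, the pattern expected when two genes act in the same pathway and a second hit adds no selective advantage.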

In 2003, identification of cancer genes shifted from a candidate gene approach to the mutational analyses of gene families. The first gene families to be completely sequenced were those that involved protein18,19 and lipid phosphorylation.20 The rationale for focusing initially on these gene families was threefold:



  • The corresponding proteins were already known at that time to play a pivotal role in signaling and proliferation of normal and cancerous cells.


  • Multiple members of the protein kinases family had already been linked to tumorigenesis.


  • Kinases are clearly amenable to pharmacological inhibition, making them attractive drug targets.

The mutational analysis of all the tyrosine kinase domains in colorectal cancers revealed that 30% of cases had a mutation in at least one tyrosine kinase gene, and overall mutations were identified in eight different kinases, most of which had not previously been linked to cancer.18 An additional mutational analysis of the coding exons of 518 protein kinase genes in 210 diverse human cancers, including breast, lung, gastric, ovarian, renal, and acute lymphoblastic leukemia, identified approximately 120 mutated genes that probably contribute to oncogenesis.19 A more recent interrogation of somatic mutations in the protein tyrosine kinases in cutaneous melanoma found ERBB4 to be mutated in 19% of cases, making it the most highly mutated protein tyrosine kinase in melanoma.21 ERBB4 is a member of the ERBB/HER family of receptor tyrosine kinases. Other family members, including ERBB1 (EGFR) and ERBB2 (HER-2), have been implicated by mutations or amplifications in a number of cancers, including lung, colon, and breast cancers. The high mutation frequency, together with a nonsynonymous (NS) to synonymous (S) ratio of 24:3, which is significantly higher than the NS:S ratio predicted for non-selected mutations (P < .01),22 indicated that ERBB4 mutations are selected for during tumorigenesis and therefore contribute to melanoma development.
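The reasoning behind the NS:S comparison can be sketched as a one-sided binomial test: if mutations were not under selection, each would be nonsynonymous with some baseline probability, and an excess of nonsynonymous changes becomes unlikely. The counts 24 and 3 come from the text; the neutral expectation of 2:1 used below is an assumption chosen for illustration, not the value or the method used in the cited study.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """Probability of at least `successes` successes in `trials` Bernoulli(p) draws."""
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


nonsynonymous, synonymous = 24, 3   # ERBB4 counts quoted in the text

# ASSUMPTION: under neutrality two of every three coding mutations are
# nonsynonymous (NS:S = 2:1). This is a placeholder, not a sourced value.
neutral_ns_fraction = 2.0 / 3.0

p_value = binomial_upper_tail(
    nonsynonymous, nonsynonymous + synonymous, neutral_ns_fraction
)
print(p_value)
```

The result depends strongly on the assumed neutral ratio, so the baseline must be estimated carefully for the genes and mutation spectrum under study.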

As kinase activity is attenuated by phosphatases, enzymes that remove phosphate groups, the rational next step in these studies was to perform a mutation analysis of the protein tyrosine phosphatases. Mutational investigation of this family in colorectal cancer showed that 25% of cases had mutations in six different phosphatase genes (PTPRF, PTPRG, PTPRT, PTPN3, PTPN13, or PTPN14).23 Combined analysis of the protein tyrosine kinases and the protein tyrosine phosphatases showed that 50% of colorectal cancers had mutations in a tyrosine kinase gene, a protein tyrosine phosphatase gene, or both, further emphasizing the pivotal role of protein phosphorylation in neoplastic progression. Many of the identified genes had previously been linked to human cancer, thus validating the unbiased comprehensive mutation profiling. These landmark studies led to additional gene family surveys.

The phosphatidylinositol 3-kinase (PI3K) gene family, which also plays a role in proliferation, adhesion, survival, and motility, was also comprehensively investigated.24 Sequencing of the exons encoding the kinase domain of all 16 members belonging to this family pinpointed PIK3CA as the only gene to harbor somatic mutations. When the entire coding region was analyzed, PIK3CA was found to be somatically mutated in 32% of colorectal cancers. At that time, the PIK3CA gene was certainly not a newcomer in the cancer arena, as it had previously been shown to be involved in cell transformation and metastasis.24 Strikingly, its staggeringly high mutation frequency was discovered only through systematic sequencing of the corresponding gene family.20 Subsequent analysis of PIK3CA identified somatic mutations in additional cancer types, including 36% of hepatocellular carcinomas, 36% of endometrial carcinomas, 25% of breast carcinomas, 15% of anaplastic oligodendrogliomas, 5% of medulloblastomas and anaplastic astrocytomas, and 27% of glioblastomas.25,26,27,28,29 PIK3CA is now known to be one of the two most commonly mutated oncogenes in human cancers, the other being KRAS. Further investigation of the PI3K pathway in colorectal cancer showed that 40% of tumors had genetic alterations in one of the PI3K pathway genes, emphasizing the central role of this pathway in colorectal cancer pathogenesis.30 The relevance and the functional role of the PI3K pathway in tumorigenesis are further described in Chapter 5.

Although most cancer genome studies of large gene families have focused on the kinome, recent analyses have revealed that members of other families highly represented in the human genome are also a target of mutational events in cancer. This is the case of proteases, a complex group of enzymes consisting of at least 569 components that constitute the so-called human degradome.31 Proteases exhibit an elaborate interplay with kinases and have traditionally been associated with cancer progression because of their ability to degrade extracellular matrices, thus facilitating tumor invasion and metastasis.32,33 However, recent studies have shown that these enzymes hydrolyze a wide variety of substrates and influence many different steps of cancer, including early stages of tumor evolution.34 These functional studies have also revealed that beyond their initial recognition as prometastatic enzymes, they play dual roles in cancer, as assessed by the identification of a growing number of tumor-suppressive proteases.35

These findings emphasized the possibility that mutational activation or inactivation of protease genes occurs in cancer. The first clear evidence of this derived from systematic analysis of genetic alterations in breast and colorectal cancers, which revealed that proteases from different catalytic classes were candidate cancer genes that were somatically mutated in cancer.36 These results have prompted the mutational analysis of entire protease families such as MMPs (matrix metalloproteinases), ADAMs (a disintegrin and metalloproteinase), and ADAMTSs (ADAMs with thrombospondin domains) in different tumors. These studies led to the identification of protease genes frequently mutated in cancer, such as MMP8, which is mutated and functionally inactivated in 6.3% of human melanomas.37,38 Other MMP genes, including MMP2, MMP9, MMP14, and MMP27, are also somatically mutated in melanomas and other malignant tumors, albeit at low frequency.37,39 Systematic mutational analysis of all members of the ADAM family of membrane-bound metalloproteases has shown that ADAM7 and ADAM29 are also often mutated in melanoma, whereas parallel studies of the ADAMTS family have revealed that ADAMTS15 is mutated in colorectal carcinomas and ADAMTS18 and ADAMTS20 in melanomas.40,41 Functional analyses have indicated that ADAM7, ADAM29, and ADAMTS18 mutations affect adhesion of melanoma cells to specific extracellular matrix proteins and in some cases increase their migratory and invasive properties, suggesting that these mutated genes play a role in melanoma progression.41,42 In contrast, functional studies of ADAMTS15 mutations in colorectal cancer cells have revealed that this metalloprotease restrains tumor growth and invasion, further validating the concept that secreted proteases may have tumor-suppressor properties.40

The mutational status of caspases has also been extensively analyzed in different tumors, as these proteases play a fundamental role in the execution of apoptosis, one of the hallmarks of cancer.43 These studies demonstrated that CASP8 is deleted in neuroblastomas and inactivated by somatic mutations in a variety of human malignancies, including head and neck, colorectal, lung, and gastric carcinomas.44,45,46 Likewise, CASP3, CASP4, CASP5, CASP6, CASP7, CASP10, and CASP14 are occasionally inactivated by mutation in different human cancers.47,48,49,50,51,52,53,54 Another large protease family whose components are often mutated in cancer is that of the deubiquitylating enzymes (DUBs), which catalyze the removal of ubiquitin and ubiquitin-like modifiers from their target proteins.55 Some DUBs were initially identified as oncogenic proteins, but recent work has shown that other deubiquitylases such as CYLD, A20, and BAP1 are tumor suppressors inactivated in cancer. CYLD is mutated in patients with familial cylindromatosis, a disease characterized by the formation of multiple tumors of skin appendages.56 A20 is a DUB family member encoded by the TNFAIP3 gene, which is mutated in a large number of Hodgkin’s lymphomas and primary mediastinal B-cell lymphomas.57,58,59,60 Finally, the BAP1 gene, encoding a ubiquitin C-terminal hydrolase, has been found to be somatically mutated in 86% of metastasizing uveal melanomas.61


Mutational Analysis of Exomes Using Sanger Sequencing

Although the gene family approach for the identification of cancer genes has proven extremely valuable, it is still a candidate approach and thus biased in its nature. The next step forward in the mutational profiling of cancer has been the sequencing of the exome, which is the entire coding portion of the human genome (18,000 protein-encoding genes). To date, the exomes of breast, colorectal, pancreatic, and ovarian clear cell carcinomas, glioblastoma multiforme, and medulloblastoma have been analyzed using Sanger sequencing. These large-scale analyses allowed researchers for the first time to describe and understand the genetic complexity of human cancers.22,36,62,63,64,65 The declared goals of these exome studies were to establish methods for exome-wide mutational analyses in human tumors, to characterize the spectrum and quantity of somatic mutations, and, finally, to discover new genes involved in tumorigenesis as well as novel pathways that have a role in these tumors. In these studies, sequencing data were complemented with gene expression and copy number analyses, thus providing a comprehensive view of the genetic complexity of human tumors.62,63,64,65 A number of conclusions can be drawn from these analyses:



  • Cancer genomes have an average of 30 to 100 somatic alterations per tumor, which was a higher number than previously thought. Although the alterations included point mutations, small insertions, deletions, or amplifications, the great majority of the mutations observed were single-base substitutions.62,63


  • Even within a single cancer type, there is a significant intertumor heterogeneity. This means that multiple mutational patterns (encompassing different mutant genes) are present in tumors that cannot be distinguished based on histological analysis. The concept that individual tumors have a unique genetic milieu is highly relevant for personalized medicine, a concept that will be discussed below.


  • The spectrum and nucleotide contexts of mutations differ between different tumor types. For example, over 50% of mutations in colorectal cancer were C:G to T:A transitions, and 10% were C:G to G:C transversions. In contrast, in breast cancers, only 35% of the mutations were C:G to T:A transitions, and 29% were C:G to G:C transversions. Knowledge of mutation spectra is vital as it allows insight into the mechanisms underlying mutagenesis and repair in the various cancers investigated.


  • A considerably larger number of genes that had not been previously reported to be involved in cancer were found to play a role in the disease.


  • Solid tumors arising in children, such as medulloblastoma, harbor on average five to ten times fewer gene alterations than a typical adult solid tumor. These pediatric tumors also harbor fewer amplifications and homozygous deletions within coding genes compared to adult solid tumors.
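The mutation spectra mentioned in the list above, such as the proportion of C:G to T:A transitions, can be tabulated by collapsing each single-base substitution onto one of six strand-symmetric classes. The following is a minimal sketch; the function names and the toy input are hypothetical, and a real analysis would read substitutions from variant calls and also consider the flanking nucleotide context.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Map a substitution to a strand-symmetric class such as 'C:G>T:A'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Report the change from the pyrimidine strand, so that G>A and C>T
        # (the same event read from opposite strands) share one class.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def mutation_spectrum(subs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the fraction of substitutions that fall into each class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in subs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical substitutions, for illustration only.
toy = [("C", "T"), ("G", "A"), ("C", "G"), ("A", "G")]
spectrum = mutation_spectrum(toy)
print(spectrum)
```

Comparing such fractions between tumor types is what reveals differences like those described for colorectal and breast cancers.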

Importantly, to deal with the large amount of data generated in these genomic projects, it was necessary to develop new statistical and bioinformatic tools. Furthermore, examination of the overall distribution of the identified mutations allowed the development of a novel view of cancer genome landscapes and a novel definition of cancer genes. These new concepts in the understanding of cancer genetics are further discussed below. The compiled conclusions derived from these analyses have led to a paradigm shift in the understanding of cancer genetics.

A clear indication of the power of the unbiased nature of the whole exome surveys was the discovery of recurrent mutations in the active site of IDH1, a gene with no previously known link to gliomas, in 12% of tumors analyzed.63 As malignant gliomas are the most common and lethal tumors of the central nervous system, and glioblastoma multiforme (GBM; World Health Organization grade IV astrocytoma) is the most biologically aggressive subtype, the unveiling of IDH1 as a novel GBM gene is extremely significant. Importantly, mutations of IDH1 predominantly occurred in younger patients (median age of 34 versus 56 years for anaplastic astrocytomas and 32 versus 59 years for GBMs) and were associated with a better prognosis: patients with IDH mutations have a median overall survival of 31 months, whereas patients with wild type IDH1 and IDH2 have a median survival of 15 months.66 Follow-up studies showed that mutations of IDH1 occur early in glioma progression; the R132 somatic mutation is harbored by the majority (greater than 70%) of grades II and III astrocytomas and oligodendrogliomas, as well as by secondary GBMs that develop from these lower grade lesions.66,67,68,69,70,71,72 In contrast, less than 10% of primary GBMs harbor these alterations. Furthermore, analysis of the related IDH2 gene revealed recurrent somatic mutations in the R172 residue, which is the exact analog of the frequently mutated R132 residue of IDH1. These mutations occur mostly in a mutually exclusive manner with IDH1 mutations,66,68 suggesting that they have equivalent phenotypic effects. Subsequently, IDH1 mutations have been reported in additional cancer types, including myeloid leukemia samples,73,74,75 a single case of colorectal cancer, two prostate carcinomas,71 one melanoma case,76 and a few cases of adult supratentorial primitive neuroectodermal tumors.69 Further description of the function of IDH1 and IDH2 mutations in cancer is found in Chapter 8.


Next-Generation Sequencing and Cancer Genome Analysis

The introduction in 1977 of the Sanger method for DNA sequencing with chain-terminating inhibitors transformed biomedical research.8 Over the past 30 years, this first-generation technology has been universally used for elucidating the nucleotide sequence of DNA molecules. However, the launching of new large-scale projects, including those involving whole-genome sequencing of cancer samples, has made necessary the development of new methods that are widely known as next-generation sequencing technologies.77,78,79 These approaches have significantly lowered the cost and the time required to determine the sequence of the 3 × 10^9 nucleotides present in the human genome. Moreover, they have a series of advantages over Sanger sequencing, which are of special interest for the analysis of cancer genomes.80 First, next-generation sequencing approaches are more sensitive than Sanger methods and can detect somatic mutations even when they are present only in a subset of tumor cells.81 Moreover, these new sequencing strategies are quantitative and can be used to simultaneously determine both nucleotide sequence and copy number variations.82 They can also be coupled to other procedures such as those involving paired-end reads, allowing the identification of multiple structural alterations, such as insertions, deletions, and rearrangements, commonly occurring in cancer genomes.81 Nonetheless, next-generation sequencing still presents some limitations, mainly derived from the relatively high error rate in the short reads generated during the sequencing process. In addition, these short reads make the task of de novo assembly of the generated sequences and the mapping of the reads to a reference genome extremely complex. To overcome some of these current limitations, deep coverage of each analyzed genome is required and a careful validation of the identified variants must be performed, typically using Sanger sequencing. As a consequence, there is a substantial increase in both the cost of the process and the time of analysis. Therefore, it can be concluded that whole-genome sequencing of cancer samples is already a feasible task but not yet a routine process. Further technical improvements will be required before the task of decoding the entire genome of any malignant tumor of any cancer patient can be applied to clinical practice.
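The notion of deep coverage can be illustrated with two back-of-the-envelope calculations: the mean coverage obtained from a given number of reads, and the chance of sampling a subclonal mutation at a given depth. This is a simplified sketch that assumes reads fall uniformly across the genome and ignores sequencing errors; the function names, the 30-fold target, and the three-read detection threshold are illustrative assumptions, whereas the genome size is the figure quoted above.

```python
from math import ceil, comb

GENOME_SIZE = 3_000_000_000  # about 3 x 10^9 nucleotides, as quoted in the text


def mean_coverage(n_reads: int, read_length: int,
                  genome_size: int = GENOME_SIZE) -> float:
    """Average number of reads covering each base: reads x length / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int,
                 genome_size: int = GENOME_SIZE) -> int:
    """Number of reads required to reach a target mean coverage."""
    return ceil(target_coverage * genome_size / read_length)


def detection_probability(depth: int, allele_fraction: float,
                          min_variant_reads: int) -> float:
    """Chance of seeing at least `min_variant_reads` mutant reads at a site.

    Assumes each read independently carries the mutation with probability
    `allele_fraction` (binomial sampling, sequencing errors ignored).
    """
    p_too_few = sum(
        comb(depth, k) * allele_fraction**k * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min_variant_reads)
    )
    return 1.0 - p_too_few


# Assumed scenario: 100-base reads and a 30-fold coverage target.
print(reads_needed(30, 100))
# A mutation present in 5% of alleles, requiring 3 supporting reads.
for depth in (30, 100):
    print(depth, detection_probability(depth, 0.05, 3))
```

Under this model, mutations present in only a small fraction of cells are sampled far more reliably at higher depth, which is one reason deep coverage adds to cost and analysis time.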

The number of next-generation sequencing platforms has grown substantially over the past few years and currently includes technologies from Roche/454, Illumina/Solexa, Life/APG’s SOLiD3, Helicos BioSciences/HeliScope, and Pacific Biosciences/PacBio RS.79 Also noteworthy is the recent introduction of the Polonator G.007 instrument, an open source platform with freely available software and protocols, and of the Ion Torrent semiconductor sequencer, as well as platforms involving self-assembling DNA nanoballs or nanopore technologies.83,84,85 These new machines are driving the field toward the era of third-generation sequencing, which is of enormous clinical interest as it can substantially increase the speed and accuracy of analysis at reduced cost and facilitate single-molecule sequencing of human genomes. A comparison of next-generation sequencing platforms is shown in Table 1.1. These various platforms differ in the method utilized for template preparation and in the nucleotide sequencing and imaging strategy, which ultimately result in their different performance. The most suitable approach depends on the specific genome sequencing project.79

Current methods of template preparation first involve randomly shearing genomic DNA into smaller fragments, from which a library of either fragment templates or mate-pair templates is generated. Then, clonally amplified templates from single DNA molecules are prepared by either emulsion polymerase chain reaction (PCR) or solid-phase amplification.86,87 Alternatively, it is possible to prepare single-molecule templates through methods that require less starting material and do not involve PCR amplification reactions, which can be a source of artifactual mutations.88 Once prepared, templates are attached to a solid surface in spatially separated sites, allowing thousands to billions of nucleotide sequencing reactions to be performed simultaneously.

The sequencing methods currently used by the different next-generation sequencing platforms are diverse and have been classified into four groups: cyclic reversible termination, single-nucleotide addition, real-time sequencing, and sequencing by ligation79,89 (Fig. 1.3). These sequencing strategies are coupled with different imaging methods, including those based on measuring bioluminescent signals or involving four-color imaging of single molecular events. Finally, the extraordinary amount of data released from these nucleotide sequencing platforms is stored, assembled, and analyzed using powerful bioinformatic tools that have been developed in parallel with next-generation sequencing technologies.90









TABLE 1.1 COMPARATIVE ANALYSIS OF NEXT-GENERATION SEQUENCING PLATFORMS

| Platform | Library/Template Preparation | Sequencing Method | Average Read Length (Bases) | Run Time (Days) | Gb per Run | Instrument Cost (US$) | Comments |
|---|---|---|---|---|---|---|---|
| Roche 454 GS FLX | Fragment, mate-pair; emulsion PCR | Pyrosequencing | 400 | 0.35 | 0.45 | 500,000 | Fast run times; high reagent cost |
| Illumina HiSeq2000 | Fragment, mate-pair; solid-phase | Reversible terminator | 100-125 | 8 (mate-pair run) | 150-200 | 540,000 | Most widely used platform; low multiplexing capability |
| Life/APG’s SOLiD 5500xl | Fragment, mate-pair; emulsion PCR | Cleavable probe, sequencing by ligation | 35-75 | 7 (mate-pair run) | 180-300 | 595,000 | Inherent error correction; long run times |
| Helicos BioSciences HeliScope | Fragment, mate-pair; single molecule | Reversible terminator | 32 | 8 (fragment run) | 37 | 999,000 | Non-bias template representation; expensive, high error rates |
| Pacific | Fragment | Real-time sequencing | 1,000 | 1 | 0.075 | NA | |