Molecular Tools in Cancer Research





Mauro W. Costa and Nadia Rosenthal




Introduction


Since the last edition of this book was published, advances in our understanding of the basic mechanisms of cancer have continued to inform and refine clinical approaches to prevention and therapy. New prognostic and predictive markers derived from molecular biology can now pinpoint specific genetic changes in particular tumors or detect occult malignant cells in normal tissues, leading to improved technologies for tumor screening and early detection. Diagnostic approaches have expanded from morphologic criteria and single gene analysis to whole genome technologies imported from other biological disciplines. A new systemic vision of cancer is emerging in which the importance of individual mutation has been superseded by an appreciation for higher order organization that is disrupted by complex interactions of disease-associated factors and gene-environmental parameters that affect tumor cell behavior. Results from these cross-disciplinary investigations underscore the complexity of carcinogenesis and have profoundly influenced the design of strategies for both cancer prevention and advanced cancer therapy.


This overview will serve as a foundation of conceptual and technical information for understanding the exciting new advances in cancer research described in subsequent chapters. Since the discovery of oncogenes, which provided the first concrete evidence of the genetic basis of cancer, applications of advanced molecular techniques and instrumentation have yielded new insights into normal cell biology as well. A basic fluency in molecular biology has become a necessary prerequisite for clinical oncologists, because many of the new diagnostic and prognostic tools currently in use rely on these fundamental principles of gene, protein, and cell function.



Our Unstable Heredity


Cancer genetics classically has relied on the candidate-gene approach, which entails detecting acquired or inherited changes in specific genetic loci accumulated in a single cell; the cell then proliferates to produce a tumor composed of the cell’s identical clonal progeny. During the early steps of tumor formation, mutations that lead to an intrinsic genetic instability allow additional deleterious genetic alterations to accumulate. These genetic changes confer selective advantages on tumor cell clones by disrupting control of cell proliferation. The identification of specific mutations that characterize a tumor cell has proved invaluable for analyzing the neoplastic progression and remission of the disease. The emergence of cancer cells is a by-product of the necessity for continuous cell division and DNA replication to maintain organ functionality throughout the life cycle.


The highly heterogeneous nature of tumors, each of which is composed of multiple cell types, has led to the formulation of the “cancer stem cell” hypothesis, which posits that only a subpopulation of cancer cells is able to maintain self-renewal, unlimited growth, and the capacity for differentiation into other more specialized cancer cell types. Cancer stem cells display bona fide stem cell markers, in contrast with the other cancer cells present in the tumor, which do not have tumorigenic potential. In fact, fewer than 1 in 10,000 cells that are present in persons with acute myeloid leukemia are capable of reinitiating a new tumor when transplanted into animals. Cancer stem cells have been identified in many solid tumors in the brain, colon, ovaries, prostate, and pancreas, which suggests that more effective cancer therapies would target these self-renewing cells rather than the tumor as a whole. The cancer stem cell concept differs from the original clonal evolution hypothesis, which states that every cell in a tumor mass is capable of self-renewal and differentiation. It suggests instead that detecting and targeting the subtle genetic and epigenetic differences that distinguish cancer stem cells may provide a more effective avenue to intervention in disease progression.



Detecting Cancer Mutations


All methods for detecting mutations rely on the manipulation of DNA, the basic building block of heredity in the cell. DNA consists of two long strands of polynucleotides that twist around each other clockwise in a double helix (Fig. 1-1). Nucleic acid bases attached to the sugar groups of each strand face each other within the helix, perpendicular to its axis. Only four bases exist: the purines adenine and guanine (A and G) and the pyrimidines cytosine and thymine (C and T). During assembly of the double helix, stable pairings of nucleotides from either strand are made between A and T or between G and C. Each base pair (bp) forms one of the billions of rungs in the long, unbroken ladder of DNA that forms a chromosome.
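The pairing rule described above (A with T, G with C) can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical, chosen only for illustration. The sketch also assumes the standard convention, not stated in the text, that the two strands run in opposite directions, so the partner strand is read as the reverse complement.

```python
# Watson-Crick pairing: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'-to-3' direction.

    Assumes the strands are antiparallel (a standard convention that is
    not spelled out in the surrounding text).
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    example = "AAGCTTGC"  # hypothetical sequence, for illustration only
    print(complement(example))          # TTCGAACG
    print(reverse_complement(example))  # GCAAGCTT
```

Because each base determines its partner, either strand is sufficient to reconstruct the other, which is the property that replication and repair depend on.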



The functional unit of inherited information in DNA, the gene, usually is represented by a discrete section of sequence necessary to encode a particular protein structure. Gene expression is initiated by forming a copy of the gene in the form of messenger RNA (mRNA); the mRNA is constructed base by base from the DNA template by a polymerase enzyme. Once transcribed, an mRNA transcript is modified and the processed product is transported out of the nucleus. In the cytoplasm, proteins are then synthesized, or translated, in macromolecular complexes called ribosomes that read the mRNA sequence and convert the nucleic acid code, based on three-base segments or codons, into a 20 amino acid code to form the corresponding protein.
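The flow from gene to mRNA to protein can be sketched in a few lines of Python. This is a simplified illustration, not a model of the cellular machinery: it uses the standard genetic code, ignores introns and mRNA processing, and assumes translation starts at the first AUG codon. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCAAGTAA"    # hypothetical coding sequence
    mrna = transcribe(gene)
    print(mrna)              # AUGGCCAAGUAA
    print(translate(mrna))   # MAK
```

The table maps 64 codons onto 20 amino acids plus stop signals, so several codons encode the same amino acid.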



Generating Diversity with Alternate Splicing


In higher organisms, most protein-coding gene sequences are interrupted by stretches of noncoding DNA sequences, called introns. In the nucleus, these introns are removed after mRNA transcription to produce a continuous chain of coding sequences, or exons, which subsequently undergo translation into protein. The splicing process requires absolute precision, because the deletion or addition of a single nucleotide at the splice junction would throw the three-base coding sequence out of frame or lead to exon skipping or addition, thus creating abnormal proteins.
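To see why a single-nucleotide error at a splice junction is so damaging, the following Python sketch joins two hypothetical exons and splits the result into codons. The exon sequences and function names are invented for illustration. Losing one nucleotide at the junction changes every codon downstream of it.

```python
def splice(exons: list[str]) -> str:
    """Join exons into one continuous coding sequence."""
    return "".join(exons)


def codons(sequence: str) -> list[str]:
    """Split a sequence into complete three-base codons from position 0."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    exon1 = "ATGGCC"      # hypothetical exons
    exon2 = "AAGGTGTAA"

    correct = splice([exon1, exon2])
    # Imprecise splicing: the first nucleotide of exon 2 is lost.
    shifted = splice([exon1, exon2[1:]])

    print(codons(correct))  # ['ATG', 'GCC', 'AAG', 'GTG', 'TAA']
    print(codons(shifted))  # ['ATG', 'GCC', 'AGG', 'TGT']
```

In the shifted transcript the original stop codon TAA is no longer read in frame, so translation would continue into whatever sequence follows.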


The dramatic increase in genetic complexity conferred by alternate RNA splicing is underscored by the multiple splice patterns of many medically relevant genes, in which different combinations of exons are chosen for the final mRNA transcript, such that one gene can encode many different proteins (Fig. 1-2). The choice of protein isoform to be expressed from a gene with multiple splicing possibilities is a decision that can be perturbed in disease. To date, errors in splicing mechanisms have been associated with a large group of cancers. These errors arise from mutations in genes encoding several transcription factors, cell signaling proteins, and membrane proteins. Examples include the tumor suppressor p53 in more than 12 different types of cancer and the mutL homolog 1 protein in hereditary nonpolyposis colorectal cancer. When mutations in the splicing site lead to insertion of novel sequences in the mRNA, the encoded protein can be used as a potential clinical marker, as seen for the transcription factor NRSF in persons with small cell lung cancer. Because of their unique expression in cancer cells, these markers can be further explored as new cancer-specific therapeutic targets.
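The combinatorial effect of exon choice can be illustrated with a deliberately simplified Python sketch. It assumes a hypothetical gene in which the first and last exons are always included and each internal "cassette" exon is independently kept or skipped. Real splicing is regulated and far less free than this, so the sketch only shows how quickly the number of possible transcripts grows.

```python
from itertools import combinations


def isoforms(first: str, cassette: list[str], last: str) -> list[list[str]]:
    """Enumerate exon combinations, keeping the exons in genomic order.

    Simplifying assumption: every cassette exon is independently optional.
    """
    results = []
    for kept in range(len(cassette) + 1):
        for chosen in combinations(cassette, kept):
            results.append([first, *chosen, last])
    return results


if __name__ == "__main__":
    # Hypothetical four-exon gene with two optional internal exons.
    for transcript in isoforms("E1", ["E2", "E3"], "E4"):
        print("-".join(transcript))
    # E1-E4
    # E1-E2-E4
    # E1-E3-E4
    # E1-E2-E3-E4
```

Under this assumption, a gene with n optional exons yields 2^n possible transcripts from a single stretch of DNA.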




The Genomics of Cancer


The complete set of DNA sequences carried on all the chromosomes is known as the genome. Although the general map of the genome is shared by all members of a species, the recent sequencing of thousands of individual human genomes has given rise to the new field of genomics, providing us with new tools to reveal the more subtle variations that arise between individuals. These variations are critical, both as a natural engine driving heterogeneity within a species and as a source of predisposition to cancer types. The most common forms of human genetic variations, or alleles, arise as single-nucleotide polymorphisms, or SNPs. Because these allelic dissimilarities are abundant, inherited, and dispersed throughout the genome, SNPs can be used to track racial diversity, personal traits, and susceptibility to common forms of cancer (Fig. 1-3).



How do SNPs arise between individuals? One source of variation in DNA sequence derives from deviations in the strict base-pairing rule underlying the structure, storage, retrieval, and transfer of genetic information. The duplicated genetic information in the two strands of DNA not only permits the repair of a damaged coding sequence but also forms the basis for the replication of DNA. During cell division, polymerase enzymes unwind the DNA strands and copy them, using the base sequences as a template for constructing a new helix so that the dividing cell passes its entire genetic content on to its progeny. Errors in this process are rare, and person-to-person differences constitute only about 0.1% of the human genome. SNPs are inherited if they occur in the germline. Many genetically inherited variations occur in regions that neither encode protein nor alter the regulation of nearby genes. Given the disruptive effects that even subtle genetic changes may have on cell function, it is important to distinguish SNPs that represent true mutations from benign polymorphisms.
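As a minimal illustration of what a SNP is at the sequence level, the following Python sketch compares two already-aligned sequences of equal length and reports the positions where a single base differs. The sequences and function names are hypothetical, and the sketch ignores insertions, deletions, and the alignment step that real genotyping pipelines require.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) per mismatch.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base)
        in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


def fraction_different(reference: str, sample: str) -> float:
    """Fraction of positions at which the two sequences differ."""
    return len(find_snps(reference, sample)) / len(reference)


if __name__ == "__main__":
    reference = "ATGGCCAAGT"  # hypothetical sequences
    sample = "ATGGCTAAGT"
    print(find_snps(reference, sample))           # [(6, 'C', 'T')]
    print(fraction_different(reference, sample))  # 0.1
```

The 10% difference in this toy example is far higher than the roughly 0.1% person-to-person difference cited above; the short sequences are only meant to make the single mismatch visible.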


Our ability to monitor hundreds of thousands of SNPs simultaneously is one of the most important advances in modern medical genetics. Relatively simple genotyping technologies for SNP detection rely largely on the polymerase chain reaction (PCR). In this procedure, two chemically synthesized single-stranded DNA fragments, or primers, are designed to match chromosomal DNA sequences flanking the segment in which an SNP is positioned. With the addition of nucleotide building blocks and a heat-stable DNA polymerase, the primer pair initiates synthesis of new DNA strands using the chromosomal material as a template; the amplified segment is known as the amplicon. Each successive copying cycle, initiated by “melting” the resulting double-stranded products with heat, doubles the number of DNA segments in the reaction (Fig. 1-4). The technique is exceptionally sensitive; millions of identical DNA copies can be generated in a matter of hours with PCR using a single DNA molecule as the starting material.
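The doubling described above means that the number of copies grows as 2 raised to the number of cycles. The Python sketch below works through that arithmetic. The `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling, as in the text); real reactions fall short of it and eventually plateau, which this sketch does not model.

```python
def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after a number of cycles; efficiency 1.0 is perfect doubling.

    The efficiency parameter is an illustrative assumption, not taken
    from the text.
    """
    return start_copies * (1 + efficiency) ** cycles


def cycles_to_reach(target_copies: int, start_copies: int = 1) -> int:
    """Smallest number of perfect doubling cycles that reaches the target."""
    cycles = 0
    copies = start_copies
    while copies < target_copies:
        copies *= 2
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(pcr_copies(1, 30))           # 1073741824.0
    print(cycles_to_reach(1_000_000))  # 20
```

Starting from a single molecule, 20 ideal cycles are enough to exceed one million copies, which is consistent with the claim that millions of copies can be produced within hours.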



Other novel methods for large-scale SNP detection include single nucleotide primer extension, allele-specific hybridization, oligonucleotide ligation assay, and invasive signal amplification, which detect polymorphisms directly from genomic DNA without the requirement of PCR amplification. The International HapMap project was established with the objective of identifying the common genetic variations (thought to number on the order of 10 million in our genome) in the human population. This project is already in its third phase (HapMap3) and now includes both SNPs and copy number variations observed in 1184 samples from 11 different human populations. Regardless of the method used to characterize them, the collective SNPs in a selected genomic region define a haplotype, that is, a specific combination of alleles at multiple linked genetic loci along a chromosome that are inherited together.
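The idea of a haplotype as a combination of alleles at linked SNP positions can be made concrete with a short Python sketch. It assumes the chromosome sequences are already phased (each string represents one physical chromosome) and aligned, and that the SNP positions are known. The data and function name are hypothetical.

```python
from collections import Counter


def haplotype_counts(chromosomes: list[str], snp_positions: list[int]) -> Counter:
    """Count each combination of alleles at the given 0-based SNP positions.

    Assumes phased, aligned sequences, one string per chromosome copy.
    """
    return Counter(
        "".join(sequence[position] for position in snp_positions)
        for sequence in chromosomes
    )


if __name__ == "__main__":
    # Hypothetical region with SNPs at positions 0 and 3.
    chromosomes = ["ACGTA", "ACGTA", "TCGCA", "TCGCA", "ACGCA"]
    counts = haplotype_counts(chromosomes, [0, 3])
    print(counts["AT"], counts["TC"], counts["AC"])  # 2 2 1
```

In this toy data set the alleles at the two positions tend to travel together (A with T, T with C), which is the pattern that lets a few marker SNPs stand in for a whole linked region.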


Even when the SNPs within a given haplotype are not directly involved in a disease, they provide markers for clonality and for the loss or rearrangement of specific chromosomal segments in growing tumors. In the human nucleus, each of the 23 pairs of tightly compacted chromosomes has a characteristic size and structure and a distinctive base sequence that carries unique protein coding information. Other noncoding DNA sequences are used for directing the transcription of neighboring genes through complex regulatory circuits involving protein binding and modification of the DNA itself, or shifting of its chromosomal packaging. Although genomic instability generally is considered a consequence of tumor formation rather than the initial trigger of cancer, the loss, gain, or rearrangement of chromosomal segments through deletion or translocation is a common form of neoplastic mutation, as protein-coding segments from different genes are combined or regulatory sequences are brought into new proximity to genes they do not normally control. In persons with chronic myeloid leukemia, for example, recombination events lead to the fusion of the BCR and ABL genes (Philadelphia chromosome). This process results in constitutive activity of the fused gene product, leading to loss of proliferative control in myeloid cells and, consequently, cancer. Gross changes in DNA arrangement can be detected by cytogenetic analysis of chromosomal features on metaphase spreads. Fluorescent in situ hybridization provides greater resolution by localizing specific chromosomal DNA sequences corresponding to fluorescently labeled probes (Fig. 1-5) and can be used to track specific alterations in chromosomal structure where known genes are involved.



The plethora of data arising from genome-wide association studies using currently available techniques poses particular challenges to cancer researchers. Discerning the causal genetic variants among genotype-phenotype associations requires extensive replication, control for underlying genetic differences in population cohorts, and consistent classification of clinical outcomes. New technologies must be met with equivalently sophisticated and rigorous analytical methodologies for the true genetic cause of cancer to be teased out from our variable and often unstable heredity.



Building Gene Libraries


The engineering of genes by recombinant DNA technology evolved from methods initially devised to provide sequences in amounts sufficient for biochemical analysis. The original protocol involves clipping the desired segment from the surrounding DNA and inserting it into a bacterial or viral vector, which is then amplified millions of times in a host bacterium. With use of recombinant DNA technology, genetic engineering routinely can produce industrial quantities of pure, clinically useful products in a cost-effective manner. For diagnostic purposes, it is easier and faster to amplify a known genomic DNA sequence directly from a patient sample with PCR, but the classic approach is still applied to the construction of recombinant DNA libraries.


To be useful, a DNA library must be as complete as possible, with recombinant members, or clones, sufficiently numerous to include all the sequences in an individual genome. For certain kinds of gene-linkage analysis that require long, uninterrupted stretches of DNA, special vectors, such as bacterial or yeast artificial chromosomes, can carry foreign DNA fragments of enormous lengths. Chromosomal segments represented in genomic DNA libraries can contain the structure of an entire gene, including the information that regulates its expression, and such libraries formed the starting material for sequencing the human genome.


Many genes associated with cancer originally were identified using partial DNA libraries, which contain only the DNA sequences transcribed by a particular tissue or type of cell. The starting material in this case is mRNA. For cloning purposes, the enzyme reverse transcriptase can convert mRNA into complementary DNA (cDNA). The number of clones in a cDNA library is much smaller than in a genomic library, because a cDNA library represents only the genes expressed by the tissue of interest and contains exclusively the coding portion of genes. For this particular reason, this technique has become obsolete for organisms whose genome has now been fully sequenced. New advances in PCR chemistry allow for the direct cloning of increasingly larger cDNA fragments with high specificity and low error rates. Highly accurate PCR technology, coupled with the constantly evolving generation of genomic sequence maps in humans and model organisms, has exponentially expanded the availability of candidate genes to be tested in cancer biology.



Losing Control of the Genome


Mutations that lead to oncogenic transformation of a cell invariably affect the expression of the cell’s genetic information that specifies functional products—either RNA molecules or proteins used for various cellular functions. The primary level of gene control is the transcription of DNA into RNA. Gene regulation, or the control of RNA synthesis, represents a complex process that itself is a frequent target of neoplastic mutation.


DNA regulatory sequences do not encode a product, and yet without them, a cell could not coordinate the expression of the tens of thousands of genes in its nucleus, select only certain genes for expression, and activate or repress them in response to precise internal or external signals. These control centers of the genome contain binding sites for multiple proteins, called transcription factors, which interact to form regulatory networks that control gene transcription. The function of these factors can be altered by signals that induce modifications such as phosphorylation or by interactions with other regulators such as steroid hormones. Many of the cell’s responses to a wide variety of external stimuli, such as neurotransmitters, antigens, cytokines, and growth factors, are mediated through transcription factors binding to DNA regulatory sequences.


Certain regulatory DNA sequences common to many genes are positioned upstream of the transcription start site (Fig. 1-6). Collectively called the “promoter” of a gene, these proximal sequences constitute binding sites for the RNA polymerase and its numerous cofactors. Whereas the position of the promoter with regard to the transcription start site is relatively inflexible, other DNA regulatory elements, known as enhancers, occur in unpredictable locations, often at a considerable distance from the genes they control. Some transcription factors bind to particular regions of enhancers and drive their associated genes in many types of cells, whereas others, which are active in only a limited variety of cells, maintain a tissue-specific pattern of gene expression. Enhancers often are responsible for the aberrant expression of genes induced by the chromosomal translocations associated with specific forms of cancer; for example, a normally quiescent gene promoting cell growth that is dislocated to a position near a strong enhancer may be activated inappropriately, resulting in loss of control of growth.


Figure 1-6 Mammalian gene structure and expression. The DNA sequences that are transcribed as RNA are collectively called the gene and include exons (expressed sequences) and introns (intervening sequences). Introns invariably begin with the nucleotide sequence GT and end with AG. An AT-rich sequence in the last exon forms a signal for processing the end of the RNA transcript. Regulatory sequences that make up the promoter and include the TATA box occur close to the site where transcription starts. Enhancer sequences are located at variable distances from the gene. Gene expression begins with the binding of multiple protein factors to enhancer sequences and promoter sequences. These factors help form the transcription-initiation complex, which includes the enzyme RNA polymerase and multiple polymerase-associated proteins. The primary transcript (pre-messenger RNA [mRNA]) includes both exon and intron sequences. Posttranscriptional processing begins with changes at both ends of the RNA transcript. At the 5′ end, enzymes add a special nucleotide cap; at the 3′ end, an enzyme clips the pre-mRNA about 30 base pairs after the AAUAAA sequence in the last exon. Another enzyme adds a polyA tail, which consists of up to 200 adenine nucleotides. Next, spliceosomes remove the introns by cutting the RNA at the boundaries between exons and introns. The process of excision forms lariats of the intron sequences. The spliced mRNA is now mature and can leave the nucleus for protein translation in the cytoplasm. (From Rosenthal N. Regulation of gene expression. N Engl J Med 1994;331:931–2.)

Enhancers and promoters have been assigned specific roles by means of cell culture assays or in transgenic animals in which putative regulatory DNA sequences are linked to test or “reporter” genes, and they are examined for their ability to activate expression of the reporter gene in response to the appropriate signals. By assessing the effects of deleting, adding, or changing DNA sequences within the regulatory element, the precise nucleotides that are critical for recognition by transcription factors can be determined.
