Cancer Proteomics

Site	New cases: both genders	New cases: male	New cases: female	Deaths: both genders	Deaths: male	Deaths: female
All sites	1,665,540	855,220	810,320	585,720	310,010	275,710
Oral cavity and pharynx	42,440	30,220	12,220	8390	5730	2660
Digestive system	289,610	162,730	126,880	147,260	84,970	62,290
Respiratory system	242,550	130,000	112,550	163,660	90,280	73,380
Bones and joints	3020	1680	1340	1460	830	630
Soft tissue	12,020	6550	5470	4740	2550	2190
Skin	81,220	46,630	34,590	12,980	8840	4140
Breast	235,030	2360	232,670	40,430	430	40,000
Genital system	338,450	243,460	94,990	59,970	30,180	28,790
Urinary system	141,610	97,420	41,190	30,350	20,610	9740
Eye and orbit	2730	1440	1290	310	130	180
Brain and nervous system	23,380	12,820	10,560	14,320	8090	6230
Endocrine system	65,630	16,600	49,030	2820	1300	1520
Lymphoma	79,990	43,340	36,650	20,170	11,140	9030
Myeloma	24,050	13,500	10,550	11,090	6110	4980
Leukemia	52,380	30,100	22,280	24,090	14,040	10,050
Other	31,430	16,370	15,060	44,680	24,780	19,900

Ovarian cancer is the fifth leading cause of cancer deaths among women, and its death rate illustrates the need for better diagnostic tools. It is estimated that during 2014, 21,980 women will be diagnosed with ovarian cancer and 14,270 will die from this disease (http://www.ovariancancer.org/about-ovarian-cancer/statistics/) [1]. Funding to support ovarian cancer research in the USA has come predominantly from federal agencies, which committed $112 million in 2010 (http://www.ovariancancerresearch.thegcf.org/). Nonprofit organizations have contributed another $16 million, for a total national commitment of at least $128 million to support ovarian cancer research. This funding level represents roughly $9000 for every woman who will die from ovarian cancer. Unfortunately, even at this level of fiscal commitment, the prognosis for women diagnosed with ovarian cancer remains poor. This disease has no reliable early detection or screening test, and as a result more than 60% of patients are initially diagnosed with stage III or stage IV cancer, by which point the cancer has already spread beyond the ovaries. The lack of progress in diagnosing and/or treating ovarian cancer is reflected in Table 6.2 (http://www.ovariancancer.org/about-ovarian-cancer/statistics/). Over the first decade of this century, there was little change in the number of women diagnosed with this disease and no significant change in the number of deaths. Likewise, the ratio of new cases to deaths shows no significant change over this 10-year period.

Table 6.2

Yearly ovarian cancer cases and deaths in the USA between 1999 and 2009

Year	Cases	Deaths	Cases/Deaths
2009	21,500	14,600	1.47
2005	19,842	14,787	1.34
2004	20,069	14,716	1.36
2003	20,445	14,657	1.39
2002	19,792	14,682	1.35
2001	19,719	14,414	1.37
2000	19,672	14,060	1.40
1999	19,676	13,627	1.44
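As an arithmetic check on the last column of Table 6.2, the minimal Python sketch below recomputes each Cases/Deaths ratio from the tabulated counts. The dictionary and function names are illustrative only and do not come from the cited sources.

```python
# Cases and deaths transcribed from Table 6.2 (the table does not list 2006-2008).
TABLE_6_2 = {
    2009: (21_500, 14_600),
    2005: (19_842, 14_787),
    2004: (20_069, 14_716),
    2003: (20_445, 14_657),
    2002: (19_792, 14_682),
    2001: (19_719, 14_414),
    2000: (19_672, 14_060),
    1999: (19_676, 13_627),
}


def cases_to_deaths(table):
    """Return {year: cases / deaths}, rounded to two decimals like the table's last column."""
    return {year: round(cases / deaths, 2) for year, (cases, deaths) in table.items()}


if __name__ == "__main__":
    ratios = cases_to_deaths(TABLE_6_2)
    for year in sorted(ratios):
        print(year, ratios[year])
    print("range:", min(ratios.values()), "to", max(ratios.values()))
```

The recomputed ratios agree with the table and stay between 1.34 and 1.47 across the years listed.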

Ovarian cancer is typically diagnosed in patients who already show symptoms. These may include pressure or pain in the pelvis, abdomen, back, or legs; abdominal bloating; chronic tiredness; gas; diarrhea; constipation; indigestion; and nausea (http://www.medicinenet.com/ovarian_cancer/article.htm; http://www.mayoclinic.com/health/ovarian-cancer/DS00293/DSECTION=symptoms). Less common symptoms include unusual vaginal bleeding and urinary frequency. Unfortunately, the symptoms of ovarian cancer are nonspecific and resemble those of other, more common conditions such as digestive and bladder disorders. Before being diagnosed with ovarian cancer, a woman may be told she has another condition such as irritable bowel syndrome, stress, or depression. If the physician suspects ovarian cancer, he or she will order additional tests. A physical exam, in which the doctor presses on the abdomen to check for tumors or fluid buildup (ascites), may be conducted along with an examination of that fluid. A pelvic exam to check the ovaries and nearby organs for lumps or abnormalities in shape or size may be performed together with an ultrasound to image the ovaries and surrounding tissues. The doctor may also order a blood test to measure serum levels of cancer antigen 125 (CA125), a mucinous glycoprotein. After all of these tests, a biopsy may still be required to remove tissue from the ovary, as well as surrounding fluid, for examination by a pathologist. It is easy to see why most women who receive a diagnosis of ovarian cancer are already at an advanced stage. Ideally, circulating CA125 levels could be used to screen women routinely for ovarian cancer at yearly physicals. However, its measurement lacks the necessary sensitivity and specificity (http://www.cancer.gov/cancertopics/factsheet/Detection/tumor-markers).
In fact, the US Food and Drug Administration (FDA) approves the use of CA125 only for monitoring the response to ovarian cancer treatment and for detecting post-treatment recurrence. One of the major problems with using CA125 measurement to diagnose ovarian cancer is the large number of false positives it produces [2]. Like prostate-specific antigen, the well-known prostate cancer biomarker, CA125 lacks the sensitivity and specificity required for diagnosing patients.
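Why false positives dominate when a marker is used for population screening can be shown with a short calculation. The Python sketch below assumes a hypothetical screened population, prevalence, sensitivity, and specificity (illustrative values chosen for the example, not published CA125 performance figures) and computes the expected outcome counts and the positive predictive value, TP / (TP + FP).

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected true/false positives and negatives for one round of screening."""
    diseased = population * prevalence
    healthy = population - diseased
    true_pos = diseased * sensitivity   # cancers the test detects
    false_neg = diseased - true_pos     # cancers the test misses
    true_neg = healthy * specificity    # healthy women correctly cleared
    false_pos = healthy - true_neg      # healthy women flagged as positive
    return {"TP": true_pos, "FN": false_neg, "FP": false_pos, "TN": true_neg}


def positive_predictive_value(outcomes):
    """Fraction of positive results that are true cases: TP / (TP + FP)."""
    return outcomes["TP"] / (outcomes["TP"] + outcomes["FP"])


if __name__ == "__main__":
    # HYPOTHETICAL inputs for illustration only; not published CA125 performance data.
    outcomes = screening_outcomes(population=100_000, prevalence=0.0005,
                                  sensitivity=0.80, specificity=0.95)
    print({k: round(v, 1) for k, v in outcomes.items()})
    print(f"PPV = {positive_predictive_value(outcomes):.3%}")
```

Under these assumed values, about 40 true cancers would be detected alongside roughly 5000 false positives, so fewer than 1% of positive results would be true cases. Because the disease is rare in a screened population, even a fairly high specificity leaves false positives far outnumbering true positives.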

6.2 Hypothesis Versus Discovery-Driven Studies

One important change in how cancer research is conducted in the omics age is one of perspective. In previous decades, when data collection was often challenging, most cancer studies were hypothesis-driven. In a hypothesis-driven study, the investigator formulates an educated guess to explain a cause-and-effect relationship, and a series of experiments is performed to determine whether the hypothesis is correct. In a discovery-driven study, the goal is to collect a specific type of data on as many features as possible and use the data to draw a conclusion or formulate a hypothesis. The success of many discovery-driven studies is judged by how well the data generate novel, specific hypotheses. The advent of technologies that can gather data on thousands of biological molecules in clinical samples has rapidly shifted cancer research toward discovery-driven studies. A good example is the genome-wide association study (GWAS) [3]. In a GWAS, the genomes of many individuals are genotyped or sequenced to determine how they vary. The most common comparison is between healthy individuals and those with a specific disease, and the aim is to find regions of the genome that are associated with specific traits such as diseases.
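The association step at a single variant can be sketched as a comparison of allele counts between cases and healthy controls. The Python example below uses a Pearson chi-square test on a 2 × 2 allele-count table, which is one common choice among several (logistic regression is also widely used); the counts are hypothetical. In a genome-wide study the same test is repeated for every variant, which is why a stringent threshold (conventionally p < 5 × 10⁻⁸) is applied.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternative and reference alleles.
    Returns (chi-square statistic, p-value).
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # HYPOTHETICAL allele counts at one variant in cases versus healthy controls.
    stat, p = allelic_chi_square(case_alt=30, case_ref=70, control_alt=10, control_ref=90)
    print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```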

In the field of proteomics, the development of high-throughput technologies has enabled discovery-driven studies as never before. As shown in Fig. 6.1, these studies include biomarker discovery, quantitative proteomics, protein complex characterization, and global proteome characterization. Probably the biggest impact on cancer research has been the ability to conduct discovery-driven studies to find cancer-specific biomarkers. The greatest challenge in finding biomarkers for specific cancers is the lack of any a priori information about the identity (or even the character) of a protein biomarker for any cancer. It is not even possible to determine to which class of proteins (e.g., kinases, phosphatases, membrane proteins, or nuclear proteins) a biomarker is likely to belong. Therefore, this type of study is often left to a purely unbiased, discovery-driven approach in which samples from healthy and cancer-affected individuals are compared. The ability to collect large amounts of data has also changed how hypothesis-driven proteomic experiments are conducted. For example, protein complexes no longer need to be characterized protein by protein; instead, the entire complex can be analyzed in a single experiment. The amount of data collected in an experiment does not by itself indicate whether it is hypothesis-driven or discovery-driven. For example, clinical trials that collect a great deal of patient data are primarily hypothesis-driven, as specific outcomes are being monitored. The ability to conduct meaningful discovery-driven studies underlies the large increase in proteomic biomarker discovery, primarily for cancers. Whether or not this is ideal, biomarker discovery projects begin with essentially no information about the identity of the protein of interest; the only reliable information available is the source of the samples being analyzed. To understand how proteomics is applied to cancer research, it is therefore important to understand the major technologies used in this field.
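The unbiased comparison of healthy and cancer-affected samples can be sketched as scoring every measured protein rather than a preselected few. The Python sketch below assumes a hypothetical table of protein abundances and ranks proteins by Welch's t statistic, reporting the log2 fold change alongside; the protein names and values are invented for illustration, and the choice of statistic is an assumption rather than a method taken from this chapter.

```python
import math
import statistics


def welch_t(cancer, healthy):
    """Welch's t statistic (unequal variances) comparing cancer with healthy samples."""
    se = math.sqrt(statistics.variance(cancer) / len(cancer)
                   + statistics.variance(healthy) / len(healthy))
    return (statistics.mean(cancer) - statistics.mean(healthy)) / se


def log2_fold_change(cancer, healthy):
    """log2 of the ratio of mean abundances (cancer / healthy)."""
    return math.log2(statistics.mean(cancer) / statistics.mean(healthy))


def rank_candidates(abundances):
    """Score every measured protein (no preselection) and rank by |t|, largest first.

    abundances maps protein name -> (cancer values, healthy values).
    """
    scored = [
        (name, log2_fold_change(c, h), welch_t(c, h))
        for name, (c, h) in abundances.items()
    ]
    return sorted(scored, key=lambda row: abs(row[2]), reverse=True)


# HYPOTHETICAL abundance measurements (arbitrary units), four samples per group.
DATA = {
    "protein_X": ([10.0, 12.0, 11.0, 13.0], [5.0, 6.0, 5.5, 6.5]),
    "protein_Y": ([8.0, 7.5, 8.5, 8.0], [8.2, 7.8, 8.1, 7.9]),
    "protein_Z": ([3.0, 2.0, 2.5, 3.5], [6.0, 5.0, 7.0, 6.0]),
}

if __name__ == "__main__":
    for name, lfc, t in rank_candidates(DATA):
        print(f"{name}: log2FC = {lfc:+.2f}, t = {t:+.2f}")
```

Ranking by |t| alone ignores multiple testing; a study measuring thousands of proteins would also apply a correction such as Benjamini-Hochberg and would validate candidates in independent samples.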

Fig. 6.1

List of discovery-driven and hypothesis-driven studies. In the proteomics era, some studies (such as protein complex characterization and biomarker identification) can be performed using either approach.

6.3 Proteomic Technologies

The previous decade has seen a revolution in how cancer is studied. In the past, cancer research was conducted on a molecule-by-molecule basis, whereas today’s experiments aim to simultaneously uncover large numbers of genes, transcripts, proteins, and metabolites that characterize a specific cancer. The primary driver of this new omics era was technology development. Massively parallel sequencing methods, such as 454 Sequencing, have made it possible to survey entire genomes for genetic aberrations [4]. High-density microarrays allow tens of thousands of transcripts to be characterized in a high-throughput fashion [5]. For proteins and metabolites, advances in chromatography and mass spectrometry (MS) have made it possible to characterize thousands of these molecules in complex clinical samples [6, 7].

In the field of proteomics, gone are the days in which cancer was studied only at the level of individual proteins. While fundamental technologies such as Western blotting, enzyme-linked immunosorbent assays (ELISA), immunoprecipitations (IPs), and other methods will remain critical for protein analysis, modern proteomic technologies enable cancer to be studied at the pathway, network, and global levels. An excellent example of how proteomic technologies have changed a fundamental experiment is the analysis of protein complexes (Fig. 6.2) [8]. Protein complexes have routinely been isolated using IPs. Historically, the isolated material was separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and transferred to a polyvinylidene fluoride (PVDF) membrane. To identify a protein interaction, the PVDF membrane is probed with an antibody directed against a protein hypothesized to interact with the protein targeted in the IP. While this hypothesis-driven strategy has proven fruitful over the years, it provides a limited amount of information and cannot find truly novel interactions that fall outside the investigator’s hypotheses.

Fig. 6.2

Identifying protein interactions using hypothesis-driven and discovery-driven methods. In a hypothesis-driven approach, a protein complex is isolated using a technique such as immunoprecipitation (IP). The complex is separated by gel electrophoresis and the proteins are blotted onto a membrane, which is probed with an antibody specific for a protein hypothesized to interact with the target protein. In the discovery-driven approach, the complex is separated by gel electrophoresis, and the protein bands are colorimetrically stained, excised, and individually subjected to in-gel digestion. The peptides extracted from each gel piece are analyzed individually using mass spectrometry (MS). While the hypothesis-driven method identifies a single potential protein interaction per experiment, the discovery-driven approach identifies several potentially novel interactions.

The proteomics era has brought the ability to conduct more discovery-driven experiments. For example, characterizing protein-protein interactions no longer requires a hypothesis about which individual proteins are present; instead, the entire complex can be characterized using MS. The procedure for characterizing protein complexes by MS is straightforward and incorporates many of the steps used in the hypothesis-driven strategy (Fig. 6.2). The isolated protein complex is separated by SDS-PAGE, but rather than being transferred to a PVDF membrane, the proteins are stained with a colorimetric reagent such as Coomassie blue or silver stain. The protein bands are excised from the gel and subjected to in-gel digestion with a proteolytic enzyme (usually trypsin). Peptides are extracted from the individual gel pieces and identified using MS, and the identified peptides are compared with a proteomic or translated genomic database to determine their protein of origin. This discovery-driven approach offers a number of key advantages over hypothesis-driven methods: several interacting proteins can be identified within a single study, no antibodies (save those used to isolate the complex) are required, and truly novel protein interactions can be identified because the analysis does not rely on a preformulated hypothesis.
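The final database step, assigning identified peptides to their proteins of origin, can be sketched as a substring search of each peptide against the database sequences. In the Python example below, the protein names and sequences are hypothetical, and the two-peptide minimum in `report` is an assumed (though common) evidence threshold; production search engines also handle shared peptides and isoleucine/leucine ambiguity, which this sketch ignores.

```python
from collections import defaultdict


def assign_peptides(peptides, database):
    """Map each identified peptide to every database protein whose sequence contains it.

    Returns {protein name: set of distinct supporting peptides}.
    """
    hits = defaultdict(set)
    for peptide in peptides:
        for name, sequence in database.items():
            if peptide in sequence:
                hits[name].add(peptide)
    return dict(hits)


def report(hits, min_peptides=2):
    """Keep proteins supported by at least `min_peptides` distinct peptides, best supported first."""
    kept = [(name, len(peps)) for name, peps in hits.items() if len(peps) >= min_peptides]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# HYPOTHETICAL database entries and peptides identified from an isolated complex.
DATABASE = {
    "BAIT": "MSTAKLLREDGTKPWR",
    "PARTNER_1": "AGVKTTFRELLNGRK",
    "PARTNER_2": "QQPSRYLEMK",
}
IDENTIFIED = ["LLR", "EDGTKPWR", "TTFR", "ELLNGR", "YLEMK"]

if __name__ == "__main__":
    print(report(assign_peptides(IDENTIFIED, DATABASE)))
```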

6.3.1 Mass Spectrometry

The greatest fundamental advance that proteomics has contributed to cancer research is in the identification of proteins. Although protein identification has been a mainstay of biochemistry for over half a century, the technology used for it has advanced enormously in the past decade. In the 1950s, Edman degradation enabled the sequencing of purified proteins [9]. While Edman sequencing is still in use today, it has largely been replaced by MS for two primary reasons: throughput, and the fact that MS does not require a pure sample to identify a protein confidently. MS can identify a single highly purified protein or thousands of proteins within a complex mixture. The two main methods used to identify proteins by MS are peptide mapping [10] and tandem MS (MS/MS) [11].

To identify a protein by peptide mapping, it is first enzymatically digested into peptide fragments (Fig. 6.3). Trypsin is by far the most commonly used protease for this purpose, owing to its specificity and the relative distribution of lysine and arginine residues throughout the proteome. The masses of the tryptic peptides are measured by MS, and this list of experimental masses is compared with the theoretical masses of the peptides generated by an in silico tryptic digest of the proteins in a protein or genomic database. The protein is identified by the best match between the experimental peptide masses and the digest of each protein in the database. The drawback of peptide mapping is that, although it does not require a pure protein, the best identifications come from samples in which the target protein is highly enriched.

Fig. 6.3

Identification of a protein via peptide mapping. Proteins are digested into peptides (usually tryptic), and their masses are measured using mass spectrometry (MS). The resulting masses are compared with theoretical peptide masses generated by an in silico digest of a suitable proteome database. The protein is identified by the correspondence between the list of experimental peptide masses and the in silico digest of each protein in the database.
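The peptide-mapping logic (in silico tryptic digest, theoretical peptide masses, and counting matches to the measured masses) can be sketched in Python. The sketch assumes unmodified peptides, no missed cleavages, neutral monoisotopic masses (measured [M+H]+ values would need one proton mass added), and a hypothetical 0.02 Da matching tolerance; the two database entries are invented sequences.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide not ending in K/R
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tolerance=0.02):
    """Number of observed masses within `tolerance` Da of any in silico tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(1 for m in observed_masses if any(abs(m - t) <= tolerance for t in theoretical))


def identify(observed_masses, database, tolerance=0.02):
    """Rank database proteins by how many observed peptide masses they explain."""
    scores = [(name, count_matches(observed_masses, seq, tolerance))
              for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Two HYPOTHETICAL database proteins; observed masses are simulated from protein_A.
    database = {"protein_A": "AAKPGRLLKDDR", "protein_B": "GGSRVVTKWWHK"}
    observed = [peptide_mass(p) for p in tryptic_digest(database["protein_A"])]
    print(identify(observed, database))
```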

When MS/MS is used for identification (Fig. 6.4), individual peptides entering the mass spectrometer are isolated within the instrument and collided with an inert gas (e.g., He, Ar, or N₂). This process, known as collision-induced dissociation (CID), breaks the peptide into fragments that are then sent to the detector. Fortunately, peptide fragmentation is fairly well understood, with cleavage across the amide bond being one of the favored pathways. Because CID does not cleave every amide bond in every peptide molecule, the fragmentation produces populations of fragments of various lengths from the N-terminus and the C-terminus, as well as internal fragments, all of which are recorded for each peptide. Peptides are identified by how well their fragmentation pattern matches the in silico fragmentation patterns calculated for candidate peptides in a database. When the sample consists primarily of a single protein, the MS/MS spectra of several peptides are used to identify the protein confidently. When very complex mixtures are analyzed, the MS/MS spectrum of a single peptide can provide evidence for the presence of a specific protein in the sample. As a rule, however, the more peptides identified, the greater the confidence that the protein is present in the sample. The advantage of MS/MS is that it identifies peptides based on sequence information.

Fig. 6.4

Identification of a protein via tandem mass spectrometry (MS/MS). A specific peptide entering the mass spectrometer (large grey circle) is isolated within the instrument and subjected to collision-induced dissociation (CID). The fragments produced by CID are then directed onto the detector, where the MS/MS spectrum is acquired. The most common fragmentation occurs across the amide bond, creating a series of b and y ions. Software programs compare the MS/MS spectra with proteomic databases to determine the sequences of the peptides subjected to CID.
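How an MS/MS spectrum is matched to a sequence can be sketched by computing the singly charged b and y ions of each candidate peptide and counting how many observed fragment peaks they explain. In the Python example below, the candidate peptides and the 0.5 Da tolerance are assumptions for illustration, and real search engines use more elaborate scoring. The two candidates have the same amino acid composition, and hence the same precursor mass, so only their fragment ladders can tell them apart.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, carried by the C-terminal (y) fragments
PROTON = 1.00728   # added once for singly charged fragment ions


def fragment_ions(peptide):
    """Singly charged b ions (N-terminal) and y ions (C-terminal) from each amide bond.

    Ions are listed by cleavage position, so b_ions[k] and y_ions[k] are the
    complementary pair produced by breaking the same bond.
    """
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON)
    return b_ions, y_ions


def match_score(peaks, peptide, tolerance=0.5):
    """Count observed fragment m/z values explained by the peptide's b or y ions."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(1 for mz in peaks if any(abs(mz - t) <= tolerance for t in theoretical))


def best_candidate(peaks, candidates, tolerance=0.5):
    """Return the candidate whose theoretical fragments explain the most peaks."""
    return max(candidates, key=lambda pep: match_score(peaks, pep, tolerance))


if __name__ == "__main__":
    # Simulated spectrum: the b and y ions of a HYPOTHETICAL observed peptide, LLKDDR.
    b_ions, y_ions = fragment_ions("LLKDDR")
    peaks = b_ions + y_ions
    for candidate in ["DDRLLK", "LLKDDR"]:  # same composition, same precursor mass
        print(candidate, match_score(peaks, candidate))
    print("best:", best_candidate(peaks, ["DDRLLK", "LLKDDR"]))
```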

When combined with high-resolution liquid chromatography (LC), modern mass spectrometers can sequence thousands of peptides per hour. In these studies, a complex sample (e.g., biofluid, tissue, or cell lysate) is digested into tryptic peptides, which are fractionated by one or more chromatographic methods before being eluted directly into the mass spectrometer. The most commonly used combination is strong cation exchange followed by reversed-phase LC. As the peptides enter the mass spectrometer, the instrument isolates individual peptides based on their signal intensity and subjects them to CID (Fig. 6.5). The MS/MS spectra are then searched against a suitable database, turning the raw data into identified peptides, which are binned to their protein of origin. The seminal example of using MS to identify large numbers of proteins in a biological sample came from Dr. John Yates’ laboratory in 2001 [12]. A dual strong cation exchange/reversed-phase LC column was used to separate tryptically digested yeast proteins directly on-line with an ion-trap mass spectrometer, resulting in the identification of almost 1500 proteins. This multidimensional protein identification technology, or MudPIT as it was coined, was subsequently used by other laboratories to interrogate clinical samples with the ultimate goal of discovering biomarkers. This technique, now commonly referred to as shotgun proteomics, has greatly increased our knowledge of the proteomes of peripheral body fluids such as serum, plasma, and urine. Although investigators can now identify upwards of 4000 proteins in complex samples [13–15], the mass spectrometer is still unable to identify all of the proteins in a complex mixture.

Fig. 6.5

Data-dependent tandem mass spectrometry identification of peptides in complex mixtures. In this method, peptides observed in the MS scan are selected for MS/MS based on their signal intensity. After the MS/MS spectrum of the most intense peptide in the preceding MS scan is acquired, the next most intense peptide signal is selected, and so on. After 5–10 peptides have been selected, the mass spectrometer acquires another MS scan to find new signals from peptides that have eluted into the instrument. This cycle of MS and MS/MS scans is repeated approximately 7000 times per hour, resulting in the identification of thousands of peptides from complex mixtures.
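The data-dependent (top-N) selection described in Fig. 6.5 can be sketched as sorting each MS scan by intensity and choosing the N most intense signals for MS/MS. The Python sketch below uses invented (m/z, intensity) values; real instruments add refinements such as charge-state filtering and dynamic exclusion of recently selected ions, which are omitted here.

```python
def top_n_selection(ms_scan, n=5, min_intensity=0.0):
    """Return the m/z values chosen for MS/MS from one MS scan, most intense first.

    ms_scan is a list of (m/z, intensity) pairs; signals at or below
    min_intensity are ignored.
    """
    candidates = [peak for peak in ms_scan if peak[1] > min_intensity]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:n]]


def acquisition_cycle(ms_scans, n=5):
    """Simulate repeated 'one MS scan, then top-N MS/MS scans' cycles as an event log."""
    events = []
    for scan in ms_scans:
        events.append(("MS", None))
        for mz in top_n_selection(scan, n):
            events.append(("MS/MS", mz))
    return events


if __name__ == "__main__":
    # HYPOTHETICAL (m/z, intensity) signals from two consecutive MS scans.
    scans = [
        [(400.2, 1e5), (512.3, 5e6), (633.8, 2e4), (720.4, 9e5)],
        [(455.7, 3e6), (588.1, 4e5)],
    ]
    for event in acquisition_cycle(scans, n=2):
        print(event)
```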

6.3.2 Protein Arrays

While MS has been the primary tool for advancing cancer research into the proteomics era, protein arrays represent a more directed and potentially more powerful tool for broad proteome surveys of different cellular systems. Because of the dominance of MS, this chapter is devoted to its role in cancer proteomics; however, it is worth including a section to make the reader aware of protein array technologies. While MS will continue to play a dominant role in biomarker discovery, protein arrays will be increasingly important in validation studies. Like ELISAs, Western blotting, and immunohistochemistry (IHC), protein arrays use affinity reagents (primarily antibodies) to measure changes in proteins present in complex samples. Put simply, protein arrays are an attempt to conduct hundreds of microscale IHCs, ELISAs, or Westerns simultaneously. Although protein arrays may seem limited because they target specific proteins (a hypothesis-driven approach), large numbers of different antibodies can be used to provide a comprehensive view of protein changes.
