Molecular Diagnostics

SEQUENCING

Chung-Han Lee

Classic Sanger Sequencing
Developed in 1977 by Frederick Sanger & colleagues
Required reagents	DNA template (what you want to seq); DNA primer (known seq); dNTPs; ddNTPs; DNA polymerase
The principle	Mixing dNTPs & ddNTPs results in random chain termination, yielding DNA fragments of different sizes → fragments are separated by size → fragments are detected w/radioactively or fluorescently labeled ddNTPs or DNA primers; DNA polymerase CAN replicate & extend seq w/dNTPs; DNA polymerase CANNOT extend seq after adding a ddNTP (ddNTPs lack the 3’-OH needed to attach the next nucleotide)
Limitations	Poor seq quality in first 50 bp of seq; seq degradation after >700 bp; known DNA seq needed to design primers; cheap on a per sample basis; expensive on a per genome basis; impractical for large seq projects
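The chain-termination principle can be shown with a toy simulation. This is a minimal sketch under idealized assumptions. The function names sanger_reaction and read_by_size are hypothetical. The per-position ddNTP incorporation probability p_ddntp is a hypothetical fixed value. Size separation is assumed to be perfect, and the template is written in the order it is copied. It does not model any real instrument or base-calling software.

```python
import random

# Watson-Crick pairing used to build the newly synthesized strand
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_reaction(template, n_molecules=2000, p_ddntp=0.2, seed=0):
    """Simulate one Sanger reaction on `template` (written in the order it is copied).

    At each position the polymerase adds the complementary base. With
    probability p_ddntp that base is a ddNTP (no 3'-OH), so extension stops
    and a labeled fragment is produced.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        strand = []
        for base in template:
            strand.append(COMPLEMENT[base])
            if rng.random() < p_ddntp:
                # Chain terminated: the last base is the labeled ddNTP
                fragments.append("".join(strand))
                break
        # Molecules that never incorporate a ddNTP carry no label and are not detected
    return fragments


def read_by_size(fragments):
    """Order fragments by length and read the labeled terminal base of each size."""
    terminal_base = {len(f): f[-1] for f in fragments}
    # A length that never occurs (too few molecules) is simply skipped
    return "".join(terminal_base[n] for n in sorted(terminal_base))


print(read_by_size(sanger_reaction("TACGGATC")))  # prints ATGCCTAG
```

Termination is random, so enough molecules are needed for every fragment length to appear. Ordering the fragments by size then reads the new strand one base at a time.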

Peaks represent abundance of fluorescent readout at specific nucleotide.

Each nucleotide (A, C, G, T) read out in a different color (not shown)

Discrimination of nucleotide identity requires peak to be significantly higher than baseline.

Pure samples yield a single peak at each position.

Heterogeneous mixtures will yield peaks of multiple colors at a specific site (eg, 50% of tumor sample carries point Mt → Mt:WT at 1:1 ratio → two different color peaks at 1:1 ratio)

Detection of low frequency point Mts is technically difficult (eg, 10% of tumor sample carries point Mt → Mt:WT at 1:9 ratio → two different color peaks at 1:9 ratio; smaller peak difficult to distinguish from baseline noise)
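The peak-ratio arithmetic in these examples can be made explicit. This is a minimal sketch that assumes peak height is proportional to the fraction of molecules carrying each base. The 15% noise floor is a hypothetical placeholder, not a validated limit of detection. The function names are illustrative.

```python
from fractions import Fraction


def peak_ratio(mutant_fraction):
    """Express the Mt:WT signal at one position as a small-integer ratio."""
    if not 0 < mutant_fraction < 1:
        raise ValueError("mutant_fraction must be strictly between 0 and 1")
    f = Fraction(mutant_fraction).limit_denominator(100)
    ratio = f / (1 - f)  # Mt peak height : WT peak height
    return f"{ratio.numerator}:{ratio.denominator}"


def minor_peak_detectable(mutant_fraction, noise_floor=0.15):
    """True if the smaller peak rises above a hypothetical noise floor (fraction of signal)."""
    minor = min(mutant_fraction, 1 - mutant_fraction)
    return minor > noise_floor


for frac in (0.5, 0.1):
    print(frac, peak_ratio(frac), minor_peak_detectable(frac))
```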

Figure 5-1 Example of DNA Sequencing by Sanger Method

Next Generation Sequencing
The principle	Simultaneous seq of multiple short DNA (<200 bp) fragments can be done in a high-throughput manner; overlapping seq fragments can be reassembled afterward
Depth of coverage	Measures number of times a specific site has been seq; ↑ depth of coverage → ↑ quality of seq data; 60× coverage does not mean the whole genome was seq 60×, it means 60× on average, w/some sites >100× & some sites not seq at all
Limitations	High initial equipment startup costs; quality of data depends on depth of coverage; Sn for rare Mts or tumor heterogeneity depends on depth of coverage
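Two points can be sketched in a few lines: average coverage differs from per-site coverage, and sensitivity for rare Mts depends on depth. This minimal sketch makes several assumptions:
- a toy genome, with reads placed at given start positions;
- a simple binomial model for the number of Mt reads at a site, ignoring sequencing errors;
- a hypothetical calling rule that requires min_alt_reads Mt reads.

The function names are illustrative.

```python
from math import comb


def per_site_depth(read_starts, read_length, genome_length):
    """Count how many reads cover each position of a toy genome."""
    depth = [0] * genome_length
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_length)):
            depth[pos] += 1
    return depth


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads Mt reads) if Mt reads ~ Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Three 3-bp reads on a 6-bp genome: average 1.5x, but site 2 is at 3x and site 5 at 0x
depth = per_site_depth([0, 2, 2], read_length=3, genome_length=6)
print(depth, sum(depth) / len(depth))

# Same Mt (present in 10% of reads) & same calling rule (>= 5 Mt reads) at two depths
for d in (20, 100):
    print(d, round(detection_probability(d, vaf=0.10, min_alt_reads=5), 3))
```

Under this model, a Mt present in 10% of reads is rarely called at 20× depth. Under the same rule it is almost always called at 100×.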

Next Generation Sequencing: Other Concepts

Seq by synthesis
- DNA is extracted → fragmented → attached to a surface → amplified
- Labeled nucleotides are added & detected w/a surface scanner
- Labeled nucleotides either replaced by nonlabeled nucleotide or label is removed
- Process is repeated w/new labeled nucleotides
- Seq will become out of phase w/each successive round → ↓ accuracy + signal
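The loss of phasing can be sketched with a simple model. This minimal sketch assumes that each strand in a cluster independently fails to incorporate a nucleotide in a given cycle. The failure probability, lag_rate, is a hypothetical fixed value. Strands that run ahead (prephasing) are ignored. Under these assumptions the in-phase fraction after n cycles is (1 − lag_rate)^n, which is why signal and accuracy fall as reads get longer. The function names are illustrative.

```python
def phasing_profile(cycles, lag_rate):
    """Distribution of strands in a cluster by number of bases incorporated.

    profile[k] = fraction of strands that have incorporated exactly k bases
    after `cycles` rounds. Only lagging strands (missed incorporation) are modeled.
    """
    profile = [1.0]  # before the first cycle every strand is at position 0
    for _ in range(cycles):
        nxt = [0.0] * (len(profile) + 1)
        for k, frac in enumerate(profile):
            nxt[k + 1] += frac * (1 - lag_rate)  # incorporated a base: on schedule
            nxt[k] += frac * lag_rate            # missed a base: falls one step behind
        profile = nxt
    return profile


def in_phase_fraction(cycles, lag_rate):
    """Fraction of strands still reading the intended position (clean signal)."""
    return phasing_profile(cycles, lag_rate)[cycles]


for n in (50, 100, 150):
    print(n, round(in_phase_fraction(n, lag_rate=0.005), 3))
```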
Seq by ligation
- DNA is extracted → fragmented → adapters are added & attached to beads & surface → DNA amplified
- Fluorescent 8-mer probes containing degenerate bases are annealed & ligated to template
  - Base 1-2: Specific nucleotides that match template
  - Base 3-5: Degenerate nucleotides that can pair w/any base
  - Base 6-8: Degenerate nucleotides + fluorophore
- Detection of fluorophore
- Cleavage & removal of bases 6-8 of 8-mer
- Repeat w/new 8-mers × 5-7 cycles
- Remove all annealed 8-mers & repeat process but offsetting start site by 1 position
- End result: seq 35 bps twice (extending w/8-mers 7 times × 5 different start sites × 2 different nucleotides assayed each time)
- Each nucleotide on the template is assayed twice, once as position 1 & once as position 2 of an 8-mer (Nat Biotechnol 2009;27:1013)
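The "35 bps twice" count can be checked directly. This is a minimal sketch of a simplified geometry, and ligation_calls is a hypothetical function name. Each ligated 8-mer reads 2 bases, and cleaving bases 6-8 advances the next 8-mer by 5 positions. Each new round shifts the start site by 1. In this simplified model the two outermost positions are read only once.

```python
from collections import Counter


def ligation_calls(n_start_sites=5, n_ligations=7, step=5):
    """Count how often each template position is read in a simplified ligation run.

    Each ligated 8-mer reads 2 bases (its positions 1 & 2). Cleaving bases 6-8
    leaves 5 bases, so the next 8-mer starts `step` positions further on.
    Each new round shifts the start site by 1 position.
    """
    calls = Counter()
    for offset in range(n_start_sites):
        for cycle in range(n_ligations):
            start = offset + cycle * step
            calls[start] += 1      # read as base 1 of the 8-mer
            calls[start + 1] += 1  # read as base 2 of the 8-mer
    return calls


calls = ligation_calls()
print(sum(calls.values()))                       # total base calls: 7 x 5 x 2
print(sorted({calls[p] for p in range(1, 35)}))  # interior positions are each read twice
```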

Approaches and Clinical Application of Next Generation Sequencing

Approach	Example
Large scale exome seq for discovery of new Mts in CA	VHL Mt is the most common Mt in ccRCC, but insufficient for carcinogenesis; large scale exome seq of 3544 genes in 101 ccRCC: VHL Mt 55%; SETD2, KDM5C, KDM6A Mts total ~15% (Nature 2010;463:360)
Discovery cohort → targeted seq	Targeted exome seq of 7 ccRCC tumors: PBRM1 Mt in 4/7 tumors; targeted seq of 257 ccRCC tumors → PBRM1 Mt in 88/257 (34%) (Nature 2011;469:539); whole exome seq of 7 ccRCC tumors w/paired nl → identification of BAP1 Mt; targeted seq of 176 tumors: BAP1 Mt 14% (Nat Genet 2012;44:751)
Issues illustrated	3 NGS seq projects → 3 sets of Mts discovered; whole genome seq ≠ whole exome seq; whole exome seq ≠ targeted exome seq; cost → use of next generation sequencing (NGS) for discovery → targeted seq (single gene) for validation

Next Generation Sequencing: Challenges

Clinical:
- How do we determine what info is clinically relevant? How do we interpret the data that is obtained? Can we predict natural hx, response to Rx, or resistance? How do we deal w/tumor heterogeneity? Are met sites genetically different from 1° sites? Does selective pressure by prior Rx change a tumor’s genetic profile? Is early detection of resistance genes clinically relevant?
Logistic:
- W/c seq technology do we use? What depth of coverage is necessary? What turnaround time for the assay is necessary to be clinically useful? How do we store the massive amounts of data generated? Who is in charge of safeguarding the info? How do we reduce the costs of the assay? How do we detect Mts that develop w/selective pressure?
Ethical:
- How do you deal w/informed consent? How do you deal w/incidental findings?
- Who has the rights to the info? How do we protect pt privacy?

GENE EXPRESSION PROFILING

Chung-Han Lee

Gene Expression Profiling

Goal: Get a global understanding of cellular functions by the simultaneous measurement of thousands of genes
Rationale:
- DNA is shared among multiple cell types; however, cell fates & physiology vary
- Protein expression & post-translational modification ultimately decide physiology; however, global assessment of protein levels & states remains impractical
- Measuring RNA levels can serve as a proxy for measuring proteins
Assumptions:
- RNA levels correlate w/protein expression (ie, amount of mRNA transcribed is proportional to amount of protein translated)
- Protein levels drive activity more than post-translational modification does
- Identified RNA → unique protein (ie, assay distinguishes between alternative splice forms)
Basic Experimental Design:
- Two or more experimental conditions are designed
- Samples from the experimental conditions are assayed
- Statistical comparisons are made between expression levels of thousands of genes
- Individual genes vs. clusters of related genes are used to define the roles of experimental conditions
Limitations:
- Expense of assay limits numbers of samples → limits statistical power
- Different genes may have different thresholds for significant changes (eg, changes in PTEN mRNA levels more significant than changes in actin mRNA levels)
- For tightly regulated proteins, constant mRNA levels may not reflect protein levels
- Proteins in signal transduction pathways exist in on/off states related to post-translational modification, localization, & binding partners; mRNA levels give no info regarding those states

DNA Microarrays

AKA: Gene chips
The principle:
- Probes = short unique seq of DNA designed to bind specific cDNA or mRNA
- Targets = cDNA or mRNA from genes of interest
- A microfluidic chip is designed w/DNA probes placed at known locations
- Samples are placed on chip to allow probes to capture targets
- Binding of probes & targets is assayed

	1 Channel Arrays	2 Channel Arrays
Definitions	Samples are labeled w/a single color & run individually on a chip	2 samples are labeled w/2 different colors & run together on a chip
Benefits	Errors in 1 sample do not affect other samples; easier to compare between samples	Cheaper cost/sample
Examples	Agilent—dual mode platform	Illumina—bead chip
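On 2 channel arrays, the two color intensities at each probe are often summarized as a log ratio. This minimal sketch assumes background-corrected intensities for a red channel (sample 2) and a green channel (sample 1), eg Cy5 & Cy3. The probe names and intensities are hypothetical. M is the log2 ratio (relative expression), and A is the average log2 intensity of the two channels.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one probe on a two-channel array.

    M = log2(red / green): relative expression of sample 2 vs. sample 1.
    A = average log2 intensity of the two channels.
    """
    m = math.log2(red / green)
    a = 0.5 * (math.log2(red) + math.log2(green))
    return m, a


# Hypothetical background-corrected intensities: (red channel, green channel)
probes = {"probe_1": (2000.0, 1000.0), "probe_2": (500.0, 2000.0), "probe_3": (800.0, 800.0)}
for name, (r, g) in probes.items():
    m, a = ma_values(r, g)
    print(name, round(m, 2), round(a, 2))
```

A 2-fold increase gives M = 1, a 4-fold decrease gives M = −2, and equal signal gives M = 0.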

RNA-Seq

AKA: Whole transcriptome seq
The principle:
- Build a cDNA library: coding RNAs are identified by their 3’ polyA tail, so ribosomal RNA is removed by collecting only RNAs containing a polyA tail; reverse transcription then generates cDNA
- cDNA is seq using next gen. seq technology

Data Analysis

Controversial & still subject to research

Fold change as cutoff: Easiest, but arbitrary & lacks biologic rationale
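A fold-change cutoff can be sketched as follows. This minimal sketch assumes a gene × sample expression table with two groups. It adds a hypothetical pseudocount of 1 to avoid dividing by zero and uses a 2-fold (|log2 FC| ≥ 1) cutoff. The gene names and values are hypothetical. The choice of cutoff is the arbitrary part.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of (mean of group B + pseudocount) / (mean of group A + pseudocount)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


def fold_change_filter(expression, a_idx, b_idx, min_abs_log2fc=1.0):
    """Keep genes whose |log2 fold change| meets the (arbitrary) cutoff."""
    hits = {}
    for gene, values in expression.items():
        lfc = log2_fold_change([values[i] for i in a_idx], [values[i] for i in b_idx])
        if abs(lfc) >= min_abs_log2fc:
            hits[gene] = lfc
    return hits


# Hypothetical expression values: columns 0-1 = condition A, columns 2-3 = condition B
expression = {
    "GENE_A": [10, 12, 40, 44],  # up in B
    "GENE_B": [20, 22, 21, 23],  # essentially unchanged
    "GENE_C": [50, 54, 12, 14],  # down in B
}
print(fold_change_filter(expression, a_idx=[0, 1], b_idx=[2, 3]))
```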

Statistical testing such as ANOVA: Complicated by the large number of genes tested (eg, at p < 0.01, testing 10,000 genes → ~100 false positives expected by chance alone)

Q-value: Analogue of the p-value for false discovery rate (FDR) testing; builds on the FDR approach of Yoav Benjamini & Yosi Hochberg (q-value formalized by John Storey); helps balance the tradeoff between power & false positives
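The multiple-testing problem and an FDR-based fix can be sketched together. This minimal sketch simulates null p-values (uniform on 0-1, ie no true differences) for 10,000 genes. It then computes Benjamini-Hochberg adjusted p-values, which are commonly used as q-values. Storey's q-value also estimates the proportion of truly unchanged genes, which this sketch does not do. bh_qvalues is a hypothetical helper name.

```python
import random


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (commonly reported as q-values).

    For the test with rank r (1 = smallest p) among m tests:
    q = min over ranks >= r of p * m / rank, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# 10,000 genes with no true differences: p-values are uniform on [0, 1]
rng = random.Random(1)
null_p = [rng.random() for _ in range(10_000)]
raw_hits = sum(p < 0.01 for p in null_p)
bh_hits = sum(qv < 0.01 for qv in bh_qvalues(null_p))
print(raw_hits, bh_hits)
```

About 100 genes pass p < 0.01 by chance alone. Far fewer pass q < 0.01, usually none.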

Principal Component Analysis (PCA)
- Reduces dimensionality of the data by consolidating correlated variables
- Constructs a smaller set of uncorrelated (“independent”) variables, the principal components
- Assumes directions w/low variance carry little info & discards them
- Each principal component is a normalized linear combination of the original variables
- Lossy & simplifies data for quick comparison of samples
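The first principal component can be computed without external libraries. This minimal sketch uses a small samples × variables table of toy values. It applies power iteration to the covariance matrix, which is one of several ways to compute PCA; libraries typically use a singular value decomposition. The function returns three things: the loading vector (a normalized linear combination of the original variables), the per-sample scores, and the fraction of variance kept.

```python
import math


def first_principal_component(rows, iters=100):
    """Return (loadings, scores, fraction of variance explained) for PC1.

    rows: samples x variables. Uses power iteration on the covariance
    matrix; assumes the variables are not all constant.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Starting vector; repeated multiplication by cov converges to PC1
    # (fails only if the start happens to be orthogonal to PC1)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]  # keep the linear combination normalized

    pc1_variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]
    return v, scores, pc1_variance / total_variance


# Toy table: 10 samples x 2 correlated variables
data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
loadings, scores, explained = first_principal_component(data)
print([round(c, 3) for c in loadings], round(explained, 3))
```

Keeping only PC1 discards the remaining variance, about 4% for these toy data. This is the lossy simplification described in the list.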
Self-Organizing Maps
- Nonlinear generalization of principal component analysis (PCA)
- Originally adapted from unsupervised neural network learning algorithms
- Map nodes compete to become the most representative (best-matching) node for each sample
- Samples organize by similarity to the representative nodes, so similar samples map to nearby nodes
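The competitive learning step of a self-organizing map can be sketched on 1-dimensional toy data. This is a minimal sketch, not a production SOM. It assumes a 1-D chain of 2 map nodes, whereas real maps typically use a 2-D grid of nodes that each hold a vector of gene expression values. The learning rate and neighborhood width decay linearly, and the data values are hypothetical. For each sample, the closest node wins, and the winner and its map neighbors are pulled toward that sample.

```python
import math


def best_matching_node(x, nodes):
    """Competition: index of the node closest to sample x."""
    return min(range(len(nodes)), key=lambda j: abs(x - nodes[j]))


def train_som(data, nodes, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a 1-D chain of map nodes on 1-D data (toy sketch)."""
    nodes = list(nodes)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                  # learning rate decays
        sigma = max(sigma0 * (1 - t / epochs), 0.1)  # neighborhood shrinks
        for x in data:
            winner = best_matching_node(x, nodes)
            for j in range(len(nodes)):
                # Nodes near the winner on the map move too, less so the farther away
                h = math.exp(-((j - winner) ** 2) / (2 * sigma**2))
                nodes[j] += lr * h * (x - nodes[j])
    return nodes


data = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]  # hypothetical values, 2 obvious groups
nodes = train_som(data, nodes=[4.0, 6.0])
print([round(n, 2) for n in nodes], [best_matching_node(x, nodes) for x in data])
```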
K-means Clustering
- Investigator identifies number of clusters before clustering (parameter k)
- k initial means (centroids) are chosen
- Each gene or sample is assigned to the cluster w/the nearest mean
- Means are recalculated based on new clustering
- Process is repeated until results converge & clusters are stable
- k choice is critical for correct clustering; however, difficult to predict
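The k-means loop can be sketched directly. This minimal sketch assumes 2-D points, eg each sample described by 2 expression values (the values are hypothetical). The investigator supplies the starting means, which is where the choice of k enters. The loop stops when the means no longer change.

```python
import math


def kmeans(points, initial_means, max_iter=100):
    """Plain k-means; k = len(initial_means), chosen by the investigator."""
    means = [tuple(m) for m in initial_means]
    clusters = []
    for _ in range(max_iter):
        # Assign each point to the nearest mean
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            clusters[nearest].append(p)
        # Recalculate each mean from its members (keep the old mean if a cluster is empty)
        new_means = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else means[j]
            for j, c in enumerate(clusters)
        ]
        if new_means == means:  # converged: clusters are stable
            break
        means = new_means
    return means, clusters


# Hypothetical samples, each described by 2 expression values
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
means, clusters = kmeans(points, initial_means=[(0, 0), (10, 10)])
print(means)
```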
Hierarchical Clustering
- Initially each gene or sample is considered its own “cluster”
- Most similar clusters are combined until a single cluster remains
- Produces a clustering tree (dendrogram) showing a hierarchy of clusters
  
  Figure 5-2 Sample of Hierarchical Clustering
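Agglomerative (bottom-up) clustering can be sketched as follows. This minimal sketch assumes 1-D values (hypothetical) and average linkage, where the distance between two clusters is the mean distance between their members. Single and complete linkage are common alternatives, and the list does not specify which is used. The merge heights returned are the heights at which branches join in a dendrogram.

```python
def average_linkage(points):
    """Agglomerative clustering with average linkage on 1-D values.

    Returns a list of (members, height) in merge order; the heights are
    where branches join in the dendrogram.
    """
    clusters = [[p] for p in points]  # every item starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [abs(a - b) for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)  # mean distance between members
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        merges.append((merged, d))
        # Replace the two most similar clusters by their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


for members, height in average_linkage([0, 1, 5, 6, 20]):  # hypothetical values
    print(members, height)
```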
Gene Expression Profiling and Oncology:
- Expression profiling can help reclassify tumor types & correlate them w/physiologic properties
- 8000 individual genes examined in 60 cell lines
- Gene expression patterns correlated w/cell of origin
  