Molecular Diagnostics








SEQUENCING

Chung-Han Lee





















Classic Sanger Sequencing


Developed in 1977 by Frederick Sanger & colleagues


Required reagents


DNA template (what you want to seq)


DNA primer (known seq)


dNTPs


ddNTPs


DNA polymerase


The principle


Mixing dNTPs & ddNTPs results in random chain termination, yielding DNA fragments of different sizes → fragments are separated by size → fragments are detected w/radioactively or fluorescently labeled ddNTPs or DNA primers


DNA polymerase CAN replicate & extend seq w/dNTPs; DNA polymerase CANNOT extend seq w/ddNTPs
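The chain-termination principle above can be illustrated w/a toy simulation. This is a minimal sketch, not a model of real chemistry: the function names, the 10% ddNTP incorporation rate & the number of molecules are all assumptions chosen for illustration.

```python
import random

def sanger_fragments(new_strand, n_molecules=2000, dd_fraction=0.1, seed=0):
    """Simulate random chain termination during synthesis.

    new_strand: the sequence being synthesized (5'->3'), i.e. the
    complement of the physical template. Hypothetical input format.
    Returns a dict mapping fragment length -> labeled terminal base.
    """
    rng = random.Random(seed)
    terminal_base_by_length = {}
    for _ in range(n_molecules):
        for length, base in enumerate(new_strand, start=1):
            # Each incorporation is either a dNTP (extension continues)
            # or a ddNTP (extension stops and the fragment is labeled).
            if rng.random() < dd_fraction:
                terminal_base_by_length[length] = base
                break
    return terminal_base_by_length

def read_sequence(terminal_base_by_length):
    """Order fragments by size and read off the terminal labels."""
    return "".join(terminal_base_by_length[n]
                   for n in sorted(terminal_base_by_length))

fragments = sanger_fragments("ATGCGTAC")
print(read_sequence(fragments))
```

Because termination is random, a large enough pool of molecules contains a fragment ending at every position, so sorting by size recovers the seq.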


Limitations


Poor seq quality in first 50 bp of seq


Seq degradation after >700 bp


Known DNA seq needed to design primers


Cheap on a per sample basis


Expensive on a per genome basis


Impractical for large seq projects


Peaks represent abundance of fluorescent readout at specific nucleotide.

Each ddNTP labeled a different color (not shown)

Discrimination of nucleotide identity requires peak to be significantly higher than baseline.

Pure samples yield single peaks at specific location.

Heterogeneous mixtures will yield multiple color peaks at a specific site (eg, 50% of tumor sample carries point Mt → Mt:WT at 1:1 ratio → two different color peaks at 1:1 ratio)

Detection of low frequency point Mts is technically difficult (eg, 10% of tumor sample carries point Mt → Mt:WT at 1:9 ratio → two different color peaks at 1:9 ratio, smaller peak difficult to distinguish from baseline noise)






Figure 5-1 Example of DNA Sequencing by Sanger Method
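The legend's point about peak height vs. baseline can be sketched as a simple threshold rule. The function name, the signal units, the noise level of 40 & the 3× signal-to-noise cutoff are hypothetical values for illustration, not parameters of any real base caller.

```python
def call_bases(peak_heights, baseline_noise, min_signal_to_noise=3.0):
    """Return the bases whose peak clears the noise threshold at one site.

    peak_heights: dict of fluorescent signal per base at this position.
    baseline_noise: typical background signal, in the same units.
    Bases are returned strongest first.
    """
    threshold = baseline_noise * min_signal_to_noise
    called = {base: h for base, h in peak_heights.items() if h >= threshold}
    return sorted(called, key=called.get, reverse=True)

# Mt:WT at 1:1 -> both peaks clear the threshold (120 here).
print(call_bases({"G": 500, "A": 500, "C": 30, "T": 35}, baseline_noise=40))

# Mt:WT at 1:9 -> the minor peak (100) falls below the threshold
# and cannot be separated from baseline noise.
print(call_bases({"G": 900, "A": 100, "C": 30, "T": 35}, baseline_noise=40))
```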



















Next Generation Sequencing


The principle


Simultaneous seq of multiple short DNA (<200 bp) fragments can be done in a high-throughput manner


Overlapping seq fragments can be reassembled afterward


Depth of coverage


Measures number of times specific site has been seq


↑ depth of coverage → ↑ quality of seq data


60× coverage does not mean every site in the genome was seq 60×; it means 60× on average


Some sites >100× & some sites not seq
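To show why average coverage says little about any single site, the sketch below places reads at random on a toy genome & tallies per-site depth. Uniform random read placement, the 10,000 bp genome & the 100 bp read length are simplifying assumptions; real libraries have additional biases (eg, GC content) that widen the spread further.

```python
import random

def simulate_coverage(genome_length, read_length, mean_depth, seed=0):
    """Place reads uniformly at random and return depth at each position."""
    rng = random.Random(seed)
    # Number of reads needed to reach the requested average depth.
    n_reads = mean_depth * genome_length // read_length
    depth = [0] * genome_length
    for _ in range(n_reads):
        start = rng.randrange(genome_length - read_length + 1)
        for pos in range(start, start + read_length):
            depth[pos] += 1
    return depth

depth = simulate_coverage(genome_length=10_000, read_length=100, mean_depth=60)
print("mean depth:", sum(depth) / len(depth))
print("lowest depth at any site:", min(depth))
print("highest depth at any site:", max(depth))
```

The mean is 60× by construction, yet individual sites fall on both sides of it.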


Limitations


High initial equipment startup costs


Quality of data depends on depth of coverage


Sn for rare Mts or tumor heterogeneity depends on depth of coverage
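The dependence of Sn on depth can be made concrete w/a binomial calculation. This sketch assumes reads are independent, ignores seq error & uses a hypothetical rule that a variant is reported only when at least 3 reads carry it.

```python
from math import comb

def detection_probability(depth, variant_fraction, min_variant_reads):
    """P(at least min_variant_reads of `depth` reads carry the variant)."""
    # Sum the binomial probabilities of seeing too few variant reads.
    p_miss = sum(
        comb(depth, k)
        * variant_fraction ** k
        * (1 - variant_fraction) ** (depth - k)
        for k in range(min_variant_reads)
    )
    return 1.0 - p_miss

# A Mt present in 5% of reads, assayed at increasing depth.
for depth in (30, 100, 300):
    p = detection_probability(depth, variant_fraction=0.05, min_variant_reads=3)
    print(f"{depth}x coverage: P(detect) = {p:.3f}")
```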




Next Generation Sequencing: Other Concepts



  • Seq by synthesis



    • DNA is extracted → fragmented → attached to a surface → amplified


    • Labeled nucleotides are added & detected w/a surface scanner


    • Labeled nucleotides either replaced by nonlabeled nucleotide or label is removed


    • Process is repeated w/new labeled nucleotides


    • Seq will become out of phase w/each successive round → ↓ accuracy + signal


  • Seq by ligation



    • DNA is extracted → fragmented → adapters are added & attached to beads & surface → DNA amplified


    • 8-mers w/degenerate primers attached to template



      • Base 1-2: Specific nucleotides that match template


      • Base 3-5: Degenerate nucleotides that can pair w/any template base


      • Base 6-8: Degenerate nucleotides + fluorophore


    • Detection of fluorophore


    • Cleavage & removal of bases 6-8 of 8-mer


    • Repeat w/new 8-mers × 5-7 cycles


    • Remove all annealed 8-mers & repeat process but offsetting start site by 1 position


    • End result: 35 bp seq twice (7 extensions w/8-mers × 5 different start sites × 2 nucleotides assayed each time = 70 base assays)


    • Each template nucleotide is assayed twice, once as position 1 & once as position 2 of an 8-mer (Nat Biotechnol 2009;27:1013)


Approaches and Clinical Application of Next Generation Sequencing

















Approach: Large scale exome seq for discovery of new Mts in CA

Example:

  • VHL Mt most common Mt in ccRCC, but insufficient for carcinogenesis

  • Large scale exome seq of 3544 genes in 101 ccRCC

  • VHL Mt 55%

  • SETD2, KDM5C, KDM6A Mt total ~15% (Nature 2010;463:360)

Approach: Discovery cohort → targeted seq

Example:

  • Targeted exome seq of 7 ccRCC tumors

  • PBRM1 Mt in 4/7 tumors

  • Targeted seq of 257 ccRCC tumors → Mt in 88/257 (34%) (Nature 2011;469:539)

  • Whole exome seq of 7 ccRCC tumors w/paired nl → identification of BAP1 Mt

  • Targeted seq of 176 tumors, BAP1 Mt 14% (Nat Genet 2012;44:751)



Issues illustrated

3 NGS seq projects → 3 sets of Mts discovered

Whole genome seq ≠ whole exome seq

Whole exome seq ≠ targeted exome seq

Cost → use of next generation sequencing (NGS) for discovery → targeted seq (single gene) for validation


Next Generation Sequencing: Challenges



  • Clinical:



    • How do we determine what info is clinically relevant? How do we interpret the data that are obtained? Can we predict natural hx, response to Rx, or resistance? How do we deal w/tumor heterogeneity? Are met sites genetically different from primary sites? Does selective pressure by prior Rx change a tumor’s genetic profile? Is early detection of resistance genes clinically relevant?


  • Logistic:



    • Which seq technology do we use? What depth of coverage is necessary? What turnaround time for the assay is necessary to be clinically useful? How do we store the massive amounts of data generated? Who is in charge of safeguarding the info? How do we reduce the costs of the assay? How do we detect Mts that develop w/selective pressure?


  • Ethical:



    • How do you deal w/informed consent? How do you deal w/incidental findings?


    • Who has the rights to the info? How do we protect pt privacy?



GENE EXPRESSION PROFILING

Chung-Han Lee


Gene Expression Profiling



  • Goal: Get a global understanding of cellular functions by the simultaneous measurement of thousands of genes


  • Rationale:



    • DNA is shared among multiple cell types; however, cell fates & physiology vary


    • Protein expression & post-translational modification ultimately decide physiology; however, global assessment of protein levels & states remains impractical


    • Measuring RNA levels can serve as a proxy for measuring proteins


  • Assumptions:



    • RNA levels correlate w/protein expression (ie, amount of mRNA transcribed is proportional to amount of protein translated)


    • Protein levels drive activity more than post-translational modification does


    • Identified RNA → unique protein (ie, assay distinguishes between alternative splice forms)


  • Basic Experimental Design:



    • Two or more experimental conditions are designed


    • Samples from the experimental conditions are assayed


    • Statistical comparisons are made between expression levels of thousands of genes


    • Individual genes or clusters of related genes are used to define the effects of the experimental conditions


  • Limitations:



    • Expense of assay limits numbers of samples → limits statistical power


    • Different genes may have different thresholds for significant changes (eg, changes in PTEN mRNA levels more significant than changes in actin mRNA levels)


    • For tightly regulated proteins, constant mRNA levels may not mean constant protein levels


    • Proteins in signal transduction pathways exist in on/off states related to post-translational modification, localization, & binding partners; mRNA levels give no info regarding those states


DNA Microarrays



  • AKA: Gene chips


  • The principle:



    • Probes = short unique seq of DNA designed to bind specific cDNA or mRNA


    • Targets = cDNA or mRNA from genes of interest


    • A microfluidic chip is designed w/DNA probes placed at known locations


    • Samples are placed on chip to allow probes to capture targets


    • Binding of probes & targets is assayed

























1 Channel Arrays vs. 2 Channel Arrays


Definitions


  • 1 channel: Samples are labeled w/a single color & run individually on a chip


  • 2 channel: 2 samples are labeled w/2 colors & run together on a chip


Benefits


  • Errors in 1 sample do not affect other samples


  • Easier to compare between samples


  • Cheaper cost/sample


Examples


  • Agilent: dual mode platform


  • Illumina: bead chip



RNA-Seq



  • AKA: Whole transcriptome seq


  • The principle:



    • Build a cDNA library: coding RNAs are identified by their 3' polyA tail, ribosomal RNA is removed by collecting only RNAs containing a polyA tail, & reverse transcription generates cDNA


    • cDNA is seq using next gen. seq technology


Data Analysis

Controversial & still subject to research

Fold change as cutoff: Easiest, but arbitrary & lacks biologic rationale

Statistical testing such as ANOVA: Complicated by the large number of genes involved (eg, at p-value < 0.01, examining 10000 genes → 100 significant by chance alone)

Q-value: Analogue of p-value under false discovery rate (FDR) control, helps balance tradeoff between power & error (FDR procedure proposed by Yoav Benjamini & Yosef Hochberg; q-value introduced by John Storey)
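The multiple-testing problem & one common correction can be sketched in a few lines. The first function reproduces the arithmetic in the example above; the second is a plain-Python version of the Benjamini-Hochberg step-up adjustment, whose output is often reported loosely as q-values. Function names & the example p-values are illustrative assumptions.

```python
def expected_false_positives(n_tests, alpha):
    """Tests expected to pass by chance alone when no gene truly differs."""
    return n_tests * alpha

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(expected_false_positives(10_000, 0.01))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene is then called significant when its adjusted value is below the chosen FDR (eg, 0.05), which bounds the expected fraction of false calls among the genes reported rather than among all genes tested.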




  • Principal Components Analysis (PCA)



    • Reduces dimensions of analysis by removing or consolidating data


    • Chooses subset of “independent” variables


    • Assumes variables w/low variance yield little info & discards them


    • Principal component is normalized linear combination of original variables


    • Lossy & simplifies data for quick comparison of samples
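As a minimal sketch of the idea, the code below finds only the first principal component, using power iteration on the covariance matrix in plain Python. This is an assumption made to keep the example dependency-free; real analyses would normally use a numerical library & compute several components. The toy data (4 samples × 3 genes) are hypothetical.

```python
def first_principal_component(rows, n_iter=200):
    """rows: list of samples, each a list of expression values.

    Returns (loadings, scores): a unit-length weight per variable and
    the projection of each sample onto that direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between variables (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the direction of largest variance.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Gene 2 is always twice gene 1; gene 3 never varies.
samples = [[1, 2, 0], [2, 4, 0], [3, 6, 0], [4, 8, 0]]
loadings, scores = first_principal_component(samples)
print(loadings)
print(scores)
```

The loadings form the normalized linear combination described above: the constant gene receives zero weight, & the 3 original variables collapse to a single score per sample.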


  • Self-Organizing Maps



    • Nonlinear generalization of principal components analysis (PCA)


    • Originally adapted from unsupervised neural network learning algorithms


    • Samples compete to become the most representative sample for each variable


    • Samples organize by similarity to the representative samples
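A self-organizing map can be sketched as a short training loop. The version below follows the common formulation, assumed here, in which a row of map nodes competes to best match each sample & the winner's neighbors are pulled along w/it. The 1-D map, 4 nodes, learning-rate schedule & toy data are all illustrative assumptions.

```python
import random
from math import exp

def best_matching_unit(nodes, sample):
    """Index of the node whose weight vector is closest to the sample."""
    return min(range(len(nodes)),
               key=lambda i: sum((w - x) ** 2 for w, x in zip(nodes[i], sample)))

def train_som(samples, n_nodes=4, n_epochs=50, seed=0):
    """Train a 1-D self-organizing map; returns the node weight vectors."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*samples)]
    highs = [max(col) for col in zip(*samples)]
    # Spread the initial nodes evenly between the per-variable extremes.
    nodes = [[lo + (hi - lo) * i / (n_nodes - 1) for lo, hi in zip(lows, highs)]
             for i in range(n_nodes)]
    for epoch in range(n_epochs):
        decay = 1 - epoch / n_epochs            # shrinks from 1 toward 0
        learning_rate = 0.5 * decay
        radius = max(n_nodes / 2 * decay, 0.5)  # neighborhood width on the map
        for sample in rng.sample(samples, len(samples)):
            winner = best_matching_unit(nodes, sample)
            for i, weights in enumerate(nodes):
                # Nodes near the winner on the map move along with it.
                influence = exp(-((i - winner) ** 2) / (2 * radius ** 2))
                for j, x in enumerate(sample):
                    weights[j] += learning_rate * influence * (x - weights[j])
    return nodes

samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [9.8, 10.0], [10.0, 9.9], [9.9, 10.1]]
nodes = train_som(samples)
print([best_matching_unit(nodes, s) for s in samples])
```

After training, each sample is assigned to its best-matching node, so similar samples end up on the same or adjacent nodes.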


  • K-means Clustering



    • Investigator identifies number of clusters before clustering (parameter k)


    • k number of means are chosen


    • Genes & samples clustered based on distance from mean


    • Means are recalculated based on new clustering


    • Process is repeated until results converge & clusters are stable


    • k choice is critical for correct clustering; however, difficult to predict
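The steps above map directly onto a short loop. This is a minimal sketch in plain Python; the starting means are passed in explicitly (an assumption made for reproducibility, since implementations usually pick them at random), & the toy 2-D points stand in for expression profiles.

```python
def k_means(points, initial_means, max_iter=100):
    """Cluster points around k means; k is the number of initial means.

    Returns (assignment, means): a cluster index per point and the
    final mean of each cluster.
    """
    means = [list(m) for m in initial_means]
    assignment = None
    for _ in range(max_iter):
        # Assign each point to the nearest mean (squared Euclidean distance).
        new_assignment = [
            min(range(len(means)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, means[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: clusters are stable
        assignment = new_assignment
        # Recalculate each mean from its current members.
        for c in range(len(means)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old mean if a cluster is empty
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means

points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (5, 5), (5.1, 4.9), (4.8, 5.2)]
assignment, means = k_means(points, initial_means=[(0, 0), (6, 6)])
print(assignment)
print(means)
```

Rerunning w/a different number of initial means changes the partition, which is why the choice of k matters.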


  • Hierarchical Clustering



    • Initially each sample (or gene) is considered its own cluster


    • Most similar clusters are combined until a single cluster remains


    • Produces a clustering tree (dendrogram) showing a hierarchy of clusters






      Figure 5-2 Sample of Hierarchical Clustering
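The merge-until-one-cluster procedure can be sketched as follows. Average linkage & Euclidean distance are assumptions here, since the text does not specify how similarity between clusters is measured; the recorded merge order & distances are what a dendrogram draws.

```python
def hierarchical_clustering(points):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are tuples of indices into `points`.
    """
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    def average_linkage(a, b):
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(points))]  # every sample starts alone
    merges = []
    while len(clusters) > 1:
        # Find the most similar (closest) pair of clusters.
        a, b = min(
            ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=lambda pair: average_linkage(*pair),
        )
        merges.append((a, b, average_linkage(a, b)))
        # Replace the pair with the combined cluster.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in hierarchical_clustering(points):
    print(a, "+", b, f"at distance {d:.2f}")
```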


  • Gene Expression Profiling and Oncology:

Jun 19, 2016 | Posted in ONCOLOGY
