Since the discovery of V(D)J recombination as the process that assembles the germline antigen receptor gene segments into functional genes, one major question has been the identity of the enzymatic machinery catalyzing this complex set of reactions. Genetic and biochemical work by a large number of laboratories led to the identification of a total of 13 different proteins that have been shown to be directly involved in V(D)J recombination: RAG1, RAG2, HMG1, Ku70, Ku80, DNA-PKcs, Artemis, pol µ, pol λ, TdT, XRCC4, Cernunnos/XRCC4-like factor (XLF), and DNA ligase IV. The only lymphoid-specific factors are RAG1, RAG2, and TdT; all others are ubiquitously expressed, and this feature allows investigators to study aspects of V(D)J recombination by ectopically expressing the RAG proteins in nonlymphoid cells. A recent biochemical tour de force study showed that coding joint formation could be recapitulated in vitro using artificial recombination substrates and highly purified preparations of all 13 proteins.39 The respective coding joints showed all of the features typically observed in vivo (nucleotide deletion, N nucleotide, and P nucleotide addition), suggesting that most, if not all, of the factors involved in the coding end processing steps of V(D)J recombination have been identified. In contrast, signal joint formation was not observed. This step seems to require the removal of the RAG proteins after the cleavage reaction and is likely to require additional factors as yet unidentified.
Recombination Activating Gene Proteins: Mediators of Early Steps in V(D)J Recombination
A major advance in the investigation of V(D)J recombination was the identification of two genes whose products are critical for this process in the B and T cell lineages. In the pioneering experiments, Schatz and Baltimore40 stably transfected fibroblasts with a construct containing a selectable marker whose expression was dependent on V(D)J recombination; as expected, no measurable recombination occurred in this nonlymphoid cell. However, when either human or murine genomic DNA was transfected into these fibroblasts, a small fraction of recipient cells stably expressed recombinase activity, activating the selectable marker. This suggested that a single transfected genomic DNA fragment was able to confer recombinase activity in a fibroblast. (Presumably the fibroblast contained endogenous copies of the same genes, but their expression was repressed by mechanisms that could not repress the transfected genes.) This active fragment was cloned and turned out to contain two closely linked genes, designated RAG1 and RAG2, respectively. Both RAG1 and RAG2 are essential for recombination; therefore, these genes would not have been discovered by this transfection technique if they had not been closely linked in the genome. The genes are notable for having no introns splitting up their open reading frame in most species, and for their opposite transcriptional orientation in all species examined.
A crucial role for the RAG genes in V(D)J recombination was supported by the conservation of these genes in all jawed vertebrate species analyzed thus far, from shark through man. RAG1 and RAG2 are expressed together in developing B and T cells, specifically at the stages at which V(D)J recombinase activity is required for the assembly of Ig and TCR genes. Moreover, mouse strains in which either gene has been eliminated by homologous recombination (gene “knockouts”) have no mature B or T cells, as the result of their inability to initiate V(D)J recombination.41,42
Similarly, a subset of human patients with SCID syndrome characterized by the complete absence of T and B lymphocytes have been found to have null mutations in RAG genes.43
Patients with hypomorphic alleles often have a complex set of features (oligoclonal T cells, hepatosplenomegaly, eosinophilia, decreased serum Ig but elevated IgE) known as the Omenn syndrome, which can also be caused by defects in other genes involved in V(D)J recombination. Interestingly, the same RAG mutation in different patients can cause either Omenn syndrome or SCID, depending on unknown factors.44
RAG1 shows intrinsic binding affinity for the RSS nonamer sequence via its nonamer binding domain even in the absence of RAG2. Exhaustive mutational analysis has revealed that RAG1 contains the catalytic center of the RAG complex, composed of three amino acids critical for all enzymatic activity: D600, D708, and E962.45,46
RAG2, on the other hand, serves as a regulatory cofactor; it has no intrinsic binding affinity for RSSs, but once bound to RAG1 improves the strength and specificity of RAG1 RSS contacts.47,48
It also enhances RAG activity on chromosomal substrates, and it restricts V(D)J recombination to the G0/G1 stage of the cell cycle (both features are discussed below).
Attempts to determine the molecular role of the RAG proteins in cell-free recombination assays were initially hampered by poor solubility of the proteins, but functional analyses of truncated RAG genes (using RAG expression vectors cotransfected into fibroblasts along with recombination substrate plasmids) revealed that surprisingly large segments of both proteins could be deleted without eliminating recombinase activity, and some of the remaining core regions were soluble and could be handled relatively easily in experiments. This work allowed the demonstration that in a cell-free in vitro system, core regions of the two RAG proteins together are capable of carrying out cleavage of substrate DNAs as well as hairpin formation on the coding end.49
The RAG-mediated cleavage occurs in two steps: first a nick is introduced on the top strand between a gene segment and the adjacent heptamer (see Fig. 6.9), then the 3′-hydroxyl group participates as the nucleophile in a direct transesterification reaction to attack the phosphodiester bond adjacent to the heptamer on the bottom strand (see Fig. 6.9), yielding a DNA hairpin structure on the coding end and a new 3′-hydroxyl group on the 3′ end of the bottom heptamer strand.50
After DNA cleavage, the RAG proteins remain in a complex with the DNA ends and facilitate aspects of the joining phase. Mutant forms of RAG1 or RAG2 have been reported that are competent for cleavage but show impairment in coding or signal joint formation.51
While nicking can occur asynchronously at the 12-RSS and 23-RSS, hairpin formation is “coupled” and occurs synchronously at both RSSs. In vitro, coupled cleavage requires only the RAG proteins, HMG1/2 (discussed below), and Mg2+ as the divalent metal ion in the reaction buffer. In vivo, DNA dsb formation at an individual RSS is dangerous as it could give rise to translocations, and it is thought that Mg2+ promotes an optimal molecular “architecture” for controlled V(D)J recombination. In vivo experiments indeed suggest that RAG proteins may bind to and introduce a nick at a single 12-RSS, but do not complete DNA cleavage until a matching 23-RSS is captured into the RAG-RSS complex.52
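The distinction between uncoupled nicking and coupled hairpin formation can be written down as a small rule check. The sketch below is only an illustrative abstraction of the behavior described above; the class name RSS and the two functions are hypothetical names chosen for this example, and the model ignores sequence quality, chromatin, and cofactors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSS:
    """Hypothetical record for one recombination signal sequence."""
    name: str
    spacer: int  # spacer length in bp, expected to be 12 or 23


def can_nick(rss: RSS) -> bool:
    # Nicking may occur at a single RSS, without a synapsed partner.
    return rss.spacer in (12, 23)


def can_form_hairpins(a: RSS, b: RSS) -> bool:
    # Coupled cleavage: in this simplified model, hairpins form only when
    # the complex holds exactly one 12-RSS and one 23-RSS.
    return {a.spacer, b.spacer} == {12, 23}
```

In this toy model a lone 12-RSS passes can_nick, but can_form_hairpins stays false until a 23-RSS partner is supplied, mirroring the observation that cleavage is completed only after a matching RSS is captured.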
In addition to the “classical” activities of RAG proteins on DNA segments containing RSSs, these proteins can also catalyze DNA strand cleavage on “nonstandard” substrates.
Transposition. In vitro, purified recombinant core RAG proteins can catalyze the excision and insertion of a DNA fragment with signal ends into foreign DNA, acting as a transposase.53,54
This property provides additional support for the early speculation that the V(D)J recombination system may have originated by insertion of a transposon-like DNA fragment encoding RAG genes (and bearing RSSs at its ends) into a primordial antigen receptor gene, thereby generating a pair of separated V and J gene segments. This model of the origin of V(D)J recombination is consistent with the many mechanistic similarities at the molecular level between Ig gene rearrangements and transposition,55 and with the recent identification of the Transib transposase family, which shows striking sequence similarity to RAG1 and is widespread in insect, echinoderm, helminth, coelenterate, and fungal genomes.56
The recent finding of an apparent homolog of the entire RAG1 and RAG2 gene locus in a sea urchin genome suggests that the two RAG genes may have entered the genome of a common ancestor of all deuterostomes far earlier than the Ig-/TCR-based adaptive immune system developed.57
It remains unclear whether the primordial RAG transposon encoded solely RAG1 (which would then have integrated next to the primordial RAG2 gene) or both RAG1 and RAG2. The transposase activity of RAGs, however, seems to be almost completely suppressed in vivo, and the C-terminus of RAG2 may have evolved to control this potentially deleterious activity.51,58,59,60
VH replacement. As mentioned previously, recombination events can occur between a VH 23-RSS and a cryptic RSS within rearranged VH coding sequences. An in vitro model suggests that in VH replacement, the RAG proteins nick both DNA strands without forming a hairpin coding end.61 Whether this is indeed a completely different activity is unclear.
Translocations at non-RSS sequences. The RAG complex also generates two nicks to cleave within the major breakpoint region of the Bcl2 gene. This 150-bp segment is the target of a common RAG-catalyzed translocation between the IgH locus and the Bcl2 gene occurring in most follicular lymphomas. In this segment, there are no RSSs, and the RAG proteins recognize an unusual sequence-dependent DNA conformation different from the normal B-form double helix.62
Although the “core” RAG proteins have been useful for elucidating the molecular mechanism of the cleavage step of V(D)J recombination in biochemical studies, it is clear that the “noncore” portions of each protein confer important functions, as expected from their sequence conservation across species. Broadly speaking, the “noncore” regions ensure regulated and efficient recombination on the physiological substrates (i.e., imperfect RSSs deviating from the perfect consensus heptamer and nonamer) in the context of chromatin. The functions of the “noncore” regions have largely been inferred by comparing V(D)J recombination products from cells expressing core RAG proteins versus full-length versions, and more recently by in vitro studies using full-length RAG proteins that are now available for such analyses.
The C-terminal region of RAG2 has multiple functions and is important for achieving normal numbers of B- and T-lymphocytes in vivo, for the formation of precise signal joints during IgH recombination,64 and for protecting against RAG-mediated DNA transposition.51,65
These functions are thought to be conferred, at least in part, by a plant homeo domain (PHD) zinc finger fold that is formed by amino acids 414 to 487 in murine RAG2. This PHD domain binds specifically to the tails of histone H3 that are trimethylated at lysine 4 (H3K4Me3),66,67,68 a histone modification that is associated with “open” chromatin and that is uniquely present on “accessible” RSSs in Ig loci (discussed below). In vitro studies suggest that the binding of the RAG2 PHD domain to histone tails causes a conformational change that increases the catalytic activity of the RAG complex.69
Furthermore, the RAG2 C terminus regulates RAG2 protein levels—and hence V(D)J recombinase activity—across the cell cycle to prevent dsbs during DNA synthesis or mitosis, when such breaks could lead to chromosomal deletions.32
RAG1 protein and messenger RNA (mRNA) transcript levels of both RAG genes vary little across the cell cycle, but phosphorylation of RAG2 at Thr490 by the cyclin-dependent kinase cdk2 mediates its destruction via ubiquitination and proteasomal degradation during S phase.70
Mice expressing RAG2 with a T490A mutation (which cannot be phosphorylated) showed RAG2 protein and dsbs throughout the cell cycle, demonstrating the importance of the RAG2 degradation signal in cell-cycle control of V(D)J recombination.71,72
The N-terminal noncore region of RAG1 is required in vivo for optimal RAG1 activity and for the formation of precise signal joints in D-J recombination.64
This region of RAG1 contains a RING finger domain that seems to be required for ubiquitination of several proteins, including histone H3.73
Apart from the obvious importance of the RAG proteins in understanding the initial steps of V(D)J recombination, knowledge of these proteins and their genes has allowed two major technical advances that have opened the way to many additional experiments. First, various nonlymphoid cell lines with known defects in various DNA repair genes have been transfected with the RAG genes to identify genes involved in V(D)J recombination (these factors are described below). Second, availability of the RAG1 and RAG2 knockout mice has been instrumental in a large number of immunology studies. These mice completely lack functional B cells or T cells, and are not “leaky” like SCID mice, which develop some functional B and T cells, especially as the animals age. Thus the RAG-deficient mice can be used to study the importance of the “innate” immune system (i.e., responses that occur in the absence of antigen-specific lymphocytes) in particular immune responses. They can also be used as recipients for various lymphocyte populations to explore the roles of different cell types. They can be transfected with transgenes encoding specific Ig genes to study the roles of specific antibodies in B cell development and in immune responses. Finally, they can be used in “RAG complementation” experiments designed to assess the phenotype, in lymphocytes, of various other gene knockouts.74
In RAG complementation, embryonic stem cells in which the gene of interest has been knocked out by homologous recombination are injected into homozygous RAG2 knockout (RAG2-/-) blastocysts. This procedure yields chimeric mice in which all B and T cells derive from the embryonic stem cells deleted for the gene of interest, as these are the only source of intact RAG genes to support lymphocyte development. Such animals can be made more easily than a knockout mouse line, and can be used to study the effect of gene deletion in lymphocytes independent of effects the deletion may have in other cells. In particular, for cases where the gene knockout causes embryonic lethality due to effects on nonlymphoid cells, RAG complementation allows the selective knockout in lymphocytes to be studied in the background of normal gene expression in nonlymphoid cells.
Nonhomologous End Joining Components
The RAG proteins are the essential lymphocyte-specific factors in the DNA cleavage phase of V(D)J recombination, but DNA repair factors that are part of a DNA repair pathway known as nonhomologous end joining (NHEJ) are essential for the joining phase. NHEJ is the major pathway for repair of dsbs (such as those induced by ionizing radiation or reactive oxygen species) during the G0-G1 phases of the cell cycle. (In the S and G2 phases, the additional chromatid genome copy enables breaks to be repaired by homologous recombination.) The seven classical core components of NHEJ are Ku70, Ku80, DNA-PKcs, XRCC4, DNA ligase IV, Artemis, and Cernunnos/XLF, but additional proteins play a role in some models of NHEJ.
The DNA-PK Complex. The first gene for an NHEJ component to be recognized as participating in V(D)J recombination was the SCID gene. This gene was originally identified as being mutated in the scid mouse strain that is immunodeficient due to a marked impairment in V(D)J recombination of both Ig and TCR genes. Lymphocytes from scid mice are able to perform the RAG-mediated cleavage reaction, and can also form signal joints, but are markedly defective in coding joint formation. Subsequently, it was found that the scid mutation also impairs NHEJ, causing radiosensitivity.
The gene mutated in the scid mouse strain encodes DNA-PKcs, a large protein (460 kD) with a kinase domain near its C terminus that is related to phosphoinositide-3-kinase (PI3K). This kinase is DNA-dependent and represents the catalytic subunit (hence “cs”) of a heterotrimer known as the DNA-PK complex. The other components are Ku70 and Ku80 (also referred to as Ku86), which were originally identified as the autoantigens recognized by a patient antiserum (Ku was the coded name of the patient, and the numbers refer to the approximate size of the proteins, 70 kD and 80 to 86 kD, respectively). Together, these two very abundant proteins form a heterodimer that binds to the ends of double-stranded DNA independent of the nucleotide sequence of the DNA. The DNA-Ku complex can then recruit DNA-PKcs and activate autophosphorylation of this protein.78 In vitro activation of DNA-PKcs was found to be efficient when DNA ends either were at high concentration or, if at low concentration, were on DNA fragments long enough to circularize readily. In contrast, when the DNA-PKcs was located on the ends of DNA fragments too short to circularize (and too dilute for efficient intermolecular interactions with other DNA ends), the DNA-PKcs activation was much reduced. These observations suggest that kinase activation can occur only after two DNA ends are brought together by DNA-PKcs in “synapsis.”79,80 Further phosphorylation of DNA-PKcs inactivates the protein and may prepare it for removal once DNA ends have been sealed.
Ku genes are highly conserved through evolution, and homologs are even found encoded in the genome of some bacteria, consistent with a function in general NHEJ not restricted to V(D)J recombination. While mice with a targeted deletion of DNA-PKcs resemble the original scid mutation (i.e., defective coding but functional signal joint formation81,82), Ku70 and Ku80 mutant cell lines are defective in both signal and coding joint formation, and Ku70- and Ku80-deficient mice exhibit a complete block in B- and T-cell development due to their inability to undergo V(D)J recombination.83,84,85
DNA Ligase IV and XRCC4.
An important role of the activated Ku-DNA-PKcs complex is to recruit the additional components of NHEJ. One such component is DNA ligase IV, which is recruited to the Ku complex and activated by the protein XRCC4.86,87
The evidence suggests that DNA ligase IV is the essential ligase that joins DNA ends in V(D)J recombination and NHEJ. Human patients with ligase IV deficiency (characterized by hypomorphic alleles) have a severe phenotype including chromosomal instability, developmental and growth retardation, radiosensitivity, and immunodeficiency with a T-B-NK+ phenotype.88
The rare DH junctions detected show extensive nucleotide deletion consistent with delayed ligation and prolonged exonuclease digestion.89
In mice, disruption of either the XRCC4 or the DNA ligase IV gene causes embryonic lethality associated with neuronal apoptosis. Crossing these mice with p53 mutants does not improve V(D)J recombination, but rescues the mice from embryonic lethality, suggesting that neuronal cells may be unusually susceptible to p53-triggered apoptosis induced by normal low-level DNA damage during brain development; a similar mechanism may explain the severe human phenotype.90
DNA ligase IV is the only NHEJ component absolutely required to join compatible sticky DNA ends in vitro, though XRCC4 can stimulate this activity significantly.87
The next NHEJ component was independently discovered by two laboratories. One group used yeast two-hybrid screening to search for proteins interacting with XRCC4.91
The other group searched for the gene causing a syndrome of T and B lymphocytopenia, increased radiosensitivity, and microcephaly in a Turkish family; these investigators used functional cDNA rescue of a patient’s cell line from a radiomimetic drug to identify the gene.92
The protein identified by both groups is a 299 amino acid nuclear protein, which was named Cernunnos or XLF. The protein has a predicted secondary structure similar to that of XRCC4, to which it binds in cells,93 as expected from its isolation via two-hybrid screen. When Cernunnos/XLF-deficient fibroblasts were transfected with RAG genes and a recombination substrate, imprecise signal joining was observed, similar to the defect in patients with hypomorphic DNA ligase IV mutations. These experiments all suggest a role for Cernunnos/XLF linked to the function of XRCC4 and ligase IV.
The coding ends generated by RAG cleavage cannot be directly ligated because of their hairpin structure, and therefore V(D)J recombination requires a single-strand endonuclease activity to cleave the hairpins. This activity is conferred by the protein named Artemis, which was discovered through positional cloning of the genetic defect in a group of human SCID patients with defects in V(D)J recombination and increased radiation sensitivity.94
Patients with homozygous null mutations of Artemis survive (no embryonic lethality) and show sensitivity to γ irradiation as well as defects in coding joints, while signal joint formation is normal. Hypomorphic Artemis mutations can cause features of the Omenn syndrome similar to those observed with hypomorphic RAG gene mutations.95
Purified recombinant Artemis protein has an intrinsic exonuclease activity in vitro; however, when complexed with DNA-PKcs in the presence of DNA ends, it gains a single-strand endonuclease activity and, in an ATP-dependent step, becomes phosphorylated at multiple sites in the C-terminal region of the protein.96,97 The Artemis endonuclease can cleave synthetic and RAG-generated hairpin ends as well as other single-stranded DNA near a transition to double-stranded DNA.98
DNA Polymerase X Family Members.
If a hairpin opening leaves blunt ends or complementary sticky ends (like the ends generated by many restriction enzymes), in vitro joining experiments suggest that these ends can be joined by ligase IV without any additional processing.99 However, as Artemis probably opens most hairpins to leave noncomplementary DNA overhangs, further processing of DNA ends generally occurs before ligation completes the recombination. This processing may include further nuclease digestion (by Artemis or exonucleases) and apparently also involves variable DNA extension by three DNA polymerases (polymerase λ, polymerase µ, and terminal deoxynucleotidyl transferase [TdT]), all of which are members of the polymerase X family. Interestingly, all three proteins contain a Brca1-C-terminus domain, which is thought to confer binding to Ku.100
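The geometry of hairpin opening, and why an off-center opening produces a self-complementary overhang (the source of P nucleotides), can be illustrated with a short string model. This is a minimal sketch under simplifying assumptions: the coding end is represented only by its top strand, the hairpin is modeled as the top strand followed directly by its reverse complement, and the function name open_hairpin and its offset parameter are hypothetical names introduced for this example.

```python
# Complement table for uppercase DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def open_hairpin(top: str, offset: int) -> tuple[str, str, str]:
    """Open a hairpinned coding end `offset` nucleotides past the tip.

    `top` is the top strand of the coding end, written 5'->3' and ending
    at the hairpin tip. Returns (new_top, new_bottom, p_nucleotides),
    each written 5'->3'.
    """
    if not 0 <= offset <= len(top):
        raise ValueError("offset must lie within the coding end")
    # One continuous strand runs down the top strand, around the tip,
    # and back along the bottom strand.
    hairpin = top + reverse_complement(top)
    cut = len(top) + offset
    new_top = hairpin[:cut]       # top strand plus nucleotides past the tip
    new_bottom = hairpin[cut:]    # shortened bottom strand
    p_nucleotides = new_top[len(top):]
    return new_top, new_bottom, p_nucleotides
```

With offset 0 the model returns a blunt end and no P nucleotides. With a positive offset, the extra nucleotides on the top strand are the reverse complement of the last nucleotides of the original coding end, so the resulting 3′ overhang is palindromic.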
Terminal Deoxynucleotidyl Transferase and N Regions.
TdT, the primary source of untemplated “N region” additions in VDJ junctions, is an enzyme uniquely expressed in the thymus and bone marrow; in the B lineage, it is expressed almost exclusively in pro-B cells. It catalyzes the nontemplated addition of nucleotides to the 3′ end of DNA strands. Though no template determines the nucleotides added, the enzyme adds dG residues preferentially, consistent with N region sequences observed in VDJ joints. Both TdT expression and N nucleotide addition are characteristically absent from fetal lymphocytes.101
N region addition is common in H chain genes (recombined in pro-B cells) but rare in murine L chain genes (recombined in pre-B cells), though perhaps somewhat less rare in human.102
This is consistent with the observation that in mice the expression of a µ H chain may downregulate TdT expression,103 contributing to the reduced level during the stage of L chain recombination.
Lymphocytes with engineered defects in their TdT genes produced rearranged Ig V regions with almost no N additions. Conversely, when TdT expression was engineered in cells undergoing κ or λ L chain rearrangement, the level of N nucleotide addition to these coding joints was dramatically increased. Furthermore, mice engineered to undergo premature Vκ-Jκ joining in pro-B cells show an increased frequency of N region nucleotides in their recombined Vκ genes.104
These results suggest that the low frequency of N region sequences in normal κ or λ recombinations is caused by the reduced levels of TdT at this stage of B-cell development (see following discussion).
The absence of N region addition in TdT mutant mice, as well as in normal fetal lymphocytes, is associated with an increase in the frequency of recombination junctions with microhomologies. These are short stretches of nucleotides that are present close to the end of both germline gene segments involved in the recombination event. These junctions suggest a joining intermediate in which the complementary single-stranded regions from the two coding ends hybridize to each other, much as “sticky ends” generated by restriction endonucleases can facilitate ligation of DNA fragments. This alternative joining pathway may restrict the diversity of neonatal antibodies; the resulting antibodies are possibly enriched in specificities for commonly encountered pathogens, or have broadened specificity, as has been reported for TCRs lacking N regions.105
Decreased N region nucleotides and a high incidence of homology-mediated recombination have also been found in the rare coding joints formed in Ku80-/- mice, consistent with a role for Ku in recruiting TdT or supporting its action.106
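The idea of a microhomology at a junction can be made concrete with a small sequence comparison. The sketch below is an illustrative simplification rather than a junction-analysis method: it considers only homology located exactly at the two ends (no prior trimming), both ends are written as top-strand sequence, and the function names and the max_len parameter are hypothetical.

```python
def terminal_microhomology(left_end: str, right_end: str, max_len: int = 6) -> str:
    """Return the longest sequence (up to max_len nucleotides) that is both
    a suffix of left_end and a prefix of right_end; "" if there is none."""
    limit = min(max_len, len(left_end), len(right_end))
    for k in range(limit, 0, -1):
        if left_end[-k:] == right_end[:k]:
            return left_end[-k:]
    return ""


def join_with_microhomology(left_end: str, right_end: str, max_len: int = 6) -> str:
    """Join two coding ends, keeping any shared terminal nucleotides once."""
    shared = terminal_microhomology(left_end, right_end, max_len)
    return left_end + right_end[len(shared):]
```

In a junction produced this way, the shared nucleotides cannot be assigned to either germline segment, which is the signature by which microhomology-mediated joints are recognized in sequence data.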
Polymerase µ and Polymerase λ.
Polymerase µ and polymerase λ are ubiquitously expressed polymerases. Both readily fill in single-strand gaps in DNA and apparently participate in V(D)J recombination by filling in single-strand 3′ overhangs generated by asymmetric hairpin opening. Without this filling in, such overhangs might be resected by nucleases. Indeed, when in vitro NHEJ reconstitution experiments are performed using purified proteins and DNA fragments with overhanging ends, the omission of polymerase µ or polymerase λ increases the deletional trimming at junctions.100
Similar excessive deletions at VDJ junctions are observed in mice lacking polymerase µ or polymerase λ. Remarkably, however, polymerase µ knockout mice show abnormalities only in their L chains,107 whereas the deletions in polymerase λ knockouts are restricted to their H chains.108
This selectivity may be explained by corresponding changes in the relative mRNA levels for these two polymerases at different stages of B-cell development.
Other Participants in V(D)J Recombination
DNA Damage Response Factors. In eukaryotic cells, DNA breaks initiate signals that halt cell division, induce DNA repair, and in some cases trigger apoptosis. Several proteins apart from NHEJ components can be detected at DNA breaks induced by V(D)J recombination or irradiation, including γ-H2AX, a phosphorylated form of the histone H2AX; ATM, the product of the gene mutated in the disease ataxia telangiectasia; Nbs1 (or nibrin), the product of the gene mutated in Nijmegen breakage syndrome; and 53BP1, p53 binding protein 1. The importance of these proteins in V(D)J recombination is not clear because defects in all four are compatible with near normal V(D)J recombination. Possibly, they participate in backup mechanisms to prevent aberrant V(D)J recombination and thus translocations.
Pax5/B-Cell-Specific Activator Protein.
Pax5 (also known as B-cell-specific activator protein; BSAP) is a transcription factor required for normal B-cell development. Pax5-deficient mice are able to complete DJH recombination, but VH recombination is impaired except for certain VH genes located proximal to the D regions. Interestingly, 94% of human and mouse VH coding genes were found to have potential Pax5 binding sites. Surprisingly, Pax5 was found to coimmunoprecipitate with RAG proteins, to potentiate in vitro cleavage of a VH gene RSS, and to enhance VH recombination in RAG-transfected fibroblasts; the latter enhancement required intact Pax5 binding sites in the VH