Genes
In the early 1860s, Gregor Mendel developed the concept of the gene to help explain results obtained while crossbreeding strains of garden peas. He identified physical characteristics (phenotypes), such as plant height and seed color, that could be passed on, unchanged, from one generation to the next. The hereditary factor that predicted the phenotype was later termed a "gene." Mendel hypothesized that genes were inherited in pairs, one from the male and one from the female parent. Plants that bred true (homozygotes) had inherited identical genes from their parents, whereas plants that did not breed true (hybrids, or heterozygotes) had inherited alternative copies of the genes (alleles) from one parent that were similar, but not identical, to those from the other parent.
Some of these alleles had a greater effect on the phenotypes of hybrids than others. For example, if a single copy of a given allele was sufficient to produce the same phenotype seen in homozygous organisms, that allele was termed "dominant." Conversely, if the effect of the allele could be detected only in the minority of the offspring of hybrid parents, namely those that were homozygous for that "weaker" allele, the allele was termed "recessive." Based on these observations, Mendel formulated a series of laws that are the basis of what we now term "Mendelian" inheritance patterns.
The "law of unit inheritance" holds that factors retain their identity from generation to generation and do not blend in the hybrid. The "law of segregation" states that two members (alleles) of a single pair of genes are never found in the same mature sperm or ovum (gamete) but always separate out (segregate). Finally, the "law of independent assortment" holds that members of different pairs of genes (nonalleles) are sorted out (assort) independently to different gametes.
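As a minimal sketch of the law of segregation, the following Python snippet enumerates the gametes of two hybrid parents and tallies the genotypes and phenotypes of their offspring. The allele symbols `A` (dominant) and `a` (recessive) and the function name `cross` are illustrative assumptions, not notation taken from Mendel's work.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Tally offspring genotypes from a single-gene cross.

    Each parent is a two-letter string such as "Aa"; each gamete
    carries exactly one of the two alleles (law of segregation).
    """
    offspring = Counter()
    for g1, g2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" count as the same genotype
        # (uppercase letters sort before lowercase ones).
        offspring["".join(sorted(g1 + g2))] += 1
    return offspring

genotypes = cross("Aa", "Aa")

# A phenotype is dominant if at least one dominant (uppercase) allele is present.
phenotypes = Counter(
    "dominant" if any(c.isupper() for c in g) else "recessive"
    for g in genotypes.elements()
)

print(dict(genotypes))   # genotype counts among the four equally likely combinations
print(dict(phenotypes))  # phenotype counts among the same four combinations
```

For two hybrid parents, the tally gives one `AA`, two `Aa`, and one `aa`, so the recessive phenotype appears in one quarter of the offspring, the "minority" described above.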
Almost a century later, in 1953, Watson and Crick solved the structure of the DNA molecule and helped explain how this genetic information could be encoded in a polymer, deoxyribonucleic acid (DNA), which was found in the nucleus of the cell. They demonstrated that DNA is a double-stranded polymer consisting of two linear arrays of diverse purine (adenine [A] and guanine [G]) and pyrimidine (thymine [T] and cytosine [C]) bases. Each purine or pyrimidine on one strand pairs with a complementary base (A:T and G:C) on the other strand. Each strand is thus complementary to the other. The two antiparallel polynucleotide strands are gently twisted to form what is termed a "double helix."
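The pairing rules can be illustrated with a short Python sketch that derives one strand from the other. The function name `complementary_strand` and the example sequence are assumptions chosen for illustration only.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand):
    """Return the complementary strand, read in its own 5'-to-3' direction.

    Because the two strands are antiparallel, the complement is
    reversed as well as base-substituted.
    """
    return "".join(PAIRS[base] for base in reversed(strand))

example = "ATGC"  # arbitrary illustrative sequence
print(complementary_strand(example))  # GCAT
```

Applying the function twice returns the original sequence, which reflects the statement that each strand is complementary to the other.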
In humans, the nucleus of each somatic cell contains twenty-three pairs of chromosomes, which are formed by tightly coiled DNA strands. Twenty-two of the chromosome pairs are found in the cells of both men and women. These chromosomes are termed "autosomes," and they are numbered by size from 1 (the largest) to 22 (the smallest). The twenty-third pair of chromosomes determines the sex of the individual, and these two chromosomes are thus termed the "sex chromosomes." Women have a pair of X chromosomes, whereas men have one X and one Y chromosome.
During "mitosis," the DNA double strand is unwound and split apart. Each individual strand is then duplicated. By making copies of each DNA strand, a parental cell can transmit a complete set of genetic information into each of its two daughter cells.
Gametes result from "meiosis," which differs from mitosis in two ways. First, allelic chromosomes are paired prior to their duplication. Second, there are two sets of divisions before the final product, the gamete, is created. In the first set of divisions after DNA duplication, allelic chromosomes, rather than chromatids, segregate into the daughter cells. In the second set of divisions, the chromatids separate and segregate into the gamete. Thus, one and only one copy of each allelic pair is contributed to the gamete. In this way, a "diploid" germ cell gives rise to a "haploid" sperm or egg that contains an assortment of one of each of the twenty-three pairs of allelic chromosomes in the parental cell. During fertilization, a sperm and an egg unite to create a zygote with a newly constituted complete set of forty-six chromosomes. These fundamental properties of DNA and cell division are the basis of Mendel's laws of unit inheritance, segregation, and independent assortment.
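A rough sense of how much variety independent assortment alone can generate comes from a simple count: each of the twenty-three pairs contributes one of two chromosomes to a gamete. The sketch below deliberately ignores recombination between paired chromosomes, which is an assumption made here for simplicity; the function name is hypothetical.

```python
CHROMOSOME_PAIRS = 23  # number of chromosome pairs in a human diploid cell

def gamete_combinations(pairs):
    """Number of distinct chromosome sets a single parent can pass on,
    assuming each pair assorts independently and no crossing over occurs."""
    return 2 ** pairs

per_parent = gamete_combinations(CHROMOSOME_PAIRS)
per_couple = per_parent ** 2  # one gamete from each of the two parents

print(per_parent)  # 8388608
print(per_couple)
```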
The central dogma of molecular genetics holds that each gene encodes one polypeptide, forming a monomeric protein. The portion of the gene that specifies the polypeptide sequence is termed "coding" DNA. Each human cell contains approximately 3.9 × 10^9 base pairs of DNA per haploid genome, which is enough to encode about 1 million polypeptides of average length. However, there are approximately 35,000 structural genes (possibly closer to 30,000) in humans; thus more than 90 percent of DNA does not encode peptide sequences. The DNA that does not code for protein, termed "noncoding" DNA, is often involved in the regulation of gene expression. Noncoding DNA can also play a structural role. Structural functions include providing structural stability for the chromosome (e.g., matrix-associated regions, or MARs), providing the specialized sequences that define the ends of the chromosome (telomeres), and providing a site to which the cellular cytoskeleton can be attached in order to allow the movement of chromosomes during meiosis and mitosis (centromeres). Approximately 10 percent of cellular DNA consists of a repetitive sequence that has been randomly inserted throughout the genome. Although the function of this repetitive DNA is unknown, its presence has proven useful for gene mapping studies.
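The arithmetic behind the "more than 90 percent" figure can be checked with a back-of-the-envelope calculation. The sketch below uses only the numbers quoted in the paragraph; the variable names are illustrative, and the average coding length is merely what those numbers imply, not a measured value.

```python
GENOME_BP = 3_900_000_000     # base pairs per haploid genome, as quoted in the text
MAX_POLYPEPTIDES = 1_000_000  # polypeptides the genome could encode, per the text
STRUCTURAL_GENES = 35_000     # approximate gene count quoted in the text

# Average coding length implied by the two figures above.
bp_per_polypeptide = GENOME_BP / MAX_POLYPEPTIDES

# Fraction of the genome needed to encode the actual number of genes.
coding_fraction = STRUCTURAL_GENES * bp_per_polypeptide / GENOME_BP

print(bp_per_polypeptide)            # implied base pairs per polypeptide
print(f"{coding_fraction:.1%}")      # share of the genome that is coding
print(f"{1 - coding_fraction:.1%}")  # share that is noncoding
```

On these figures, coding DNA accounts for only a few percent of the genome, which is consistent with the statement that more than 90 percent does not encode peptide sequences.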
Genetic information proceeds in a stepwise fashion from the sequence of a gene to the synthesis of a polypeptide. Located near the coding sequence of the gene are sequences, called DNA control regions, that identify the transcription start site (promoters), mark the tissue in which the gene will be expressed (enhancers), and control the use of batteries of genes during ontogeny (locus control regions). The regions of DNA that specify the sequence of a polypeptide chain, or structural genes, are organized into discrete units (exons) that are separated by noncoding sequences (introns). The first step in synthesizing a new protein occurs in the nucleus, where the sequence of the coding DNA is copied (transcribed) into ribonucleic acid (RNA), a less stable nucleic acid that can be rapidly degraded. The ends of the RNA are modified to help stabilize the final product and the introns are removed, or spliced out, generating messenger ribonucleic acid (mRNA). The mRNA is transported from the nucleus to the cytoplasm, where it is translated by ribosomes into polypeptide strands.
Ribosomes read the sequence of the mRNA in sequential groups of three nucleotides, or triplets, each termed a codon. There are sixty-four different combinations (e.g., AAA, UUU, CAC), all but three of which specify an amino acid. Each codon specifies a single amino acid, but amino acids can be encoded by more than one codon; thus there is considerable degeneracy in the code. Translation begins when the mRNA is bound to the ribosome. Transfer RNA (tRNA), an adapter molecule, contains a complementary triplet anticodon at one end, and an amino acid bound to the other end. The tRNA anticodon binds to the mRNA codon and helps stabilize the interaction with the ribosome. Each ribosome has two sites where the tRNA can bind. Binding of the downstream tRNA, which contains sequence complementary to the next three-nucleotide codon on the RNA, brings its amino acid next to the end of the growing polypeptide strand. Formation of a peptide bond
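The codon-by-codon reading described above can be sketched in a few lines of Python. The table below is the standard genetic code written in a compact form, with amino acids given as one-letter codes and `*` marking the three stop codons; the function name `translate` and the example mRNA are illustrative assumptions. For simplicity the sketch reads from the first base rather than searching for a start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order (first base
# varies slowest, third base fastest); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Read an mRNA in successive triplets and return the peptide
    (one-letter amino acid codes), stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(len(CODON_TABLE))                                   # 64 codons in total
print(sum(aa == "*" for aa in CODON_TABLE.values()))      # 3 stop codons
print(len(set(CODON_TABLE.values()) - {"*"}))             # 20 amino acids
print(translate("AUGAAACACUAA"))                          # MKH
```

Sixty-one codons shared among twenty amino acids is the degeneracy referred to in the text.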
Many genes are composed of a series of structural or functional domains, with each exon specifying part or all of the sequence of a single structural domain. Each domain can endow the protein with a different property. For example, a protein may have one or more extracellular domains that allow it to bind to a specific soluble ligand, a transmembrane domain that allows it to be anchored in the cell membrane, and one or more intracellular domains that allow it to signal inside the cell. These types of proteins are the product of mixing and matching different types of domains during evolution, a process that is facilitated by the exon/intron structure of the gene. By changing the extracellular domains while maintaining the rest of the molecule relatively intact, for example, a similar signal can be elicited by the binding of several different types of ligands. Conversely, the presence or absence of a transmembrane domain can allow the protein to be tethered to the cell or to exist as a soluble factor. The function of an unknown protein can often be guessed by analyzing its complement of domains.
At first glance, the linking of genes in chromosomal units and their transmission as a unit to daughter cells would seem to violate Mendel's laws of independent assortment and segregation, because effectively one might expect genes to be inherited as part of only 23 sets of genes. However, when allelic chromosomes are brought into close juxtaposition during the process of meiosis, breaks occur in the chromosomes and allow bridges, or chiasmata, to form between homologous portions of the chromosomes. This crossing over of DNA strands allows allelic chromosomes to recombine, forming patchwork or chimeric chromosomes that contain portions of each of the parental chromosomes. Although recombination can occur anywhere in the chromosome, only a limited number of chiasmata form during each meiosis. Two genes that are on opposite ends of the chromosome may thus behave as if they were on different chromosomes, whereas recombination is less likely between genes that are very close to each other in their primary sequence. The increased frequency of the joint inheritance of two genes that are closely physically linked on a chromosome is termed "linkage disequilibrium."
Distances between genes on a chromosome are quantified by either their physical distance from each other in millions of base pairs (megabases), or by their genetic distance, as measured by the frequency of recombination between the two genes per generation. One percent of genetic recombination is termed a "centimorgan," after the geneticist Thomas Hunt Morgan, whose studies of the common fruitfly, Drosophila, in the first half of the twentieth century helped elucidate the properties of recombination. As a rough guide, one centimorgan covers approximately one megabase of DNA. However, the relationship between linear and genetic distance is not absolute. The frequency of recombination, and thus the genetic distance between genes in specific regions of the genome, may differ depending on the sequence or the nonhistone proteins that cover the DNA. Recombination frequencies in selected regions of the genome may differ in male and female gametes, implying that segments of chromosomes can be handled differently by spermatogonia and oocytes. This disparity in how DNA is treated by male and female gametes can lead to differences in the function of alleles, depending on whether they have been inherited from the mother or the father, a process termed "imprinting."
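The relationship between recombination frequency, genetic distance, and approximate physical distance can be sketched as follows. The function names and the offspring counts are hypothetical, and the one-megabase-per-centimorgan conversion is only the rough guide given above; the simple percentage is also reliable only for genes that are fairly close together, since multiple crossovers between distant genes go undetected.

```python
def genetic_distance_cm(recombinant_offspring, total_offspring):
    """Genetic distance in centimorgans: the percentage of offspring
    in which the two genes have recombined."""
    return 100.0 * recombinant_offspring / total_offspring

def approximate_physical_distance_mb(distance_cm, mb_per_cm=1.0):
    """Rough physical distance in megabases, using the rule of thumb
    that one centimorgan covers about one megabase."""
    return distance_cm * mb_per_cm

# Hypothetical data: 12 recombinant offspring among 400 scored.
distance = genetic_distance_cm(12, 400)
print(distance)                                    # 3.0 centimorgans
print(approximate_physical_distance_mb(distance))  # about 3 megabases
```

Passing a different `mb_per_cm` value is one way to represent regions of the genome in which recombination is more or less frequent than the average.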
A "mutation" is defined as a stable, heritable alteration in the DNA sequence that can be passed from a parental cell to at least one of its daughters. From the standpoint of evolution, mutations are required to generate the genetic diversity that is needed to permit species to adapt to a changing environment. The normal rate of mutation is approximately one base pair change per generation per 10^7 base pairs; thus, on average, each child differs from its parent by approximately 390 base pairs as a result of mutations in the gametes. Mutations in the nonreproductive cells of the body are termed "somatic" mutations. Although by definition these alterations are not transmitted to the gametes, the mutations are passed on to the daughter cells of the mutated parent cell. Somatic mutations
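The estimate of roughly 390 new base-pair changes per generation follows directly from the mutation rate and the genome size quoted earlier in this article. A minimal check, with illustrative variable names:

```python
GENOME_BP = 3_900_000_000    # base pairs per haploid genome, as quoted earlier
BP_PER_MUTATION = 10_000_000 # one change per 10^7 base pairs per generation

expected_changes = GENOME_BP / BP_PER_MUTATION
print(expected_changes)  # 390.0
```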
Mutations can involve an entire human genome, as in triploidy, in which a third copy of the entire chromosomal complement occurs. Mutations may involve all or part of a single chromosome, including duplications, deletions, and translocations of a portion of one chromosome to another. At the other extreme, a mutation can be minute and involve a small deletion or insertion, or a replacement of only a single base pair (point mutation). Deletions or insertions that occur in a coding region can alter the reading frame distal to the mutation (frameshift mutations). Frameshift mutations frequently alter the protein sequence and can lead to premature peptide termination by generating a stop codon, one of the three triplet sequences that do not encode an amino acid. Point mutations in coding regions may be of three types: (1) a nonsense mutation (about 4% of base substitutions in coding regions), in which the base change generates one of the three termination codons; (2) a missense, or replacement, mutation (about 73% of base substitutions in coding regions), in which the base change results in substitution of one amino acid for another; and (3) a synonymous, or silent, mutation (about 23% of random base substitutions in coding regions), in which the base replacement does not lead to a change in the amino acid but only to a different codon for the same amino acid. Even synonymous mutations can have deleterious effects, however. A change in the coding sequence of a given gene may alter splicing patterns or diminish mRNA stability, reducing protein production.
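The three classes of point mutation can be counted directly from the standard genetic code. The sketch below enumerates every possible single-base substitution in every amino-acid-encoding codon and classifies the result. It assumes that all codons and all substitutions are equally likely, which is a simplification and not necessarily the basis of the percentages quoted above; the function name is hypothetical, and codons are written in DNA letters (T rather than U).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitutions():
    """Count nonsense, missense, and synonymous outcomes over all
    single-base substitutions in the 61 amino-acid-encoding codons."""
    counts = {"nonsense": 0, "missense": 0, "synonymous": 0}
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # start only from codons that encode an amino acid
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a substitution
                mutant = codon[:position] + base + codon[position + 1:]
                new_amino_acid = CODON_TABLE[mutant]
                if new_amino_acid == "*":
                    counts["nonsense"] += 1
                elif new_amino_acid == amino_acid:
                    counts["synonymous"] += 1
                else:
                    counts["missense"] += 1
    return counts

counts = classify_substitutions()
total = sum(counts.values())
for kind, n in counts.items():
    print(f"{kind}: {n} of {total} ({n / total:.0%})")
```

Under this uniform assumption the tally is 23 nonsense, 392 missense, and 134 synonymous substitutions out of 549, or roughly 4, 71, and 24 percent. These are close to, but not identical with, the proportions given in the text, which presumably rest on somewhat different assumptions.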
The consequences of a single-point mutation to the function of a given protein can vary greatly. Enzymes, for example, exhibit a hierarchy of resistance to mutation. Portions of the hydrophilic exterior may serve primarily to allow the protein to be soluble in an aqueous solution, hence changes in the amino acid sequence that preserve hydropathicity may have little or no effect on the function of the protein. The hydrophobic core provides structural stability for the molecule, and amino acid changes may result in an unstable protein product that is temperature sensitive (e.g., falling apart at high temperature). Finally, the catalytic site is exquisitely sensitive, and a single mutation may completely abolish function.
Large deletions may interrupt a coding region and cause an absence of one or more closely linked protein products. If the deletion removes a bridge between two coding regions, the result may be a fusion or hybrid protein containing the initial sequence of one protein and the terminal portion of the other. Such deletions can also result from unequal crossing-over between homologous genes. Finally, alterations of the DNA in the surrounding regions may lead to changes in RNA splicing, transcriptional efficiency, or control of tissue expression.
The Human Genome Project began in 1990 with the goals of developing genetic and physical maps and determining the complete DNA sequence of the human genome. The ultimate goal is to use this mapping and sequence information to isolate and study the structure and function of genes that can contribute to the development of disease. Knowledge of the genetic basis of susceptibility for specific diseases is likely to aid in disease prevention as well as therapy. Associated with these benefits, however, is the risk of discrimination against healthy at-risk individuals that may never develop a disorder. Thus, in addition to learning how to use this new knowledge, we must gain the wisdom to use genetic information appropriately.
HARRY W. SCHROEDER, JR.
(SEE ALSO: Genetic Disorders; Genetics and Health; Human Genome Project; Medical Genetics)
