
The genetic code consists of 2 nucleotides. Biosynthesis of protein and nucleic acids

RNA uses the same nucleotides as DNA, except that the nucleotide containing thymine is replaced by a similar nucleotide containing uracil, which is designated by the letter U (У in Russian-language literature). In DNA and RNA molecules, nucleotides are arranged in chains, and thus sequences of genetic letters are obtained.

The proteins of almost all living organisms are built from only 20 types of amino acids. These amino acids are called canonical. Each protein is a chain or several chains of amino acids connected in a strictly defined sequence. This sequence determines the structure of the protein, and therefore all its biological properties.

However, in the early 60s of the 20th century, new data revealed the inconsistency of the “code without commas” hypothesis. Then experiments showed that codons, considered meaningless by Crick, can provoke protein synthesis in vitro, and by 1965 the meaning of all 64 triplets had been established. It turned out that some codons are simply redundant, that is, a whole series of amino acids are encoded by two, four or even six triplets.

Properties

Tables of correspondence between codons of mRNA and amino acids

The genetic code common to most pro- and eukaryotes. The table shows all 64 codons and the corresponding amino acids. The base order is from the 5' to the 3' end of the mRNA.

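Standard genetic code

1st base | 2nd base: U | 2nd base: C | 2nd base: A | 2nd base: G | 3rd base
U | UUU (Phe/F) Phenylalanine | UCU (Ser/S) Serine | UAU (Tyr/Y) Tyrosine | UGU (Cys/C) Cysteine | U
U | UUC (Phe/F) Phenylalanine | UCC (Ser/S) Serine | UAC (Tyr/Y) Tyrosine | UGC (Cys/C) Cysteine | C
U | UUA (Leu/L) Leucine | UCA (Ser/S) Serine | UAA Stop (Ochre) | UGA Stop (Opal) | A
U | UUG (Leu/L) Leucine | UCG (Ser/S) Serine | UAG Stop (Amber) | UGG (Trp/W) Tryptophan | G
C | CUU (Leu/L) Leucine | CCU (Pro/P) Proline | CAU (His/H) Histidine | CGU (Arg/R) Arginine | U
C | CUC (Leu/L) Leucine | CCC (Pro/P) Proline | CAC (His/H) Histidine | CGC (Arg/R) Arginine | C
C | CUA (Leu/L) Leucine | CCA (Pro/P) Proline | CAA (Gln/Q) Glutamine | CGA (Arg/R) Arginine | A
C | CUG (Leu/L) Leucine | CCG (Pro/P) Proline | CAG (Gln/Q) Glutamine | CGG (Arg/R) Arginine | G
A | AUU (Ile/I) Isoleucine | ACU (Thr/T) Threonine | AAU (Asn/N) Asparagine | AGU (Ser/S) Serine | U
A | AUC (Ile/I) Isoleucine | ACC (Thr/T) Threonine | AAC (Asn/N) Asparagine | AGC (Ser/S) Serine | C
A | AUA (Ile/I) Isoleucine | ACA (Thr/T) Threonine | AAA (Lys/K) Lysine | AGA (Arg/R) Arginine | A
A | AUG (Met/M) Methionine | ACG (Thr/T) Threonine | AAG (Lys/K) Lysine | AGG (Arg/R) Arginine | G
G | GUU (Val/V) Valine | GCU (Ala/A) Alanine | GAU (Asp/D) Aspartic acid | GGU (Gly/G) Glycine | U
G | GUC (Val/V) Valine | GCC (Ala/A) Alanine | GAC (Asp/D) Aspartic acid | GGC (Gly/G) Glycine | C
G | GUA (Val/V) Valine | GCA (Ala/A) Alanine | GAA (Glu/E) Glutamic acid | GGA (Gly/G) Glycine | A
G | GUG (Val/V) Valine | GCG (Ala/A) Alanine | GAG (Glu/E) Glutamic acid | GGG (Gly/G) Glycine | G
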
The AUG codon encodes methionine and is also the translation initiation site: the first AUG codon in the coding region of the mRNA serves as the beginning of protein synthesis.

Reverse table (codons for each amino acid are shown, as well as stop codons)
Ala/A GCU, GCC, GCA, GCG Leu/L UUA, UUG, CUU, CUC, CUA, CUG
Arg/R CGU, CGC, CGA, CGG, AGA, AGG Lys/K AAA, AAG
Asn/N AAU, AAC Met/M AUG
Asp/D GAU, GAC Phe/F UUU, UUC
Cys/C UGU, UGC Pro/P CCU, CCC, CCA, CCG
Gln/Q CAA, CAG Ser/S UCU, UCC, UCA, UCG, AGU, AGC
Glu/E GAA, GAG Thr/T ACU, ACC, ACA, ACG
Gly/G GGU, GGC, GGA, GGG Trp/W UGG
His/H CAU, CAC Tyr/Y UAU, UAC
Ile/I AUU, AUC, AUA Val/V GUU, GUC, GUA, GUG
START AUG STOP UAG, UGA, UAA

Variations in the standard genetic code

The first example of a deviation from the standard genetic code was discovered in 1979 during a study of human mitochondrial genes. Since then, several similar variants have been found, including a variety of alternative mitochondrial codes and, for example, the reading of the stop codon UGA as a tryptophan codon in mycoplasmas. In bacteria and archaea, GUG and UUG are often used as start codons. In some cases, genes begin encoding a protein at a start codon that differs from the one normally used by the species.

In some proteins, non-standard amino acids, such as selenocysteine and pyrrolysine, are inserted by a ribosome reading a stop codon, depending on sequences in the mRNA. Selenocysteine is now considered to be the 21st, and pyrrolysine the 22nd, of the amino acids that make up proteins.

Despite these exceptions, the genetic codes of all living organisms share common features: codons consist of three nucleotides, of which the first two are decisive; codons are translated by tRNA and ribosomes into an amino acid sequence.

Deviations from the standard genetic code

Example | Codon | Normal meaning | Read as
Some yeasts of the genus Candida | CUG | Leucine | Serine
Mitochondria, in particular in Saccharomyces cerevisiae | CU(U, C, A, G) | Leucine | Threonine
Mitochondria of higher plants | CGG | Arginine | Tryptophan
Mitochondria (in all studied organisms without exception) | UGA | Stop | Tryptophan
Nuclear genome of the ciliate Euplotes | UGA | Stop | Cysteine or selenocysteine
Mitochondria of mammals, Drosophila, S. cerevisiae and many protozoa | AUA | Isoleucine | Methionine = Start
Prokaryotes | GUG | Valine | Start
Eukaryotes (rare) | CUG | Leucine | Start
Eukaryotes (rare) | GUG | Valine | Start
Prokaryotes (rare) | UUG | Leucine | Start
Eukaryotes (rare) | ACG | Threonine | Start
Mammalian mitochondria | AGC, AGU | Serine | Stop
Drosophila mitochondria | AGA | Arginine | Stop
Mammalian mitochondria | AG(A, G) | Arginine | Stop

Evolution

It is believed that the triplet code developed quite early in the evolution of life. But the existence of differences in some organisms that appeared at different evolutionary stages indicates that it was not always like this.

According to some models, the code first existed in a primitive form, when a small number of codons designated a relatively small number of amino acids. More precise codon meanings and a larger number of amino acids could have been introduced later. At first, only the first two of the three bases could be used for recognition [which depends on the structure of the tRNA].

- Lewin B. Genes. M.: 1987. P. 62.




Wikimedia Foundation. 2010.

On the right is the largest helix of human DNA, built from people on the beach in Varna (Bulgaria), included in the Guinness Book of Records on April 23, 2016

Deoxyribonucleic acid. General information

DNA (deoxyribonucleic acid) is a kind of blueprint for life, a complex code that contains hereditary information. This complex macromolecule is capable of storing hereditary genetic information and transmitting it from generation to generation. DNA determines such properties of any living organism as heredity and variability. The information encoded in it sets the entire development program of a living organism. Genetically determined factors predetermine the entire course of life of both a person and any other organism. Artificial or natural influences of the external environment can affect the expression of individual genetic traits or the development of programmed processes only to a small extent.

Deoxyribonucleic acid (DNA) is a macromolecule (one of the three main ones, the other two being RNA and proteins) that ensures the storage, transmission from generation to generation, and implementation of the genetic program for the development and functioning of living organisms. DNA contains information about the structure of various types of RNA and proteins.

In eukaryotic cells (animals, plants and fungi), DNA is found in the cell nucleus as part of chromosomes, as well as in some cellular organelles (mitochondria and plastids). In the cells of prokaryotic organisms (bacteria and archaea), a circular or linear DNA molecule, the so-called nucleoid, is attached from the inside to the cell membrane. In prokaryotes and in lower eukaryotes (for example, yeast), small autonomous, predominantly circular DNA molecules called plasmids are also found.

From a chemical point of view, DNA is a long polymer molecule consisting of repeating blocks called nucleotides. Each nucleotide consists of a nitrogenous base, a sugar (deoxyribose) and a phosphate group. The bonds between nucleotides in the chain are formed between the deoxyribose and phosphate groups (phosphodiester bonds).


Fig. 2. A nucleotide consists of a nitrogenous base, a sugar (deoxyribose) and a phosphate group

In the vast majority of cases (except for some viruses containing single-stranded DNA), the DNA macromolecule consists of two chains oriented with nitrogenous bases towards each other. This double-stranded molecule is twisted along a helix.

There are four types of nitrogenous bases found in DNA (adenine, guanine, thymine and cytosine). The nitrogenous bases of one chain are connected to the nitrogenous bases of the other chain by hydrogen bonds according to the principle of complementarity: adenine pairs only with thymine (A-T), and guanine only with cytosine (G-C). It is these pairs that make up the “rungs” of the DNA spiral “staircase” (see Fig. 2, 3 and 4).


Fig. 2. Nitrogenous bases

The sequence of nucleotides “encodes” information about various types of RNA, the most important of which are messenger, or template, RNA (mRNA), ribosomal RNA (rRNA) and transfer RNA (tRNA). All these types of RNA are synthesized on a DNA template during transcription, by copying a DNA sequence into an RNA sequence, and take part in protein biosynthesis (translation). In addition to coding sequences, cellular DNA contains sequences that perform regulatory and structural functions.


Fig. 3. DNA replication

The order of the bases along the DNA strand and the quantitative relationships between them encode hereditary information.

Formation of new DNA (replication)

  1. Replication process: unwinding of the DNA double helix - synthesis of complementary strands by DNA polymerase - formation of two DNA molecules from one.
  2. The double helix "unzips" into two branches when enzymes break the bonds between the base pairs.
  3. Each branch becomes part of a new DNA molecule. New base pairs are connected in the same sequence as in the parent branch.

Upon completion of duplication, two independent helices are formed, created from chemical compounds of the parent DNA and having the same genetic code. In this way, DNA is able to pass information from cell to cell.


STRUCTURE OF NUCLEIC ACIDS


Fig. 4. Nitrogenous bases: adenine, guanine, cytosine, thymine

Deoxyribonucleic acid (DNA) is a nucleic acid. Nucleic acids are a class of irregular biopolymers whose monomers are nucleotides.

NUCLEOTIDES consist of a nitrogenous base connected to a five-carbon carbohydrate (pentose) - deoxyribose (in the case of DNA) or ribose (in the case of RNA) - which is linked to a phosphoric acid residue (H2PO3-).

There are two types of nitrogenous bases: pyrimidine bases - uracil (only in RNA), cytosine and thymine - and purine bases - adenine and guanine.


Fig. 5. Structure of nucleotides (left), location of the nucleotide in DNA (bottom) and types of nitrogenous bases (right): pyrimidine and purine


The carbon atoms in the pentose molecule are numbered from 1 to 5. The phosphate binds to the third carbon atom of one pentose and the fifth carbon atom of the next. This is how nucleotides are joined into a nucleic acid chain. Thus, we can distinguish the 3' and 5' ends of the DNA strand:


Fig. 6. Identifying the 3' and 5' ends of the DNA chain

Two strands of DNA form a double helix. The chains in the helix are oriented in opposite directions. The nitrogenous bases of the two strands are connected to each other by hydrogen bonds. Adenine always pairs with thymine, and cytosine always pairs with guanine. This is called the complementarity rule (see the principle of complementarity).

Complementarity rule:

A-T G-C

For example, if we are given a DNA strand with the sequence

3’- ATGTCCTAGCTGCTCG - 5’,

then the second chain will be complementary to it and directed in the opposite direction - from the 5’ end to the 3’ end:

5'- TACAGGATCGACGAGC- 3'.
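The pairing in this example can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary `COMPLEMENT` and the function `complementary_strand` are names invented here, and the function returns the partner bases in the same left-to-right order as the input, so the 3'/5' labels are simply swapped, as in the text.

```python
# Watson-Crick pairing rule from the text: A-T and G-C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Return the partner base for each position, in the same left-to-right order.

    If the input is written 3'->5', the result is read 5'->3' (antiparallel strand).
    """
    return "".join(COMPLEMENT[base] for base in strand)


given = "ATGTCCTAGCTGCTCG"          # written 3' -> 5' in the example
print(complementary_strand(given))  # TACAGGATCGACGAGC, read 5' -> 3'
```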


Fig. 7. Direction of the chains of the DNA molecule and the connection of nitrogenous bases by hydrogen bonds

DNA REPLICATION

DNA replication is the process of doubling a DNA molecule through template synthesis. In most cases of natural DNA replication, the primer for DNA synthesis is a short RNA fragment (synthesized anew). Such a ribonucleotide primer is created by the enzyme primase (DNA primase in prokaryotes, DNA polymerase α in eukaryotes) and is subsequently replaced with deoxyribonucleotides by a polymerase that normally performs repair functions (correcting chemical damage and breaks in the DNA molecule).

Replication occurs according to a semi-conservative mechanism. This means that the double helix of DNA unwinds and a new chain is built on each of its chains according to the principle of complementarity. Each daughter DNA molecule thus contains one strand from the parent molecule and one newly synthesized strand. Replication proceeds in the direction from the 3' to the 5' end of the mother (template) strand, so each new strand grows in the 5'→3' direction.

Fig. 8. Replication (doubling) of a DNA molecule

DNA synthesis is not as complicated a process as it might seem at first glance. Synthesis is the process of combining components into one whole. The formation of a new DNA molecule occurs in several stages:

1) DNA topoisomerase, located ahead of the replication fork, cuts the DNA to facilitate its untwisting and unwinding.
2) DNA helicase, following topoisomerase, “unbraids” the DNA helix.
3) DNA-binding proteins bind the separated DNA strands and stabilize them, preventing them from sticking to each other.
4) DNA polymerase δ (delta), in step with the movement of the replication fork, synthesizes the leading strand of the daughter DNA in the 5'→3' direction on the template of the maternal DNA strand, moving from its 3' end to its 5' end (at a speed of up to 100 nucleotide pairs per second). On this maternal strand, events are limited to these steps.



Fig. 9. Schematic representation of the DNA replication process: (1) Lagging strand, (2) Leading strand, (3) DNA polymerase α (Polα), (4) DNA ligase, (5) RNA primer, (6) Primase, (7) Okazaki fragment, (8) DNA polymerase δ (Polδ), (9) Helicase, (10) Single-stranded DNA-binding proteins, (11) Topoisomerase.

The synthesis of the lagging strand of daughter DNA is described below (see the diagram of the replication fork and the functions of the replication enzymes).


5) As soon as the other strand of the mother molecule is unwound and stabilized, DNA polymerase α (alpha) attaches to it and, in the 5'→3' direction, synthesizes a primer (RNA primer) - an RNA sequence of 10 to 200 nucleotides on the DNA template. After this, the enzyme leaves the DNA strand.

DNA polymerase ε then attaches to the 3' end of the primer in place of DNA polymerase α.

6) DNA polymerase ε (epsilon) continues to extend the primer, but inserts deoxyribonucleotides as the substrate (150-200 nucleotides). As a result, a single strand is formed from two parts - RNA (i.e. the primer) and DNA. DNA polymerase ε works until it encounters the primer of the previous Okazaki fragment (synthesized a little earlier). After this, the enzyme leaves the chain.

7) DNA polymerase β (beta) takes the place of DNA polymerase ε, moves in the same direction (5'→3') and removes the primer ribonucleotides while simultaneously inserting deoxyribonucleotides in their place. The enzyme works until the primer is completely removed, i.e. until it reaches a deoxyribonucleotide (one synthesized even earlier by DNA polymerase ε). The enzyme cannot join the result of its work to the DNA ahead of it, so it leaves the chain.

As a result, a fragment of daughter DNA “lies” on the template of the mother strand. It is called an Okazaki fragment.

8) DNA ligase joins two adjacent Okazaki fragments, i.e. the 5' end of the segment synthesized by DNA polymerase ε and the 3' end of the chain built by DNA polymerase β.

STRUCTURE OF RNA

Ribonucleic acid (RNA) is one of the three main macromolecules (the other two are DNA and proteins) that are found in the cells of all living organisms.

Just like DNA, RNA consists of a long chain in which each link is called a nucleotide. Each nucleotide consists of a nitrogenous base, a ribose sugar, and a phosphate group. However, unlike DNA, RNA usually has one strand rather than two. The pentose in RNA is ribose, not deoxyribose (ribose has an additional hydroxyl group on the second carbon atom). Finally, RNA differs from DNA in the composition of its nitrogenous bases: instead of thymine (T), RNA contains uracil (U), which is also complementary to adenine.

The sequence of nucleotides allows RNA to encode genetic information. All cellular organisms use RNA (mRNA) to program protein synthesis.

Cellular RNA is produced through a process called transcription, that is, the synthesis of RNA on a DNA template, carried out by special enzymes - RNA polymerases.

Messenger RNAs (mRNAs) then take part in a process called translation, i.e. protein synthesis on an mRNA template with the participation of ribosomes. Other RNAs undergo chemical modifications after transcription, and after the formation of secondary and tertiary structures, they perform functions depending on the type of RNA.

Fig. 10. The difference between DNA and RNA in the nitrogenous base: instead of thymine (T), RNA contains uracil (U), which is also complementary to adenine.

TRANSCRIPTION

Transcription is the process of RNA synthesis on a DNA template. The DNA unwinds at one site. One of the strands contains the information that needs to be copied onto an RNA molecule - this strand is called the coding strand. The second strand of DNA, complementary to the coding one, is called the template strand. During transcription, a complementary RNA chain is synthesized on the template strand, which is read in the 3’ - 5’ direction (along the DNA strand). This creates an RNA copy of the coding strand, with uracil in place of thymine.

Fig. 11. Schematic representation of transcription

For example, if we are given the sequence of the coding chain

3’- ATGTCCTAGCTGCTCG - 5’,

then, according to the complementarity rule, the matrix chain will carry the sequence

5’- TACAGGATCGACGAGC- 3’,

and the RNA synthesized from it is the sequence

3’- AUGUCCUAGCUGCUCG - 5’.
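The transcription step in this example can be sketched in Python. The names `RNA_PARTNER` and `transcribe_template` are hypothetical and chosen only for this illustration; the function pairs each template base with its RNA partner and keeps the left-to-right order, so a template written 5'→3' yields RNA written 3'→5', as in the example.

```python
# RNA base that pairs with each DNA template base (uracil replaces thymine in RNA).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template):
    """Return the RNA complementary to a DNA template strand, position by position."""
    return "".join(RNA_PARTNER[base] for base in template)


coding = "ATGTCCTAGCTGCTCG"     # coding strand, written 3' -> 5'
template = "TACAGGATCGACGAGC"   # template strand, written 5' -> 3'
rna = transcribe_template(template)

print(rna)                               # AUGUCCUAGCUGCUCG, written 3' -> 5'
print(rna == coding.replace("T", "U"))   # True: RNA copies the coding strand with U for T
```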

TRANSLATION

Let's consider the mechanism of protein synthesis on the RNA template, as well as the genetic code and its properties.

Fig. 12. Protein synthesis process: DNA codes for RNA, RNA codes for protein

GENETIC CODE

The genetic code is a method of encoding the amino acid sequence of proteins using a sequence of nucleotides. Each amino acid is encoded by a sequence of three nucleotides - a codon or triplet.

The genetic code common to most pro- and eukaryotes. The table shows all 64 codons and the corresponding amino acids. The base order is from the 5' to the 3' end of the mRNA.

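Table 1. Standard genetic code

1st base | 2nd base: U | 2nd base: C | 2nd base: A | 2nd base: G | 3rd base
U | UUU (Phe/F) | UCU (Ser/S) | UAU (Tyr/Y) | UGU (Cys/C) | U
U | UUC (Phe/F) | UCC (Ser/S) | UAC (Tyr/Y) | UGC (Cys/C) | C
U | UUA (Leu/L) | UCA (Ser/S) | UAA Stop codon** | UGA Stop codon** | A
U | UUG (Leu/L) | UCG (Ser/S) | UAG Stop codon** | UGG (Trp/W) | G
C | CUU (Leu/L) | CCU (Pro/P) | CAU (His/H) | CGU (Arg/R) | U
C | CUC (Leu/L) | CCC (Pro/P) | CAC (His/H) | CGC (Arg/R) | C
C | CUA (Leu/L) | CCA (Pro/P) | CAA (Gln/Q) | CGA (Arg/R) | A
C | CUG (Leu/L) | CCG (Pro/P) | CAG (Gln/Q) | CGG (Arg/R) | G
A | AUU (Ile/I) | ACU (Thr/T) | AAU (Asn/N) | AGU (Ser/S) | U
A | AUC (Ile/I) | ACC (Thr/T) | AAC (Asn/N) | AGC (Ser/S) | C
A | AUA (Ile/I) | ACA (Thr/T) | AAA (Lys/K) | AGA (Arg/R) | A
A | AUG (Met/M)* | ACG (Thr/T) | AAG (Lys/K) | AGG (Arg/R) | G
G | GUU (Val/V) | GCU (Ala/A) | GAU (Asp/D) | GGU (Gly/G) | U
G | GUC (Val/V) | GCC (Ala/A) | GAC (Asp/D) | GGC (Gly/G) | C
G | GUA (Val/V) | GCA (Ala/A) | GAA (Glu/E) | GGA (Gly/G) | A
G | GUG (Val/V) | GCG (Ala/A) | GAG (Glu/E) | GGG (Gly/G) | G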

Among the triplets, there are 4 special sequences that serve as “punctuation marks”:

  • *The triplet AUG, which also encodes methionine, is called the start codon. The synthesis of a protein molecule begins with this codon. Thus, during protein synthesis, the first amino acid in the sequence will always be methionine.
  • **The triplets UAA, UAG and UGA are called stop codons and do not code for any amino acid. At these sequences, protein synthesis stops.

Properties of the genetic code

1. Triplet nature. Each amino acid is encoded by a sequence of three nucleotides - a triplet or codon.

2. Continuity. There are no additional nucleotides between the triplets; the information is read continuously.

3. Non-overlapping. One nucleotide cannot be included in two triplets at the same time.

4. Unambiguity. One codon can code for only one amino acid.

5. Degeneracy. One amino acid can be encoded by several different codons.

6. Universality. The genetic code is the same, with rare exceptions, for all living organisms.
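Two of these properties, unambiguity and degeneracy, can be checked directly against the standard table. The sketch below is an illustration only: it builds the table from a 64-letter string of one-letter amino acid codes (with `*` for stop) in the usual U, C, A, G ordering, and the names `BASES`, `AMINO` and `CODON_TABLE` are assumptions made for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for codons UUU, UUC, UUA, UUG, UCU, ... GGG; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Unambiguity: a dict maps each codon to exactly one meaning.
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

# Degeneracy: count how many codons encode each amino acid (stop codons excluded).
codons_per_amino = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
# How many amino acids have 1, 2, 3, 4 or 6 codons.
degeneracy_classes = sorted(Counter(codons_per_amino.values()).items())

print(len(CODON_TABLE))        # 64
print(len(codons_per_amino))   # 20
print(stop_codons)             # ['UAA', 'UAG', 'UGA']
print(degeneracy_classes)      # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
```

Only methionine and tryptophan have a single codon, while leucine, serine and arginine have six each.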

Example. We are given the sequence of the coding chain:

3’- CCGATTGCACGTCGATCGTATA- 5’.

The matrix chain will have the sequence:

5’- GGCTAACGTGCAGCTAGCATAT- 3’.

Now we “synthesize” information RNA from this chain:

3’- CCGAUUGCACGUCGAUCGUAUA- 5’.

Protein synthesis proceeds in the direction 5’ → 3’, therefore, we need to reverse the sequence to “read” the genetic code:

5’- AUAUGCUAGCUGCACGUUAGCC- 3’.

Now let's find the start codon AUG:

5’- AU AUG CUAGCUGCACGUUAGCC- 3’.

Let's divide the sequence into triplets:

5’- AU AUG CUA GCU GCA CGU UAG CC- 3’.
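The worked example can be reproduced end to end in Python. This is a minimal sketch under the example's own conventions (the coding strand is written 3'→5' and the mRNA is its copy with U in place of T); the function names `transcribe_coding_strand` and `translate` and the compact table construction are assumptions made for illustration.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for codons UUU ... GGG in U, C, A, G order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe_coding_strand(coding_3_to_5):
    """Copy the coding strand into RNA (T -> U) and reverse it so it reads 5' -> 3'."""
    return coding_3_to_5.replace("T", "U")[::-1]


def translate(mrna_5_to_3):
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna_5_to_3.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna_5_to_3) - 2, 3):
        amino = CODON_TABLE[mrna_5_to_3[i:i + 3]]
        if amino == "*":  # stop codon: synthesis ends
            break
        protein.append(amino)
    return "".join(protein)


mrna = transcribe_coding_strand("CCGATTGCACGTCGATCGTATA")
print(mrna)             # AUAUGCUAGCUGCACGUUAGCC
print(translate(mrna))  # MLAAR
```

The result corresponds to Met-Leu-Ala-Ala-Arg, with the triplet UAG terminating synthesis.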

The central dogma of molecular biology states that information is transferred from DNA to RNA (transcription) and from RNA to protein (translation). DNA can also be duplicated by replication, and reverse transcription, in which DNA is synthesized from an RNA template, is also possible, but this process is mainly characteristic of viruses.


Fig. 13. Central Dogma of Molecular Biology

GENOME: GENES and CHROMOSOMES

(general concepts)

Genome - the totality of all the genes of an organism; its complete chromosome set.

The term “genome” was proposed by G. Winkler in 1920 to describe the set of genes contained in the haploid set of chromosomes of organisms of one biological species. The original meaning of this term indicated that the concept of a genome, in contrast to a genotype, is a genetic characteristic of the species as a whole, and not of an individual. With the development of molecular genetics, the meaning of this term has changed. It is known that DNA, which is the carrier of genetic information in most organisms and, therefore, forms the basis of the genome, includes not only genes in the modern sense of the word. Most of the DNA of eukaryotic cells is represented by non-coding (“redundant”) nucleotide sequences that do not contain information about proteins and nucleic acids. Thus, the main part of the genome of any organism is the entire DNA of its haploid set of chromosomes.

Genes are sections of DNA molecules that encode polypeptides and RNA molecules

Over the last century, our understanding of genes has changed significantly. Previously, a gene was understood as a region of a chromosome that encodes or defines one characteristic or phenotypic (visible) property, such as eye color.

In 1940, George Beadle and Edward Tatum proposed a molecular definition of the gene. The scientists treated spores of the fungus Neurospora crassa with X-rays and other agents that cause changes in the DNA sequence (mutations), and discovered mutant strains of the fungus that had lost specific enzymes, which in some cases disrupted an entire metabolic pathway. Beadle and Tatum concluded that a gene is a piece of genetic material that specifies or codes for a single enzyme. This is how the "one gene - one enzyme" hypothesis appeared. The concept was later expanded to "one gene - one polypeptide", since many genes encode proteins that are not enzymes, and a polypeptide may be a subunit of a complex protein.

Figure 14 shows how triplets of nucleotides in DNA determine a polypeptide - the amino acid sequence of a protein - through the mediation of mRNA. One of the DNA chains serves as a template for the synthesis of mRNA, whose nucleotide triplets (codons) are complementary to the DNA triplets. In some bacteria and many eukaryotes, coding sequences are interrupted by non-coding regions (called introns).

The modern biochemical definition of a gene is even more specific. Genes are all sections of DNA that encode the primary sequence of end products, which include polypeptides or RNA that have a structural or catalytic function.

Along with genes, DNA also contains other sequences that perform exclusively a regulatory function. Regulatory sequences may mark the beginning or end of genes, influence transcription, or indicate the site of initiation of replication or recombination. Some genes can be expressed in different ways, with the same DNA region serving as a template for the formation of different products.

We can roughly calculate the minimum size of a gene encoding an average protein. Each amino acid in a polypeptide chain is encoded by a sequence of three nucleotides; the sequence of these triplets (codons) corresponds to the chain of amino acids in the polypeptide encoded by the gene. A polypeptide chain of 350 amino acid residues (a chain of average length) corresponds to a sequence of 1050 bp (base pairs). However, many eukaryotic genes and some prokaryotic genes are interrupted by DNA segments that do not carry protein information, and therefore turn out to be much longer than this simple calculation suggests.
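The estimate above is a single multiplication, shown here as a small Python sketch. The function name `min_coding_length_bp` and its optional `include_stop` flag are hypothetical additions for illustration; the text's figure of 1050 bp counts only the amino acid codons.

```python
def min_coding_length_bp(residues, include_stop=False):
    """Minimum coding length in base pairs: three nucleotides per amino acid residue.

    include_stop optionally adds one extra triplet for the stop codon
    (an assumption of this sketch, not part of the estimate in the text).
    """
    codons = residues + (1 if include_stop else 0)
    return 3 * codons


print(min_coding_length_bp(350))                     # 1050
print(min_coding_length_bp(350, include_stop=True))  # 1053
```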

How many genes are on one chromosome?


Fig. 15. View of chromosomes in prokaryotic (left) and eukaryotic cells. Histones are a large class of nuclear proteins that perform two main functions: they participate in the packaging of DNA strands in the nucleus and in the epigenetic regulation of nuclear processes such as transcription, replication and repair.

The DNA of prokaryotes is simpler: their cells do not have a nucleus, so the DNA is located directly in the cytoplasm in the form of a nucleoid.

Bacterial cells have a chromosome in the form of a DNA strand arranged in a compact structure - a nucleoid. The chromosome of the prokaryote Escherichia coli, whose genome has been completely deciphered, is a circular DNA molecule (in fact, it is not a perfect circle, but rather a loop without a beginning or end) consisting of 4,639,675 bp. This sequence contains approximately 4,300 protein genes and another 157 genes for stable RNA molecules. The human genome contains approximately 3.1 billion base pairs, corresponding to nearly 29,000 genes located on 24 different chromosomes.

Prokaryotes (Bacteria).

The bacterium E. coli has one double-stranded circular DNA molecule. It consists of 4,639,675 bp and reaches a length of approximately 1.7 mm, which exceeds the length of the E. coli cell itself approximately 850 times. In addition to the large circular chromosome in the nucleoid, many bacteria contain one or several small circular DNA molecules located freely in the cytosol. These extrachromosomal elements are called plasmids (Fig. 16).

Most plasmids consist of only a few thousand base pairs, some contain more than 10,000 bp. They carry genetic information and replicate to form daughter plasmids, which enter the daughter cells during the division of the parent cell. Plasmids are found not only in bacteria, but also in yeast and other fungi. In many cases, plasmids provide no benefit to the host cells and their sole purpose is to reproduce independently. However, some plasmids carry genes beneficial to the host. For example, genes contained in plasmids can make bacterial cells resistant to antibacterial agents. Plasmids carrying the β-lactamase gene provide resistance to β-lactam antibiotics such as penicillin and amoxicillin. Plasmids can pass from cells that are resistant to antibiotics to other cells of the same or a different species of bacteria, causing those cells to also become resistant. Intensive use of antibiotics is a powerful selective factor contributing to the spread of plasmids encoding antibiotic resistance (as well as transposons that encode similar genes) among pathogenic bacteria, and leads to the emergence of bacterial strains resistant to several antibiotics. Doctors are beginning to understand the dangers of widespread use of antibiotics and prescribe them only in cases of urgent need. For similar reasons, the widespread use of antibiotics to treat farm animals is limited.

See also: Ravin N.V., Shestakov S.V. Genome of prokaryotes // Vavilov Journal of Genetics and Breeding, 2013. T. 17. No. 4/2. pp. 972-984.

Eukaryotes.

Table 2. DNA, genes and chromosomes of some organisms

Organism | Total DNA, bp | Number of chromosomes* | Approximate number of genes
Escherichia coli (bacterium) | 4,639,675 | 1 | 4,435
Saccharomyces cerevisiae (yeast) | 12,080,000 | 16** | 5,860
Caenorhabditis elegans (nematode) | 90,269,800 | 12*** | 23,000
Arabidopsis thaliana (plant) | 119,186,200 | - | 33,000
Drosophila melanogaster (fruit fly) | 120,367,260 | - | 20,000
Oryza sativa (rice) | 480,000,000 | - | 57,000
Mus musculus (mouse) | 2,634,266,500 | - | 27,000
Homo sapiens (human) | 3,070,128,600 | 46 | 29,000

Note. This information is constantly updated; for more up-to-date figures, refer to the websites of the individual genome projects.

* For all eukaryotes except yeast, the diploid set of chromosomes is given. A diploid set of chromosomes (from the Greek diploos, double, and eidos, form) is a double set of chromosomes (2n), in which each chromosome has a homologue.
** Haploid set. Wild yeast strains typically have eight (octaploid) or more sets of these chromosomes.
*** For females with two X chromosomes. Males have an X chromosome but no Y, i.e. only 11 chromosomes.

Yeast, one of the smallest eukaryotes, has 2.6 times more DNA than E. coli (Table 2). Cells of the fruit fly Drosophila, a classic subject of genetic research, contain 35 times more DNA, and human cells contain approximately 700 times more DNA than E. coli. Many plants and amphibians contain even more DNA. The genetic material of eukaryotic cells is organized in the form of chromosomes. The diploid set of chromosomes (2n) depends on the species (Table 2).
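As a rough check, the genome sizes listed in Table 2 can be compared with E. coli directly. The sketch below uses only the base-pair counts from the table; the dictionary and function names are illustrative. With these particular table values the Drosophila ratio comes out nearer 26 than 35, so the multiples quoted in the text are best read as rounded figures that probably rest on a different genome-size estimate.

```python
# Genome sizes in base pairs, copied from Table 2.
genome_bp = {
    "E. coli": 4_639_675,
    "S. cerevisiae": 12_080_000,
    "D. melanogaster": 120_367_260,
    "H. sapiens": 3_070_128_600,
}


def fold_vs_ecoli(name: str) -> float:
    """How many times more DNA an organism has than E. coli."""
    return genome_bp[name] / genome_bp["E. coli"]


for name in genome_bp:
    print(f"{name}: {fold_vs_ecoli(name):.1f}x")
```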

For example, a human somatic cell has 46 chromosomes (Fig. 17). Each chromosome of a eukaryotic cell, as shown in Fig. 17, a, contains one very large double-stranded DNA molecule. The twenty-four human chromosomes (22 paired chromosomes and the two sex chromosomes, X and Y) vary in length by more than 25 times. Each eukaryotic chromosome contains a specific set of genes.


Fig. 17. Chromosomes of eukaryotes. a - a pair of linked and condensed sister chromatids of a human chromosome. Eukaryotic chromosomes remain in this form after replication and in metaphase during mitosis. b - a complete set of chromosomes from a leukocyte of one of the authors of the book. Each normal human somatic cell contains 46 chromosomes.


The size of DNA and its function as a template for storing and transmitting hereditary information explain the presence of special structural elements in the organization of this molecule. In higher organisms, DNA is distributed between chromosomes.

The entire DNA (set of chromosomes) of an organism is called the genome. Chromosomes are found in the cell nucleus and form a structure called chromatin. Chromatin is a complex of DNA and basic proteins (histones) in a 1:1 ratio. DNA length is usually measured by the number of complementary nucleotide pairs (bp). For example, human chromosome 3 is a DNA molecule of 160 million bp. Isolated linearized DNA of 3 × 10^6 bp is approximately 1 mm long; therefore, the linearized molecule of human chromosome 3 would be about 5 cm long, and the DNA of all 23 chromosomes (~3 × 10^9 bp, Mr = 1.8 × 10^12) of a haploid cell (an egg or sperm) would be 1 m long in linearized form. With the exception of germ cells, all cells of the human body (there are about 10^13 of them) contain a double set of chromosomes. During cell division, all 46 DNA molecules are replicated and reorganized into 46 chromosomes.
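The length estimates follow from a simple proportion. The sketch below takes the conversion quoted in the text (3 × 10^6 bp of linear DNA is about 1 mm) as its only assumption; the constant and function names are illustrative.

```python
# Conversion quoted in the text: 3 x 10^6 bp of linearized DNA is about 1 mm long.
BP_PER_MM = 3e6


def dna_length_mm(base_pairs: float) -> float:
    """Approximate contour length of linear DNA in millimetres."""
    return base_pairs / BP_PER_MM


chromosome_3_mm = dna_length_mm(160e6)       # about 53 mm, i.e. roughly 5 cm
haploid_genome_m = dna_length_mm(3e9) / 1000  # about 1 m

print(f"chromosome 3: {chromosome_3_mm:.0f} mm")
print(f"haploid genome: {haploid_genome_m:.1f} m")
```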

If you connect the DNA molecules of the human genome (22 chromosomes and chromosomes X and Y or X and X), you get a sequence about one meter long. Note: In all mammals and other heterogametic male organisms, females have two X chromosomes (XX) and males have one X chromosome and one Y chromosome (XY).

Most human cells are diploid, so the total DNA length in such a cell is about 2 m. An adult human has approximately 10^14 cells, so the total length of all DNA molecules is 2 × 10^11 km. For comparison, the circumference of the Earth is 4 × 10^4 km, and the distance from the Earth to the Sun is 1.5 × 10^8 km. This is how amazingly compactly DNA is packed in our cells!
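The comparison can be checked with the figures given in this paragraph. All numbers below are taken from the text; only the variable names are made up for the illustration.

```python
# Figures quoted in the text.
DNA_PER_DIPLOID_CELL_M = 2.0   # metres of DNA in one diploid cell
CELLS_IN_ADULT = 1e14          # approximate number of cells in an adult
EARTH_CIRCUMFERENCE_KM = 4e4
EARTH_SUN_DISTANCE_KM = 1.5e8

total_km = DNA_PER_DIPLOID_CELL_M * CELLS_IN_ADULT / 1000  # metres -> kilometres

print(f"total DNA length: {total_km:.1e} km")
print(f"times around the Earth: {total_km / EARTH_CIRCUMFERENCE_KM:.1e}")
print(f"Earth-Sun distances: {total_km / EARTH_SUN_DISTANCE_KM:.0f}")
```

Under these assumptions the DNA of one person would wrap around the Earth several million times and span the Earth-Sun distance more than a thousand times.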

In eukaryotic cells there are other organelles containing DNA: mitochondria and chloroplasts. Many hypotheses have been put forward regarding the origin of mitochondrial and chloroplast DNA. The generally accepted view today is that they are the remnants of the chromosomes of ancient bacteria, which entered the cytoplasm of host cells and became the precursors of these organelles. Mitochondrial DNA encodes mitochondrial tRNAs and rRNAs, as well as several mitochondrial proteins. More than 95% of mitochondrial proteins are encoded by nuclear DNA.

STRUCTURE OF GENES

Let's consider the structure of the gene in prokaryotes and eukaryotes, their similarities and differences. Despite the fact that a gene is a section of DNA that encodes only one protein or RNA, in addition to the immediate coding part, it also includes regulatory and other structural elements that have different structures in prokaryotes and eukaryotes.

Coding sequence - the main structural and functional unit of the gene; it contains the nucleotide triplets that encode the amino acid sequence. It begins with a start codon and ends with a stop codon.
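The idea that a coding sequence runs from a start codon to the first in-frame stop codon can be sketched as a simple scan over an mRNA string. The example sequence and the function name are hypothetical, and real translation initiation involves more than finding the first AUG.

```python
START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def find_coding_sequence(mrna: str):
    """Return (start, end, codons) for the first AUG followed by an in-frame stop, or None."""
    start = mrna.find(START)
    while start != -1:
        codons = []
        # Read triplet by triplet from the candidate start codon.
        for i in range(start, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            codons.append(codon)
            if codon in STOPS:
                return start, i + 3, codons
        # No in-frame stop found: try the next AUG.
        start = mrna.find(START, start + 1)
    return None


# Hypothetical mRNA: 5' untranslated part, coding sequence, 3' untranslated part.
mrna = "GGAGGUAAC" + "AUGGCUCAUUAA" + "GCU"
print(find_coding_sequence(mrna))
```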

Before and after the coding sequence there are untranslated 5' and 3' sequences. They perform regulatory and auxiliary functions, for example, ensuring the landing of the ribosome on mRNA.

Untranslated and coding sequences make up the transcription unit - the transcribed section of DNA, that is, the section of DNA from which mRNA synthesis occurs.

Terminator - a non-transcribed section of DNA at the end of a gene where RNA synthesis stops.

At the beginning of the gene there is a regulatory region, which includes the promoter and the operator.

Promoter - the sequence to which the polymerase binds during transcription initiation. Operator - a region that can be bound by special proteins, repressors, which can reduce the activity of RNA synthesis from this gene; in other words, reduce its expression.

Gene structure in prokaryotes

The general plan of gene structure in prokaryotes and eukaryotes is no different - both contain a regulatory region with a promoter and operator, a transcription unit with coding and untranslated sequences, and a terminator. However, the organization of genes in prokaryotes and eukaryotes is different.

Fig. 18. Scheme of gene structure in prokaryotes (bacteria)

At the beginning and end of the operon there are regulatory regions common to several structural genes. From the transcribed region of the operon, one mRNA molecule is read, which contains several coding sequences, each with its own start and stop codon. From each of these regions one protein is synthesized. Thus, several protein molecules are synthesized from one mRNA molecule.

Prokaryotes are characterized by the combination of several genes into a single functional unit, the operon. The operation of the operon can be regulated by other genes, regulators, which may lie at a considerable distance from the operon itself. The protein translated from such a gene is called a repressor. It binds to the operator of the operon, regulating the expression of all the genes it contains at once.

Prokaryotes are also characterized by the coupling of transcription and translation.


Fig. 19. Coupling of transcription and translation in prokaryotes

Such coupling does not occur in eukaryotes due to the presence of a nuclear envelope that separates the cytoplasm, where translation occurs, from the genetic material on which transcription occurs. In prokaryotes, during RNA synthesis on a DNA template, a ribosome can immediately bind to the synthesized RNA molecule. Thus, translation begins even before transcription is completed. Moreover, several ribosomes can simultaneously bind to one RNA molecule, synthesizing several molecules of one protein at once.

Gene structure in eukaryotes

The genes and chromosomes of eukaryotes have a very complex organization.

Many species of bacteria have only one chromosome, and in almost all cases there is one copy of each gene on each chromosome. Only a few genes, such as rRNA genes, are found in multiple copies. Genes and regulatory sequences make up virtually the entire prokaryotic genome. Moreover, almost every gene strictly corresponds to the amino acid sequence (or RNA sequence) it encodes (Fig. 14).

The structural and functional organization of eukaryotic genes is much more complex. The study of eukaryotic chromosomes, and later the sequencing of complete eukaryotic genomes, brought many surprises. Many, if not most, eukaryotic genes have an interesting feature: their nucleotide sequences contain one or more DNA regions that do not encode the amino acid sequence of the polypeptide product. Such untranslated insertions disrupt the direct correspondence between the nucleotide sequence of the gene and the amino acid sequence of the encoded polypeptide. These untranslated segments within genes are called introns, or intervening sequences, and the coding segments are called exons. In prokaryotes, only a few genes contain introns.

So, in eukaryotes, the combination of genes into operons practically does not occur, and the coding sequence of a eukaryotic gene is most often divided into translated sections - exons, and untranslated sections - introns.

In most cases, the function of introns has not been established. In general, only about 1.5% of human DNA is "coding", that is, carries information about proteins or RNA. However, if large introns are taken into account, genes make up about 30% of human DNA. Because genes make up a relatively small proportion of the human genome, a significant portion of DNA remains unaccounted for.

Fig. 16. Scheme of gene structure in eukaryotes

From each gene, an immature RNA, or pre-mRNA, is first synthesized, which contains both introns and exons.

After this, the splicing process takes place, as a result of which the intronic regions are excised, and a mature mRNA is formed, from which protein can be synthesized.


Fig. 20. Alternative splicing process

This organization of genes makes it possible, for example, for different forms of a protein to be synthesized from one gene, because during splicing the exons can be joined in different combinations.
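Splicing and alternative splicing can be illustrated with plain string slicing. The pre-mRNA below is a made-up toy sequence with three exons and two introns, and the exon coordinates (0-based, end-exclusive) are assumed to be known in advance; in the cell they are recognized by the splicing machinery.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join the given exon intervals in order, discarding everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA: exon 1, intron 1, exon 2, intron 2, exon 3.
pre_mrna = "AUGGCU" + "GUAAGUCCAG" + "GAAGUU" + "GUGAGUUUAG" + "CAUUAA"
exon_1, exon_2, exon_3 = (0, 6), (16, 22), (32, 38)

full_mrna = splice(pre_mrna, [exon_1, exon_2, exon_3])  # all exons retained
short_mrna = splice(pre_mrna, [exon_1, exon_3])         # exon 2 skipped

print(full_mrna)
print(short_mrna)
```

The two mature mRNAs differ only in whether exon 2 is present, which is one simple way in which a single gene can give rise to different protein forms.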

Fig. 21. Differences in the structure of genes of prokaryotes and eukaryotes

MUTATIONS AND MUTAGENESIS

A mutation is a persistent change in the genotype, that is, a change in the nucleotide sequence.

The process that leads to mutations is called mutagenesis, and an organism all of whose cells carry the same mutation is called a mutant.

Mutation theory was first formulated by Hugo de Vries in 1903. Its modern version includes the following provisions:

1. Mutations occur suddenly, in abrupt jumps.

2. Mutations are passed on from generation to generation.

3. Mutations can be beneficial, harmful or neutral, dominant or recessive.

4. The probability of detecting mutations depends on the number of individuals studied.

5. Similar mutations can occur repeatedly.

6. Mutations are not directed.

Mutations can occur under the influence of various factors. Some mutations arise under the influence of mutagens: physical (for example, ultraviolet light or ionizing radiation), chemical (for example, colchicine or reactive oxygen species) and biological (for example, viruses). Mutations can also be caused by replication errors.

Depending on the conditions under which they appear, mutations are divided into spontaneous, that is, mutations that arise under normal conditions, and induced, that is, mutations that arise under special conditions.

Mutations can occur not only in nuclear DNA but also, for example, in mitochondrial or plastid DNA. Accordingly, we can distinguish nuclear and cytoplasmic mutations.

As a result of mutations, new alleles can often appear. If a mutant allele suppresses the action of a normal one, the mutation is called dominant. If a normal allele suppresses a mutant one, this mutation is called recessive. Most mutations that lead to the emergence of new alleles are recessive.

By their effect, mutations are divided into adaptive ones, which increase the adaptation of the organism to the environment; neutral ones, which do not affect survival; harmful ones, which reduce the adaptation of organisms to environmental conditions; and lethal ones, which lead to the death of the organism in the early stages of development.

By their consequences, mutations are divided into those leading to loss of protein function, those leading to the appearance of a new protein function, and those that change gene dosage and, accordingly, the dose of protein synthesized from the gene.

A mutation can occur in any cell of the body. If a mutation occurs in a germ cell, it is called germinal (or generative). Such mutations do not manifest themselves in the organism in which they arose, but lead to the appearance of mutants among the offspring and are inherited, so they are important for genetics and evolution. If a mutation occurs in any other cell, it is called somatic. Such a mutation can manifest itself to one degree or another in the organism in which it arose, for example by leading to the formation of cancerous tumors. However, such a mutation is not inherited and does not affect descendants.

Mutations can affect regions of the genome of different sizes. A distinction is made between gene, chromosomal and genomic mutations.

Gene mutations

Mutations that occur on a scale smaller than one gene are called gene, or point, mutations. Such mutations change one or several nucleotides in the sequence. Gene mutations include substitutions, in which one nucleotide is replaced by another; deletions, in which a nucleotide is lost; and insertions, in which an extra nucleotide is added to the sequence.
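The three kinds of point mutation can be modelled as elementary string operations. This is a minimal sketch on a made-up nine-nucleotide sequence; the helper names are illustrative. Splitting the result into triplets shows that a substitution changes one codon, whereas a deletion or insertion shifts every triplet downstream of it.

```python
def substitute_base(seq: str, pos: int, base: str) -> str:
    """Replace the nucleotide at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def delete_base(seq: str, pos: int) -> str:
    """Remove the nucleotide at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


def insert_base(seq: str, pos: int, base: str) -> str:
    """Insert a nucleotide before 0-based position pos."""
    return seq[:pos] + base + seq[pos:]


def codons(seq: str) -> list:
    """Split a sequence into complete triplets, ignoring leftover nucleotides."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


seq = "AUGGCUCAU"
print(codons(seq))
print(codons(substitute_base(seq, 4, "U")))  # only the second codon changes
print(codons(delete_base(seq, 4)))           # downstream triplets are shifted
print(codons(insert_base(seq, 4, "A")))      # downstream triplets are shifted
```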


Fig. 23. Gene (point) mutations

According to their effect on the protein, gene mutations are divided into: synonymous mutations, which (owing to the degeneracy of the genetic code) do not change the amino acid composition of the protein product; missense mutations, which replace one amino acid with another and can affect the structure of the synthesized protein, although their effect is often insignificant; nonsense mutations, which replace a coding codon with a stop codon; and mutations that disrupt splicing.
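The first three classes can be told apart by looking up the codon before and after the substitution in the standard genetic code. The sketch below builds the standard table from a compact 64-letter string (one-letter amino acid codes, "*" for stop); the function name is illustrative, and splicing mutations are outside what such a lookup can detect.

```python
BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def classify_substitution(before: str, after: str) -> str:
    """Classify a change of one sense codon into another codon."""
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    if aa_before == aa_after:
        return "synonymous"
    if aa_after == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("CAU", "CAC"))  # His -> His
print(classify_substitution("GAA", "GUA"))  # Glu -> Val
print(classify_substitution("UAU", "UAA"))  # Tyr -> stop
```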


Fig. 24. Mutation patterns

Also, according to their effect on the protein, mutations that lead to a reading frame shift, such as insertions and deletions, are distinguished. Such mutations, like nonsense mutations, occur at a single point in the gene but often affect the entire protein, which can lead to a complete change in its structure.

... when a section of a chromosome rotates 180 degrees.

Fig. 28. Translocation

Fig. 29. Chromosome before and after duplication

Genomic mutations

Finally, genomic mutations affect the entire genome, that is, the number of chromosomes changes. They include polyploidy, an increase in the ploidy of the cell, and aneuploidy, a change in the number of individual chromosomes, for example trisomy (the presence of an additional homologue of one of the chromosomes) and monosomy (the absence of a homologue of a chromosome).


DNA REPLICATION, RNA CODING, PROTEIN SYNTHESIS


The genetic code is a system for recording hereditary information in nucleic acid molecules, based on a certain alternation of nucleotide sequences in DNA or RNA, forming codons corresponding to amino acids in a protein.

Properties of the genetic code.

The genetic code has several properties.

    Triplet nature.

    Degeneracy or redundancy.

    Unambiguity.

    Polarity.

    Non-overlapping.

    Compactness.

    Universality.

It should be noted that some authors also propose other properties of the code, related to the chemical features of the nucleotides that make up the code or to the frequency of occurrence of individual amino acids in the proteins of the body, etc. However, these properties follow from those listed above, so we will consider them there.

a. Triplet nature. The genetic code, like many complexly organized systems, has a smallest structural unit and a smallest functional unit. A triplet is the smallest structural unit of the genetic code; it consists of three nucleotides. A codon is the smallest functional unit of the genetic code. Typically, the triplets of mRNA are called codons. In the genetic code, a codon performs several functions. First, its main function is to encode a single amino acid. Second, a codon may not code for an amino acid, in which case it performs another function (see below). As the definitions show, a triplet characterizes the elementary structural unit of the genetic code (three nucleotides), whereas a codon characterizes the elementary semantic unit of the genome: three nucleotides determine the attachment of one amino acid to the polypeptide chain.

The elementary structural unit was first worked out theoretically, and then its existence was confirmed experimentally. Indeed, 20 amino acids cannot be encoded with one or two nucleotides each, because there are only 4 kinds of nucleotide: single nucleotides give 4 variants and pairs give only 4^2 = 16. Triplets of four nucleotides give 4^3 = 64 variants, which more than covers the number of amino acids found in living organisms (see Table 1).
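The counting argument is easy to verify by enumeration. The sketch below is illustrative only; the helper name is not from the text.

```python
from itertools import product

BASES = "UCAG"
N_AMINO_ACIDS = 20


def n_words(length: int) -> int:
    """Number of distinct nucleotide 'words' of the given length."""
    return len(BASES) ** length


for length in (1, 2, 3):
    enough = n_words(length) >= N_AMINO_ACIDS
    print(f"word length {length}: {n_words(length)} variants, enough for 20 amino acids: {enough}")

# Explicit enumeration of all triplets.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
print(len(triplets))
```

Three is therefore the shortest word length that provides at least 20 distinct words.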

The 64 nucleotide combinations presented in the table have two features. First, of the 64 triplet variants, only 61 are codons that encode an amino acid; they are called sense codons. Three triplets do not encode amino acids but are stop signals indicating the end of translation. These three triplets, UAA, UAG and UGA, are also called "meaningless" (nonsense) codons. As a result of a mutation that replaces one nucleotide in a triplet with another, a nonsense codon can arise from a sense codon. This type of mutation is called a nonsense mutation. If such a stop signal is formed inside the gene (in its coding part), then protein synthesis will always be interrupted at this place, and only the first part of the protein (up to the stop signal) will be synthesized. A person with this pathology will lack the protein and will experience symptoms associated with this deficiency. For example, a mutation of this kind has been identified in the gene encoding the hemoglobin beta chain. A shortened, inactive hemoglobin chain is synthesized, which is quickly destroyed. As a result, a hemoglobin molecule lacking the beta chain is formed. Clearly, such a molecule is unlikely to perform its function fully. A serious disease develops that takes the form of hemolytic anemia (beta-zero thalassemia, from the Greek thalassa, "sea", referring to the Mediterranean, where this disease was first discovered).

The mechanism of action of stop codons differs from the mechanism of action of sense codons. This follows from the fact that for all codons encoding amino acids, corresponding tRNAs have been found. No tRNAs were found for nonsense codons. Consequently, tRNA does not take part in the process of stopping protein synthesis.

The codon AUG (and sometimes GUG in bacteria) not only encodes an amino acid (methionine and valine, respectively) but also serves as the translation initiator.

b. Degeneracy or redundancy.

61 of the 64 triplets encode 20 amino acids. This three-fold excess of the number of triplets over the number of amino acids suggests that two coding options can be used in the transfer of information. Firstly, not all 64 codons can be involved in encoding 20 amino acids, but only 20 and, secondly, amino acids can be encoded by several codons. Research has shown that nature used the latter option.

The advantage of this option is obvious. If only 20 of the 64 triplet variants were involved in encoding amino acids, then 44 triplets (out of 64) would remain non-coding, i.e. meaningless (nonsense codons). Earlier we pointed out how dangerous it is for the life of a cell when a coding triplet is transformed by mutation into a nonsense codon: this significantly disrupts normal protein synthesis, ultimately leading to the development of diseases. Currently, three codons in our genome are nonsense codons; now imagine what would happen if the number of nonsense codons increased about 15-fold. It is clear that in such a situation the probability of normal codons turning into nonsense codons would be very much higher.

A code in which one amino acid is encoded by several triplets is called degenerate or redundant. Almost every amino acid has several codons. Thus, the amino acid leucine can be encoded by six triplets: UUA, UUG, CUU, CUC, CUA, CUG. Valine is encoded by four triplets, phenylalanine by two, and only tryptophan and methionine are encoded by a single codon each. The property of recording the same information with different symbols is called degeneracy.
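The degeneracy of the code can be made concrete by inverting the codon table, that is, by listing the codons for each amino acid. The sketch below builds the standard code from a compact 64-letter string (one-letter amino acid codes, "*" for stop); the variable names are illustrative.

```python
from collections import defaultdict

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

# Invert the table: amino acid -> list of its sense codons.
codons_for = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    if amino_acid != "*":
        codons_for[amino_acid].append(codon)

# L = leucine, V = valine, F = phenylalanine, W = tryptophan, M = methionine
for amino_acid in "LVFWM":
    print(amino_acid, len(codons_for[amino_acid]), codons_for[amino_acid])
```

Each codon maps to exactly one amino acid, while an amino acid may map back to up to six codons.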

The number of codons designated for one amino acid correlates well with the frequency of occurrence of the amino acid in proteins.

And this is most likely not accidental. The higher the frequency of occurrence of an amino acid in a protein, the more often the codon of this amino acid is represented in the genome, the higher the likelihood of its damage by mutagenic factors. Therefore, it is clear that a mutated codon has a greater chance of encoding the same amino acid if it is highly degenerate. From this perspective, the degeneracy of the genetic code is a mechanism that protects the human genome from damage.

It should be noted that the term degeneracy is also used in molecular genetics in another sense. The bulk of the information in a codon is contained in its first two nucleotides; the base in the third position is of little importance. This phenomenon is called "degeneracy of the third base", and it minimizes the effect of mutations.

For example, the main function of red blood cells is to carry oxygen from the lungs to the tissues and carbon dioxide from the tissues to the lungs. This function is performed by the respiratory pigment hemoglobin, which fills the entire cytoplasm of the erythrocyte. It consists of a protein part, globin, which is encoded by the corresponding gene, and heme, which contains iron. Mutations in globin genes lead to the appearance of different hemoglobin variants. Most often, mutations involve the replacement of one nucleotide with another and the appearance of a new codon in the gene, which may encode a new amino acid in the hemoglobin polypeptide chain. Any nucleotide of a triplet can be replaced as a result of mutation: the first, second or third.

Several hundred mutations are known that affect the integrity of the globin genes. About 400 of them involve the replacement of a single nucleotide in a gene and the corresponding amino acid replacement in the polypeptide. Of these, only 100 replacements lead to instability of hemoglobin and to diseases ranging from mild to very severe. 300 (approximately 64%) substitution mutations do not affect hemoglobin function and do not lead to pathology. One of the reasons for this is the above-mentioned "degeneracy of the third base": a replacement of the third nucleotide in a triplet encoding serine, leucine, proline, arginine and some other amino acids produces a synonymous codon encoding the same amino acid. Such a mutation will not manifest itself phenotypically.

In contrast, a replacement of the first or second nucleotide in a triplet almost always leads to the appearance of a new hemoglobin variant. But even in this case, there may not be severe phenotypic disorders. The reason is that the amino acid in hemoglobin is replaced by another one with similar physicochemical properties, for example when a hydrophilic amino acid is replaced by another hydrophilic amino acid.

Hemoglobin consists of the iron-porphyrin heme group (to which oxygen and carbon dioxide molecules attach) and a protein, globin. Adult hemoglobin (HbA) contains two identical α-chains and two β-chains. The α-chain contains 141 amino acid residues and the β-chain 146; the α- and β-chains differ in many amino acid residues. The amino acid sequence of each globin chain is encoded by its own gene. The gene encoding the α-chain is located in the short arm of chromosome 16, and the β-gene in the short arm of chromosome 11.

Substitution of the first or second nucleotide in the gene encoding the β-chain of hemoglobin almost always leads to the appearance of a new amino acid in the protein, disruption of hemoglobin function and serious consequences for the patient. For example, replacing "C" in one of the CAU (histidine) triplets with "U" produces a new triplet, UAU, encoding another amino acid, tyrosine. Phenotypically this manifests itself as a severe disease. Such a substitution of histidine by tyrosine at position 63 of the β-chain destabilizes hemoglobin, and the disease methemoglobinemia develops. Replacement, as a result of mutation, of glutamic acid with valine at the 6th position of the β-chain is the cause of a most severe disease, sickle cell anemia. Let us not continue the sad list.

Let us only note that when one of the first two nucleotides is replaced, an amino acid with physicochemical properties similar to the previous one may appear. Thus, replacement of the 2nd nucleotide in one of the triplets encoding glutamic acid (GAA) in the β-chain with "U" produces a new triplet (GUA) encoding valine, while replacing the first nucleotide with "A" forms the triplet AAA, encoding the amino acid lysine. Glutamic acid and lysine are similar in physicochemical properties: both are hydrophilic. Valine is a hydrophobic amino acid. Therefore, replacing hydrophilic glutamic acid with hydrophobic valine significantly changes the properties of hemoglobin, which ultimately leads to the development of sickle cell anemia, while replacing hydrophilic glutamic acid with hydrophilic lysine changes the function of hemoglobin to a lesser extent, and patients develop a mild form of anemia.

As a result of the replacement of the third base, the new triplet can encode the same amino acid as the previous one. For example, if uracil in the CAU triplet is replaced by cytosine and the CAC triplet appears, practically no phenotypic changes will be detected in humans. This is understandable, because both triplets code for the same amino acid, histidine.
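The "degeneracy of the third base" can be quantified on the standard code by counting, for each codon position, what share of all single-nucleotide substitutions in sense codons leaves the amino acid unchanged. This is a sketch that treats every substitution as equally likely, which is a simplifying assumption; the function name is illustrative.

```python
BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def synonymous_fraction(position: int) -> float:
    """Share of substitutions at a codon position (0, 1 or 2) that keep the amino acid."""
    same = total = 0
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # start only from sense codons
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == amino_acid
    return same / total


for position in range(3):
    print(f"position {position + 1}: {synonymous_fraction(position):.1%} synonymous")

# The substitutions discussed in the text (E = Glu, V = Val, K = Lys, H = His, Y = Tyr).
for before, after in (("GAA", "GUA"), ("GAA", "AAA"), ("CAU", "UAU"), ("CAU", "CAC")):
    print(before, CODON_TABLE[before], "->", after, CODON_TABLE[after])
```

Under this equal-weight assumption roughly two thirds of third-position substitutions are synonymous, only a few percent of first-position ones are, and none of the second-position ones.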

In conclusion, it is appropriate to emphasize that, from a general biological point of view, the degeneracy of the genetic code and the degeneracy of the third base are protective mechanisms built by evolution into the unique structure of DNA and RNA.

c. Unambiguity.

Each triplet (except nonsense) encodes only one amino acid. Thus, in the direction codon - amino acid the genetic code is unambiguous, in the direction amino acid - codon it is ambiguous (degenerate).

Codon → amino acid: unambiguous

Amino acid → codon: degenerate

And in this case, the need for unambiguity in the genetic code is obvious. Otherwise, when the same codon was translated, different amino acids would be inserted into the protein chain and, as a result, proteins with different primary structures and different functions would be formed. Cell metabolism would switch to a "one gene - several polypeptides" mode of operation. It is clear that in such a situation the regulatory function of genes would be completely lost.

d. Polarity.

Reading of information from DNA and mRNA occurs in only one direction. Polarity is important for determining higher-order structures (secondary, tertiary, etc.). Earlier we discussed how lower-order structures determine higher-order ones. Tertiary and higher-order structures begin to form as soon as the synthesized RNA chain leaves the DNA molecule or the polypeptide chain leaves the ribosome. While the free end of the RNA or polypeptide acquires a tertiary structure, the other end of the chain continues to be synthesized on DNA (if RNA is being transcribed) or on a ribosome (if a polypeptide is being translated).

Therefore, the unidirectional process of reading information (during the synthesis of RNA and protein) is essential not only for determining the sequence of nucleotides or amino acids in the synthesized substance, but also for the strict determination of secondary, tertiary and higher structures.

e. Non-overlapping.

The code may be overlapping or non-overlapping. Most organisms have a non-overlapping code. Overlapping code is found in some phages.

The essence of a non-overlapping code is that a nucleotide of one codon cannot simultaneously be a nucleotide of another codon. If the code were overlapping, then the sequence of seven nucleotides (GCUGCUG) could encode not two amino acids (alanine-alanine) (Fig. 33, A) as in the case of a non-overlapping code, but three (if there is one nucleotide in common) (Fig. 33, B) or five (if two nucleotides are common) (see Fig. 33, C). In the last two cases, a mutation of any nucleotide would lead to a violation in the sequence of two, three, etc. amino acids.

However, it has been established that a mutation of one nucleotide always disrupts the inclusion of one amino acid in a polypeptide. This is a significant argument that the code is non-overlapping.

Let us explain this using Fig. 34. Bold lines show the triplets encoding amino acids in the case of a non-overlapping and an overlapping code. Experiments have clearly shown that the genetic code is non-overlapping. Without going into the details of the experiment, we note what happens if the third nucleotide in the sequence, U (marked with an asterisk in Fig. 34), is replaced by some other nucleotide:

1. With a non-overlapping code, the protein controlled by this sequence would have a substitution of one (first) amino acid (marked with asterisks).

2. With an overlapping code in which adjacent codons share one nucleotide (panel B), a substitution would occur in two amino acids (the first and second, marked with asterisks). If adjacent codons share two nucleotides (panel C), the replacement would affect three amino acids (marked with asterisks).

However, numerous experiments have shown that when one nucleotide in DNA is altered, the change in the protein always affects only one amino acid, which is typical for a non-overlapping code.
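The argument from single-nucleotide substitutions can be checked by counting how many triplets contain a given nucleotide under each scheme. This is a minimal sketch; `codons_hit` and its `shift` parameter are hypothetical names chosen for the example.

```python
def codons_hit(seq_len, position, shift):
    """Count the triplets that contain the nucleotide at 0-based `position`.

    Triplets start every `shift` nucleotides: 3 for a non-overlapping code,
    2 or 1 for the two overlapping variants (illustrative assumption).
    """
    starts = range(0, seq_len - 2, shift)
    return sum(1 for s in starts if s <= position < s + 3)


# The third nucleotide of the seven-letter sequence GCUGCUG has index 2.
hits = {shift: codons_hit(7, 2, shift) for shift in (3, 2, 1)}
```

Only the non-overlapping scheme (`shift=3`) confines the effect of the substitution to a single triplet, and therefore to a single amino acid.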

A. Non-overlapping code:
GCUGCUG
GCU GCU
Ala* - Ala

B. Overlapping code (one nucleotide in common):
GCUGCUG
GCU UGC CUG
Ala* - Cys* - Leu

C. Overlapping code (two nucleotides in common):
GCUGCUG
GCU CUG UGC GCU CUG
Ala* - Leu* - Cys* - Ala - Leu

Asterisks mark the amino acids affected by replacing the third nucleotide (U).

Fig. 34. A diagram explaining the presence of a non-overlapping code in the genome (explanation in the text).

The non-overlapping nature of the genetic code is associated with another property: the reading of information begins from a certain point, the initiation signal. In mRNA this initiation signal is the codon AUG, which encodes methionine.
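Reading from an initiation signal can be sketched as follows. This is a simplified illustration, not a model of real initiation: it assumes that translation starts at the first AUG found in the string and stops at the first stop codon. The names `CODON_TABLE` and `translate_from_start`, and the example sequence, are invented for the sketch.

```python
# Standard genetic code in compact form: codons are enumerated in the order
# U, C, A, G for each position; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate_from_start(mrna):
    """Translate from the first AUG, triplet by triplet, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation signal found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: GG + AUG GCU GCU UAA + GG
peptide = translate_from_start("GGAUGGCUGCUUAAGG")
```

The position of the AUG fixes which nucleotides are grouped into triplets; everything before it is skipped.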

It should be noted that humans nevertheless have a small number of genes that deviate from the general rule and overlap.

e. Compactness.

There is no punctuation between codons. In other words, triplets are not separated from each other, for example, by one meaningless nucleotide. The absence of “punctuation marks” in the genetic code has been proven in experiments.

f. Universality.

The code is essentially the same for all organisms living on Earth. Direct evidence of the universality of the genetic code was obtained by comparing DNA sequences with the corresponding protein sequences. It turned out that bacterial and eukaryotic genomes all use the same set of codon assignments. There are exceptions, but not many.

The first exceptions to the universality of the genetic code were found in the mitochondria of some animal species. They concerned the terminator codon UGA, which in these mitochondria is read in the same way as the codon UGG, that is, as the amino acid tryptophan. Other, rarer deviations from universality were also found.
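The effect of such a reassignment can be illustrated with a variant codon table. The sketch below models only the UGA change mentioned above; real mitochondrial codes may differ in other codons as well, which is deliberately left out. The names `STANDARD`, `MITO_UGA_ONLY` and `translate` are assumptions made for the example.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AMINO))  # "*" marks a stop codon

# Variant table: identical to the standard one except that UGA reads as Trp (W).
MITO_UGA_ONLY = dict(STANDARD, UGA="W")


def translate(mrna, table):
    """Translate from the first nucleotide until a stop codon or the end."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


message = "AUGUGAUGG"  # hypothetical sequence: AUG UGA UGG
standard_reading = translate(message, STANDARD)
mito_reading = translate(message, MITO_UGA_ONLY)
```

With the standard table the chain ends at UGA; with the variant table UGA and UGG both add tryptophan.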

DNA code system.

The genetic code of DNA consists of 64 triplets of nucleotides. These triplets are called codons. Of the 64 codons, 61 code for one of the 20 amino acids used in protein synthesis, and the remaining three are stop signals. This gives some redundancy in the code: most amino acids are coded for by more than one codon.
One codon, AUG, performs two interconnected functions: it signals the start of translation and encodes the inclusion of the amino acid methionine (Met) into the growing polypeptide chain. The DNA coding system is designed so that the genetic code can be expressed either as RNA codons or as DNA codons. RNA codons are found in messenger RNA (mRNA), and these are the codons that are read during the synthesis of polypeptides (a process called translation). Each mRNA molecule, in turn, acquires its nucleotide sequence during transcription from the corresponding gene.
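The relationship between DNA codons and RNA codons can be sketched in a few lines. The function names are hypothetical, and both strands are assumed to be written in the 5' to 3' direction.

```python
def transcribe_coding_strand(dna):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return dna.replace("T", "U")


# Pairing rules used when RNA is built on a DNA template.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template_strand(template):
    """Build the mRNA (5'->3') complementary to a template strand (5'->3').

    The strands are antiparallel, so the template is traversed in reverse.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template))


mrna_from_coding = transcribe_coding_strand("ATGGCT")
mrna_from_template = transcribe_template_strand("AGCCAT")
```

Both calls describe the same hypothetical gene fragment, once through its coding strand and once through its template strand, so they produce the same mRNA.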

All but two amino acids (Met and Trp) can be encoded by 2 to 6 different codons. However, the genomes of most organisms show that certain codons are favored over others. In humans, for example, alanine is encoded by GCC four times more often than by GCG. This probably reflects more efficient handling of some codons by the translation apparatus (for example, the ribosome).
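Both the redundancy and the idea of codon preference can be expressed as simple counts. In the sketch below the codon list passed to `synonymous_usage` is invented purely to mirror the four-to-one ratio quoted above; it is not real genomic data, and the function name is an assumption.

```python
from collections import Counter

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))  # "*" marks a stop codon

# How many codons encode each amino acid (and how many are stop signals).
degeneracy = Counter(CODON_TABLE.values())


def synonymous_usage(codons, amino_acid):
    """Fraction of each synonymous codon among the codons for one amino acid."""
    counts = Counter(c for c in codons if CODON_TABLE[c] == amino_acid)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Made-up sample: four GCC and one GCG for alanine, plus one AUG.
sample = ["GCC", "GCC", "GCC", "GCC", "GCG", "AUG"]
alanine_usage = synonymous_usage(sample, "A")
```

In the standard table only Met (M) and Trp (W) have a single codon, while leucine, serine and arginine have six each.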

The genetic code is almost universal. The same codons are assigned to the same amino acids, and the start and stop signals are overwhelmingly the same, in animals, plants and microorganisms. However, some exceptions have been found. Most involve assigning one or two of the three stop codons to an amino acid.

Today it is no secret to anyone that the life program of all living organisms is written on a DNA molecule. The easiest way to imagine a DNA molecule is as a long ladder. The vertical posts of this ladder are made up of alternating sugar and phosphate groups. All the important operating information in the molecule is written on the rungs of the ladder: each rung consists of two molecules, each of which is attached to one of the vertical posts. These molecules, the nitrogenous bases, are called adenine, guanine, thymine, and cytosine, but they are usually simply designated by the letters A, G, T, and C. The shape of these molecules allows them to form bonds, and thus complete rungs, only of a certain type: between the bases A and T and between the bases G and C (a pair formed in this way is called a "base pair"). No other types of bonds between bases occur in a DNA molecule.
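The pairing rule fits in a small lookup table. This is a minimal sketch with hypothetical names; for simplicity the partner strand is written rung by rung, in the same left-to-right order as the input, ignoring the fact that the two strands run in opposite directions.

```python
# Allowed rungs of the ladder: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand):
    """Return the base on the opposite post for every rung of the ladder."""
    return "".join(PAIR[base] for base in strand)


def is_valid_ladder(strand_a, strand_b):
    """Check that every rung is an A-T or G-C pair."""
    return len(strand_a) == len(strand_b) and all(
        PAIR[a] == b for a, b in zip(strand_a, strand_b)
    )


opposite = partner_strand("GATTACA")
```

Because each base has exactly one possible partner, one strand fully determines the other.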

By going down the rungs along one strand of a DNA molecule, you get a sequence of bases. It is this message, in the form of a sequence of bases, that determines the flow of chemical reactions in the cell and, consequently, the characteristics of the organism possessing this DNA. According to the central dogma of molecular biology, the DNA molecule encodes information about proteins, which, in turn, act as enzymes (see Catalysts and enzymes) and regulate all chemical reactions in living organisms.

The strict correspondence between the sequence of base pairs in a DNA molecule and the sequence of amino acids that make up protein enzymes is called the genetic code. The genetic code was deciphered soon after the discovery of the double-stranded structure of DNA. It was known that a newly discovered molecule, messenger (or template) RNA (mRNA), carries the information written on DNA. Biochemists Marshall W. Nirenberg and J. Heinrich Matthaei of the National Institutes of Health in Bethesda, near Washington, D.C., conducted the first experiments that provided clues to the genetic code.

They began by synthesizing artificial mRNA molecules consisting only of the repeating nitrogenous base uracil (which is an analogue of thymine, "T", and forms bonds only with adenine, "A", from the DNA molecule). They added these mRNAs to test tubes with a mixture of amino acids, and in each tube only one of the amino acids was labeled with a radioactive label. The researchers discovered that the mRNA they artificially synthesized initiated protein formation in only one test tube, which contained the labeled amino acid phenylalanine. So they established that the sequence “—U—U—U—” on the mRNA molecule (and, therefore, the equivalent sequence “—A—A—A—” on the DNA molecule) encodes a protein consisting only of the amino acid phenylalanine. This was the first step towards deciphering the genetic code.

Today it is known that three base pairs of a DNA molecule (such a triplet is called a codon) code for one amino acid in a protein. By performing experiments similar to those described above, geneticists eventually deciphered the entire genetic code, in which each of the 64 possible codons corresponds to a specific amino acid or to a stop signal.