
Protein structure: Difference between revisions

From Wikipedia, the free encyclopedia
{{More citations needed|date=May 2018}}
{{Protein structure}}
'''Protein structure''' is the [[molecular geometry|three-dimensional arrangement of atom]]s in an [[amino acid]]-chain [[molecule]]. [[Protein]]s are [[polymer]]s{{snd}} specifically [[polypeptide]]s{{snd}} formed from sequences of [[amino acid]]s, which are the [[monomer]]s of the polymer. A single amino acid monomer may also be called a ''residue'', a term for a repeating unit of a polymer. Proteins form through [[condensation reaction]]s between amino acids, in which one [[water molecule]] is lost per [[chemical reaction|reaction]] as the amino acids are joined by a [[peptide bond]]. By convention, a chain of fewer than about 30 amino acids is often called a [[peptide]] rather than a protein.<ref name="Stoker2015">{{cite book| vauthors = Stoker HS |title=Organic and Biological Chemistry|url=https://books.google.com/books?id=HRCdBQAAQBAJ&pg=PA371|date=1 January 2015|publisher=Cengage Learning|isbn=978-1-305-68645-8|page=371}}</ref> To perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of [[non-covalent interaction]]s, such as [[hydrogen bonding]], [[ionic interaction]]s, [[Van der Waals forces]], and [[hydrophobic]] packing. Understanding the functions of proteins at a molecular level often requires determining their [[Protein tertiary structure|three-dimensional structure]]. This is the subject of [[structural biology]], which determines protein structures using techniques such as [[X-ray crystallography]], [[protein NMR|NMR spectroscopy]], [[Cryogenic electron microscopy|cryo-electron microscopy (cryo-EM)]], and [[dual polarisation interferometry]].
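The water bookkeeping of condensation can be made concrete with a short calculation. The Python sketch below is illustrative only: it assumes standard average residue masses (in daltons) for the 20 common amino acids and an unmodified, linear chain, and the names `RESIDUE_MASS`, `WATER_MASS`, `condensations` and `chain_mass` are hypothetical, not taken from any library. A chain of n amino acids is formed by n − 1 condensation reactions, so its mass is the sum of the free amino acid masses minus n − 1 waters, which is the same as the sum of the residue masses plus one water.

```python
# Average residue masses in daltons (free amino acid minus one H2O).
# Approximate standard values, used here as an assumption for illustration.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # average mass of one water molecule, in daltons


def condensations(sequence: str) -> int:
    """Number of peptide bonds formed (and waters released) in a linear chain."""
    return max(len(sequence) - 1, 0)


def chain_mass(sequence: str) -> float:
    """Average mass of an unmodified linear chain given in one-letter code.

    Each free amino acid weighs residue + water; each of the n - 1
    condensation reactions removes one water, leaving residues + one water.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    free_amino_acids = sum(RESIDUE_MASS[aa] + WATER_MASS for aa in sequence.upper())
    return free_amino_acids - condensations(sequence) * WATER_MASS


print(condensations("GGA"))        # 2 peptide bonds, so 2 waters released
print(round(chain_mass("GG"), 2))  # glycylglycine: 132.12
```

Working from free amino acid masses keeps the condensation step explicit; a real mass calculator would also need to account for post-translational modifications and isotopic composition.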


Protein structures range in size from tens to several thousand amino acids.<ref name="Brocchieri2005">{{cite journal | vauthors = Brocchieri L, Karlin S | title = Protein length in eukaryotic and prokaryotic proteomes | journal = Nucleic Acids Research | volume = 33 | issue = 10 | pages = 3390–3400 | date = 2005-06-10 | pmid = 15951512 | pmc = 1150220 | doi = 10.1093/nar/gki615 }}</ref> By physical size, proteins are classified as [[nanoparticle]]s, between 1–100&nbsp;nm. Very large [[protein complexes]] can be formed from [[protein subunit]]s. For example, many thousands of [[actin]] molecules assemble into a [[microfilament]].
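The link between chain length and the 1–100&nbsp;nm size range can be checked with back-of-the-envelope arithmetic. The Python sketch below is an illustration under stated assumptions, none of which come from the text above: a mean residue mass of about 110&nbsp;Da, a protein density of about 1.35&nbsp;g/cm³, and a compact spherical shape. All three vary between proteins, and `sphere_radius_nm` is a hypothetical helper.

```python
import math

AVOGADRO = 6.02214076e23    # molecules per mole
MEAN_RESIDUE_MASS = 110.0   # Da per residue; rule-of-thumb assumption
PROTEIN_DENSITY = 1.35      # g/cm^3; typical value, assumption


def sphere_radius_nm(n_residues: int) -> float:
    """Radius (nm) of a sphere holding the protein's mass at the assumed density."""
    mass_g = n_residues * MEAN_RESIDUE_MASS / AVOGADRO  # grams per molecule
    volume_nm3 = mass_g / PROTEIN_DENSITY * 1e21         # 1 cm^3 = 1e21 nm^3
    return (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


for n in (30, 300, 3000):
    print(n, "residues -> radius", round(sphere_radius_nm(n), 2), "nm")
```

Because volume scales with mass, the radius grows only with the cube root of chain length: under these assumptions, chains of 30, 300 and 3000 residues give diameters of roughly 2, 4 and 9&nbsp;nm, all within the nanoparticle range.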


A protein usually undergoes [[Reversible process (thermodynamics)|reversible]] [[Conformational change|structural changes]] in performing its biological function. The alternative structures of the same protein are referred to as different [[conformational isomerism|conformations]], and transitions between them are called [[conformational change]]s.
===Primary structure===
{{Main|Protein primary structure}}
The [[primary structure]] of a protein refers to the sequence of [[amino acid]]s in the polypeptide chain. The primary structure is held together by [[peptide bonds]] that are made during the process of [[protein biosynthesis]]. The two ends of the [[polypeptide chain]] are referred to as the [[carboxyl terminus]] (C-terminus) and the [[amino terminus]] (N-terminus), according to the free group at each end. Counting of residues always starts at the N-terminal end (NH<sub>2</sub>-group), the end whose amino group is not involved in a peptide bond. The primary structure of a protein is determined by the [[gene]] corresponding to the protein. A specific sequence of [[nucleotide]]s in [[DNA]] is [[transcription (genetics)|transcribed]] into [[mRNA]], which is read by the [[ribosome]] in a process called [[translation (biology)|translation]]. The sequence of amino acids in insulin was determined by [[Frederick Sanger]], establishing that proteins have defined amino acid sequences.<ref>{{cite journal | vauthors = Sanger F, Tuppy H | title = The amino-acid sequence in the phenylalanyl chain of insulin. I. The identification of lower peptides from partial hydrolysates | journal = The Biochemical Journal | volume = 49 | issue = 4 | pages = 463–481 | date = September 1951 | pmid = 14886310 | pmc = 1197535 | doi = 10.1042/bj0490463 }}</ref><ref>{{cite journal | vauthors = Sanger F | title = Chemistry of insulin; determination of the structure of insulin opens the way to greater understanding of life processes | journal = Science | volume = 129 | issue = 3359 | pages = 1340–1344 | date = May 1959 | pmid = 13658959 | doi = 10.1126/science.129.3359.1340 | bibcode = 1959Sci...129.1340G }}</ref> Each protein has its own characteristic sequence, which defines its structure and function. The sequence of a protein can be determined by methods such as [[Edman degradation]] or [[Mass spectrometry#Protein identification|tandem mass spectrometry]], but it is often read directly from the sequence of the gene using the [[genetic code]]. The term "amino acid residues" is preferred when discussing proteins: a [[water molecule]] is lost each time a peptide bond forms, so a protein is built from residues rather than intact amino acids. [[Post-translational modification]]s such as [[phosphorylation]]s and [[glycosylation]]s are usually also considered part of the primary structure, but they cannot be read from the gene. As an example of primary structure, [[insulin]] consists of 51 amino acid residues in two chains: the A chain has 21 residues and the B chain has 30.
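Reading a protein sequence from a gene with the genetic code can be sketched in a few lines. The Python example below assumes the standard genetic code (NCBI translation table 1, stored in its common compact 64-letter form), an input that is already in reading frame, and no introns. `translate` and `CODON_TABLE` are hypothetical names, and post-translational modifications, which cannot be read from the gene, are not represented.

```python
# Standard genetic code in compact form: codons are enumerated with the
# bases ordered T, C, A, G at each of the three positions ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(coding_sequence: str) -> str:
    """Translate an in-frame DNA or mRNA coding sequence up to the first stop codon.

    Residues are emitted in N-to-C order, matching the convention that
    counting starts at the N-terminus. A trailing incomplete codon is ignored.
    """
    seq = coding_sequence.upper().replace("U", "T")  # mRNA uses U in place of T
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":  # a stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


print(translate("AUGGCCUAA"))  # MA
```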


===Secondary structure===
[[File:Alpha helix.png|thumb|100px|An α-helix with hydrogen bonds (yellow dots)]]
{{Main|Protein secondary structure}}
[[Secondary structure]] refers to highly regular local substructures of the polypeptide backbone. Two main types of secondary structure, the [[alpha helix|α-helix]] and the [[beta strand|β-strand]] or [[beta sheet|β-sheet]]s, were proposed in 1951 by [[Linus Pauling]] and colleagues.<ref name="Pauling1951">{{cite journal | vauthors = Pauling L, Corey RB, Branson HR | title = The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 37 | issue = 4 | pages = 205–211 | date = April 1951 | pmid = 14816373 | pmc = 1063337 | doi = 10.1073/pnas.37.4.205 | doi-access = free | bibcode = 1951PNAS...37..205P }}</ref> These secondary structures are defined by patterns of [[hydrogen bonds]] between the main-chain peptide groups. They have a regular geometry, with the backbone dihedral angles φ and ψ confined to specific regions of the [[Ramachandran plot]]. Both the α-helix and the β-sheet represent a way of saturating all the hydrogen bond donors and acceptors in the peptide backbone. Some ordered parts of a protein do not form any regular structure; these should not be confused with [[random coil]], an unfolded polypeptide chain lacking any fixed three-dimensional structure. Several sequential secondary structures may form a "[[supersecondary structure|supersecondary unit]]".<ref name="ChiangYS2007">{{cite journal | vauthors = Chiang YS, Gelfand TI, Kister AE, Gelfand IM | title = New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage | journal = Proteins | volume = 68 | issue = 4 | pages = 915–921 | date = September 2007 | pmid = 17557333 | doi = 10.1002/prot.21473 | s2cid = 29904865 }}</ref>
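The dihedral angles φ and ψ are computed from four consecutive backbone atom positions: φ of residue i uses C(i−1), N(i), Cα(i) and C(i), and ψ uses N(i), Cα(i), C(i) and N(i+1). The Python sketch below uses the common atan2 formulation of the dihedral angle; the helper names are hypothetical, and the coordinates in the usage lines are toy points chosen to show the cis, trans and 90° cases, not a real residue.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical), rotating the last point about the z axis:
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)), 1))   # 0.0 (cis)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)), 1))  # 180.0 (trans)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))   # 90.0
```

Applied to real atom coordinates from a structure file, one (φ, ψ) pair per residue gives the points plotted on a Ramachandran plot.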


===Tertiary structure===
===Quaternary structure===
{{Main|Protein quaternary structure}}
Quaternary structure is the three-dimensional structure formed by the aggregation of two or more individual polypeptide chains (subunits) that operate as a single functional unit ([[multimer]]). The resulting multimer is stabilized by the same [[non-covalent interaction]]s and disulfide bonds as in tertiary structure. There are many possible quaternary structure organisations.<ref name="pmid19059267">{{cite journal | vauthors = Moutevelis E, Woolfson DN | title = A periodic table of coiled-coil protein structures | journal = Journal of Molecular Biology | volume = 385 | issue = 3 | pages = 726–732 | date = January 2009 | pmid = 19059267 | doi = 10.1016/j.jmb.2008.11.028 }}</ref> A complex of two or more polypeptides (i.e. multiple subunits) is called a [[multimer]]; specifically, it is called a [[dimer (chemistry)|dimer]] if it contains two subunits, a [[trimer (chemistry)|trimer]] if it contains three, a [[tetramer]] if it contains four, and a [[pentamer]] if it contains five. The subunits are frequently related to one another by [[symmetry group|symmetry operations]], such as a 2-fold axis in a dimer. Multimers made up of identical subunits are named with the prefix "homo-" and those made up of different subunits with the prefix "hetero-"; for example, [[hemoglobin]], with two alpha and two beta chains, is a heterotetramer.
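The naming rules above are mechanical enough to express as a small function. This Python sketch is illustrative only: `multimer_name` is a hypothetical helper that takes one label per polypeptide chain, covers only the two- to five-subunit names given in the text, and falls back to a generic "multimer" label for larger assemblies; it ignores symmetry.

```python
# Prefixes for the subunit counts named in the text; larger assemblies
# fall back to a generic "multi-" prefix (a simplification).
COUNT_PREFIX = {2: "di", 3: "tri", 4: "tetra", 5: "penta"}


def multimer_name(subunits):
    """Name an assembly from a list of subunit labels, one label per chain."""
    n = len(subunits)
    if n < 2:
        raise ValueError("a multimer needs at least two subunits")
    kind = "homo" if len(set(subunits)) == 1 else "hetero"  # identical vs different chains
    return kind + COUNT_PREFIX.get(n, "multi") + "mer"


print(multimer_name(["alpha", "alpha", "beta", "beta"]))  # heterotetramer (hemoglobin)
print(multimer_name(["A", "A"]))                          # homodimer
```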


==Domains, motifs, and folds in protein structure==


===Structural domain===
A [[structural domain]] is an element of the protein's overall structure that is self-stabilizing and often [[protein folding|folds]] independently of the rest of the protein chain. Many domains are not unique to the protein products of one [[gene]] or one [[gene family]] but instead appear in a variety of proteins. Domains are often named and singled out because they figure prominently in the biological function of the protein they belong to; for example, the "[[calcium]]-binding domain of [[calmodulin]]". Because they are independently stable, domains can be "swapped" by [[genetic engineering]] between one protein and another to make [[chimera (protein)|chimera]] proteins. A conserved combination of several domains that occurs in different proteins, such as the [[protein tyrosine phosphatase]] domain and [[C2 domain]] pair, has been called a "superdomain" and may evolve as a single unit.<ref>{{cite journal | vauthors = Haynie DT, Xue B | title = Superdomains in the protein structure hierarchy: The case of PTP-C2 | journal = Protein Science | volume = 24 | issue = 5 | pages = 874–882 | date = May 2015 | pmid = 25694109 | pmc = 4420535 | doi = 10.1002/pro.2664 }}</ref>


===Structural and sequence motifs===


===Protein fold===
A protein fold refers to the general protein architecture, such as a [[helix bundle]], [[beta barrel|β-barrel]] or [[Rossmann fold]], or to the different "folds" catalogued in the [[Structural Classification of Proteins database]].<ref name="Govinda rajan">{{cite journal | vauthors = Govindarajan S, Recabarren R, Goldstein RA | title = Estimating the total number of protein folds | journal = Proteins | volume = 35 | issue = 4 | pages = 408–414 | date = June 1999 | pmid = 10382668 | doi = 10.1002/(SICI)1097-0134(19990601)35:4<408::AID-PROT4>3.0.CO;2-A | url = http://www3.interscience.wiley.com/journal/65000323/abstract | url-status = dead | hdl-access = free | hdl = 2027.42/34969 | s2cid = 7147867 | archive-url = https://archive.today/2013.01.05-075413/http://www3.interscience.wiley.com/journal/65000323/abstract | archive-date = 5 January 2013 }}</ref> A related concept is [[protein topology]].


==Protein dynamics and conformational ensembles==


Proteins are not static objects but instead populate ensembles of [[conformational change|conformational states]]. Transitions between these states typically occur at the [[Nanoscopic scale|nanoscale]] and have been linked to functionally relevant phenomena such as [[Allosteric regulation|allosteric signaling]]<ref name="pmid21570668">{{cite book |vauthors=Bu Z, Callaway DJ |chapter=Proteins MOVE! Protein dynamics and long-range allostery in cell signaling |volume=83 |pages=163–221 |year=2011 |pmid=21570668 |doi=10.1016/B978-0-12-381262-9.00005-7 |chapter-url=http://linkinghub.elsevier.com/retrieve/pii/B978-0-12-381262-9.00005-7 |series=Advances in Protein Chemistry and Structural Biology |isbn=9780123812629|title=Protein Structure and Diseases |publisher=Academic Press }}</ref> and [[enzyme catalysis]].<ref>{{cite journal | vauthors = Fraser JS, Clarkson MW, Degnan SC, Erion R, Kern D, Alber T | title = Hidden alternative structures of proline isomerase essential for catalysis | journal = Nature | volume = 462 | issue = 7273 | pages = 669–673 | date = December 2009 | pmid = 19956261 | pmc = 2805857 | doi = 10.1038/nature08615 | bibcode = 2009Natur.462..669F }}</ref> [[Protein dynamics]] and [[conformational change]]s allow proteins to function as nanoscale [[biological machine]]s within cells, often in the form of [[Protein complex|multi-protein complexes]].<ref>{{Cite book|title=Biochemistry| vauthors = Voet D, Voet JG |date=2011|publisher=John Wiley & Sons |isbn=9780470570951|edition= 4th|location=Hoboken, NJ|oclc=690489261}}</ref> Examples include [[motor proteins]] such as [[myosin]], which is responsible for [[muscle]] contraction; [[kinesin]], which moves cargo inside cells away from the [[Cell nucleus|nucleus]] along [[microtubules]]; and [[dynein]], which moves cargo inside cells towards the nucleus and produces the axonemal beating of [[cilia#Motile cilia|motile cilia]] and [[flagella]]. "[I]n effect, the [motile cilium] is a nanomachine composed of perhaps over 600 proteins in molecular complexes, many of which also function independently as nanomachines...[[Flexible linker]]s allow the [[Protein domain#Domains and protein flexibility|mobile protein domains]] connected by them to recruit their binding partners and induce long-range [[allostery]] via [[Protein dynamics#Global flexibility: multiple domains|protein domain dynamics]]."<ref name="Satir2008">{{cite journal | vauthors = Satir P, Christensen ST | title = Structure and function of mammalian cilia | journal = Histochemistry and Cell Biology | volume = 129 | issue = 6 | pages = 687–693 | date = June 2008 | pmid = 18365235 | pmc = 2386530 | doi = 10.1007/s00418-008-0416-9 | id = 1432-119X }}</ref>


[[File:Schematic view of the two main ensemble modeling approaches.jpg|thumb|right|500px|Schematic view of the two main ensemble modeling approaches.<ref name=":2" />]]


Proteins are often thought of as relatively stable [[Protein tertiary structure|tertiary structures]] that experience conformational changes after being affected by interactions with other proteins or as a part of enzymatic activity. However, proteins may have varying degrees of stability, and some of the less stable variants are [[intrinsically disordered proteins]]. These proteins exist and function in a relatively 'disordered' state lacking a stable [[Protein tertiary structure|tertiary structure]]. As a result, they are difficult to describe by a single fixed [[Protein tertiary structure|tertiary structure]]. [[Conformational ensembles]] have been devised as a way to provide a more accurate and 'dynamic' representation of the conformational state of [[intrinsically disordered proteins]].<ref>[https://web.archive.org/web/20180310010556/http://pedb.vib.be/ Protein Ensemble Database]</ref><ref name=":2">{{cite journal | vauthors = Varadi M, Vranken W, Guharoy M, Tompa P | title = Computational approaches for inferring the functions of intrinsically disordered proteins | journal = Frontiers in Molecular Biosciences | volume = 2 | pages = 45 | date = 2015-01-01 | pmid = 26301226 | pmc = 4525029 | doi = 10.3389/fmolb.2015.00045 | doi-access = free }}</ref>



Conformational ensembles have been generated for a number of highly dynamic and partially unfolded proteins, such as [[Sic1]]/[[Cell division control protein 4|Cdc4]],<ref>{{cite journal | vauthors = Mittag T, Marsh J, Grishaev A, Orlicky S, Lin H, Sicheri F, Tyers M, Forman-Kay JD | display-authors = 6 | title = Structure/function implications in a dynamic complex of the intrinsically disordered Sic1 with the Cdc4 subunit of an SCF ubiquitin ligase | journal = Structure | volume = 18 | issue = 4 | pages = 494–506 | date = March 2010 | pmid = 20399186 | pmc = 2924144 | doi = 10.1016/j.str.2010.01.020 }}</ref> [[KIAA0101|p15 PAF]],<ref>{{cite journal | vauthors = De Biasio A, Ibáñez de Opakua A, Cordeiro TN, Villate M, Merino N, Sibille N, Lelli M, Diercks T, Bernadó P, Blanco FJ | display-authors = 6 | title = p15PAF is an intrinsically disordered protein with nonrandom structural preferences at sites of interaction with other proteins | journal = Biophysical Journal | volume = 106 | issue = 4 | pages = 865–874 | date = February 2014 | pmid = 24559989 | pmc = 3944474 | doi = 10.1016/j.bpj.2013.12.046 | bibcode = 2014BpJ...106..865D }}</ref> [[MAP2K7|MKK7]],<ref>{{cite journal | vauthors = Kragelj J, Palencia A, Nanao MH, Maurin D, Bouvignies G, Blackledge M, Jensen MR | title = Structure and dynamics of the MKK7-JNK signaling complex | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 112 | issue = 11 | pages = 3409–3414 | date = March 2015 | pmid = 25737554 | pmc = 4371970 | doi = 10.1073/pnas.1419528112 | doi-access = free | bibcode = 2015PNAS..112.3409K }}</ref> [[Beta-synuclein]]<ref>{{cite journal | vauthors = Allison JR, Rivers RC, Christodoulou JC, Vendruscolo M, Dobson CM | title = A relationship between the transient structure in the monomeric state and the aggregation propensities of α-synuclein and β-synuclein | journal = Biochemistry | volume = 53 | issue = 46 | pages = 7170–7183 | date = November 2014 | pmid = 25389903 | pmc = 4245978 | doi = 10.1021/bi5009326 }}</ref> and [[CDKN1B|p27]].<ref>{{cite journal | vauthors = Sivakolundu SG, Bashford D, Kriwacki RW | title = Disordered p27Kip1 exhibits intrinsic structure resembling the Cdk2/cyclin A-bound conformation | journal = Journal of Molecular Biology | volume = 353 | issue = 5 | pages = 1118–1128 | date = November 2005 | pmid = 16214166 | doi = 10.1016/j.jmb.2005.08.074 }}</ref>

==Protein folding==
{{Main|Protein folding}}

As they are translated, polypeptides exit the [[ribosome]] mostly as a [[random coil]] and fold into their [[native state]].<ref>{{cite journal | vauthors = Zhang G, Ignatova Z | title = Folding at the birth of the nascent chain: coordinating translation with co-translational folding | journal = Current Opinion in Structural Biology | volume = 21 | issue = 1 | pages = 25–31 | date = February 2011 | pmid = 21111607 | doi = 10.1016/j.sbi.2010.10.008 }}</ref><ref name="Alberts">{{cite book|title=Molecular Biology of the Cell | edition = Fourth | vauthors = Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P |publisher=Garland Science|year=2002|isbn=978-0-8153-3218-3|location=New York and London|chapter=The Shape and Structure of Proteins|author-link=Bruce Alberts|chapter-url=https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=books&doptcmdl=GenBookHL&term=mboc4%5Bbook%5D+AND+372270%5Buid%5D&rid=mboc4.section.388}}</ref> The final structure of the protein chain is generally assumed to be determined by its amino acid sequence ([[Anfinsen's dogma]]).<ref name="Anfinsen">{{cite journal | vauthors = Anfinsen CB | title = The formation and stabilization of protein structure | journal = The Biochemical Journal | volume = 128 | issue = 4 | pages = 737–749 | date = July 1972 | pmid = 4565129 | pmc = 1173893 | doi = 10.1042/bj1280737 | author-link = Christian B. Anfinsen }}</ref>

== Protein stability ==
{{main|Equilibrium unfolding}}
The thermodynamic stability of a protein is the [[Gibbs free energy|free energy difference]] between its folded and [[Denaturation (biochemistry)|unfolded]] states. This free energy difference is very sensitive to temperature, so a change in temperature may result in unfolding or denaturation. [[Denaturation (biochemistry)|Protein denaturation]] may result in loss of function and loss of the native state. The free energy of stabilization of soluble globular proteins typically does not exceed 50 kJ/mol.{{Cn|date=August 2018}} Given the large number of hydrogen bonds that stabilize secondary structures, and the stabilization of the inner core through hydrophobic interactions, the free energy of stabilization emerges as a small difference between large numbers.<ref>{{cite journal | vauthors = Jaenicke R | title = Protein structure and function at low temperatures | journal = Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences | volume = 326 | issue = 1237 | pages = 535–553 | date = January 1990 | pmid = 1969647 | doi = 10.1098/rstb.1990.0030 | doi-access = free | jstor = 2398703 }}</ref>
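The temperature sensitivity described above is commonly analysed with a two-state model in which each molecule is either native (N) or unfolded (U). The sketch below assumes that model and a temperature-independent heat-capacity change ΔC<sub>p</sub>; it computes the free energy of unfolding ΔG<sub>U</sub>(T) from the Gibbs–Helmholtz stability curve and converts it into the fraction of unfolded molecules via K<sub>U</sub> = exp(−ΔG<sub>U</sub>/RT). The parameter values (T<sub>m</sub> = 333.15 K, ΔH<sub>m</sub> = 400 kJ/mol, ΔC<sub>p</sub> = 8 kJ/(mol·K)) are hypothetical and are not taken from the cited sources.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def fraction_unfolded(dg_unfold, temperature):
    """Two-state N <-> U equilibrium: K_U = [U]/[N] = exp(-dG_U / RT).
    dg_unfold is G(U) - G(N) in kJ/mol; positive values favour the folded state."""
    k_u = math.exp(-dg_unfold / (R * temperature))
    return k_u / (1.0 + k_u)


def stability_curve(temperature, tm, dh_m, dcp):
    """dG_U(T) in kJ/mol from the Gibbs-Helmholtz relation, assuming constant dcp.
    tm: midpoint temperature (K), dh_m: unfolding enthalpy at tm (kJ/mol),
    dcp: heat-capacity change on unfolding (kJ/(mol*K))."""
    t = temperature
    return dh_m * (1.0 - t / tm) + dcp * (t - tm - t * math.log(t / tm))


# Hypothetical parameters for a small globular protein.
TM, DH_M, DCP = 333.15, 400.0, 8.0
for t in (298.15, 323.15, 333.15, 343.15):
    dg = stability_curve(t, TM, DH_M, DCP)
    print(f"T = {t:.2f} K  dG_U = {dg:7.2f} kJ/mol  unfolded fraction = {fraction_unfolded(dg, t):.2e}")
```

With these illustrative numbers ΔG<sub>U</sub> at 25 °C comes out at a few tens of kJ/mol even though ΔH<sub>m</sub> is 400 kJ/mol, because the enthalpy and entropy terms partly offset each other; and a ΔG<sub>U</sub> of 50 kJ/mol leaves fewer than one molecule in 10<sup>8</sup> unfolded at 25 °C.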

==Protein structure determination==
[[File:Protein structure examples.png|thumb|left|Examples of protein structures from the [[Protein Data Bank|PDB]] ]]
[[File:Rate of Protein Structure Determination-2014.png|thumb|400px|Rate of Protein Structure Determination by Method and Year]]
Around 90% of the protein structures available in the [[Protein Data Bank]] have been determined by [[X-ray crystallography]].<ref>{{cite journal | vauthors = Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC | title = A three-dimensional model of the myoglobin molecule obtained by x-ray analysis | journal = Nature | volume = 181 | issue = 4610 | pages = 662–666 | date = March 1958 | pmid = 13517261 | doi = 10.1038/181662a0 | s2cid = 4162786 | bibcode = 1958Natur.181..662K }}</ref> This method measures the three-dimensional (3-D) density distribution of [[electron]]s in the protein in its [[crystallized]] state, from which the 3-D coordinates of all the [[atom]]s can be [[infer]]red to a certain resolution. Roughly 7% of the known protein structures have been obtained by [[protein NMR|nuclear magnetic resonance]] (NMR) techniques.<ref>{{Cite web |date=2022-10-01 |title=PDB Statistics |url=https://www.rcsb.org/stats/summary}}</ref> For larger protein complexes, [[cryo-electron microscopy]] can determine protein structures. Its resolution is typically lower than that of X-ray crystallography or NMR, but the maximum achievable resolution is steadily increasing. This technique is particularly valuable for very large protein complexes such as [[virus coat protein]]s and [[amyloid]] fibers.

General secondary structure composition can be determined via [[circular dichroism]]. [[Vibrational spectroscopy]] can also be used to characterize the conformation of peptides, polypeptides, and proteins.<ref name="pmid3541539">{{cite book | vauthors = Krimm S, Bandekar J | title = Advances in Protein Chemistry Volume 38 | chapter = Vibrational spectroscopy and conformation of peptides, polypeptides, and proteins | journal = Adv. Protein Chem. | volume = 38 | pages = 181–364 | date = 1986 | pmid = 3541539 | doi = 10.1016/S0065-3233(08)60528-8|series = Advances in Protein Chemistry|isbn = 9780120342389}}</ref> [[Two-dimensional infrared spectroscopy]] has become a valuable method to investigate the structures of flexible peptides and proteins that cannot be studied with other methods.<ref>{{cite journal | vauthors = Lessing J, Roy S, Reppert M, Baer M, Marx D, Jansen TL, Knoester J, Tokmakoff A | display-authors = 6 | title = Identifying residual structure in intrinsically disordered systems: a 2D IR spectroscopic study of the GVGXPGVG peptide | journal = Journal of the American Chemical Society | volume = 134 | issue = 11 | pages = 5032–5035 | date = March 2012 | pmid = 22356513 | doi = 10.1021/ja2114135 | hdl-access = free | hdl = 11370/ff19c09b-088a-48f0-afee-2111a9b19252 }}<!--https://pure.rug.nl/ws/files/6776580/2012JAmChemSocLessing.pdf--></ref><ref>{{cite journal | vauthors = Jansen TL, Knoester J | title = Two-dimensional infrared population transfer spectroscopy for enhancing structural markers of proteins | journal = Biophysical Journal | volume = 94 | issue = 5 | pages = 1818–1825 | date = March 2008 | pmid = 17981904 | pmc = 2242754 | doi = 10.1529/biophysj.107.118851 | bibcode = 2008BpJ....94.1818J }}</ref> A more qualitative picture of protein structure is often obtained by [[proteolysis]], which is also useful to screen for more crystallizable protein samples. Novel implementations of this approach, including [[fast parallel proteolysis (FASTpp)]], can probe the structured fraction and its stability without the need for purification.<ref name="pmid23056252">{{cite journal | vauthors = Minde DP, Maurice MM, Rüdiger SG | title = Determining biophysical protein stability in lysates by a fast proteolysis assay, FASTpp | journal = PloS One | volume = 7 | issue = 10 | pages = e46147 | date = 2012 | pmid = 23056252 | pmc = 3463568 | doi = 10.1371/journal.pone.0046147 | doi-access = free | bibcode = 2012PLoSO...746147M }}</ref> Once a protein's structure has been experimentally determined, further detailed studies can be done computationally, using [[molecular dynamics]] simulations of that structure.<ref name="pmid28637405">{{cite journal | vauthors = Kumari I, Sandhu P, Ahmed M, Akhter Y | title = Molecular Dynamics Simulations, Challenges and Opportunities: A Biologist's Prospective | journal = Current Protein & Peptide Science | volume = 18 | issue = 11 | pages = 1163–1179 | date = August 2017 | pmid = 28637405 | doi = 10.2174/1389203718666170622074741 }}</ref>
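To give a flavour of what a molecular dynamics program does at each time step, the sketch below integrates Newton's equation of motion with the velocity Verlet scheme, one commonly used MD integrator, for a single particle on a harmonic "bond" in reduced units. The force constant, equilibrium length, mass, time step and the helper names are hypothetical; real protein simulations evaluate a full force field over thousands of interacting atoms.

```python
def velocity_verlet(force, x0, v0, mass, dt, n_steps):
    """Integrate m * x'' = F(x) with the velocity Verlet scheme; returns [(x, v), ...]."""
    x, v = x0, v0
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # position update from current velocity and force
        a_new = force(x) / mass             # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt      # velocity update with the averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


# Toy harmonic "bond" in reduced units (hypothetical parameters).
K_BOND, R0, MASS = 1.0, 1.0, 1.0


def bond_force(x):
    return -K_BOND * (x - R0)


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K_BOND * (x - R0) ** 2


traj = velocity_verlet(bond_force, x0=1.2, v0=0.0, mass=MASS, dt=0.01, n_steps=10_000)
e0, e_end = total_energy(*traj[0]), total_energy(*traj[-1])
print(f"relative energy change after {len(traj) - 1} steps: {abs(e_end - e0) / e0:.2e}")
```

Because velocity Verlet is time-reversible and symplectic, the total energy in this toy example fluctuates slightly but does not drift systematically, which is one reason integrators of this family are popular for long simulations.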

==Protein structure databases==
A [[protein structure database]] is a database that is [[data modeling|modeled]] around the various [[#Protein structure determination|experimentally determined]] protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, providing the biological community access to the experimental data in a useful way. The data included in protein structure databases often comprise 3D coordinates as well as experimental information, such as unit cell dimensions and angles for structures determined by [[X-ray crystallography#Biological macromolecular crystallography|X-ray crystallography]]. Although most entries (either proteins or specific structure determinations of a protein) also contain sequence information, and some databases even provide means for performing sequence-based queries, the primary attribute of a structure database is structural information, whereas [[sequence database]]s focus on sequence information and contain no structural information for the majority of entries. Protein structure databases are critical for many efforts in [[computational biology]] such as [[Drug design#Structure based|structure-based drug design]], both in developing the computational methods used and in providing a large experimental dataset used by some methods to provide insights about the function of a protein.<ref>{{cite journal | vauthors = Laskowski RA | title = Protein structure databases | journal = Molecular Biotechnology | volume = 48 | issue = 2 | pages = 183–198 | date = June 2011 | pmid = 21225378 | doi = 10.1007/s12033-010-9372-4 | s2cid = 45184564 }}</ref>
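As an illustration of the kind of data such databases hold, the sketch below reads atom coordinates and unit-cell parameters from the legacy fixed-column PDB file format, using the column positions of the ATOM/HETATM and CRYST1 records. The two-atom sample and the function and class names are made up for illustration; the wwPDB now distributes PDBx/mmCIF as its primary format, and production code would normally rely on an established parser rather than this minimal one.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


# 0-based, end-exclusive column ranges of the CRYST1 fields a, b, c, alpha, beta, gamma.
CRYST1_FIELDS = ((6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54))


def parse_pdb(text):
    """Return (atoms, unit_cell) from legacy PDB-format text; unit_cell is None if absent."""
    atoms, unit_cell = [], None
    for line in text.splitlines():
        record = line[0:6]
        if record in ("ATOM  ", "HETATM"):
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain_id=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
        elif record == "CRYST1":
            unit_cell = tuple(float(line[i:j]) for i, j in CRYST1_FIELDS)
    return atoms, unit_cell


SAMPLE = "\n".join([
    "CRYST1   52.000   58.600   61.900  90.00  90.00  90.00 P 21 21 21    8",
    "ATOM      1  N   ASP A  18      -3.226  12.337  24.553  1.00 42.00           N",
    "ATOM      2  CA  ASP A  18      -2.000  12.500  24.000  1.00 41.00           C",
])

atoms, cell = parse_pdb(SAMPLE)
print("unit cell (a, b, c, alpha, beta, gamma):", cell)
for atom in atoms:
    print(atom.name, atom.res_name, atom.chain_id, atom.res_seq, (atom.x, atom.y, atom.z))
```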

==Structural classifications of proteins==
Protein structures can be grouped based on their structural similarity, [[circuit topology|topological class]] or a common [[evolution]]ary origin. The [[Structural Classification of Proteins database]]<ref name="pmid7723011">{{cite journal | vauthors = Murzin AG, Brenner SE, Hubbard T, Chothia C | title = SCOP: a structural classification of proteins database for the investigation of sequences and structures | journal = Journal of Molecular Biology | volume = 247 | issue = 4 | pages = 536–540 | date = April 1995 | pmid = 7723011 | doi = 10.1016/S0022-2836(05)80134-2 | url = http://scop.mrc-lmb.cam.ac.uk/scop/ref/1995-jmb-scop.pdf | url-status = dead | archive-date = 26 April 2012 | df = dmy-all | archive-url = https://web.archive.org/web/20120426170732/http://scop.mrc-lmb.cam.ac.uk/scop/ref/1995-jmb-scop.pdf | author-link4 = Cyrus Chothia | author-link2 = Steven E. Brenner | author-link3 = Tim Hubbard }}</ref> and [[CATH]] database<ref name="pmid9309224">{{cite journal | vauthors = Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM | title = CATH--a hierarchic classification of protein domain structures | journal = Structure | volume = 5 | issue = 8 | pages = 1093–1108 | date = August 1997 | pmid = 9309224 | doi = 10.1016/S0969-2126(97)00260-8 | doi-access = free | author-link6 = Janet Thornton | author-link1 = Christine Orengo }}</ref> provide two different structural classifications of proteins. When the structural similarity between two proteins is large, they may have diverged from a common ancestor,<ref name="Pascual2009">{{cite journal | vauthors = Pascual-García A, Abia D, Ortiz AR, Bastolla U | title = Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures | journal = PLoS Computational Biology | volume = 5 | issue = 3 | pages = e1000331 | date = March 2009 | pmid = 19325884 | pmc = 2654728 | doi = 10.1371/journal.pcbi.1000331 | bibcode = 2009PLSCB...5E0331P }}</ref> and shared structure between proteins is considered evidence of [[Homology (biology)|homology]]. Structural similarity can then be used to group proteins into [[protein superfamilies]].<ref>{{cite journal | vauthors = Holm L, Rosenström P | title = Dali server: conservation mapping in 3D | journal = Nucleic Acids Research | volume = 38 | issue = Web Server issue | pages = W545–W549 | date = July 2010 | pmid = 20457744 | pmc = 2896194 | doi = 10.1093/nar/gkq366 }}</ref> If the shared structure is significant but the fraction shared is small, the shared fragment may be the consequence of a more dramatic evolutionary event such as [[horizontal gene transfer]], and grouping proteins that share these fragments into protein superfamilies is no longer justified.<ref name="Pascual2009" /> The topology of a protein can also be used to classify it. [[Knot theory]] and [[circuit topology]] are two topological frameworks developed for classifying protein folds based on chain crossings and intrachain contacts, respectively.
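Circuit topology describes how pairs of intrachain contacts are arranged along the chain. A minimal sketch of that idea follows: two contacts (i, j) and (k, l) are labelled series when one follows the other along the chain, parallel when one is nested inside the other, and cross when they interleave. The 8 Å distance cutoff and minimum sequence separation used to call contacts are illustrative assumptions, contacts that share a residue are simply skipped, and the contact list in the example is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def contacts_from_coords(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j), i < j, closer than `cutoff` (same units as coords) and at
    least `min_separation` apart in sequence. Both thresholds are illustrative."""
    n = len(coords)
    return [(i, j) for i in range(n) for j in range(i + min_separation, n)
            if math.dist(coords[i], coords[j]) < cutoff]


def contact_relation(c1, c2):
    """Classify two contacts with four distinct residues as series, parallel or cross."""
    (i, j), (k, l) = sorted([tuple(sorted(c1)), tuple(sorted(c2))])
    if len({i, j, k, l}) < 4:
        raise ValueError("contacts sharing a residue are not handled in this sketch")
    if j < k:
        return "series"    # i < j < k < l: one contact follows the other
    if l < j:
        return "parallel"  # i < k < l < j: one contact is nested inside the other
    return "cross"         # i < k < j < l: the contacts interleave


def topology_counts(contacts):
    """Count series/parallel/cross arrangements over all contact pairs sharing no residue."""
    return Counter(contact_relation(a, b)
                   for a, b in combinations(contacts, 2)
                   if len(set(a) | set(b)) == 4)


example_contacts = [(1, 10), (3, 7), (12, 20), (15, 25)]  # hypothetical contact list
print(topology_counts(example_contacts))
```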

==Computational prediction of protein structure==
{{Main|Protein structure prediction}}
The generation of a [[protein sequence]] is much easier than the determination of a protein structure. However, the structure of a protein gives much more insight into its function than its sequence does. Therefore, a number of methods for the computational prediction of protein structure from its sequence have been developed.<ref name="zhang2008">{{cite journal | vauthors = Zhang Y | title = Progress and challenges in protein structure prediction | journal = Current Opinion in Structural Biology | volume = 18 | issue = 3 | pages = 342–348 | date = June 2008 | pmid = 18436442 | pmc = 2680823 | doi = 10.1016/j.sbi.2008.02.004 }}</ref> ''Ab initio'' prediction methods use just the sequence of the protein. [[Threading (protein sequence)|Threading]] and [[homology modeling]] methods can build a 3-D model for a protein of unknown structure from experimental structures of evolutionarily related proteins, which together form a [[protein family]].


==See also==
== See also ==
* [[Biomolecular structure]]
* [[Gene structure]]
Line 150: Line 97:
* [[Ribbon diagram]] 3D schematic representation of proteins


==References==
== References ==
{{Reflist}}


==Further reading==
== Further reading ==
*[http://publications.nigms.nih.gov/psi/timeline_text.html 50 Years of Protein Structure Determination Timeline - HTML Version - National Institute of General Medical Sciences] at [[NIH]]


==External links==
== External links ==
*{{Commonscatinline|Protein structures}}



Revision as of 18:25, 24 July 2023

Diagram of the levels of protein structure (primary, secondary, tertiary, and quaternary), using PCNA as an example (PDB: 1AXC)

Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers – specifically polypeptides – formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, a term for a repeating unit of a polymer. Proteins form through condensation reactions between amino acids: each reaction releases one water molecule as two amino acids are joined by a peptide bond. By convention, a chain of fewer than 30 amino acids is often called a peptide rather than a protein.[1] To perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, van der Waals forces, and hydrophobic packing. Understanding the functions of proteins at the molecular level often requires determining their three-dimensional structure. This is the subject of structural biology, which uses techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM), and dual polarisation interferometry to determine protein structures.
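The water bookkeeping in the condensation description can be made concrete. A linear chain of n residues contains n − 1 peptide bonds, so n − 1 water molecules are released when it forms. The sketch below is a minimal illustration, not a reference tool. It computes an approximate average chain mass by summing residue masses (each amino acid minus one water) and adding back one water for the free termini. It also applies the 30-residue peptide convention. The residue masses are approximate average values in daltons, and they and the helper names are assumptions made for illustration.

```python
# Approximate average residue masses in daltons (free amino acid minus one H2O).
# Illustrative values only; use a curated table for real work.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # average mass of H2O in daltons


def peptide_bonds(sequence: str) -> int:
    """A linear chain of n residues has n - 1 peptide bonds (one water lost per bond)."""
    return max(len(sequence) - 1, 0)


def average_mass(sequence: str) -> float:
    """Residue masses plus one water for the free N- and C-termini.

    Equivalent to summing free amino acid masses and subtracting (n - 1) waters.
    """
    if not sequence:
        raise ValueError("empty sequence")
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def chain_kind(sequence: str, threshold: int = 30) -> str:
    """Common (not universal) convention: fewer than 30 residues is a peptide."""
    return "peptide" if len(sequence) < threshold else "protein"


if __name__ == "__main__":
    seq = "GA"  # glycylalanine dipeptide
    print(peptide_bonds(seq), round(average_mass(seq), 2), chain_kind(seq))
```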

Protein structures range in size from tens to several thousand amino acids.[2] In terms of physical size, proteins are classified as nanoparticles, between 1 and 100 nm. Very large protein complexes can be formed from protein subunits. For example, many thousands of actin molecules assemble into a microfilament.
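As a rough check on the nanoparticle size range, a globular protein can be approximated as a sphere of uniform density. The sketch below assumes an average protein density of about 1.35 g/cm³, an approximate literature value that does not come from this article. It returns the radius of a sphere with the same volume as a protein of a given mass. Real proteins are not perfect spheres, so this is only an order-of-magnitude estimate.

```python
import math

AVOGADRO = 6.02214076e23   # mol^-1
DENSITY_G_PER_CM3 = 1.35   # assumed average protein density
CM3_TO_NM3 = 1e21          # 1 cm^3 = 10^21 nm^3


def sphere_radius_nm(mass_da: float, density: float = DENSITY_G_PER_CM3) -> float:
    """Radius (nm) of a sphere with the volume of one protein of mass_da daltons."""
    # One dalton corresponds to 1 g/mol, so one molecule occupies M / (rho * N_A) cm^3.
    volume_nm3 = mass_da / (density * AVOGADRO) * CM3_TO_NM3
    return (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


if __name__ == "__main__":
    for mass in (10_000, 50_000, 500_000):
        diameter = 2 * sphere_radius_nm(mass)
        print(f"{mass:>7} Da -> diameter ~ {diameter:.1f} nm")
```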

A protein usually undergoes reversible structural changes in performing its biological function. The alternative structures of the same protein are referred to as different conformations, and transitions between them are called conformational changes.

Levels of protein structure

There are four distinct levels of protein structure.

Four levels of protein structure

Primary structure

The primary structure of a protein refers to the sequence of amino acids in the polypeptide chain. The primary structure is held together by peptide bonds that are made during protein biosynthesis. The two ends of the polypeptide chain are called the carboxyl terminus (C-terminus) and the amino terminus (N-terminus), based on the nature of the free group at each end. Residues are always numbered starting from the N-terminal end (NH2 group), where the amino group is not involved in a peptide bond. The primary structure of a protein is determined by the gene corresponding to the protein: a specific sequence of nucleotides in DNA is transcribed into mRNA, which is read by the ribosome in a process called translation. The sequence of amino acids in insulin was determined by Frederick Sanger, establishing that proteins have defined amino acid sequences.[3][4] Insulin is composed of 51 amino acids in two chains: the A chain has 21 amino acids and the B chain has 30. The sequence of a protein is unique to that protein, and defines the structure and function of the protein. The sequence of a protein can be determined by methods such as Edman degradation or tandem mass spectrometry. Often, however, it is read directly from the sequence of the gene using the genetic code. The term "amino acid residues" is preferred when discussing proteins: a water molecule is lost when each peptide bond forms, so proteins are made up of amino acid residues rather than free amino acids. Post-translational modifications such as phosphorylation and glycosylation are usually also considered part of the primary structure, and cannot be read from the gene.
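Reading a protein sequence directly from a gene with the genetic code can be sketched as a codon-by-codon lookup. The example below makes several assumptions. It uses the standard genetic code (NCBI translation table 1). It expects an in-frame DNA coding sequence and stops at the first stop codon. It ignores introns, alternative start codons, and post-translational modifications, which cannot be read from the gene. The function and table names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1). Codons are indexed with the
# base order T, C, A, G: index = 16*first + 4*second + third; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate an in-frame coding sequence, N-terminus first, up to the first stop."""
    dna = coding_dna.upper().replace("U", "T")  # also accept mRNA-style input
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]  # KeyError on non-ACGT characters
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAGTAA"))  # MAK
```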

Secondary structure

An α-helix with hydrogen bonds (yellow dots)

Secondary structure refers to highly regular local sub-structures on the actual polypeptide backbone chain. Two main types of secondary structure, the α-helix and the β-strand (which assembles into β-sheets), were proposed in 1951 by Linus Pauling and coworkers.[5] These secondary structures are defined by patterns of hydrogen bonds between the main-chain peptide groups. They have a regular geometry, with the backbone dihedral angles φ and ψ constrained to specific regions of the Ramachandran plot. Both the α-helix and the β-sheet represent a way of saturating all the hydrogen bond donors and acceptors in the peptide backbone. Some parts of the protein are ordered but do not form any regular structures. These should not be confused with random coil, an unfolded polypeptide chain lacking any fixed three-dimensional structure. Several sequential secondary structures may form a "supersecondary unit".[6]
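The backbone dihedral angles plotted on a Ramachandran plot are computed from four consecutive backbone atoms. φ uses C(i−1), N, Cα, C, and ψ uses N, Cα, C, N(i+1). The following plain-Python sketch shows the standard four-point signed dihedral calculation. The coordinates in the usage example are made-up points chosen so the answers are easy to check; they are not real protein atoms.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # For a real chain: phi = dihedral(C_prev, N, CA, C); psi = dihedral(N, CA, C, N_next).
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0 (trans)
```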

Tertiary structure

Tertiary structure refers to the three-dimensional structure created by a single protein molecule (a single polypeptide chain). It may include one or several domains. The α-helices and β-pleated sheets are folded into a compact globular structure. The folding is driven by non-specific hydrophobic interactions (the burial of hydrophobic residues away from water), but the structure is stable only when the parts of a protein domain are locked into place by specific tertiary interactions, such as salt bridges, hydrogen bonds, the tight packing of side chains, and disulfide bonds. Disulfide bonds are extremely rare in cytosolic proteins, since the cytosol (intracellular fluid) is generally a reducing environment.

Quaternary structure

Quaternary structure is the three-dimensional structure formed by the aggregation of two or more individual polypeptide chains (subunits) that operate as a single functional unit, called a multimer. The resulting multimer is stabilized by the same non-covalent interactions and disulfide bonds as in tertiary structure. There are many possible quaternary structure organisations.[7] A multimer is called a dimer if it contains two subunits, a trimer if it contains three, a tetramer if it contains four, and a pentamer if it contains five. The subunits are frequently related to one another by symmetry operations, such as a 2-fold axis in a dimer. Multimers made up of identical subunits are referred to with the prefix "homo-" and those made up of different subunits with the prefix "hetero-". For example, hemoglobin, with two alpha and two beta chains, is a heterotetramer.
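The multimer naming scheme is mechanical enough to express as a short function. The sketch below is a hypothetical helper, not part of any library. It takes a list of chain identities, counts them, and applies the subunit-number names and the homo-/hetero- prefixes described in the text. It deliberately stops at five subunits, the largest name listed, and it ignores symmetry.

```python
from collections import Counter
from typing import Sequence

# Names listed in the text; larger assemblies are left out of this sketch.
OLIGOMER_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer", 5: "pentamer"}


def name_multimer(subunits: Sequence[str]) -> str:
    """Name a complex from the identities of its polypeptide chains."""
    n = len(subunits)
    if n not in OLIGOMER_NAMES:
        raise ValueError(f"this sketch only names complexes of 2-5 subunits, got {n}")
    prefix = "homo" if len(set(subunits)) == 1 else "hetero"
    return prefix + OLIGOMER_NAMES[n]


def stoichiometry(subunits: Sequence[str]) -> str:
    """Compact composition string, e.g. 'alpha2beta2' for hemoglobin-like chains."""
    counts = Counter(subunits)
    return "".join(f"{chain}{count}" for chain, count in sorted(counts.items()))


if __name__ == "__main__":
    hemoglobin_like = ["alpha", "alpha", "beta", "beta"]
    print(name_multimer(hemoglobin_like), stoichiometry(hemoglobin_like))
```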

Domains, motifs, and folds in protein structure

Protein domains. The two shown protein structures share a common domain (maroon), the PH domain, which is involved in phosphatidylinositol (3,4,5)-trisphosphate binding

Proteins are frequently described as consisting of several structural units, including domains, motifs, and folds. Although about 100,000 different proteins are expressed in eukaryotic systems, there are far fewer distinct domains, structural motifs, and folds.

Structural domain

A structural domain is an element of the protein's overall structure that is self-stabilizing and often folds independently of the rest of the protein chain. Many domains are not unique to the protein products of one gene or one gene family but instead appear in a variety of proteins. Domains are often named and singled out because they figure prominently in the biological function of the protein they belong to; for example, the "calcium-binding domain of calmodulin". Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins. A conserved combination of several domains that occurs in different proteins, such as the pair formed by a protein tyrosine phosphatase domain and a C2 domain, has been called a "superdomain" and may evolve as a single unit.[8]

Structural and sequence motifs

Structural and sequence motifs are short segments of protein three-dimensional structure or amino acid sequence that are found in a large number of different proteins.

Supersecondary structure

Tertiary protein structures can have multiple secondary structure elements on the same polypeptide chain. Supersecondary structure refers to a specific combination of secondary structure elements, such as β-α-β units or a helix-turn-helix motif. Some of these combinations are also referred to as structural motifs.
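Supersecondary units such as β-α-β can be spotted as patterns in a per-residue secondary structure string. The sketch below assumes a simplified alphabet: H for helix, E for strand, and any other letter for loop or coil, loosely in the spirit of reduced DSSP-style assignments. The minimum segment lengths are arbitrary illustrative defaults. The check is purely sequence-local and does not verify the 3-D arrangement of the strands. Matches are also non-overlapping, so a β-α-β-α-β stretch reports only its first unit.

```python
import re
from typing import List, Tuple


def find_beta_alpha_beta(ss: str, min_strand: int = 3,
                         min_helix: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) spans of strand-loop-helix-loop-strand patterns.

    `ss` holds one letter per residue: 'E' = strand, 'H' = helix, other = loop.
    End indices are exclusive, as with Python slicing.
    """
    pattern = re.compile(
        rf"E{{{min_strand},}}"   # first beta strand
        r"[^EH]*"                # connecting loop (may be empty)
        rf"H{{{min_helix},}}"    # alpha helix
        r"[^EH]*"                # connecting loop
        rf"E{{{min_strand},}}"   # second beta strand
    )
    return [m.span() for m in pattern.finditer(ss)]


if __name__ == "__main__":
    ss = "CCEEEECCHHHHHHHCCEEEECC"  # made-up assignment
    for start, end in find_beta_alpha_beta(ss):
        print(start, end, ss[start:end])
```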

Protein fold

A protein fold refers to the general protein architecture, such as a helix bundle, a β-barrel, the Rossmann fold, or the other "folds" catalogued in the Structural Classification of Proteins database.[9] A related concept is protein topology.

Protein dynamics and conformational ensembles

Proteins are not static objects, but rather populate ensembles of conformational states. Transitions between these states typically occur on the nanoscale, and have been linked to functionally relevant phenomena such as allosteric signaling[10] and enzyme catalysis.[11] Protein dynamics and conformational changes allow proteins to function as nanoscale biological machines within cells, often in the form of multi-protein complexes.[12] Examples include motor proteins, such as myosin, which is responsible for muscle contraction; kinesin, which moves cargo inside cells away from the nucleus along microtubules; and dynein, which moves cargo inside cells towards the nucleus and produces the axonemal beating of motile cilia and flagella. "[I]n effect, the [motile cilium] is a nanomachine composed of perhaps over 600 proteins in molecular complexes, many of which also function independently as nanomachines...Flexible linkers allow the mobile protein domains connected by them to recruit their binding partners and induce long-range allostery via protein domain dynamics."[13]

Schematic view of the two main ensemble modeling approaches.[14]

Proteins are often thought of as relatively stable tertiary structures that experience conformational changes after being affected by interactions with other proteins or as a part of enzymatic activity. However, proteins may have varying degrees of stability, and some of the less stable variants are intrinsically disordered proteins. These proteins exist and function in a relatively 'disordered' state lacking a stable tertiary structure. As a result, they are difficult to describe by a single fixed tertiary structure. Conformational ensembles have been devised as a way to provide a more accurate and 'dynamic' representation of the conformational state of intrinsically disordered proteins.[15][14]

Protein ensemble files represent a protein that can be considered to have a flexible structure. Creating these files requires determining which of the various theoretically possible protein conformations actually exist. One approach is to apply computational algorithms to the protein data in order to determine the most likely set of conformations for an ensemble file. There are multiple methods for preparing data for the Protein Ensemble Database, which fall into two general methodologies: pool-based and molecular dynamics (MD) approaches (diagrammed in the figure). The pool-based approach uses the protein's amino acid sequence to create a massive pool of random conformations. This pool is then subjected to further computational processing that calculates a set of theoretical parameters for each conformation based on its structure. Conformational subsets from this pool whose average theoretical parameters closely match known experimental data for the protein are selected. The alternative molecular dynamics approach takes multiple random conformations at a time and subjects all of them to experimental data. Here the experimental data serve as restraints placed on the conformations (e.g. known distances between atoms). Only conformations that remain within the limits set by the experimental data are accepted. This approach often applies large amounts of experimental data to the conformations, which is computationally very demanding.[14]
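The pool-based approach can be illustrated as a selection problem. Each conformer in a pool has predicted values for some experimental observables, and a subset is chosen so that its average agrees with the measured values. The sketch below uses a simple greedy search scored by χ² as a stand-in for that selection step. Published ensemble-selection methods use their own, often more sophisticated, optimizers. The function names are hypothetical, and the toy numbers in the usage example are invented for illustration.

```python
from typing import List, Sequence


def ensemble_average(pool: Sequence[Sequence[float]], indices: Sequence[int]) -> List[float]:
    """Average the predicted observables over the selected conformers."""
    n_obs = len(pool[0])
    return [sum(pool[i][d] for i in indices) / len(indices) for d in range(n_obs)]


def chi_squared(predicted: Sequence[float], experimental: Sequence[float],
                errors: Sequence[float]) -> float:
    """Sum of squared, error-weighted deviations from the experimental data."""
    return sum(((p - e) / s) ** 2 for p, e, s in zip(predicted, experimental, errors))


def select_ensemble(pool: Sequence[Sequence[float]], experimental: Sequence[float],
                    errors: Sequence[float], size: int) -> List[int]:
    """Greedily add the conformer that most lowers chi^2 until `size` are chosen."""
    if not 0 < size <= len(pool):
        raise ValueError("size must be between 1 and the pool size")
    chosen: List[int] = []
    for _ in range(size):
        best_index, best_score = -1, float("inf")
        for i in range(len(pool)):
            if i in chosen:
                continue
            trial = ensemble_average(pool, chosen + [i])
            score = chi_squared(trial, experimental, errors)
            if score < best_score:
                best_index, best_score = i, score
        chosen.append(best_index)
    return chosen


if __name__ == "__main__":
    # Toy pool: each conformer has two predicted observables.
    pool = [[1.0, 5.0], [3.0, 7.0], [10.0, 2.0], [2.5, 6.5]]
    measured, sigma = [2.0, 6.0], [0.5, 0.5]
    picked = select_ensemble(pool, measured, sigma, size=2)
    print(picked, ensemble_average(pool, picked))
```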

Conformational ensembles have been generated for a number of highly dynamic and partially unfolded proteins, such as Sic1/Cdc4,[16] p15 PAF,[17] MKK7,[18] β-synuclein[19] and p27.[20]

Protein folding

As they are translated, polypeptides exit the ribosome mostly as random coils and fold into their native state.[21][22] The final structure of the protein chain is generally assumed to be determined by its amino acid sequence (Anfinsen's dogma).[23]

Protein stability

The thermodynamic stability of a protein is the free energy difference between its folded and unfolded states. This free energy difference is very sensitive to temperature, so a change in temperature may result in unfolding, or denaturation. Protein denaturation may result in loss of function and loss of the native state. The free energy of stabilization of soluble globular proteins typically does not exceed 50 kJ/mol.[citation needed] Given the large number of hydrogen bonds that stabilize secondary structures, and the stabilization of the inner core through hydrophobic interactions, the free energy of stabilization emerges as a small difference between large numbers.[24]
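A common simplification shows why such a small free energy difference is so sensitive to temperature. In the two-state model, each protein molecule is either folded or unfolded. The sketch below assumes that model and uses the Gibbs–Helmholtz relation with a temperature-independent heat capacity change ΔCp to compute the equilibrium fraction of folded molecules. The melting temperature, enthalpy, and ΔCp values in the usage example are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g_unfold(temp_k: float, tm_k: float, dh_m: float, dcp: float) -> float:
    """Free energy of unfolding (kJ/mol) at temp_k; positive means folded is favored.

    Gibbs-Helmholtz form with a constant heat capacity change:
    dG(T) = dH_m * (1 - T/Tm) + dCp * (T - Tm - T * ln(T/Tm))
    """
    return (dh_m * (1.0 - temp_k / tm_k)
            + dcp * (temp_k - tm_k - temp_k * math.log(temp_k / tm_k)))


def fraction_folded(delta_g: float, temp_k: float) -> float:
    """Two-state equilibrium: K_unfold = exp(-dG/RT), folded fraction = 1 / (1 + K)."""
    return 1.0 / (1.0 + math.exp(-delta_g / (R_KJ * temp_k)))


if __name__ == "__main__":
    # Hypothetical protein: Tm = 60 C, dH_m = 400 kJ/mol, dCp = 6 kJ/(mol*K).
    tm, dh, dcp = 333.15, 400.0, 6.0
    for t in (298.15, 318.15, 333.15, 343.15):
        dg = delta_g_unfold(t, tm, dh, dcp)
        print(f"T = {t - 273.15:5.1f} C  dG = {dg:6.1f} kJ/mol  "
              f"folded = {fraction_folded(dg, t):.3f}")
```

With these assumed parameters, ΔG at room temperature comes out at a few tens of kJ/mol, yet the folded fraction drops from nearly 1 to 0.5 over a few tens of kelvin.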

Protein structure determination

Examples of protein structures from the PDB
Rate of Protein Structure Determination by Method and Year

Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography.[25] This method measures the three-dimensional (3-D) distribution of electron density in the protein in its crystallized state, from which the 3-D coordinates of all the atoms can be inferred to a certain resolution. Roughly 7% of the known protein structures have been obtained by nuclear magnetic resonance (NMR) techniques.[26] For larger protein complexes, cryo-electron microscopy can determine protein structures. Its resolution is typically lower than that of X-ray crystallography or NMR, but the maximum achievable resolution is steadily increasing. The technique is particularly valuable for very large protein complexes such as virus coat proteins and amyloid fibers.

General secondary structure composition can be determined via circular dichroism. Vibrational spectroscopy can also be used to characterize the conformation of peptides, polypeptides, and proteins.[27] Two-dimensional infrared spectroscopy has become a valuable method to investigate the structures of flexible peptides and proteins that cannot be studied with other methods.[28][29] A more qualitative picture of protein structure is often obtained by proteolysis, which is also useful for screening for more crystallizable protein samples. Novel implementations of this approach, including fast parallel proteolysis (FASTpp), can probe the structured fraction and its stability without the need for purification.[30] Once a protein's structure has been experimentally determined, further detailed studies can be done computationally, using molecular dynamics simulations of that structure.[31]

Protein structure databases

A protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate protein structures, giving the biological community access to the experimental data in a useful way. Data included in protein structure databases often include 3D coordinates as well as experimental information, such as unit cell dimensions and angles for structures determined by X-ray crystallography. Most entries (either proteins or specific structure determinations of a protein) also contain sequence information, and some databases even support sequence-based queries. Nevertheless, the primary attribute of a structure database is structural information, whereas sequence databases focus on sequence information and contain no structural information for the majority of entries. Protein structure databases are critical for many efforts in computational biology, such as structure-based drug design, both in developing the computational methods used and in providing a large experimental dataset used by some methods to provide insights about the function of a protein.[32]
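The database text mentions atomic coordinates and unit cell parameters. As an illustration, the sketch below reads atom coordinates from ATOM/HETATM records and cell parameters from the CRYST1 record of a legacy fixed-column PDB-format file. It assumes the column layout of the PDB format specification and skips all other records. Archive files are also distributed in the mmCIF (PDBx) format, which this sketch does not handle. A maintained parser library is the better choice for real work, and the helper names here are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom(line: str) -> Atom:
    """PDB columns: name 13-16, resName 18-20, chain 22, resSeq 23-26, x/y/z 31-54."""
    return Atom(
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def parse_cryst1(line: str) -> Dict[str, float]:
    """Unit cell edges a, b, c (angstroms) and angles alpha, beta, gamma (degrees)."""
    return {
        "a": float(line[6:15]), "b": float(line[15:24]), "c": float(line[24:33]),
        "alpha": float(line[33:40]), "beta": float(line[40:47]),
        "gamma": float(line[47:54]),
    }


def read_pdb(text: str) -> Tuple[List[Atom], Optional[Dict[str, float]]]:
    """Collect atoms and the unit cell (if present) from PDB-format text."""
    atoms: List[Atom] = []
    cell: Optional[Dict[str, float]] = None
    for line in text.splitlines():
        record = line[0:6]
        if record in ("ATOM  ", "HETATM"):
            atoms.append(parse_atom(line))
        elif record == "CRYST1":
            cell = parse_cryst1(line)
    return atoms, cell


if __name__ == "__main__":
    # Hypothetical two-record file built with explicit column widths.
    cryst = "CRYST1" + f"{52.0:9.3f}{58.6:9.3f}{63.2:9.3f}{90.0:7.2f}{90.0:7.2f}{90.0:7.2f}"
    atom = ("ATOM  " + f"{1:5d}" + " " + " CA " + " " + "MET" + " " + "A"
            + f"{1:4d}" + "    " + f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}")
    atoms, cell = read_pdb(cryst + "\n" + atom)
    print(cell)
    print(atoms[0])
```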

Structural classifications of proteins

Protein structures can be grouped based on their structural similarity, topological class, or a common evolutionary origin. The Structural Classification of Proteins database[33] and CATH database[34] provide two different structural classifications of proteins. When the structural similarity is large, the two proteins have possibly diverged from a common ancestor,[35] and shared structure between proteins is considered evidence of homology. Structural similarity can then be used to group proteins into protein superfamilies.[36] If the shared structure is significant but the fraction shared is small, the shared fragment may be the consequence of a more dramatic evolutionary event such as horizontal gene transfer, and joining proteins that share these fragments into protein superfamilies is no longer justified.[35] The topology of a protein can be used to classify proteins as well. Knot theory and circuit topology are two topological frameworks developed for classifying protein folds, based on chain crossings and intrachain contacts, respectively.
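Circuit topology classifies how pairs of intrachain contacts are arranged along the chain. In one common formulation, consider two contacts (i, j) and (k, l) with i < j and k < l. They are in series (S) if one ends before the other begins, parallel (P) if one is nested inside the other, and cross (X) if they interleave. The sketch below assumes that formulation and contacts that share no residue; shared-residue cases need additional categories that are not handled here. In practice the contact list would be derived from a structure, for example from residues lying within some cutoff distance. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Contact = Tuple[int, int]


def relation(c1: Contact, c2: Contact) -> str:
    """Return 'S' (series), 'P' (parallel, nested) or 'X' (cross) for two contacts."""
    (i, j), (k, l) = sorted((tuple(sorted(c1)), tuple(sorted(c2))))
    if len({i, j, k, l}) < 4:
        raise ValueError("contacts sharing a residue are not handled in this sketch")
    if j < k:
        return "S"  # i < j < k < l: one contact ends before the other starts
    if l < j:
        return "P"  # i < k < l < j: second contact nested inside the first
    return "X"      # i < k < j < l: the contacts interleave


def topology_counts(contacts: Sequence[Contact]) -> Dict[str, int]:
    """Count S, P and X relations over all pairs of contacts."""
    counts = {"S": 0, "P": 0, "X": 0}
    for c1, c2 in combinations(contacts, 2):
        counts[relation(c1, c2)] += 1
    return counts


if __name__ == "__main__":
    contacts = [(1, 5), (7, 10), (2, 3), (4, 8)]  # made-up residue pairs
    print(topology_counts(contacts))
```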

Computational prediction of protein structure

Generating a protein sequence is much easier than determining a protein structure. However, the structure of a protein gives much more insight into its function than its sequence does. Therefore, a number of methods for the computational prediction of protein structure from sequence have been developed.[37] Ab initio prediction methods use only the sequence of the protein. Threading and homology modeling methods can build a 3-D model for a protein of unknown structure from the experimental structures of evolutionarily related proteins, which together form a protein family.
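Threading and homology modeling both begin by relating the target sequence to sequences of known structure, and a basic ingredient of that step is a pairwise alignment. The sketch below is a textbook Needleman–Wunsch global alignment with simple match/mismatch/gap scores, plus a percent-identity helper. It is an illustrative stand-in for the substitution matrices, affine gap penalties, and profile methods that real modeling pipelines use. The scoring values are arbitrary assumptions.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical non-gap columns as a percentage of the alignment length."""
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


if __name__ == "__main__":
    s, x, y = needleman_wunsch("GATTACA", "GCATGCU")
    print(s)
    print(x)
    print(y)
    print(f"{percent_identity(x, y):.1f}% identity")
```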

See also

References

  1. ^ Stoker HS (1 January 2015). Organic and Biological Chemistry. Cengage Learning. p. 371. ISBN 978-1-305-68645-8.
  2. ^ Brocchieri L, Karlin S (10 June 2005). "Protein length in eukaryotic and prokaryotic proteomes". Nucleic Acids Research. 33 (10): 3390–3400. doi:10.1093/nar/gki615. PMC 1150220. PMID 15951512.
  3. ^ Sanger F, Tuppy H (September 1951). "The amino-acid sequence in the phenylalanyl chain of insulin. I. The identification of lower peptides from partial hydrolysates". The Biochemical Journal. 49 (4): 463–481. doi:10.1042/bj0490463. PMC 1197535. PMID 14886310.
  4. ^ Sanger F (May 1959). "Chemistry of insulin; determination of the structure of insulin opens the way to greater understanding of life processes". Science. 129 (3359): 1340–1344. Bibcode:1959Sci...129.1340G. doi:10.1126/science.129.3359.1340. PMID 13658959.
  5. ^ Pauling L, Corey RB, Branson HR (April 1951). "The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain". Proceedings of the National Academy of Sciences of the United States of America. 37 (4): 205–211. Bibcode:1951PNAS...37..205P. doi:10.1073/pnas.37.4.205. PMC 1063337. PMID 14816373.
  6. ^ Chiang YS, Gelfand TI, Kister AE, Gelfand IM (September 2007). "New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage". Proteins. 68 (4): 915–921. doi:10.1002/prot.21473. PMID 17557333. S2CID 29904865.
  7. ^ Moutevelis E, Woolfson DN (January 2009). "A periodic table of coiled-coil protein structures". Journal of Molecular Biology. 385 (3): 726–732. doi:10.1016/j.jmb.2008.11.028. PMID 19059267.
  8. ^ Haynie DT, Xue B (May 2015). "Superdomains in the protein structure hierarchy: The case of PTP-C2". Protein Science. 24 (5): 874–882. doi:10.1002/pro.2664. PMC 4420535. PMID 25694109.
  9. ^ Govindarajan S, Recabarren R, Goldstein RA (June 1999). "Estimating the total number of protein folds". Proteins. 35 (4): 408–414. doi:10.1002/(SICI)1097-0134(19990601)35:4<408::AID-PROT4>3.0.CO;2-A. hdl:2027.42/34969. PMID 10382668. S2CID 7147867. Archived from the original on 5 January 2013.
  10. ^ Bu Z, Callaway DJ (2011). "Proteins MOVE! Protein dynamics and long-range allostery in cell signaling". Protein Structure and Diseases. Advances in Protein Chemistry and Structural Biology. Vol. 83. Academic Press. pp. 163–221. doi:10.1016/B978-0-12-381262-9.00005-7. ISBN 9780123812629. PMID 21570668.
  11. ^ Fraser JS, Clarkson MW, Degnan SC, Erion R, Kern D, Alber T (December 2009). "Hidden alternative structures of proline isomerase essential for catalysis". Nature. 462 (7273): 669–673. Bibcode:2009Natur.462..669F. doi:10.1038/nature08615. PMC 2805857. PMID 19956261.
  12. ^ Voet D, Voet JG (2011). Biochemistry (4th ed.). Hoboken, NJ: John Wiley & Sons. ISBN 9780470570951. OCLC 690489261.
  13. ^ Satir P, Christensen ST (June 2008). "Structure and function of mammalian cilia". Histochemistry and Cell Biology. 129 (6): 687–693. doi:10.1007/s00418-008-0416-9. PMC 2386530. PMID 18365235.
  14. ^ a b c Varadi M, Vranken W, Guharoy M, Tompa P (1 January 2015). "Computational approaches for inferring the functions of intrinsically disordered proteins". Frontiers in Molecular Biosciences. 2: 45. doi:10.3389/fmolb.2015.00045. PMC 4525029. PMID 26301226.
  15. ^ Protein Ensemble Database
  16. ^ Mittag T, Marsh J, Grishaev A, Orlicky S, Lin H, Sicheri F, et al. (March 2010). "Structure/function implications in a dynamic complex of the intrinsically disordered Sic1 with the Cdc4 subunit of an SCF ubiquitin ligase". Structure. 18 (4): 494–506. doi:10.1016/j.str.2010.01.020. PMC 2924144. PMID 20399186.
  17. ^ De Biasio A, Ibáñez de Opakua A, Cordeiro TN, Villate M, Merino N, Sibille N, et al. (February 2014). "p15PAF is an intrinsically disordered protein with nonrandom structural preferences at sites of interaction with other proteins". Biophysical Journal. 106 (4): 865–874. Bibcode:2014BpJ...106..865D. doi:10.1016/j.bpj.2013.12.046. PMC 3944474. PMID 24559989.
  18. ^ Kragelj J, Palencia A, Nanao MH, Maurin D, Bouvignies G, Blackledge M, Jensen MR (March 2015). "Structure and dynamics of the MKK7-JNK signaling complex". Proceedings of the National Academy of Sciences of the United States of America. 112 (11): 3409–3414. Bibcode:2015PNAS..112.3409K. doi:10.1073/pnas.1419528112. PMC 4371970. PMID 25737554.
  19. ^ Allison JR, Rivers RC, Christodoulou JC, Vendruscolo M, Dobson CM (November 2014). "A relationship between the transient structure in the monomeric state and the aggregation propensities of α-synuclein and β-synuclein". Biochemistry. 53 (46): 7170–7183. doi:10.1021/bi5009326. PMC 4245978. PMID 25389903.
  20. ^ Sivakolundu SG, Bashford D, Kriwacki RW (November 2005). "Disordered p27Kip1 exhibits intrinsic structure resembling the Cdk2/cyclin A-bound conformation". Journal of Molecular Biology. 353 (5): 1118–1128. doi:10.1016/j.jmb.2005.08.074. PMID 16214166.
  21. ^ Zhang G, Ignatova Z (February 2011). "Folding at the birth of the nascent chain: coordinating translation with co-translational folding". Current Opinion in Structural Biology. 21 (1): 25–31. doi:10.1016/j.sbi.2010.10.008. PMID 21111607.
  22. ^ Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walters P (2002). "The Shape and Structure of Proteins". Molecular Biology of the Cell (Fourth ed.). New York and London: Garland Science. ISBN 978-0-8153-3218-3.
  23. ^ Anfinsen CB (July 1972). "The formation and stabilization of protein structure". The Biochemical Journal. 128 (4): 737–749. doi:10.1042/bj1280737. PMC 1173893. PMID 4565129.
  24. ^ Jaenicke R (January 1990). "Protein structure and function at low temperatures". Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 326 (1237): 535–553. doi:10.1098/rstb.1990.0030. JSTOR 2398703. PMID 1969647.
  25. ^ Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC (March 1958). "A three-dimensional model of the myoglobin molecule obtained by x-ray analysis". Nature. 181 (4610): 662–666. Bibcode:1958Natur.181..662K. doi:10.1038/181662a0. PMID 13517261. S2CID 4162786.
  26. ^ "PDB Statistics". 1 October 2022.
  27. ^ Krimm S, Bandekar J (1986). "Vibrational spectroscopy and conformation of peptides, polypeptides, and proteins". Advances in Protein Chemistry. Vol. 38. pp. 181–364. doi:10.1016/S0065-3233(08)60528-8. ISBN 9780120342389. PMID 3541539.
  28. ^ Lessing J, Roy S, Reppert M, Baer M, Marx D, Jansen TL, et al. (March 2012). "Identifying residual structure in intrinsically disordered systems: a 2D IR spectroscopic study of the GVGXPGVG peptide". Journal of the American Chemical Society. 134 (11): 5032–5035. doi:10.1021/ja2114135. hdl:11370/ff19c09b-088a-48f0-afee-2111a9b19252. PMID 22356513.
  29. ^ Jansen TL, Knoester J (March 2008). "Two-dimensional infrared population transfer spectroscopy for enhancing structural markers of proteins". Biophysical Journal. 94 (5): 1818–1825. Bibcode:2008BpJ....94.1818J. doi:10.1529/biophysj.107.118851. PMC 2242754. PMID 17981904.
  30. ^ Minde DP, Maurice MM, Rüdiger SG (2012). "Determining biophysical protein stability in lysates by a fast proteolysis assay, FASTpp". PloS One. 7 (10): e46147. Bibcode:2012PLoSO...746147M. doi:10.1371/journal.pone.0046147. PMC 3463568. PMID 23056252.
  31. ^ Kumari I, Sandhu P, Ahmed M, Akhter Y (August 2017). "Molecular Dynamics Simulations, Challenges and Opportunities: A Biologist's Prospective". Current Protein & Peptide Science. 18 (11): 1163–1179. doi:10.2174/1389203718666170622074741. PMID 28637405.
  32. ^ Laskowski RA (June 2011). "Protein structure databases". Molecular Biotechnology. 48 (2): 183–198. doi:10.1007/s12033-010-9372-4. PMID 21225378. S2CID 45184564.
  33. ^ Murzin AG, Brenner SE, Hubbard T, Chothia C (April 1995). "SCOP: a structural classification of proteins database for the investigation of sequences and structures" (PDF). Journal of Molecular Biology. 247 (4): 536–540. doi:10.1016/S0022-2836(05)80134-2. PMID 7723011. Archived from the original (PDF) on 26 April 2012.
  34. ^ Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM (August 1997). "CATH--a hierarchic classification of protein domain structures". Structure. 5 (8): 1093–1108. doi:10.1016/S0969-2126(97)00260-8. PMID 9309224.
  35. ^ a b Pascual-García A, Abia D, Ortiz AR, Bastolla U (March 2009). "Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures". PLoS Computational Biology. 5 (3): e1000331. Bibcode:2009PLSCB...5E0331P. doi:10.1371/journal.pcbi.1000331. PMC 2654728. PMID 19325884.
  36. ^ Holm L, Rosenström P (July 2010). "Dali server: conservation mapping in 3D". Nucleic Acids Research. 38 (Web Server issue): W545–W549. doi:10.1093/nar/gkq366. PMC 2896194. PMID 20457744.
  37. ^ Zhang Y (June 2008). "Progress and challenges in protein structure prediction". Current Opinion in Structural Biology. 18 (3): 342–348. doi:10.1016/j.sbi.2008.02.004. PMC 2680823. PMID 18436442.

Further reading