BIOCHEMISTRY - DR. JAKUBOWSKI
Last Update: 3/9/16
Learning Goals/Objectives for Chapter 2G: After class and this reading, students will be able to:
|
With the solving of the human genome, intensive effort has been devoted to analysis of the human genome to determine the number and transcriptional regulation of the encoded genes. Much has been learned from comparative genomics, as genomes from mice, rats, chimpanzees, and a variety of prokaryotes are compared in an effort to help understand the nature of genes and their transcriptional regulation. The vast amount of genomic data that has to be "mined" has required the development of computational and computer programs to enable the analysis. Two relatively new fields have subsequently arisen: bioinformatics and computational biology. (In a personal note, the words computational biology seem somewhat restrictive since the field of computational chemistry, which has a longer history, has significant overlap with "computational biology". I prefer computational biochemistry). These fields have significant overlap (as do physical chemistry/chemical physics and biochemistry/molecular biology/chemical biology), so I defer to others to define them.
The NIH Biomedical Information Science and Technology Initiative Consortium: "This consortium has agreed on the following definitions of bioinformatics and computational biology, recognizing that no definition could completely eliminate overlap with other activities or preclude variations in interpretation by different individuals and organizations.
Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems."
This web book has been developed as a first semester biochemistry text and choices have been made to limit the scope of the material to exclude content covered in detail in a molecular biology/genetics class. Hence, this text will not discuss in significant detail the genome and transcriptome, and mechanisms of replication, transcription, or translation. However, with its emphasis on protein structure and function, proteomics, the characterization of structure and function of all proteins within a cell, is a logical candidate for inclusion.
In the last several years, computational biology/chemistry and web-based programs have become available for the systematic analysis of individual proteins, and for the comparative analysis of many proteins, based on either their DNA or amino acid sequence. Clearly the ultimate goal in the description of a protein would be to determine, from the amino acid or nucleotide sequence, the three dimensional structure of a protein and its biological function, including all its binding partners.
Here is a list of proteome web resources and tutorials
Voluminous databases of biomolecule sequence and structural data, as well as analysis software packages, are available at a variety of web sites, including:
The NCBI has an extensive array of available tools (free), including:
A summary of three important sites:
• NCBI-Protein:
The Protein database is a collection of sequences from several sources,
including translations from annotated coding regions in GenBank, RefSeq
and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein
sequences are the fundamental determinants of biological structure and
function
• Uniprot: The
UniProt Knowledgebase (UniProtKB) is the central hub for the collection
of functional information on proteins, with accurate, consistent and
rich annotation. In addition to capturing the core data mandatory for
each UniProtKB entry (mainly, the amino acid sequence, protein name or
description, taxonomic data and citation information), as much
annotation information as possible is added.
•
Gene Card: GeneCards is a
searchable, integrative database that provides comprehensive,
user-friendly information on all annotated and predicted human genes. It
automatically integrates gene-centric data from ~125 web sources,
including genomic, transcriptomic, proteomic, genetic, clinical and
functional information
The table below (directly taken from Wikipedia) shows some of the incredible information available the proteome and genome of each human chromosome.
Table: Human proteome and genome from
Wikipedia
(Data source:
Ensembl genome browser release 68, July 2012)
Chromsome | Length (mm) | BP | Variations | Confirmed Proteins | Putative Proteins | Pseudogenes | miRNA | rRNA | snRNA | snoRNA | misc ncRNA | Links |
1 | 85 | 249,250,621 | 4,401,091 | 2,012 | 31 | 1,130 | 134 | 66 | 221 | 145 | 106 | EBI |
2 | 83 | 243,199,373 | 4,607,702 | 1,203 | 50 | 948 | 115 | 40 | 161 | 117 | 93 | EBI |
3 | 67 | 198,022,430 | 3,894,345 | 1,040 | 25 | 719 | 99 | 29 | 138 | 87 | 77 | EBI |
4 | 65 | 191,154,276 | 3,673,892 | 718 | 39 | 698 | 92 | 24 | 120 | 56 | 71 | EBI |
5 | 62 | 180,915,260 | 3,436,667 | 849 | 24 | 676 | 83 | 25 | 106 | 61 | 68 | EBI |
6 | 58 | 171,115,067 | 3,360,890 | 1,002 | 39 | 731 | 81 | 26 | 111 | 73 | 67 | EBI |
7 | 54 | 159,138,663 | 3,045,992 | 866 | 34 | 803 | 90 | 24 | 90 | 76 | 70 | EBI |
8 | 50 | 146,364,022 | 2,890,692 | 659 | 39 | 568 | 80 | 28 | 86 | 52 | 42 | EBI |
9 | 48 | 141,213,431 | 2,581,827 | 785 | 15 | 714 | 69 | 19 | 66 | 51 | 55 | EBI |
10 | 46 | 135,534,747 | 2,609,802 | 745 | 18 | 500 | 64 | 32 | 87 | 56 | 56 | EBI |
11 | 46 | 135,006,516 | 2,607,254 | 1,258 | 48 | 775 | 63 | 24 | 74 | 76 | 53 | EBI |
12 | 45 | 133,851,895 | 2,482,194 | 1,003 | 47 | 582 | 72 | 27 | 106 | 62 | 69 | EBI |
13 | 39 | 115,169,878 | 1,814,242 | 318 | 8 | 323 | 42 | 16 | 45 | 34 | 36 | EBI |
14 | 36 | 107,349,540 | 1,712,799 | 601 | 50 | 472 | 92 | 10 | 65 | 97 | 46 | EBI |
15 | 35 | 102,531,392 | 1,577,346 | 562 | 43 | 473 | 78 | 13 | 63 | 136 | 39 | EBI |
16 | 31 | 90,354,753 | 1,747,136 | 805 | 65 | 429 | 52 | 32 | 53 | 58 | 34 | EBI |
17 | 28 | 81,195,210 | 1,491,841 | 1,158 | 44 | 300 | 61 | 15 | 80 | 71 | 46 | EBI |
18 | 27 | 78,077,248 | 1,448,602 | 268 | 20 | 59 | 32 | 13 | 51 | 36 | 25 | EBI |
19 | 20 | 59,128,983 | 1,171,356 | 1,399 | 26 | 181 | 110 | 13 | 29 | 31 | 15 | EBI |
20 | 21 | 63,025,520 | 1,206,753 | 533 | 13 | 213 | 57 | 15 | 46 | 37 | 34 | EBI |
21 | 16 | 48,129,895 | 787,784 | 225 | 8 | 150 | 16 | 5 | 21 | 19 | 8 | EBI |
22 | 17 | 51,304,566 | 745,778 | 431 | 21 | 308 | 31 | 5 | 23 | 23 | 23 | EBI |
X | 53 | 155,270,560 | 2,174,952 | 815 | 23 | 780 | 128 | 22 | 85 | 64 | 52 | EBI |
Y | 20 | 59,373,566 | 286,812 | 45 | 8 | 327 | 15 | 7 | 17 | 3 | 2 | EBI |
mtDNA | 0.0054 | 16,569 | 929 | 13 | 0 | 0 | 0 | 2 | 0 | 0 | 22 | EBI |
This chapter will describe programs that allow predictions of secondary and tertiary structures of proteins. Specific exercises using web-based bioinformatics programs can be found at the end.
Navigation
Return to Chapter 2G: Predicting Protein Properties from Sequences
Return to Biochemistry Online Table of Contents
Archived version of full Chapter 2G: Predicting Protein Properties from Sequences