Biochemistry Online: An Approach Based on Chemical Logic

CHAPTER 2 - PROTEIN STRUCTURE

G: PREDICTING PROTEIN PROPERTIES FROM SEQUENCES

BIOCHEMISTRY - DR. JAKUBOWSKI

Last Update: 3/9/16

Learning Goals/Objectives for Chapter 2G: After class and this reading, students will be able to:

find web-based proteomics programs to analyze protein sequences and structures
describe the basis for methods used to predict the secondary structure and hydrophobic structures of proteins
analyze secondary structure and hydropathy plots from web-based proteomics programs.
describe differences between integral and peripheral membrane proteins, and how each could be purified.
explain how hydropathy and secondary structure plots can be used to predict membrane-spanning sequences of proteins
describe in general the theoretical and empirically based methods to predict protein tertiary structure from a primary sequence
describe possible early intermediates in protein folding as determined by theoretical methods

G7. Proteomics Problem Set 2

You will study a protein, Myelin Regulatory Factor (MYRF), which may be a transcription factor. One way to learn more about the features and likely function of the MYRF protein is to explore the structure of its 1,139-amino-acid sequence in silico.

You will analyze the protein sequence using a variety of web-based proteomics programs. For most of these programs you will need to input the amino acid sequence in FASTA format. Here is the FASTA amino acid sequence (in single-letter amino acid code).

Use these programs to gain information about this protein. If you have any problem with any of the programs (lots of error messages), skip that particular program.

a. Sequence Manipulation Suite: Determine the molecular weight of the protein.
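
To see what the molecular weight calculation involves, here is a minimal sketch: sum the average mass of each residue (the free amino acid minus one water, since a water is lost at each peptide bond) and add back one water for the free N- and C-termini. The residue masses below are commonly tabulated average values; the Sequence Manipulation Suite may use a slightly different table, so the last decimal place can differ. The function name `molecular_weight` is illustrative, and the sketch ignores post-translational modifications.

```python
# Average residue masses in daltons (amino acid minus one water).
# Commonly tabulated values; other tables differ slightly in the decimals.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water restores the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    seq = "".join(sequence.split()).upper()  # drop spaces and line breaks
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    # Glycylglycine: 2 x 57.0519 + 18.01528 Da
    print(round(molecular_weight("GG"), 2))  # prints 132.12
```

For a protein the size of MYRF the result will be in the range of a hundred-plus kilodaltons, so divide by 1000 to compare with a value reported in kDa.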

b. Eukaryotic Linear Motif: Linear motifs are short, evolutionarily plastic components of regulatory proteins and provide low-affinity interaction interfaces. These compact modules play central roles in mediating every aspect of the regulatory functionality of the cell. They are particularly prominent in mediating cell signaling, controlling protein turnover and directing protein localization. Given their importance, our understanding of motifs is surprisingly limited, largely as a result of the difficulty of discovering them, both experimentally and computationally. The Eukaryotic Linear Motif (ELM) resource provides the biological community with a comprehensive database of known, experimentally validated motifs, and an exploratory tool to discover putative linear motifs in user-submitted protein sequences.
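
Linear motifs are short enough that they are often written as regular expressions, and scanning a sequence for one is a simple pattern search. The sketch below assumes a simplified, textbook consensus for a classic monopartite nuclear localization signal, K-(K/R)-x-(K/R); it is an illustration, not an actual ELM class definition, and real ELM predictions also filter by context such as disorder and taxonomy. The example fragment contains the well-known SV40 large T antigen NLS (PKKKRKV) and is not part of MYRF.

```python
import re

# Simplified monopartite NLS consensus K-(K/R)-x-(K/R).
# Illustrative only; not copied from an ELM entry.
NLS_CONSENSUS = r"K[KR].[KR]"


def find_motifs(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every match, allowing overlaps."""
    # Wrapping the pattern in a lookahead makes each match zero-width, so
    # overlapping hits in lysine/arginine-rich stretches are all reported.
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Fragment containing the SV40 NLS; prints [(4, 'KKKR'), (5, 'KKRK')]
    print(find_motifs("MSPKKKRKVE", NLS_CONSENSUS))
```

A hit like this only says the letters fit a consensus; short patterns match by chance in long proteins, which is why tools such as ELM add further filters.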

c. PSORT II: a program for predicting the subcellular localization of eukaryotic proteins; the site also links to other datasets and resources relevant to localization prediction. After running it, examine the output sections labeled PSORT features and traditional PSORT II prediction.
You might get an error message saying the protein does not begin with an M (Met). Met is the first amino acid encoded by a eukaryotic gene (by the start codon AUG), but it is often removed during or after protein synthesis. Don't worry about it. Either way, the output shows you the number of homologous proteins found and where they are located (cyto, nuc, secreted, etc.). Go to the Details link, where the proteins are listed; the ones at the top are the most homologous to MYRF.

d. NucPred: analyzes a eukaryotic protein sequence and predicts whether the protein spends at least some time in the nucleus or none at all.

e. TMPRED: The TMpred program predicts membrane-spanning regions and their orientation. The algorithm is based on the statistical analysis of TMbase, a database of naturally occurring transmembrane proteins, and it scores candidate segments using a combination of several weight matrices.
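
TMpred's weight matrices are not reproduced here. Instead, the sketch below swaps in the simpler, classic hydropathy-plot approach: average the Kyte-Doolittle hydropathy values over a sliding window and flag windows whose mean is high. The 19-residue window and 1.6 threshold are the values usually quoted from Kyte and Doolittle for spotting membrane-spanning segments; treat them as adjustable assumptions. Unknown residue codes such as X will raise a KeyError.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy per window; entry i covers residues i+1..i+window (1-based)."""
    values = [KD[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(sequence: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]
```

Plotting `hydropathy_profile` against residue position gives the familiar hydropathy plot; runs of consecutive flagged windows mark a candidate transmembrane helix.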

f. CCTOP - Prediction of transmembrane helices and topology of proteins. Select the advanced tab. This program might not work. Under each amino acid in the output you will see I (inside), O (outside), H (transmembrane helical region), or i (indeterminate).
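
A per-residue string of topology letters is easier to read once it is collapsed into segments. The sketch below assumes you have copied the letters into one string, one letter per residue, using the labels described above; the exact layout of CCTOP's output is an assumption, and the function names are hypothetical.

```python
from itertools import groupby

# Per-residue labels as described for the CCTOP output (case matters: I vs i).
LABELS = {"I": "inside", "O": "outside", "H": "TM helix", "i": "indeterminate"}


def topology_segments(topology: str) -> list[tuple[str, int, int]]:
    """Collapse a per-residue label string into (label, start, end) runs (1-based, inclusive)."""
    segments = []
    position = 1
    for letter, run in groupby(topology):
        length = len(list(run))
        segments.append((LABELS.get(letter, letter), position, position + length - 1))
        position += length
    return segments


def count_tm_helices(topology: str) -> int:
    """Number of separate transmembrane helical segments."""
    return sum(1 for label, _, _ in topology_segments(topology) if label == "TM helix")
```

For example, `topology_segments("IIIHHHHOO")` gives an inside stretch at residues 1-3, a helix at 4-7, and an outside stretch at 8-9.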

g. Das-TMfilter: you might have to remove the non-sequence part (the header line) of the FASTA file before submitting.
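
Stripping the FASTA header is easy to do by hand, but a small sketch makes the rule explicit: drop every line that starts with `>` and remove all whitespace, leaving only the residue letters. It assumes the file holds a single sequence (several records would be concatenated), and the function name and example header are hypothetical.

```python
def fasta_to_raw_sequence(fasta_text: str) -> str:
    """Drop '>' header lines and all whitespace, leaving only residue letters."""
    lines = fasta_text.strip().splitlines()
    sequence_lines = [line for line in lines if not line.startswith(">")]
    # str.split() with no argument splits on any whitespace, so joining the
    # pieces removes spaces, tabs and stray line endings in one step.
    return "".join("".join(sequence_lines).split()).upper()
```

For example, `fasta_to_raw_sequence(">hypothetical header\nMSTK\nllv \n")` returns `"MSTKLLV"`.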

h. TopPred 1.1 – Topology predictor for membrane proteins at the Pasteur Institute. You will have to input your email address. http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html
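
TopPred combines hydrophobicity analysis with von Heijne's positive-inside rule: the loops on the cytoplasmic side of a membrane protein tend to carry more lysine and arginine. The sketch below applies only that rule to a single transmembrane segment whose position you already know. Counting K and R in a fixed 15-residue flank on each side is a hypothetical simplification of how loops are counted in the real method.

```python
def positive_inside_side(sequence: str, tm_start: int, tm_end: int,
                         flank: int = 15) -> str:
    """Guess which side of one TM segment faces the cytoplasm.

    tm_start and tm_end are 1-based and inclusive. Counts K and R in up to
    `flank` residues on each side; the side with more is predicted inside.
    """
    seq = sequence.upper()
    n_side = seq[max(0, tm_start - 1 - flank):tm_start - 1]  # before the segment
    c_side = seq[tm_end:tm_end + flank]                       # after the segment
    n_pos = n_side.count("K") + n_side.count("R")
    c_pos = c_side.count("K") + c_side.count("R")
    if n_pos > c_pos:
        return "N-terminal side inside"
    if c_pos > n_pos:
        return "C-terminal side inside"
    return "undecided"
```

For a single-pass protein, "N-terminal side inside" means the N-terminus is predicted to be cytoplasmic and the C-terminus extracellular (or lumenal), and vice versa.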

i. PFAM – multiple analyses of Protein FAMilies. View a sequence. Look at the domain organization of a protein sequence. Input MRF_Mouse. Click on the various domains discovered based on sequence homology.

j. Prosite: input your sequence in the fast scan region. Prosite can suggest the likely function of MYRF from the presence of "patterns, motifs, or signatures" in its sequence that are characteristic of a specific biological function, such as ligand binding, catalysis, or in vivo chemical modification. We will use it only to probe for post-translational modification sites. Select Scan a sequence against PROSITE patterns and profiles to see possible sites of in vivo chemical modification of the protein. In the Prosite tool options, uncheck "exclude patterns with a high probability of occurrence."
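
PROSITE patterns use their own compact syntax, and the modification-site patterns are short, which is why they occur with "high probability" and are excluded unless you uncheck that option. For example, the N-glycosylation site pattern (PS00001) is N-{P}-[ST]-{P}, and the casein kinase II phosphorylation site pattern (PS00006) is [ST]-x(2)-[DE]. The sketch below translates the basic syntax into a Python regular expression and scans a sequence; it is not ScanProsite's implementation, it ignores rarer forms such as `>` inside brackets, and the function names are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate basic PROSITE pattern syntax into a Python regular expression.

    Handles x, [..], {..}, repeats e(n) and e(n,m), and the < and > anchors.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:                            # x(2) -> .{2}, x(2,4) -> .{2,4}
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):    # {P} means "anything except P"
            core = "[^" + element[1:-1] + "]"
        else:                            # single residue or [..] class
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_prosite_sites(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """(1-based start, matched text) for every, possibly overlapping, match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]
```

For example, `prosite_to_regex("N-{P}-[ST]-{P}.")` gives `N[^P][ST][^P]`, and scanning `"MNGSA"` with it reports one potential glycosylation site starting at residue 2.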

k. HHPRED will give you homology detection and structure prediction, returning domain information and alignment with other proteins of known function. Select the input link (FASTA format) to input your sequence.

l. NCBI Standard Protein BLAST:

m. Use CATH (Protein Structure Classification: Class, Architecture, Topology, Homologous superfamily) to determine its domain structure and the superfamily it belongs to. Select Search, type 1XWW in the ID/Key Word box, and press Return. Determine its class, architecture, topology, and homologous superfamily classifications. After the search, select the BLAST tab, then select CATH Code or click CATH Code Superfamily (whichever works).

n. Go to UniProt and enter the mouse MYRF accession number (Q3UR85) for a trove of information, much of which you have probably just discovered. Does the in silico analysis support the idea that the protein is a transcription factor?
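
The MYRF sequence can also be pulled directly from UniProt rather than copied by hand, which is handy for feeding the tools above. The sketch below assumes UniProt's current REST URL pattern (rest.uniprot.org/uniprotkb/ACCESSION.fasta) and uses only the Python standard library; the function names are hypothetical, and the download needs network access.

```python
import urllib.request


def uniprot_fasta_url(accession: str) -> str:
    """URL of a UniProtKB entry in FASTA format (assumes the REST URL pattern)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_uniprot_sequence(accession: str, timeout: float = 30.0) -> str:
    """Download one entry and return its sequence without the '>' header line."""
    with urllib.request.urlopen(uniprot_fasta_url(accession), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    lines = [line.strip() for line in text.splitlines() if not line.startswith(">")]
    return "".join(lines)
```

Calling `fetch_uniprot_sequence("Q3UR85")` should return the mouse MYRF sequence as one string; checking its length against the 1,139 residues quoted above is a quick sanity test.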


Biochemistry Online by Henry Jakubowski is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.