BIOCHEMISTRY - DR. JAKUBOWSKI
Last Update: 3/9/16
Learning Goals/Objectives for Chapter 2G: After class and this reading, students will be able to:
You will study a protein, Myelin Regulatory Factor (MYRF), which
may be a transcription factor. One way to learn more about the features and
likely function of the MYRF protein is to explore the structure of the 1,139
amino acid sequence in silico.
You will analyze the protein sequence using a variety of web-based
proteomics programs. For most of these programs you will need to input
the amino acid sequence in FASTA format. Here is the FASTA amino acid
sequence (in single-letter amino acid code).
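As a rough illustration of what FASTA format means, here is a minimal Python sketch that separates the header line (which begins with ">") from the sequence lines. The function name read_fasta and the short example record are hypothetical; the record is a made-up sequence, not the MYRF sequence.

```python
def read_fasta(text):
    """Return (header, sequence) from a single-record FASTA string."""
    header = ""
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
        else:
            residues.append(line.upper())
    return header, "".join(residues)


# Made-up record for illustration only (not the MYRF sequence).
example = """>hypothetical_protein example record
MKTAYIAKQR
QISFVKSHFS
"""

header, seq = read_fasta(example)
print(header)
print(seq, len(seq))
```

Some programs accept only the bare sequence, in which case the second value returned here is what you would paste in.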
Use these programs to gain information about this protein. If
you have any problem with any of the programs (lots of error messages), skip
that particular program.
a. Sequence Manipulation Suite: Determine the molecular weight of the
protein.
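To see roughly what a molecular weight calculation involves, the sketch below sums average residue masses and adds one water for the free termini. The function name molecular_weight is hypothetical, and the mass table is a commonly used set of average residue masses; the Sequence Manipulation Suite may use slightly different values, so expect small differences from its output.

```python
# Average residue masses in daltons (amino acid minus one water).
# Assumption: standard average values; web tools may differ slightly.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N-terminal H and C-terminal OH


def molecular_weight(sequence):
    """Approximate average molecular weight (Da) of an unmodified chain."""
    total = WATER
    for aa in sequence.upper():
        if aa not in RESIDUE_MASS:
            raise ValueError(f"Unexpected residue letter: {aa!r}")
        total += RESIDUE_MASS[aa]
    return total


# Made-up peptide for illustration only (not the MYRF sequence).
print(round(molecular_weight("MKTAYIAK"), 1))
```

The calculation ignores post-translational modifications and disulfide bonds, so it estimates the mass of the bare polypeptide only.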
b. Eukaryotic Linear Motif: Linear motifs are short, evolutionarily
plastic components of regulatory proteins and provide
low-affinity interaction interfaces. These compact modules play central
roles in mediating every aspect of the regulatory functionality of the cell.
They are particularly prominent in mediating cell signaling, controlling
protein turnover and directing protein localization. Given their importance,
our understanding of motifs is surprisingly limited, largely as a result of
the difficulty of discovery, both experimentally and computationally. The
Eukaryotic Linear Motif (ELM) resource provides the biological community with a
comprehensive database of known experimentally validated motifs, and an
exploratory tool to discover putative linear motifs in user-submitted
protein sequences.
c. PSORT II: Programs for predicting the subcellular localization of
eukaryotic sequences, along with other datasets and resources relevant to
cellular localization prediction. After running it, examine the link shown as
PSORT features and traditional PSORTII prediction.
You might get an error message saying the protein does not begin with an M
(Met). Met is the first amino acid encoded from a gene sequence in eukaryotes
(using the codon AUG). It is often removed during or after protein synthesis.
Don't worry about it. Either way, the output shows you the number of
homologous proteins found and where they are located (cyto, nuc, secreted,
etc.). Go to the Details link, where the proteins are listed. The ones at the
top are most homologous to MYRF.
d. NucPred: Analyzes a eukaryotic protein sequence and predicts whether the
protein spends at least some time in the nucleus or no time in the nucleus.
f. CCTOP: Prediction of transmembrane helices and topology of proteins.
Select the advanced tab. This program might not work. In the output under
each amino acid you will see I (inside), O (outside), H for transmembrane
helical region, and i for indeterminate.
g. Das-TMfilter: You might have to remove the non-sequence part (the header
line) of the FASTA file.
h. TopPred 1.1: Topology predictor for membrane proteins at the Pasteur
Institute. You will have to input your email address.
http://bioweb.pasteur.fr/seqanal/interfaces/toppred.html
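The transmembrane predictors above use their own, more elaborate methods. As a simplified stand-in (not the algorithm of CCTOP, Das-TMfilter, or TopPred), the sketch below computes a Kyte-Doolittle sliding-window hydropathy average, the classic way to spot stretches hydrophobic enough to span a membrane. The function names are hypothetical; the window of 19 residues and the cutoff of 1.6 are the values conventionally associated with the Kyte-Doolittle scale, treated here as assumptions.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every full window along the sequence.

    Raises KeyError if the sequence contains a non-standard letter.
    """
    scores = [KD[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """0-based start positions of windows whose average exceeds threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, avg in enumerate(profile) if avg > threshold]


# Toy sequence: a leucine stretch flanked by aspartates (illustration only).
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_tm_windows(toy))
```

A run of consecutive start positions above the cutoff suggests one candidate membrane-spanning segment; the web tools refine this idea with better scales and topology rules.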
i. PFAM: Multiple analyses of Protein FAMilies. Select View a sequence (look
at the domain organization of a protein sequence) and input MRF_Mouse.
Click on the various domains discovered based on sequence homology.
j. Prosite: Input your sequence in the fast scan region. Prosite can
determine the likely function of the protein MYRF based on the presence of
"patterns, motifs, or signatures" in the protein sequence which are
characteristic of a specific biological function, such as ligand binding,
catalysis, or in vivo chemical modification. We will only use it to probe
for post-translational modification sites. Select Scan a
sequence against PROSITE patterns and profiles to see possible sites
for in vivo chemical modification of the protein. In Prosite Tools, uncheck
"exclude patterns of high probability of occurrence".
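To make the idea of a pattern concrete, here is a minimal sketch that converts a PROSITE-style pattern into a regular expression and scans a sequence with it. The function names are hypothetical, and the converter handles only the basic syntax (x, [..], {..}, repeat counts, and the < and > anchors), not every PROSITE feature. The example uses the N-glycosylation pattern N-{P}-[ST]-{P}, a short pattern of the frequently occurring kind that the exclusion checkbox filters out.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        parts.append(element)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (1-based position, matched text) for every match, overlaps included."""
    regex = "(?=(" + prosite_to_regex(pattern) + "))"  # lookahead keeps overlaps
    return [(m.start() + 1, m.group(1)) for m in re.finditer(regex, sequence.upper())]


# Made-up sequence for illustration only (not the MYRF sequence).
toy = "MKNGSAANPTQNVTW"
for position, site in scan(toy, "N-{P}-[ST]-{P}"):
    print(position, site)
```

A match only says the sequence fits the pattern. Whether the site is actually modified in vivo depends on context such as cellular location, which is why short patterns produce many false positives.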
k. HHPRED will give you homology detection and structure
prediction, returning domain information and alignment with other proteins
of known function. Select the input link (FASTA format) to input your
sequence.
l. NCBI Standard Protein BLAST
m. Use CATH (Protein Structure Classification: Class, Architecture, Topology,
Homologous superfamily) to determine its domain structure and the
superfamily it resides in. Select Search and type 1XWW in the
ID/Key Word box. Select return. Determine its class,
architecture, topology, and homologous superfamily classifications.
After the search, select the BLAST tab, then select CATH Code or click CATH
Code Superfamily (whichever works).
n. Go to UniProt and input the mouse MYRF sequence (accession number Q3UR85)
for a trove of information, much of which you have probably just discovered.