Nucleic Acids Research, 2003, Vol. 31, No. 1, 474-477
© 2003 Oxford University Press
MMDB: Entrez's 3D-structure database
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
*To whom correspondence should be addressed. Tel: +1 3014357792; Fax: +1 3014809241; Email: bryant@ncbi.nlm.nih.gov
Received September 30, 2002; Revised and Accepted October 9, 2002
ABSTRACT
Three-dimensional structures are now known within most protein families and it is likely, when searching a sequence database, that one will identify a homolog of known structure. The goal of Entrez's 3D-structure database is to make structure information and the functional annotation it can provide easily accessible to molecular biologists. To this end, Entrez's search engine provides several powerful features: (i) links between databases, for example between a protein's sequence and structure; (ii) pre-computed sequence and structure neighbors; and (iii) structure and sequence/structure alignment visualization. Here, we focus on a new feature of Entrez's Molecular Modeling Database (MMDB): graphical summaries of the biological annotation available for each 3D structure, based on the results of automated comparative analysis. MMDB is available at: http://www.ncbi.nlm.nih.gov/Entrez/structure.html.
CONTENTS
Access
The Molecular Modeling Database (MMDB) is Entrez's Structure database (1). By querying with text terms, one may identify structures of interest based on, for example, a protein name. Links between databases provide other search mechanisms. A query of Entrez's MEDLINE® database, for example, can identify articles referring to a particular protein name. Links from this set of articles to Structure may identify structures not found by direct query, since MEDLINE abstracts contain additional descriptive terms. At the time of writing, MMDB serves about 50 000 queries per day.
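Programmatic access is not covered in this article. As a minimal sketch, assuming NCBI's E-utilities interface (esearch.fcgi for text queries, elink.fcgi for following links between Entrez databases), the direct and indirect search routes described above might be expressed as request URLs. Only URL construction is shown; no network request is made, and the PubMed identifier below is a hypothetical placeholder.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities, a programmatic Entrez interface not described
# in this article. Only the request URLs are built here.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build a text-query URL against one Entrez database (e.g. 'structure')."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """Build a URL that follows Entrez links from records in `dbfrom` to `db`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return EUTILS + "elink.fcgi?" + urlencode(params)


# Direct route: query Structure by protein name.
print(esearch_url("structure", "Hck kinase"))
# Indirect route: from PubMed (MEDLINE) records to their linked structures.
# "12345678" is a hypothetical PMID placeholder.
print(elink_url("pubmed", "structure", ["12345678"]))
```

The two-step route mirrors the text: a MEDLINE query yields article identifiers, and following their links to Structure may surface entries a direct query misses.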
Data sources
Experimental 3D structure data are retrieved from the Protein Data Bank (2). Agreement of atomic coordinate and sequence data for each structure is checked and sequences are automatically modified, if necessary, to achieve exact agreement with coordinates. Data are mapped into a computer-friendly format encoded in ASN.1. This validation and encoding supports interoperable sequence, structure and alignment displays. MMDB currently contains about 20 000 structure entries, corresponding to about 40 000 chains and 70 000 3D domains.
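The agreement check can be pictured with a small sketch. The function `reconcile` and its inputs are hypothetical and do not reproduce MMDB's actual procedure: it assumes that residues observed in the coordinates have already been mapped to 1-based positions in the chain sequence, and it lets the coordinates take precedence wherever the two disagree, recording each change.

```python
from dataclasses import dataclass


@dataclass
class Mismatch:
    position: int     # 1-based position in the chain sequence
    sequence: str     # residue letter stated in the sequence record
    coordinates: str  # residue letter implied by the atomic coordinates


def reconcile(seq: str, observed: dict[int, str]) -> tuple[str, list[Mismatch]]:
    """Return a sequence that agrees exactly with the coordinates, plus the
    positions that had to change.

    Hypothetical sketch: `observed` maps 1-based sequence positions to the
    one-letter codes of residues present in the coordinates. Positions with
    no coordinates (e.g. disordered loops) keep the sequence residue.
    """
    residues = list(seq)
    mismatches: list[Mismatch] = []
    for pos, aa in sorted(observed.items()):
        if not 1 <= pos <= len(residues):
            raise ValueError(f"coordinate residue {pos} lies outside the sequence")
        if residues[pos - 1] != aa:
            mismatches.append(Mismatch(pos, residues[pos - 1], aa))
            residues[pos - 1] = aa  # coordinates take precedence
    return "".join(residues), mismatches
```

For example, `reconcile("MKTAYIAK", {1: "M", 2: "K", 3: "S", 5: "Y"})` returns `"MKSAYIAK"` together with a single mismatch at position 3 (T in the sequence, S in the coordinates).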
Links, neighbours, and visualization
Sequences derived from MMDB are entered into Entrez's protein or nucleic acid sequence database, preserving a link to the corresponding structure. Links to MEDLINE are generated by citation matching (1). Links to Entrez's organism taxonomy database are validated manually (3). Sequence neighbours are identified by BLAST (4), and links to the Conserved Domain Database (CDD) by the reverse PSI-BLAST algorithm (5). Structure neighbours are identified by VAST (6). Entrez's integrated viewer, Cn3D (7), provides molecular-graphics visualization.
ANNOTATION
Structure summaries
Entrez's Structure summary provides a concise description of the contents of an MMDB entry and the available annotation. Figure 1 presents an example, the Hck kinase structure 1QCF (8). Links to MEDLINE and Taxon are provided together with descriptive text and a View control that launches molecular-graphics visualization. The remainder of the display presents a graphical summary of the macromolecular components. Each polypeptide (or polynucleotide) is described by a sequence ruler that indicates chain length and the locations of protein domains. This graphical display links to annotation pertaining to individual chains and protein domains.
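A sequence ruler is essentially a scaled one-dimensional drawing of a chain. The hypothetical `ruler` function below renders one as text, assuming domains are given as 1-based inclusive residue ranges; it only illustrates the idea and is not the code behind the Structure summary.

```python
def ruler(length: int, domains: list[tuple[str, int, int]], width: int = 60) -> str:
    """Render a chain of `length` residues as a text ruler `width` columns wide.

    Each domain is (label, start, end) with 1-based inclusive residue numbers;
    its columns are filled with the label's first character, and residues
    outside any domain are drawn as '-'.
    """
    track = ["-"] * width
    for label, start, end in domains:
        # Integer scaling avoids floating-point rounding at column boundaries.
        lo = (start - 1) * width // length
        hi = max(lo + 1, end * width // length)  # every domain gets >= 1 column
        for col in range(lo, min(hi, width)):
            track[col] = label[0]
    return f"1 {''.join(track)} {length}"
```

For example, `ruler(100, [("A", 1, 50), ("B", 51, 100)], width=10)` returns `"1 AAAAABBBBB 100"`; the domain boundaries here are illustrative, not those of 1QCF.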
MMDB employs two distinct but related definitions of protein domain. 3D domains are identified automatically as compact units within a polypeptide chain. As shown in Figure 1, colouring of 3D domains in the molecular graphics display matches that of the boxes indicating their locations on the sequence ruler. 3D domains are the units for which automated structure neighbour calculations are performed, and the box for each 3D domain (and complete chain) links to a display of its structure neighbours. A link to Entrez's text-listing of 3D domains is useful for advanced queries combining structural similarity with other attributes (3).
Entrez's CDD defines protein domains as recurrent evolutionary modules. In Figure 1, for example, a CDD oval indicates that the region corresponding to the second 3D domain contains a member of the SH2 family. The SH2 oval links to a detailed sequence/structure alignment, as predefined in CDD (5). Correspondence between 3D domains and conserved domains is not exact. The tyrosine kinase domain (TyrKc) defined in CDD, for example, corresponds to two 3D domains, each representing a compact lobe in the structure.
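Because 3D domains and conserved domains are defined independently, relating them is an interval-overlap problem. A minimal sketch, assuming both kinds of domain are given as 1-based inclusive residue ranges and using an arbitrary 50% coverage threshold (`min_frac`, an assumption): a conserved domain is mapped to every 3D domain whose residues it covers by at least that fraction.

```python
Interval = tuple[int, int]  # 1-based, inclusive residue range


def overlap(a: Interval, b: Interval) -> int:
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def map_conserved_to_3d(
    conserved: dict[str, Interval],
    domains3d: dict[str, Interval],
    min_frac: float = 0.5,  # hypothetical coverage threshold
) -> dict[str, list[str]]:
    """For each conserved domain, list the 3D domains it covers by >= min_frac."""
    result: dict[str, list[str]] = {}
    for name, cd in conserved.items():
        result[name] = [
            d for d, span in domains3d.items()
            if overlap(cd, span) >= min_frac * (span[1] - span[0] + 1)
        ]
    return result
```

With illustrative boundaries (not the real 1QCF ones), `map_conserved_to_3d({"SH2": (150, 240), "TyrKc": (260, 520)}, {"d2": (145, 245), "d3": (255, 340), "d4": (341, 525)})` returns `{"SH2": ["d2"], "TyrKc": ["d3", "d4"]}`: the kinase-like range maps to two 3D domains, as in the TyrKc example above.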
STRUCTURE NEIGHBOURS
Structure neighbours are a rich source of biological annotation. Figure 2 shows an example, the structure neighbours of the SH2 domain of 1QCF. The structure of the loop regions contributing to the intramolecular phosphotyrosine-binding site is preserved in 1JYR, a complex of the Grb2 SH2 domain with a phosphotyrosine-containing peptide (9), and in 1FBV, a complex of c-Cbl with a phosphotyrosine-containing peptide (10). One may infer that proteins preserving this site are likely to bind phosphotyrosine. Consistent with this inference, structure 1G99, an archaeal acetate kinase (11), does not preserve this site. If this protein shares a common ancestor with SH2 domains, it presumably belongs to a lineage that diverged before phosphotyrosine binding evolved. While superpositions based on 3D domains are normally adequate for structure-function analyses of this kind, Cn3D's alignment-editing tools may be used to modify alignments and superpositions when necessary.
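The inference above rests on whether the site residues of the query are matched in a neighbour. A crude, hypothetical proxy is sketched below: it counts how many site residues (in query numbering) appear in the structure alignment to a neighbour. Alignment coverage alone ignores the loop conformations the text actually compares, and the 80% threshold is an arbitrary assumption.

```python
def site_preserved(
    site: set[int],
    alignment: dict[int, int],
    min_frac: float = 0.8,  # hypothetical threshold
) -> tuple[bool, float]:
    """Report whether a structure neighbour plausibly preserves a functional site.

    `site` holds query residue numbers forming the site (e.g. residues of the
    phosphotyrosine-binding loops); `alignment` maps aligned query residues to
    neighbour residues. Returns (preserved?, fraction of site residues aligned).
    """
    if not site:
        raise ValueError("site must contain at least one residue")
    aligned = sum(1 for residue in site if residue in alignment)
    frac = aligned / len(site)
    return frac >= min_frac, frac
```

A neighbour whose alignment covers none of the site residues, like the case described for 1G99, would return a fraction of 0.0.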
On average, there are over 600 structure neighbours for each 3D domain in MMDB. To help identify neighbours that provide useful annotation, Entrez's VAST Summary provides a series of controls for selecting and sorting structure neighbours. As illustrated in Figure 2, the alignment footprint of each neighbour indicates the region of the query 3D domain that can be well superposed onto that neighbour. This display helps identify groups of structure neighbours that are similar to one another, for which viewing a multiple-structure superposition is informative. Other controls sort structure neighbours by measures of similarity and select subsets that include only one representative of each sequence-similar subgroup. VAST-Search, which identifies neighbours of user-submitted structures, provides the same analysis tools.
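The footprint and representative-selection controls can be sketched with a hypothetical data model: each neighbour carries a similarity score, the list of query residues it superposes, and a precomputed sequence-similarity group. The footprint is then the span of aligned query residues, and one representative per group is kept by highest score. The field names and the grouping are assumptions for illustration, not the VAST Summary's actual data.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    pdb_id: str
    score: float        # similarity score; assumed: higher means more similar
    aligned: list[int]  # query-domain residues superposed onto this neighbour
    group: int          # sequence-similarity subgroup id (assumed precomputed)


def footprint(n: Neighbour) -> tuple[int, int]:
    """First and last query residue covered by the neighbour's alignment."""
    return min(n.aligned), max(n.aligned)


def representatives(neighbours: list[Neighbour]) -> list[Neighbour]:
    """Keep the best-scoring neighbour of each subgroup, sorted by score."""
    best: dict[int, Neighbour] = {}
    for n in neighbours:
        if n.group not in best or n.score > best[n.group].score:
            best[n.group] = n
    return sorted(best.values(), key=lambda n: n.score, reverse=True)
```

Reducing each subgroup to its best-scoring member shortens a list of hundreds of neighbours while keeping every distinct sequence family in view.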
FUTURE DIRECTIONS
Links to protein classifications like CDD are a valuable source of annotation, since descriptions and functional-site definitions are the result of expert curation. CDD alignments also identify the conserved core, and in future we plan to use this information in sorting structure neighbours (12). Automated identification of sequence and structure neighbours, however, provides the raw material for curated resources and allows Entrez users to discover new relationships not yet described there. We plan to further improve tools for identification of informative sequence and structure neighbours.
ACKNOWLEDGEMENTS
We thank the NIH Intramural Research Program for support. Questions should be addressed to: info@ncbi.nlm.nih.gov.
REFERENCES
- Wheeler,D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A. and Wagner. (2003) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 31, 28-33.
- Westbrook,J., Feng,Z., Jain,S., Bhat,T.N., Thanki,N., Ravichandran,V., Gilliland,G.L., Bluhm,W., Weissig,H., Greer,D.S., Bourne,P.E. and Berman,H.M. (2003) The Protein Data Bank: unifying the archive. Nucleic Acids Res., 31, 489-491.
- Wang,Y., Anderson,J.B., Chen,J., Geer,L.Y., He,S., Hurwitz,D.I., Liebert,C.A., Madej,T., Marchler,G.H., Marchler-Bauer,A., Panchenko,A.R., Shoemaker,B.A., Song,J.S., Thiessen,P.A., Yamashita,R.A. and Bryant,S.H. (2002) MMDB: Entrez's 3D-structure database. Nucleic Acids Res., 30, 249-252.
- Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
- Marchler-Bauer,A., Anderson,J., Fedorova,N., DeWeese-Scott,C., Geer,L.Y., He,S., Hurwitz,D.I., Jackson,J.D., Jacobs,A., Lanczycki,C., Liebert,C., Liu,C., Madej,T., Marchler,G.A., Mazumder,R., Nikolskaya,A., Panchenko,A.R., Shoemaker,B.A., Song,J., Rao,R.B., Thiessen,P.A., Vasudevan,S., Wang,Y., Yamashita,R., Yin,J. and Bryant,S.H. (2003) CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res., 31, 383-387.
- Gibrat,J.F., Madej,T. and Bryant,S.H. (1996) Surprising similarities in structure comparison. Curr. Opin. Struct. Biol., 6, 377-385.
- Wang,Y., Geer,L.Y., Chappey,C., Kans,J.A. and Bryant,S.H. (2000) Cn3D: sequence and structure views for Entrez. Trends Biochem. Sci., 25, 300-302.
- Schindler,T., Sicheri,F., Pico,A., Gazit,A., Levitzki,A. and Kuriyan,J. (1999) Crystal structure of Hck in complex with a Src family-selective tyrosine kinase inhibitor. Mol. Cell, 3, 639-648.
- Nioche,P., Liu,W.Q., Broutin,I., Charbonnier,F., Latreille,M.T., Vidal,M., Roques,B., Garbay,C. and Ducruix,A. (2002) Crystal structures of the SH2 domain of Grb2: highlight on the binding of a new high-affinity inhibitor. J. Mol. Biol., 315, 1167-1177.
- Zheng,N., Wang,P., Jeffrey,P.D. and Pavletich,N.P. (2000) Structure of a c-Cbl-UbcH7 complex: RING domain function in ubiquitin-protein ligases. Cell, 102, 533-539.
- Buss,K.A., Cooper,D.R., Ingram-Smith,C., Ferry,J.G., Sanders,D.A. and Hasson,M.S. (2001) Urkinase: structure of acetate kinase, a member of the ASKHA superfamily of phosphotransferases. J. Bacteriol., 183, 680-686.
- Matsuo,Y. and Bryant,S.H. (1999) Identification of homologous core structures. Proteins, 35, 70-79.