The Molecule Finder helps users select source antigens, immunogens, and epitopes. There are two versions of the Molecule Finder: one specifically for non-peptidic molecules, found in the Epitope box on the results page, and another that includes both peptidic and non-peptidic branches, found in the Antigen box on the results page. The former contains the structures curated by the Chemical Entities of Biological Interest (ChEBI) database. An example of the non-peptidic finder is shown in Figure 1.
Figure 1. The non-peptidic version of the Molecule Finder. This finder is accessed in the Epitope filter on the search results page.
In the protein (peptidic) branch of the Molecule Finder, individual GenPept proteins used in IEDB data are assigned to parent proteins from reference proteomes by sequence homology. These GenPept proteins are not displayed in the Molecule Finder; instead, their reference proteome parents are shown, which makes the tree simpler to navigate. The reference proteomes are graded by the star system described below, which reflects the quality and completeness of each. An example of the star system and the protein tree can be seen in Figure 2.
Figure 2. An example of the protein branch of the Molecule Finder.
★★★ For some well-studied species, UniProt provides reference proteomes that contain a full set of all proteins expressed by the species. For some bacterial species with inconsistent protein expression, additional proteins have been added to the reference proteome to create metaproteomes. These reference proteomes or metaproteomes are designated by three stars.
★★ For other species that have been completely sequenced, UniProt provides complete proteomes. In addition, for some species expressing allergens, formal nomenclature designated by the International Union of Immunological Societies (IUIS) exists to describe these allergens. Complete proteomes that are not considered reference proteomes, or ones that contain formal IUIS allergen nomenclature for a subset of proteins, are designated by two stars.
★ For some species, a proteome does not currently exist in UniProt, but GenBank provides a set of proteins representative of the species. These GenBank proteomes are designated by a single star.
☆ For species that have no proteome in UniProt or GenBank, and no IUIS nomenclature, UniProt may still contain some records that can be used as parents. This case is designated with an unfilled star.
No Star. Species having no proteome in either UniProt or GenBank are designated by no stars.
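The proteome grading rules above can be read as a small decision function. The following Python sketch is illustrative only: `ProteomeInfo` and `proteome_stars` are hypothetical names and are not part of any IEDB interface. The order in which the rules are checked is an assumption, as is the reading that the unfilled star and the no-star case differ only in whether UniProt holds any records usable as parents.

```python
from dataclasses import dataclass


@dataclass
class ProteomeInfo:
    """Hypothetical summary of the proteome resources known for a species."""
    uniprot_reference: bool = False  # UniProt reference proteome or metaproteome
    uniprot_complete: bool = False   # UniProt complete, non-reference proteome
    iuis_allergens: bool = False     # formal IUIS allergen nomenclature exists
    genbank_proteome: bool = False   # GenBank set of representative proteins
    uniprot_records: bool = False    # some UniProt records usable as parents


def proteome_stars(info: ProteomeInfo) -> str:
    """Return the star grade for a species, checking the best grade first."""
    if info.uniprot_reference:
        return "★★★"
    # Assumption: either condition alone is enough for two stars.
    if info.uniprot_complete or info.iuis_allergens:
        return "★★"
    if info.genbank_proteome:
        return "★"
    # Assumption: the unfilled star requires at least some UniProt records.
    if info.uniprot_records:
        return "☆"
    return ""  # no star
```

Checking the highest grade first means a species that satisfies several conditions receives the best applicable rating, which seems to match the intent of the list but is not stated explicitly.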
Within each species’ proteome, individual “parent” proteins group multiple distinct GenPept sequences. These GenPept entries are the “children” of each proteome protein in the molecule tree. This allows users to search IEDB data by selecting the parent protein from the reference proteome, rather than having to select each individual GenPept entry. The “parent” proteins within each proteome also use stars to denote the quality of the information available for each.
★★ UniProt reviewed proteins or proteins having official IUIS allergen nomenclature have two stars.
★ UniProt unreviewed proteins or proteins from GenBank have a single star.
☆ Nodes in the protein branch of the molecule tree that contain GenPept and IEDB internal protein accessions with no homology to any protein in a reference proteome are designated with an unfilled star.
Organizational nodes, which the molecule tree uses to clarify the relationship between groups of similar proteins, have no stars. An example is “Immunoglobulin”, which groups all immunoglobulin proteins from a single species.
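The parent and child grouping and the protein-level star ratings described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the node kind labels, the `UNASSIGNED` bucket, and the function names are hypothetical, and the accessions in the usage example are placeholders rather than real GenPept or UniProt identifiers.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical node kinds mapped to the star ratings listed above.
PROTEIN_STARS = {
    "uniprot_reviewed": "★★",
    "iuis_allergen": "★★",
    "uniprot_unreviewed": "★",
    "genbank": "★",
    "unassigned": "☆",     # no homology to any reference proteome protein
    "organizational": "",  # grouping nodes such as "Immunoglobulin"
}

UNASSIGNED = "unassigned"  # hypothetical bucket for children without a parent


def group_by_parent(assignments: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group child accessions under their parent protein.

    `assignments` maps each child accession to its parent accession, or to
    None when no homologous parent was found.
    """
    tree: Dict[str, List[str]] = defaultdict(list)
    for child, parent in assignments.items():
        tree[parent if parent is not None else UNASSIGNED].append(child)
    return dict(tree)


def protein_stars(kind: str) -> str:
    """Look up the star rating for a node kind."""
    return PROTEIN_STARS[kind]
```

For example, `group_by_parent({"child_1": "parent_A", "child_2": "parent_A", "child_3": None})` places the first two children under `parent_A` and the third in the unassigned bucket, so selecting `parent_A` stands in for selecting both of its children individually.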
As shown in Figure 2, a user can search the molecule tree by entering text, including synonyms, in the Name field in the upper left-hand corner of the finder. The user can also specify the source organism of the molecule of interest using the autocomplete field or the organism finder. For example, this makes it possible to specify a hemagglutinin in a particular strain of Influenza.
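The combination of a name search and an organism filter can be illustrated with a short Python sketch. It does not describe how the finder is implemented: the `Molecule` class and `search_molecules` function are hypothetical, and case-insensitive substring matching on names and synonyms is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Molecule:
    """Hypothetical molecule tree entry."""
    name: str
    organism: str
    synonyms: List[str] = field(default_factory=list)


def search_molecules(
    molecules: List[Molecule],
    text: str,
    organism: Optional[str] = None,
) -> List[Molecule]:
    """Return entries whose name or synonyms contain `text`.

    When `organism` is given, only entries from that source organism
    are considered. Matching ignores case (an assumption).
    """
    needle = text.casefold()
    hits = []
    for molecule in molecules:
        if organism is not None and molecule.organism.casefold() != organism.casefold():
            continue
        candidates = [molecule.name, *molecule.synonyms]
        if any(needle in candidate.casefold() for candidate in candidates):
            hits.append(molecule)
    return hits
```

With two hemagglutinin entries from different placeholder strains, a search for "hemagglutinin" alone would return both, while adding the organism argument narrows the result to the entry from the chosen strain.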