Ward Fleri
posted this on August 07, 2012 04:40 pm
Finders help users make selections and keep vocabulary usage consistent, which improves query results. The list of possible selections can be quite extensive, and the finders make it easier to choose from large lists. Multiple selections can be made when using finders during a query.
The Molecule Finder is used to facilitate the selection of source antigens, immunogens, and epitopes. Records in the Source Organism Finder, which is contained within the Molecule Finder, come from GenPept, ChEBI, UniProt, and IEDB curators.
The Molecule Finder is designed to include two parallel trees, one for non-peptidic molecules and the other for protein molecules. The first contains the structures curated by the Chemical Entities of Biological Interest (ChEBI) database. An example is shown in Figure 1.
Figure 1. Example of the non-peptidic branch of the Molecule Finder tree showing the branches before and after amoxicillin.
The development team determined that the most logical way to group the proteins was by organism. To accomplish this, the NCBI species was identified for each protein in the database. For viruses and bacteria, this involved traversing the NCBI taxonomy from the sub-species (strain) level up to the species level. For each species, a set of reference proteins was selected from the NCBI protein database based on the availability of a complete genome for that species. All proteins for each species were then compared against the reference protein set using BLAST to determine their homologs. These data were used to build a protein tree that mirrors a pruned version of the NCBI taxonomy. The result is a coherent tree that is divided along major taxonomic categories and can be traversed quickly, with proteins grouped logically below each species.
The user can perform a free text search for Molecule Name and can specify the source species with the Organism Finder. Figure 2 shows the results for all Influenza A haemagglutinin (HA) proteins. The user can click Select to populate the Current Selection box with the desired molecule, or click Highlight in Tree to see where it appears in the protein tree, as shown in Figure 3. The user can thus select all Influenza A haemagglutinin (HA) proteins by selecting one node of the tree rather than individually clicking on the 100+ different HA proteins in the database.
Figure 2. Searching for haemagglutinin molecules for Influenza A virus.
Figure 3. Example of the protein tree as found in the Source Antigen Molecule Finder on the IEDB home page.
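The grouping logic described above for the protein tree can be illustrated with a small sketch. This is not the IEDB implementation; it only shows the general idea of rolling a strain up to its species and then selecting every protein below a single tree node. The taxonomy entries, their ids, the protein ids, and the function names (species_of, build_protein_tree, select_node) are all hypothetical, and the reference protein attached to each record stands in for the result of the BLAST comparison, which is not performed here.

```python
from collections import defaultdict

# Hypothetical miniature taxonomy: taxon id -> (parent id, rank, name).
# The ids are made up for illustration and are not real NCBI taxonomy ids.
TAXONOMY = {
    1: (None, "no rank", "root"),
    10: (1, "family", "Orthomyxoviridae"),
    20: (10, "species", "Influenza A virus"),
    21: (20, "no rank", "H1N1 subtype"),
    22: (21, "no rank", "example H1N1 strain"),
    23: (20, "no rank", "H3N2 subtype"),
}

# Hypothetical protein records: (protein id, source taxon id, reference protein).
# The third field stands in for the outcome of the BLAST comparison.
PROTEINS = [
    ("HA_1", 22, "haemagglutinin"),
    ("HA_2", 23, "haemagglutinin"),
    ("NA_1", 22, "neuraminidase"),
]


def species_of(taxon_id, taxonomy):
    """Follow parent links upward until a species-rank node is reached."""
    current = taxon_id
    while current is not None:
        parent, rank, _name = taxonomy[current]
        if rank == "species":
            return current
        current = parent
    return None  # no species-level ancestor exists


def build_protein_tree(proteins, taxonomy):
    """Group proteins as species id -> reference protein -> [protein ids]."""
    tree = defaultdict(lambda: defaultdict(list))
    for protein_id, taxon_id, reference in proteins:
        tree[species_of(taxon_id, taxonomy)][reference].append(protein_id)
    return tree


def select_node(tree, species_id, reference=None):
    """Return every protein below a species node or a reference-protein node."""
    branches = tree.get(species_id, {})
    if reference is not None:
        return sorted(branches.get(reference, []))
    return sorted(p for members in branches.values() for p in members)


tree = build_protein_tree(PROTEINS, TAXONOMY)
print(select_node(tree, 20, "haemagglutinin"))  # one node selects all HA proteins
print(select_node(tree, 20))                    # the species node selects everything
```

Selecting the haemagglutinin node under the species returns both HA records even though they come from different strains, which mirrors how one click on a tree node replaces many individual selections.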