IEDB Epitope Clustering
Clicking the ‘Cluster’ button groups all of the linear, peptidic epitopes in the query result according to their sequence identity. By default, an 80% identity cutoff is used, but this may be changed by selecting a different cutoff in the dropdown. Each epitope in the result set is assigned an EScore (Evidence Score), which is a proxy for how often it appears in the database. Epitopes are ranked by their EScore, and the highest-ranking epitope is used as the seed around which the first cluster is formed. All epitopes that are within X% identity (80% by default) of the seed epitope are included in this first cluster. Moving down the list of epitopes ranked by EScore, the next epitope that is not already included in a cluster is used as the seed for a new cluster. This process is repeated until all epitopes are included in a cluster. The CScore (Cluster Score) is calculated as the sum of the EScores of all of the epitopes in the cluster. By default, clusters are sorted by this score. Please see below for screenshots and more detailed descriptions of terms.
Percent Identity Calculation
All linear, peptidic epitopes in the IEDB are compared against one another for their maximum sequence identity. There are three possible scenarios:
- The aligned sequences are the same length. In this case, the percent identity is calculated over the length of the sequence.
- One sequence is shorter than the other and aligns to a region completely within the larger sequence. In this case, the percent identity is calculated over the length of the shorter sequence.
- The best alignment between two epitopes occurs between the end of one sequence and the start of another. In this case, the alignment of the sequences must overlap by at least 5 residues. The percent identity is calculated over the length of this overlap.
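The three scenarios above can be illustrated with a short Python sketch. It is not the IEDB implementation: it assumes an ungapped alignment, tries every relative offset of the two peptides, and keeps the highest identity found. The function name `percent_identity` and the `min_overlap` parameter are hypothetical names chosen for this example.

```python
def percent_identity(a: str, b: str, min_overlap: int = 5) -> float:
    """Best ungapped percent identity between two peptides (illustrative sketch)."""
    shorter = min(len(a), len(b))
    best = 0.0
    # offset is the position in `a` where b[0] is placed; a negative
    # offset means `b` starts before the first residue of `a`.
    for offset in range(-(len(b) - 1), len(a)):
        start_a = max(offset, 0)
        start_b = max(-offset, 0)
        overlap = min(len(a) - start_a, len(b) - start_b)
        # An end-to-start overlap must span at least `min_overlap` residues.
        # Alignments covering the whole shorter sequence are always allowed
        # (same-length and fully contained cases).
        if overlap < shorter and overlap < min_overlap:
            continue
        matches = sum(a[start_a + i] == b[start_b + i] for i in range(overlap))
        best = max(best, 100.0 * matches / overlap)
    return best


print(percent_identity("SIINFEKL", "SIINFEKA"))   # same length: 87.5
print(percent_identity("GGSIINFEKLGG", "IINFE"))  # contained: 100.0
print(percent_identity("ACDEFGHI", "EFGHIKLM"))   # 5-residue overlap: 100.0
```

Because the sketch takes the maximum over all permitted offsets, a short but perfect overlap can score higher than a longer, imperfect alignment. Whether the IEDB resolves such cases the same way is not stated here.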
The clustering algorithm proceeds as follows:
1. Rank peptides by 'evidence' score (EScore), calculated as the sum of:
   - 100 x the number of references in which it is reported
   - 10 x the number of assays in which it yielded a positive response
   - 1 x the number of assays in which it was tested
2. The first peptide is the seed around which the first cluster is formed.
3. All peptides that are within X% identity of the seed peptide are placed into this cluster.
4. Moving down the list, the next unclustered epitope becomes the next seed.
5. Repeat steps 3-4 until all epitopes are clustered.
Note that all epitopes except for seed epitopes can be members of more than one cluster.
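The greedy procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the IEDB code: "within X% identity" is read as identity greater than or equal to the cutoff, ties in EScore are broken alphabetically, and `toy_identity` is a simplified stand-in for the real percent identity calculation. The names `cluster_epitopes` and `toy_identity` are hypothetical.

```python
from typing import Callable, Dict, List


def toy_identity(a: str, b: str) -> float:
    """Stand-in identity measure: position-wise matches for equal-length peptides."""
    if len(a) != len(b):
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_epitopes(
    escores: Dict[str, int],
    identity: Callable[[str, str], float],
    cutoff: float = 80.0,
) -> List[List[str]]:
    """Greedy seed-based clustering; each cluster lists its seed first."""
    # Rank by EScore, highest first (alphabetical tie-break is an assumption).
    ranked = sorted(escores, key=lambda seq: (-escores[seq], seq))
    seeds = set()
    clustered = set()
    clusters = []
    for seed in ranked:
        if seed in clustered:
            continue  # only unclustered epitopes can become seeds
        seeds.add(seed)
        # Non-seed epitopes may join more than one cluster, so epitopes that
        # are already clustered are still considered here.
        members = [e for e in ranked
                   if e not in seeds and identity(seed, e) >= cutoff]
        clusters.append([seed] + members)
        clustered.add(seed)
        clustered.update(members)
    return clusters


example = {"GILGFVFTL": 300, "SIINFEKL": 250, "SIINFEKA": 120, "GILGFVFTM": 50}
print(cluster_epitopes(example, toy_identity))
# [['GILGFVFTL', 'GILGFVFTM'], ['SIINFEKL', 'SIINFEKA']]
```

Passing the identity function as a parameter keeps the clustering loop separate from the alignment rules, so a fuller percent identity calculation can be swapped in without changing the loop.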
Clustering of epitopes by sequence is possible for any epitope result set. On the epitope results page, you can click on the 'Cluster' button (1) to perform the clustering. By default, the percent identity cutoff is set to 80%. However, this can be changed by selecting a different value from the dropdown (2). The results are normally returned within a few seconds, although clustering a large result set (more than 5,000 epitopes) may take longer.
Cluster Results (collapsed)
This screenshot shows the clustering results on an example query.
- Cluster - The alignment consensus sequence of the cluster.
- CScore (Cluster Score) - The sum of all EScores within a cluster. This is an approximation of the coverage of this cluster by references and assays in the IEDB.
- Count - The number of epitopes in the cluster.
- Epitope - In the collapsed view, this displays the seed epitope sequence only. In the expanded view, the seed epitope is listed first, followed by the alignment of each of the member epitopes to the seed.
- EScore (Evidence Score) - An indirect measure of the frequency of occurrence of the epitope in the IEDB. Defined as 100 x the number of references + 10 x the number of positive assays + 1 x the number of assays.
- Source Antigen and Source Organism - These are the same as on the epitope results page.
- Expand/Collapse buttons - Clicking on these buttons expands the display of the cluster to show the alignment of all member epitopes to the seed epitope (see below).
- Cluster controls - The results can be reclustered at a different percent identity cutoff using these controls.
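As a worked example of the EScore and CScore definitions above, the following Python sketch computes both from per-epitope counts. The `EpitopeCounts` class, its field names, and the counts used are hypothetical and chosen only for illustration. The weights are those given in the definitions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EpitopeCounts:
    references: int       # references in which the epitope is reported
    positive_assays: int  # assays with a positive response
    total_assays: int     # all assays in which the epitope was tested


def escore(c: EpitopeCounts) -> int:
    """EScore = 100 x references + 10 x positive assays + 1 x assays."""
    return 100 * c.references + 10 * c.positive_assays + c.total_assays


def cscore(members: Iterable[EpitopeCounts]) -> int:
    """CScore = sum of the EScores of all epitopes in the cluster."""
    return sum(escore(c) for c in members)


seed = EpitopeCounts(references=2, positive_assays=3, total_assays=5)
member = EpitopeCounts(references=1, positive_assays=1, total_assays=2)
print(escore(seed))            # 235
print(escore(member))          # 112
print(cscore([seed, member]))  # 347
```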
Cluster Results (expanded)
Upon clicking on the expand/collapse control (1), the alignment of each epitope within the cluster to the seed epitope is displayed (2).
Clustering by sequence identity is only possible for linear, peptidic epitopes that are at least 5 residues in length. Discontinuous and non-peptidic epitopes will not appear in the result set.