The Population Coverage Tool can be found at http://tools.immuneepitope.org/main/html/analysis_tools.html.
T cells recognize a complex between a specific MHC type and a particular pathogen-derived epitope; a given epitope will therefore elicit a response only in individuals who express an MHC molecule capable of binding it. MHC molecules are extremely polymorphic (over a thousand different variants are known in humans). Selecting multiple peptides with different MHC binding specificities therefore affords increased coverage of the patient population targeted as vaccine recipients. Population coverage is further complicated by the fact that different MHC types are expressed at dramatically different frequencies in different ethnicities. Thus, without careful consideration, a vaccine with ethnically biased population coverage could result. To address this issue, the measured or predicted binding capacity of potential epitopes to as many different MHC molecules as possible (and, when available, restriction data for T cell responses recognizing the epitope) can be used to project the population coverage of different vaccine candidates or epitope sets in different ethnicities. Accordingly, epitope-based vaccines or diagnostics can be designed to maximize population coverage while minimizing complexity (that is, the number of different epitopes included in the diagnostic or vaccine) and minimizing the variability of coverage obtained or projected in different ethnic groups.
An important consideration in epitope selection is that the population coverage afforded by a given set of epitopes is not simply the sum of the coverage of its individual components. To calculate the coverage afforded by a given mixture of epitopes, a dedicated algorithm has been developed (Bui et al. BMC Bioinformatics 2006). This method calculates the fraction of individuals predicted to respond to a given epitope set on the basis of HLA genotypic frequencies, assuming no linkage disequilibrium between HLA loci, and on the basis of MHC binding and/or T cell restriction data. The algorithm is briefly explained here. First, the genotypic frequencies of the various MHC types are tabulated. Each time a peptide binds to a given MHC, a “hit” is recorded for that MHC. The process is repeated for all peptides, and the hits for each MHC are then tallied. Next, the frequency of each possible diploid MHC combination (phenotype) is calculated. For n MHC types, this corresponds to an n x n table of the frequency at which each specific pair of MHCs is found in the population from which the MHC frequencies are derived. A similar table holds the number of hits for each MHC combination, obtained by adding the hits associated with each of the two MHC alleles in the combination (the exception is a homozygous combination, for which the number of hits is simply that of the given MHC). From these two tables, a frequency distribution is assembled that tabulates the genotypic frequency of all MHC combinations associated with a given number of hits. The result of the analysis is displayed as a frequency distribution histogram and a cumulative frequency plot.
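The tabulation described above can be illustrated with a minimal Python sketch for a single locus. It is not the tool's own code. The function names, the allele labels and the example frequencies are hypothetical. Two assumptions are made that the text does not state: the frequency of each pair is taken as the product of the two allele frequencies, and any frequency mass not covered by the listed alleles is pooled into one placeholder allele with zero hits.

```python
from collections import defaultdict
from itertools import product

UNKNOWN = "<unknown>"  # hypothetical label for frequency mass not listed


def hit_distribution(allele_freqs, allele_hits):
    """Return {number_of_hits: fraction_of_population} for one locus."""
    freqs = dict(allele_freqs)
    hits = {allele: allele_hits.get(allele, 0) for allele in freqs}

    # Assumption: unlisted alleles bind none of the peptides.
    residual = 1.0 - sum(freqs.values())
    if residual < -1e-9:
        raise ValueError("allele frequencies sum to more than 1")
    if residual > 1e-12:
        freqs[UNKNOWN] = residual
        hits[UNKNOWN] = 0

    dist = defaultdict(float)
    # n x n table: every ordered pair of alleles at the locus.
    for a, b in product(freqs, repeat=2):
        pair_freq = freqs[a] * freqs[b]  # assumed: product of frequencies
        # Homozygous pairs count the hits of the single MHC only once.
        pair_hits = hits[a] if a == b else hits[a] + hits[b]
        dist[pair_hits] += pair_freq
    return dict(sorted(dist.items()))


def coverage(dist):
    """Fraction of the population with at least one hit."""
    return sum(freq for n_hits, freq in dist.items() if n_hits >= 1)


def cumulative(dist):
    """Return {k: fraction of the population with at least k hits}."""
    out, running = {}, 0.0
    for k in sorted(dist, reverse=True):
        running += dist[k]
        out[k] = running
    return dict(sorted(out.items()))


# Made-up illustrative numbers, not real HLA frequencies.
example_freqs = {"allele_1": 0.2, "allele_2": 0.3}
example_hits = {"allele_1": 2, "allele_2": 1}
example_dist = hit_distribution(example_freqs, example_hits)
print(example_dist)
print(coverage(example_dist))
print(cumulative(example_dist))
```

With these toy inputs, a quarter of the population carries two non-binding alleles, so the projected coverage is 0.75 (up to floating-point rounding). The two alleles cover 20% and 30% of chromosomes, but coverage is counted per individual, and each individual carries two alleles, so the set coverage differs from the sum of those two figures.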
We have derived HLA allele genotypic frequencies from the dbMHC database (http://www.ncbi.nlm.nih.gov/mhc/) and stored them in a database on the IEDB tool server. At present, dbMHC provides allele frequencies for 78 populations and 11 different geographical areas. It is envisioned that the compiled data will be updated regularly as further HLA frequency data become available. Users interested in a specific patient population can also supply customized frequency data for the calculation. Coverage for multiple populations can be calculated simultaneously, in which case an average population coverage is also generated. Since MHC class I and class II restricted epitopes elicit immune responses from two different T cell populations (CTL and Th cells, respectively), the program provides three coverage calculation modes: (1) class I separate, (2) class II separate, and (3) class I and class II combined.
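The three calculation modes and the averaging over populations can be sketched in Python under stated assumptions. This is one plausible reading, not the tool's implementation. Loci are treated as independent, so the fraction of individuals with no hit at any locus is the product of the per-locus fractions. The average is assumed to be an unweighted mean. All population names, locus names, allele labels and frequencies below are made up for illustration.

```python
from statistics import mean


def locus_uncovered(allele_freqs, covered_alleles):
    """Fraction of individuals carrying no covered allele at one locus."""
    covered = sum(f for a, f in allele_freqs.items() if a in covered_alleles)
    # Both chromosomes must carry a non-covered allele.
    return (1.0 - covered) ** 2


def population_coverage(loci_freqs, covered_alleles, loci):
    """Fraction of individuals with at least one covered allele."""
    uncovered = 1.0
    for locus in loci:
        # Assumption: loci are independent (no linkage disequilibrium).
        uncovered *= locus_uncovered(loci_freqs[locus], covered_alleles)
    return 1.0 - uncovered


def summarize(populations, covered_alleles, modes):
    """Per-population coverage and unweighted average for each mode."""
    report = {}
    for mode, loci in modes.items():
        per_pop = {
            name: population_coverage(freqs, covered_alleles, loci)
            for name, freqs in populations.items()
        }
        report[mode] = {
            "per_population": per_pop,
            "average": mean(per_pop.values()),
        }
    return report


# Hypothetical custom frequency data: one class I and one class II locus.
populations = {
    "population_1": {
        "class_I_locus": {"I-a": 0.30, "I-b": 0.10},
        "class_II_locus": {"II-a": 0.20},
    },
    "population_2": {
        "class_I_locus": {"I-a": 0.10, "I-b": 0.40},
        "class_II_locus": {"II-a": 0.50},
    },
}
modes = {
    "class I": ["class_I_locus"],
    "class II": ["class_II_locus"],
    "combined": ["class_I_locus", "class_II_locus"],
}
# Alleles bound by at least one epitope in the (hypothetical) set.
covered_alleles = {"I-a", "II-a"}

report = summarize(populations, covered_alleles, modes)
for mode, result in report.items():
    print(mode, result["per_population"], result["average"])
```

In this toy data the epitope set covers the two populations very differently in class I mode and more evenly in combined mode, which is the kind of variability between populations that the tool is meant to expose.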