Introduction to the IEDB Query API

Welcome to the new IEDB Query API (IQ-API)! You can now query the IEDB programmatically through a range of endpoints, complete most of the queries available from the IEDB home page, and work with the data directly in your preferred environment. We hope this help article provides useful context on the IQ-API. Please note that this is a beta version, and we will continue to improve its features and usability. The help material will likewise be refined as we gain insight from our users. Please send your feedback by email to help@iedb.org.

What is the IQ-API?

The IQ-API is built upon PostgREST, which provides transparent access to the PostgreSQL tables on the backend. Each table can be queried through its own endpoint; the endpoints are described in the interactive Swagger documentation.

What endpoints are available?

Core search endpoints

While there are many endpoints provided, we expect the majority of users will want to search against one or more of the following tables, which correspond to the tabbed search results on the IEDB:

epitope_search
antigen_search
tcell_search (assays)
bcell_search (assays)
mhc_search (assays)
tcr_search (receptors)
bcr_search (receptors)
reference_search

Export endpoints

In addition to the search endpoints, we provide export endpoints whose structure and naming conventions match the custom exports on the IEDB website exactly. Each export endpoint shares its table prefix with the corresponding search endpoint. There is no antigen export endpoint, since it would match the existing antigen_search table almost exactly:

epitope_export
tcell_export (assays)
bcell_export (assays)
mhc_export (assays)
tcr_export (receptors)
bcr_export (receptors)
reference_export

To incorporate these into your searches most efficiently, we recommend first searching in the relevant search table and then using resource embedding to retrieve the needed data from its associated export table.
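
As a rough sketch of this recommendation, the helper below builds a URL that filters a search table and embeds its export table using PostgREST's 'select=column,related_table(columns)' embedding syntax. The helper name 'iq_embed_url' is hypothetical, and the sketch assumes that the API exposes a relationship between 'epitope_search' and 'epitope_export'; please check the Swagger documentation before relying on it.

```shell
#!/bin/sh
# Sketch only: build a resource-embedding URL for the IQ-API.
# iq_embed_url is a hypothetical helper, not part of the IQ-API itself.
# Assumption: epitope_search and epitope_export are related so that
# PostgREST can embed one in the other.
IQ_API_BASE="https://query-api.iedb.org"

# Usage: iq_embed_url SEARCH_TABLE EXPORT_TABLE FILTER LIMIT
iq_embed_url() {
  # PostgREST embeds a related table with select=col,related_table(cols)
  printf '%s/%s?%s&select=structure_id,%s(*)&limit=%s\n' \
    "$IQ_API_BASE" "$1" "$3" "$2" "$4"
}

# Print the URL for one epitope sequence taken from the examples in this article
iq_embed_url epitope_search epitope_export "linear_sequence=eq.CYENDNPGL" 5
```

The printed URL can then be passed to curl, for example: curl "$(iq_embed_url epitope_search epitope_export "linear_sequence=eq.CYENDNPGL" 5)" | jq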

Supporting endpoints

Additional supporting endpoints are available that map identifiers between the various tables. These endpoints have names like ‘TABLEX_to_TABLEY’. For instance, the ‘bcell_to_reference’ table maps records between the ‘bcell_search’ and ‘reference_search’ records to link related information. Each of these tables has exactly two columns and maps the unique identifiers (fields with a suffix of ‘_id’) between the tables.
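
To illustrate how a mapping table might be used, the sketch below builds a query that looks up the rows of 'bcell_to_reference' for a single reference. The helper name 'iq_map_url' is hypothetical, and the column name 'reference_id' is an assumption based on the '_id' naming rule described above; the actual column names should be confirmed in the Swagger documentation.

```shell
#!/bin/sh
# Sketch only: query a TABLEX_to_TABLEY mapping endpoint by one identifier.
# iq_map_url is a hypothetical helper; the column name passed to it
# (reference_id) is an assumption, not confirmed by this article.
IQ_API_BASE="https://query-api.iedb.org"

# Usage: iq_map_url MAPPING_TABLE ID_COLUMN ID_VALUE
iq_map_url() {
  # PostgREST equality filter: column=eq.value
  printf '%s/%s?%s=eq.%s\n' "$IQ_API_BASE" "$1" "$2" "$3"
}

# Print the URL that lists B cell records linked to reference 1002786
iq_map_url bcell_to_reference reference_id 1002786
```

The identifiers returned by such a query can then be used as filter values against 'bcell_search'.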

An additional endpoint, ‘curie_map’, is available that links CURIE prefixes (e.g., PMID) to their full IRIs (e.g., https://www.ncbi.nlm.nih.gov/pubmed/?term=). Further details on CURIEs and IRIs can be found below.

Finally, the ‘api_metrics’ endpoint simply provides record counts and build dates for each of the core endpoints.

How can I query the IQ-API?

As the IQ-API is based upon PostgREST, queries must use its query syntax, which is described in detail in the PostgREST documentation. Detailed API walkthroughs are available as Jupyter and RMarkdown notebooks in our IQ-API use case repository, so please have a look there for more.

The most basic example of querying for the first 10 epitopes is provided here, using the ‘curl’ command.

curl "https://query-api.iedb.org/epitope_search?limit=10" | jq
[
  {
    "structure_id": 7355,
    "structure_iri": "IEDB_EPITOPE:7355",
    "structure_descriptions": [
      "CYDLSCNQTVCQ"
    ],
    "structure_starting_positions": [
      136
    ],
...

Only the first part of the response is shown above, as the full response includes many fields. The pipe to ‘jq’ is optional; it is used here only to format the JSON for display.

Note that, by default, results are returned in JSON format. If CSV format is preferred, an additional 'accept' header needs to be provided in the GET query. The same query above would become:

curl "https://query-api.iedb.org/epitope_search?limit=10" -H "accept: text/csv"

Output is not shown as there are too many fields to display. However, we can limit the output to a subset of fields with the ‘select’ parameter. For example, if we only want the ‘structure_id’ and ‘linear_sequence’ of the first 10 epitopes and we want them returned in CSV format, the query becomes:

curl "https://query-api.iedb.org/epitope_search?limit=10&select=structure_id,linear_sequence" -H "accept: text/csv"
structure_id,linear_sequence
7355,CYDLSCNQTVCQ
7356,CYEDEATSVIPP
7357,CYEIKCKEPVECSGEPVLVK
7358,CYENDNPGL
7359,CYESLSEEY
7360,CYFDCSKSPPGA
7361,CYFEPQIRIL
7362,CYFILIFNI
7363,CYFILIFNII
7364,CYFMVFLQT
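
Beyond 'limit' and 'select', PostgREST supports row filters of the form 'column=operator.value', such as 'eq' for equality and 'in' for membership in a list. The following is a minimal sketch of building such queries; the helper name 'iq_filter_url' is hypothetical, and the filter values are taken from the output above.

```shell
#!/bin/sh
# Sketch only: build IQ-API URLs with a PostgREST row filter.
# iq_filter_url is a hypothetical helper, not part of the IQ-API itself.
IQ_API_BASE="https://query-api.iedb.org"

# Usage: iq_filter_url TABLE COLUMN OPERATOR VALUE [SELECT_COLUMNS]
iq_filter_url() (
  url="$IQ_API_BASE/$1?$2=$3.$4"
  # Append select= only when a column list was supplied
  if [ -n "${5:-}" ]; then
    url="$url&select=$5"
  fi
  printf '%s\n' "$url"
)

# Exact match on one sequence, returning two columns
iq_filter_url epitope_search linear_sequence eq CYENDNPGL structure_id,linear_sequence

# Match any of several identifiers with the 'in' operator
iq_filter_url epitope_search structure_id in "(7355,7356)"
```

Each printed URL can be passed to curl in the same way as the examples above, with or without the 'accept: text/csv' header.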

More examples will be added to this help article in the future.

What are IDs, IRIs, and CURIEs?

Several types of identifiers are used throughout the database to track unique records. First, there are internal integer record identifiers denoted with the suffix ‘_id’, e.g., ‘reference_id’. These are generally in the first field of each table. As they are internal to the IEDB, they cannot be linked directly to other resources.

Many of the tables in the database also have fields that end in ‘_iri’, e.g., ‘reference_iri’. These are identifiers that resolve uniquely and unambiguously to records both within and outside of the IEDB. The Internationalized Resource Identifier (IRI) specification includes Uniform Resource Locators (URLs), which we use as globally unique identifiers, e.g., https://www.iedb.org/reference/1002786.

A shortened version of an IRI, called a CURIE, can be constructed by replacing a portion of the IRI with a common prefix. The above IRI can be represented in CURIE format as ‘IEDB_REFERENCE:1002786’. By querying against the ‘curie_map’ endpoint, it is possible to find prefixes for converting between the two representations, e.g.: https://query-api.iedb.org/curie_map?limit=3
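
The conversion from a CURIE to an IRI is a simple prefix substitution, sketched below. The helper name 'curie_to_iri' is hypothetical, and the pairing of the prefix 'IEDB_REFERENCE' with 'https://www.iedb.org/reference/' is inferred from the example in this section rather than read from 'curie_map'; in practice the prefix and its IRI should be looked up from that endpoint.

```shell
#!/bin/sh
# Sketch only: expand a CURIE into a full IRI by prefix substitution.
# curie_to_iri is a hypothetical helper. The prefix/IRI pair used below is
# inferred from this article's example, not fetched from curie_map.

# Usage: curie_to_iri CURIE PREFIX IRI_BASE
curie_to_iri() {
  case "$1" in
    "$2":*)
      # Strip "PREFIX:" and append the remainder to the IRI base
      printf '%s%s\n' "$3" "${1#"$2":}"
      ;;
    *)
      echo "curie_to_iri: '$1' does not use prefix '$2'" >&2
      return 1
      ;;
  esac
}

curie_to_iri "IEDB_REFERENCE:1002786" "IEDB_REFERENCE" "https://www.iedb.org/reference/"
# prints https://www.iedb.org/reference/1002786
```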

Troubleshooting and FAQs

Common issues and idiosyncrasies

IRIs for Antigens

Users may notice that for the ‘antigen_id’ field in the 'antigen_search' table, an IRI is used, rather than an integer ID. This is because we currently do not have numeric IDs for antigens. We are working to improve how we handle IRIs and CURIEs in the IEDB, and this may change in the future.

Curated vs Parent Terms

Some field names contain the term "curated", such as curated_source_antigens, while similar fields use the term "parent", such as parent_source_antigen_names.

"Curated" refers to the precise source protein isoform that an author reported as the source of a peptide epitope in a specific publication. The curated_source_antigen is a 100% BLAST match to the epitope sequence. "Parent" refers to the reference proteome protein that represents all of the protein isoforms to which epitopes with the same sequence have been assigned across many different publications. The parent term is used to group all isoforms and will not necessarily be a 100% BLAST match to the epitope sequence.

Similarly, the source_organism_name (shown nested under curated_source_antigens) reflects the precise source organism strain that an author reported as the source of an epitope in a specific publication, while the parent_source_antigen_source_org_name is the species-level organism name that groups all strains that have been associated with the same epitope sequence across all publications. This help desk article goes into further detail: https://help.iedb.org/hc/en-us/articles/114094147251.

Results Page Limit & Default Page Size

By default, the IQ-API has a maximum page size of 10,000 records. In practice, this means that queries that result in more than 10,000 results will be divided into pages and only the first 10,000 records will be returned by the initial query.

NOTE: If a query requires paging, it is critical to also provide the 'order' parameter to determine how the rows are sorted. If it is not provided, the row order is not guaranteed, and pages may be inconsistent between queries.

The API reports the range of records returned in the ‘content-range’ response header and, when requested, the total number of records matching the query, from which the number of pages can be derived, e.g.:

curl -I "https://query-api.iedb.org/antigen_search"
HTTP/2 200 
server: nginx/1.19.2
date: Fri, 11 Jun 2021 18:19:28 GMT
content-type: application/json; charset=utf-8
vary: Accept-Encoding
content-range: 0-9999/*
content-location: /antigen_search
strict-transport-security: max-age=15724800; includeSubDomains

Above we can see that the server returned the first 10,000 records (indexed as 0-9999). The trailing ‘/*’ indicates that the total number of matching records has not been calculated. To get an exact count of matching records, provide the header 'Prefer: count=exact' in your request:

curl -I "https://query-api.iedb.org/antigen_search" -H 'Prefer: count=exact'
HTTP/2 206 
server: nginx/1.19.2
date: Fri, 11 Jun 2021 18:23:42 GMT
content-type: application/json; charset=utf-8
content-range: 0-9999/73254
content-location: /antigen_search
strict-transport-security: max-age=15724800; includeSubDomains

Now we can see that there are 73,254 matching records, which corresponds to 8 pages of results at 10,000 records per page. Note that adding this header can be detrimental to query performance. To retrieve records from the end of the result set, add the ‘offset’ parameter to the query; the example below skips the first 73,000 records and returns the remaining 254:

curl -I "https://query-api.iedb.org/antigen_search?offset=73000&order=parent_source_antigen_id" -H 'Prefer: count=exact'
HTTP/2 206 
server: nginx/1.19.2
date: Fri, 11 Jun 2021 18:28:47 GMT
content-type: application/json; charset=utf-8
content-range: 73000-73253/73254
content-location: /antigen_search?offset=73000&order=parent_source_antigen_id
strict-transport-security: max-age=15724800; includeSubDomains

If an offset is defined that is higher than the number of matching records, an empty result will be returned.
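
The paging arithmetic described above can be sketched as two small helpers: one that extracts the total from a 'content-range' value, and one that lists the 'offset' for each page. Both helper names are hypothetical, and the page size of 10,000 is the default stated in this section.

```shell
#!/bin/sh
# Sketch only: derive page offsets from a content-range header value.
# content_range_total and page_offsets are hypothetical helper names.

# Usage: content_range_total "0-9999/73254"   (prints 73254)
content_range_total() {
  total="${1##*/}"   # keep everything after the last '/'
  case "$total" in
    ''|*[!0-9]*)
      # A value such as '*' means the total was not calculated
      echo "total not reported; send 'Prefer: count=exact'" >&2
      return 1
      ;;
    *)
      printf '%s\n' "$total"
      ;;
  esac
}

# Usage: page_offsets TOTAL PAGE_SIZE   (prints one offset per page)
page_offsets() (
  offset=0
  while [ "$offset" -lt "$1" ]; do
    printf '%s\n' "$offset"
    offset=$((offset + $2))
  done
)

# For the example above: 8 offsets, from 0 to 70000
page_offsets "$(content_range_total "0-9999/73254")" 10000
```

Each offset can then be used in a request that keeps the same 'order' parameter on every page, e.g. "https://query-api.iedb.org/antigen_search?order=parent_source_antigen_id&offset=70000".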

Error messages

Large Query Error - “Cannot enlarge string buffer containing...out of memory”

This message results from a query returning a very large amount of data. Many of the tables in the database contain information-dense fields, which can cause buffering issues on the PostgreSQL backend. If you receive this message, the recommended workflow is to:

Try adding a 'limit' parameter to your query to fetch the first X (e.g., 100) records
Simultaneously, add the request header 'Prefer: count=exact' so you are aware of the total number of records matching the query
Update the query to filter rows (on field values) or drop unnecessary columns (using ‘select’), and/or continue to page through the results by adding an 'offset' parameter
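
The workflow above might look like the following sketch, which combines 'select', 'order', 'limit' and 'offset' in one URL and sends the 'Prefer: count=exact' header. The helper names 'iq_page_url' and 'iq_fetch_page' are hypothetical, and the column choices are only an example based on the 'epitope_search' fields shown earlier in this article.

```shell
#!/bin/sh
# Sketch only: fetch a large result set in small, column-limited pages.
# iq_page_url and iq_fetch_page are hypothetical helper names.
IQ_API_BASE="https://query-api.iedb.org"

# Usage: iq_page_url TABLE COLUMNS ORDER_COLUMN LIMIT OFFSET
iq_page_url() {
  printf '%s/%s?select=%s&order=%s&limit=%s&offset=%s\n' \
    "$IQ_API_BASE" "$1" "$2" "$3" "$4" "$5"
}

# Fetch one page. 'curl -D -' writes the response headers (including
# content-range) to standard output ahead of the body.
# Defined here but not called, so this sketch makes no network requests.
iq_fetch_page() {
  curl -sS -D - -H 'Prefer: count=exact' "$(iq_page_url "$@")"
}

# First page of 100 records, two columns only
iq_page_url epitope_search structure_id,linear_sequence structure_id 100 0
# Second page: same query with offset 100
iq_page_url epitope_search structure_id,linear_sequence structure_id 100 100
```

Calling iq_fetch_page with the same arguments performs the request; the total in the returned 'content-range' header shows how many further pages remain.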

FAQs

Can I save API links from the IEDB?

Not yet - this feature is currently in development. You will soon be able to retrieve the relevant API links on the ‘Results’ page using the ‘Export’ function.

Is there a mailing list I can join to receive updates on the IQ-API?

Yes, please email help@iedb.org to be added to the current mailing list.
