IEDB Query API (IQ-API) - Use Case 1A

Goal: Search for information related to a specific linear epitope, using 'SIINFEKL' as an example.

This document illustrates some basic usage of the IEDB query API. It is by no means meant to be comprehensive or authoritative, as many tasks can be accomplished in multiple different ways. Here we focus on simple queries of individual tables. For more information on the expressive syntax of PostgREST, refer to this document. For more details on the tables that are part of the API, refer to the Swagger documentation.

Some of the queries in this example may take a while to run, so please be patient. Also, please keep in mind that the database is rebuilt weekly, so results may change from one run of this document to the next. Along those lines, note that this is an early beta, so some of the table or column names may change prior to the production release.

With all that in mind, let's have some fun!

First, let's import required modules, set some globals, and define a function to print the corresponding curl command for each request. I've tried to include that curl command for each example so that you can copy/paste it into your terminal. You may want to pipe the output to a tool like 'jq' to have it render neatly.
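The setup described above might look roughly like the sketch below. It uses only the standard library so that it runs without extra packages. The base URL, the 'epitope_search' table name, and the helper names 'build_url' and 'curl_command' are assumptions made for illustration; check them against the Swagger documentation.

```python
import shlex
from urllib.parse import urlencode

# Assumption: public base URL of the IEDB query API.
BASE_URI = "https://query-api.iedb.org"


def build_url(table_name, params):
    """Return the URL-escaped request URL for a table and its query parameters."""
    query = urlencode(params)
    url = f"{BASE_URI}/{table_name}"
    return f"{url}?{query}" if query else url


def curl_command(table_name, params):
    """Return a copy/paste-ready curl command for the same request."""
    # shlex.quote wraps the URL in single quotes so the shell leaves '?' and '&' alone.
    return "curl -X GET " + shlex.quote(build_url(table_name, params))


# 'epitope_search' is an assumed table name.
print(curl_command("epitope_search", {"linear_sequence": "eq.SIINFEKL"}))
```

Running the printed command in a terminal and piping it to 'jq' should give a readable rendering of the JSON response.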

This may or may not have resulted in a warning about lzma compression. That can be safely ignored...

Search for all epitopes with 'SIINFEKL' as the linear sequence. We use the PostgREST 'eq' operator to denote equality.
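As a minimal sketch of that query, PostgREST filters take the form 'column=operator.value', so the equality search becomes 'linear_sequence=eq.SIINFEKL'. The base URL, the 'epitope_search' table name, and the 'fetch_table' helper are assumptions for illustration. The network call is left commented out so the block can be run offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public base URL of the IEDB query API.
BASE_URI = "https://query-api.iedb.org"


def fetch_table(table_name, params, timeout=60):
    """GET a table with PostgREST-style filters and return the decoded JSON records."""
    url = f"{BASE_URI}/{table_name}?{urlencode(params)}"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# PostgREST filter syntax: <column>=<operator>.<value>
search_params = {"linear_sequence": "eq.SIINFEKL"}
print(urlencode(search_params))

# Uncomment to hit the live API ('epitope_search' is an assumed table name):
# records = fetch_table("epitope_search", search_params)
# print(json.dumps(records[0], indent=2))
```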

OK, we have the results, so let's have a look. Note: We only print the first record that is returned here since the output can be long and confusing. You'll see...

OK that's hard to parse, let's have a look at a table representation instead.

That matches our search on the IEDB website, where there are 3 epitope records returned.

What if we don't need all of the columns that are returned? Maybe we only want the structure IDs and a few other fields. We can accomplish that by passing the 'select' parameter with a list of the fields we want to retrieve.
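One way to build that parameter is to join the wanted field names with commas and add the result under the 'select' key. The helper name 'with_select' is hypothetical, and the field names shown are assumed column names.

```python
def with_select(params, fields):
    """Return a copy of params with a PostgREST 'select' list added."""
    # PostgREST expects the selected columns as one comma-separated string.
    return {**params, "select": ",".join(fields)}


search_params = {"linear_sequence": "eq.SIINFEKL"}
# Assumed column names; a misspelled name makes the API return an error.
selected = with_select(search_params, ["structure_id", "linear_sequence"])
print(selected)
```

Returning a new dictionary leaves the original search parameters untouched, so the same filter can be reused with different column lists.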

Oops, we made a spelling error. Look at the helpful error message! Let's try again....

Awesome. Note the additional complexity in the URL of the last two queries. There are two parameters (linear_sequence and select), multiple values for the latter parameter, and many URL escape codes for the commas. Python's 'requests' module handles all of this for you, but be aware that every portion of the query needs to be URL-escaped.
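To see what that escaping does, here is a small standard-library sketch. It shows the comma in a 'select' list becoming '%2C' and confirms that unescaping restores the original value. The field names are assumed column names.

```python
from urllib.parse import quote, unquote, urlencode

select_value = "structure_id,linear_sequence"

# Escape every reserved character, including the comma.
escaped = quote(select_value, safe="")
print(escaped)  # structure_id%2Clinear_sequence

# Unescaping gives back the original string.
assert unquote(escaped) == select_value

# urlencode escapes each value and joins the parameters with '&'.
query = urlencode({"linear_sequence": "eq.SIINFEKL", "select": select_value})
print(query)  # linear_sequence=eq.SIINFEKL&select=structure_id%2Clinear_sequence
```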

What if we want to search for multiple sequences? Then we'll need to use the PostgREST 'in' operator in our search term. E.g., here we search for two different sequences.
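A sketch of how the 'in' filter could be assembled: PostgREST writes it as 'in.(value1,value2)'. The second peptide below is an arbitrary example chosen for illustration, and the helper name 'in_filter' is hypothetical.

```python
from urllib.parse import urlencode


def in_filter(values):
    """Format a list of values as a PostgREST 'in' filter, e.g. in.(A,B)."""
    return "in.(" + ",".join(values) + ")"


# 'GILGFVFTL' is an arbitrary second sequence used only for illustration.
search_params = {"linear_sequence": in_filter(["SIINFEKL", "GILGFVFTL"])}
print(search_params["linear_sequence"])  # in.(SIINFEKL,GILGFVFTL)

# Parentheses and commas are escaped when the URL is built.
print(urlencode(search_params))
```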

Cool. And since we've pulled everything into a pandas dataframe, we can opt to do additional filtering here. For instance, if we only want the epitopes that have associated B cell assays:
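A rough sketch of that kind of filter follows. The rows are made-up stand-ins rather than real IEDB records, and the 'bcell_ids' column name is an assumption about how the table links epitopes to B cell assays.

```python
import pandas as pd

# Made-up stand-in rows, NOT real IEDB data; 'bcell_ids' is an assumed column name.
records = [
    {"structure_id": 1, "linear_sequence": "SIINFEKL", "bcell_ids": [101, 102]},
    {"structure_id": 2, "linear_sequence": "SIINFEKL", "bcell_ids": None},
    {"structure_id": 3, "linear_sequence": "GILGFVFTL", "bcell_ids": None},
]
df = pd.DataFrame(records)

# Keep only rows where the B cell assay column is populated.
with_bcell = df[df["bcell_ids"].notna()]
print(with_bcell)
```

Filtering client-side like this is convenient for exploration. For large result sets it is usually cheaper to push the condition into the API query instead.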

Search for all antigens that are a parent protein of 'SIINFEKL'. Since the 'linear_sequences' field is an array of linear sequences associated with the antigen, we must use the PostgreSQL 'contains' operator, expressed as 'cs' in PostgREST.
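As a minimal sketch, a 'cs' filter on an array column takes the value as a PostgreSQL array literal in curly braces, giving 'linear_sequences=cs.{SIINFEKL}'. The base URL and the 'antigen_search' table name are assumptions, and 'contains_filter' is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: public base URL of the IEDB query API.
BASE_URI = "https://query-api.iedb.org"


def contains_filter(values):
    """Format values as a PostgREST 'cs' (contains) filter on an array column."""
    return "cs.{" + ",".join(values) + "}"


search_params = {"linear_sequences": contains_filter(["SIINFEKL"])}
print(search_params["linear_sequences"])  # cs.{SIINFEKL}

# 'antigen_search' is an assumed table name; the braces are escaped in the URL.
url = f"{BASE_URI}/antigen_search?{urlencode(search_params)}"
print(url)
```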

Search for all T cell assays that test the linear sequence 'SIINFEKL'. Similar to the 'epitope' search, we use the 'eq' operator.
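Because the filter is the same as in the epitope search, only the table in the URL changes. The sketch below makes that explicit. The base URL and both table names ('epitope_search', 'tcell_search') are assumptions to verify against the Swagger documentation, and 'eq_query_url' is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: public base URL of the IEDB query API.
BASE_URI = "https://query-api.iedb.org"


def eq_query_url(table_name, column, value):
    """Build the URL for an equality filter on one column of one table."""
    return f"{BASE_URI}/{table_name}?{urlencode({column: f'eq.{value}'})}"


# Same filter, different (assumed) table names.
for table in ("epitope_search", "tcell_search"):
    print(eq_query_url(table, "linear_sequence", "SIINFEKL"))
```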