
ImmuneWatch DETECT

Accurately annotate the epitope specificity of your T-cell receptors



Introduction to ImmuneWatch DETECT

Install

ImmuneWatch DETECT is compatible with Linux and macOS operating systems. Docker must be installed on your system as a prerequisite. For Docker installation guidelines, refer to the official Docker documentation.

Once Docker is set up, install the tool with the following command:

docker pull public.ecr.aws/q5z5g5i0/detect:latest && wget https://imw-public.s3.eu-central-1.amazonaws.com/detect/imw_detect && chmod +x imw_detect

Run

To run ImmuneWatch DETECT, execute the minimal command provided below from your command line. This process involves taking your TCR repertoire data, retraining the algorithm using the specified database, and generating predictions.

Note that a valid license is required for operation. To obtain a license, please request access through our website.

./imw_detect \
-i repertoire.tsv \
-o predictions.tsv \
-d imwdb \
-l license
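
When the same analysis has to be repeated over many repertoire files, it can help to assemble the command from a script. The sketch below is a minimal Python illustration that uses only the four arguments shown above. The helper names build_detect_command and run_detect, and their default values, are assumptions made for this example and are not part of ImmuneWatch DETECT.

```python
import shlex
import subprocess


def build_detect_command(input_file, output_file, database="imwdb",
                         license_file="license", executable="./imw_detect"):
    """Assemble the minimal command as an argument list (hypothetical helper)."""
    return [
        executable,
        "-i", input_file,
        "-o", output_file,
        "-d", database,
        "-l", license_file,
    ]


def run_detect(command):
    """Run the command and raise CalledProcessError if it exits with an error."""
    subprocess.run(command, check=True)


command = build_detect_command("repertoire.tsv", "predictions.tsv")
print(shlex.join(command))  # shows the exact command line that would be run
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces. Call run_detect(command) to actually start the prediction.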

Arguments

-i, --inputfile

ImmuneWatch DETECT supports multiple TCR repertoire file formats, detailed in the table below. Ensure files contain a header row with at least the required columns. All common delimited formats, including CSV and TSV, are accepted.

Paired (single-cell) data is supported by including one of the following columns: cell_id, clone_id or barcode. Values in these columns cannot be empty. Single-cell and bulk data can also be annotated within the same file.

Supported File Formats

File Format                    Required Columns
AIRR                           junction_aa, v_call, j_call
Adaptive Biotech ImmunoSEQ     aminoAcid, vGeneName, jGeneName
Adaptive Biotech ImmunoSEQ v4  amino_acid, v_resolved, j_resolved
MiXCR                          aaSeqCDR3, allVHitsWithScore, allJHitsWithScore
Qiagen                         CDR3 amino acid seq, V-region, J-region, chain
10x Genomics                   cdr3, v_gene, j_gene, is_cell, barcode

Example AIRR

junction_aa    v_call     j_call
CASSIRSSYEQYF  TRBV19*02  TRBJ2-7*01
CARNTGNQFYF    TRAV24*01  TRAJ49*01

Example AIRR Single Cell

clone_id  junction_aa    v_call     j_call
1         CASSIRSSYEQYF  TRBV19*02  TRBJ2-7*01
1         CAGDDQGGKLIF   TRAV27*01  TRAJ23*01
2         CARNTGNQFYF    TRAV24*01  TRAJ49*01
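
Before starting a run, it can be useful to check that an input file meets the requirements described above. The following is a minimal sketch for the AIRR format only. It assumes the delimiter can be guessed from the header row, and the function names are hypothetical. It is not how ImmuneWatch DETECT itself validates input.

```python
import csv
import io

AIRR_REQUIRED = ("junction_aa", "v_call", "j_call")
PAIRING_COLUMNS = ("cell_id", "clone_id", "barcode")


def detect_delimiter(header_line):
    """Pick the candidate delimiter that occurs most often in the header."""
    return max(("\t", ",", ";"), key=header_line.count)


def check_airr_repertoire(text):
    """Return a list of problems; an empty list means none were found."""
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]
    reader = csv.DictReader(io.StringIO(text),
                            delimiter=detect_delimiter(lines[0]))
    header = reader.fieldnames or []
    problems = [f"missing required column: {name}"
                for name in AIRR_REQUIRED if name not in header]
    # Pairing columns are optional, but when present they may not be empty.
    pairing = [name for name in PAIRING_COLUMNS if name in header]
    for line_number, row in enumerate(reader, start=2):
        for name in pairing:
            if not (row.get(name) or "").strip():
                problems.append(f"line {line_number}: empty value in {name}")
    return problems


example = (
    "clone_id\tjunction_aa\tv_call\tj_call\n"
    "1\tCASSIRSSYEQYF\tTRBV19*02\tTRBJ2-7*01\n"
    "1\tCAGDDQGGKLIF\tTRAV27*01\tTRAJ23*01\n"
    "2\tCARNTGNQFYF\tTRAV24*01\tTRAJ49*01\n"
)
print(check_airr_repertoire(example))  # prints []
```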

-o, --outputfile

The output of ImmuneWatch DETECT is provided in TSV format, and contains the AIRR standard columns junction_aa, v_call, and j_call. Below is a small example of what the output file may look like, including the header and a few sample rows.

ImmuneWatch DETECT will add two extra columns:

  • Epitope: The ImmuneWatch DETECT algorithm annotates the epitope specificity of TCRs based on the likeliest epitope from the provided database to be recognized by the TCR. If no corresponding epitope is identified, the value will be 'None'.
  • Score: The binding score, ranging from 0 (no binding) to 1 (highest binding), which indicates the likelihood of the TCR recognizing the annotated epitope. For most purposes, when using the IMWdb as the database, we recommend a score cut-off of 0.2 for determining reliable predictions. This recommendation also applies when using the --epitope argument. Detailed information on scoring can be found on the scoring page.

junction_aa    v_call     j_call      Epitope    Score
CASSIRSSYEQYF  TRBV19*02  TRBJ2-7*01  GILGFVFTL  0.3987
CARNTGNQFYF    TRAV24*01  TRAJ49*01   NLVPMVATV  0.2836
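
The recommended cut-off can be applied to the output file with a few lines of Python. The sketch below assumes a tab-separated output with the Epitope and Score columns described above, and it treats the cut-off as inclusive (score >= 0.2), which is an assumption. The function name is hypothetical.

```python
import csv
import io


def reliable_predictions(tsv_text, cutoff=0.2):
    """Keep rows with an annotated epitope and a score at or above the cut-off."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if row["Epitope"] != "None" and float(row["Score"]) >= cutoff
    ]


# The first two rows come from the example output; the third is invented
# purely to show a prediction that falls below the cut-off.
example = (
    "junction_aa\tv_call\tj_call\tEpitope\tScore\n"
    "CASSIRSSYEQYF\tTRBV19*02\tTRBJ2-7*01\tGILGFVFTL\t0.3987\n"
    "CARNTGNQFYF\tTRAV24*01\tTRAJ49*01\tNLVPMVATV\t0.2836\n"
    "CASSLGTDTQYF\tTRBV7-9*01\tTRBJ2-3*01\tGILGFVFTL\t0.1200\n"
)
for row in reliable_predictions(example):
    print(row["junction_aa"], row["Epitope"], row["Score"])
```

For a real run, read the text with open("predictions.tsv", newline="").read() and pass it to the same function.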

Explainability

When using the IMWdb database, the output file will include an additional column: Reference TCRs. These are TCRs that support the given epitope annotation, and come together with the DOI of the publication where the TCR-Epitope pair was reported.

junction_aa    v_call     j_call      Epitope    Score   Reference TCRs
CASSIRSSYEQYF  TRBV19*02  TRBJ2-7*01  GILGFVFTL  0.3987  [('CASSSRSSYEQYF', '10.1073/pnas.1603106113')]
CARNTGNQFYF    TRAV24*01  TRAJ49*01   NLVPMVATV  0.2836  [('CAFNTGNQFYF', '10.4049/jimmunol.1303147'), ('CASNTGNQFYF', '10.1016/j.celrep.2017.03.072')]
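
In the example above, each Reference TCRs cell looks like a Python list of (sequence, DOI) tuples. Assuming the column is always serialized this way, which the example suggests but does not guarantee, it can be parsed with ast.literal_eval from the standard library. The function name below is hypothetical.

```python
import ast


def parse_reference_tcrs(cell):
    """Turn a Reference TCRs cell into a list of {junction_aa, doi} dictionaries."""
    if not cell or not cell.strip():
        return []
    pairs = ast.literal_eval(cell)  # evaluates literals only, never arbitrary code
    return [{"junction_aa": junction, "doi": doi} for junction, doi in pairs]


cell = ("[('CAFNTGNQFYF', '10.4049/jimmunol.1303147'), "
        "('CASNTGNQFYF', '10.1016/j.celrep.2017.03.072')]")
for reference in parse_reference_tcrs(cell):
    print(reference["junction_aa"], "https://doi.org/" + reference["doi"])
```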

Additional Epitope Information: Antigen, Species and HLA

When using the IMWdb or VDJdb databases, the output file will include two additional columns: Antigen and Species.
With the IMWdb database, an extra column is included, reporting the HLA of the predicted target Epitope.

junction_aa    v_call     j_call      Epitope    Score   Antigen  Species     HLA
CASSIRSSYEQYF  TRBV19*02  TRBJ2-7*01  GILGFVFTL  0.3987  M        InfluenzaA  HLA-A*0201
CARNTGNQFYF    TRAV24*01  TRAJ49*01   NLVPMVATV  0.2836  pp65     CMV         HLA-B*27

-d, --database

ImmuneWatch DETECT is designed to work with the IMWdb. However, the core program is sufficiently versatile to start from any TCR-Epitope data, allowing users to leverage their own training datasets to annotate TCR specificity. With other training data, the same level of high-quality predictions cannot be guaranteed. Open an issue if you would like to discuss this further or see your data included.

Supported File Formats

For those opting to use their own TCR-Epitope datasets, ImmuneWatch DETECT currently supports data in the AIRR (Adaptive Immune Receptor Repertoire) format. All common delimited formats, including CSV and TSV, are accepted.

File Format  Required Columns
AIRR         junction_aa, v_call, j_call, epitope

If you do not have your own TCR-Epitope data, we have curated a list of recommended databases that are compatible with ImmuneWatch DETECT. You can follow the download instructions and subsequent database argument available below to directly use these databases to make predictions.

Database: IMWdb
Download: ImmuneWatch's own database. Use this for best performance. No additional download necessary
Argument: -d imwdb

Database: VDJdb
Download: wget https://github.com/antigenomics/vdjdb-db/releases/download/2023-06-01/vdjdb-2023-06-01.zip && unzip vdjdb-2023-06-01.zip -d vdjdb
Argument: -d vdjdb/vdjdb.txt

Note that it is the responsibility of each user to ensure that they comply with the terms and conditions of any external database before downloading and using it. We strongly advise you to review these terms and conditions carefully to ensure full compliance.


--epitope

Utilising the optional argument --epitope shifts ImmuneWatch DETECT's functionality from identifying the most likely binding epitope for your TCRs to calculating the binding score of each TCR against the specific epitopes you provide. This feature allows for targeted analysis, focusing on the interaction between your TCR repertoire and particular epitopes of interest.

You can supply one or more epitopes of interest to the --epitope argument. Doing so changes the output format as follows:

  • For each epitope provided, a corresponding score column is created, such as Score (GILGFVFTL) and Score (NLVPMVATV).
  • With usage of the IMWdb, there is a column with the Reference TCRs for each epitope, such as Reference TCRs (GILGFVFTL) and Reference TCRs (NLVPMVATV).
  • The score columns now range from -1 to 1. A negative score indicates that the predicted target space of the TCR likely does not include the query epitope, with the magnitude of the negative score reflecting the confidence of this prediction.
  • The Epitope column is removed.

--epitope GILGFVFTL

junction_aa      v_call       j_call      Score (GILGFVFTL)
CASSIRSSYEQYF    TRBV19*02    TRBJ2-7*01  0.4901
CAGRLWTDKLIF     TRAV27       TRAJ34      0.1435
CASGPLLLMTNEQFF  TRBV12-4*01  TRBJ2-1*01  -0.7181

--epitope GILGFVFTL NLVPMVATV

junction_aa      v_call       j_call      Score (GILGFVFTL)  Score (NLVPMVATV)
CASSIRSSYEQYF    TRBV19*02    TRBJ2-7*01  0.4901             -0.4901
CAGRLWTDKLIF     TRAV27       TRAJ34      0.1435             0.0064
CASGPLLLMTNEQFF  TRBV12-4*01  TRBJ2-1*01  -0.7181            0.7181
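
Because each query epitope gets its own score column, downstream code has to discover those columns from the header. The sketch below is one possible way to do this in Python. It assumes a tab-separated output with columns named exactly "Score (EPITOPE)" as in the tables above, and it applies the recommended 0.2 cut-off inclusively. The function name is hypothetical.

```python
import csv
import io
import re

SCORE_COLUMN = re.compile(r"^Score \((?P<epitope>[^)]+)\)$")


def epitopes_above_cutoff(tsv_text, cutoff=0.2):
    """Return (junction_aa, {epitope: score}) pairs, keeping scores >= cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Map each query epitope to the name of its score column.
    score_columns = {}
    for name in reader.fieldnames or []:
        match = SCORE_COLUMN.match(name)
        if match:
            score_columns[match.group("epitope")] = name
    results = []
    for row in reader:
        hits = {
            epitope: float(row[column])
            for epitope, column in score_columns.items()
            if float(row[column]) >= cutoff
        }
        results.append((row["junction_aa"], hits))
    return results


example = (
    "junction_aa\tv_call\tj_call\tScore (GILGFVFTL)\tScore (NLVPMVATV)\n"
    "CASSIRSSYEQYF\tTRBV19*02\tTRBJ2-7*01\t0.4901\t-0.4901\n"
    "CAGRLWTDKLIF\tTRAV27\tTRAJ34\t0.1435\t0.0064\n"
    "CASGPLLLMTNEQFF\tTRBV12-4*01\tTRBJ2-1*01\t-0.7181\t0.7181\n"
)
for junction, hits in epitopes_above_cutoff(example):
    print(junction, hits)
```

With the example rows, the second TCR has no epitope at or above the cut-off, so its dictionary is empty.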

Unseen Epitope Predictions?

ImmuneWatch DETECT falls into the seen-epitope category of algorithms: annotating a TCR with a given epitope generally requires the database to contain training data for that epitope. Predictions for unseen epitopes are nevertheless supported to a limited extent: when an epitope is similar to one in the database, ImmuneWatch DETECT can make predictions for it. You can use the check-epitope-support command to verify whether your epitope of interest is supported by the IMWdb.


Advanced arguments

--add-custom-data-to-imwdb

If you have your own TCR-epitope data, adding it to the training database can improve the annotation scores of ImmuneWatch DETECT for those specific epitopes. Use the --add-custom-data-to-imwdb argument to add these TCR-epitope pairs to IMWdb. The data is added to IMWdb locally and never leaves your machine, so your TCR-epitope data remains private.

The --add-custom-data-to-imwdb argument is recommended over the --database argument. Using --add-custom-data-to-imwdb enhances the training dataset for the ImmuneWatch DETECT algorithm by including both the existing (IMWdb) and custom data. In contrast, the --database argument only utilises the custom data, which considerably reduces the size of the training dataset, leading to less accurate TCR-epitope annotations.

Supported File Formats

Like the --database option, the --add-custom-data-to-imwdb option supports data in the AIRR (Adaptive Immune Receptor Repertoire) format. All common delimited formats, including CSV and TSV, are accepted.

File Format  Required Columns
AIRR         junction_aa, v_call, j_call, epitope
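
If your TCR-epitope pairs live in another structure, such as a spreadsheet export or a list of records, they first have to be written out with these four columns. The sketch below shows one way to produce such a TSV in Python. The function name is hypothetical, and the single record is a placeholder that reuses values from the examples in this document. Replace it with your own validated pairs.

```python
import csv
import io

REQUIRED_COLUMNS = ("junction_aa", "v_call", "j_call", "epitope")


def write_custom_data(records, handle):
    """Write TCR-epitope records as a TSV with the required AIRR columns."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for index, record in enumerate(records):
        # Refuse incomplete records instead of writing empty cells.
        missing = [name for name in REQUIRED_COLUMNS if not record.get(name)]
        if missing:
            raise ValueError(f"record {index} is missing: {', '.join(missing)}")
        writer.writerow({name: record[name] for name in REQUIRED_COLUMNS})


# Placeholder record for illustration only.
records = [
    {"junction_aa": "CASSIRSSYEQYF", "v_call": "TRBV19*02",
     "j_call": "TRBJ2-7*01", "epitope": "GILGFVFTL"},
]
buffer = io.StringIO()
write_custom_data(records, buffer)
print(buffer.getvalue(), end="")
```

To write to disk, pass a file opened with open("custom_data.tsv", "w", newline="") instead of the in-memory buffer. The file name here is only an example.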

--motif

Disclaimer: This feature is still under development, and its performance may differ from our standard model. We encourage your feedback to help us improve this feature.

The motif option offers an alternative approach to predicting TCR-epitope binding. Unlike the standard output of ImmuneWatch DETECT, which identifies the sequence of the most likely epitope binder, the motif argument will additionally generate a sequence motif. This motif represents a potentially diverse set of epitopes that a given TCR might bind to.

Its output includes two new columns and a modification to the score calculation:

  • Epitope Motif: A human-readable summary of the epitope motif generated by ImmuneWatch DETECT, highlighting the most frequent amino acids at each position. It only includes amino acids that appear with a frequency of 5% or higher, omitting those with lower frequencies.
  • Epitope Motif (MEME): The complete motif is provided in MEME format. For detailed information about this file format, please refer to the MEME documentation. Additionally, you can view an example output of an epitope motif in MEME format.
  • Score: The score reflects how closely the target epitope aligns with the generated sequence motif. By default, ImmuneWatch DETECT identifies the most likely epitope binder and calculates the score by comparing the predicted epitope with the motif. However, if target epitopes are specified using the --epitope option, each target epitope is compared against the generated motif, and the score is calculated based on this comparison. Please be aware that this score is calculated differently than what is described on our scoring page. We are currently researching optimal cut-off scores.

junction_aa    v_call     j_call      Score  Epitope    Epitope Motif                                              Epitope Motif (MEME)
CASSIRSSYEQYF  TRBV19*02  TRBJ2-7*01  0.923  GILGFVFTL  |0:G94|1:I94|2:L94|3:G84E10|4:F89|5:V94|6:F94|7:T94|8:L94  MEME version 4 ...
CARNTGNQFYF    TRAV24*01  TRAJ49*01   0.894  NLVPMVATV  |0:N89|1:L91|2:V91|3:P89|4:M89|5:V89|6:A89|7:T89|8:V89     MEME version 4 ...

Epitope Motif Interpretation

The motif for the first TCR is summarized as |0:G94|1:I94|2:L94|3:G84E10|4:F89|5:V94|6:F94|7:T94|8:L94.
In this summary:

  • At position 0, the amino acid G dominates with a frequency of 94%.
  • Position 1 is similarly dominated by the amino acid I, also at 94%.
  • This pattern continues for other positions, where the most frequent amino acid is listed with its corresponding percentage.
  • At position 3, however, while G is the most frequent with 84%, the amino acid E also appears with a frequency of 10%, indicating a secondary presence in the motif.
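
The interpretation above can also be done programmatically. The sketch below parses a motif summary into a dictionary per position. It assumes the layout seen in the examples: fields separated by "|", each consisting of a zero-based position, a colon, and one or more single-letter amino acids followed by an integer percentage. The function names are hypothetical.

```python
import re


def parse_motif_summary(summary):
    """Map each position to a {amino_acid: percentage} dictionary."""
    motif = {}
    for field in summary.strip("|").split("|"):
        if not field:
            continue
        position, residues = field.split(":", 1)
        motif[int(position)] = {
            amino_acid: int(percentage)
            for amino_acid, percentage in re.findall(r"([A-Z])(\d+)", residues)
        }
    return motif


def consensus(motif):
    """Join the most frequent amino acid of every position, in order."""
    return "".join(
        max(motif[position], key=motif[position].get)
        for position in sorted(motif)
    )


summary = "|0:G94|1:I94|2:L94|3:G84E10|4:F89|5:V94|6:F94|7:T94|8:L94"
motif = parse_motif_summary(summary)
print(motif[3])          # prints {'G': 84, 'E': 10}
print(consensus(motif))  # prints GILGFVFTL
```

Amino acids below the 5% reporting threshold are absent from the summary, so the percentages at a position do not need to sum to 100.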

Epitope Motif Visualisation

The code snippet below allows us to generate a weblogo of the epitope motif, offering a visual representation of the TCR's likely epitope binding preferences. Alternatively, you can use the --output-html feature, which will automatically generate visualisations of these epitope motifs.




Visualising the Epitope Motif using the MEME Format

To run the code, the following pip packages have to be installed:

pip install pymemesuite weblogo numpy

import os
import tempfile

import numpy as np
from pymemesuite.common import MotifFile
from weblogo import LogoData, LogoOptions, png_formatter, LogoFormat, unambiguous_protein_alphabet


def parse_meme_file(file_path):
    # Read the first motif from a MEME file on disk
    with MotifFile(file_path) as f:
        return f.read()


def parse_meme_string(meme_str):
    # Write the MEME string to a temporary file
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        temp_file.write(meme_str.encode())
        temp_file_path = temp_file.name

    # Parse the temporary file, then remove it so it does not linger on disk
    try:
        with MotifFile(temp_file_path) as f:
            return f.read()
    finally:
        os.remove(temp_file_path)


def generate_logo(motif, output_file):
    # Generate the sequence logo using WebLogo
    logo_data = LogoData.from_counts(unambiguous_protein_alphabet, np.array(motif.frequencies))
    logo_options = LogoOptions()
    logo_options.resolution = 200
    logo_format = LogoFormat(logo_data, logo_options)

    # Save the logo to a PNG file
    with open(output_file, 'wb') as f:
        f.write(png_formatter(logo_data, logo_format))


motif = parse_meme_file('input.meme')
generate_logo(motif, 'output.png')

--output-html

Using the --output-html option, you can convert your ImmuneWatch DETECT predictions into an interactive HTML table, providing an easy way to explore the results.

When combined with the --motif option, this option automatically generates visual representations of the Epitope Motifs for you. It also simplifies the output by hiding the Epitope Motif summary and the complete MEME format, making the data easier to interpret.

--motif --output-html --epitope GILGFVFTL NLVPMVATV



Performance

The performance of ImmuneWatch DETECT has been evaluated during the IMMREP23 benchmark competition. Here's a quote from the paper:

Most methods in the G2 group have comparable performance with the exception IMW DETECT that shows an substantial predictive advantage.

Figure 5 from the IMMREP23 paper.

Citation

When using ImmuneWatch DETECT please cite as follows:

ImmuneWatch DETECT, Version 1.0. Developed by ImmuneWatch BV. 2024. Available at: "https://www.immunewatch.com/detect"