ImmuneWatch DETECT
Accurately annotate the epitope specificity of your T-cell receptors
Introduction to ImmuneWatch DETECT
Install
ImmuneWatch DETECT is compatible with Linux and macOS operating systems. Docker must be installed on your system as a prerequisite. For Docker installation guidelines, refer to the official Docker documentation.
Once Docker is set up, install the tool with the following command:
docker pull public.ecr.aws/q5z5g5i0/detect:latest && wget https://imw-public.s3.eu-central-1.amazonaws.com/detect/imw_detect && chmod +x imw_detect
Run
To run ImmuneWatch DETECT, execute the minimal command provided below from your command line. This process involves taking your TCR repertoire data, retraining the algorithm using the specified database, and generating predictions.
Note that a valid licence is required for operation. To obtain a licence, please request access through our website.
./imw_detect \
-i repertoire.tsv \
-o predictions.tsv \
-d imwdb \
-l license
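When the tool is called from an analysis pipeline, it can help to assemble the command programmatically. The sketch below only rebuilds the minimal command shown above from its four documented arguments. The helper name `build_detect_command` and its default values are illustrative assumptions, not part of ImmuneWatch DETECT.

```python
import shlex


def build_detect_command(input_file, output_file, database="imwdb",
                         license_file="license", executable="./imw_detect"):
    """Return the minimal command as an argument list (hypothetical helper)."""
    return [
        executable,
        "-i", str(input_file),    # TCR repertoire to annotate
        "-o", str(output_file),   # where the predictions are written
        "-d", str(database),      # 'imwdb' or a path to a custom database
        "-l", str(license_file),  # licence file
    ]


command = build_detect_command("repertoire.tsv", "predictions.tsv")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once Docker, the `imw_detect` script and a valid licence are in place.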
Arguments
-i, --inputfile
ImmuneWatch DETECT supports multiple TCR repertoire file formats, detailed in the table below. Ensure files contain a header row with at least the required columns. All common delimiters, including commas (CSV) and tabs (TSV), are accepted.
Paired (single-cell) data is supported by including one of the following columns: cell_id, clone_id or barcode.
Values in any of these columns cannot be empty. It's also possible to annotate both single-cell and bulk data within the same file.
Supported File Formats
File Format | Required Columns |
---|---|
AIRR | junction_aa , v_call , j_call |
Adaptive Biotech ImmunoSEQ | aminoAcid , vGeneName , jGeneName |
Adaptive Biotech ImmunoSEQ v4 | amino_acid , v_resolved , j_resolved |
MiXCR | aaSeqCDR3 , allVHitsWithScore , allJHitsWithScore |
Qiagen | CDR3 amino acid seq , V-region , J-region , chain |
10x Genomics | cdr3 , v_gene , j_gene , is_cell , barcode |
Example AIRR
junction_aa | v_call | j_call |
---|---|---|
CASSIRSSYEQYF | TRBV19*02 | TRBJ2-7*01 |
CARNTGNQFYF | TRAV24*01 | TRAJ49*01 |
Example AIRR Single Cell
clone_id | junction_aa | v_call | j_call |
---|---|---|---|
1 | CASSIRSSYEQYF | TRBV19*02 | TRBJ2-7*01 |
1 | CAGDDQGGKLIF | TRAV27*01 | TRAJ23*01 |
2 | CARNTGNQFYF | TRAV24*01 | TRAJ49*01 |
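A quick pre-flight check can catch missing columns or empty pairing values before a run. The following is a minimal sketch for the AIRR format only, assuming a tab-delimited file. The function name `check_airr_repertoire` is hypothetical and the checks merely mirror the rules stated above, not the tool's own validation.

```python
import csv
import io

# Required AIRR columns, copied from the table of supported file formats.
REQUIRED_AIRR_COLUMNS = {"junction_aa", "v_call", "j_call"}
# Optional pairing columns; when one is present, its values may not be empty.
PAIRING_COLUMNS = ("cell_id", "clone_id", "barcode")


def check_airr_repertoire(handle, delimiter="\t"):
    """Return a list of human-readable problems found in the file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = set(reader.fieldnames or [])
    problems = []
    missing = REQUIRED_AIRR_COLUMNS - header
    if missing:
        problems.append("missing required columns: " + ", ".join(sorted(missing)))
    present_pairing = [c for c in PAIRING_COLUMNS if c in header]
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column in present_pairing:
            if not (row[column] or "").strip():
                problems.append(f"line {line_number}: empty value in '{column}'")
    return problems


# Two rows from the single-cell example; the second clone_id is blanked on purpose.
sample = (
    "clone_id\tjunction_aa\tv_call\tj_call\n"
    "1\tCASSIRSSYEQYF\tTRBV19*02\tTRBJ2-7*01\n"
    "\tCARNTGNQFYF\tTRAV24*01\tTRAJ49*01\n"
)
problems = check_airr_repertoire(io.StringIO(sample))
print(problems)
```

For a file on disk, open it with `open(path, newline="")` and pass the handle in place of the `io.StringIO` object.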
-o, --outputfile
The output of ImmuneWatch DETECT is provided in TSV format and contains the AIRR standard columns junction_aa, v_call, and j_call. Below is a small example of what the output file may look like, including the header and a few sample rows.
ImmuneWatch DETECT will add two extra columns:
- Epitope: The ImmuneWatch DETECT algorithm annotates each TCR with the epitope from the provided database that it is most likely to recognise. If no corresponding epitope is identified, the value will be 'None'.
- Score: Represents the binding score, which ranges from 0 (no binding) to 1 (highest binding), indicating the likelihood of the TCR recognising the annotated epitope. For most purposes, when using the IMWdb as the database, we recommend a score cut-off of 0.2 for determining reliable predictions. This recommendation also applies when using the --epitope argument. Detailed information on scoring can be found on our scoring page.
junction_aa | v_call | j_call | Epitope | Score |
---|---|---|---|---|
CASSIRSSYEQYF | TRBV19*02 | TRBJ2-7*01 | GILGFVFTL | 0.3987 |
CARNTGNQFYF | TRAV24*01 | TRAJ49*01 | NLVPMVATV | 0.2836 |
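The recommended cut-off can be applied when post-processing the output file. The sketch below assumes the TSV layout shown above and treats the 0.2 threshold as inclusive, which is an assumption. The function name `reliable_predictions` is hypothetical, and the third sample row is made up purely to illustrate a low score.

```python
import csv
import io

SCORE_CUTOFF = 0.2  # cut-off recommended above for IMWdb-based predictions


def reliable_predictions(handle, cutoff=SCORE_CUTOFF):
    """Return rows that have an annotated epitope and a score >= cutoff."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["Epitope"] == "None":  # no epitope was identified for this TCR
            continue
        if float(row["Score"]) >= cutoff:
            kept.append(row)
    return kept


sample = (
    "junction_aa\tv_call\tj_call\tEpitope\tScore\n"
    "CASSIRSSYEQYF\tTRBV19*02\tTRBJ2-7*01\tGILGFVFTL\t0.3987\n"
    "CARNTGNQFYF\tTRAV24*01\tTRAJ49*01\tNLVPMVATV\t0.2836\n"
    # Hypothetical low-scoring row, not taken from real output:
    "CASSLGQAYEQYF\tTRBV7-9*01\tTRBJ2-7*01\tGILGFVFTL\t0.0500\n"
)
kept = reliable_predictions(io.StringIO(sample))
print([row["junction_aa"] for row in kept])
```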
Explainability
When using the IMWdb database, the output file will include an additional column: Reference TCRs.
These are TCRs that support the given epitope annotation, and come together with the DOI of the publication where the TCR-Epitope pair was reported.
junction_aa | v_call | j_call | Epitope | Score | Reference TCRs |
---|---|---|---|---|---|
CASSIRSSYEQYF | TRBV19*02 | TRBJ2-7*01 | GILGFVFTL | 0.3987 | [('CASSSRSSYEQYF', '10.1073/pnas.1603106113')] |
CARNTGNQFYF | TRAV24*01 | TRAJ49*01 | NLVPMVATV | 0.2836 | [('CAFNTGNQFYF', '10.4049/jimmunol.1303147'), ('CASNTGNQFYF', '10.1016/j.celrep.2017.03.072')] |
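In the examples above, the Reference TCRs cell looks like a Python list of (junction_aa, DOI) tuples. Assuming the column always follows that layout, it can be parsed safely with `ast.literal_eval`. The helper name `parse_reference_tcrs` is hypothetical.

```python
import ast


def parse_reference_tcrs(cell):
    """Convert a 'Reference TCRs' cell into a list of (junction_aa, doi) tuples."""
    if not cell or not cell.strip():
        return []
    # literal_eval only accepts Python literals, so it does not execute code.
    return [(str(cdr3), str(doi)) for cdr3, doi in ast.literal_eval(cell)]


cell = ("[('CAFNTGNQFYF', '10.4049/jimmunol.1303147'), "
        "('CASNTGNQFYF', '10.1016/j.celrep.2017.03.072')]")
references = parse_reference_tcrs(cell)
doi_links = ["https://doi.org/" + doi for _, doi in references]
print(doi_links)
```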
Additional Epitope Information: Antigen, Species and HLA
When using the IMWdb or VDJdb databases, the output file will include two additional columns: Antigen and Species.
With the IMWdb database, an extra column is included, reporting the HLA of the predicted target Epitope.
junction_aa | v_call | j_call | Epitope | Score | Antigen | Species | HLA |
---|---|---|---|---|---|---|---|
CASSIRSSYEQYF | TRBV19*02 | TRBJ2-7*01 | GILGFVFTL | 0.3987 | M | InfluenzaA | HLA-A*0201 |
CARNTGNQFYF | TRAV24*01 | TRAJ49*01 | NLVPMVATV | 0.2836 | pp65 | CMV | HLA-B*27 |
-d, --database
ImmuneWatch DETECT is designed to work with the IMWdb. However, the core program is sufficiently versatile to start from any TCR-Epitope data, allowing users to leverage their own training datasets to annotate TCR specificity. Note that the same level of prediction quality cannot be guaranteed with custom datasets. Open an issue if you would like to discuss this further or see your data included.
Supported File Formats
For those opting to use their own TCR-Epitope datasets, ImmuneWatch DETECT currently supports data in the AIRR (Adaptive Immune Receptor Repertoire) format. All common delimiters, including commas (CSV) and tabs (TSV), are accepted.
File Format | Required Columns |
---|---|
AIRR | junction_aa , v_call , j_call , epitope |
Recommended Databases
If you do not have your own TCR-Epitope data, we have curated a list of recommended databases that are compatible with ImmuneWatch DETECT. You can follow the download instructions and subsequent database argument available below to directly use these databases to make predictions.
Database | Download | Argument |
---|---|---|
IMWdb | ImmuneWatch's own database. Use this for best performance. No additional download necessary | -d imwdb |
VDJdb | wget https://github.com/antigenomics/vdjdb-db/releases/download/2023-06-01/vdjdb-2023-06-01.zip && unzip vdjdb-2023-06-01.zip -d vdjdb | -d vdjdb/vdjdb.txt |
Note that it is the responsibility of each user to ensure that they comply with the terms and conditions of any external database before downloading and using it. We strongly advise you to review these terms and conditions carefully to ensure full compliance.
--epitope
Utilising the optional argument --epitope shifts ImmuneWatch DETECT's functionality from identifying the most likely binding epitope for your TCRs to calculating the binding score of each TCR against the specific epitopes you provide.
This feature allows for targeted analysis, focusing on the interaction between your TCR repertoire and particular epitopes of interest.
You can supply one or multiple epitopes of interest to the --epitope argument. Utilising this argument results in some changes to the output format in comparison to the standard output format:
- For each epitope provided, a corresponding score column is created, such as Score (GILGFVFTL) and Score (NLVPMVATV).
- With usage of the IMWdb, there is a column with the Reference TCRs for each epitope, such as Reference TCRs (GILGFVFTL) and Reference TCRs (NLVPMVATV).
- The score columns now range from -1 to 1. A negative score indicates that the predicted target space of the TCR likely does not include the query epitope, with the magnitude of the negative score reflecting the confidence of this prediction.
- The Epitope column is removed.
--epitope GILGFVFTL
junction_aa | v_call | j_call | Score (GILGFVFTL) |
---|---|---|---|
CASSIRSSYEQYF | TRBV19*02 | TRBJ2-7*01 | 0.4901 |
CAGRLWTDKLIF | TRAV27 | TRAJ34 | 0.1435 |
CASGPLLLMTNEQFF | TRBV12-4*01 | TRBJ2-1*01 | -0.7181 |
--epitope GILGFVFTL NLVPMVATV
junction_aa | v_call | j_call | Score (GILGFVFTL) | Score (NLVPMVATV) |
---|---|---|---|---|
CASSIRSSYEQYF | TRBV19*02 | TRBJ2-7*01 | 0.4901 | -0.4901 |
CAGRLWTDKLIF | TRAV27 | TRAJ34 | 0.1435 | 0.0064 |
CASGPLLLMTNEQFF | TRBV12-4*01 | TRBJ2-1*01 | -0.7181 | 0.7181 |
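Because the --epitope output has one score column per queried epitope, a common follow-up is to find the highest-scoring epitope for each TCR. The sketch below assumes the column naming shown in the tables above, Score (EPITOPE), and a tab-delimited file. The function name `best_epitope_per_tcr` is hypothetical.

```python
import csv
import io
import re

# Matches headers such as 'Score (GILGFVFTL)' and captures the epitope.
SCORE_COLUMN = re.compile(r"^Score \((?P<epitope>[^)]+)\)$")


def best_epitope_per_tcr(handle):
    """Return (junction_aa, best_epitope, best_score) for every row."""
    reader = csv.DictReader(handle, delimiter="\t")
    score_columns = {}
    for name in reader.fieldnames or []:
        match = SCORE_COLUMN.match(name)
        if match:
            score_columns[name] = match.group("epitope")
    if not score_columns:
        raise ValueError("no 'Score (<epitope>)' columns found")
    results = []
    for row in reader:
        scores = {ep: float(row[col]) for col, ep in score_columns.items()}
        best = max(scores, key=scores.get)
        results.append((row["junction_aa"], best, scores[best]))
    return results


sample = (
    "junction_aa\tv_call\tj_call\tScore (GILGFVFTL)\tScore (NLVPMVATV)\n"
    "CASSIRSSYEQYF\tTRBV19*02\tTRBJ2-7*01\t0.4901\t-0.4901\n"
    "CAGRLWTDKLIF\tTRAV27\tTRAJ34\t0.1435\t0.0064\n"
    "CASGPLLLMTNEQFF\tTRBV12-4*01\tTRBJ2-1*01\t-0.7181\t0.7181\n"
)
results = best_epitope_per_tcr(io.StringIO(sample))
for junction, epitope, score in results:
    print(junction, epitope, score)
```

Note that the best score can still fall below the recommended cut-off, as for the second TCR here, so filtering on the score remains a separate step.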
ImmuneWatch DETECT is an algorithm that falls into the seen-epitope category. An annotation with a certain epitope generally requires the database to contain training data for that epitope.
However, predictions for unseen epitopes are also supported to a limited extent. When an epitope is similar to an epitope in the database, ImmuneWatch DETECT can make predictions for it.
You can use the check-epitope-support command to verify whether your epitope of interest is supported by the IMWdb.
Advanced arguments
--add-custom-data-to-imwdb
If you have your own TCR-epitope data, it can be useful to add it to the training database to improve the annotation scores of ImmuneWatch DETECT for these specific epitopes. You can use the --add-custom-data-to-imwdb argument to add these TCR-epitope pairs to IMWdb. The data is added to IMWdb locally and never leaves your machine, so your TCR-epitope data remains private.
The --add-custom-data-to-imwdb argument is recommended over the --database argument. Using --add-custom-data-to-imwdb enhances the training dataset for the ImmuneWatch DETECT algorithm by including both the existing (IMWdb) and custom data. In contrast, the --database argument only utilises the custom data, which considerably reduces the size of the training dataset, leading to less accurate TCR-epitope annotations.
Supported File Formats
Similarly to the --database option, the --add-custom-data-to-imwdb option supports data in the AIRR (Adaptive Immune Receptor Repertoire) format. All common delimiters, including commas (CSV) and tabs (TSV), are accepted.
File Format | Required Columns |
---|---|
AIRR | junction_aa , v_call , j_call , epitope |
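A custom data file needs only the four columns listed above. As a minimal sketch, the snippet below writes TCR-epitope pairs to a tab-delimited file with that header and rejects incomplete records. The function name `write_custom_tcr_epitope_data` is hypothetical, and the two records are placeholders reusing sequences from the examples in this document, not curated data.

```python
import csv
import io

# Required columns for custom TCR-epitope data, from the table above.
REQUIRED_COLUMNS = ["junction_aa", "v_call", "j_call", "epitope"]


def write_custom_tcr_epitope_data(records, handle):
    """Write records (dicts) as TSV, failing on any missing required value."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for index, record in enumerate(records, start=1):
        missing = [c for c in REQUIRED_COLUMNS if not record.get(c)]
        if missing:
            raise ValueError(f"record {index} lacks: {', '.join(missing)}")
        writer.writerow({c: record[c] for c in REQUIRED_COLUMNS})


# Placeholder records for illustration only.
records = [
    {"junction_aa": "CASSIRSSYEQYF", "v_call": "TRBV19*02",
     "j_call": "TRBJ2-7*01", "epitope": "GILGFVFTL"},
    {"junction_aa": "CARNTGNQFYF", "v_call": "TRAV24*01",
     "j_call": "TRAJ49*01", "epitope": "NLVPMVATV"},
]
buffer = io.StringIO()
write_custom_tcr_epitope_data(records, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with `open(path, "w", newline="")` and pass that handle.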
--motif
Disclaimer: This feature is still under development, and its performance may differ from our standard model. We encourage your feedback to help us improve this feature.
The motif option offers an alternative approach to predicting TCR-epitope binding. In addition to the standard output of ImmuneWatch DETECT, which identifies the sequence of the most likely epitope, the --motif argument generates a sequence motif. This motif represents a potentially diverse set of epitopes that a given TCR might bind to.
Its output includes two new columns and a modification to the score calculation:
- Epitope Motif: A human-readable summary of the epitope motif generated by ImmuneWatch DETECT, highlighting the most frequent amino acids at each position. It only includes amino acids that appear with a frequency of 5% or higher, omitting those with lower frequencies.
- Epitope Motif (MEME): The complete motif is provided in MEME format. For detailed information about this file format, please refer to the MEME documentation. Additionally, you can view an example output of an epitope motif in MEME format.
- Score: The score reflects how closely the target epitope aligns with the generated sequence motif. By default, ImmuneWatch DETECT identifies the most likely epitope binder and calculates the score by comparing the predicted epitope with the motif. However, if target epitopes are specified using the --epitope option, each target epitope is compared against the generated motif, and the score is calculated based on this comparison. Please be aware that this score is calculated differently from what is described on our scoring page. We are currently researching optimal cut-off scores.
junction_aa | v_call | j_call | Score | Epitope | Epitope Motif | Epitope Motif (MEME) |
---|---|---|---|---|---|---|
CASSIRSSYEQYF | TRBV19*02 | TRBJ2-7*01 | 0.923 | GILGFVFTL | |0:G94|1:I94|2:L94|3:G84E10|4:F89|5:V94|6:F94|7:T94|8:L94 | MEME version 4 ... |
CARNTGNQFYF | TRAV24*01 | TRAJ49*01 | 0.894 | NLVPMVATV | |0:N89|1:L91|2:V91|3:P89|4:M89|5:V89|6:A89|7:T89|8:V89 | MEME version 4 ... |
Epitope Motif Interpretation
The motif for the first TCR is summarized as |0:G94|1:I94|2:L94|3:G84E10|4:F89|5:V94|6:F94|7:T94|8:L94.
In this summary:
- At position 0, the amino acid G dominates with a frequency of 94%.
- Position 1 is similarly dominated by the amino acid I, also at 94%.
- This pattern continues for other positions, where the most frequent amino acid is listed with its corresponding percentage.
- At position 3, however, while G is the most frequent with 84%, the amino acid E also appears with a frequency of 10%, indicating a secondary presence in the motif.
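The summary string can also be read programmatically. The sketch below assumes the layout seen in the example: fields separated by |, each of the form position:residues, with every residue written as a one-letter code followed by an integer percentage. The function names `parse_motif_summary` and `consensus` are hypothetical.

```python
import re

# One amino-acid letter followed by its percentage, e.g. 'G84' or 'E10'.
RESIDUE_FREQUENCY = re.compile(r"([A-Z])(\d+)")


def parse_motif_summary(summary):
    """Map each position to a {amino_acid: percentage} dictionary."""
    motif = {}
    for field in summary.strip().split("|"):
        if not field:  # skip the empty field before the leading '|'
            continue
        position, residues = field.split(":", 1)
        motif[int(position)] = {
            aa: int(pct) for aa, pct in RESIDUE_FREQUENCY.findall(residues)
        }
    return motif


def consensus(motif):
    """Join the most frequent amino acid at each position."""
    return "".join(max(motif[p], key=motif[p].get) for p in sorted(motif))


summary = "|0:G94|1:I94|2:L94|3:G84E10|4:F89|5:V94|6:F94|7:T94|8:L94"
motif = parse_motif_summary(summary)
print(motif[3])
print(consensus(motif))
```

Positions with more than one entry, such as position 3 here, mark where the motif tolerates alternative residues.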
Epitope Motif Visualisation
The code snippet below generates a weblogo of the epitope motif, offering a visual representation of the TCR's likely epitope binding preferences. Alternatively, you can use the --output-html feature, which will automatically generate visualisations of these epitope motifs.

Visualising the Epitope Motif using the MEME Format
To run the code, the following pip packages have to be installed:
pip install pymemesuite weblogo numpy
import tempfile

import numpy as np
from pymemesuite.common import MotifFile
from weblogo import LogoData, LogoOptions, png_formatter, LogoFormat, unambiguous_protein_alphabet


def parse_meme_file(file_path):
    # Read the first motif from a MEME file on disk
    with MotifFile(file_path) as f:
        return f.read()


def parse_meme_string(meme_str):
    # Write the MEME string to a temporary file
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        temp_file.write(meme_str.encode())
        temp_file_path = temp_file.name

    # Parse the temporary file
    with MotifFile(temp_file_path) as f:
        return f.read()


def generate_logo(motif, output_file):
    # Generate the sequence logo using WebLogo
    logo_data = LogoData.from_counts(unambiguous_protein_alphabet, np.array(motif.frequencies))
    logo_options = LogoOptions()
    logo_options.resolution = 200
    logo_format = LogoFormat(logo_data, logo_options)

    # Save the logo to a PNG file
    with open(output_file, 'wb') as f:
        f.write(png_formatter(logo_data, logo_format))


motif = parse_meme_file('input.meme')
generate_logo(motif, 'output.png')
--output-html
Using the --output-html option, you can convert your ImmuneWatch DETECT predictions into an interactive HTML table, providing an easy way to explore the results.
When combined with the --motif option, this option automatically generates visual representations of the Epitope Motifs for you. It also simplifies the output by hiding the Epitope Motif summary and the complete MEME format, making the data easier to interpret.
--motif --output-html --epitope GILGFVFTL NLVPMVATV
Take a look at this example output, or go to the full screen version.
Performance
The performance of ImmuneWatch DETECT has been evaluated during the IMMREP23 benchmark competition. Here's a quote from the paper:
Most methods in the G2 group have comparable performance with the exception IMW DETECT that shows an substantial predictive advantage.

Figure 5 from the IMMREP23 paper.
Citation
When using ImmuneWatch DETECT please cite as follows:
ImmuneWatch DETECT, Version 1.0. Developed by ImmuneWatch BV. 2024. Available at: "https://www.immunewatch.com/detect"