
Epitope-Specific Monitoring of Vaccine Response


In this demo, you'll learn how to use ImmuneWatch DETECT with real-world data from the study conducted by Saggau et al. By downloading the provided dataset and following along with the code blocks, you'll see how TCR-epitope annotation is applied to understand immune responses before and after vaccination.

Saggau C, Martini GR, Rosati E, et al. The pre-exposure SARS-CoV-2-specific T cell repertoire determines the quality of the immune response to vaccination. Immunity. 2022;55(10):1924-1939.e5. doi:10.1016/j.immuni.2022.08.003





Introduction

The research conducted by Saggau et al. offers valuable insights into the immune response triggered by SARS-CoV-2 vaccination. As depicted in the study design below, they collected samples from individuals at three distinct phases: before vaccination, after the first dose, and after the second dose.

The samples were then stimulated with the Spike antigen using a pool of short, overlapping peptides spanning the entire antigen sequence, followed by TCR sequencing. This makes it possible to examine the effect of vaccination on the Spike-specific T-cell response.
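The exact peptide-pool design used by Saggau et al. isn't described here, but the idea of overlapping peptides can be sketched in a few lines of Python. This sketch assumes a common layout of 15-mers overlapping by 11 residues (a step of 4); both numbers, the function name `overlapping_peptides` and the toy sequence are illustrative assumptions, not the study's protocol.

```python
def overlapping_peptides(sequence: str, length: int = 15, step: int = 4) -> list:
    """Split a protein sequence into overlapping peptides that tile it end to end.

    Assumed defaults (not from the study): 15-mers overlapping by length - step = 11 residues.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) <= length:
        return [sequence]
    starts = list(range(0, len(sequence) - length + 1, step))
    # Add a final peptide flush with the C-terminus if the regular steps fall short of it
    if starts[-1] != len(sequence) - length:
        starts.append(len(sequence) - length)
    return [sequence[start:start + length] for start in starts]


# Toy, made-up sequence (not Spike), 29 residues long
toy_antigen = "ACDEFGHIKLMNPQRSTVWY" + "ACDEFGHIK"
peptides = overlapping_peptides(toy_antigen)
for peptide in peptides:
    print(peptide)
```

With a step of 4, any stretch of up to 12 residues (for example a 9-mer such as YLQPRTFLL) lies entirely within at least one peptide, which is why such a pool can present short epitopes from anywhere in the antigen.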

With ImmuneWatch DETECT, we can add another layer to this analysis by resolving the epitope-specific T-cell response.

Research Question

Which epitopes from the Spike antigen are driving the immune response to SARS-CoV-2 vaccination in the Saggau et al. dataset?




Running DETECT

Once you have installed ImmuneWatch DETECT, obtained a license key (through our website) and downloaded the dataset from Saggau et al., you are all set! Start the analysis by running the following command in your command line:

./imw_detect \
-i saggau_et_al.tsv \
-o saggau_et_al_predictions.tsv \
-d imwdb \
-l license
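If you prefer to launch DETECT from the same Python session you use for the analysis, one option is to wrap the command above with `subprocess`. This is a minimal sketch: it passes exactly the flags shown above and nothing else, the helper names `build_detect_command` and `run_detect` are hypothetical, and `imwdb` and `license` are the same placeholders as in the command, to be replaced with your own database and license details.

```python
import subprocess


def build_detect_command(input_tsv, output_tsv, database="imwdb",
                         license_key="license", binary="./imw_detect"):
    """Return the DETECT command from this tutorial as an argument list (hypothetical helper)."""
    return [binary, "-i", str(input_tsv), "-o", str(output_tsv),
            "-d", database, "-l", license_key]


def run_detect(input_tsv, output_tsv, **options):
    """Run DETECT and raise CalledProcessError if it exits with a non-zero status."""
    cmd = build_detect_command(input_tsv, output_tsv, **options)
    subprocess.run(cmd, check=True)


# Usage (requires the imw_detect binary in the current directory):
# run_detect("saggau_et_al.tsv", "saggau_et_al_predictions.tsv")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths.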

Processing the Predictions in Python

We start processing the annotations from DETECT in Python using the following steps:

  • Importing the necessary libraries
  • Reading in the annotations using pandas
  • Applying preprocessing steps to the data

You can follow along by copying the code in a Python script or Jupyter notebook.


import pandas as pd
import plotly.express as px


def preprocess_tcrs(df):
    # Only keep the top 100 most abundant Spike-reactive clonotypes post 2nd vaccination.
    # .copy() avoids pandas' SettingWithCopyWarning when we add columns below.
    df = df[df['top100'] == 'yes'].copy()

    # Adjust names of columns: the count before vaccination is the day-0 count,
    # the count after vaccination is the sum of the post-1st and post-2nd dose counts
    df['count_before_vaccination'] = df['day0_Tmem']
    df['count_after_vaccination'] = df['post_1st_Tmem'] + df['post_2nd_Tmem']

    # Only keep the columns we need
    df = df[['junction_aa', 'v_call', 'count_before_vaccination', 'count_after_vaccination',
             'sample', 'Epitope', 'Score', 'Reference TCRs', 'Antigen']]

    # The same TCR might be present in multiple samples, so we aggregate per (CDR3, V gene, epitope)
    df = df.groupby(['junction_aa', 'v_call', 'Epitope']).agg({
        'count_before_vaccination': 'sum',
        'count_after_vaccination': 'sum',
        'sample': lambda x: set(x),  # collect the samples in which this TCR was found
        'Score': 'first',
        'Reference TCRs': 'first',
        'Antigen': 'first',
    })

    # Rename the 'sample' column to 'Samples', since a row can now come from multiple samples
    df = df.rename(columns={
        'sample': 'Samples',
    }).reset_index()

    return df


file = 'saggau_et_al_predictions.tsv'
df = pd.read_csv(file, sep='\t')
df = preprocess_tcrs(df)
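To see what the aggregation step in `preprocess_tcrs` does, this sketch applies the same `groupby`/`agg` pattern to a tiny table. The CDR3 sequences, V genes, sample names and counts are invented for illustration; only the column names mirror the tutorial.

```python
import pandas as pd

# Made-up example: the first TCR occurs in two samples with the same epitope annotation
toy = pd.DataFrame({
    "junction_aa": ["CASSLGQETQYF", "CASSLGQETQYF", "CASSPDRGYTF"],
    "v_call": ["TRBV7-9", "TRBV7-9", "TRBV5-1"],
    "Epitope": ["YLQPRTFLL", "YLQPRTFLL", "TFEYVSQPFLMDLE"],
    "sample": ["donor_A", "donor_B", "donor_A"],
    "count_before_vaccination": [0, 1, 0],
    "count_after_vaccination": [5, 3, 2],
    "Score": [0.2, 0.2, 0.35],
})

merged = (
    toy.groupby(["junction_aa", "v_call", "Epitope"])
    .agg({
        "count_before_vaccination": "sum",   # counts are summed across samples
        "count_after_vaccination": "sum",
        "sample": lambda x: set(x),          # samples are collected into a set
        "Score": "first",
    })
    .rename(columns={"sample": "Samples"})
    .reset_index()
)
print(merged)
```

The TCR seen in both donor_A and donor_B collapses into a single row with summed counts and the set of samples it came from; this set is what the `Samples` column holds in the rest of the tutorial.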

Confirming the Spike Stimulation

As established above, the samples under study were stimulated with the Spike antigen, and our preprocessing step keeps only the top 100 most abundant Spike-reactive clonotypes after the second vaccination dose. We can now put the algorithm to the test and check whether the TCR-epitope annotations agree with this Spike stimulation.

Because DETECT annotates each TCR with the epitope it most likely binds, we can validate the Spike stimulation by checking whether the TCRs are annotated with a Spike epitope. The annotation scores are crucial here. Low-scoring annotations are less confident, so we expect them to span a variety of antigens; higher-scoring, more confident annotations should contain an increasing percentage of Spike.

We can test this hypothesis by visualising the annotations at different score thresholds. At each threshold, we display the proportion of TCRs (weighted by their post-vaccination abundance) annotated with Spike versus other antigens.

Visualising the Antigens in the Annotations

def plot_antigens_per_score_threshold(df):
    # Define the score thresholds
    score_thresholds = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]

    # Initialize an empty list to store the results
    per_score_threshold = []

    # Iterate over the score thresholds
    for score_threshold in score_thresholds:
        # Filter the dataframe to include only rows where the score is greater than the threshold
        df_filtered = df[df['Score'] > score_threshold]

        # Split the filtered dataframe into two: one for 'Spike' antigen and one for 'Other' antigens
        df_filtered_spike = df_filtered[df_filtered['Antigen'] == 'Spike/surface glycoprotein (S)']
        df_filtered_other = df_filtered[df_filtered['Antigen'] != 'Spike/surface glycoprotein (S)']

        # For each antigen type, calculate the number of clonotypes and its share (%)
        # of the total post-vaccination abundance at this threshold
        for antigen, df_filtered_antigen in [('Spike', df_filtered_spike), ('Other', df_filtered_other)]:
            clonotypes = len(df_filtered_antigen)
            annotation_proportion = df_filtered_antigen['count_after_vaccination'].sum() / df_filtered['count_after_vaccination'].sum() * 100

            # Append the results to the list
            per_score_threshold.append({
                'Score Threshold': score_threshold,
                'Antigen': antigen,
                'Clonotypes': clonotypes,
                'Annotation Proportion': annotation_proportion,
            })

    # Convert the list of results into a dataframe
    per_score_threshold = pd.DataFrame(per_score_threshold)

    # Plot the proportions as a stacked bar chart; hovering also shows the clonotype count
    fig = px.bar(per_score_threshold, x='Score Threshold', y='Annotation Proportion', color='Antigen', hover_data=['Clonotypes'])
    fig.show()


plot_antigens_per_score_threshold(df)



Spike Stimulation Confirmed

With a score threshold of 0, which doesn't exclude any annotations, only a small percentage of the annotations are for Spike. As we gradually increase the threshold, the percentage of Spike annotations rises, and at a threshold of 0.3 almost all annotations are for Spike. This confirms that DETECT successfully picked up the Spike stimulation.
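If you want exact numbers rather than bar heights, the same calculation can be returned as a table. In this sketch the function name `spike_fraction_table` is our own and the toy data are invented. It reports both the abundance-weighted Spike percentage used in the bar chart and a clonotype-weighted percentage, in which every clonotype counts once regardless of its size; comparing the two shows whether a trend is driven by a few highly expanded clonotypes.

```python
import pandas as pd

SPIKE = "Spike/surface glycoprotein (S)"


def spike_fraction_table(df: pd.DataFrame, thresholds) -> pd.DataFrame:
    """Spike share per score threshold, by clonotype count and by post-vaccination abundance."""
    rows = []
    for t in thresholds:
        kept = df[df["Score"] > t]  # same strict '>' filter as the bar chart
        is_spike = kept["Antigen"] == SPIKE
        total = kept["count_after_vaccination"].sum()
        rows.append({
            "Score Threshold": t,
            "Clonotypes kept": len(kept),
            # Every clonotype counts once
            "Spike % (clonotypes)": 100 * is_spike.mean() if len(kept) else float("nan"),
            # Clonotypes weighted by their post-vaccination abundance
            "Spike % (abundance)": (100 * kept.loc[is_spike, "count_after_vaccination"].sum() / total
                                    if total else float("nan")),
        })
    return pd.DataFrame(rows)


# Invented toy data with the column names used in this tutorial
toy = pd.DataFrame({
    "Score": [0.02, 0.08, 0.12, 0.35, 0.40],
    "Antigen": ["Other antigen", "Other antigen", SPIKE, SPIKE, "Other antigen"],
    "count_after_vaccination": [10, 5, 1, 20, 4],
})
table = spike_fraction_table(toy, [0, 0.1, 0.3])
print(table)
```

On the real data you would call `spike_fraction_table(df, [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3])` after preprocessing.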


Epitope-Specific Response

Having established that DETECT identified the Spike stimulation, we can now turn to our primary research question: which epitopes are driving the immune response? We take all TCRs annotated with a Spike epitope and group them by epitope to identify the most prominent ones.

For each epitope, two factors are key: the abundance of TCRs targeting it after vaccination (indicating how much these TCRs have expanded), and their diversity (the number of distinct clonotypes targeting it).

As we've seen, the annotation score also matters, so we examine the epitope response at several score thresholds. We can visualise all of this in a scatter plot with one dot per epitope: the abundance of TCRs targeting the epitope before vaccination on the x-axis, and the abundance after vaccination on the (logarithmic) y-axis. The size and color of each dot represent the number of unique clonotypes targeting the epitope, and a slider switches between score thresholds.
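Before plotting, it can help to inspect the per-epitope numbers directly. This sketch uses a hypothetical function `epitope_summary` and invented toy data; for one score threshold, it computes the quantities described above (pre- and post-vaccination abundance and the number of clonotypes) plus the number of distinct samples, sorted by post-vaccination abundance. It assumes the `Samples` column holds Python sets, as produced by `preprocess_tcrs`.

```python
import pandas as pd

SPIKE = "Spike/surface glycoprotein (S)"


def epitope_summary(df: pd.DataFrame, score_threshold: float) -> pd.DataFrame:
    """Per-epitope abundance and diversity for Spike annotations above a score threshold."""
    kept = df[(df["Score"] > score_threshold) & (df["Antigen"] == SPIKE)]
    summary = kept.groupby("Epitope").agg(
        clonotypes=("junction_aa", "count"),                  # one row per clonotype
        pre_abundance=("count_before_vaccination", "sum"),
        post_abundance=("count_after_vaccination", "sum"),
        samples=("Samples", lambda s: len(set().union(*s))),  # distinct samples overall
    )
    return summary.sort_values("post_abundance", ascending=False).reset_index()


# Invented toy data (CDR3s and sample names are made up)
toy = pd.DataFrame({
    "junction_aa": ["CASSA", "CASSB", "CASSC", "CASSD", "CASSE"],
    "Epitope": ["TFEYVSQPFLMDLE", "TFEYVSQPFLMDLE", "YLQPRTFLL", "YLQPRTFLL", "OTHEREPITOPE"],
    "Antigen": [SPIKE, SPIKE, SPIKE, SPIKE, "Other antigen"],
    "Score": [0.40, 0.20, 0.20, 0.05, 0.50],
    "count_before_vaccination": [0, 1, 0, 0, 2],
    "count_after_vaccination": [10, 6, 3, 50, 9],
    "Samples": [{"d1", "d2"}, {"d2", "d3"}, {"d1"}, {"d4"}, {"d5"}],
})
print(epitope_summary(toy, 0.1))
```

Raising the threshold can reorder the epitopes: in this toy example, the low-scoring but highly abundant CASSD clonotype makes YLQPRTFLL the top epitope at threshold 0 but drops out at 0.1.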

Visualising the Epitope-Specific Response

def plot_epitope_specific_response(df):
    # Collect one aggregated dataframe per score threshold (one slider step each)
    frames = []

    # Define the score thresholds
    score_thresholds = [0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]

    # Iterate over the score thresholds
    for score_threshold in score_thresholds:
        # Keep Spike annotations whose score is above the current threshold
        filtered_df = df[(df['Score'] > score_threshold) & (df['Antigen'] == 'Spike/surface glycoprotein (S)')]

        # Group the filtered dataframe by epitope and aggregate the other columns
        grouped_df = filtered_df.groupby('Epitope').agg({
            'Samples': lambda x: len(set.union(*x)),  # Count the unique samples
            'junction_aa': lambda x: list(x),  # Collect the junction_aa values into a list
            'count_before_vaccination': 'sum',  # Sum the count_before_vaccination values
            'count_after_vaccination': 'sum',  # Sum the count_after_vaccination values
        }).reset_index()

        # Add new columns to the grouped dataframe
        grouped_df['Clonotypes'] = grouped_df['junction_aa'].apply(len)  # Count the number of clonotypes
        grouped_df['Example Clonotypes'] = grouped_df['junction_aa'].apply(lambda x: ','.join(x[:3]))  # Get the first 3 clonotypes
        # Label only the 5 epitopes with the most clonotypes, to keep the plot readable
        top5 = set(grouped_df.sort_values('Clonotypes', ascending=False)['Epitope'].head(5))
        grouped_df['Epitope Label'] = grouped_df['Epitope'].where(grouped_df['Epitope'].isin(top5), '')
        grouped_df['DETECT Score Threshold'] = score_threshold  # Add the current score threshold
        grouped_df['Pre-Vaccine Abundance'] = grouped_df['count_before_vaccination']  # Copy under a plot-friendly name
        grouped_df['Post-Vaccine Abundance'] = grouped_df['count_after_vaccination']  # Copy under a plot-friendly name

        # Append the grouped dataframe to the list
        frames.append(grouped_df)

    # Concatenate the per-threshold dataframes into a single dataframe
    plot_df = pd.concat(frames)

    # Create a scatter plot with one animation frame (slider step) per score threshold
    fig = px.scatter(
        plot_df,
        x="Pre-Vaccine Abundance",
        y="Post-Vaccine Abundance",
        size="Clonotypes",
        color="Clonotypes",
        animation_frame="DETECT Score Threshold",
        animation_group="Epitope",
        text="Epitope Label",
        size_max=80,
        hover_data=["Epitope", "Clonotypes", "Example Clonotypes", "Samples", "Pre-Vaccine Abundance", "Post-Vaccine Abundance"],
    )

    # Update the y-axis to be logarithmic
    fig.update_layout(yaxis_type="log")

    # Remove the play/pause buttons; the score-threshold slider remains
    fig["layout"].pop("updatemenus")

    # Display the plot
    fig.show()


plot_epitope_specific_response(df)



With the score threshold set to 0, so that no annotations are excluded, we see many epitopes. Several stand out due to their high post-vaccine abundance and clonotype diversity, such as TFEYVSQPFLMDLE (TFE), YLQPRTFLL (YLQ), LTDEMIAQY (LTD), RFASVYAWNRKRISNCVADY (RFAS) and KLPDDFTGCV (KLP). Hovering over an epitope shows its detailed statistics. For instance, the most dominant epitope, TFE, is annotated to over 100 distinct clonotypes found in 37 unique samples, and its three example clonotypes share a clear motif.

When we increase the score threshold to 0.05, we see the same prominent epitopes, with slight shifts in position and size, while some of the less prominent epitopes disappear. As the threshold increases further, the epitopes keep shifting, with TFE and YLQ remaining the most prominent at a threshold of 0.15. At this threshold, TFE still has a strong presence of more than 50 unique clonotypes occurring in 35 samples, while YLQ has a smaller but still notable presence of 11 unique clonotypes in 9 samples. At a threshold of 0.3, TFE is the only epitope left with high-confidence annotations. These results suggest that TFEYVSQPFLMDLE (TFE) and YLQPRTFLL (YLQ) are particularly prominent in driving the immune response to SARS-CoV-2 vaccination.

This analysis has some limitations. Although SARS-CoV-2 is one of the pathogens with the most known TCR-epitope interactions, and the IMWdb used for the predictions contains a large number of them, the database does not cover every possible epitope in the Spike antigen. It is therefore possible that some epitopes that shaped the T-cell response in this dataset were not covered by the predictions.

Conclusion

In this demo, we used ImmuneWatch DETECT to analyze TCR sequencing data from Saggau et al., focusing on the immune response to SARS-CoV-2 vaccination, with the aim of identifying which Spike epitopes drive this response. Despite potential limitations, such as not all Spike epitopes having known TCR-epitope interactions, we found two epitopes to be particularly prominent in driving the immune response: TFEYVSQPFLMDLE (TFE) and YLQPRTFLL (YLQ). This analysis demonstrates how a TCR-epitope annotation algorithm, specifically ImmuneWatch DETECT, can provide a deeper understanding of epitope-specific immune responses before and after SARS-CoV-2 vaccination.