Impact of the Database
In this guide, we examine how the choice of database affects the predictions generated by DETECT. The database serves as the training data for the algorithm, and the choice of training data is crucial in machine learning and, by extension, in TCR-Epitope prediction.
What's our approach? We compare the predictions made by DETECT when it is trained on two distinct databases: IMWdb and VDJdb. IMWdb is our proprietary, carefully curated database, whereas VDJdb is a commonly used database in the field of TCR-Epitope prediction.
To assess the predictions, we use the IMMREP23 benchmark dataset. This dataset comprises known TCR binders for 20 different epitopes. Some of these epitopes are unseen, as there was no public data available for them at the time of the competition. After generating predictions, we compute the AUC0.1 for each epitope and then average these values across all epitopes. AUC0.1 is the area under the ROC curve restricted to false positive rates of at most 0.1, so it measures how well a binary classifier ranks true binders ahead of non-binders in the regime where few false positives are tolerated.
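The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the official IMMREP23 scoring script or DETECT's own code: the function names (`roc_points`, `partial_auc`, `mean_auc_per_epitope`) and the `(epitope, label, score)` record layout are hypothetical, and dividing the partial area by the FPR limit so that a perfect classifier scores 1.0 is an assumed normalization that may differ from the one used in the competition.

```python
from collections import defaultdict


def roc_points(labels, scores):
    """Return ROC points as (fpr, tpr) pairs; tied scores form a single step."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one binder and one non-binder")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example that shares this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(labels, scores, max_fpr=0.1):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    Assumption: the area is divided by max_fpr so a perfect ranking gives 1.0.
    """
    points = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr using linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


def mean_auc_per_epitope(records, max_fpr=0.1):
    """Score each epitope separately, then take the unweighted mean.

    records: iterable of (epitope, label, score), label 1 = binder, 0 = non-binder.
    """
    grouped = defaultdict(lambda: ([], []))
    for epitope, label, score in records:
        grouped[epitope][0].append(label)
        grouped[epitope][1].append(score)
    per_epitope = {
        epitope: partial_auc(labels, scores, max_fpr)
        for epitope, (labels, scores) in grouped.items()
    }
    return per_epitope, sum(per_epitope.values()) / len(per_epitope)
```

Averaging per epitope rather than pooling all predictions gives every epitope equal weight, regardless of how many TCRs were tested against it. If scikit-learn is available, `roc_auc_score(y_true, y_score, max_fpr=0.1)` computes a related quantity, but it applies a different standardization, so its values are not directly comparable to this sketch.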
We obtain two AUC0.1 scores: 0.7274 for IMWdb and 0.6551 for VDJdb.
With an absolute increase of 0.0723 in mean AUC0.1 (roughly 11% relative to the VDJdb score), the results underscore the substantial impact of database selection on the predictions made by DETECT and show that using IMWdb yields better benchmark performance.
Once the complete IMMREP23 benchmark data is released, we will update this guide with full instructions for running the benchmark yourself.