Leonie F Forth#, Burkhard Malorny, Markus Bönn, Erik Brinks, Grégoire Denay, Carlus Deneke, Hosny El-Adawy, Jennie Fischer, Jannika Fuchs, Ekkehard Hiller, Nancy Bretschneider, Sylvia Kleta, Stefanie Lüth, Tilman Schultze, Henning Petersen, Michaela Projahn, Christian Schäfers, Kerstin Stingl, Andreas J Stroehlein, Laura Uelze, Kathrin Szabo, Anne Wöhlke, Jörg Linde#
An inter-laboratory study characterizes the impact of bioinformatic approaches on genome-based cluster detection for foodborne bacterial pathogens.
Front Microbiol, 16 Art. No. 1629731 (2025)
Open Access PubMed Source
Accurate assignment of whole-genome sequences to clusters in foodborne outbreak investigations remains challenging. Variability in bioinformatics tools and quality metrics significantly impacts clustering outcomes. This study assessed inter-laboratory variance in cluster identification by providing four datasets of 50 raw Illumina paired-end sequences covering Shiga toxin-producing Escherichia coli, Listeria monocytogenes, Salmonella enterica, and Campylobacter jejuni. Following general rules of a specified guideline, participants applied in-house protocols for read quality assessment, 7-gene MLST, cgMLST, and SNP calling, then assigned samples to predefined focus clusters based on allele distance (AD) and mutations. Results revealed that differences in the interpretation of raw sequence and genome assembly quality influenced sample inclusion and finally cluster composition. Here, intra-species contamination was the most significant factor driving variability in decisions on whether to include or exclude samples. With one exception, 7-gene Multilocus-Sequence Typing (MLST) yielded consistent sequence types using different bioinformatics tools. The largest influence on cgMLST-defined clusters was the inclusion or exclusion of samples. Regarding bioinformatics, cgMLST was mainly reproducible. For S. enterica, discrepancies due to different software (Ridom SeqSphere+ vs. ChewieSnake) were larger than discrepancies due to different schemas. For other species, different schemas introduced larger discrepancies than different software. Most notably, C. jejuni cluster assignment was strongly affected by cgMLST schemas differing by a factor of two in the number of loci. SNP calling using Snippy produced concordant results across participants, except for C. jejuni when recombination filtering was used. This study highlights the impact caused by different interpretations of quality values when assessing clusters. 
Low-resolution cgMLST schemas were unsuitable for Campylobacter jejuni, and clustering near cut-off values was sensitive to bioinformatics tool selection. Standardized protocols are essential for reliable inter-laboratory comparison in foodborne pathogen surveillance.
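To illustrate why cluster assignment near a cut-off can depend on sample inclusion and on how loci are handled, the following minimal sketch computes pairwise allele distances between toy cgMLST profiles and groups samples by single linkage. It is not the protocol used in the study: the function names, the toy profiles, the convention of ignoring missing loci, and the threshold values are all assumptions made for illustration.

```python
from itertools import combinations

MISSING = None  # hypothetical marker for a locus that could not be called


def allele_distance(profile_a, profile_b):
    """Count loci with different alleles, skipping loci missing in either profile."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not MISSING and b is not MISSING and a != b
    )


def single_linkage_clusters(profiles, threshold):
    """Group samples linked by chains of pairwise distances <= threshold."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), set()).add(name)
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


# Toy allele profiles over six loci (illustrative values only).
profiles = {
    "s1": [1, 1, 1, 1, 1, 1],
    "s2": [1, 1, 2, 1, MISSING, 1],
    "s3": [1, 2, 2, 1, 1, 1],
    "s4": [5, 6, 7, 8, 9, 9],
}

if __name__ == "__main__":
    for threshold in (0, 1):
        print(threshold, single_linkage_clusters(profiles, threshold))
```

In this toy data, s1 and s3 differ at two loci, yet with a threshold of 1 they end up in the same cluster because s2 links them. Excluding s2, for example on quality grounds, would split that cluster, which mirrors the abstract's observation that sample inclusion or exclusion had the largest influence on cgMLST-defined clusters.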
@article{Forth9091,
author={Leonie F Forth and Burkhard Malorny and Markus Bönn and Erik Brinks and Grégoire Denay and Carlus Deneke and Hosny El-Adawy and Jennie Fischer and Jannika Fuchs and Ekkehard Hiller and Nancy Bretschneider and Sylvia Kleta and Stefanie Lüth and Tilman Schultze and Henning Petersen and Michaela Projahn and Christian Schäfers and Kerstin Stingl and Andreas J Stroehlein and Laura Uelze and Kathrin Szabo and Anne Wöhlke and Jörg Linde},
title={An inter-laboratory study characterizes the impact of bioinformatic approaches on genome-based cluster detection for foodborne bacterial pathogens.},
journal={Frontiers in Microbiology},
volume={16},
pages={1629731},
year=2025
}