Bacterial typing

This session took place on 25 August 2025, in person in Bern.

Agenda & Participants

13h00 – 13h10: Welcome and Introduction (Aitana Neves)
13h10 – 13h20: IMMense pipeline (Tim Roloff)
13h20 – 13h35: IMMense testing and deployment on SPSP (Julius Winter)
13h35 – 14h00: Current status on SPSP and limitations (Julius Winter)
14h00 – 14h30: Overview of typing tools (Julius Winter)
14h30 – 14h45: Coffee break
14h45 – 15h45: Discussion and next steps (all)

Present: Chris Fields (USB), Marc Garcia (CHUV), Trestan Pillonel (CHUV), Alban Ramette (Unibe), Aileen Geers (Unibe), Helena Seth-Smith (UZH), Michael Biggel (UZH), Tim Roloff (UZH), Vanni Benvenga (UZH), Julius Winter (SIB), Aitana Neves (SIB)
Excused: Vladimir Lazarevic (Unige)

IMMense pipeline

SPSP runs the IMMense bioinformatics pipeline, which currently supports only short-read assembly (long-read support is under development).

New releases of IMMense can now be rapidly deployed to the SPSP production server, thanks to automated regression tests.

When new samples are submitted to SPSP, they are analysed with the latest IMMense version on the SPSP production server. At this point, no tool/database update triggers a re-analysis of all samples on SPSP.
QUESTION: Should we re-run IMMense (or parts of it) on all the isolates after (significant) updates (e.g. main tool update; updated QC thresholds)?

On the SPSP Private Portal, users can view detailed information about the pipeline used for each isolate (IMMense version, tools and databases versions).

Typing

Many clinical laboratories in Switzerland use Ridom SeqSphere+ software (now acquired by Bruker) for bacterial typing, either from reads or from genomic assemblies.

For bacterial typing, SPSP relies on two modules implemented within IMMense: the mlst tool for standard MLST typing and pyMLST for cgMLST.
TODO: Show MLST profile in an additional column on the Private Portal.
TODO: Let users download the table of cgMLST allelic profiles (unhashed and hashed).

It is important to update the MLST database frequently so that sequence types (STs) are assigned correctly.
TODO: On the SPSP side, implement regular automated updates of the MLST database (e.g. weekly). Show the last update date and the name of the source schema database on the Private Portal.

Hashing of alleles (e.g. using CRC32) was discussed as a means to ensure that novel alleles are properly called independently of a centralized database. There was general agreement to proceed.
TODO: Implement hashing of alleles according to various standard protocols, notably MD5 (used by Enterobase) and CRC32 (used by chewieSnake).
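
The hashing idea can be illustrated with a minimal Python sketch using only the standard library. The function names and the normalization step (removing whitespace, upper-casing) are assumptions made for illustration; the exact conventions used by Enterobase or chewieSnake were not specified in the session and would need to be checked before implementation.

```python
import hashlib
import zlib


def normalize_allele(sequence: str) -> bytes:
    """Assumed normalization: remove whitespace and upper-case the sequence."""
    return "".join(sequence.split()).upper().encode("ascii")


def crc32_allele_hash(sequence: str) -> int:
    """CRC32 checksum of the allele sequence (unsigned 32-bit integer)."""
    return zlib.crc32(normalize_allele(sequence))


def md5_allele_hash(sequence: str) -> str:
    """MD5 hex digest of the allele sequence (32 hexadecimal characters)."""
    return hashlib.md5(normalize_allele(sequence)).hexdigest()


def hash_profile(alleles: dict, hash_function=crc32_allele_hash) -> dict:
    """Hash every called allele; loci without a call (None) stay None."""
    return {
        locus: (hash_function(seq) if seq is not None else None)
        for locus, seq in alleles.items()
    }
```

Because the identifier is derived from the sequence itself, two laboratories calling the same novel allele obtain the same value without consulting a central allele database. CRC32 yields only 32 bits, so collisions between different alleles are possible in principle; MD5 digests are longer and collide far less often.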

chewBBACA was also discussed as a typing tool that could replace pyMLST. The issues mentioned were speed (because of the Prodigal step) and the fact that chewBBACA seems to automatically curate schemes on import, based on a minimum required BSR (BLAST Score Ratio).

Curation of the allelic database seems crucial for correctly computing distances afterwards. It was agreed to remove strains with more than 10% missing loci (i.e. to require a minimum of 90% loci coverage). Skipping a locus based on the percentage of strains missing it is not recommended, as it would make clusters less stable as more strains are added to the database.
TODO: Investigate whether the QC threshold should be species-specific or can be fixed at e.g. 90% loci coverage.
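
As an illustration of the agreed rule, the following Python sketch filters allelic profiles by loci coverage. The data layout (a dictionary per sample mapping locus names to allele identifiers, with None for a missing locus) and the function names are hypothetical and do not reflect how pyMLST or SPSP store profiles.

```python
def loci_coverage(profile: dict, loci: list) -> float:
    """Fraction of scheme loci with an allele call in this profile."""
    called = sum(1 for locus in loci if profile.get(locus) is not None)
    return called / len(loci)


def filter_profiles(profiles: dict, loci: list, min_coverage: float = 0.90) -> dict:
    """Keep only samples whose loci coverage reaches min_coverage.

    The 0.90 default mirrors the threshold agreed in the session; a
    species-specific value could be passed instead.
    """
    return {
        sample: profile
        for sample, profile in profiles.items()
        if loci_coverage(profile, loci) >= min_coverage
    }
```

Filtering per sample, rather than dropping loci that are missing in many samples, keeps the set of compared loci fixed as the database grows.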

cgMLST clusters and tree visualization

Thresholds for clustering should be species-specific (and potentially question-specific).
TODO: Define species-specific thresholds (to be used by default and for naming clusters). Add reference if possible.
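
To make the role of the threshold concrete, here is a minimal Python sketch of single-linkage clustering on pairwise allelic distances. It is an illustration only: the profile layout, the function names and the choice to ignore loci missing in either sample are assumptions, and the clustering method actually used in SPSP was not detailed in the session.

```python
from itertools import combinations


def allelic_distance(a: dict, b: dict, loci: list) -> int:
    """Number of loci called in both profiles that carry different alleles."""
    return sum(
        1
        for locus in loci
        if a.get(locus) is not None
        and b.get(locus) is not None
        and a[locus] != b[locus]
    )


def single_linkage_clusters(profiles: dict, loci: list, threshold: int) -> list:
    """Group samples connected by chains of pairwise distances <= threshold."""
    parent = {sample: sample for sample in profiles}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, t in combinations(profiles, 2):
        if allelic_distance(profiles[s], profiles[t], loci) <= threshold:
            parent[find(s)] = find(t)

    clusters = {}
    for sample in profiles:
        clusters.setdefault(find(sample), []).append(sample)
    return sorted(sorted(members) for members in clusters.values())
```

With this approach, a species-specific default amounts to choosing a different `threshold` value per species; the value itself would come from the references to be collected.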

QUESTION: Could the threshold be modified dynamically on the frontend, e.g. on the whole-species minimum spanning tree view, with the option to then zoom into a cluster (for example, showing only the information for those samples in the table)?

Contextual data from abroad would be interesting to have in the tree visualization.
TODO: Add contextual sequences imported from the ENA into SPSP clusters and trees (with minimal metadata, e.g. year, country and isolation source).

MSTreeV2 (Enterobase) was presented and discussed. While the general agreement was that the methodology made sense, it was unclear how useful it would be compared to a standard minimum spanning tree in the context of a well-curated database, as intended for SPSP (with a minimum of 90% loci coverage). Anyone testing it is encouraged to share feedback with the group.

Interpreting relationships within a cluster would require a wgSNP tree. Consider adding a disclaimer to that effect on the Portal. For the future, consider implementing wgSNP trees within each cluster.

Next steps

It was agreed to continue using pyMLST in general within SPSP. For foodborne pathogens that require notification to EFSA, chewBBACA might be implemented (to be discussed again and tested together with NENT, Agroscope and BLV; alternatively, raw reads could be submitted to EFSA if we wish to keep pyMLST). Consider using the same schema as the one recommended by EFSA for those pathogens, independently of whether chewBBACA is adopted.
