Life Identification Numbers: A strain nomenclature approach to aid epidemiological surveillance of bacterial pathogens

Federica Palma, Melanie Hennart, Keith A. Jolley, Chiara Crestani, Kelly L. Wyres, Sebastien Bridel, Corin A. Yeats, Bryan Brancotte, Brice Raffestin, Sophia David

Published: June 4, 2026
https://doi.org/10.1371/journal.pbio.3003781

Abstract

Unified strain taxonomies are needed for the epidemiological surveillance of bacterial pathogens and international communication in microbiological research. Core genome multilocus sequence typing (cgMLST) holds great promise for standardized high-resolution strain genotyping. However, this approach faces challenges including classification instability and disconnection of new nomenclature from widely adopted classical MLST identifiers. This Essay discusses the cgMLST-based Life Identification Number (LIN) method, recently proposed as a stable multilevel strain taxonomy system applicable to most bacterial pathogens, covering how LIN codes are implemented and used in practice for precise strain definitions and epidemiological tracking.

Citation: Palma F, Hennart M, Jolley KA, Crestani C, Wyres KL, Bridel S, et al. (2026) Life Identification Numbers: A strain nomenclature approach to aid epidemiological surveillance of bacterial pathogens. PLoS Biol 24(6): e3003781. https://doi.org/10.1371/journal.pbio.3003781

Copyright: © 2026 Palma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Gates Foundation (INV-025280 to DMA, INV-077266 to KEH), and by European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 773830 to Sy.B). This work was also supported financially by the French Government’s Investissement d’Avenir program Laboratoire d’Excellence “Integrative Biology of Emerging Infectious Diseases” (ANR-10-LABX-62-IBEID to Sy.B). BIGSdb development was funded by a Wellcome Trust Biomedical Resource grant (218205/Z/19/Z to MCJM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: KH is an Academic Editor at PLOS Biology.

Abbreviations: ANI, Average Nucleotide Identity; CG, clonal groups; cgMLST, core genome multilocus sequence typing; HierCC, Hierarchical clustering; KpSC, Klebsiella pneumoniae Species Complex; LIN, Life Identification Number; MLST, multilocus sequence typing; SL, sublineages; SNPs, single-nucleotide polymorphisms; ST, sequence type; WGS, whole-genome sequencing

Introduction

Taxonomies of bacterial strains responsible for infectious diseases are essential resources to ensure effective communication in population biology, epidemiological surveillance, and public health response to outbreaks. As illustrated by the SARS-CoV-2 variant nomenclature, simple nicknames (e.g., Alpha, Delta, Omicron) can greatly improve communication among multiple actors, including the public, in the face of public health threats [1,2]. Strain taxonomies are therefore needed to precisely recognize and define variants with properties of special medical interest, such as antimicrobial resistance, high virulence, or vaccine escape.

Taxonomic systems are based on three pillars: classification, nomenclature, and identification. Currently, there are neither classification nor nomenclature standards to define sublineages, variants, types, or clones (hereafter, collectively called ‘strains’) within bacterial species [3]. Linnaean taxonomy encompasses classification levels from Phylum down to subspecies, but the latter is seldom used for bacterial species as it is not a practical solution to describe strains. Ad-hoc phenotypic (e.g., serotypes) and genotypic (e.g., sequence types) methods have long been used to differentiate strains from particular species but have shown limitations in terms of universal applicability, reproducibility of classification, or level of resolution. However, the advent of universally applicable whole-genome sequencing (WGS, see Box 1) has advanced the potential to refine and generalize strain taxonomy (Box 1) by providing the maximal discrimination power needed for epidemiological surveillance, while being broadly applicable as a harmonized approach across pathogen Phyla [4–6]. Yet, few attempts have been made to devise genomic taxonomies of strains and evaluate their general applicability. With WGS now being implemented worldwide in all sectors of microbiology (e.g., medical, veterinary, food, environmental), a precise and universal procedure for describing bacterial strains has become a key need to translate WGS data into relevant information that would support epidemiological surveillance, outbreak investigations, cross-niche or between-host transmission tracking, and public health actions that need international and cross-sectoral coordination. In this Essay, we introduce a new approach to describing bacterial strains and explain its benefits for stable definition and labeling of intra-species groups from the deepest phylogenetic lineages through epidemiologically important clonal groups down to outbreak strains.

Box 1. Glossary

Whole-genome sequencing

A method that determines the complete DNA sequence of an organism’s genome in a single process, providing comprehensive information for comparative genetic analyses based on core genome multilocus sequence typing (cgMLST) or other analytic methods.

Taxonomy

Here, we apply the word taxonomy to bacterial strains as a system of classifying, naming, and identifying strains based on shared genetic characteristics, e.g., as defined by cgMLST.

k-mer

In bioinformatics, k-mers are substrings of length k contained within a biological sequence. In comparative genomics, k-mers found in a genome sequence, and their frequency, can be used to compute genetic distances between isolates. The use of k-mer comparisons is not limited to the core genome or a predefined set of common loci, allowing variation in the accessory genome to be quantified simultaneously.

Multilocus sequence typing

A genotyping method applied mostly to microbial strains to study population structure and epidemiology, based on comparing the nucleotide sequences of a small number (typically seven) of housekeeping protein-coding genes. In multilocus sequence typing (MLST), allele numbers are assigned to each sequence variant (allele) of a given gene. The MLST genotype of a bacterial strain is defined by the combination of the allele numbers observed at the genes that are included in the genotyping scheme. A sequence type is assigned to each unique combination of alleles, called an MLST profile. MLST was invented in 1998 and became a de-facto standard taxonomy of bacterial strains, albeit at low resolution.

Core genome multilocus sequence typing

An extension of MLST that analyzes sequence variation across hundreds to thousands of conserved (core) genes, shared by all strains of a species, providing higher resolution typing for genomic epidemiology and evolutionary studies. cgMLST schemes typically comprise 2,000–4,000 genes, depending on the genome size and genetic variation (in terms of presence/absence of genes) within bacterial species. A core genome sequence type can be assigned to unique cgMLST profiles (i.e., a unique combination of cgMLST allelic numbers).

Average nucleotide identity

A measure of genomic similarity between two organisms, calculated as the average percentage of nucleotide similarity of orthologous genomic regions; commonly used to assess species-level relatedness in prokaryotes.

Bacterial strain taxonomies

Several bacterial strain taxonomies have emerged in recent years. One of them, applied to bacterial pathogens with low amounts of genetic diversity (i.e., evolutionarily recent pathogens), relies on recognizing notable branches in phylogenetic trees, defined by specific diagnostic single-nucleotide polymorphisms (SNPs) [7,8]. A similar approach, the Pango nomenclature, was successfully applied to SARS-CoV-2 variants [2]. Unfortunately, these phylogenetic approaches face challenges raised by the need to update phylogenetic trees and define novel lineages. Another classification approach, PopPUNK, relies on pairwise comparisons of unaligned genomes (based on k-mers, see Box 1) to create groups within bacterial populations, and is scalable to large and diverse datasets [9]. However, among the broad range of methods developed for bacterial strain typing and group naming [10,11], MLST (see Box 1), based on the analysis of a few (typically seven) conserved loci, was established over the past two decades as the method of choice for most bacterial species [12–14]. Indeed, major strengths of MLST are its standardization, as it relies on well-defined fixed sets of genetic markers, and its ease of interpretation and portability. Classic MLST schemes form the basis of widely adopted ‘sequence type’ (ST) taxonomies [15] in most bacterial species, which are maintained, expanded, and made available to the international community through the platforms BIGSdb [16] and EnteroBase [17].

The logical extension of MLST at the genome scale, known as core genome MLST (cgMLST, see Box 1), uses thousands of conserved gene loci, leading to the definition of core genome sequence types, or cgST [4,18]. However, because of their much higher resolution, any cgST will match with only a tiny fraction of bacterial isolates from a given bacterial species, making the cgST a less useful nomenclatural element than the classic ST for tracing genetic relationships through broad space and time scales. To define phylogenetic associations among similar cgSTs, which together might represent meaningful groups of particular medical or epidemiological interest (a common way of conceptualizing the informal notion of a ‘strain’), cgMLST allele profiles can be grouped at chosen levels of dissimilarity, resulting in multilevel classifications. A single-linkage clustering was initially used to create these higher-level groups from cgMLST data [19], but by design, this approach suffers from a lack of stability, as preexisting groups can merge when intermediate genotypes are sampled. To address this issue, profiles can be assigned to the most closely related preexisting group. This approach, called Hierarchical clustering (HierCC), represents the first multilevel bacterial strain taxonomy system based on cgMLST [20]. HierCC is implemented in the platform EnteroBase, where taxonomies for strains of Salmonella, Escherichia coli, and other important bacterial pathogens are maintained [17,21].

Amongst the various efforts to align the taxonomy of all life forms with genomic data, a novel multilevel classification system was proposed by Vinatzer and colleagues, using multi-position numerical codes attributed to each individual genome [22,23]. These codes, called Life Identification Numbers (LIN), were designed to encompass all domains of life in a single, unified taxonomy, based on the Average Nucleotide Identity (ANI) metric (Box 1) [24]. A database of these ANI-based LIN codes, GenomeRxiv (initially called LINbase), was set up to enable global development and use of this taxonomic approach [25].

Given that the pairwise ANI estimates between genomes can be imprecise for nearly identical strains, particularly when draft genomes are significantly fragmented, some of us proposed combining LIN codes with the cgMLST approach in order to design novel taxonomies of bacterial strains within species [26]. We found that the use of pairwise dissimilarities between cgMLST profiles, rather than ANI estimates between genomes, provides greater reproducibility in appraising small-scale genome relationships. Hence, cgMLST-based LIN codes (hereafter, LIN codes for short) combine the strengths of both approaches.

Below, we present this LIN code approach and its recent improvements, including its implementation in the widely used genotyping platforms BIGSdb and Pathogenwatch [16,27], and a LIN code nicknaming procedure to facilitate the designation of familiar intra-species groups of key importance in either biological research or epidemiological surveillance. We also illustrate how the LIN codes can be used to address questions in population biology and genomic epidemiology, using the case of the Klebsiella pneumoniae Species Complex (KpSC), a phenotypically and genetically diverse ubiquitous pathogenic group [28].

cgMLST-based LIN coding and missing data handling

LIN codes are series of numeric codes that reflect genomic similarity between organisms. cgMLST-based LIN code systems consist of multiple (e.g., 10) predefined positions (or bins), each corresponding to a range of pairwise cgMLST profile similarity values, together representing a partition of the complete range (0%–100%). From left to right, the positions of the code correspond to decreasing allele mismatch dissimilarity (i.e., increasing similarity). The leftmost bins are thus used to classify deep phylogenetic divisions, whereas the rightmost bins distinguish recently evolved variants. Analogous to the classification levels of Linnaean taxonomy (e.g., Phylum–Class–Order–Family–Genus–species), the LIN codes thus capture, from left to right, the membership of a genome in taxonomic groups of increasing relatedness.

The process of LIN code assignment from cgMLST data, first proposed in Hennart and colleagues [26], is summarized in Fig 1 (and formalized in Section A in S1 Appendix). A LIN code is created for each distinct cgST. The system is initialized by creating, for a selected initial cgST, a LIN code with the integer value 0 at every bin. The initial cgST can be chosen randomly or using a reference strain of the species or group under consideration (e.g., the first strain that was sequenced, or the taxonomic type strain). The next steps are the same for each subsequent individual cgST. Each incoming cgST is matched against all already LIN-encoded cgSTs in order to identify its most similar one (hereafter, the reference cgST), based on the fraction of allele mismatches across cgMLST loci. For creating the novel LIN code, the pivot bin is defined as the bin in which the observed allele similarity falls, and the novel LIN code is then created in three steps (Fig 1): copying the LIN code prefix of the reference cgST (i.e., from the leftmost bin up to the pivot bin (excluded)); incrementing by 1 the maximum integer value observed in the pivot bin among the cgST(s) sharing the same prefix used in the previous step; and attributing the integer value 0 at the bins downstream of the pivot, corresponding to initialization of the novel subdivision created at the pivot bin level.
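The three assignment steps can be illustrated with a minimal Python sketch. It is not the BIGSdb implementation: the four-bin thresholds in `LOWER`, the function names, and the use of complete profiles stored as tuples are all assumptions made for illustration, and ties between equally close references are resolved by insertion order rather than by the formal input order rules.

```python
from bisect import bisect_right

# Hypothetical similarity lower bounds (inclusive) for a 4-bin scheme.
# Bin i covers [LOWER[i], LOWER[i + 1]); the last bin ends just below 1.0.
LOWER = [0.0, 0.5, 0.8, 0.9]

def similarity(p, q):
    """Fraction of matching alleles between two complete profiles."""
    return sum(a == b for a, b in zip(p, q)) / len(p)

def assign(profile, db):
    """Return the LIN code of `profile`; `db` maps profile -> LIN code."""
    if not db:
        # Initialization: the first cgST receives 0 in every bin.
        code = (0,) * len(LOWER)
    else:
        # Closest already-encoded profile (the reference cgST).
        ref = max(db, key=lambda r: similarity(profile, r))
        s = similarity(profile, ref)
        if s == 1.0:
            return db[ref]  # identical profile: reuse the existing code
        pivot = bisect_right(LOWER, s) - 1          # bin that contains s
        prefix = db[ref][:pivot]                    # step (i): copy prefix
        top = max(c[pivot] for c in db.values() if c[:pivot] == prefix)
        # Step (ii): increment in the pivot bin; step (iii): zeros downstream.
        code = prefix + (top + 1,) + (0,) * (len(LOWER) - pivot - 1)
    db[profile] = code
    return code
```

Because existing entries of `db` are never modified, adding profiles cannot change codes that were assigned earlier, which mirrors the stability property discussed in the text.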

Fig 1. Overview of the process of cgMLST-based LIN code assignment.

A. LIN encoding process. The process starts with assigning core genome multilocus sequence typing (cgMLST) profiles to genome sequences and then classifying profiles into unique core genome sequence types (cgST). For each novel cgST, the closest matching cgST in the LIN code database is then identified (with similarity s = fraction of matching alleles) and its similarity recorded to define the pivot bin (the bin which contains s within its left and right threshold values; if the similarity is 100%, the LIN code is simply assigned to the query cgST, but no novel LIN code is created). When the similarity is <100%, a novel LIN code is created using the following steps: (i) copy the closest match prefix up to the pivot bin; (ii) increment by 1 in the pivot bin, the largest value observed among LIN codes with the same prefix; and (iii) assign 0 in downstream bins. Note that one genome and its associated cgST is selected to initiate the process, with 0 assigned to each bin. B. Examples of novel LIN code creation. The similarity threshold values given in the header line correspond to those defined for the Klebsiella pneumoniae Species Complex (KpSC) LIN code scheme (see correspondence with minimal allelic mismatches and other information in S1 Appendix). Note that there is no bin corresponding to complete similarity (gray column on the right), as in this case the LIN codes are identical (i.e., there is no need to create a novel LIN code). Technically, each bin has a left border threshold (inclusive) that corresponds to a maximum number of pairwise allele differences between profiles, and is delimited on the right by the next threshold (exclusive, as the threshold value corresponds to the left threshold of the downstream bin or, for the last bin, corresponds to LIN code identity). The first row (Genome A) corresponds to the unique initialization step (full-0 code for the initial cgST). 
Note that similarity values defining the bins each correspond to fixed numbers of shared alleles among cgMLST profiles, divided by the length of the cgMLST scheme.

https://doi.org/10.1371/journal.pbio.3003781.g001

The cgSTs differ from classic STs in an important way: due to the often-fragmented nature of genome sequence assemblies and because many core genes are dispensable in bacteria, the cgST must accommodate missing data. If two cgSTs (new and reference) have a 100% similarity (i.e., no allele mismatch among the loci called in both profiles), the LIN code of the reference is simply assigned to the new cgST. This can happen when the new cgST differs from the reference only by its missing data pattern, such pairs (or groups) being called coincident cgSTs (see Section B in S1 Appendix). Consequently, a single LIN code can correspond to multiple coincident cgSTs.
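As a rough illustration of coincident cgSTs, the Python sketch below compares two profiles only at loci called in both. Representing a missing allele as `None` and dividing by the number of shared loci are assumptions of this sketch; the formal rules are those given in Section B in S1 Appendix.

```python
def similarity(p, q):
    """Fraction of matching alleles among loci called in both profiles."""
    shared = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not shared:
        raise ValueError("no locus is called in both profiles")
    return sum(a == b for a, b in shared) / len(shared)

def coincident(p, q):
    """True when the profiles show no mismatch at any shared locus."""
    return similarity(p, q) == 1.0
```

For example, `coincident((1, 2, None, 4), (1, 2, 3, None))` is true because the two profiles agree at both loci that are called in each, so under this reading they would receive the same LIN code.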

An important implication of the encoding process is that LIN codes are created definitively, as are the assignments of LIN codes to individual cgSTs. Hence, LIN codes are stable by design, and the incorporation of novel genomes will never affect preexisting LIN codes and their assignments. This provides trust in LIN code referencing and stability in comparisons across time.

The internal structure of LIN codes and the notion of a prefix

A LIN code prefix can be defined as any bin subset that starts from the leftmost position. An important particularity of LIN codes is that the numerical identifiers at a given bin position (except the leftmost one) can only be interpreted in the context of the LIN code prefix preceding it: the same integer value at a given bin position corresponds to group membership only if the upstream prefixes are identical. In other words, the integer values at a given bin position are subdivisions of their respective upstream prefixes, and their numbering starts from zero independently for each prefix. This keeps the integer values used in each bin as small as possible, which makes the codes easier to read. This property can be regarded as a systematization of the analogous possibility in the Linnaean nomenclature, where the same species epithet can be used for distinct genera (e.g., Klebsiella pneumoniae and Streptococcus pneumoniae). The initialization at zero for prefix subdivisions contrasts with other taxonomic systems, such as the HierCC approach used in the genomic epidemiology platform EnteroBase [17,21], in which a group identifier is created independently at each level (see Section C in S1 Appendix, for details).

The notion of a shared LIN code prefix is also important because it conveys a sense of genetic similarity among genomes: the longer the common prefix of two LIN codes is, the more similar the two corresponding genomes (based on their cgMLST profiles). For a given cgST profile, its LIN code thus expresses how similar it is to every other genome in the LIN code taxonomy. Very different profiles will show identity at few or no prefix positions of their LIN codes, whereas nearly identical genomes will yield LIN codes identical at most or all prefix positions (see, for example, genomes Z versus X in Fig 1: the shared prefix 0_2_0_0_0_0 implies a minimum similarity of 98.88% (inclusive) and a maximum similarity of 99.36% (exclusive)). We note that our definition of a LIN code prefix is similar to the LINgroup concept proposed by Vinatzer and colleagues [23].
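The link between a shared prefix and a similarity interval can be sketched in a few lines of Python. The four-bin thresholds in `LOWER` are hypothetical and are not those of the KpSC scheme; the function names are likewise illustrative.

```python
# Hypothetical similarity lower bounds (inclusive), one per bin.
LOWER = [0.0, 0.5, 0.8, 0.9]

def common_prefix_len(a, b):
    """Number of leading bins with identical values in two LIN codes."""
    n = 0
    for u, v in zip(a.split("_"), b.split("_")):
        if u != v:
            break
        n += 1
    return n

def similarity_range(a, b, lower=LOWER):
    """Similarity interval (inclusive minimum, exclusive maximum) implied
    by the shared prefix; identical codes give (1.0, 1.0)."""
    n = common_prefix_len(a, b)
    if n == len(lower):
        return (1.0, 1.0)
    upper = lower[n + 1] if n + 1 < len(lower) else 1.0
    return (lower[n], upper)
```

With these toy thresholds, `similarity_range("0_2_0_1", "0_2_1_0")` returns `(0.8, 0.9)`: the two codes share two bins and first differ in the third, whose range is 80% (inclusive) to 90% (exclusive).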

Nicknaming LIN code prefixes provides continuity with previous nomenclatures

Whereas LIN code prefixes themselves can serve as machine-readable ‘diagnostic’ markers of groups of interest, they are not easy for humans to remember or pronounce. It was therefore proposed to nickname relevant LIN code prefixes with simple denominations [23]. A prefix nicknaming system was also implemented within the BIGSdb platform for cgMLST-based LIN codes, making it possible to nickname every distinct prefix in any chosen way. For example, one option is to increment an integer identifier (analogous to the numbering of STs in the MLST framework) for each novel prefix of a given length; but alternative labeling could be applied, such as Greek letters, astronomical objects, or any other series of words that may be universally understandable and easy to remember. This nicknaming process would be particularly useful for long prefixes, or prefixes of phenotypic or taxonomic relevance that subdivide the population at particularly informative levels.

For bacterial species with established nomenclatures, nicknames can be assigned to LIN code prefixes based on prior denominations such as MLST or serotyping, to retain interpretation and recognition as much as possible. To enable backward compatibility of LIN codes with well-established ST identifiers, a majority identifier inheritance rule was developed [26]. For example, in the KpSC LIN code system, prefixes are nicknamed using ST identifiers as a source (the process is formally defined in Section H in S1 Appendix). For convenience, groups of KpSC genomes with the same prefix of length 3 or 4 were designated as sublineages (SL) or clonal groups (CG), respectively (see Fig 2). The prefixes corresponding to these two levels were nicknamed because they correspond to deep subdivisions of the KpSC population structure, and their partitions (i.e., single prefixes) are highly concordant with well-known MLST-based STs (Section K in S1 Appendix).
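One simplified reading of a majority inheritance rule is sketched below in Python: each prefix of a chosen length takes the ST that is most frequent among the genomes carrying that prefix. This is an assumption for illustration only. The formal procedure is the one defined in Section H in S1 Appendix, and the sketch does not handle ties or two prefixes claiming the same ST.

```python
from collections import Counter, defaultdict

def nickname_prefixes(records, length, label):
    """Map each prefix of `length` bins to `label` plus its majority ST.

    `records` is an iterable of (lin_code_tuple, st) pairs.
    """
    st_counts = defaultdict(Counter)
    for code, st in records:
        st_counts[code[:length]][st] += 1
    return {
        prefix: f"{label}{counts.most_common(1)[0][0]}"
        for prefix, counts in st_counts.items()
    }

# Toy records: the prefixes follow examples given in the text, but the
# individual codes and ST assignments are invented for illustration.
records = [
    ((0, 0, 105, 6, 0, 0), 258),
    ((0, 0, 105, 6, 0, 1), 258),
    ((0, 0, 105, 6, 1, 0), 512),
    ((0, 0, 369, 0, 0, 0), 307),
]
nicknames = nickname_prefixes(records, length=4, label="CG")
```

Here the prefix `(0, 0, 105, 6)` is nicknamed `CG258` because ST258 is the majority ST among its toy members, and `(0, 0, 369, 0)` becomes `CG307`.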

Fig 2. Nicknaming of LIN code prefixes enables inheritance of previous nomenclatures.

Nicknames of some Klebsiella pneumoniae Species Complex (KpSC) LIN code prefixes of lengths 2 to 4 bins, inherited from Linnaean taxonomy (2-bin prefix, left panel) or 7-gene multilocus sequence typing (prefixes of lengths 3 and 4 bins, central and right panels, respectively). Groups corresponding to level 3 are called sublineages (SL) and groups of level 4 are called clonal groups (CG).

https://doi.org/10.1371/journal.pbio.3003781.g002

Although KpSC ST identifiers and SL/CG nicknames are generally identical, some ST numbers are shared by phylogenetically distinct genomes, which can happen when recombination events produce the same combination of the seven alleles (see Section K in S1 Appendix). Therefore, prioritization of LIN code nicknames over classical STs is recommended in the future.

How to design and use LIN codes

An initial requirement for any cgMLST-based LIN code taxonomy implementation is a cgMLST scheme that has been designed and thoroughly validated to ensure the stability of the taxonomy from its inception. Subsequent removal of loci (e.g., because they are too infrequently called) or changing of their template (e.g., choosing an upstream start codon) would result in potential inconsistencies with previously defined LIN codes. Therefore, it is advisable to define LIN code taxonomies following careful evaluation of the cgMLST schemes from which they are derived, particularly if these are intended for broad usage.

The scope of applicability of the cgMLST scheme is also an important consideration. MLST or cgMLST schemes are typically used for a single species, and less frequently for an entire genus (e.g., more than 90% ANI). In rare cases, an intermediate category called a species complex is covered: these correspond to groups of closely related species that are sometimes misidentified in routine microbiology diagnostic processes. The applicability of the cgMLST scheme (and related LIN code taxonomy) should ideally be broadened, but increasing the phylogenetic breadth will be at the expense of the core genome size, reducing the discriminatory power of the scheme.

While any number of bins (up to the number of loci in the cgMLST scheme) can be chosen to create a LIN code system, it is recommended to guide their definition by analyzing the population structure of the species in order to propose phylogenetically informative bin thresholds. Several methods have been designed to identify ranges of dissimilarity that maximize the reliability of the subsequent classifications [20,26,29].

Deep levels might also correspond to previously recognized subdivisions and might therefore be optimized to match these previous classifications. For example, in the KpSC scheme, the distinction of its phylogroups (taxonomic species or subspecies) was used as a guide to define the two deepest thresholds [26], as detailed in Section G in S1 Appendix.

For epidemiological levels, bin thresholds can be selected to reflect epidemiological surveillance practice. In a hypothetical example, four different alleles might be typically used to define clusters and trigger outbreak investigations; in this case, using a LIN code bin associated with a threshold of four allele differences would be congruent with this practice. Broader epidemiological thresholds might yield more false positives (sporadic isolates unrelated to a given common source outbreak) in detecting genetic clusters, but will on the other hand capture outbreaks during which more diversity has accumulated. It is therefore advised to use a set of epidemiological thresholds which will be useful in different situations, from the most stringent (i.e., one single allelic difference) to more relaxed ones.
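As a hedged illustration of the four-allele example, the Python sketch below maps an allele-difference cutoff to a prefix length and groups isolates that share that prefix. The `MAX_DIFF` table is hypothetical, and treating a shared prefix as a bound on pairwise allele differences is a simplification, since codes are assigned relative to the closest encoded profile.

```python
from collections import defaultdict

# Hypothetical mapping: codes sharing their first k bins are taken to
# differ by at most MAX_DIFF[k] alleles. Real values depend on the scheme.
MAX_DIFF = {3: 10, 4: 4, 5: 1}

def prefix_length_for(cutoff):
    """Shortest prefix length whose implied allele difference is <= cutoff."""
    lengths = [k for k, d in MAX_DIFF.items() if d <= cutoff]
    if not lengths:
        raise ValueError("no bin is stringent enough for this cutoff")
    return min(lengths)

def clusters(isolates, cutoff):
    """Group isolate names by shared prefix; keep groups of two or more."""
    k = prefix_length_for(cutoff)
    groups = defaultdict(list)
    for name, code in isolates.items():
        groups[code[:k]].append(name)
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}
```

Running `clusters` with several cutoffs on the same set of codes gives the range of stringent to relaxed cluster definitions described above without recomputing any genetic distance.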

Visualization tools for LIN codes as proxies of population diversity

Once LIN codes are implemented and genomes encoded, the repertoire of LIN codes can be used to derive summary views of the species diversity and its structure. The complete and up-to-date LIN code nomenclature (comprising alleles, profiles, cgSTs, and LIN codes) can be extracted from BIGSdb using a single query; for example, for the KpSC, the LIN code taxonomy is available from the BIGSdb-Pasteur database. LIN code diversity within a bacterial species can be summarized using circular packing plots, which illustrate the diversity of populations at each classification level (Fig 3). These representations also convey a sense of relative frequencies of the variants and enable the identification of the most epidemiologically represented populations.

Fig 3. The hierarchical nature of LIN code positions applied to KpSC.

The hierarchical structure of LIN codes is shown via a circular packing plot (data from the BIGSdb-Pasteur Klebsiella pneumoniae Species Complex (KpSC) database). The circles correspond to LIN code prefixes of lengths 1 to 4 (an extra, all-encompassing circle corresponds to the entire KpSC); the size of each circle is related to the number of genomes it comprises. Numbering starts from 0 for subdividing each higher-level partition, characterized by a unique LIN code prefix. The first two bins in the LIN codes are used to identify Linnaean taxa. Whereas, for three species, there is a unique 2-bin prefix (e.g., prefix 0_0 for K. pneumoniae; Kpn, 3_0 for K. quasivariicola; Kqv, and 4_0 for K. africana; Kafr), in the other cases, two subspecies are distinguished (2_0 for K. quasipneumoniae subsp. quasipneumoniae; Kqq, and 2_1 for K. quasipneumoniae subsp. similipneumoniae; Kqs; 1_0 for K. variicola subsp. variicola; Kvv, and 1_1 for K. variicola subsp. tropica; Kvt). The hierarchical nature of LIN codes applies to subsequent levels such as those corresponding to sublineages (third bin, e.g., Kpn SL258 is identified with the LIN code prefix 0_0_105) and to clonal groups (fourth bin, e.g., the LIN code prefix 0_0_105_6 corresponds to Kpn CG258). Data was plotted using ggplot2 (R v4.3.2) and edited using Inkscape.

https://doi.org/10.1371/journal.pbio.3003781.g003

The nested structure of a set of LIN codes can also be visualized by representing the associated prefix tree, which roughly approximates the phylogenetic relationships among isolates [26] (see Section F in S1 Appendix). In such prefix trees, each internal node corresponds to a distinct LIN code prefix, where each node’s height corresponds to the associated bin threshold. This tree topology can be built without the initial genomes or cgMLST profiles, based solely on the relationships encoded in the nomenclature itself, and can serve as a computationally light proxy for the phylogenetic relationships among isolates.
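Because the prefix tree depends only on the codes themselves, it can be sketched in Python as a nested dictionary built from LIN code strings. The function names and the indented text rendering are illustrative choices, and node heights (bin thresholds) are left out for brevity.

```python
def build_prefix_tree(codes):
    """Nested dict in which each path from the root spells a LIN code prefix."""
    root = {}
    for code in codes:
        node = root
        for value in code.split("_"):
            node = node.setdefault(value, {})
    return root

def render(node, depth=0):
    """Return the tree as indented lines, one line per distinct prefix."""
    lines = []
    for value in sorted(node, key=int):
        lines.append("  " * depth + value)
        lines.extend(render(node[value], depth + 1))
    return lines
```

For the toy codes `0_0_0`, `0_0_1`, `0_1_0`, and `1_0_0`, the first two share the internal node for prefix `0_0`, so they appear as sibling leaves in the rendered tree.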

LIN codes in practice: Source databases of taxonomies and their use with external tools

A taxonomic system needs to be created and updated in a coordinated manner. The cgMLST LIN code strain taxonomy approach was first implemented in BIGSdb, starting with v1.34.0 [26]. This open-source application is so far the only platform that has implemented cgMLST-based LIN codes and is deployed at two main sites, PubMLST (at Oxford University, Oxford, United Kingdom) and BIGSdb-Pasteur (at Institut Pasteur, Paris, France). For the KpSC, the BIGSdb-Pasteur database serves as the source database for the definitions of alleles, cgMLST profiles, cgSTs, and LIN codes. For other pathogens, the PubMLST platform is the source database of LIN code taxonomies.

A LIN code taxonomy is created with reference to a defined indexed scheme (i.e., a scheme with a unique identifier for each profile, e.g., cgST), with allele mismatch thresholds that define the LIN code bins. Fig A and Section G in S1 Appendix give details for the KpSC example of a LIN code taxonomy. Compared to the stand-alone tools initially used to create cgMLST-based LIN code taxonomies [26], the implementation of LIN codes into the BIGSdb application has been accompanied by a number of important improvements, including: ensuring the reproducibility of LIN encoding by addressing the dependency of this approach on rounded genetic distance values (Section D in S1 Appendix); implementing input order rules for creating novel LIN codes (Section E in S1 Appendix); and implementing formal rules for handling missing data (as described above; Section B in S1 Appendix). These improvements were introduced to achieve the robustness needed for a reference taxonomy. Furthermore, functionality was developed in BIGSdb for searching database isolates based on LIN code or prefixes (Section I in S1 Appendix), and for downloading corresponding genomes (Section N in S1 Appendix).

To make the LIN code taxonomy broadly accessible, its components (alleles, profiles, cgSTs, LIN codes, and nicknames) can be extracted from BIGSdb source taxonomy databases using an application programming interface [30]. They can then be used with external tools and analysis platforms, for example, by extracting cgMLST alleles from local genome sequences and matching these with the source nomenclature data (Fig 4). As a first example of external use of LIN codes, we implemented LIN code matching functions within the Pathogenwatch platform, which supports KpSC genomic typing [27] and now matches genome sequences with an internal copy of the KpSC reference LIN code taxonomy (Section J in S1 Appendix).

Fig 4. The LIN code taxonomy ecosystem.

The source database of LIN code taxonomy (Sequence Definitions database, lower left dark blue box) hosts the taxonomic elements (alleles, profiles, LIN codes). Curators create taxonomic elements from data sourced directly from data submitters, from INSDC public sequence repositories (NCBI/ENA/DDBJ), or from Pathogenwatch assemblies derived from short-read data (green arrows). External tools or platforms such as Pathogenwatch can retrieve the LIN code taxonomy from BIGSdb using Application Programming Interfaces (red arrow), such that query genome sequences can be compared to the copy of the reference taxonomy in order to define their closest match (blue arrow, bottom right). The LIN code bins that can be defined are then reported (followed by asterisks for undefined ones), as well as sublineage and clonal group nickname information (if this can be extracted from the deduced LIN code). In this example, although the LIN code is incomplete, the genome can be inferred as belonging to clonal group 307 (defined as prefix 0_0_369_0). To obtain a complete LIN code, the genomic sequence (or its extracted taxonomic elements) must be submitted to the source database (blue arrow, left) so that novel taxonomic elements can be defined consistently. The DNA icon is in the public domain: https://freesvg.org/dna. Hospital icon credit: Alexis Criscuolo (CC-BY 4.0 license).

https://doi.org/10.1371/journal.pbio.3003781.g004

Furthermore, stand-alone approaches to using LIN codes can be developed to generate and handle local LIN code taxonomies. For example, there are offline command-line tools for assigning cgMLST LIN codes such as MiST, which is now included in Kleborate [31,32]. When novel genome sequences are matched to the LIN code taxonomy, no identical cgMLST profile may exist at that time in the source LIN code taxonomy, implying that the cgST and complete LIN code cannot be determined. Still, the level of similarity between the query genome and the closest reference cgMLST profile enables inference of their common prefix; in other words, the LIN code of the query genome can be partially defined. If the query genome is closely related to one in the nomenclature database, its LIN code will be almost completely defined. Hence, the use of LIN codes in external databases or tools can have great functional relevance.
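The partial definition of a LIN code described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the logic of any existing tool: the six-bin toy taxonomy, its thresholds, the locus names and the function names are hypothetical, and missing loci are simply skipped, which is a simplification of the formal missing-data rules used in the source database.

```python
from typing import Dict, Optional, Sequence

# A cgMLST profile: locus name -> allele number (None when the locus is missing).
Profile = Dict[str, Optional[int]]

# Hypothetical thresholds for a toy 6-bin taxonomy (descending mismatch counts).
TOY_THRESHOLDS = (8, 6, 4, 2, 1, 0)


def mismatches(a: Profile, b: Profile) -> int:
    """Count loci with different alleles, skipping loci missing in either profile."""
    compared = [
        locus for locus in a
        if a[locus] is not None and b.get(locus) is not None
    ]
    return sum(1 for locus in compared if a[locus] != b[locus])


def partial_lin_code(
    query: Profile,
    references: Dict[str, Profile],
    thresholds: Sequence[int] = TOY_THRESHOLDS,
) -> str:
    """Return the LIN code prefix shared with the closest reference profile.

    Bins that cannot be inferred from the closest match are shown as asterisks.
    """
    best_code, best_distance = min(
        ((code, mismatches(query, profile)) for code, profile in references.items()),
        key=lambda pair: pair[1],
    )
    defined = sum(1 for t in thresholds if best_distance <= t)
    bins = best_code.split("_")
    return "_".join(bins[:defined] + ["*"] * (len(bins) - defined))


# Toy reference taxonomy with ten loci and invented 6-bin codes.
loci = [f"locus{i}" for i in range(1, 11)]
ref_a: Profile = {locus: 1 for locus in loci}
ref_b: Profile = {locus: 2 for locus in loci}
references = {"0_0_369_0_0_0": ref_a, "0_1_12_0_0_0": ref_b}

# The query differs from ref_a at two loci and has one uncalled locus.
query: Profile = dict(ref_a, locus3=5, locus7=9, locus5=None)
print(partial_lin_code(query, references))
# 0_0_369_0_*_*
```

Because the first four bins are defined in this toy output, a nickname attached to a four-bin prefix could still be reported, whereas the complete code would only be obtained after submission to the source database.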

However, in the most general case, an incomplete match will be found. This implies that new nomenclatural elements (cgST profiles and LIN codes) have been discovered and could be defined for the benefit of the global community. This can only be done within the source database, otherwise the consistency of nomenclature will be lost. For any genome that has no complete LIN code, data submission to the source database is therefore encouraged. Furthermore, to be effective, external copies of the LIN code database need to be frequently (e.g., daily) synchronized with the primary database, given that the latter is updated continuously. Although managing local LIN code taxonomies might be attractive for confidentiality reasons and will expand the applicability of this approach, this usage is deemed to be restricted to internal comparisons and will hopefully stimulate submissions to the source database to fulfill the intended shared nomenclature objective of open, central LIN code taxonomies.

LIN code applications in epidemiological surveillance and outbreak investigations

Bacterial species can harbor huge amounts of genetic diversity and are often structured genetically into recognizable sublineages. For example, in K. pneumoniae, sublineages including SL258, SL147, SL307, SL17, and SL23 have been recognized as globally distributed drivers of multidrug resistance and/or hypervirulent infections. These sublineages have been the subject of detailed studies that have led to defining their geographical spread and phylogenetic subgroups [33–37]. However, so far, these sublineages have been defined using a mix of 7-locus MLST, cgMLST, and ANI; a clear and simple definition, and a harmonized nomenclature, have been lacking, making it difficult to recognize them in subsequent studies.

One prominent example of how LIN codes provide clear definitions of sublineages and disambiguate MLST definitions is the case of hypervirulent sublineage ST23. WGS analyses demonstrated the polyphyletic status of ST23 [35], which conflates isolates from two distant phylogenetic branches that are appropriately separated into two LIN code sublineages (SL23: 0_0_429 and SL218: 0_0_115; Section K in S1 Appendix). Beyond the case of ST23, multiple distinct sublineages are conflated into single STs, but can be appropriately recognized by their distinctive LIN codes (Table A and Section K in S1 Appendix).

LIN codes can also help track dissemination at fine genetic scales within sublineages; what follows is an example using SL258, a major K. pneumoniae carbapenemase (KPC)-producing sublineage. SL258 is defined by the LIN code prefix 0_0_105 and encompasses all isolates from 7-gene ST11, ST258, ST340, ST512, and some others (see Fig 2). Its phylogenetic structure is depicted in Fig 5 (see Section L in S1 Appendix for methodological details) and shows that SL258 is divided into several subclades. These include CG258 (0_0_105_6), which contains all ST258 and ST512 isolates. LIN code bin 5 can further be used to distinguish major subclades within SL258, including those corresponding to ST340 (0_0_105_0_11), ST437 (0_0_105_1_1), and other subclades within ST11, some of which seem to be associated with recombination events that include the capsule (K) locus (KL column in Fig 5).

Fig 5. SL258 phylogenetic structure and LIN codes.

Maximum likelihood phylogenetic tree of SL258 genomes inferred from a recombination-free variable site alignment (Section L in S1 Appendix). Tips are colored to indicate geographic regions of origin as per the legend (United Nations region classifications; N/E/W/S are short for North/East/West/South). The distribution of 7-gene multilocus sequence types (STs), K-loci (KL), bla_KPC (K. pneumoniae carbapenemase (KPC) variants), aerobactin locus lineages (iuc), and LIN code prefixes of sizes 4 and 5, are indicated by colored blocks as labeled (note that colors are assigned independently for each column; for readability the labels for rare groups are omitted). Only K-loci identified with a Kaptive v2 confidence score of ‘Good’ or better are shown (otherwise marked ‘unknown’). Phylogenetic clades described in the text are colored and labeled accordingly.

https://doi.org/10.1371/journal.pbio.3003781.g005

LIN codes can help distinguish between different subclades that are associated with the same ST and capsule locus, a combination often used to describe specific subclones. For example, LIN codes clearly distinguish three phylogenetically distinct subclades that are all ST11-KL64 (gray shading on the tree branches, Fig 5). One of these is the major lineage circulating in China (0_0_105_2_0_0_2, predominantly 0_0_105_2_0_0_2_17) that carries KPC-2 and often the iuc aerobactin virulence locus, which we use here as a marker for the K. pneumoniae virulence plasmid, as discussed broadly [38,39]. A second, unrelated ST11-KL64 subclade (0_0_105_0_0) is circulating in South America encoding KPC-2, but rarely iuc [40], while a third smaller clade (0_0_105_0_2) is detected primarily in Taiwan [41] rather than in mainland China (lacking KPC and with only one of eight genomes carrying iuc). These distinct clades are all referred to in the literature as ST11-KL64, despite representing phylogenetically distinct and likely unrelated, independently evolved lineages. This example shows how LIN code classification beneath the sublineage level can help recognize and name subgroups of medical and epidemiological relevance, which should be subject to enhanced surveillance.
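Since the groups discussed above are all defined by LIN code prefixes, assigning an isolate to its most specific named group amounts to a longest-prefix lookup. The Python sketch below uses only prefixes quoted in this Essay; the descriptive labels for the subclades, the function names and the full-length example codes are wording and values chosen for this illustration, not official nicknames or database records.

```python
from typing import Dict, Optional

# LIN code prefixes quoted in the text. Labels other than the SL/CG names are
# descriptive wording invented for this sketch.
PREFIX_LABELS: Dict[str, str] = {
    "0_0_105": "SL258",
    "0_0_105_6": "CG258",
    "0_0_105_0_11": "ST340 subclade",
    "0_0_105_1_1": "ST437 subclade",
    "0_0_105_2_0_0_2": "ST11-KL64, major lineage circulating in China",
    "0_0_105_0_0": "ST11-KL64, subclade circulating in South America",
    "0_0_105_0_2": "ST11-KL64, subclade detected primarily in Taiwan",
    "0_0_429": "SL23",
    "0_0_115": "SL218",
}


def has_prefix(code: str, prefix: str) -> bool:
    """Compare bin by bin, so that 0_0_105 does not match a code starting 0_0_1050."""
    code_bins = code.split("_")
    prefix_bins = prefix.split("_")
    return code_bins[: len(prefix_bins)] == prefix_bins


def most_specific_label(code: str) -> Optional[str]:
    """Return the label of the longest matching prefix, or None if nothing matches."""
    matching = [prefix for prefix in PREFIX_LABELS if has_prefix(code, prefix)]
    if not matching:
        return None
    longest = max(matching, key=lambda prefix: len(prefix.split("_")))
    return PREFIX_LABELS[longest]


# Invented full-length codes, used only to exercise the lookup.
for example in (
    "0_0_105_2_0_0_2_17_0_0",
    "0_0_105_0_0_3_0_0_0_0",
    "0_0_105_6_1_0_0_0_0_0",
    "0_0_105_9_0_0_0_0_0_0",
):
    print(example, "->", most_specific_label(example))
```

Comparing whole bins rather than raw strings avoids false matches between numerically similar prefixes, and the fallback to the shorter SL258 prefix reflects the multilevel nature of the codes.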

Identifying outbreak strains and tracking strain diversification during outbreaks are key objectives of genomic epidemiology, as they provide capacity to quickly respond to outbreaks and prevent further infections. LIN codes could be used to detect outbreaks: when genomic sequencing is applied prospectively in surveillance programs, finding identical or nearly identical LIN codes would lead to a strong suspicion of transmission, and such strain groups could be flagged for more in-depth epidemiological analyses. Further, the multi-level nature of LIN codes provides convenient flexibility for ‘outbreak strain’ definition and thus for epidemiological investigations that include risk factor analyses (e.g., to associate infection with the outbreak strain with source exposure). Complementary ad-hoc genomic analyses using, for example, whole-genome SNPs and pangenome analysis, can be performed to further characterize the outbreak isolates. Importantly, the availability of a public cgMLST-based LIN code system simplifies communication about outbreak strains between laboratories and public health actors, making it straightforward to identify whether an outbreak strain at one hospital is closely related to an outbreak identified at another hospital based on similarity of LIN codes.
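As a rough illustration of how identical or nearly identical LIN codes might be flagged in a prospective surveillance dataset, the Python sketch below counts the leading bins shared by each pair of isolates and reports the pairs that reach a chosen depth. The isolate identifiers, the LIN codes and the cut-off of nine shared bins are invented for this example; an appropriate cut-off would depend on the scheme, the species and the epidemiological context.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def shared_prefix_length(code_a: str, code_b: str) -> int:
    """Number of leading LIN code bins that two codes have in common."""
    shared = 0
    for bin_a, bin_b in zip(code_a.split("_"), code_b.split("_")):
        if bin_a != bin_b:
            break
        shared += 1
    return shared


def flag_candidate_pairs(
    isolates: Dict[str, str], min_shared_bins: int
) -> List[Tuple[str, str, int]]:
    """List isolate pairs sharing at least `min_shared_bins` leading bins."""
    flagged = []
    for (id_a, code_a), (id_b, code_b) in combinations(sorted(isolates.items()), 2):
        shared = shared_prefix_length(code_a, code_b)
        if shared >= min_shared_bins:
            flagged.append((id_a, id_b, shared))
    return flagged


# Invented isolates from two hospitals (H1, H2) with invented 10-bin codes.
isolates = {
    "H1-001": "0_0_105_6_0_0_1_0_0_0",
    "H1-002": "0_0_105_6_0_0_1_0_0_1",
    "H2-007": "0_0_105_6_0_0_1_0_0_0",
    "H2-010": "0_0_429_0_0_0_0_0_0_0",
}

for id_a, id_b, shared in flag_candidate_pairs(isolates, min_shared_bins=9):
    print(f"{id_a} / {id_b}: {shared} shared bins")
# H1-001 / H1-002: 9 shared bins
# H1-001 / H2-007: 10 shared bins
# H1-002 / H2-007: 9 shared bins
```

Lowering `min_shared_bins` widens the working definition of the 'outbreak strain', which is the flexibility that the multi-level structure provides; flagged pairs remain candidates to be confirmed by epidemiological data and higher-resolution analyses.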

LIN codes can even subdivide isolates from single long-term outbreaks. Clades that diversify during an outbreak can be captured and labeled unambiguously with LIN codes (Section M in S1 Appendix), as shown using an Italian outbreak of SL147, a prominent multidrug-resistant international sublineage of K. pneumoniae [42]. LIN codes were also recently used to label isolates from eight regional K. pneumoniae outbreaks that occurred within Poland (including two caused by SL258 and SL147), which had initially been loosely defined based on SNPs, O and K serotypes, and bla_VIM carbapenemase-carrying integrons [43].

Future directions and conclusions

Facilitating communication on the intra-species diversity of microbial strains is a key objective of strain taxonomies, which entail classification and naming of groups within species. In the field of epidemiological surveillance of bacterial pathogens, it has long been recognized that strain typing methods used for long-term and global strain tracking should be reproducible enough to enable internationally standardized nomenclatures, or ‘library typing systems’ [44]. LIN codes based on cgMLST benefit from the high standardization and reproducibility of the cgMLST approach and provide a flexible and robust way to classify, name, and identify subpopulations within bacterial species. The recognition of subpopulations associated with distinct phenotypes is an important raison d’être of taxonomies, and multilevel LIN code strain taxonomies will advance our understanding of the links between genotypes and clinical phenotypes, vaccine coverage, and antimicrobial resistance.

Given the reproducibility of cgMLST, an outbreak strain may be recognized by different investigators based solely on its LIN code prefix. As LIN code prefixes are sufficient to define strain identity across countries or sectors, LIN codes provide a simple yet accurate solution for cross-border or other collaborative genomic surveillance investigations, without the need to share genomic sequences themselves (Fig 6). This possibility brought by shared strain taxonomies can alleviate issues around data confidentiality and sharing agreements, which are often an important barrier in genomic surveillance and rapid response to outbreaks under investigation by multiple institutional actors. Likewise, for the surveillance of particularly concerning strains, early warnings could be triggered based on the detection of the specific LIN codes of the targeted strains. LIN codes may thus become an integral part of epidemiological surveillance practice.

Fig 6. Two models of multicentric genomic epidemiology.

A. The current model of sequence data sharing, which necessitates the sharing of sequence data from distinct institutions for their central analysis and direct comparison, which may show that a single strain is present in both institutions. B. The shared strain taxonomy model, where the use of a common nomenclature enables local analysis of sequence data (enabling confidentiality) and subsequent communication on nomenclatural subtypes. The recognition of identical strains can thus be achieved without having to share sequence data. Hospital, dialog, and database icon credits: Alexis Criscuolo (CC-BY v4.0) https://commons.wikimedia.org/wiki/File:Hospital_Icon.png.

https://doi.org/10.1371/journal.pbio.3003781.g006
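The shared strain taxonomy model (panel B) can be sketched as follows: each institution truncates its locally computed LIN codes to an agreed number of bins and communicates only those prefixes, and strains reported by more than one institution are then identified from the prefixes alone. The site names, LIN codes, prefix depth and function names in this Python sketch are hypothetical and serve only to illustrate the principle.

```python
from typing import Dict, Iterable, Set


def prefixes_to_share(codes: Iterable[str], n_bins: int) -> Set[str]:
    """Truncate local LIN codes to their first `n_bins` bins.

    Only these prefixes leave the institution; sequences stay local.
    """
    return {"_".join(code.split("_")[:n_bins]) for code in codes}


def common_strains(site_prefixes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each prefix reported by two or more sites to the sites reporting it."""
    seen: Dict[str, Set[str]] = {}
    for site, prefixes in site_prefixes.items():
        for prefix in prefixes:
            seen.setdefault(prefix, set()).add(site)
    return {prefix: sites for prefix, sites in seen.items() if len(sites) > 1}


# Invented local results at two hospitals (invented 10-bin codes).
hospital_a = ["0_0_105_6_0_0_1_0_0_0", "0_0_429_0_0_0_0_0_0_0"]
hospital_b = ["0_0_105_6_0_0_1_0_0_1", "0_0_115_0_0_0_0_0_0_0"]

shared = common_strains(
    {
        "hospital_a": prefixes_to_share(hospital_a, n_bins=8),
        "hospital_b": prefixes_to_share(hospital_b, n_bins=8),
    }
)
for prefix, sites in shared.items():
    print(prefix, "reported by", sorted(sites))
# 0_0_105_6_0_0_1_0 reported by ['hospital_a', 'hospital_b']
```

The same comparison could serve as an early-warning check if one of the prefix sets were a watchlist of strains of particular concern rather than the output of a second institution.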

However, LIN codes have several limitations. As a classification system, they rely on comparing genomes with a central database, implying the need for maintenance of this resource, regular updates to define the novel taxonomic elements, and accessibility of the nomenclatural source database. As for other databases used for genomic epidemiology of pathogens, sustainability, security, and governance of such resources are important issues that need to be addressed [45]. As a genotyping and outbreak detection tool, LIN codes are limited by the discriminatory power of the cgMLST scheme they are based upon. Hence, they do not provide ultimate resolution, which can be provided by complementary approaches such as whole-genome SNPs. Furthermore, cgMLST LIN codes are constrained by the breadth of applicability of the cgMLST approach, which typically covers a single species or a group of closely related ones. Fortunately, the rapid pace of cgMLST scheme development for additional bacterial species expands the possibilities of LIN code implementation.

In general, LIN codes represent a widely applicable strain taxonomy system, as illustrated by the rapid pace at which LIN code implementations are being developed for bacterial pathogens. Following the initial use case for the KpSC, cgMLST LIN code taxonomies have been introduced for S. pneumoniae [46], Staphylococcus aureus [47], Moraxella catarrhalis [48], Neisseria gonorrhoeae [49], and Corynebacterium diphtheriae [50]. In addition, developments are ongoing for Neisseria meningitidis, Campylobacter jejuni, and Streptococcus pyogenes. These LIN code taxonomies use different numbers of bins and thresholds adapted to the population structure of each species, illustrating the flexibility of the LIN code approach. The rapid development and adoption of LIN code taxonomies is facilitated by their integration into the BIGSdb platforms at Institut Pasteur and at Oxford University. In addition, an implementation of cgMLST LIN codes for E. coli and Salmonella enterica has just been released in the latest EnteroBase update.

The applicability of the cgMLST LIN codes to most other bacterial species should be straightforward, provided they comprise sufficient genetic diversity. This requirement excludes the so-called monomorphic pathogens [51], such as Mycobacterium tuberculosis or Salmonella enterica serotype Typhi, where phylogeny-based taxonomies based on whole-genome SNPs are considered more useful given their higher resolution compared to cgMLST. The cgMLST LIN code strategy can also be extended with minor adaptations to other organisms with predominantly clonal reproduction, such as protozoan parasites and fungi, even if they are not haploid, given the existence of MLST taxonomies for organisms such as Candida albicans and Trypanosoma cruzi [52,53].

The wide adoption of cgMLST LIN code strain taxonomies has the potential to result in a universal approach for standardized bacterial genotyping that could greatly enhance microbial biodiversity studies, international genomic epidemiology, and infectious disease surveillance.

Acknowledgments

We thank François Lebreton for providing the Newick file of the Italian ST147 outbreak described in [42]. We acknowledge the help of the HPC Core Facility of the Institut Pasteur for this work.

References

1. Konings F, Perkins MD, Kuhn JH, Pallen MJ, Alm EJ, Archer BN, et al. SARS-CoV-2 Variants of Interest and Concern naming scheme conducive for global discourse. Nat Microbiol. 2021;6(7):821–3. pmid:34108654
2. Rambaut A, Holmes EC, O’Toole Á, Hill V, McCrone JT, Ruis C, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol. 2020;5(11):1403–7. pmid:32669681
3. Prokaryotes IC of N of. International code of nomenclature of prokaryotes. Int J Syst Evol Microbiol 2019;69:S1–S111.
4. Maiden MCJ, Jansen van Rensburg MJ, Bray JE, Earle SG, Ford SA, Jolley KA, et al. MLST revisited: the gene-by-gene approach to bacterial genomics. Nat Rev Microbiol. 2013;11(10):728–36. pmid:23979428
5. Nadon C, Van Walle I, Gerner-Smidt P, Campos J, Chinen I, Concepcion-Acevedo J, et al. PulseNet International: Vision for the implementation of whole genome sequencing (WGS) for global food-borne disease surveillance. Euro Surveill. 2017;22(23):30544. pmid:28662764
6. Struelens MJ, Brisse S. From molecular to genomic epidemiology: transforming surveillance and control of infectious diseases. Euro Surveill. 2013;18(4):20386. pmid:23369387
7. Coll F, McNerney R, Guerra-Assunção JA, Glynn JR, Perdigão J, Viveiros M, et al. A robust SNP barcode for typing Mycobacterium tuberculosis complex strains. Nat Commun. 2014;5:4812. pmid:25176035
8. Wong VK, Baker S, Connor TR, Pickard D, Page AJ, Dave J, et al. An extended genotyping framework for Salmonella enterica serovar Typhi, the cause of human typhoid. Nat Commun. 2016;7:12827. pmid:27703135
9. Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, et al. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Res. 2019;29(2):304–16. pmid:30679308
10. Struelens MJ. Consensus guidelines for appropriate use and evaluation of microbial epidemiologic typing systems. Clin Microbiol Infect. 1996;2(1):2–11. pmid:11866804
11. van Belkum A, Tassios PT, Dijkshoorn L, Haeggman S, Cookson B, Fry NK, et al. Guidelines for the validation and application of typing methods for use in bacterial epidemiology. Clin Microbiol Infect. 2007;13 Suppl 3:1–46. pmid:17716294
12. Aanensen DM, Spratt BG. The multilocus sequence typing network: mlst.net. Nucleic Acids Res. 2005;33(Web Server issue):W728-33. pmid:15980573
13. Maiden MCJ. Multilocus sequence typing of bacteria. Annu Rev Microbiol. 2006;60:561–88. pmid:16774461
14. Maiden MC, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, et al. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A. 1998;95(6):3140–5. pmid:9501229
15. Feil EJ. Small change: keeping pace with microevolution. Nat Rev Microbiol. 2004;2(6):483–95. pmid:15152204
16. Jolley KA, Bray JE, Maiden MCJ. Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications. Wellcome Open Res. 2018;3:124.
17. Dyer NP, Päuker B, Baxter L, Gupta A, Bunk B, Overmann J, et al. EnteroBase in 2025: exploring the genomic epidemiology of bacterial pathogens. Nucleic Acids Res. 2025;53(D1):D757–62. pmid:39441072
18. Bialek-Davenet S, Criscuolo A, Ailloud F, Passet V, Jones L, Delannoy-Vieillard A-S, et al. Genomic definition of hypervirulent and multidrug-resistant Klebsiella pneumoniae clonal groups. Emerg Infect Dis. 2014;20(11):1812–20. pmid:25341126
19. Moura A, Criscuolo A, Pouseele H, Maury MM, Leclercq A, Tarr C, et al. Whole genome-based population biology and epidemiological surveillance of Listeria monocytogenes. Nat Microbiol. 2016;2:16185. pmid:27723724
20. Zhou Z, Charlesworth J, Achtman M. HierCC: a multi-level clustering scheme for population assignments based on core genome MLST. Bioinformatics. 2021;37(20):3645–6. pmid:33823553
21. Achtman M, Zhou Z, Charlesworth J, Baxter L. EnteroBase: hierarchical clustering of 100 000s of bacterial genomes into species/subspecies and populations. Philos Trans R Soc Lond B Biol Sci. 2022;377(1861):20210240. pmid:35989609
22. Marakeby H, Badr E, Torkey H, Song Y, Leman S, Monteil CL, et al. A system to automatically classify and name any individual genome-sequenced organism independently of current biological classification and nomenclature. PLoS One. 2014;9(2):e89142. pmid:24586551
23. Vinatzer BA, Tian L, Heath LS. A proposal for a portal to make earth’s microbial diversity easily accessible and searchable. Antonie Van Leeuwenhoek. 2017;110(10):1271–9. pmid:28281028
24. Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol. 2007;57(Pt 1):81–91. pmid:17220447
25. Tian L, Huang C, Mazloom R, Heath LS, Vinatzer BA. LINbase: a web server for genome-based identification of prokaryotes as members of crowdsourced taxa. Nucleic Acids Res. 2020;48(W1):W529–37. pmid:32232369
26. Hennart M, Guglielmini J, Bridel S, Maiden MCJ, Jolley KA, Criscuolo A, et al. A dual barcoding approach to bacterial strain nomenclature: genomic taxonomy of Klebsiella pneumoniae strains. Mol Biol Evol. 2022;39(7):msac135. pmid:35700230
27. Argimón S, David S, Underwood A, Abrudan M, Wheeler NE, Kekre M, et al. Rapid genomic characterization and global surveillance of Klebsiella using pathogenwatch. Clin Infect Dis. 2021;73(Suppl_4):S325–35. pmid:34850838
28. Wyres KL, Lam MMC, Holt KE. Population genomics of Klebsiella pneumoniae. Nat Rev Microbiol. 2020;18(6):344–59. pmid:32055025
29. Mixão V, Pinto M, Brendebach H, Sobral D, Dourado Santos J, Radomski N, et al. Multi-country and intersectoral assessment of cluster congruence between pipelines for genomics surveillance of foodborne pathogens. Nat Commun. 2025;16(1):3961. pmid:40295532
30. Jolley KA, Bray JE, Maiden MCJ. A RESTful application programming interface for the PubMLST molecular typing and genome databases. Database (Oxford). 2017;2017:bax060. pmid:29220452
31. Lam MMC, Wick RR, Watts SC, Cerdeira LT, Wyres KL, Holt KE. A genomic surveillance framework and genotyping tool for Klebsiella pneumoniae and its related species complex. Nat Commun. 2021;12(1):4188. pmid:34234121
32. Bogaerts B, Roosens NHC, Vanneste K. MiST: rapid, accurate and flexible (core-genome) multi-locus sequence typing (MLST) allele calling from draft genomes. BMC Genomics. 2025;26(1):1106. pmid:41257567
33. Deleo FR, Chen L, Porcella SF, Martens CA, Kobayashi SD, Porter AR, et al. Molecular dissection of the evolution of carbapenem-resistant multilocus sequence type 258 Klebsiella pneumoniae. Proc Natl Acad Sci U S A. 2014;111(13):4988–93. pmid:24639510
34. Hetland MAK, Hawkey J, Bernhoff E, Bakksjø R-J, Kaspersen H, Rettedal SI, et al. Within-patient and global evolutionary dynamics of Klebsiella pneumoniae ST17. Microb Genom. 2023;9(5):mgen001005. pmid:37200066
35. Lam MMC, Wyres KL, Duchêne S, Wick RR, Judd LM, Gan Y-H, et al. Population genomics of hypervirulent Klebsiella pneumoniae clonal-group 23 reveals early emergence and rapid global dissemination. Nat Commun. 2018;9(1):2703. pmid:30006589
36. Rodrigues C, Desai S, Passet V, Gajjar D, Brisse S. Genomic evolution of the globally disseminated multidrug-resistant Klebsiella pneumoniae clonal group 147. Microb Genom. 2022;8(1):000737. pmid:35019836
37. Wyres KL, Hawkey J, Hetland MAK, Fostervold A, Wick RR, Judd LM, et al. Emergence and rapid global dissemination of CTX-M-15-associated Klebsiella pneumoniae strain ST307. J Antimicrob Chemother. 2019;74(3):577–81. pmid:30517666
38. Wang J, Feng Y, Zong Z. The Origins of ST11 KL64 Klebsiella pneumoniae: a Genome-Based Study. Microbiol Spectr. 2023;11(2):e0416522. pmid:36971550
39. Zhou K, Xiao T, David S, Wang Q, Zhou Y, Guo L, et al. Novel subclone of carbapenem-resistant Klebsiella pneumoniae sequence type 11 with enhanced virulence and transmissibility, China. Emerg Infect Dis. 2020;26(2):289–97. pmid:31961299
40. Barroso M do V, da Silva CR, Benfatti LR, Gozi KS, de Andrade LK, Andrade LN, et al. Characterization of KPC-2-producing Klebsiella pneumoniae and affected patients of a pediatric hospital in Brazil. Diagn Microbiol Infect Dis. 2023;106(2):115932. pmid:37023592
41. Li Y-T, Wang Y-C, Chen C-M, Tang H-L, Chen B-H, Teng R-H, et al. Distinct evolution of ST11 KL64 Klebsiella pneumoniae in Taiwan. Front Microbiol. 2023;14:1291540. pmid:38143864
42. Martin MJ, Corey BW, Sannio F, Hall LR, MacDonald U, Jones BT, et al. Anatomy of an extensively drug-resistant Klebsiella pneumoniae outbreak in Tuscany, Italy. Proc Natl Acad Sci U S A. 2021;118(48):e2110227118. pmid:34819373
43. Biedrzycka M, Urbanowicz P, Guzek A, Brisse S, Gniadkowski M, Izdebski R. Dissemination of Klebsiella pneumoniae ST147 NDM-1 in Poland, 2015-19. J Antimicrob Chemother. 2021.
44. Struelens MJ, De Gheldre Y, Deplano A. Comparative and library epidemiological typing systems: outbreak investigations versus surveillance systems. Infect Control Hosp Epidemiol. 1998;19(8):565–9. pmid:9758056
45. WHO. Attributes and principles of genomic data-sharing platforms supporting surveillance of pathogens with epidemic and pandemic potential. 2025.
46. Jansen van Rensburg MJ, Berger DJ, Yassine I, Shaw D, Fohrmann A, Bray JE, et al. Development of the Pneumococcal Genome Library, a core genome multilocus sequence typing scheme, and a taxonomic life identification number barcoding system to investigate and define pneumococcal population structure. Microb Genom. 2024;10(8):001280. pmid:39137139
47. Hadjirin NF, Yassine I, Bray JE, Maiden MCJ, Jolley KA, Brueggemann AB. Development of a core genome multilocus sequence typing scheme and life identification number code classification system for Staphylococcus aureus. Microb Genom. 2025;11(8):001486. pmid:40880171
48. Yassine I, Jolley KA, Bray JE, Jansen van Rensburg MJ, Patel F, Sheppard AE, et al. Understanding the population structure of Moraxella catarrhalis using core genome multilocus sequence typing (cgMLST) and a life identification number (LIN) code classification system. bioRxiv. 2025:2025.04.30.651387.
49. Unitt A, Krisna MA, Parfitt KM, Jolley KA, Maiden MCJ, Harrison OB. Neisseria gonorrhoeae LIN codes provide a robust, multi-resolution lineage nomenclature. Elife. 2025;14:RP107758. pmid:41474525
50. Delgado-Blas JF, Rethoret-Pasty M, Brisse S. Life Identification Number (LIN) codes for the genomic taxonomy of Corynebacterium diphtheriae strains. Genome Med. 2025;18(1):5. pmid:41353550
51. Achtman M. Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens. Annu Rev Microbiol. 2008;62:53–70. pmid:18785837
52. Bougnoux M-E, Aanensen DM, Morand S, Théraud M, Spratt BG, d’Enfert C. Multilocus sequence typing of Candida albicans: strategies, data exchange and applications. Infect Genet Evol. 2004;4(3):243–52. pmid:15450203
53. Yeo M, Mauricio IL, Messenger LA, Lewis MD, Llewellyn MS, Acosta N, et al. Multilocus sequence typing (MLST) for lineage assignment and high resolution diversity studies in Trypanosoma cruzi. PLoS Negl Trop Dis. 2011;5(6):e1049. pmid:21713026