On Wednesday, February 5th, 2020, Prof. Dr. Zamin Iqbal will give a talk in our “Distinguished Speaker Series” titled “Nucleotide resolution variant analysis in bacterial pan genomes using genome graphs”.

Abstract

When we study the evolution of a species, we use different models, depending on what we want to achieve or infer. We might restrict ourselves to SNP variation in the “core genome” (presumably inherited vertically) to study phylogeography or an outbreak. By reducing the problem to the analysis of SNPs (and invariant sites), researchers have been able to build a range of sophisticated phylogenetic models. However, once we try to incorporate genome organisation, chromosomal rearrangements, or the movement of plasmids, transposons or phage, the modelling problem becomes far harder. The question of how to properly model bacterial genetic variation is wide open and extremely challenging. A prerequisite for any solution is a decision on how to describe the variation in the first place: you cannot model variation until you represent it. Note that this is true even if you have perfect genome assemblies: even if it were possible to build a multiple sequence alignment of them, this would not really help us recognise that a SNP at one position in one genome is “the same” as a SNP somewhere else in another.

In this talk, I want to discuss a solution we have been developing to this representation problem. We show how it is possible to represent the pan genome of a species as a network of “floating” graphs, representing the ensemble of known variation in orthology blocks (we use genes and intergenic regions, but this could also be done for mobile elements). In doing so, it becomes possible to discover and describe genetic variation at both fine (SNP/indel) and coarse (gene order) levels. I will quantify the value of this approach and show how it works with Oxford Nanopore data.