Whole genome sequencing has opened several new avenues of research and has expanded our biological knowledge. Tens of thousands of microbial isolates, representing over 1,500 different species, have been whole genome sequenced, and thousands of new genomes are added to public databases each year. To harness the diagnostic potential of this genomic information, it is critical that the genome information be compressed into the same format regardless of the method used to generate the sequence. Furthermore, this format must enable comparison of a new clinical sample against a database of tens of thousands of genomes, and it must remain scalable as more genomes are added to public databases. These comparisons must be performed accurately and in a timely manner to be useful for clinical diagnostics. The lack of such a solution has been noted as the primary reason whole genome sequencing is not widely adopted for diagnostic purposes. The present invention advances the art and provides technology for better clinical diagnosis.