All publications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. The following description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
Antibodies present in human specimens serve as the primary analyte and disease biomarker for a large and broad group of infectious, bacterial, viral, allergic, parasitic, and autoimmune diseases. As such, hundreds of distinct antibody-detecting tests (collectively referred to as “immunoassays”) have been developed to diagnose human disease using samples that include, but are not limited to, whole blood, serum, plasma, saliva, urine, and tissue aspirates. Immunoassays remain essential to the diagnosis of autoimmune diseases including, but not limited to, Graves' disease, Sjogren's syndrome, celiac disease, Crohn's disease, and rheumatoid arthritis. Immunoassays are also widely used to diagnose infectious diseases, including viral infections (e.g., HIV, Hepatitis C, HSV-1, Zika virus, Epstein-Barr virus, and others), bacterial infections (e.g., Streptococcus sp., Helicobacter pylori, Borrelia burgdorferi (Lyme), and others), fungal infections (e.g., Valley Fever), and parasitic infections (e.g., Trypanosoma cruzi, Toxoplasma gondii, Taenia solium, Toxocara canis, and others). Furthermore, immunoassays are often used to identify and monitor allergies (e.g., peanut, milk, pollen, and others). Beyond these areas, immunoassays have demonstrated utility for the diagnosis of neurodegenerative disease, cardiovascular disease, and cancers.
Methods to detect antibodies include radioimmunoassay (RIA), enzyme-linked immunosorbent assays (ELISA), chemiluminescent assays, and protein and peptide arrays. These assay formats share the requirement to develop a molecular reagent that binds to the analyte antibody in samples from the majority of individuals with disease, to provide sensitivity, but not to any of the many distinct antibodies present in individuals without disease, to provide diagnostic specificity. Such reagents include antibodies, peptides, human proteins, nucleic acid aptamers, and other molecular binding entities [1-4]. Such reagents are often highly optimized (Ballew J et al., PNAS, 2014) in order to achieve high sensitivity and specificity, and such optimization has been the subject of much research and development. Individual reagents, however, often possess insufficient affinity and specificity for the analytes of interest.
Present methods used to develop diagnostic immunoassays limit the overall sensitivity and specificity that can be obtained from the assay, and thus its utility, because they include extraneous antigen matter (i.e., large proteins, peptides, lipids, whole cell lysates) that can result in cross-reactive binding from unrelated antibodies. For example, tests for Lyme disease (infection with Borrelia burgdorferi) use whole cell lysates that contain a large number of distinct molecular compositions that are not targeted by the immune response to Borrelia, but that capture or detect antibodies generated in response to other infections such as infectious mononucleosis. Thus, there is an unmet need for diagnostic technologies that can identify and present only those antigen components that are most specifically recognized by the immune response in individuals with a given phenotype.
Because individual reagents often do not capture or react with a sufficient number of samples from individuals with the disease (i.e., insufficient sensitivity), two or more reagents can be combined into a diagnostic test or used in parallel as an antigen panel. Nevertheless, combining sets of peptides into a single assay to increase the sensitivity of diagnosis is challenging because their non-specific binding, which limits specificity, is generally additive, thereby reducing the overall diagnostic specificity of the assay. Experimental identification of the optimal combination of biochemical reagents is difficult given the combinatorial complexity of combining and weighting the antibody reactivities to each antigen in a panel [5, 6].
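The trade-off described above can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that a sample is called positive if any reagent in the panel reacts and that the reagents react independently of one another; neither assumption is taken from the references cited herein. The function name or_rule_panel and the example values are hypothetical.

```python
from math import prod


def or_rule_panel(sensitivities, specificities):
    """Combine per-reagent performance under an "any reagent positive" rule.

    Illustrative assumption: reagents react independently of one another.
    Returns (panel_sensitivity, panel_specificity).
    """
    if len(sensitivities) != len(specificities):
        raise ValueError("one sensitivity and one specificity per reagent")
    # A diseased sample is missed only if every reagent fails to react.
    panel_sensitivity = 1.0 - prod(1.0 - s for s in sensitivities)
    # A healthy sample is called negative only if no reagent reacts.
    panel_specificity = prod(specificities)
    return panel_sensitivity, panel_specificity


# Hypothetical two-reagent panel; the values are not measured data.
sens, spec = or_rule_panel([0.60, 0.50], [0.98, 0.97])
print(f"panel sensitivity = {sens:.4f}, panel specificity = {spec:.4f}")
```

Under these assumptions the panel false-positive rate (1 minus the panel specificity) is close to the sum of the individual false-positive rates when each rate is small, which is one way to read the statement that non-specific binding is generally additive.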
An important limitation associated with existing immunoassay formats is that they cannot be readily combined or aggregated together. Consequently, the cost and labor of performing a large number of tests are additive, which limits the number of tests that are ordered and thereby decreases the probability of making a correct diagnosis. For example, if an individual is bitten by a tick, they may be infected with multiple tick-borne pathogens (there are more than 10 known tick-transmitted infectious agents). In many cases, physicians will order only a test for Borrelia burgdorferi, even though any of tens of other organisms may have infected that individual. Thus, there is a need for a low-cost multiplexed test that can diagnose any or all of the tick-borne infections. Similarly, if a patient presents with a common symptom (e.g., fever, fatigue, headache), it can be difficult to identify which tests should be ordered to identify potential causes of the presenting symptoms. Thus, there is a need for methods and compositions that can integrate many tests into a single standardized assay, and thus simultaneously test for many different diseases or infections. The present invention provides a solution to this problem.
Massively parallel DNA sequencing, also known as next-generation sequencing, high-throughput sequencing, or deep sequencing, has been applied to the diagnosis of human diseases [7]. These approaches are referred to collectively as “NGS” throughout.
The prospect of analyzing entire human antibody repertoires has been a goal for at least several decades. Reported methods include human proteome arrays, phage display/immunoprecipitation (Ph-IP), peptide and peptoid arrays, and NGS analysis of antibody genes (Ig-Seq) [8, 9]. One challenge associated with repertoire characterization is identifying particular peptide sequences to populate arrays limited to ~10^6 fields. Hence, prior methods have used small arrays of random peptides, typically having fewer than 300,000 peptides, or peptoids unlikely to closely mimic antigens. Array-based approaches are presently limited to small collections of organisms with small proteomes (e.g., viruses) [10]. For peptide arrays, the relatively low peptide sequence diversity limits the ability to find individual sequences and motifs that mimic the bona fide antigen targeted by an antibody.
A principal advantage of the invention provided herein is that it is unbiased; that is, it does not assume which organisms are antigenic. The method claimed can identify epitopes in any organism in the rapidly growing protein database, not just pre-specified viruses [10], allowing antigen identification within even the largest proteomes (e.g., wheat, with a genome of approximately 17 Gb). By comparison, the wheat genome alone is 100-1000× larger than the combined genomes of all known human viruses.