Life sciences research is undergoing a paradigm shift from a traditional laboratory-driven (i.e., wet science) approach to a truly information-driven approach. A new understanding of the workings of life at the genetic and molecular levels, together with laboratory automation, will likely make the processes associated with finding new drugs, therapies, and agricultural products faster, cheaper, and more effective. At the same time, innovative technologies such as genomics, combinatorial chemistry, and high-throughput screening are generating a formidable volume of data at an unprecedented rate.
The challenges that accompany the management of massive volumes of data may be compounded by the fact that life sciences data are often dispersed throughout the research and development (R&D) enterprise, across the public domain, and within the labs of external research partners. The data, which tend to be highly complex and constantly changing, are often stored in multiple heterogeneous formats, such as 3-D chemical structure databases, relational database tables, flat files, text stores, image repositories, and web sources. These data may further reside on different hardware platforms, under different operating systems, and in different database management systems.
Life science and biotechnology experiments use large amounts of information and images. The need to process and communicate large amounts of information and images efficiently has led to the new scientific discipline of image informatics. Life science and biotechnology experiments can be categorized as either diagnostic medical imaging or biotechnology experiments. Biotechnology applications include studies in disciplines such as genomics, proteomics, pharmacogenomics, and molecular imaging. Diagnostic medical imaging includes histopathology, cell-cycle analysis, genetics, magnetic resonance imaging (MRI), digital X-ray, and computed tomography (CT). Converting the large amounts of images and raw data generated in these experiments into meaningful information remains a challenge that hinders many investigators.
Genomics: Gene expression microarrays are revolutionizing the biomedical sciences. A DNA microarray consists of an orderly arrangement of DNA fragments representing the genes of an organism. Each DNA fragment representing a gene is assigned a specific location on the array, usually a glass slide, and then microscopically spotted (<1 mm) at that location. Using highly accurate robotic spotters, over 30,000 spots can be placed on one slide, allowing molecular biologists to analyze virtually every gene present in a genome. A cDNA array is a different technology based on the same principle; the probes in this case are larger pieces of DNA that are complementary to the genes of interest. High-throughput analysis of microarray data requires an efficient framework and tools for analyzing, storing, and archiving voluminous image data. For example, the article “DNA microarrays. History and overview” by E. M. Southern, Methods in Molecular Biology, 170: 1-15, 2001, provides insight into the evolution of DNA microarrays.
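The fixed gene-to-location assignment described above can be sketched as a simple row-major grid. This is a minimal illustration only: the 200-column grid, the assumed 0.15 mm center-to-center spot pitch, and the names `Spot` and `assign_spots` are hypothetical and are not parameters of any particular spotter or slide format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Spot:
    gene: str
    row: int
    col: int
    x_mm: float  # horizontal offset of the spot center from the array origin
    y_mm: float  # vertical offset of the spot center from the array origin

def assign_spots(genes, columns, pitch_mm=0.15):
    """Assign each gene a fixed (row, col) position in row-major order.

    pitch_mm is an assumed center-to-center spacing between spots; real
    spotters and slide formats differ.
    """
    spots = {}
    for index, gene in enumerate(genes):
        row, col = divmod(index, columns)
        spots[gene] = Spot(gene, row, col, col * pitch_mm, row * pitch_mm)
    return spots

# Hypothetical example: 30,000 gene identifiers on a 200-column grid.
genes = [f"gene_{i:05d}" for i in range(30_000)]
layout = assign_spots(genes, columns=200)
last = layout["gene_29999"]
print(last.row, last.col, round(last.x_mm, 2), round(last.y_mm, 2))
```

Under these assumed numbers the 30,000 spot centers span roughly 30 mm by 22.5 mm, and any analysis tool can recover which gene a spot represents from its grid position alone.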
Cancer is an especially pertinent target of microarray technology because this disease causes, and may even be caused by, changes in gene expression. Microarrays are used to rapidly identify the genes that are turned on and the genes that are turned off during tumor development, resulting in a much better understanding of the disease. For example, if a gene that is turned on in a particular type of cancer is discovered, it may become a target for cancer therapy. Today, therapies that directly target malfunctioning genes are already in use and showing exceptional results. Microarrays are also used to study gene interactions, including patterns of correlated loss and gain of gene expression. Gene interactions are studied during drug design and screening, and the large number of interactions examined during drug discovery requires an efficient framework and tools for analyzing, storing, and archiving voluminous image data.
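One common way to flag genes as turned on or off is to compare each gene's expression intensity in tumor and reference samples on a log2 scale. The sketch below assumes background-corrected intensities per gene and an illustrative two-fold cutoff (|log2 ratio| of at least 1); the gene names, intensity values, threshold, and the function `classify_genes` are hypothetical.

```python
import math

def classify_genes(tumor, reference, min_log2_ratio=1.0):
    """Label each gene 'on', 'off', or 'unchanged' by its log2 expression ratio.

    tumor and reference map gene names to positive background-corrected
    intensities. A log2 ratio of +1 means twice the reference expression.
    """
    labels = {}
    for gene, t in tumor.items():
        ratio = math.log2(t / reference[gene])
        if ratio >= min_log2_ratio:
            labels[gene] = "on"
        elif ratio <= -min_log2_ratio:
            labels[gene] = "off"
        else:
            labels[gene] = "unchanged"
    return labels

# Hypothetical intensities for three genes.
tumor = {"GENE_A": 1600.0, "GENE_B": 240.0, "GENE_C": 510.0}
reference = {"GENE_A": 400.0, "GENE_B": 960.0, "GENE_C": 480.0}
print(classify_genes(tumor, reference))
```

Working on the log scale makes a four-fold increase and a four-fold decrease symmetric (+2 and -2), which is why a single threshold can serve both directions.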
Proteomics: Proteomics is the study of the function of all expressed proteins and the analysis of complete complements of proteins. It includes the identification and quantification of proteins and the determination of their localization, modifications, interactions, activities, and, ultimately, their function. In the past, the term referred to the use of two-dimensional (2D) gel electrophoresis for protein separation and identification; proteomics now refers to any procedure that characterizes large sets of proteins. Rapid growth of this field is driven by several factors: genomics and its revelation of more and more new proteins, and powerful protein technologies such as newly developed mass spectrometry approaches, global [yeast] two-hybrid techniques, and spin-offs from DNA arrays. For example, M. Tyers and M. Mann provide a vivid picture in the article “From genomics to proteomics,” Nature, 13 March 2003, 422(6928): 193-7. Large-scale data sets for protein-protein interactions, organelle composition, protein activity patterns, and protein profiles in cancer patients have been generated in the past few years. Rapid analysis of these data sets requires an innovative information-driven framework and tools to process, analyze, and interpret prodigious amounts of data.
Tissue microarray (TMA) technology works on principles similar to those of the DNA microarray: a large number of tissue samples are placed on a single slide and analyzed for the expression of proteins. The image data generated in such cases are tremendous and require efficient software analysis tools. TMA may involve a reporter protein detected by IHC, immunofluorescence, luminescence, absorbance, or reflection detection.
Pharmacogenomics: There is great heterogeneity in the way humans respond to medications, often requiring empirical strategies to find the appropriate drug therapy for each patient. There has been great progress in understanding the molecular basis of drug action and in elucidating the genetic determinants of disease pathogenesis and drug response. Pharmacogenomics is the field of investigation that aims to elucidate the inherited nature of inter-individual differences in drug disposition and effects, with the ultimate goal of providing a stronger scientific basis for selecting the optimal drug therapy and dosage for each patient. These genetic insights should also lead to mechanism-based approaches to the discovery and development of new medications. For example, the article by Howard L. McLeod and William E. Evans titled “Pharmacogenomics: Unlocking the Human Genome for Better Drug Therapy,” Annual Review of Pharmacology and Toxicology, 2001, Vol. 41: 101-121, describes the scope of pharmacogenomics. Collecting, analyzing, and maintaining data sets on inter-individual differences requires an efficient information-driven framework and tools to process, analyze, and interpret prodigious amounts of data.
Microscopy: Molecular imaging. Identifying changes in cellular structures that are indicative of disease remains the key to better understanding in medical science. Microscopy applications span microbiology (e.g., Gram staining), plant tissue culture, animal cell culture (e.g., phase contrast microscopy), molecular biology, immunology (e.g., ELISA), and cell biology (e.g., immunofluorescence, chromosome analysis). Confocal microscopy supports time-lapse and live cell imaging as well as serial and three-dimensional imaging. Advances in confocal microscopy have unraveled many of the secrets of processes occurring within the cell, and transcriptional and translational changes can be detected using fluorescence markers. The advantage of the confocal approach results from the capability to image individual optical sections at high resolution in sequence through the specimen. A framework with tools for three-dimensional analysis of thicker sections, differential color detection, FISH, and similar techniques is needed to expedite progress in this area.
Near infrared (NIR) multiphoton microscopy is becoming a novel optical tool for fluorescence imaging with high spatial and temporal resolution, diagnostics, photochemistry, and nanoprocessing within living cells and tissues. NIR lasers can be employed as the excitation source for multifluorophore multiphoton excitation and hence multicolor imaging. In combination with fluorescence in situ hybridization (FISH), this novel approach can be used for multi-gene detection (multiphoton multicolor FISH). For example, the article “Multiphoton microscopy in life sciences” by K. König, Journal of Microscopy, 2000, Vol. 200 (Part 2): 83-104, reviews the state of multiphoton microscopy in the life sciences.
In-vivo imaging: Animal models of cancer are indispensable for studies that are difficult or impossible to perform in people. Imaging of in-vivo markers permits observation of the biological processes underlying cancer growth and development. Functional imaging (the visualization of physiological, cellular, or molecular processes in living tissue) allows researchers to study metabolic events in real time, as they take place in living cells of the body.
Diagnostic medical imaging: Imaging technology has broadened the range of medical options and opened untapped potential for cancer diagnosis. X-ray mammography has already had a lifesaving effect in detecting some early cancers. Computed tomography (CT) and ultrasound permit physicians to guide long, thin needles deep within the body to biopsy organs, often eliminating the need for an open surgical procedure. CT images can reveal whether a tumor has invaded vital tissue, grown around blood vessels, or spread to distant organs, which is important information that can help guide treatment choices. Three-dimensional image reconstruction and visualization techniques require significant processing capability from smaller, faster, and more economical computing solutions.
The conventional process for managing medical images is still followed at most hospitals, clinics, and imaging centers. The medical image is printed onto sheets of film, which are delivered to the radiologist for interpretation. After the transcribed report is delivered to the radiologist, reviewed for errors, and signed, the films and report are delivered or mailed to the referring doctor. This process often takes several days, up to a week. If questions arise, the referring doctor contacts the radiologist, who may be forced to rely on memory, having reviewed the films several days earlier and no longer having possession of them. The referring doctor must also manage the hard-copy films, either by filing them in his or her office or by returning them to the imaging center or hospital to be filed, depending on local practice. If the patient then goes to a second doctor, requires surgery, or requires another medical imaging procedure, the films must be located and physically carried or shipped to the hospital, surgery center, or second doctor's office. There are numerous opportunities for films to be lost or misfiled, and doctors who maintain more than one office may not always have the correct patient films in the correct office.
The current film-based system is very expensive: charges for films, processing chemicals, and delivery can easily add up to $30 to $50 per MRI patient study. Other problems for the imaging facility include the numerous opportunities for films to be physically lost, as well as the considerable time, personnel, and expense required to deliver and retrieve them. By some estimates, up to 25% of medical images are not accessible when required.
Several researchers have performed experiments on, and made observations of, biological tissue samples suggesting that a molecular basis for cancer and other diseases might be discovered through careful molecular analysis of such tissues. Such an understanding could improve the diagnosis, screening, and treatment of disease, and could permit treatment to be tailored to the specific molecular defects found in an individual patient. Many different researchers and laboratories study the molecular basis of disease, and these studies produce a large amount of data and information. Optimizing the handling and integration of the research results, data, and other information produced and used by laboratories studying the molecular basis of cancer and other tissue-based diseases is therefore advantageous for improving the understanding and treatment of disease.
Even though many large genomic warehouse databases currently exist, and even though scientific laboratories are connected to the Internet, the data produced by a lab are not necessarily well handled, integrated, validated, searchable, and useable either by the lab producing the data or by another lab that might be interested in using the data. Generally, when data from biological tissue studies are published, only a limited set of the actual primary data (and sometimes none of it) are available for review and reanalysis. Moreover, common language and reference points are often not used for reporting the data. Even in the lab that did the original work, there is often no efficient or robust way to integrate data from a study with previous or subsequent studies. Furthermore, because of space limitations and the difficulty of tracking complex research methods, many published descriptions of laboratory methods do not provide adequate information for another scientist to accurately reproduce an experiment, even though this is a central tenet of scientific publication. The end result is that when taken as a group, the many similar or related studies, while individually illuminating, are isolated and autonomous from each other, and do not achieve potential synergies.
Lack of proper data handling may result in major problems that may slow or possibly prevent real progress in finding better treatment and diagnostic methods for major diseases. In particular, current methods of disseminating information from molecular studies of cancer and other diseases do not allow results from one study to be easily integrated with results from other studies. There is no standard way to link the results of DNA, RNA, and protein-based studies to cellular function or phenotype expression. Current methods of dissemination of the results of molecular studies do not allow preservation of a substantial portion of the original data supporting such studies, making it difficult for researchers to verify the conclusion of a research study or otherwise reinterpret the data.
The ability to detect, through histopathological imaging, the molecular and phenotypic changes associated with a tumor cell will enhance pathologists' ability to detect and stage tumors, select appropriate treatments, monitor the effectiveness of a treatment, and determine prognosis.
A standard test used to measure protein expression is immunohistochemistry (IHC). Analyzing tissue samples stained with IHC reagents has been a key development in the practice of pathology. Normal and diseased cells have certain physical characteristics that can be used to differentiate them from each other. These characteristics include complex patterns, rare events, and subtle variations in color and intensity.
Hematoxylin and eosin (H&E) staining is used to study the morphology of tissue samples. The type of cancer is determined from differences and variations relative to the patterns of normal tissue. H&E staining is also used to determine the pathological grade or stage of a cancer (e.g., by the Bloom-Richardson method). This pathological grading is important not only for diagnosis but also for prognosis.
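The grading step mentioned above combines scores for several morphological features seen on H&E sections. A minimal sketch follows, assuming the widely used Nottingham (Elston-Ellis) modification of the Bloom-Richardson method, in which tubule formation, nuclear pleomorphism, and mitotic count are each scored 1 to 3 and summed; the function name `nottingham_grade` is hypothetical, and the scoring of each component (for example, mitotic count per microscope field area) is left to the pathologist.

```python
def nottingham_grade(tubule_formation, nuclear_pleomorphism, mitotic_count):
    """Combine three component scores (each 1-3) into a histologic grade.

    Assumes the Nottingham (Elston-Ellis) modification of the
    Bloom-Richardson method: total 3-5 -> grade 1, 6-7 -> grade 2,
    8-9 -> grade 3. Returns (total score, grade).
    """
    scores = (tubule_formation, nuclear_pleomorphism, mitotic_count)
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each component score must be 1, 2, or 3")
    total = sum(scores)
    if total <= 5:
        return total, 1
    if total <= 7:
        return total, 2
    return total, 3

print(nottingham_grade(3, 2, 2))  # total 7 -> grade 2
```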
One of the main limitations of the prior art is the scope of the term “image content.” Content is often represented by image parameters that have little or no relevance to the objective of the study being carried out. As a consequence of this interpretation of content, the images retrieved may look similar in terms of image parameters but may not have similar pathological or radiological properties. For example, two images of different tissues with malignancies might look different to an untrained eye; it is the pathologist's knowledge and experience that differentiates the types of malignancies. Nevertheless, a pathologist may grade both images as similar and assign them the same membrane score (e.g., 3+).
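The limitation can be made concrete with a toy example. If retrieval ranks images only by a low-level image parameter (here, a hypothetical three-bin intensity histogram), the nearest image may carry a different membrane score, while an image that looks quite different carries the same score. All image names, histogram values, and scores below are invented for illustration, and Euclidean distance is an assumed choice of similarity measure.

```python
import math

# Hypothetical catalogue: an intensity histogram (low-level "content") plus
# the membrane score a pathologist assigned to each image.
images = {
    "query":   {"histogram": [0.60, 0.30, 0.10], "membrane_score": "3+"},
    "image_1": {"histogram": [0.58, 0.32, 0.10], "membrane_score": "1+"},
    "image_2": {"histogram": [0.20, 0.30, 0.50], "membrane_score": "3+"},
}

def nearest_by_pixels(name):
    """Nearest other image by Euclidean distance between histograms."""
    query = images[name]["histogram"]
    others = (n for n in images if n != name)
    return min(others, key=lambda n: math.dist(query, images[n]["histogram"]))

def same_grade(name):
    """Other images that the pathologist scored the same as the query."""
    score = images[name]["membrane_score"]
    return [n for n in images if n != name and images[n]["membrane_score"] == score]

print(nearest_by_pixels("query"))  # image_1, although it was scored 1+
print(same_grade("query"))         # ['image_2'], although it looks different
```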
There have been many attempts to solve some of the problems associated with automated analysis of medical images. For example, U.S. Pat. No. 6,675,166, entitled “Integrated multidimensional database,” issued to Bova, teaches “a method of distributing research data from a common database to a user of the common database is provided. Data concerning research results and data upon which the research results are based are stored in a local database and are linked to each other. Data concerning research results and data upon which the research results are based are selectively extracted from the local database to the common database. Research data are then selected by a user of the common database from the extracted data concerning research results and from the data upon which the extracted data are based and the selected research data are distributed to the user.”
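The data flow in the Bova description (results linked to their supporting data in a local database, selective extraction to a common database, and selection and distribution to a user) can be modeled in a few lines. This is a hypothetical in-memory sketch; the record shapes, the identifiers, and the functions `extract` and `distribute` are illustrative and are not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    result_id: str
    summary: str
    data_ids: list  # identifiers of the primary data this result is based on

@dataclass
class Database:
    results: dict = field(default_factory=dict)  # result_id -> Result
    data: dict = field(default_factory=dict)     # data_id -> primary data

def extract(local, common, result_ids):
    """Copy selected results, together with their linked data, to the common database."""
    for rid in result_ids:
        result = local.results[rid]
        common.results[rid] = result
        for did in result.data_ids:
            common.data[did] = local.data[did]

def distribute(common, result_id):
    """Return a selected result and the data it is based on, as delivered to a user."""
    result = common.results[result_id]
    return result, {did: common.data[did] for did in result.data_ids}

local = Database()
local.data = {"img1": "slide image 1", "img2": "slide image 2", "img3": "unpublished"}
local.results = {
    "r1": Result("r1", "marker X elevated", ["img1", "img2"]),
    "r2": Result("r2", "pilot run", ["img3"]),
}
common = Database()
extract(local, common, ["r1"])  # only r1 and its supporting data are shared
result, supporting = distribute(common, "r1")
print(result.summary, sorted(supporting))
```

Because each result carries links to its primary data, a user of the common database receives the evidence behind a conclusion rather than the conclusion alone.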
U.S. Pat. No. 6,678,703, entitled “Medical image management system and method,” issued to Rothschild et al., teaches “a medical image management system and method that uses a central data management system to centrally manage the storage and transmission of electronic records containing medical images between remotely located facilities. A polling system is provided with remotely located workstations or local workstations so that the remote or local workstations may request queued data to be delivered that is awaiting delivery in the central database management system. The remotely located workstation or local image workstation communicates with a remotely located central data management system via a remote interface over the internet. The central database management system maintains and update any changes in the IP address of a remote or local workstation, in a look up table. The central data management system may also, in addition, push data when received to the last known IP address in the look up table.”
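The Rothschild description combines two delivery modes: workstations poll the central system for queued data, and the central system can also push new data to the last known IP address kept in a look-up table. A minimal in-memory sketch of that logic follows; the class `CentralDataManager` and its methods are hypothetical, and actual network transfer is replaced by a list of (address, record) pairs.

```python
from collections import defaultdict, deque

class CentralDataManager:
    """Toy model of queued delivery with an IP look-up table (no real networking)."""

    def __init__(self):
        self.ip_table = {}                # workstation id -> last known IP address
        self.queues = defaultdict(deque)  # workstation id -> records awaiting delivery
        self.pushed = []                  # (ip, record) pairs "sent" by push delivery

    def register(self, station_id, ip):
        """Record or update a workstation's IP address in the look-up table."""
        self.ip_table[station_id] = ip

    def receive(self, station_id, record, push=False):
        """Accept a record for a workstation; push it now if an address is known."""
        if push and station_id in self.ip_table:
            self.pushed.append((self.ip_table[station_id], record))
        else:
            self.queues[station_id].append(record)

    def poll(self, station_id, ip):
        """A workstation polls: refresh its IP address, then drain its queue."""
        self.register(station_id, ip)
        queue = self.queues[station_id]
        delivered = list(queue)
        queue.clear()
        return delivered

central = CentralDataManager()
central.receive("ws-1", "CT study 001")   # queued: no address known yet
print(central.poll("ws-1", "10.0.0.5"))   # the workstation polls and collects it
central.receive("ws-1", "MRI study 002", push=True)
print(central.pushed)                      # pushed to the last known address
```

Polling lets workstations behind changing or dynamic addresses still collect their data, while the look-up table makes push delivery possible whenever a recent address is on record.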
U.S. Pat. No. 6,785,410, entitled “Image reporting method and system,” issued to Vining et al., teaches “a method and system are provided to report the findings of an expert's analysis of image data. The method and system are based on a reporting system that forms the basis of an image management system that can efficiently and systematically generate image reports, facilitate data entry into searchable databases for data mining, and expedite billing and collections for the expert's services. The expert identifies a significant finding on an image and attaches a location:description code to the location of that finding in order to create a significant finding and an entry into a database. Further descriptions of that finding, such as dimensional measurements, may be automatically appended to the finding as secondary attributes. After the evaluation, the system sorts the findings in the database and presents the findings by prioritized categories. The expert edits and approves a multimedia report which may be delivered by electronic means to an end-user.”
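The Vining reporting workflow (a location:description code attached to each finding, secondary attributes such as measurements appended, and findings sorted into prioritized categories) can be sketched as follows. The code vocabulary, the category names, and the priority order are hypothetical examples, not the codes defined in the patent.

```python
from dataclasses import dataclass, field

# Hypothetical priority order: lower numbers are reported first.
CATEGORY_PRIORITY = {"critical": 0, "significant": 1, "incidental": 2}

@dataclass
class Finding:
    code: str  # "location:description", e.g. "liver:mass"
    category: str
    attributes: dict = field(default_factory=dict)  # secondary attributes

    @property
    def location(self):
        return self.code.split(":", 1)[0]

    @property
    def description(self):
        return self.code.split(":", 1)[1]

def prioritized_report(findings):
    """Sort findings by category priority, then by location, for the report."""
    ordered = sorted(findings, key=lambda f: (CATEGORY_PRIORITY[f.category], f.location))
    return [f"{f.category}: {f.location} - {f.description} {f.attributes}" for f in ordered]

findings = [
    Finding("kidney:cyst", "incidental"),
    Finding("liver:mass", "significant"),
    Finding("lung:nodule", "critical"),
]
findings[1].attributes["diameter_mm"] = 23  # a measurement appended as a secondary attribute
for line in prioritized_report(findings):
    print(line)
```

Encoding each finding as a structured code rather than free text is what makes the entries searchable and sortable in a database.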
U.S. Pat. No. 5,915,250, entitled “Threshold-based comparison,” issued to Jain, teaches “a system and method for content-based search and retrieval of visual objects. A base visual information retrieval (VIR) engine utilizes a set of universal primitives to operate on the visual objects. An extensible VIR engine allows custom, modular primitives to be defined and registered. A custom primitive addresses domain specific problems and can utilize any image understanding technique. Object attributes can be extracted over the entire image or over only a portion of the object. A schema is defined as a specific collection of primitives. A specific schema implies a specific set of visual features to be processed and a corresponding feature vector to be used for content-based similarity scoring. A primitive registration interface registers custom primitives and facilitates storing of an analysis function and a comparison function to a schema table. A heterogeneous comparison allows objects analyzed by different schemas to be compared if at least one primitive is in common between the schemas. A threshold-based comparison is utilized to improve performance of the VIR engine. A distance between two feature vectors is computed in any of the comparison processes so as to generate a similarity score.”
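The comparison steps in the Jain description can be illustrated with a small sketch: each object maps primitive names to feature vectors, objects analyzed under different schemas are compared only on the primitives they share, per-primitive distances are accumulated into a similarity score, and a threshold prunes poor matches early. The primitive names, the Euclidean distance, the 1/(1 + d) score, and the early-exit rule are assumptions chosen for illustration; they are not the patent's specific functions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def compare(features_a, features_b, threshold=None):
    """Heterogeneous comparison of two {primitive: feature vector} maps.

    Only primitives present in both maps are compared. Returns a similarity
    score in (0, 1], or None if no primitive is shared or the accumulated
    distance exceeds the threshold.
    """
    common = sorted(features_a.keys() & features_b.keys())
    if not common:
        return None
    total = 0.0
    for name in common:
        total += euclidean(features_a[name], features_b[name])
        if threshold is not None and total > threshold:
            return None  # distances only add up, so this pair can no longer match
    return 1.0 / (1.0 + total)

# Hypothetical objects analyzed under different schemas.
obj_a = {"color": [0.2, 0.5, 0.3], "texture": [0.9, 0.1]}
obj_b = {"color": [0.2, 0.5, 0.3], "shape": [0.4]}  # shares only "color"
obj_c = {"color": [0.9, 0.0, 0.1], "texture": [0.1, 0.9]}
print(compare(obj_a, obj_b))                 # identical on the shared primitive
print(compare(obj_a, obj_c, threshold=0.5))  # pruned by the threshold
```

Because every per-primitive distance is non-negative, the running total can only grow, which is what makes stopping at the threshold safe and saves work on clearly dissimilar candidates.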
U.S. Published Patent Application No. 20030176929, entitled “User interface for a bioinformatics system,” published by Gardner, teaches “a bioinformatics system and method is provided for integrated processing of biological data. According to one embodiment, the invention provides an interlocking series of target identification, target validation, lead identification, and lead optimization modules in a discovery platform oriented around specific components of the drug discovery process. The discovery platform of the invention utilizes genomic, proteomic, and other biological data stored in structured as well as unstructured databases. According to another embodiment, the invention provides overall platform/architecture with integration approach for searching and processing the data stored in the structured as well as unstructured databases. According to another embodiment, the invention provides a user interface, affording users the ability to access and process tasks for the drug discovery process.”
U.S. Published Patent Application No. 20050033736, entitled “System and method for processing record related information,” published by Carlin, teaches “A system and method is disclosed for defining a set of predetermined comment codes for incorporation into database records to enhance processes and workflows. The system comprises a repository of records, including a plurality of records incorporating at least one comment identifier identifying a comment associated by a user with a particular record and provides additional information about a particular characteristic of the record, a search processor for searching the repository to identify a plurality of records incorporating at least one predetermined comment identifier in response to a user provided search query, and a task processor for assigning performance of a task comprising processing the identified plurality of records. The system may also include a report generator for generating a report identifying the plurality of records incorporating the predetermined comment identifier and for providing data representing the report in a format suitable for at least one of, display on a reproduction device, printing and electronic communication, in response to a user command.”
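The Carlin description centers on predetermined comment codes attached to records, a search over those codes, and a report listing the matching records. A minimal sketch under assumed formats follows; the codes, their meanings, the record fields, and the functions `search_by_code` and `report` are hypothetical, and task assignment is left out.

```python
# Hypothetical predetermined comment codes and their meanings.
COMMENT_CODES = {
    "RPT-MISSING": "report not yet signed",
    "IMG-REPEAT": "image quality insufficient; repeat study",
}

records = [
    {"id": 101, "patient": "P-7", "comments": ["IMG-REPEAT"]},
    {"id": 102, "patient": "P-9", "comments": []},
    {"id": 103, "patient": "P-2", "comments": ["RPT-MISSING", "IMG-REPEAT"]},
]

def search_by_code(records, code):
    """Return the records tagged with a given predetermined comment code."""
    if code not in COMMENT_CODES:
        raise ValueError(f"unknown comment code: {code}")
    return [r for r in records if code in r["comments"]]

def report(records, code):
    """Produce a plain-text report listing the records that carry the code."""
    matches = search_by_code(records, code)
    lines = [f"{code} ({COMMENT_CODES[code]}): {len(matches)} record(s)"]
    lines += [f"  record {r['id']} for patient {r['patient']}" for r in matches]
    return "\n".join(lines)

print(report(records, "IMG-REPEAT"))
```

Restricting searches to a fixed vocabulary of codes, rather than free-text comments, is what makes the matching records reliably retrievable.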
U.S. Published Patent Application No. 20050038776, entitled “Information system for biological and life sciences research,” published by Cyrus, teaches “an online life science research environment and virtual community with a focus on design and analysis of biological experiments includes a life sciences laboratory system employing at least one networked computer system that defines a virtual research environment. Users access the system through a portal associated with the networked computer system(s). The virtual research environment has a data coupling mechanism by which the user designates a set of user-specified data for bioinformatics processing. A processor(s) associated with the networked computer system(s) performs bioinformatics services upon the user-specified data. In one embodiment, the data coupling mechanism enables transfer of the user-specified data to a memory space that is mediated or accessed by the processor performing the bioinformatics processing. Users may thus exploit bioinformatics processing resources that are not deployed on users' local computer environments, and to store and organize information relating to life sciences research in a secure, online workspace.”
Picture Archiving and Communication Systems (PACS): Several solutions have been developed with the intention of streamlining the storage and accessibility of medical images by managing electronic records that include the images in electronic form, which may be converted for viewing, such as on screen displays or via film printers. PACS generally provide medical image management via a collection of components that enable image data acquisition, transmission, display, and storage. Such systems are implemented in imaging clinics and hospitals to make the digital data available at different locations within the radiology department or the facility. However, the use of such systems is generally restricted to in-house radiology and other departments, thus excluding referring physicians who are outside the imaging facility. These systems carry high price tags for the local installation of the central image management and storage systems they generally require, and involve other high costs for the additional personnel needed to configure and maintain such image management systems onsite at the imaging facility.
Medical Images and Internet Application Service Providers (ASP): The medical image management market is large and involves large volumes of recurring transmissions of electronic records associated with medical images. Several efforts have recently been made to replace, or at least significantly enhance, conventional film-based systems and methods for medical image management by managing these images electronically, particularly via an Internet-based ASP model. However, the Internet-based ASP model for the transmission and storage of medical images is an industry still in its embryonic stage. Very few of the diagnostic imaging procedures performed annually in the U.S. are being transmitted or stored using an ASP model.
There are several vendors supplying PACS. Amicas (www.amicas.com) has supplied its vision series to several leading hospitals in the U.S. eMed Technologies (www.emed.com), a Healthcare Application Service Provider (HASP), provides the eMed.net service, which includes a medical image viewing application with integrated access to medical images and reports, along with other relevant information, through a physician's web site. General Electric Medical Systems (www.ge.com) offers Centricity™ PACS, which is at the heart of GE's integrated approach to improving radiology workflow. By itself, Centricity™ PACS provides image communication within a radiology department. When integrated with Centricity™ RIS, it forms a single workflow engine, Centricity™ RIS/PACS, that allows radiologists to access comprehensive patient information at a single workstation via a single login anywhere in the enterprise.
Image Medical (www.eradimagemedical.com) offers “eRAD,” a PACS and teleradiology system that provides web-enabled access to patient studies and reports from a standard PC with Internet access at a moment's notice. Radiologists, referring physicians, system administrators, and other authorized users may easily log on from anywhere to access diagnostic-quality, pre-fetched images from dynamic IP addresses. Patient studies may be received uncompressed or compressed, as selected by the user.
Over the last decade, there has been tremendous progress in the theory and design of “search engine optimization.” Google leads in several aspects of search engine research, and its data structures are optimized so that a large document collection can be crawled, indexed, and searched at little cost. See “The Anatomy of a Large-Scale Hypertextual Web Search Engine” by Sergey Brin and Lawrence Page. Although CPUs and bulk input/output rates have improved dramatically over the years, a disk seek still requires about 10 ms to complete. Google is designed to avoid disk seeks whenever possible, and this has had a considerable influence on the design of its search data structures.
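A central data structure in search engines of this kind is the inverted index: for each term, the engine stores the list of documents containing it, so a query reads a few posting lists (which can be laid out contiguously, roughly one read per query term) instead of visiting every document. The sketch below is a simplified in-memory illustration of that idea, not Google's actual implementation; the tokenizer, the AND-only query semantics, and the sample documents are assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple assumed tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to a sorted list of document ids (its posting list)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

docs = {
    1: "HER2 membrane staining in breast tissue",
    2: "H&E stained breast tissue section",
    3: "CT scan of the liver",
}
index = build_index(docs)
print(search(index, "breast tissue"))  # reads one posting list per query term
```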
However, none of these solutions solves all of the problems associated with automatically analyzing anatomical structures of interest. Thus, it is desirable to provide a method and system for automated digital image analysis using anatomical structures of interest.