As demand for Information Technology (IT) software and hardware to provide global data access and integrated business solutions has exploded, significant challenges have become evident. A central problem is the access to, integration of, and utilization of the large amounts of new and valuable information generated in each of the major industries. The lack of unified, global, real-time data access and analysis is detrimental to crucial business processes, which include new product discovery, product development, decision-making, product testing and validation, and product time-to-market.
With the completion of the sequence of the human genome and the continued effort to understand protein expression in the life sciences, a wealth of new genes is being discovered that have potential as targets for therapeutic intervention. As a result of this new information, however, biotech and pharmaceutical companies are drowning in a flood of data. In the life sciences alone, approximately 1 terabyte of data is generated per company per day, the vast majority of which currently goes unutilized for several reasons.
First, data are contained in diversified system environments that use different formats and heterogeneous databases, and have been analyzed using different applications. These applications may each apply different processing to those data. Competitive software, based on proprietary platforms for network and applications analysis, has utilized data platform technologies such as SQL with open database connectivity (ODBC), component object model (COM), Object Linking and Embedding (OLE) and/or proprietary applications for analysis, as evidenced in patents for data management and analysis from such companies as Sybase, Kodak, IBM, and Cellomics (U.S. Pat. Nos. 6,161,148, 6,132,969, 5,989,835, and 5,784,294), each of which is hereby incorporated by reference. Because of this diversity, and despite the fact that the seamless integration of public, legacy and new data is crucial to efficient drug discovery and life science research, current data mining tools cannot handle all data simultaneously. There is a significant lack of data handling methods that can utilize these data in a secure, manageable way. The shortcomings of these technologies are evident within heterogeneous software and hardware environments with global data resources. More broadly, although such seamless integration is equally crucial to efficient research (particularly in the life sciences), product discovery (for example, drug or treatment regime discovery) and distribution, current data mining tools cannot handle or validate all of the diverse data simultaneously.
Second, with the expansion of large volumes of dense data in a global environment, user queries often require costly massively parallel or other supercomputer-oriented processing. Such processing takes the form of mainframe computers and/or cluster servers with various types of network integration software pieced together for translation and access functionality (e.g. Java, CORBA, “wrapping”, XML), as evidenced by such companies as NetGenics, IBM and ChannelPoint in U.S. Pat. Nos. 6,125,383, 6,078,924, 6,141,660, and 6,148,298, each of which is herein incorporated by reference, and of networked supercomputing hardware, as evidenced by such companies as IBM, Compaq and others in patents such as, for example, U.S. Pat. Nos. 6,041,398 and 5,842,031, each of which is hereby incorporated by reference. Even with these expensive software and hardware infrastructures, significant time delays in result generation remain the norm.
Third, in part due to the flood of data and for other reasons as well, there is significant redundancy within the data, making queries more time-consuming and less efficient in their results.
Fourth, an additional consideration that stands in the way of change towards a more homogeneous infrastructure is cost. Bringing legacy systems up to date, retooling a company's intranet-based software systems, carrying out analysis with existing tools, or even adding new applications can be very expensive. Conventional practices require retooling and/or translating at the application and hardware layers, as evidenced by such companies as Unisys and IBM in U.S. Pat. Nos. 6,038,393 and 5,634,015.
Because of the constraints outlined above, it is nearly impossible to extract useful, relevant information from the entirety of the data within reasonable computing time and effort. For this reason, the development of an architecture to overcome these obstacles is needed.
These are not the only limitations. With the advent of distinct differentiations in the fields of genomics, proteomics, and bioinformatics, and the need for informed decision making in the life sciences, the state of object data is crucial for the overall validation and weight of those data in complex, multi-disciplinary queries. This is even more important due to the inter-dependencies of a variety of data at different states. Furthermore, because biological data describe a “snapshot” of complex processes at a defined state of the organism, data obtained at any time refer to this unique phase of metabolism. Thus, for a meaningful comparison, only data in similar states can be utilized. Therefore, there is a growing need for an object data state processing engine that continuously monitors, governs, validates and updates the data state, in real-time, based on any activities of intelligent molecular objects.
Data translation processes between different data types are time-consuming and, in spite of advances in information technology, require that information on data structure and dependencies be provided. These processes, although available and used, have a number of shortcomings. Data contained in diversified system environments may use different formats, heterogeneous databases and different applications, each of which may apply different processing to those data. Because of that, and despite the fact that the seamless integration of public, legacy and new data is crucial to efficient drug discovery and life science research, several different applications and/or components have to be designed in order to translate each of those data sets correctly. These require significant effort and resources in both software development and data processing. With the advent of distinct differentiations in the fields of genomics, proteomics, and bioinformatics, and the need for informed decision making in the life sciences, access to all data is crucial for overall validation and weight in complex, multi-disciplinary queries. This is even more important due to the inter-dependencies of a variety of data at different states. The current approach of translating each data set individually does not support these needs. Because biological data describe a “snapshot” of complex processes at a defined state of the organism, data obtained at any time refer to this unique phase of metabolism. Thus, for a meaningful comparison, only data in similar states can be utilized. This requires real-time processing and automated, instant translation of data from different sources. Therefore, there is a growing need for an object data translation engine that allows bi-directional translation of multidimensional data from various sources into intelligent molecular objects in real-time.
The flood of new and legacy data results in significant redundancy within the data, making queries more time-consuming and less efficient in their results. There is a lack of defined sets of user interaction and environment definition protocols, which are needed to provide means for intelligent data mining and for optimizing result validation towards real solutions and answers. An additional consideration that stands in the way of change towards a more homogeneous infrastructure is the absence of object representation definition protocols to prepare and present data objects for interaction within heterogeneous environments. Lastly, users currently interact with data, and data are presented, through diverse user interfaces with dedicated, unique features and protocols, which prevents universal, unified user access. Thus, a homogeneous, unified presentation, such as a web-enabled graphical user interface that integrates components from diverse applications and laboratory systems environments, is highly desirable but currently non-existent for objects in real-time.
Because of these constraints, it is nearly impossible to extract useful, relevant information from the entirety of the data within reasonable computing time and effort. For this reason, the development of an architecture and unifying user interface to overcome these obstacles is needed.
Relevant Patents
U.S. Pat. No. 6,136,274, U.S. Pat. No. 6,125,383, U.S. Pat. No. 6,052,722, U.S. Pat. No. 6,016,495, U.S. Pat. No. 5,937,189, U.S. Pat. No. 5,596,744, U.S. Pat. No. 5,867,799, U.S. Pat. No. 5,745,895, U.S. Pat. No. 6,076,088, U.S. Pat. No. 5,706,453, U.S. Pat. No. 5,767,854, U.S. Pat. No. 6,035,300, U.S. Pat. No. 6,145,009, U.S. Pat. No. 5,974,532, U.S. Pat. No. 5,873,097, U.S. Pat. No. 6,094,656, U.S. Pat. No. 6,138,171, U.S. Pat. No. 6,144,989, U.S. Pat. No. 6,137,499, U.S. Pat. No. 6,016,393, U.S. Pat. No. 6,167,563, U.S. Pat. No. 6,134,664, U.S. Pat. No. 6,111,893, U.S. Pat. No. 6,108,661, U.S. Pat. No. 6,102,969, U.S. Pat. No. 6,078,924, U.S. Pat. No. 5,964,891, U.S. Pat. No. 5,664,215, U.S. Pat. No. 6,064,382, U.S. Pat. No. 6,134,581, U.S. Pat. No. 6,146,027, U.S. Pat. No. 5,664,066, U.S. Pat. No. 5,862,325, U.S. Pat. No. 6,119,126.
Relevant Literature
Elisa Bertino, Susan Urban, Elke A. Rundensteiner (eds.): Theory and Practice of Object Systems (1999) 5 (3): 125–197; Akmal B. Chaudhri, Julie A. McCann, Peter Osmon: Theory and Practice of Object Systems (1999) 5 (4): 263–279; D. Cai, M. F. McTear, S. I. McClean: International Journal of Intelligent Systems (2000) 15 (8): 745–761; Carol A. Hert, Elin K. Jacob, Patrick Dawson: Journal of the American Society for Information Science (2000) 51 (11): 971–988; F. J. González-Castaño, L. Anido-Rifón, J. M. Pousada-Carballo, P. S. Rodríguez-Hernández, R. López-Góm: Software: Practice and Experience (2001) 31 (1): 1–16; Daniel E. Cooke, Per Andersen: Software: Practice and Experience (2000) 30 (14): 1541–1570; Akmal B. Chaudhri: Theory and Practice of Object Systems (1999) 5 (4): 199–200; Lee A. Segel: Complexity (2000) 5 (6): 39–46; L. J. G. T. van Hemmen: International Journal of Network Management (2000) 10 (6): 261–275; Joel E. Henry: Journal of Software Maintenance: Research and Practice (2000) 12 (4): 229–248; Michael Mattsson, Jan Bosch: Journal of Software Maintenance: Research and Practice (2000) 12 (4): 79–102; Sally McClean, Bryan Scotney, Mary Shapcott: International Journal of Intelligent Systems (2000) 15 (6): 535–547; Julie M. Hurd: Journal of the American Society for Information Science (2000) 51 (14): 1279–1283; Serge Demeyer, Matthias Rieger, Theo Dirk Meijler, Edzard Gelsema: Theory and Practice of Object Systems (1999) 5 (2): 73–81; Dao et al: IEEE (1991): 88–91; Mark Baker: Software Focus: Parallel programming with Java (2000) 1 (1); C. N. Lauro, G. Giordano, R. Verde: Applied Stochastic Models and Data Analysis: A multidimensional approach to conjoint analysis (1998) 14 (4): 265–274; P. America: Formal Aspects of Computing: Issues in the Design of a Parallel Object-Oriented Language [POOL] (1989) 1 (4): 366–411.