1. Field of the Invention
The invention relates generally to techniques for determining whether a complex digital object that contains subobjects conforms to a standard and more specifically to techniques for determining whether a given DICOM object conforms to the general standard for all DICOM objects and also to the local standard established by a given producer or consumer of DICOM objects.
2. Description of Related Art
Complex Digital Objects
As bits have become cheaper and network communications have become quicker, many different types of complex digital objects have been developed. One large class of such objects is objects that contain data representing wave forms or still or moving images and data about the wave forms or images. The latter is termed metadata in the following. One common example of such an object is an object that contains a video. Such an object contains data representing the video images, data representing the audio signal, and metadata such as the video's title, the location of scenes in the video, closed captions, and so forth. Complex objects that contain non-character data or a mixture of character and non-character data are termed binary objects in the following.
DICOM Objects: FIG. 1
Another example of a complex digital object is a DICOM object. DICOM objects are objects that are used to record data produced by devices such as X ray or MRI machines and metadata for the recorded data. The metadata includes items such as identification information for the patient, identification information for the study the recorded data belongs to, the equipment that produced the recorded data, and the recorded data itself. DICOM objects are thus binary objects. DICOM objects are made according to the Digital Imaging and Communications in Medicine (DICOM) standard. Most modern medical imaging devices produce objects made according to the standard and the workstations and terminals used by doctors to view the objects will correctly display any object made according to the standard. The DICOM standard is revised about once a year.
FIG. 1 provides an overview of an instance of a DICOM object at 101. A DICOM object is an instance of an information object definition, or IOD, that may contain instances of other IODs. Each IOD defines a set of related pieces of information, including other IODs. The operations that deal with the information belonging to a given IOD are termed services for that IOD, and an IOD and its services form a service-object pair class, or SOP class. An instance of an IOD is termed an information entity, shown at 105 in FIG. 1. Information entities contain attributes as defined in the entity's IOD. Each attribute 109 specifies a single item of information. Attributes that are related to each other are grouped into information object modules, or IOMs, as shown at 107. An IOM may be specified in more than one IOD. Summing all of this up, attributes 109, which specify the rows and columns of the image, are contained in module 107, which describes the pixels making up the image; module 107 in turn belongs to the image information entity, which is an information entity defined by the Image IOD. The SOP class for the Image IOD is the combination of the Image IOD and a group of operations (services) on the attributes contained in the image information entity.
At 111 is shown in detail how an attribute's value is specified in a DICOM object 105. Each attribute is identified by a group number and an element number; here the group number 113 is hex 7FE0 and the element number is hex 0000. The group number and the element number together form the attribute's tag, which here identifies the attribute “Pixel Data Group Length”. The tag is followed by the value of the attribute, 19850, which in this case is the length in bytes of the value of the pixel data attribute.
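The relationship between group number, element number, and tag can be illustrated with a minimal Python sketch. The function names, the packing of the tag into a single 32-bit integer, and the dictionary used to hold the attribute are assumptions made for illustration only; they are not taken from the DICOM standard or from FIG. 1.

```python
def make_tag(group: int, element: int) -> int:
    """Combine a 16-bit group number and a 16-bit element number into one tag.

    Packing the pair into a 32-bit integer is an illustrative choice.
    """
    if not (0 <= group <= 0xFFFF and 0 <= element <= 0xFFFF):
        raise ValueError("group and element must each fit in 16 bits")
    return (group << 16) | element


def format_tag(tag: int) -> str:
    """Render a tag in the conventional (gggg,eeee) hexadecimal notation."""
    return f"({tag >> 16:04X},{tag & 0xFFFF:04X})"


# Hypothetical in-memory record for the attribute discussed above.
attribute = {
    "tag": make_tag(0x7FE0, 0x0000),
    "name": "Pixel Data Group Length",
    "value": 19850,
}

print(format_tag(attribute["tag"]), attribute["name"], attribute["value"])
```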
Information Model for a DICOM Object: FIG. 2
FIG. 2 shows the DICOM information model, which is a conceptual description of the information that can be contained in a DICOM object and of the relationships between the components of the information. At 200 is shown how the information in a DICOM object relates to an examination of a patient in the real world. The examination is termed a study. Study 201 of patient 203 involves two different kinds of imaging. Each kind of imaging is termed a modality. In study 201 there are modality 1 203, which produced two series of images 205 and modality 2 207, which produced a single image 209. The information entities 105 resulting from study 201 are shown at information model 211. There is a single information entity 213 which represents the patient. A DICOM information model may contain more than one study 215 for a patient, as indicated at 217. The study 215 shown is for study 201 and has a set of information entities 219 for each of the series of images 205 and 209 made for the patient. The binary representations of the series of images are shown at 220(a . . . c).
At 221 is shown conceptually how the components of a DICOM object for study 201 relate to each other. Representation 221 has two component types: information entities 105 and relationships between entities 223. A relationship has a direction and a cardinality. The direction is indicated by the arrows on relationship 223, with the entity at the tail of the arrow being a first entity that the relationship relates to one or more entities at the head of the arrow. A single number on the arrow indicates an exact number of entities; x,y on the arrow indicates a range of numbers of entities. Thus, as shown at 225, 227, and 229, relationship 227 relates a single patient to one or more studies. Put another way, relationship 227 requires that there be only one patient entity in a DICOM study and that there can be more than one study for a patient in the DICOM information model.
Continuing with representation 221, any one of the studies 229 will contain one or more series 233. As indicated at 235 and 237, a frame of reference entity may apply to none or any number of series entities 233 and an entity 239 representing a piece of equipment may create one or more of the series 233. Each series 233, finally, may contain the components shown at 245-253 in the numbers indicated by the cardinalities for the various contains relationships 243. As may be seen from the foregoing, representation 221 specifies constraints on instances of DICOM objects. An instance of a DICOM object which does not satisfy all of the constraints imposed by the information model is invalid. For example, as indicated above, one of the constraints requires that the DICOM object have one and only one patient information entity 225; if a DICOM object has none or more than one, the DICOM object is invalid. The DICOM standard further specifies constraints on the contents of information entities, manufacturers of equipment impose constraints on the attributes that describe images produced by their equipment, and producers and consumers of DICOM objects may impose further constraints. For example, a hospital may require that the attribute that identifies the physician who performed the study identify a physician who is associated with the hospital at the time the study is made. An insurance company may require that the DICOM object have been made by one of a limited number of approved vendors of medical imaging services. At present, the DICOM standard specifies the constraints in the English language. As is apparent from the foregoing, there may be different sets of constraints that apply to DICOM objects. The DICOM objects to which a given set of constraints apply are termed in the following the class of DICOM objects corresponding to the set of constraints.
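The cardinality constraints of the information model can be pictured as a table of rules that is checked against a tree of entities. The following Python sketch is offered only as an illustration under stated assumptions: the Entity class, the rule table, and the particular bounds in it are hypothetical and cover only a few of the relationships in FIG. 2.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for an information entity and the entities it relates to."""
    kind: str
    children: list = field(default_factory=list)


# Hypothetical rule table: (parent kind, child kind) -> (minimum, maximum).
# A maximum of None means "any number".
CARDINALITY = {
    ("Object", "Patient"): (1, 1),
    ("Patient", "Study"): (1, None),
    ("Study", "Series"): (1, None),
}


def violations(entity: Entity) -> list:
    """Return a message for every cardinality rule broken at or below entity."""
    found = []
    for (parent, child), (low, high) in CARDINALITY.items():
        if entity.kind != parent:
            continue
        count = sum(1 for c in entity.children if c.kind == child)
        if count < low or (high is not None and count > high):
            upper = "n" if high is None else str(high)
            found.append(
                f"{parent} has {count} {child} entities; expected {low}..{upper}"
            )
    for c in entity.children:
        found.extend(violations(c))
    return found


def make_patient() -> Entity:
    """Build a patient with one study that contains one series."""
    return Entity("Patient", [Entity("Study", [Entity("Series")])])


valid_object = Entity("Object", [make_patient()])
invalid_object = Entity("Object", [make_patient(), make_patient()])

print(violations(valid_object))
print(violations(invalid_object))
```

The second object breaks the rule that a DICOM object has one and only one patient entity, so a single violation is reported for it.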
These constraints must of course be validated. One known way of validating the constraints is a hard-coded validating program that writes all constraints as program constructs. The program can verify a DICOM object with respect to the set of constraints for which the program was written. Many DICOM storage system vendors provide such validating programs for free so that DICOM objects can be validated and corrected before they are stored in the system provided by the vendor. The end user of the system of course cannot maintain the code, use it with a different storage system, or alter it as required by changes in the DICOM standard or to add code for constraints particular to a new piece of equipment or to the end user. A way of validating the constraints that overcomes some of these drawbacks is to use XML. That technique will be described in the following.
XML and DICOM: FIG. 3
XML is a widely adopted format for representing any arrangement of data as a set of quoted character strings. The XML character string is termed an XML document. The W3C XML recommendation (www.w3.org/TR/2006/REC-xml-20060816/) describes the general syntax of an XML document. The W3C XML schema specification (www.w3.org/XML/Schema) also describes how a user of XML may make an XML Schema document which defines how the user's particular arrangement of data is to be represented as an XML document. Anyone who has an arrangement of data and the XML schema describing how the arrangement of data is to be made into an XML document can make the XML document described by the XML schema from the arrangement of data. Another kind of document, an XSLT document, describes how an XML document may be converted into something else. One of the things an XSLT document may specify is how to convert the contents of an XML document back into the particular arrangement of data from which the XML document was made. Because XML, XML schemas, and XSLT are both canonical and completely flexible, it is becoming increasingly common to translate arrangements of data which must be shared with others to and from XML documents.
As would be expected from the foregoing, XML is used to represent DICOM objects. How one makes an XML document from a DICOM object is shown at 301 in FIG. 3. First, a DICOM parser 303 that can read the information entities which make up a DICOM object reads the attributes of entities. As the attributes are read, they are presented to an XML encoder 305, which is a program that is designed to make an XML representation 309 of DICOM object A 302 that conforms to schema 307 for XML documents made from DICOM objects. When parser 303 and encoder 305 are finished, the result is XML representation 309 of DICOM object A. The XML schema and the XML encoder can be written so that the translation from DICOM object A 302 is lossless, i.e., XML representation 309 contains all of the information that was in DICOM object A 302.
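The encoding step, and the sense in which it can be lossless, can be sketched in Python with the standard xml.etree.ElementTree module. Because there is no standard XML schema for DICOM objects, the element and attribute names used here (DicomObject, Attribute, tag, name) are assumptions, as are the function names and the sample values. The sketch handles only string-valued attributes and ignores pixel data.

```python
import xml.etree.ElementTree as ET


def encode(attributes: dict) -> str:
    """Make an XML document from {(group, element): (name, value)} pairs."""
    root = ET.Element("DicomObject")  # hypothetical root element name
    for (group, element), (name, value) in attributes.items():
        node = ET.SubElement(
            root,
            "Attribute",
            {"tag": f"{group:04X}{element:04X}", "name": name},
        )
        node.text = value
    return ET.tostring(root, encoding="unicode")


def decode(xml_text: str) -> dict:
    """Rebuild the attribute dictionary from the XML document."""
    result = {}
    for node in ET.fromstring(xml_text).findall("Attribute"):
        tag = node.get("tag")
        key = (int(tag[:4], 16), int(tag[4:], 16))
        result[key] = (node.get("name"), node.text)
    return result


# Sample values are invented for illustration.
sample = {
    (0x0010, 0x0010): ("PatientName", "DOE^JOHN"),
    (0x0028, 0x0010): ("Rows", "512"),
    (0x0028, 0x0011): ("Columns", "512"),
}

xml_text = encode(sample)
print(xml_text)
print(decode(xml_text) == sample)
```

Decoding the encoded document yields the original dictionary, which is the round-trip property that a lossless translation requires.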
Once an XML representation of a DICOM object has been made, various operations may be performed on the XML representation instead of the DICOM object itself. One reason for doing this is that many more people are familiar with XML than are familiar with DICOM; another is that a great many tools are available for manipulating XML; for example, modern database systems include extensive XML toolkits. An example of performing an operation on an XML representation of a DICOM object instead of on the DICOM object itself is shown at 309, 313, 314, and 312 of flowchart 301. The operation is updating a DICOM object A 302 with additional information. An XML representation 309 of A exists, so the update is done on the XML representation. First, an XML representation 308 of the updates is made; it, an XSLT document for DICOM 313 and XML representation 309 are input to XSLT processor 314, which produces a new XML representation 312 that includes the contents of XML representation 309 and updates 308 as prescribed by XSLT 313. DICOM encoder 311 losslessly produces DICOM objects from XML representations of DICOM objects. DICOM encoder 311 can consequently produce updated DICOM object A′ 315 from XML representation 312. It should be noted at this point that although XML representations of DICOM objects are widely used, there is no standard XML schema for a DICOM object and consequently no standard XML representation of a DICOM object.
One of the operations that can be done on the XML representation of a DICOM object is validation of the DICOM object. This is possible because the XML representation is logically exactly equivalent to the original DICOM object. There are in general two kinds of validation that may be done on an object: structural validation and semantic validation. Structural validation validates structural constraints, i.e., constraints that are not dependent on the values of attributes in the object; semantic validation validates semantic constraints, i.e. constraints that are dependent on values of attributes in the object. To give an example of the distinction between structural and semantic validation with regard to DICOM objects, structural validation checks whether a particular DICOM object obeys the structural constraint that there may be only one patient node 225 in a DICOM object; semantic validation of a particular patient node 225 checks whether the information in the patient node obeys the semantic constraint that the patient's name must exist and is not empty for DICOM objects that are produced by a study that was performed after a certain date.
Validation of an XML representation of a DICOM object may be done by incorporating checks for DICOM constraints in the XML representation's XML schema and by combining XSLT with XPath statements that check for DICOM constraints, as shown at 317. XPath is a standard language for locating nodes in an XML document and returning information about them. Both structural and semantic validation may be done. Structural validation is done as shown at 319. An XML document 309 to be validated and the XML schema 307 for the document are input to a schema validator 321 which produces a validation result indicating whether the XML document has the structure described by the document's XML schema. Semantic validation is done as shown at 325. The XML representation 309 to be validated is input to XML decoder 309 along with DICOM validation XSLT 327. Validation XSLT 327 contains XPath statements which check the semantic constraints and return results. Validation XSLT 327 produces validation result 329 which specifies any semantic constraint results. In terms of what XSLT generally does, XML document 309 has been transformed into validation result 329.
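The kind of check that an XPath statement performs can be sketched in Python. The sketch below departs from the arrangement described above in two ways that should be kept in mind: it uses the limited XPath subset supported by the standard xml.etree.ElementTree module, and it applies the queries directly from Python rather than through an XML schema validator or an XSLT document. The XML layout, the cutoff date, and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML representation; the layout is not a standard one.
XML_DOC = """
<DicomObject>
  <Patient>
    <Attribute tag="00100010" name="PatientName"></Attribute>
  </Patient>
  <Study>
    <Attribute tag="00080020" name="StudyDate">20060815</Attribute>
  </Study>
</DicomObject>
"""


def validate(xml_text: str, cutoff: str = "20060101") -> list:
    """Return a list of constraint violations found in the XML representation."""
    root = ET.fromstring(xml_text)
    problems = []

    # Structural constraint: exactly one Patient node, whatever its contents.
    patient_count = len(root.findall("./Patient"))
    if patient_count != 1:
        problems.append(f"expected 1 Patient node, found {patient_count}")

    # Semantic constraint: for studies after the cutoff date, the patient's
    # name must exist and must not be empty. Dates in YYYYMMDD form compare
    # correctly as strings.
    study_date = root.findtext("./Study/Attribute[@name='StudyDate']", default="")
    name = root.findtext("./Patient/Attribute[@name='PatientName']", default="")
    if study_date > cutoff and not name.strip():
        problems.append("PatientName is missing or empty for a study after the cutoff")

    return problems


print(validate(XML_DOC))
```

Note that both checks are phrased in terms of the XML document's element names and paths rather than in terms of DICOM entities, which is the difficulty the surrounding discussion raises.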
The use of the XML schema and XSLT and XPath to validate XML documents is explained in detail in William L. Provost, An XML Validation Architecture using XML Schema, XPath, and XSLT, available in August, 2006 at www.objectinnovations.com/Library/Articles/Provost/XMLValidationArchitecture/index.html.
The reference is copyrighted 2004. As may be seen from the Provost reference, neither the XML schema, XSLT, nor XPath was designed to do validation, and consequently, a high order of skill in XML is required to use the XML schema, XSLT, and XPath for that purpose. A related problem is that the validation is done in terms of the structure and content of the XML document, not in terms of the structure and content of the DICOM object. A consequence of this is that expertise in the structure and semantics of the DICOM object is not by itself sufficient to do validation of DICOM objects using the XML schema, XSLT, and XPath. What is required to do it is enough expertise both in DICOM and in XML to be able to translate the constraints as expressed in English in the DICOM standard into an XML schema, a set of XPath statements that check those constraints, and an XSLT document that produces the validation result.
A System that Uses XML to Represent DICOM Objects and Uses the XML Representations to Validate the DICOM Objects: FIG. 4
FIG. 4 is a block diagram of the system for integrating DICOM objects into a database management system described in U.S. Ser. No. 11/285,977. In that system, validation is done on the XML representations of the DICOM objects. The main components of system 401 are in-memory DICOM representation 405 in memory 403, model 409 in repository 407, relational database system 423, and programs including DICOM encoder 416, DICOM parser 417, an XML encoder 419, an XML parser 420, and a DICOM conformance validator 421. Beginning with in-memory DICOM representation 405, the representation is a representation of a DICOM object that has been optimized to permit rapid access to the DICOM object's subobjects. In system 401, in-memory DICOM representation 405 represents the DICOM object as a hierarchical directed graph. The components of the DICOM object are nodes in the directed graph. Pointers in the nodes permit rapid traversal of the graph. Because the graph is hierarchical, no node of the graph has more than one parent. DICOM parser 417 provides the interface which is used by other components of system 401 to perform operations on in-memory DICOM representation 405. Among operations that DICOM parser 417 can perform is returning a value indicating a data type for a given locator for a subobject of the DICOM object and also returning subobject values of the data type contained in a DICOM object.
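A hierarchical directed graph of the kind described for in-memory representation 405 can be sketched as follows. The Node class, the slash-separated locator syntax, and the type_of function are hypothetical; they merely illustrate parent and child pointers, the single-parent property, and the two lookups (data type and value for a locator) attributed above to DICOM parser 417.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """One component of the object: an entity, a module, or an attribute."""
    name: str
    value: object = None
    parent: Optional["Node"] = field(default=None, repr=False)
    children: dict = field(default_factory=dict)

    def add(self, child: "Node") -> "Node":
        """Attach child below this node, enforcing the single-parent rule."""
        if child.parent is not None:
            raise ValueError("a node in a hierarchical graph has at most one parent")
        child.parent = self
        self.children[child.name] = child
        return child

    def locate(self, path: str) -> "Node":
        """Follow child pointers along a slash-separated locator (hypothetical syntax)."""
        node = self
        for part in path.split("/"):
            node = node.children[part]
        return node


def type_of(root: Node, path: str) -> str:
    """Return the name of the data type of the value found at the locator."""
    return type(root.locate(path).value).__name__


root = Node("DicomObject")
patient = root.add(Node("Patient"))
patient.add(Node("PatientName", "DOE^JOHN"))
image = root.add(Node("Image"))
image.add(Node("Rows", 512))

print(root.locate("Patient/PatientName").value, type_of(root, "Image/Rows"))
```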
Model repository 407 is persistent storage that contains a model 409 of DICOM objects. Model 409 is modifiable by the user of system 401 and may thus be easily changed to deal with changes in the DICOM standard and with peculiarities of DICOM objects that are either produced locally or received from elsewhere. Model 409 has three components:
DICOM data dictionary 411 describes for each information entity, attribute, and module in the classes of DICOM objects that system 401 deals with how the entity, attribute, or module is to be represented in in-memory representation 405.
DICOM mapping document 413 is an XML document that describes how an XML representation of a DICOM object is to be made from the DICOM object's in-memory representation 405.
XML validation documents 415 are XSLT and XPath documents that are used to validate the XML representation of a DICOM object in the manner already described. The validation documents in repository 407 must be able to validate every class of DICOM object that system 401 deals with.
Relational database system 423 contains at least one relational table 425 which has rows 427 that contain SQLDICOM objects. An SQLDICOM object has two components that are of interest in the present context: an XML representation 429 of the DICOM object's metadata, i.e., of all of the data in the DICOM object other than the images themselves, and a binary copy 431 of the DICOM object itself.
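The two components of a row can be pictured with a small relational sketch. SQLite, accessed through Python's standard sqlite3 module, is used here purely as a stand-in for relational database system 423; the table name, column names, and the store function are hypothetical, and the stored bytes are placeholders rather than a real DICOM object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dicom_objects (
        id            INTEGER PRIMARY KEY,
        xml_metadata  TEXT NOT NULL,   -- XML representation of the metadata
        dicom_binary  BLOB NOT NULL    -- binary copy of the object itself
    )
    """
)


def store(connection: sqlite3.Connection, xml_metadata: str, dicom_bytes: bytes) -> int:
    """Insert one row holding both components and return its row id."""
    cursor = connection.execute(
        "INSERT INTO dicom_objects (xml_metadata, dicom_binary) VALUES (?, ?)",
        (xml_metadata, dicom_bytes),
    )
    connection.commit()
    return cursor.lastrowid


placeholder_xml = "<DicomObject><Attribute name='PatientName'>DOE^JOHN</Attribute></DicomObject>"
placeholder_bytes = b"\x00\x01\x02\x03"  # placeholder, not a real DICOM object

row_id = store(conn, placeholder_xml, placeholder_bytes)
xml_text, blob = conn.execute(
    "SELECT xml_metadata, dicom_binary FROM dicom_objects WHERE id = ?",
    (row_id,),
).fetchone()

print(row_id, len(blob), xml_text[:13])
```

Keeping the metadata as XML alongside the untouched binary copy lets queries and validation work on the XML while the original object remains available unchanged.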
Operation of system 401 is as follows: When a DICOM object 105 is received in system 401, it is copied to field 431 of a row in relational table 425. DICOM parser 417 further reads DICOM object 105 either directly or from field 431, retrieves the information entities, modules, and attributes from the DICOM object, and produces in-memory DICOM representation 405 of the entities, modules, and attributes as specified by the attributes in DICOM data dictionary 411 for the entities, modules, and attributes. When in-memory DICOM representation 405 is finished, XML encoder 419 makes XML metadata 429 by reading in-memory DICOM representation 405 and making it into an XML document as specified in mapping document 413. XML metadata 429 is then stored in the row 427 that contains the DICOM object. XML parser 420 can now use XML metadata 429 and DICOM data dictionary 411 to make in-memory DICOM representation 405 from the XML metadata. DICOM conformance validator 421 then validates the DICOM object by using the XML schema or the XSLT document and the XPath document on XML metadata 429 for the DICOM object and produces a validation result 329 as described in the discussion of FIG. 3. It should be noted here that no mechanism is provided in system 401 for validating a DICOM object beyond that offered by using XSLT and XPath to validate the XML object made from the DICOM object. That mechanism suffers from the shortcomings described with reference to the validation techniques shown at 317 in FIG. 3. It is an object of the techniques described herein to overcome those shortcomings and provide improved validation of binary objects and other complex digital objects.