1. Field of the Invention
The invention relates generally to managing grid enabled computing environments, and particularly to increasing the security of grid enabled computing environments by implementing an edge management system.
2. Description of the Related Art
Edge management relates to how broadly a given grid job should be able to expand across a computing infrastructure. A grid job is a computer processing job that is portioned out across a plurality of processors. As the grid job expands across different boundaries separating different computing environments, there is an increased risk that sensitive information will be processed on a node that is insufficiently secure.
Grid schedulers accept applications and jobs submitted by users and provide the mechanism to deploy such jobs and applications on the grid computing equipment based on scheduling policies. Grid schedulers currently utilize various security components to ensure information is processed in a sufficiently secure mode. For example, grid enabled computing environments use authentication security standards such as those described in the proposed open grid service infrastructure (OGSI) standard. Alternatively, a grid environment may use platform security standards for hardware and software, such as those specified in a Government certification. However, some grid applications may have security needs that even a certified platform cannot satisfy. Additionally, grid security is conventionally defined within the scheduling function, and as such, an error introduced when scheduling a grid job, or hundreds of grid jobs, may cause jobs to run in environments less secure than intended.
A conventional manner of implementing grid computing is to use a cluster of computers in a grid-like fashion. This enables computers to pool processing power. However, even within a single corporate organization, sharing resources can be difficult because separate groups may own different clusters, and each group may use its own scheduler that applies a different set of rules. It is not easy to coordinate security policies across schedulers for different clusters. In situations where there are two, three or four different schedulers, if someone makes a mistake scheduling a job and does not give the job an appropriate level of security, there is nothing in place to prevent the job from being processed on an insufficiently secure node.
In a cluster form of grid computing, the scheduler is generally limited to the resources within that given cluster. If there is more than one cluster within an organization, there can be grid activity between clusters and schedulers. For example, scheduler A not only sends local jobs through scheduler A's local cluster, but can also send work to other clusters within the organization.
The cluster configuration is not true grid computing, but rather is a quasi-form of grid computing or a grid-like environment. An example of a quasi-form of grid computing would be a cluster of computers in an accounting department of an organization that form a grid that does not expand outside of that particular cluster. A true grid computing environment is able to use resources outside of a particular cluster.
For example, when expanding beyond the grid-like environment discussed above, one subnet (an interconnected portion of a network sharing a network address, but distinguishable by a subnet number) may contain two machines: one server with payroll records, and a second server that tests new application code. The first server would have more stringent edge/security requirements than the second. Conventional schedulers lack the security features to enable true grid computing. To prevent the more sensitive payroll information from being processed on the less secure second server, the need arises for a comprehensive edge manager for grid enabled computing environments. In addition, when using grid computing equipment that is external to an organization's computing environment, the management of security becomes even more critical than when using equipment that is part of the organization's own environment. If scheduler A issues an instruction to parallelize a job out to 1,000 nodes and there are only 400 nodes in-house, the 600 nodes used outside the organization must be carefully selected.
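The 1,000-node example above can be sketched in code. The following is a minimal illustration only, not a description of any existing scheduler: the `Node` type, the integer `security_level` scale, and the `select_nodes` function are all hypothetical names introduced here. It assumes that each node advertises a security level and that in-house nodes are preferred over external ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    in_house: bool
    security_level: int  # hypothetical scale: a higher value means more secure


def select_nodes(nodes, requested, required_level):
    """Return up to `requested` nodes that meet `required_level`, in-house first."""
    eligible = [n for n in nodes if n.security_level >= required_level]
    # The sort is stable and False orders before True, so in-house nodes lead.
    eligible.sort(key=lambda n: not n.in_house)
    return eligible[:requested]


# Assumed inventory: 400 in-house nodes, plus 1,500 external nodes of which
# only every second one meets the required level.
in_house = [Node(f"in-{i}", True, 3) for i in range(400)]
external = [Node(f"ext-{i}", False, 3 if i % 2 == 0 else 1) for i in range(1500)]

chosen = select_nodes(in_house + external, requested=1000, required_level=3)
```

Under these assumptions, all 400 in-house nodes are used and the remaining 600 are drawn only from external nodes that meet the required level.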
Most conventional schedulers lack the security features necessary to expand grids outside clusters. For example, OpenPBS (Portable Batch System), which is a freely available open source grid/cluster scheduler, does not enforce a security policy. Instead, OpenPBS relies on the operating system's security methods for user authentication (for example, the UNIX .rhosts file, which is not secure), on access control lists, and on firewall rules to restrict access to servers.
There are schedulers on the market today that include security as part of their scheduling policy. However, a flaw exists in the conventional scheduling mechanisms in their inability to ensure that appropriate security is applied to a particular computing job. For example, in the situation where there are several different schedulers and somebody makes a mistake scheduling a job and does not provide the job with the appropriate level of security, there is nothing in place to prevent the job from being executed on an insufficiently secure node. The inventors have recognized the shortcomings of existing systems and have developed, in an exemplary embodiment of the present invention, an edge manager that establishes corporate level security policies and can prohibit a scheduler from executing a job submitted with an insufficient level of security by overriding the scheduler.
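The override described above can be illustrated with a short sketch. It is offered only as one possible reading, under stated assumptions: the `Level` scale, the `CORPORATE_POLICY` table, and the `edge_manager_allows` and `schedule` functions are hypothetical names introduced for this example and are not taken from any particular scheduler or from the embodiment itself.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical security levels; a higher value is more restrictive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class Job:
    name: str
    data_class: str         # kind of data the job processes
    submitted_level: Level  # level requested when the job was scheduled


# Assumed corporate-level policy: minimum level required per class of data.
CORPORATE_POLICY = {"payroll": Level.CONFIDENTIAL, "test-code": Level.PUBLIC}


def edge_manager_allows(job, policy=CORPORATE_POLICY):
    """Return True only if the job was submitted at or above the policy level."""
    # Data classes missing from the policy default to the strictest level.
    required = policy.get(job.data_class, max(Level))
    return job.submitted_level >= required


def schedule(job, dispatch):
    """Dispatch the job unless the edge manager overrides the scheduler."""
    if not edge_manager_allows(job):
        return False  # override: the job never reaches a node
    dispatch(job)
    return True
```

In this sketch, a payroll job mistakenly submitted at `Level.INTERNAL` is refused before dispatch, while the same job submitted at `Level.CONFIDENTIAL` proceeds. Keeping the check outside the scheduler means a scheduling mistake in any one cluster does not by itself lower the effective security of the job.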
The present inventors recognized that the inadequate security offered by conventional schedulers is a factor favoring the use of grid-like environments, rather than true grid environments.
The present inventors also recognized that the increasing demand for computer processing resources has created a need for equipment that will better manage and maximize existing resources. Money could be saved by reducing the amount of computer equipment that is not being fully utilized. Rather than buying new, expensive, specialized equipment that has a lot of processing power, jobs could be distributed over a plurality of processors. Distributing jobs over a plurality of processors allows less expensive machines to be purchased and used. A system that securely uses a plurality of processors for a particular job also could increase the speed with which that job is completed. A job that would take three weeks could take only 24 hours if equipment to better manage existing processing resources existed.
Furthermore, no complete intra-site to inter-site solution has been developed that would manage, based on data security requirements, the extent to which a grid job may parallelize outside of the local computing environment. Conventional systems do not manage the risk of processing secure information on an unsecure node, because they lack a policy based edge manager that will not allow any grid enabled job to traverse the global grid beyond what is defined as secure for that particular job or job environment. The present invention would allow jobs that require a secure environment to run in a wider grid by providing a mechanism for addressing the security issue in a suitable manner.
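The notion of limiting how far a job may traverse the grid can be sketched as follows. This is an illustrative assumption rather than the claimed mechanism: the `Boundary` scale and the `may_run_on` and `filter_nodes` functions are hypothetical, and the sketch assumes that each node can be labeled with the outermost boundary a job must cross to reach it.

```python
from enum import IntEnum


class Boundary(IntEnum):
    """Hypothetical nesting of environments, from narrowest to widest."""
    CLUSTER = 0
    SITE = 1
    ORGANIZATION = 2
    GLOBAL = 3


def may_run_on(job_edge, node_boundary):
    """A job may use a node only if the node lies within the job's edge."""
    return node_boundary <= job_edge


def filter_nodes(job_edge, nodes):
    """Keep the names of nodes that lie inside the edge defined for the job."""
    return [name for name, boundary in nodes if may_run_on(job_edge, boundary)]


# Assumed inventory of (node name, boundary crossed to reach it) pairs.
nodes = [
    ("acct-01", Boundary.CLUSTER),
    ("hq-17", Boundary.SITE),
    ("partner-3", Boundary.GLOBAL),
]

site_limited = filter_nodes(Boundary.SITE, nodes)
```

Here a job whose edge is defined as `Boundary.SITE` may use `acct-01` and `hq-17` but not `partner-3`, whereas a job with a `Boundary.GLOBAL` edge could use all three.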