As organizations' computing needs grow, and as organizations plan for growth, one common way to obtain economical computing is to purchase computing systems that are scalable. A system or architecture is scalable when it can be upgraded, increased in size, or reconfigured to accommodate changing conditions. For example, a company that plans to set up a client/server network may want a system that not only serves the number of people who will immediately use it, but can also be easily and economically expanded to accommodate the number of employees who may be using the system after one, five, or ten years. Similarly, a company that runs a server farm and hosts web sites or applications via the Internet may continue to grow, and such a company may desire a scalable system to which it can economically add servers as needed to accommodate growth.
Accordingly, a scalable system can typically merge or integrate a number of scalable servers or chassis, each having one or more processors, to create a “larger” unitary system of processing nodes. Thus, a collection of scalable servers, when properly merged, can function like a single larger server. When multiple servers are merged, they can be divided using hardware partitioning. A system with a single partition runs a single instance of an operating system (OS), and all the nodes of the system are thus conceptually combined. In effect, the user experiences a single, more powerful computing system functioning as one “scaled up” node, instead of a number of less powerful nodes running independently.
Currently, customers face many issues with the start-up and expansion of scalable systems. Many of these problems relate to cabling and configuring the scalable system. It can be difficult to configure a system, and once it is configured it can be difficult to debug problems that arise. Current implementations allow the system administrator to see only one partition from a menu, and the administrator typically has to go to other terminals to see other partitions. A partition can include a set of components, nodes, or systems in a scalable environment that are configured to work as a single entity. It can be appreciated that a set-up procedure is inconvenient when components must be configured into different partitions without knowing which components belong to which partition, since the partitions can only be “seen” and configured from different interfaces connected to different nodes. Further, not knowing the exact cabling, or whether the system is properly cabled, can cause additional issues.
Many other issues can arise, such as maintenance and repair issues for systems that break down or begin running at reduced capacity. If there are cable or connector issues, such as a broken electrical connection, it is traditionally very difficult to find where the broken connection has occurred. Scalable system performance can drop dramatically when faults occur, such as soft failures. Further, it can be very difficult to determine which port and cable are causing the failure, since each component can have many different ports. Intermittent failures can cause components to re-route communication around the fault, and thus a “soft” failure can go undetected.
A traditional approach for combining multiple nodes of a system into a single-partition merged system running a single instance of an OS is to have a trained technician manually integrate and configure each node as the system is built or as computing resources (nodes) are added. Traditionally, a trained technician or administrator must configure each node with the proper partition configuration information, entering data to specify one of the nodes as the primary, or boot, node and the other nodes as secondary nodes to the primary node. This approach is cumbersome and requires trained technicians to build and configure such a system. When there are more than a few nodes to configure manually, the configuring process can become complex, and such configuring is prone to connection and configuration errors and omissions. Traditionally, many scalable server systems utilize a management interface (a standalone computing system with specialized hardware) for most of the components in the system to properly configure and boot the connected server nodes. It can be appreciated that this approach requires costly dedicated hardware, and may require modification to preexisting systems that do not allow for the addition of such functionality.
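The manual partition-configuration step described above can be illustrated with a minimal sketch. The field names and node identifiers below are hypothetical and purely illustrative; the actual configuration data a technician enters is vendor-specific.

```python
# Hypothetical sketch of the per-node partition configuration a
# technician would assemble by hand when merging nodes: one node is
# designated the primary (boot) node, the rest become secondaries.
# Field names ("role", "boot_target") are illustrative only.

def build_partition_config(node_ids, primary_id):
    """Mark one node as the primary (boot) node and the rest as
    secondary nodes pointing back to that primary."""
    if primary_id not in node_ids:
        raise ValueError("primary node must be one of the merged nodes")
    return [
        {"node": n,
         "role": "primary" if n == primary_id else "secondary",
         "boot_target": primary_id}
        for n in node_ids
    ]

config = build_partition_config(["node0", "node1", "node2"], "node0")
```

As the passage notes, with more than a few nodes this hand-entered mapping must be repeated per node and per partition, which is where connection and configuration errors tend to creep in.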
Generally, systems in a scalable environment do not automatically know that they are cabled together and can work as one system. These scalable systems have to be told (i.e., configured by a technician) that they are cabled to other nodes, and must be configured regarding how they can communicate with those nodes. Many current designs utilize this manual configuration approach. One design uses a network connection, such as an Ethernet connection, between nodes and utilizes dedicated hardware such as a remote supervisor adapter (RSA) to facilitate set-up of the system. Typical set-up requires, among other things, that a user input the Internet Protocol (IP) address of each RSA in the RSA interface before the scalable systems can work as a single entity. Discovering and entering the RSA IP address for each component can be cumbersome for the user. This IP detection process can include booting each scalable component; after the component is connected to the network, the user can find the IP address in the BIOS menu of the component.
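The manual IP-entry process described above might be modeled as follows. This is a sketch under the assumption that the user must collect one management (RSA) IP address per component before the merge can proceed; the component names and addresses are hypothetical (the addresses use the reserved documentation range 192.0.2.0/24).

```python
# Illustrative sketch of the manual RSA IP-entry step: the user must
# look up the RSA IP address of every component (e.g., by booting the
# node and reading its BIOS menu) and enter it before the nodes can be
# merged into a single entity. All names here are hypothetical.

def missing_rsa_addresses(known_components, entered_addresses):
    """Return the components whose RSA IP the user has not yet entered."""
    return [c for c in known_components if c not in entered_addresses]

entered = {"node0": "192.0.2.10", "node1": "192.0.2.11"}
todo = missing_rsa_addresses(["node0", "node1", "node2"], entered)
```

Here `todo` still contains `node2`, illustrating why the process is cumbersome: every remaining component must be booted and inspected individually before set-up can complete.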