This background description is set forth below for the purpose of providing context only. Therefore, any aspect of this background description, to the extent that it does not otherwise qualify as prior art, is neither expressly nor impliedly admitted as prior art against the instant disclosure.
Many cryptocurrencies (e.g., Bitcoin, Litecoin) are based on a technology called blockchain, in which transactions are combined into blocks. These blocks are stored with previous blocks of earlier transactions in a ledger (the “blockchain”) and rendered immutable (i.e., practically unmodifiable) by including a hash. The hash is a number that is calculated based on the blocks and that meets the particular blockchain's criteria. Once the block and hash are confirmed by the cryptocurrency network, they are added to the blockchain. The hashes can be used to verify whether any of the prior transactions or blocks on the blockchain have been changed or tampered with. This creates an immutable ledger of transactions and allows the cryptocurrency network to guard against someone trying to double-spend a digital coin.
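By way of illustration only, the following minimal Python sketch shows how linking each block to the hash of the previous block allows tampering to be detected. The function names, the JSON block layout, and the all-zero placeholder hash for the first block are assumptions made for this example and do not correspond to any particular cryptocurrency's format. The sketch also omits the requirement that the hash meet the blockchain's criteria.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder "previous hash" for the first block


def block_hash(transactions, prev_hash):
    """Hash a block's transactions together with the previous block's hash."""
    payload = json.dumps({"tx": transactions, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Build a list of blocks, each linked to its predecessor by hash."""
    chain, prev = [], GENESIS_PREV
    for txs in batches:
        h = block_hash(txs, prev)
        chain.append({"tx": txs, "prev": prev, "hash": h})
        prev = h
    return chain


def is_valid(chain):
    """Recompute every hash; an edited block no longer matches its stored hash."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev or block_hash(block["tx"], prev) != block["hash"]:
            return False
        prev = block["hash"]
    return True
```

In this sketch, changing any transaction in an earlier block causes `is_valid` to return `False`, because the recomputed hash of that block no longer matches the hash stored with it.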
Many cryptocurrency networks consist of a large number of participants that repeatedly attempt to be the first to calculate a hash meeting the blockchain network's requirements. Depending on the blockchain, they may receive a reward (e.g., a coin reward or transaction fee reward) for being first to calculate a successful hash, and that reward may motivate them to continue participating (mining).
Many blockchain networks require computationally difficult problems to be solved as part of the hash calculation. The solution to the difficult problem is a piece of data that satisfies certain requirements and that is difficult (costly, time-consuming) to produce but easy for others to verify. This is often called “proof of work”. A proof of work (PoW) system (or protocol, or function) is a consensus mechanism. It deters denial-of-service attacks and other service abuses such as spam on a network by requiring some work from the service requester, usually meaning processing time by a computer. Some blockchain networks periodically change the difficulty level in an attempt to compensate for increases in hash power that occur on the network.
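The asymmetry between producing and verifying a proof of work can be sketched as follows. This is a simplified example under stated assumptions: a single SHA-256 pass over arbitrary data plus an 8-byte nonce, and a difficulty expressed as a number of leading zero bits. Actual blockchain networks define their own block header formats, hash constructions, and target encodings, and the names `meets_target` and `find_nonce` are hypothetical.

```python
import hashlib


def meets_target(data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap verification: one hash and one comparison against the target."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    # The hash, read as a 256-bit integer, must fall below the target,
    # which is equivalent to having `difficulty_bits` leading zero bits.
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def find_nonce(data: bytes, difficulty_bits: int) -> int:
    """Costly search: try successive nonces until one satisfies the target."""
    nonce = 0
    while not meets_target(data, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

In this sketch, each additional difficulty bit roughly doubles the expected number of hashes that `find_nonce` must compute, while the cost of `meets_target` stays constant.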
Participants in the network operate standard PCs, servers, or specialized computing devices called mining rigs or miners. Because of the difficulty involved and the amount of computation required, the miners are typically configured with specialized components that improve the speed at which hashes are calculated (the device's hash rate) or at which other calculations required for the blockchain network are performed. Examples of specialized components include application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), and accelerated processing units (APUs). Specialized cryptocurrency mining software (e.g., cgminer) may also be used with the specialized components, for example software applications configured to compute the SHA-256 algorithm.
Miners are often run for long periods of time at high frequencies that generate large amounts of heat. Even with cooling (e.g., high-speed fans), the heat and constant operation can negatively impact the reliability and longevity of the components in the miners. ASIC miners, for example, have large numbers of hashing chips (e.g., 100's) that are more likely to fail as temperatures rise.
Many participants in blockchain networks operate large numbers (e.g., 1000's, 10,000's, 50,000's, or more) of different miners (e.g., different generations of miners from one manufacturer or different manufacturers) concurrently in large data centers. These data centers and large numbers of miners can be difficult to manage. Data centers housing large numbers of miners or other ASIC- or GPU-based systems have different challenges than traditional data centers housing more general computers. This is due to the significantly higher density, including higher power usage, higher heat generation, and near constant compute-intensive operation.
The constant operation often leads to performance issues such as memory leaks. A memory leak can reduce the performance of the computer by reducing the amount of available memory. Memory leaks can be a problem when programs run for an extended time and consume more and more memory over time. Eventually too much of the available memory may become allocated, and all or part of the system or device may stop working correctly. One or more applications running on the device may fail and the system may slow down due to thrashing. Thrashing is when a computer's virtual memory resources are overused, leading to a constant state of paging and page faults, dramatically slowing or inhibiting application-level processing.
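As an illustration of how steadily growing memory consumption might be recognized before a device stops working correctly, the following sketch fits a straight line to periodic memory readings and extrapolates to the point of exhaustion. The function names, the `(time_s, used_bytes)` sample format, and the linear-growth assumption are all hypothetical choices for this example. Real memory usage is rarely this regular, so the result should be treated as a rough estimate only.

```python
def memory_growth_rate(samples):
    """Least-squares slope of (time_s, used_bytes) samples, in bytes per second."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return 0.0  # all samples share one timestamp; no trend can be fitted
    return sum((t - mean_t) * (m - mean_m) for t, m in samples) / denom


def seconds_until_exhausted(samples, total_bytes):
    """Estimate the time until memory runs out, or None if usage is not growing."""
    rate = memory_growth_rate(samples)
    if rate <= 0:
        return None
    _, latest_used = samples[-1]
    return max(0.0, (total_bytes - latest_used) / rate)
```

For example, readings of 100, 200, and 300 bytes taken 10 seconds apart give a growth rate of 10 bytes per second, so a device with 1,000 bytes of memory would be estimated to run out about 70 seconds after the last reading.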
In large data centers, there can be a significant number of units failing each day, both for known and unknown reasons. A typical data center management solution is to determine when a computing device is no longer responding to requests (e.g., responding to network pings), and then to power cycle the device (e.g., by going to the device and unplugging it). This is less than ideal, as it can take a significant amount of the data center technician's time to find and manually power cycle all of the failed devices each day. In addition, there can be a significant loss in processing during the time when the device's performance is degraded while the device is still able to respond to requests.
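The limitation of a responsiveness-only check can be illustrated with a short sketch. The status labels, the function names, and the 80% threshold below are assumptions chosen for this example and are not taken from any particular management system.

```python
def ping_only_status(responds_to_ping: bool) -> str:
    """Typical check: a device is flagged only once it stops responding."""
    return "ok" if responds_to_ping else "unresponsive"


def classify_device(responds_to_ping: bool, hash_rate: float,
                    expected_hash_rate: float,
                    degraded_fraction: float = 0.8) -> str:
    """Also flag devices that still respond but perform below expectations."""
    if not responds_to_ping:
        return "unresponsive"
    # Assumed threshold: below 80% of the expected hash rate counts as degraded.
    if hash_rate < degraded_fraction * expected_hash_rate:
        return "degraded"
    return "healthy"
```

In this sketch, a device that answers pings while hashing at half of its expected rate is reported as "ok" by `ping_only_status` but as "degraded" by `classify_device`, which corresponds to the lost processing described above.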
For at least these reasons, there is a desire for a system and method to allow for improved management of large numbers of computing devices such as miners in a data center.