A common situation in industrial product development is the need to perform quick surveys within a space of state parameters. In mature and highly competitive industrial sectors such as aerospace, this need is driven by the goal of producing products with good technical performance within design cycles that are as short as possible. Time is a key factor in industrial competitiveness, because shortening the time to market may provide a decisive economic advantage during the product life cycle.
In the specific case of aeronautics, predicting the aerodynamic forces and, more generally, the distributions of surface values over the skin of an aircraft is essential for the optimal design of its structural components, so that the structure is as light as possible while still being able to withstand the expected aerodynamic loads.
With the growing use of CFD (Computational Fluid Dynamics), the aerodynamic forces on an aircraft are commonly determined by numerically solving the Reynolds-Averaged Navier-Stokes equations (hereafter RANS equations), which model the flow around the aircraft, using discrete finite element or finite volume models. Given the accuracy demanded in the aeronautical industry, each of these computations requires substantial computational resources.
A first known approach to improving the solution of these equations for a given model is to provide analytical techniques that simplify the calculations needed to arrive at a solution. An example can be found in US 2009/157364 in the name of the applicant.
A second approach is the use of computing techniques, either to accelerate the calculation process or to optimize the computational resources needed to solve a given problem.
Parallel machines are commonly used to accelerate the calculation process. The grid is partitioned into several sub-grids, which are solved separately. At the end of each time iteration, the values of the variables at the boundary vertices must be sent to the neighbouring sub-grids. Therefore, as the grid is partitioned into more sub-grids, communication increases until a point is reached where adding more machines yields only a marginal increase in speed, because most of the time is spent on communication.
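The communication argument above can be made concrete with a toy cost model. The following Python sketch is illustrative only: the cube-shaped sub-grids, the per-vertex and per-boundary-vertex costs, the fixed per-iteration latency and all numeric values are assumptions not taken from the source, and the function names are hypothetical.

```python
"""Toy cost model for one iteration of a partitioned-grid solver.

Assumptions (illustrative only, not taken from the source):
- the grid has n_vertices vertices split evenly over p sub-grids;
- each sub-grid is roughly cube-shaped, so its boundary holds about
  6 * (n_vertices / p) ** (2/3) vertices;
- computation costs t_vertex seconds per vertex;
- exchanging boundary values costs t_halo seconds per boundary vertex
  plus a fixed per-iteration latency t_latency (zero when p == 1).
"""


def iteration_time(n_vertices: float, p: int, t_vertex: float,
                   t_halo: float, t_latency: float) -> float:
    """Wall-clock time of one iteration on p machines (hypothetical model)."""
    local = n_vertices / p
    compute = t_vertex * local
    if p == 1:
        return compute  # a single grid has no neighbours to talk to
    boundary = 6.0 * local ** (2.0 / 3.0)  # surface of a cube-shaped sub-grid
    return compute + t_latency + t_halo * boundary


def speedup_table(n_vertices, procs, t_vertex, t_halo, t_latency):
    """Return (p, speedup, efficiency) tuples relative to one machine."""
    t1 = iteration_time(n_vertices, 1, t_vertex, t_halo, t_latency)
    rows = []
    for p in procs:
        tp = iteration_time(n_vertices, p, t_vertex, t_halo, t_latency)
        s = t1 / tp
        rows.append((p, s, s / p))
    return rows


if __name__ == "__main__":
    # Hypothetical numbers: 10 million vertices, 1 us of work per vertex,
    # 0.5 us per exchanged boundary vertex, 5 ms of latency per iteration.
    for p, s, e in speedup_table(1e7, [1, 2, 8, 64, 512, 4096],
                                 1e-6, 5e-7, 5e-3):
        print(f"p={p:5d}  speedup={s:8.1f}  efficiency={e:6.2f}")
```

Under this model, parallel efficiency falls steadily as machines are added, and the speedup can never exceed the single-machine iteration time divided by the fixed latency, however many sub-grids are used.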
The addition of accelerator devices to a conventional computer to improve the execution time of a given algorithm has also been proposed. Two technologies have been used to build these devices: FPGA (Field-Programmable Gate Array) and GP-GPU (General-Purpose Graphics Processing Unit). These accelerator devices can take the form of expansion cards, such as PCI (Peripheral Component Interconnect) or PCI Express (Peripheral Component Interconnect Express) cards, or of plug-in modules that fit into the processor socket (in-socket accelerators), such as the XtremeData XD2000i.
Either the computationally most expensive sections of the algorithm or the entire algorithm can be executed in the accelerator device. In particular, US 2007/0219766 discloses the use of a PCI card with an FPGA to accelerate the computationally most expensive sections of the algorithm.
Reference [1] discloses an alternative based on an in-socket accelerator (ISA), which likewise executes the computationally most expensive sections of the algorithm in the FPGA.
US 2005/0288800 discloses an architecture with several PCI cards interconnected through a dedicated network, on which part or all of the algorithm can be executed.
Finally, Reference [2] discloses a solution that executes a Navier-Stokes code completely in GP-GPUs.
However, none of these proposals achieves the performance required in industrial environments. On the one hand, proposals that execute only part of the algorithm in the accelerator devices do not usually obtain good results because of the heavy communication overhead. On the other hand, the number of expansion cards or processor sockets available in a system is limited, and so, therefore, is the overall acceleration achievable with the proposals disclosed in US 2005/0288800 and in Reference [2].
Additionally, the proposals aimed at full execution of the algorithm have significant limitations on the size of the grid that can be processed and/or on the processing speed. Reference [2] shows results for grids of only hundreds of thousands of vertices. US 2005/0288800 discloses a preferred embodiment with a pipeline between two ZBT (Zero Bus Turnaround) memories that limits the number of grid vertices that can be processed per cycle, since the calculation for each vertex involves reading tens or even hundreds of variables, including those of the vertex itself and those of all its neighbours.
A system that allows the fast execution of scientific codes, such as the fluid dynamics codes used in the aeronautical industry, including RANS codes, on grids of tens or hundreds of millions of vertices is therefore desirable.
The present invention is directed to meeting this demand.