Mark Lommers, M.AIRAH, has built considerable expertise in immersion-cooled supercomputers, and along the way developed a familiarity with some characters known as Bazza, Bubba and Bruce. Ecolibrium editor Matt Dillon recently broke bread with the Perth-based tech expert.
Ecolibrium: Could you tell us a little bit about the immersion-cooled supercomputers that you have worked on?
Mark Lommers: Over the years I have worked on several immersion-cooled supercomputers throughout the world, all based on the same underlying technology: a self-contained method of immersion cooling that uses an “in-tank” heat exchanger and a dielectric fluid to manage the cooling of these high-performance computers.
Through the support of DUG Technology, a West Australian supercomputing technology company, I have worked on five separate immersion-cooled high-performance computing clusters around the world, ranging from 60kW up to a mammoth 15MW data centre in Houston, Texas.
In true Australian fashion, the MD of DUG Technology gave each supercomputer an affectionate nickname: Bazza in London, Bubba in Houston, Bohdi in Kuala Lumpur, and Bruce here in Perth.
Amazingly, Bazza, Bohdi and Bruce, as well as Bubba’s predecessor in Houston, were built within typical office buildings, demonstrating the flexibility of this technology.
Compared with traditional air-cooled data centres, these immersion-cooled supercomputers are near silent, with the only noise coming from the occasional network switch serving the immersed equipment.
“An immersion-cooled supercomputer is one that uses a dielectric fluid to provide the heat transport from the heat-generating components of the computing equipment to a heat-rejection system”
Eco: What is an immersion-cooled high-performance computing system and how does it work? Can it really achieve 46 per cent energy savings?!
ML: An immersion-cooled supercomputer is one that uses a dielectric fluid to provide the heat transport from the heat-generating components of the computing equipment to a heat-rejection system.
The “DUG Cool” patented system (US Patent 11026344) consists of an array of computing units – servers and the like – vertically mounted in a tank of dielectric fluid. A specially designed heat exchanger module is installed within the tank to control the circulation of the dielectric fluid and cool the computer equipment.
This heat exchanger unit also couples to the heat rejection source, such as dry coolers, cooling towers, chilled water or the like, depending on what cooling resources are available in the location.
It is really designed to “keep it simple,” with the added benefit that all tanks – or racks – have independent cooling systems that do not affect each other during operation.
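For readers who like to see that arrangement written down, here is a minimal sketch in Python of the rack-level idea described above – each tank with its own independent cooling module, all drawing on one common cooling-water loop. The tank loads, water temperatures and class names are illustrative assumptions, not DUG design figures.

```python
from dataclasses import dataclass

@dataclass
class ImmersionTank:
    name: str
    it_load_kw: float  # heat dissipated by the immersed IT equipment (kW)

    def condenser_water_flow_ls(self, supply_c: float = 29.5, return_c: float = 37.0) -> float:
        """Condenser-water flow (L/s) this tank's module draws from the common loop."""
        cp_water = 4.186                 # kJ/kg.K
        delta_t = return_c - supply_c    # K (return temperature is an assumption)
        return self.it_load_kw / (cp_water * delta_t)  # kg/s, roughly L/s for water

# Four identical tanks, each with its own independent in-tank cooling module
tanks = [ImmersionTank(name=f"tank-{i}", it_load_kw=60.0) for i in range(1, 5)]

# Because the dielectric loops are independent of one another, sizing the
# shared condenser-water loop is simply the sum of the per-tank demands.
total_flow = sum(t.condenser_water_flow_ls() for t in tanks)
print(f"Common-loop flow for {len(tanks)} tanks: {total_flow:.1f} L/s")
```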
On the mechanical cooling side, the systems can achieve a greater than 46 per cent reduction in the energy consumption attributable to mechanical systems. By removing the energy-hungry air-circulation fans from the immersed computing equipment – they are no longer required – we see an instant 10 per cent reduction in energy use, which translates into a direct reduction in the overall cooling requirement.
This is coupled to a completely free-cooling heat-rejection system: the only cooling requirement is “condenser water” temperature cooling water at 29.5°C. The result is a mechanical power usage effectiveness (PUE) – the ratio of the IT power usage plus the cooling system’s power usage to the IT power usage alone – that peaks at mPUE=1.08. This compares with a well-designed chilled-water-based traditional data centre cooling system at mPUE=1.20+.
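To put those mPUE figures in concrete terms, here is a small worked comparison; the 1MW IT load is an illustrative figure, not measured site data.

```python
# mPUE = (IT power + mechanical cooling power) / IT power,
# so the cooling overhead per unit of IT load is simply (mPUE - 1).
# The 1 MW IT load below is illustrative, not a measured site figure.

it_power_kw = 1_000.0  # 1 MW of IT load

for label, mpue in [("immersion with free cooling", 1.08),
                    ("well-designed chilled-water plant", 1.20)]:
    cooling_kw = it_power_kw * (mpue - 1.0)
    print(f"{label}: mPUE {mpue:.2f} -> ~{cooling_kw:.0f} kW of cooling power")
```

On those figures the mechanical cooling energy falls from roughly 200kW to 80kW per megawatt of IT load, consistent with the better-than-46-per-cent reduction quoted above.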
This move to free-cooling-based heat-rejection systems also cuts out most of the refrigerant use, as the only air cooling required in the data hall is a small amount of space cooling to keep attending IT experts at a comfortable temperature.
“Supercomputer systems are very expensive to construct, maintain and operate. Power supply and cooling system operation of a conventional data centre represents about 35 per cent of the total cost of ownership”
Eco: In 2016, you won an AIRAH Award for Excellence in Innovation for one of your designs. Could you tell us about the design, and how you arrived at it?
ML: I won the innovation award for the implementation of this immersion cooling system at DUG Technology’s worldwide computing centres.
Supercomputer systems are very expensive to construct, maintain and operate. Power supply and cooling system operation of a conventional data centre represents about 35 per cent of the total cost of ownership of the facility.
Immersion cooling systems are a way to reduce total cost of ownership for data centre operators through improvements in efficiency.
Although cooling electronic systems with dielectric fluids is not a new technology, directly immersing computers into the heat-transfer medium is a recent phenomenon.
Presented with a problem by my then-client DUG Technology, I commenced a research and development program to develop the modular system and heat exchanger module.
I wanted to keep it simple, so I engineered the system from the ground up to be a robust application of heat-transfer first principles.
The general design philosophy was to develop a series of “racks” – or tanks – to house the IT equipment, each with its own independent cooling module connected to a common cooling water loop.
After modelling the system requirements using fluid mechanics and thermodynamics equations, I knew how the performance of the system would be affected by the fluid properties, its temperature, and circulation rate.
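At its heart that modelling is the steady-state heat balance (heat load = mass flow × specific heat × temperature rise). A minimal sketch of the tank-level sizing calculation, using assumed generic dielectric-fluid properties rather than DUG’s actual fluid data, might look like this:

```python
# Minimal first-principles sizing sketch: how much dielectric fluid must
# circulate through one tank to carry away its heat load?
# Fluid properties are illustrative values for a generic dielectric
# coolant, not DUG's actual fluid data.

heat_load_w = 60_000      # tank IT heat load (W), e.g. a 60 kW rack
density = 800             # fluid density (kg/m^3), assumed
specific_heat = 2_100     # fluid specific heat (J/kg.K), assumed
delta_t = 10              # allowable fluid temperature rise (K), assumed

# Energy balance: Q = m_dot * cp * dT  =>  m_dot = Q / (cp * dT)
mass_flow = heat_load_w / (specific_heat * delta_t)    # kg/s
volume_flow_l_min = mass_flow / density * 1_000 * 60   # L/min

print(f"Mass flow:   {mass_flow:.2f} kg/s")
print(f"Volume flow: {volume_flow_l_min:.0f} L/min")
```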
Through iterative design of the packaging of the whole system, we arrived at what is today known as the “DUG Cool” immersion cooling system.
Eco: Why do supercomputers need this kind of cooling? Apart from, say, the ASKAP Pathfinder initiative, what other kinds of calculations are these tools being used for?
ML: These kinds of supercomputers are used for a vast array of scientific calculations, historically in the university research space, including programs such as data processing for the Square Kilometre Array radio telescope.
For many years supercomputers have featured heavily in the “seismic modelling” industries that use noise-filtering techniques in the search for oil and gas reserves.
From the computational fluid dynamics (CFD) of Formula 1 cars to genome sequencing in the health sciences and more recently training large language model (LLM)-based “AI engines”, such as the ubiquitous ChatGPT, these supercomputers are present wherever there is an economic edge to be gained by getting results as fast as possible.
This avalanche of AI-based technologies that is impacting the world is fuelled by graphics processing unit (GPU) development, which is being led by companies such as NVIDIA and AMD. This technology will bring interaction with supercomputer resources to everyone around the world.
“What we’re talking about isn’t some incremental improvement – it’s holistic and meaningful”
Eco: What’s the biggest challenge associated with these kinds of designs?
ML: The biggest challenge is adoption of the design for third-party applications. The world of high-performance computers, especially in the third-party data centre space, is heavily codified, with standards developed over many years centred around traditional air-cooling systems.
This codified nature, coupled with the need to assess all candidate IT equipment for immersion conversion, makes it difficult to design a “one size fits all” data centre solution.
Air-cooled data centres simply provide a constant supply of cooled air, a well-understood heat-transport medium.
When it comes to the dielectric fluid used in immersion cooling systems, the fluid’s properties, while far superior to those of air, require the heatsinks installed within the IT equipment to be redesigned or reassessed for performance in the fluid.
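The physics behind that reassessment is Newton’s law of cooling, Q = h × A × ΔT: a liquid’s convective heat-transfer coefficient is many times higher than air’s, so a heatsink proportioned for air behaves very differently once immersed. A rough comparison using illustrative, textbook-range coefficients:

```python
# Rough comparison of heatsink temperature rise in air vs dielectric fluid
# via Newton's law of cooling, Q = h * A * dT.
# Heat-transfer coefficients and geometry are illustrative, not measured values.

chip_power_w = 150        # heat from one processor (W), assumed
heatsink_area_m2 = 0.05   # effective finned surface area (m^2), assumed

h_air = 60      # forced air over fins, W/m^2.K (illustrative)
h_fluid = 500   # forced dielectric liquid over fins, W/m^2.K (illustrative)

for label, h in [("air", h_air), ("dielectric fluid", h_fluid)]:
    delta_t = chip_power_w / (h * heatsink_area_m2)
    print(f"{label}: surface-to-coolant temperature rise ~{delta_t:.0f} K")
```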
Through the work of the Open Compute Project Foundation’s “Advanced Cooling Solutions” group, we are seeing the generation of design standards and best-practice guidelines for the implementation of these innovative cooling systems.
Eco: What is the next step in data centre/supercomputer cooling?
ML: The next step in data centre and supercomputer cooling is dealing with the impending wave of super-high-power CPU and GPU systems primarily used for AI training and inference.
We have seen power figures approaching 1,000 watts per chip in the latest general-purpose GPU systems that form the basis of many of these massively parallel supercomputers. At 1,000 watts per processor, we have exceeded the practical limit of air cooling, and new cooling strategies such as immersion cooling and “direct to chip” systems are needed from here on.
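A back-of-envelope airflow calculation shows why: carrying 1,000 watts per chip away on air alone requires very large air volumes at any sensible temperature rise. All figures below are illustrative assumptions.

```python
# Back-of-envelope: airflow needed to remove 1 kW per GPU with air alone.
# Server configuration and temperature rise are illustrative assumptions.

chip_power_w = 1_000      # heat per GPU (W)
gpus_per_server = 8       # dense GPU server, assumed
air_density = 1.2         # kg/m^3
air_cp = 1_005            # J/kg.K
delta_t_air = 15          # allowable air temperature rise through the server (K)

flow_per_chip = chip_power_w / (air_density * air_cp * delta_t_air)  # m^3/s
flow_per_server = flow_per_chip * gpus_per_server

print(f"Per chip:   {flow_per_chip * 1000:.0f} L/s of air")
print(f"Per server: {flow_per_server * 3600:.0f} m^3/h of air")
```

On these assumptions, pushing roughly 1,600 cubic metres of air per hour through a single dense server is where fan power, acoustics and heatsink size become impractical, which is why liquid-based strategies take over at these densities.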
Very few existing data centres have supplementary cooling-water loops of sufficient size to cope with this transition and the enormous power consumption that comes with it. Data centre operators and developers are working quickly to bring to market the next generation of data centres able to serve this AI revolution and the uncertainty it brings.