How to Assess the Performance of Critical Cooling in Data Centers

Continuous monitoring of the quality and performance of the cooling fluid in data centers (DCs) and AI servers has become a key factor in ensuring a facility’s availability, energy efficiency, and regulatory compliance.

Technical literature and ASHRAE guidelines emphasize that, in certain cooling architectures, fluid chemistry is just as critical as the mechanical design itself: a deviation in water or coolant quality leads to corrosion, scaling, microbiological growth, or fouling and, ultimately, to a loss of cooling capacity and failures in high-value hardware.

What do we mean by cooling fluid in data centers and AI servers?

In an AI data center, the “cooling fluid” is the medium that extracts heat from high-performance hardware and transfers it to the main cooling system (chillers, dry coolers, cooling towers, etc.).

Depending on the cooling architecture used, three main families of fluids are employed: deionized water, water-glycol mixtures, and dielectric fluids.

For example, in direct-to-chip solutions and closed loops associated with cold plates, high-purity treated water (often deionized) or water-glycol is frequently used due to its high heat capacity.

On the other hand, in immersion cooling and other systems where the liquid is in direct contact with electronic boards and connectors, specific dielectric fluids are used that combine very low electrical conductivity with good thermal and chemical stability to protect the equipment.

Why is it important to maintain and monitor cooling water?

ASHRAE, in its reference document on water-cooled servers, emphasizes that fluid quality and continuous monitoring are just as important as the mechanical design of the cooling system itself. The white paper distinguishes between the facility water system (FWS) and the technology cooling system (TCS). The TCS is the more demanding of the two because it directly supplies the hardware’s cold plates and microchannels. The document also warns that misinterpreting the quality requirements can lead to cost overruns or, worse, to corrosion, scaling, and fouling that compromise server cooling.

Diagram of the CDU liquid cooling system in a data center

In fact, corrosion, mineral scale, and microbiological growth (biofouling) are the three main enemies of liquid cooling. These phenomena are directly linked to water quality and the effectiveness of the treatment program. They reduce the effective cross-sectional area of pipes and microchannels, increase the thermal resistance of heat exchangers, and raise pressure drops, which results in reduced heat removal capacity, higher energy consumption, and a greater likelihood of hot spots and thermal throttling in AI clusters.
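To give a feel for the magnitudes involved, the sketch below applies two textbook relations: a fouling resistance added in series with the clean heat-transfer coefficient, and the Hagen-Poiseuille law for laminar flow, under which the pressure drop at constant flow scales with the inverse fourth power of the channel diameter. All numbers are illustrative assumptions, not measurements or ASHRAE limits, and real cold plates are not necessarily laminar or circular.

```python
def fouled_u(u_clean: float, r_fouling: float) -> float:
    """Overall heat-transfer coefficient (W/m2.K) after adding a fouling
    resistance r_fouling (m2.K/W) in series with the clean surface."""
    return 1.0 / (1.0 / u_clean + r_fouling)


def laminar_dp_ratio(diameter_ratio: float) -> float:
    """Pressure-drop multiplier at constant flow when the channel diameter
    shrinks to diameter_ratio of its clean value (Hagen-Poiseuille, laminar)."""
    return 1.0 / diameter_ratio ** 4


# Hypothetical example values, chosen only for illustration.
u_clean = 5000.0     # W/m2.K, clean exchanger (assumed)
r_fouling = 1.0e-4   # m2.K/W, thin deposit layer (assumed)

u_dirty = fouled_u(u_clean, r_fouling)
# Heat removed is proportional to U at the same area and temperature difference.
capacity_loss = 1.0 - u_dirty / u_clean

print(f"U: {u_clean:.0f} -> {u_dirty:.0f} W/m2.K ({capacity_loss:.0%} less heat removed)")
print(f"10% narrower channel: {laminar_dp_ratio(0.9):.2f}x pressure drop")
```

Under these assumed values, a deposit layer that looks negligible on paper removes about a third of the exchanger's capacity, and a 10% reduction in channel diameter raises the required pressure drop by roughly half.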

In the context of AI data centers, where power densities per rack can be several times higher than those of a traditional data center and where liquid cooling has become the dominant option over air cooling, the quality of the coolant is a determining factor in hardware availability and lifespan.

A slight deviation in water chemistry (for example, loss of inhibitor or increased turbidity) can trigger corrosion, blockages, and leaks within weeks or months in systems that represent millions of dollars of investment in AI training servers.

How to monitor the performance of cooling fluid in data centers and AI servers

Monitoring the performance of critical cooling involves going beyond simply ensuring that the system “doesn’t overheat” and instead comprehensively tracking the thermal behavior of the hardware as well as the chemical and mechanical health of the fluid itself.

These indicators of water quality and corrosion are obtained, for example, through the smart inline sensors manufactured by Pyxis Lab, which are integrated into the industrial cooling monitoring solutions we offer at Envira.

These sensors enable continuous monitoring of key cooling fluid quality parameters directly in process lines, including:

  • Combined PTSA and turbidity sensors that allow simultaneous monitoring of inhibitor dosage via a fluorescent tracer and the presence of suspended solids that may indicate fouling or filtration issues.
  • pH and ORP sensors, designed for cooling water with high turbidity and contamination, providing robust measurements in demanding environments such as data center loops.
  • Optical dissolved oxygen and ultra-low conductivity sensors, suitable for monitoring the chemistry of high-purity loops and detecting small variations in dissolved salts that could compromise system integrity.
  • LPR-based corrosion rate sensors, capable of reporting general and localized corrosion in real time, which is one of the most direct indicators of the performance of the chemical treatment program and the quality of the cooling fluid.
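As a minimal sketch of how readings for the parameters listed above might be checked against alarm ranges, the example below compares a set of values with configured limits. The parameter names and every limit are hypothetical placeholders. They are not ASHRAE, Pyxis, or server-vendor values, and real ranges should come from the loop's own water-quality specification.

```python
from typing import NamedTuple, Optional


class Limit(NamedTuple):
    low: Optional[float]   # None means no lower bound
    high: Optional[float]  # None means no upper bound


# HYPOTHETICAL alarm ranges, for illustration only.
LIMITS = {
    "ph": Limit(8.0, 9.5),
    "conductivity_us_cm": Limit(None, 20.0),
    "turbidity_ntu": Limit(None, 5.0),
    "ptsa_ppb": Limit(80.0, 120.0),     # inhibitor tracer
    "corrosion_mpy": Limit(None, 1.0),  # LPR corrosion rate
}


def check(readings: dict) -> list:
    """Return one alarm message per reading that falls outside its range.
    Parameters without a configured limit are skipped."""
    alarms = []
    for name, value in readings.items():
        limit = LIMITS.get(name)
        if limit is None:
            continue
        if limit.low is not None and value < limit.low:
            alarms.append(f"{name}={value} below {limit.low}")
        elif limit.high is not None and value > limit.high:
            alarms.append(f"{name}={value} above {limit.high}")
    return alarms


# Example: low pH together with a drop in inhibitor tracer.
for message in check({"ph": 7.2, "ptsa_ppb": 60.0, "turbidity_ntu": 1.0}):
    print("ALARM:", message)
```

In practice this kind of logic usually lives in the PLC, BMS, or SCADA layer. The sketch only shows the comparison itself.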

Conclusion

In a context where liquid cooling is becoming the backbone of data centers and AI clusters, measuring and maintaining the quality of the coolant is no longer a secondary task, but a central element of the reliability and efficiency strategy.

Integrating smart sensors and continuous monitoring systems makes it possible to turn the coolant into a true performance indicator.

Frequently asked questions about monitoring the quality of cooling fluid in data centers

Is it really necessary to monitor the coolant if I’m already measuring rack and CPU/GPU temperatures?

The temperature of racks and processors is a lagging indicator: by the time it spikes, the problem with the coolant (corrosion, scale, fouling, lack of inhibitor) is usually already well advanced. Monitoring the chemistry and condition of the coolant allows deviations to be detected weeks or months before they result in hotspots, thermal throttling or hardware failures.
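One way to turn a slow chemical drift into an early warning is to fit a straight line to recent readings and project when it would cross a limit. The sketch below does this with an ordinary least-squares slope. The tracer values, the limit, and the function names are assumptions made for illustration, and a real loop may well not drift linearly.

```python
def fit_line(days, values):
    """Least-squares slope and intercept of values against time in days."""
    n = len(days)
    mean_t = sum(days) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(days, values))
    den = sum((t - mean_t) ** 2 for t in days)
    slope = num / den
    return slope, mean_v - slope * mean_t


def days_until_limit(days, values, limit):
    """Days after the last sample at which the fitted line reaches `limit`.
    Returns None if the trend is flat, moving away from the limit,
    or already beyond it."""
    slope, intercept = fit_line(days, values)
    if slope == 0:
        return None
    fitted_last = intercept + slope * days[-1]
    remaining = (limit - fitted_last) / slope
    return remaining if remaining >= 0 else None


# Hypothetical inhibitor tracer readings (ppb), one per day, slowly declining.
sample_days = [0, 1, 2, 3, 4]
tracer_ppb = [100.0, 98.0, 96.0, 94.0, 92.0]
low_limit_ppb = 80.0  # assumed lower alarm limit

print(days_until_limit(sample_days, tracer_ppb, low_limit_ppb))
```

With these made-up readings the tracer loses 2 ppb per day, so the projection gives about six days of margin before the assumed limit is reached, well before any temperature sensor would react.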

Isn't regular laboratory testing enough?

Laboratory analyses are necessary, but in AI cooling systems, which operate 24/7, experience shows that changes in water quality can occur within a matter of hours or days as a result of, for example, a water top-up, a dosing failure or a one-off contamination incident. Inline sensors provide continuous monitoring and enable alarms and corrective actions to be triggered between analyses, reducing the risk of ‘blind spots’.

How do these sensors integrate into my existing infrastructure?

The Pyxis smart sensors distributed by Envira feature 4–20 mA and RS-485 Modbus outputs, meaning they can be connected directly to existing PLCs, CDUs, BMSs or SCADA systems without the need for intermediate equipment. Envira helps to identify the most representative installation points (loop returns, TCS, CDU manifolds, dosing lines) and to configure alarm ranges and trends.
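For the analog output, the receiving PLC or BMS converts the loop current into engineering units by linear scaling. The sketch below shows that conversion. The measurement span and the 3.8 to 20.5 mA acceptance window are assumptions for illustration, and the actual span and fault levels should be taken from the sensor's data sheet. The Modbus path is not shown because register maps are device-specific.

```python
def ma_to_value(current_ma: float, range_low: float, range_high: float) -> float:
    """Scale a 4-20 mA signal linearly onto [range_low, range_high].

    Currents outside the assumed 3.8-20.5 mA window are treated as a wiring
    or sensor fault rather than as a measurement.
    """
    if not 3.8 <= current_ma <= 20.5:
        raise ValueError(f"loop current {current_ma} mA outside valid window")
    fraction = (current_ma - 4.0) / 16.0  # 4 mA -> 0.0, 20 mA -> 1.0
    return range_low + fraction * (range_high - range_low)


# Hypothetical pH transmitter configured for a 0-14 pH span.
print(ma_to_value(12.0, 0.0, 14.0))   # mid-scale current maps to mid-scale pH

try:
    ma_to_value(2.0, 0.0, 14.0)       # e.g. a broken wire
except ValueError as error:
    print("fault:", error)
```

Treating out-of-window currents as faults matters here: a broken loop would otherwise read as a plausible low value instead of raising an alarm.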

What impact does monitoring fluid quality have on PUE and WUE?

A coolant in good condition reduces fouling and pressure drops, improves heat exchange and allows chillers and pumps to operate closer to their optimum point, which helps to improve PUE (Power Usage Effectiveness). In evaporative and water-intensive systems, precise control of water chemistry and concentration cycles also helps to optimize WUE (Water Usage Effectiveness), reducing water consumption per kWh of IT energy.
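Both metrics are simple ratios over IT equipment energy: PUE divides total facility energy by IT energy, and WUE divides site water use in liters by IT energy in kWh. The short sketch below computes them. The annual figures are invented for illustration and do not describe any real facility.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    return total_facility_kwh / it_kwh


def wue(site_water_liters: float, it_kwh: float) -> float:
    """Water Usage Effectiveness in liters per kWh of IT equipment energy."""
    return site_water_liters / it_kwh


# Hypothetical annual figures.
it_energy_kwh = 1_000_000.0
facility_energy_kwh = 1_300_000.0
water_liters = 1_800_000.0

print(f"PUE = {pue(facility_energy_kwh, it_energy_kwh):.2f}")
print(f"WUE = {wue(water_liters, it_energy_kwh):.2f} L/kWh")
```

Because IT energy is the denominator in both ratios, any pump or chiller energy saved through cleaner heat exchangers lowers PUE directly, and any reduction in make-up water lowers WUE.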

Do these systems affect the hardware manufacturers’ warranties?

Far from having a negative impact, proper monitoring of the coolant’s quality helps ensure compliance with the specifications set by manufacturers of servers, cold plates and chillers, who typically stipulate ranges for pH, conductivity and other water or coolant parameters in order to maintain the warranty. Keeping monitoring records makes it easier to demonstrate that the system has operated within those ranges.

What return on investment can I expect?

The return on investment depends on the size of the data center, the complexity of the cooling system and the initial situation, but studies and real-world examples point to significant savings through a reduction in unplanned downtime, less frequent cleaning, an extended service life for equipment and improved energy efficiency. In AI environments, where the cost per hour of downtime can be very high and the hardware is particularly expensive, the value of preventing a single serious incident often justifies the investment in advanced coolant monitoring.
