Can your host be damaged if the storage device overheats? The HCTM feature in NVMe is here to help prevent that.

Have you ever wondered what happens when a storage device, such as a memory card, overheats? To protect itself, the device reduces its speed, a mechanism known as thermal throttling, and this can affect the host device (like a camera) to which the storage device is connected. The visible result is a drop in performance.

Let’s consider a scenario where a camera has an NVMe-based memory card inserted into its slot. While recording, if the memory card becomes too hot, thermal throttling might kick in to prevent overheating. This could impact the camera in several ways:

  1. Performance Degradation: The camera might experience reduced performance due to the memory card throttling its speed to manage the heat.
  2. Recording Issues: Overheating could lead to potential interruptions or corruption in the recorded footage, affecting the quality of your recordings.
  3. Device Safety: In extreme cases, excessive heat might cause the camera to shut down or malfunction to prevent damage, impacting your ability to use the device effectively.
  4. Potential Damage: Continuous overheating can lead to long-term damage to both the memory card and the camera, affecting their longevity and reliability.

This is where Host Controlled Thermal Management (HCTM) in NVMe comes into play. Simply put, the host—in this case, the camera—can control the temperature of the storage device.

You can determine whether a storage device supports HCTM by retrieving the Identify Controller data, which is returned by one of the admin commands in NVMe. For instance, my personal SSD does not support HCTM, as confirmed with the NVMe CLI.
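As a rough illustration of that check, here is a minimal Python sketch that reads the HCTM fields out of a raw 4096-byte Identify Controller buffer. The byte offsets (HCTMA at 322, MNTMT at 324, MXTMT at 326) and the meaning of bit 0 reflect my reading of the NVMe base specification, so verify them against the revision your device implements. The function name `parse_hctm_support` is made up for this example, and how you obtain the raw buffer is left out.

```python
import struct

# Offsets into the Identify Controller data structure (assumed from the
# NVMe base specification; check them against your spec revision).
HCTMA_OFFSET = 322  # Host Controlled Thermal Management Attributes, 2 bytes
# MNTMT (minimum TMT) and MXTMT (maximum TMT) follow directly, 2 bytes each.


def parse_hctm_support(identify_data: bytes) -> dict:
    """Extract HCTM support and the allowed TMT range (in Kelvin)."""
    if len(identify_data) < HCTMA_OFFSET + 6:
        raise ValueError("Identify Controller buffer is too short")
    # Three consecutive little-endian 16-bit fields: HCTMA, MNTMT, MXTMT.
    hctma, mntmt, mxtmt = struct.unpack_from("<HHH", identify_data, HCTMA_OFFSET)
    return {
        "supported": bool(hctma & 0x1),  # bit 0 set means HCTM is supported
        "min_kelvin": mntmt,
        "max_kelvin": mxtmt,
    }
```

A buffer in which all three fields are zero, as on an SSD without HCTM, yields `supported: False`.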

Host Controlled Thermal Management offers a way for the host to configure a controller to automatically switch between different power states or carry out vendor-specific thermal management actions. This helps to meet the thermal management requirements set by the host.

The host establishes and activates the thermal management requirements by setting the Thermal Management Temperature 1 and/or Thermal Management Temperature 2 fields to a non-zero value using a Set Features command.
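To make the Set Features step more concrete, here is a small Python sketch of how the two thresholds could be packed into the command's Dword 11. It assumes, based on my reading of the NVMe base specification, that the HCTM feature identifier is 0x10, that TMT1 occupies bits 31:16 and TMT2 bits 15:0, that both values are in Kelvin, and that TMT1 must be lower than TMT2 when both are set. The helper names are hypothetical, and actually sending the command is outside the sketch.

```python
HCTM_FEATURE_ID = 0x10  # assumed feature identifier for HCTM


def celsius_to_kelvin(celsius: int) -> int:
    """Convert to whole Kelvin, using the common 273 offset (an approximation)."""
    return celsius + 273


def encode_hctm_cdw11(tmt1_kelvin: int, tmt2_kelvin: int) -> int:
    """Pack TMT1 (bits 31:16) and TMT2 (bits 15:0) into one 32-bit value."""
    for value in (tmt1_kelvin, tmt2_kelvin):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("each threshold must fit in 16 bits")
    # A zero field means that threshold is not used, so only compare
    # the two when both are non-zero.
    if tmt1_kelvin and tmt2_kelvin and tmt1_kelvin >= tmt2_kelvin:
        raise ValueError("TMT1 must be lower than TMT2")
    return (tmt1_kelvin << 16) | tmt2_kelvin


# Example: light throttling from 70 C, heavy throttling from 80 C.
cdw11 = encode_hctm_cdw11(celsius_to_kelvin(70), celsius_to_kelvin(80))
print(hex(cdw11))  # 0x1570161
```

The chosen values should also fall inside the minimum and maximum thermal management temperatures that the controller reports in its Identify Controller data.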

Here are two important parameters associated with HCTM:

  1. TMT1 – Thermal Management Temperature 1
  2. TMT2 – Thermal Management Temperature 2

As per the NVMe spec, TMT1 applies when the Composite Temperature:

  1. Is at or above the current value of TMT1, and
  2. Is below the current value of TMT2.

In that scenario, the controller should begin transitioning to lower power active states or perform vendor-specific thermal management actions to reduce the Composite Temperature, while keeping the impact on performance minimal.

The Thermal Management Temperature 2 field specifies that if the Composite Temperature is at or above this value, then the controller shall start transitioning to lower power active states or perform vendor-specific thermal management actions regardless of the impact on performance, in order to attempt to reduce the Composite Temperature (e.g., transition to an active power state that performs heavy throttling).

Let us understand this further with an example.

The vendor-specific temperature is a threshold defined by the vendor. In the HCTM context here, it is the vendor-defined temperature beyond which thermal throttling can be triggered.

Let Composite temperature = C 

Vendor Specific Temperature = VS 

  1. VS < C < TMT1 => No thermal throttling
  2. TMT1 <= C < TMT2 => Light thermal throttling
  3. C >= TMT2 => Heavy thermal throttling

After cases 2 and 3, once the Composite Temperature has dropped back below TMT1, a controller that moved to a lower power active state or was performing vendor-specific thermal management actions should return to the active power state it was in before the transition. The exact temperature at which it does so is left to the vendor.
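The cases above can be sketched as a small Python function. This is only an illustration of the decision logic, not controller firmware: the function name and the returned labels are made up for this example, both thresholds are assumed to be non-zero, and "at or above" is treated as inclusive to match the spec wording quoted earlier.

```python
def throttling_level(composite: int, tmt1: int, tmt2: int) -> str:
    """Classify the expected throttling for a composite temperature (Kelvin)."""
    if not 0 < tmt1 < tmt2:
        raise ValueError("expected non-zero thresholds with TMT1 < TMT2")
    if composite >= tmt2:
        return "heavy"  # reduce temperature regardless of performance impact
    if composite >= tmt1:
        return "light"  # reduce temperature with minimal performance impact
    return "none"       # below TMT1: no HCTM-driven throttling


# Example with TMT1 = 343 K (70 C) and TMT2 = 353 K (80 C).
for temperature in (330, 343, 350, 353):
    print(temperature, throttling_level(temperature, 343, 353))
```

With these thresholds, 330 K falls in the "none" range, 343 K and 350 K in the "light" range, and 353 K in the "heavy" range.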

HCTM is based on the Composite Temperature, and how that value is derived is up to the vendor, so it may vary from one device to another. The Composite Temperature could be influenced by both the flash temperature and the PCB temperature.

A practical scenario where HCTM can come to our rescue

Let us understand with an analogy

Imagine you’re transferring images from a memory card inside your camera to your laptop. If the memory card reaches throttling temperatures, the transfer speed will suffer, and how much depends on whether light or heavy throttling occurs. Once HCTM has been triggered and, after some time, the card temperature is back in the normal range, the copying speed returns to optimal.

For example, let’s compare the speed of copying files to a running car, where the camera represents the car and its memory card acts as the engine. As the engine (the memory card) heats up, it can affect the car’s ability to accelerate, inevitably reducing its speed. The heat warning sign notifies us of the engine temperature, and adding coolant cools the engine down. Similarly, the HCTM feature helps bring the memory card back to optimal temperatures.

In the end, we can say that HCTM is a savior for the host as well as the storage device.
