Is Your SSD Healthy?

We often hear “Stay healthy”: the healthier you are, the happier you will be and the more easily you can handle your day-to-day tasks. In the same way, an SSD's health is crucial to its optimal functioning.

Am I healthy?

The health of an SSD can be checked with smartmontools, a package that gives us a quick overview of the drive's health.

On macOS, it can be installed with the following command:

brew install smartmontools

S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology, a monitoring system built into drives that tracks indicators of their reliability.

Let's dive in and run a quick health check with smartctl -H (disk name):

Result of SMART test

Hurray! As we can see above, the self-assessment test PASSED. If the result is not PASSED, something is definitely wrong with the SSD.
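The same check can be scripted. A minimal sketch, assuming `smartctl` is on the PATH and that its `-H` output contains a line of the form `SMART overall-health self-assessment test result: PASSED`. The helper names `run_health_check` and `overall_health` are hypothetical, and the sample string is illustrative rather than captured from a real drive.

```python
import re
import subprocess
from typing import Optional


def run_health_check(device: str) -> str:
    """Run `smartctl -H <device>` and return its text output."""
    # smartctl reports status through a bit-mask exit code, so a non-zero
    # exit status is deliberately not turned into an exception here.
    result = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return result.stdout


def overall_health(output: str) -> Optional[str]:
    """Extract the self-assessment result (e.g. "PASSED") from smartctl -H output."""
    match = re.search(r"self-assessment test result:\s*(\S+)", output)
    return match.group(1) if match else None


sample = "SMART overall-health self-assessment test result: PASSED"
print(overall_health(sample))  # PASSED
```

On a real machine you would pass the output of `run_health_check` with your own disk name to `overall_health`, and treat anything other than `PASSED` (or a missing result) as a reason to investigate.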

Now let's print some of my SSD's S.M.A.R.T. attributes.

Just run the command smartctl -a (disk name):
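To work with these values in a script, a small sketch can run `smartctl -a` and collect the `Name:   value` lines of its text output into a dictionary. This assumes each attribute appears on its own line in that form; the function names and the sample lines (with made-up values) are hypothetical and only for illustration.

```python
import subprocess
from typing import Dict


def read_smart_text(device: str) -> str:
    """Run `smartctl -a <device>` and return its text output."""
    result = subprocess.run(["smartctl", "-a", device], capture_output=True, text=True)
    return result.stdout


def parse_attributes(text: str) -> Dict[str, str]:
    """Collect 'Name:   value' lines into a dict of raw string values."""
    attrs: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as times keep their colons.
        name, sep, value = line.partition(":")
        if sep and value.strip():
            attrs[name.strip()] = value.strip()
    return attrs


def leading_int(value: str) -> int:
    """Parse the leading number of a value such as '100%', '1,234 [631 GB]' or '0x00'."""
    token = value.split()[0].rstrip("%").replace(",", "")
    return int(token, 16) if token.lower().startswith("0x") else int(token)


# Illustrative lines in the style of smartctl's NVMe health section (made-up values).
sample = """Critical Warning:                   0x00
Available Spare:                    100%
Data Units Written:                 1,234,567 [632 GB]"""

attrs = parse_attributes(sample)
print(leading_int(attrs["Available Spare"]))     # 100
print(leading_int(attrs["Data Units Written"]))  # 1234567
```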

SMART data

Above, we can see the various SMART attributes reported for the SSD.

Decoding some of the SMART attributes

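Critical Warning: This field indicates critical warnings for the state of the controller. It is a bit field in which each bit flags a different condition; a value of 0x00 means no warning is active.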
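Because Critical Warning is a bit field, it can be decoded bit by bit. The sketch below covers bits 0 to 4 as summarized from the NVMe specification; newer specification revisions define additional bits, which this sketch only reports generically. The dictionary and function names are hypothetical.

```python
from typing import List

# Meaning of bits 0-4 of the NVMe SMART/Health "Critical Warning" byte.
CRITICAL_WARNING_BITS = {
    0: "available spare has fallen below the threshold",
    1: "temperature is outside a temperature threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read-only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value: int) -> List[str]:
    """List the conditions flagged in the Critical Warning byte; [] means no warning."""
    return [
        CRITICAL_WARNING_BITS.get(bit, f"bit {bit} set (not decoded in this sketch)")
        for bit in range(8)
        if value & (1 << bit)
    ]


print(decode_critical_warning(0x00))  # []
print(decode_critical_warning(0x05))  # bits 0 and 2: spare and reliability warnings
```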

Composite Temperature: Contains a value, in kelvins, that represents the current composite temperature of the controller and the namespace(s) associated with that controller.
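The raw field holds a whole number of kelvins, so converting it to Celsius is a subtraction. smartctl typically prints this value in Celsius already, so a conversion like this mainly matters when reading the raw log yourself. A minimal sketch with a hypothetical function name:

```python
def kelvin_to_celsius(kelvin: int) -> int:
    """Convert a raw Composite Temperature (whole kelvins) to whole degrees Celsius."""
    # 0 degC is 273.15 K; subtracting 273 keeps the result an integer,
    # which is precise enough for drive temperatures.
    return kelvin - 273


print(kelvin_to_celsius(308))  # 35
```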

Available Spare: Contains a normalized percentage (0% to 100%) of the remaining spare capacity available.

Available Spare Threshold: When the Available Spare falls below the threshold indicated in this field, an asynchronous event completion may occur. The value is indicated as a normalized percentage (0% to 100%). The values 101 to 255 are reserved.
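Available Spare only becomes alarming relative to Available Spare Threshold. A minimal sketch of that comparison, assuming both values have already been read as integer percentages; the function name is hypothetical.

```python
def spare_below_threshold(available_spare: int, threshold: int) -> bool:
    """True when Available Spare (percent) has dropped below its threshold (percent)."""
    for name, pct in (("available_spare", available_spare), ("threshold", threshold)):
        # Both fields are normalized percentages; threshold values 101-255 are reserved.
        if not 0 <= pct <= 100:
            raise ValueError(f"{name} must be 0-100, got {pct}")
    return available_spare < threshold


print(spare_below_threshold(100, 10))  # False
print(spare_below_threshold(5, 10))    # True
```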

Percentage Used: Contains a vendor-specific estimate of the percentage of NVM subsystem life used based on the actual usage and the manufacturer’s prediction of NVM life. A value of 100 indicates that the estimated endurance of the NVM in the NVM subsystem has been consumed, but may not indicate an NVM subsystem failure. The value is allowed to exceed 100. Percentages greater than 254 shall be represented as 255. This value shall be updated once per power-on hour (when the controller is not in a sleep state).
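Since Percentage Used can exceed 100 and saturates at 255, a value needs a little interpretation. A minimal sketch of one way to phrase it; the function name and wording are illustrative assumptions.

```python
def describe_percentage_used(value: int) -> str:
    """Summarise the Percentage Used field (a single byte, 0-255)."""
    if not 0 <= value <= 255:
        raise ValueError(f"Percentage Used must be 0-255, got {value}")
    if value == 255:
        # Values above 254 are all reported as 255.
        return "at least 255% of rated endurance used"
    if value >= 100:
        return f"{value}% used: rated endurance consumed (not necessarily a failure)"
    return f"{value}% used: about {100 - value}% of rated endurance left"


print(describe_percentage_used(3))  # 3% used: about 97% of rated endurance left
```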

Endurance Group Critical Warning Summary: This field indicates critical warnings for the state of Endurance Groups.

Data Units Read: Contains the number of 512-byte data units the host has read from the controller as part of processing a SMART Data Units Read Command; this value does not include metadata. This value is reported in thousands (i.e., a value of 1 corresponds to 1,000 units of 512 bytes read) and is rounded up (e.g., one indicates that the number of 512-byte data units read is from 1 to 1,000, three indicates that the number of 512-byte data units read is from 2,001 to 3,000).

Data Units Written: Contains the number of 512-byte data units the host has written to the controller as part of processing a User Data Out Command; this value does not include metadata. This value is reported in thousands (i.e., a value of 1 corresponds to 1,000 units of 512 bytes written) and is rounded up (e.g., one indicates that the number of 512-byte data units written is from 1 to 1,000, three indicates that the number of 512-byte data units written is from 2,001 to 3,000).
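Turning a Data Units Read or Written count into bytes follows directly from the definition: each count stands for 1,000 blocks of 512 bytes, rounded up. A minimal sketch (hypothetical names) that returns the byte range a given count can represent:

```python
from typing import Tuple

BLOCK_BYTES = 512
UNITS_PER_COUNT = 1000  # each reported count stands for 1,000 blocks of 512 bytes


def data_units_to_byte_range(count: int) -> Tuple[int, int]:
    """Return the (min, max) bytes consistent with a Data Units Read/Written value."""
    if count == 0:
        return (0, 0)
    # The count is rounded up, so N covers (N-1)*1000 + 1 through N*1000 blocks.
    low_blocks = (count - 1) * UNITS_PER_COUNT + 1
    high_blocks = count * UNITS_PER_COUNT
    return (low_blocks * BLOCK_BYTES, high_blocks * BLOCK_BYTES)


print(data_units_to_byte_range(3))  # (1024512, 1536000)
low, high = data_units_to_byte_range(1_000_000)
print(high / 1e9)  # 512.0 (decimal GB)
```

The first call reproduces the example in the definition: a value of three means 2,001 to 3,000 blocks, i.e. 1,024,512 to 1,536,000 bytes.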

Host Read Commands: Contains the number of SMART Host Read Commands completed by the controller. Refer to the specific I/O Command Set specification for the list of SMART Host Read Commands that affect this field.

Host Write Commands: Contains the number of User Data Out Commands completed by the controller.
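Pairing Data Units Read with Host Read Commands (or Data Units Written with Host Write Commands) gives a rough average transfer size per command. This is an illustrative derived figure, not a SMART attribute: the rounded-up data-unit count and differences in which commands are counted make it approximate. The function name and the example numbers are made up.

```python
def average_bytes_per_command(data_units: int, commands: int) -> float:
    """Rough average transfer size per command, in bytes."""
    if commands == 0:
        return 0.0
    # Each data unit is 1,000 blocks of 512 bytes; the unit count is rounded up,
    # so this is an approximation, not an exact per-command size.
    return data_units * 512 * 1000 / commands


print(average_bytes_per_command(10_000, 1_250_000))  # 4096.0
```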

Controller Busy Time: Contains the amount of time the controller is busy with I/O commands. The controller is busy when there is a command outstanding to an I/O Queue (specifically, a command was issued via an I/O Submission Queue Tail doorbell write and the corresponding completion queue entry has not been posted yet to the associated I/O Completion Queue). This value is reported in minutes.

Power Cycles: Contains the number of power cycles.

Power On Hours: Contains the number of power-on hours. This may not include time that the controller was powered and in a non-operational power state.
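Controller Busy Time (minutes) and Power On Hours can be combined into an approximate share of powered-on time spent servicing I/O. This is an illustrative derived metric with a hypothetical function name and made-up numbers; since Power On Hours may exclude time in non-operational power states, treat the result as rough.

```python
def busy_fraction(busy_minutes: int, power_on_hours: int) -> float:
    """Approximate share of powered-on time that the controller spent busy with I/O."""
    if power_on_hours == 0:
        return 0.0
    return busy_minutes / (power_on_hours * 60)  # convert hours to minutes first


print(f"{busy_fraction(600, 1_000):.1%}")  # 1.0%
```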

Unsafe Shutdowns: Contains the number of unsafe shutdowns.
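Unsafe Shutdowns is easier to judge relative to Power Cycles. A minimal sketch of that ratio, which is an illustrative derived metric (hypothetical function name, made-up numbers) rather than a SMART attribute:

```python
def unsafe_shutdown_ratio(unsafe_shutdowns: int, power_cycles: int) -> float:
    """Fraction of power cycles that ended in an unsafe shutdown."""
    if power_cycles == 0:
        return 0.0
    return unsafe_shutdowns / power_cycles


print(f"{unsafe_shutdown_ratio(12, 400):.0%}")  # 3%
```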

Media and Data Integrity Errors: Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field. Errors introduced as a result of a Write Uncorrectable command (refer to the NVM Command Set specification) may or may not be included in this field.

Number of Error Information Log Entries: Contains the number of Error Information log entries over the life of the controller.
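To tie the attributes together, the sketch below flags a few of the conditions discussed above: a non-zero Critical Warning, Available Spare below its threshold, Percentage Used at or above 100, and any Media and Data Integrity Errors. These rules of thumb and the `NvmeHealth`/`health_flags` names are illustrative assumptions, not a verdict from the drive vendor or from smartmontools.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NvmeHealth:
    """A few SMART/Health values, already parsed into integers."""
    critical_warning: int
    available_spare: int            # percent
    available_spare_threshold: int  # percent
    percentage_used: int            # percent, may exceed 100
    media_errors: int               # Media and Data Integrity Errors


def health_flags(h: NvmeHealth) -> List[str]:
    """Return human-readable concerns; an empty list means none of these checks fired."""
    flags: List[str] = []
    if h.critical_warning != 0:
        flags.append(f"critical warning bits set: {h.critical_warning:#04x}")
    if h.available_spare < h.available_spare_threshold:
        flags.append("available spare below threshold")
    if h.percentage_used >= 100:
        flags.append("rated endurance consumed")
    if h.media_errors > 0:
        flags.append(f"{h.media_errors} media/data integrity error(s)")
    return flags


healthy = NvmeHealth(critical_warning=0, available_spare=100,
                     available_spare_threshold=10, percentage_used=2, media_errors=0)
print(health_flags(healthy))  # []
```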
