SSD Performance Benchmarking with FIO

Here I will talk about how we can use FIO (Flexible IO Tester) to benchmark the performance of different types of storage devices. For the demonstration below, an M.2 NVMe SSD has been taken as the device under test.

FIO is a free tool that is widely used across the industry for performance benchmarking of SSDs.

I will go over the basic sequential read and write tests. I have connected an M.2 NVMe SSD to the Ubuntu system below.

Below is how we can see the block devices connected to a Linux system.
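For example, we can use lsblk to list all block devices, or nvme list (from nvme-cli) to show NVMe drives specifically. A minimal sketch (the device name /dev/nvme0n1 used throughout this article is an assumption and will differ per system):

    lsblk
    sudo nvme list

On a typical system, the first NVMe SSD shows up as /dev/nvme0n1.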

First, I performed the sequential read test, which means we read from the SSD in a sequential manner.

We can run the below command to perform a sequential read operation.
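Putting together the parameters explained below, the command looks along these lines (a sketch reconstructed from those parameters; /dev/nvme0n1 is an assumed device path, so substitute your own device under test):

    sudo fio --filename=/dev/nvme0n1 --ioengine=libaio --bs=4k --rw=read \
        --numjobs=16 --iodepth=128 --runtime=10 --group_reporting --direct=1 \
        --time_based --name=read-test-job --eta-newline=1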

Below is what each parameter implies:

filename => the device under test, i.e., the device on which we want to run the test

ioengine=libaio => the type of IO engine to use; libaio is the Linux native asynchronous IO library

bs=4k => when the read operation is performed, the data will be read in chunks of 4K block size

rw=read => this will be a sequential read operation only

numjobs=16 => the number of parallel jobs (threads) FIO will spawn; how many to use depends on the number of CPUs in the host configuration

iodepth=128 => the queue depth maintained for each thread; in other words, the maximum number of outstanding IOs that FIO will try to queue internally per thread. Here each thread can have up to 128 commands in flight, each reading 4KB of data, so with 16 jobs up to 16 × 128 = 2048 IOs can be outstanding at once.

runtime=10 => the job will run for 10 seconds

group_reporting => aggregate the results of all jobs in the group into a single report instead of reporting each job separately

direct=1 => whether FIO uses direct IO or buffered IO; setting it to 1 means FIO uses direct (non-buffered) IO, bypassing the page cache

time_based => keep running for the configured runtime even if the device has been fully read, looping over the workload if needed; the measured throughput is then the total amount of data transferred by all jobs divided by the elapsed time

name=read-test-job => the name of this FIO job; since filename points at the device, no file is created, and the name is only used to label the job in the output

eta-newline => force a new line of ETA status output for every period that passes; when the time unit is omitted, the value is interpreted in seconds

Let's decode the output below, step by step.

The above implies that we performed 237K IOs per second at 972MB/s.

bw => the total bandwidth

Now, performance is always measured in IOPS (IOs per second), i.e., the number of blocks transferred per second; the block size we gave is 4K. Here IOPS = 237K.
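As a quick sanity check, bandwidth ≈ IOPS × block size: 237K × 4KiB ≈ 971MB/s, which lines up with the 972MB/s reported above.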

slat => submission latency: the time it took to submit the IO to the kernel for processing

clat => completion latency: the time that passes between submission to the kernel and completion of the IO, excluding the submission latency

lat => total latency: the time from when FIO created the IO until it completed, i.e., slat + clat

clat percentiles => the distribution of completion latencies across percentiles

bw => bandwidth statistics for the job (min, max, average, standard deviation)

cpu => the user and system CPU usage percentages; ctx => context switches; majf, minf => major and minor page faults


IO depths => the distribution of how many IOs FIO kept in flight to the OS at a given time, depending on the iodepth setting

submit => the number of IOs submitted by FIO in a single submit call

complete => the number of IOs completed at a time, out of those submitted by FIO

issued => the total number of read/write IOs issued

latency => FIO can be configured with a latency target, which it uses to tune the throughput (by adjusting the queue depth) until it can hit the latency we have configured; see the sketch below
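As a sketch (the target values here are illustrative assumptions), the relevant fio options are latency_target, latency_window, and latency_percentile: fio will look for the highest queue depth at which, over each 5s window, 99% of IOs complete within 1ms:

    sudo fio --filename=/dev/nvme0n1 --ioengine=libaio --bs=4k --rw=read \
        --iodepth=128 --direct=1 --time_based --runtime=30 --name=latency-target-job \
        --latency_target=1ms --latency_window=5s --latency_percentile=99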

Run status => the section where FIO groups different jobs for aggregation; with group_reporting, the statistics of all jobs in a group are reported together

bw => the total bandwidth/throughput = 972MB/s

io => the total amount of IO done overall

Similarly, a sequential write test was run; we can see its output below.
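The write test is the same command with the IO pattern flipped to rw=write (a sketch, assuming the same parameters as the read test; the job name is illustrative):

    sudo fio --filename=/dev/nvme0n1 --ioengine=libaio --bs=4k --rw=write \
        --numjobs=16 --iodepth=128 --runtime=10 --group_reporting --direct=1 \
        --time_based --name=write-test-job --eta-newline=1

Note that this writes directly to the raw device and will destroy any data on it.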

We can see that the throughput almost got halved for sequential write (497MB/s) vs sequential read (972MB/s).