Here I will talk about how we can use FIO (Flexible I/O Tester) to benchmark the performance of different types of storage devices. For the demonstration below, an M.2 NVMe SSD has been taken as the device under test.
FIO is a free tool that is widely used across the industry for performance benchmarking of SSDs.
I will go over the basic Sequential Read and Write tests. I have connected an M.2 NVMe SSD to the Ubuntu system shown below.
Below is how we can see the block devices connected to a Linux system.
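For reference, a couple of standard commands do this on Ubuntu (lsblk ships with the base system; nvme list needs the nvme-cli package installed):

lsblk
sudo nvme list    # optional: lists NVMe controllers and namespaces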
First, I performed the Sequential Read operation, which means we read the SSD sequentially, i.e. consecutive blocks rather than random offsets.
We can run the below command to perform the read operation.
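Based on the parameters explained below, the command looks roughly like this. The device path /dev/nvme0n1 is only a placeholder (substitute your own device from lsblk), and the eta-newline interval of 1 second is just an example value:

sudo fio --filename=/dev/nvme0n1 --name=read-test-job --ioengine=libaio \
    --rw=read --bs=4k --numjobs=16 --iodepth=128 --direct=1 \
    --runtime=10 --time_based --group_reporting --eta-newline=1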
Below is what each parameter implies:
filename => the device under test, i.e. the block device on which we want to run the test
ioengine=libaio => the type of I/O engine we will use; libaio is the Linux native asynchronous I/O interface
bs=4K => the read operation will transfer data in chunks of 4 KiB (the block size of each I/O)
rw=read => this will be a sequential read workload only
numjobs=16 => the number of jobs (threads/processes) FIO will spawn, each running an identical copy of the workload; this is typically chosen based on the host's CPU configuration
iodepth=128 => the maximum number of outstanding I/Os that FIO will try to keep queued internally per job; here each job can have up to 128 commands in flight, and each command reads 4 KiB of data
runtime=10 => the job will run for 10 seconds
group_reporting => report aggregated statistics for the whole group of jobs instead of a separate report per job
direct=1 => selects between direct and buffered I/O; setting it to 1 makes FIO use direct (non-buffered) I/O, bypassing the page cache
time_based => keep running for the full runtime even if the device has been read end to end; the reported throughput is then the total data transferred by all jobs divided by the elapsed time
name=read-test-job => the name of this job, which is how it appears in the output; since filename points at the device itself, FIO does not create a separate test file here
eta-newline => force a new line in the ETA/status output for every time period that passes; when the unit is omitted, the value is interpreted in seconds
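For reference, the same workload can also be expressed as an fio job file, which is often more convenient than a long command line. This is an equivalent sketch of the command above, with the device path again being a placeholder:

[global]
filename=/dev/nvme0n1
ioengine=libaio
direct=1
bs=4k
numjobs=16
iodepth=128
runtime=10
time_based
group_reporting

[read-test-job]
rw=read

Saved as, say, read-test.fio, it would be run with fio --eta-newline=1 read-test.fio (eta-newline stays on the command line).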
Let's decode the output below.
Let's go through it step by step.
The above shows that the test achieved a total of 237K IOPS at 972 MB/s.
bw=> the total bandwidth
Performance is usually measured in IOPS (I/O operations per second), i.e. the number of blocks transferred per second at the configured block size, here 4 KiB. In this run IOPS = 237K, which matches the bandwidth: 237,000 × 4 KiB ≈ 971 MB/s, i.e. roughly the 972 MB/s reported above.
slat => submission latency, i.e. the time it took to submit this I/O to the kernel for processing
clat => completion latency, i.e. the time that passes between the submission to the kernel and the completion of the I/O, excluding the submission latency
lat => the total latency of the I/O, from when FIO created it until completion (roughly slat + clat)
clat percentiles => the completion latency percentiles
bw=> bandwidth
cpu => the user and system CPU percentages; ctx => context switches; majf, minf => major and minor page faults
IO depths => the distribution of how many I/Os FIO kept issued to the OS at a time, based on the configured settings
submit => the number of I/Os FIO submitted in a single submit call
complete => like submit, but the number of I/Os that were completed (reaped) at a time
issued => the number of read/write I/Os that were issued in total
latency => FIO can be configured with a latency target, in which case it tunes the queue depth (and hence the throughput) to stay within the configured latency; see the sketch after this list
Run status => the section of the output where FIO aggregates the results of grouped jobs/tests
bw => the total bandwidth/throughput, here 972 MB/s
io => the total amount of I/O transferred over the whole run
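On the latency point mentioned above: if we actually want FIO to tune itself to a latency goal, the relevant options are latency_target, latency_window and latency_percentile. The values below are only illustrative, not what was used in this run; they would be appended to the read command:

--latency_target=10ms --latency_window=5s --latency_percentile=99.0

With these set, FIO treats the configured iodepth as an upper bound and looks for the highest queue depth at which 99% of the I/Os still complete within 10 ms over each 5-second sampling window.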
Similarly, a Sequential Write test was run; its output can be seen below.
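The write run flips the workload direction with rw=write; assuming the same placeholder device and an analogous job name, it would look something like the command below. Note that writing to the raw device is destructive, so it should only be done on a drive whose contents can be discarded.

sudo fio --filename=/dev/nvme0n1 --name=write-test-job --ioengine=libaio \
    --rw=write --bs=4k --numjobs=16 --iodepth=128 --direct=1 \
    --runtime=10 --time_based --group_reporting --eta-newline=1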
We can see that the throughput roughly halves for Sequential Write (497 MB/s) compared with Sequential Read (972 MB/s).