Hi dear VMware community! I'm a rookie with ESXi and have run into a problem, which I describe below. I hope someone can help me with it. Thank you all in advance!
My setup:
- CPU Intel Xeon Silver 4214
- Supermicro MBD-X11DPL-I-O - ATX
- Supermicro NVMe AOC-SLG3-2M2 with two Samsung M.2 disks in it (Samsung 970 EVO Plus 1TB)
- 4 x Crucial 32GB DDR4-2666 RDIMM
ESXi runs from a USB flash drive.
I installed the Intel-nvme-vmd (1.8.0.1001-1oem.670.0.0.8169922) driver for the vmhba adapter (with the default driver, iavmd_1.2.0.1011-2vmw.670.0.0.8169922, the host crashes with a PSOD).
The problem described below appears some time after I power on the host and start my set of virtual machines (usually after ~20 hours of running under normal load).
All the VMs are configured identically and stored on different datastores (some VMs on the 1st disk, some on the 2nd):
Disk - Thick provisioned, eagerly zeroed, with NVMe controller
Network - VMXNET 3
Other options are mostly default.
Free space on both datastores is more than 50%.
So the problem is very high latency in the monitor:
I started to dig deeper and found a very high KAVG whenever I see the high latency above (it easily jumps to 1000+).
What is strange is that this high KAVG appears only on one particular SSD (let's call it disk "A"). I tried swapping the disks' slots in the PCIe card, and the high KAVG stayed with disk A.
At the same time, I do not see any queuing taking place:
Also, it seems to me that latency inside the VMs is pretty normal...
I disabled hardware acceleration (VAAI), but it did not help.
I attached 2 log files:
- vmkernel
- vmkwarning
There are a lot of errors and warnings like:
HppThrottleLogForDevice:564: Cmd 0x42 (0x459a4233da40, 2108176) to dev "t10.NVMe____Samsung_SSD_970_EVO_Plus_1TB____________S4EWNF0M717097A_____00000001" on path "vmhba2:C0:T1:L0" Failed:
2020-02-14T08:56:48.253Z cpu3:2097198)WARNING: HPP: HppThrottleLogForDevice:570: Error status H:0xc D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
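To see how often these failures occur and whether they all hit the same device and path, the log files could be summarized with a small script. Below is a minimal Python sketch, assuming the log lines follow exactly the two formats quoted above. The function name `summarize` and the regular expressions are illustrative and not part of any VMware tool.

```python
import re
from collections import Counter

# Matches the "Cmd ... to dev ... on path ... Failed" line quoted above.
CMD_RE = re.compile(
    r'Cmd (0x[0-9a-fA-F]+) .*? to dev "([^"]+)" on path "([^"]+)" Failed'
)
# Matches the follow-up "Error status H:.. D:.. P:.." line quoted above.
STATUS_RE = re.compile(
    r'Error status H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)'
)


def summarize(lines):
    """Count failed commands per (opcode, device, path) and per (H, D, P) status."""
    failures = Counter()
    statuses = Counter()
    for line in lines:
        cmd = CMD_RE.search(line)
        if cmd:
            failures[cmd.groups()] += 1
        status = STATUS_RE.search(line)
        if status:
            statuses[status.groups()] += 1
    return failures, statuses


# The two lines from this post, used as sample input.
SAMPLE = [
    'HppThrottleLogForDevice:564: Cmd 0x42 (0x459a4233da40, 2108176) to dev '
    '"t10.NVMe____Samsung_SSD_970_EVO_Plus_1TB____________S4EWNF0M717097A_____00000001" '
    'on path "vmhba2:C0:T1:L0" Failed:',
    '2020-02-14T08:56:48.253Z cpu3:2097198)WARNING: HPP: HppThrottleLogForDevice:570: '
    'Error status H:0xc D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.',
]

failures, statuses = summarize(SAMPLE)
for (opcode, device, path), count in failures.items():
    print(f"{count} x Cmd {opcode} failed on {path} ({device})")
for (host, dev, plugin), count in statuses.items():
    print(f"{count} x status H:{host} D:{dev} P:{plugin}")
```

To run it over a whole log instead of the sample, pass an open file to `summarize`, for example `summarize(open("vmkernel.log"))` (the file name here is an assumption). If every failure shows the same device and path, that would be consistent with the problem following disk A.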
If any other logs are required, I will provide them for analysis. I will really appreciate any help!



