Hello all,
In vmkernel.log we keep getting the message "ALERT: NMI: 709: NMI IPI received. xxxxxxx" every 5 minutes or more. vCenter flags it as an alert and sends an email notification every time. Why does this alert keep appearing in vmkernel.log? Is this a host error or a vSAN error?
We run vSphere ESXi 6.0 Update 3 in our environment.
"
2017-03-25T04:25:46.581Z cpu0:2606979)nvme_CC cmd:85 sense:05 asc:20 ascq:00
2017-03-25T04:25:46.587Z cpu0:2606979)nvme_CC cmd:85 sense:05 asc:20 ascq:00
2017-03-25T04:25:46.623Z cpu0:2606979)nvme_CC cmd:85 sense:05 asc:20 ascq:00
2017-03-25T04:25:46.624Z cpu0:2606979)nvme_CC cmd:85 sense:05 asc:20 ascq:00
2017-03-25T04:25:55.078Z cpu9:33485)NMP: nmp_ResetDeviceLogThrottling:3343: Error status H:0x0 D:0x2 P:0x0 Sense Data: 0x5 0x20 0x0 from dev "naa.2ff70002ac018904" occurred 22 times(of 22 commands)
2017-03-25T04:28:48.079Z cpu38:32957)WARNING: Heartbeat: 796: PCPU 22 didn't have a heartbeat for 8 seconds; *may* be locked up.
2017-03-25T04:28:48.079Z cpu22:33422)ALERT: NMI: 709: NMI IPI received. Was eip(base):ebp:cs [0xe058cd(0x418014800000):0x43911471b788:0x4010](Src 0x1, CPU22)
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471b5b0:[0x4180156058cd]Bucketlist_LowerBound@LSOMCommon#1+0x2ad stack: 0x43911471b7c8
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471b610:[0x418015608f0b]Rangemap_LowerBound@LSOMCommon#1+0x43 stack: 0x46142f4001
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471b690:[0x418015609106]LSOMLbaTable_Find@LSOMCommon#1+0x16 stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471b6a0:[0x418015680bd5]PLOG_FillRdtSg@com.vmware.plog#0.0.0.1+0x1ad stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471b860:[0x41801568da62]PLOGRunElevator@com.vmware.plog#0.0.0.1+0x18fe stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471ba90:[0x4180155cee33]VSANServerMainLoop@com.vmware.vsanutil#0.0.0.1+0x2bb stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471bb70:[0x4180148c0ac2]WorldletBHHandler@vmkernel#nover+0x51e stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471bcd0:[0x41801483336d]BH_Check@vmkernel#nover+0xe1 stack: 0x417fd48f4a88
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471bd40:[0x418014a11ad2]CpuSchedIdleLoopInt@vmkernel#nover+0x182 stack: 0x4
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471bdc0:[0x418014a153a3]CpuSchedDispatch@vmkernel#nover+0x16b3 stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471bee0:[0x418014a15f68]CpuSchedWait@vmkernel#nover+0x240 stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471bf60:[0x4180149dbd6c]NetPollWorldCallback@vmkernel#nover+0x54 stack: 0x100f
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471bfd0:[0x418014a16bfe]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0
2017-03-25T04:28:56.945Z cpu14:32873)HBX: 283: '37e2d558-b46c-8a59-1c09-0cc47a39b6d8': HB at offset 3522560 - Reclaimed heartbeat [Timeout]:
2017-03-25T04:28:56.945Z cpu14:32873) [HB state abcdef02 offset 3522560 gen 1 stampUS 665251903642 uuid 58cbcbcc-42978f31-9737-0cc47a39b6d8 jrnl <FB 206000> drv 14.61 lockImpl 4]
2017-03-25T04:30:08.113Z cpu7:33923)ScsiDeviceIO: 2652: Cmd(0x43b881203500) 0x28, CmdSN 0x1 from world 2608079 to dev "naa.2ff70002ac018904" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
"
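To make the excerpt above easier to read, here is a small Python sketch (a hypothetical helper, not a VMware tool) that pulls out the NMI alerts, the missed-heartbeat warnings, the modules that appear in the NMI backtrace, and the SCSI sense errors. The regular expressions are assumptions based only on the line formats shown in this excerpt, and the lookup tables contain only the codes that appear here (opcode 0x28 = READ(10), 0x85 = ATA PASS-THROUGH(16), sense key 0x5 = ILLEGAL REQUEST, ASC/ASCQ 0x20/0x00 = INVALID COMMAND OPERATION CODE, per the SCSI SPC/SBC tables).

```python
import re
from collections import Counter

# Lookup tables cover only the codes seen in the excerpt above
# (values from the SCSI SPC/SBC opcode and sense-code tables).
OPCODES = {0x28: "READ(10)", 0x85: "ATA PASS-THROUGH(16)"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {(0x20, 0x00): "INVALID COMMAND OPERATION CODE"}

# Assumption: these patterns match only the line formats shown in the excerpt.
NMI_RE = re.compile(r"^(\S+) cpu(\d+):\d+\)ALERT: NMI: \d+: NMI IPI received")
HEARTBEAT_RE = re.compile(r"PCPU (\d+) didn't have a heartbeat for (\d+) seconds")
FRAME_RE = re.compile(r"\[0x[0-9a-f]+\](\w+)@([\w.]+)#")  # [addr]function@module#
NVME_RE = re.compile(r"nvme_CC cmd:([0-9a-f]+) sense:([0-9a-f]+) asc:([0-9a-f]+) ascq:([0-9a-f]+)")
SCSI_RE = re.compile(r"Cmd\(0x[0-9a-f]+\) (0x[0-9a-f]+),.*sense data: (0x[0-9a-f]+) (0x[0-9a-f]+) (0x[0-9a-f]+)")


def describe_sense(opcode, key, asc, ascq):
    """Turn raw opcode/sense key/ASC/ASCQ numbers into a readable string."""
    op = OPCODES.get(opcode, f"opcode 0x{opcode:02x}")
    k = SENSE_KEYS.get(key, f"sense key 0x{key:x}")
    a = ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
    return f"{op}: {k} / {a}"


def summarize(lines):
    """Collect NMI alerts, heartbeat warnings, backtrace modules and sense errors."""
    nmi = []             # (timestamp, cpu) for each NMI alert
    heartbeats = []      # (pcpu, seconds) for each heartbeat warning
    modules = Counter()  # module name -> number of backtrace frames
    sense = Counter()    # decoded sense description -> occurrences
    for line in lines:
        if m := NMI_RE.search(line):
            nmi.append((m.group(1), int(m.group(2))))
        elif m := HEARTBEAT_RE.search(line):
            heartbeats.append((int(m.group(1)), int(m.group(2))))
        elif m := FRAME_RE.search(line):
            modules[m.group(2)] += 1
        elif m := (NVME_RE.search(line) or SCSI_RE.search(line)):
            # Both formats yield four hex fields: opcode, key, ASC, ASCQ.
            sense[describe_sense(*(int(g, 16) for g in m.groups()))] += 1
    return nmi, heartbeats, modules, sense


# A few lines copied from the excerpt above.
SAMPLE = """\
2017-03-25T04:25:46.581Z cpu0:2606979)nvme_CC cmd:85 sense:05 asc:20 ascq:00
2017-03-25T04:28:48.079Z cpu38:32957)WARNING: Heartbeat: 796: PCPU 22 didn't have a heartbeat for 8 seconds; *may* be locked up.
2017-03-25T04:28:48.079Z cpu22:33422)ALERT: NMI: 709: NMI IPI received. Was eip(base):ebp:cs [0xe058cd(0x418014800000):0x43911471b788:0x4010](Src 0x1, CPU22)
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471b5b0:[0x4180156058cd]Bucketlist_LowerBound@LSOMCommon#1+0x2ad stack: 0x43911471b7c8
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471b6a0:[0x418015680bd5]PLOG_FillRdtSg@com.vmware.plog#0.0.0.1+0x1ad stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471ba90:[0x4180155cee33]VSANServerMainLoop@com.vmware.vsanutil#0.0.0.1+0x2bb stack: 0x0
2017-03-25T04:28:48.079Z cpu22:33422)0x43911471bb70:[0x4180148c0ac2]WorldletBHHandler@vmkernel#nover+0x51e stack: 0x0
2017-03-25T04:30:08.113Z cpu7:33923)ScsiDeviceIO: 2652: Cmd(0x43b881203500) 0x28, CmdSN 0x1 from world 2608079 to dev "naa.2ff70002ac018904" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
"""

if __name__ == "__main__":
    nmi, heartbeats, modules, sense = summarize(SAMPLE.splitlines())
    for ts, cpu in nmi:
        print(f"NMI on CPU {cpu} at {ts}")
    for pcpu, secs in heartbeats:
        print(f"PCPU {pcpu} missed its heartbeat for {secs}s")
    print("Backtrace modules:", dict(modules))
    for desc, n in sense.items():
        print(f"{n} x {desc}")
```

To run it on a full log, copy /var/log/vmkernel.log off the host and pass the open file to `summarize()` instead of the sample. On this excerpt, the frames on CPU 22 at the time of the NMI come from LSOMCommon, com.vmware.plog and com.vmware.vsanutil (module names that suggest vSAN components) plus vmkernel, which may help narrow down the host vs. vSAN question.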
We appreciate any advice and comments. Thanks.
Regards,
banganto