Channel: VMware Communities : Discussion List - ESXi

lost access to volume


I am running ESXi on 3 different machines at high load (cpu + disk), and I am encountering the following error event:

 

Lost access to volume GUID (XXX) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly. 3/21/2015 3:48:57 PM

 

Shortly thereafter, access is restored:

 

Successfully restored access to volume  GUID (XXX) following  connectivity issues. 3/21/2015 3:49:24 PM

 

Interestingly, these events occur almost exactly every 6 hours (to within a few seconds) on each affected machine:

 

2015-03-21T13:39:01.303Z cpu30:32857)HBX: 270: Reclaimed heartbeat for volume GUID (XXX): [Timeout] Offset 3796992

2015-03-21T19:39:13.824Z cpu20:32855)HBX: 270: Reclaimed heartbeat for volume GUID (XXX): [Timeout] Offset 3796992

2015-03-22T01:39:09.569Z cpu0:32856)HBX: 270: Reclaimed heartbeat for volume GUID (XXX): [Timeout] Offset 3796992
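To double-check the 6-hour spacing rather than eyeball it, the gaps between the three entries above can be computed directly. This is only a rough sketch: the function names are made up for illustration, and it assumes every line starts with a UTC timestamp in the format shown above.

```python
from datetime import datetime

# The three log entries quoted above, copied verbatim.
LOG_LINES = [
    "2015-03-21T13:39:01.303Z cpu30:32857)HBX: 270: Reclaimed heartbeat for volume GUID (XXX): [Timeout] Offset 3796992",
    "2015-03-21T19:39:13.824Z cpu20:32855)HBX: 270: Reclaimed heartbeat for volume GUID (XXX): [Timeout] Offset 3796992",
    "2015-03-22T01:39:09.569Z cpu0:32856)HBX: 270: Reclaimed heartbeat for volume GUID (XXX): [Timeout] Offset 3796992",
]


def heartbeat_times(lines):
    """Return the timestamps of all 'Reclaimed heartbeat' lines (hypothetical helper)."""
    times = []
    for line in lines:
        if "Reclaimed heartbeat" not in line:
            continue
        stamp = line.split()[0]  # first whitespace-separated token is the timestamp
        times.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ"))
    return times


def intervals_seconds(times):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


gaps = intervals_seconds(heartbeat_times(LOG_LINES))
for gap in gaps:
    # Show each gap in hours and its deviation from exactly 6 hours.
    print(f"{gap / 3600:.4f} h ({gap - 6 * 3600:+.3f} s relative to 6 h)")
```

For these three entries the gaps are off from 6 hours by roughly +12.5 s and -4.3 s, so whatever triggers this looks like it runs on a fixed 6-hour schedule rather than drifting with load.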

 

Most of the search results (and the VMware KB) discuss issues related to FC/iSCSI/network connected datastores. However, these datastores are local disks connected to a MegaRAID SAS controller.

 

The fact that this occurs every 6 hours made me think that some cronjob (or something similar) was running and generating a lot of disk churn, which, combined with the already high disk load, was clogging up the controller. However, I can't find any such cronjob in /var/spool/cron/crontabs/root. I've also checked several logs in /var/log, and nothing stands out around those times. Updating to the latest ESXi patches didn't help either. Any ideas?
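For anyone who wants to repeat the crontab check mechanically, here is a rough sketch that lists the entries whose hour field fires at all of the observed hours (1, 13 and 19 UTC, going by the log timestamps). Everything in it is an assumption for illustration: the helper names are invented, the sample crontab is made up and is not the real ESXi one, and only the common hour syntaxes (`*`, `N`, `A-B`, comma lists, `/step`) are handled.

```python
def hour_field_matches(field, hour):
    """Return True if a cron hour field such as '*', '*/6', '1,7' or '1-23/6' fires at `hour`."""
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            low, high = 0, 23
        elif "-" in rng:
            low, high = (int(x) for x in rng.split("-"))
        else:
            low = high = int(rng)
        if hour in range(low, high + 1, step):
            return True
    return False


def jobs_at_hours(crontab_text, hours):
    """Return the commands whose hour field matches every hour in `hours`."""
    commands = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 5)  # minute hour day month weekday command
        if len(fields) < 6:
            continue  # not a regular five-field schedule entry
        if all(hour_field_matches(fields[1], h) for h in hours):
            commands.append(fields[5])
    return commands


# Made-up sample; on the host, read /var/spool/cron/crontabs/root instead.
SAMPLE_CRONTAB = """\
# min hour day mon dow command
1 1 * * * /hypothetical/daily.sh
30 1-23/6 * * * /hypothetical/six-hourly.sh
0 * * * * /hypothetical/hourly.sh
"""

for command in jobs_at_hours(SAMPLE_CRONTAB, [1, 13, 19]):
    print(command)
```

Note that hourly jobs match as well, since they also fire at those hours; they just wouldn't explain a 6-hour cadence on their own. An empty result for the real file would be consistent with what I saw, i.e. the trigger is probably not a root cronjob.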

 

FWIW, my hw/sw is:

 

ESXi 5.5.0, 2456374

Supermicro X10DRH-C

MegaRAID SAS Invader Controller (LSI 3108 SAS3)

3 consumer SSD drives in RAID0 (for the affected datastore)

