Hopefully someone can help me with a fairly frustrating situation. I have done quite a few searches and read a number of articles here and elsewhere, but none of them seems to match my situation precisely.
The problem arose with an "inelegant" reboot of my ESXi server. This is one of a few free installations using ScaleIO to map LUNs for the datastores. The server appeared to be hung, so I tried a controlled reboot (which did not seem to take) and then power cycled it. When it came back up, the VM running on a local datastore came back fine, but the datastores on the ScaleIO LUNs were not showing up. Every tool seems to be saying there is no filesystem, but it doesn't make sense that every LUN in question would be affected, and when I look at the disks with dd I see the data I would expect.
What I see:
* The LUNs themselves are discovered along with the partitions on them. When I look at the devices in the GUI or CLI, I see them without a problem.
[root@vm-host101:/var/log] ls /vmfs/devices/disks/eui*
/vmfs/devices/disks/eui.0f60934767ed6d2731f35a0d00000005
/vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000
/vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1
[root@vm-host101:/var/log] esxcli storage core device list | grep eui.
eui.0f60934767ed6d273d0a632d00000000
Display Name: EMC Fibre Channel Disk (eui.0f60934767ed6d273d0a632d00000000)
Devfs Path: /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000
eui.0f60934767ed6d2731f35a0d00000005
Display Name: EMC Fibre Channel Disk (eui.0f60934767ed6d2731f35a0d00000005)
Devfs Path: /vmfs/devices/disks/eui.0f60934767ed6d2731f35a0d00000005
* partedUtil and voma show the GPT partitions are still there.
[root@vm-host101:/var/log] partedUtil getptbl /dev/disks/eui.0f60934767ed6d273d0a632d00000000
gpt
35507 255 63 570425344
1 128 570425304 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
[root@vm-host101:/var/log] voma -m ptbl -f check -d /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000
Running Partition table checker version 0.1 in check mode
Phase 1: Checking device for valid primary GPT
Detected valid GPT signatures
Number Start End Type
1 128 570425304 vmfs
Found a valid partition table on the device
Total Errors Found: 0
* esxcli shows no filesystems on those LUNs
[root@vm-host101:/var/log] esxcli storage filesystem list
Mount Point Volume Name UUID Mounted Type Size Free
------------------------------------------------- ----------- ----------------------------------- ------- ------ ------------ ------------
/vmfs/volumes/598ccaea-8afb9c77-80a4-001517d9a462 datastore1 598ccaea-8afb9c77-80a4-001517d9a462 true VMFS-6 241055039488 239537750016
/vmfs/volumes/dcaf3470-159653b7-59a1-55193a835180 dcaf3470-159653b7-59a1-55193a835180 true vfat 261853184 110673920
/vmfs/volumes/2044b777-2642e6ce-6fc4-7407f0f24801 2044b777-2642e6ce-6fc4-7407f0f24801 true vfat 261853184 110866432
/vmfs/volumes/598ccaf4-06170a72-d2b8-001517d9a462 598ccaf4-06170a72-d2b8-001517d9a462 true vfat 4293591040 4285333504
/vmfs/volumes/598ccadb-9d7c923c-6aa1-001517d9a462 598ccadb-9d7c923c-6aa1-001517d9a462 true vfat 299712512 83927040
* esxcfg-volume shows no snapshots
[root@vm-host101:/var/log] esxcfg-volume -l
[root@vm-host101:/var/log]
* voma similarly shows an issue with the file system
[root@vm-host101:/var/log] voma -m vmfs -f check -d /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1
Checking if device is actively used by other hosts
Running VMFS Checker version 2.1 in check mode
Initializing LVM metadata, Basic Checks will be done
Initializing LVM metadata..-
LVM magic not found at expected Offset,
It might take long time to search in rest of the disk.
VMware ESX Question:
Do you want to continue (Y/N)?
0) _Yes
1) _No
Select a number from 0-1: 0
ERROR: LVM Major or Minor version Mismatch, Not supported
ERROR: Failed to Initialize LVM Metadata
VOMA failed to check device : Not Supported
Total Errors Found: 0
Kindly Consult VMware Support for further assistance
* dd shows data exists and even shows the datastore label I would expect.
[root@vm-host101:/var/log] dd if=/vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1 | od -c | head
0000000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
114000000 ^ 0N1 0E3 / 030 \0 \0 \0 Q e S 0J5 X 1 \0 0O1
114000020 0D4 0K4 0J0 P 0C1 0H1 i 0O1 026 \0 \0 \0 V M -
114000040 W i n d o w s - 7 - x 6 4 - P r
114000060 o \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
114000100 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
114000220 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 002 \0
114000240 \0 \0 \0 020 \0 \0 \0 \0 \0 e S 0J5 X 001 \0 \0
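For anyone reading the od output above: the left-hand column is a byte offset printed in octal (od's default address radix), so everything before offset 114000000 in that partition reads back as zeros. A minimal Python sketch of the conversion, in case it helps pin down where the first non-zero data sits (the helper name od_offset_to_bytes is just for illustration, not part of any tool):

```python
def od_offset_to_bytes(offset_field: str) -> int:
    """Convert an od address column value (octal by default) to a byte count."""
    return int(offset_field, 8)


# Offset of the first non-zero line in the dd | od -c output above.
first_data = od_offset_to_bytes("114000000")

print("bytes:", first_data)
print("hex:  ", hex(first_data))
print("MiB:  ", first_data / (1024 * 1024))
```

By this arithmetic the first non-zero bytes are 19 MiB (0x1300000) into the partition, and the region before that is all zeros as read from this host.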
I once saw what I thought was a similar issue, which I fixed by mapping the LUN to a different server. In this case that hasn't worked. I even built a new server and added it to the ScaleIO network, and I still cannot see the filesystem.
In a "typical" Unix environment I would expect I could do an fsck, but I'm not seeing such an opportunity here.
Any help would be greatly appreciated.