I know there are many, many previous discussions on this topic, but I have not yet found the answer to my issue.
I have two hosts in a cluster, both of which were running ESXi 6.5.0, build 7388607. These hosts are HP BL460c Gen9 blades in an HP c7000 enclosure with Flex Fabric Virtual Connects. I wanted to upgrade these hosts, but because they were still using the old net-bnx2x driver rather than the qfle3 driver, the normal VUM upgrade would fail and the host would come back up to a black screen. So instead of doing a lot of manual driver updates, my team opted to do a "fresh" install of 6.5.0, build 11925212 (the latest), since these hosts have been upgraded through multiple versions over the years. The fresh install reused the same management and vMotion IPs already configured.

I did a fresh install on the first host to 6.5.0, build 11925212, and everything went as smoothly as any previous host reinstall I've done. When I was ready to start on the second host and vMotioned its VMs to the first (now upgraded) host, half of the VMs lost network connectivity when they landed there. Some VMs in the same VLAN were not affected after the vMotion. A quick vMotion back didn't seem to fix the issue (my recollection on this part is fuzzy; I was just trying everything I could to get the VMs back on the network). I have another cluster of hosts running 6.5.0, build 11925212 that sees the same storage for these VMs, and when I vMotioned the VMs to that cluster some of them still had no network connection while others would ping automatically. Once the Windows guest was restarted, or powered off and back on, the network connection came back.

Since then I have upgraded the second host in the original two-host cluster to 6.5.0, build 11925212, so the hosts now match versions. A few VMs remain in this cluster that, when vMotioned between the two hosts (either direction), randomly lose network connectivity. Sometimes they don't. For example:

Scenario 1 - VM1 on Host1 is pinging. I migrate VM1 to Host2 and it stops pinging. I migrate VM1 back to Host1 and it is still not pinging. Power off or restart the VM and it resumes pinging.

Scenario 2 - VM1 on Host1 is pinging. I migrate VM1 to Host2 and it keeps pinging.
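In case it helps anyone hitting the same black-screen upgrade, this is roughly how I confirm which driver the uplinks are loaded with on each host (the grep pattern is just the two driver names in question; your vmnic numbering may differ):

    esxcli network nic list                            # shows the driver (bnx2x vs qfle3) per vmnic
    esxcli software vib list | grep -iE 'bnx2x|qfle3'  # shows which of the two driver VIBs are installed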
Before build 11925212 we did not have this issue on these hosts with the same set of VMs in this cluster; it only appeared after upgrading to 6.5.0, build 11925212. For additional troubleshooting, I can edit the VM's settings and sometimes choose a different open port on the network adapter, and the VM resumes its network connection. Sometimes that doesn't help. I am at a loss as to what the issue is here, but it seems related to build 11925212. The same vDS is used across ALL clusters in the environment, so it's not that. I'm also less inclined to think it's our upstream physical switch, because we did not have these issues before build 11925212. For now I have turned off DRS so VMs do not move around. It does seem like both hosts have this issue. I do have a VMware SR open and am waiting to hear back, but I am very curious what the community's thoughts are.
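For what it's worth, when I look at which port a VM is actually sitting on I do it from the ESXi shell rather than the client; something like the following shows the dvPort and team uplink a VM is currently attached to (the world ID placeholder is just whatever the first command reports for that VM):

    esxcli network vm list                    # lists running VMs with their world IDs and networks
    esxcli network vm port list -w <worldID>  # shows that VM's port ID, DVPort ID, and team uplink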
Other observations in VM1's current network state:
--------------------------------
From my workstation:
-Cannot ping VM1
From Host2 (the host VM1 is on):
-Cannot ping VM1 from its host (host mgmt is in a different VLAN)
From VM1 with connectivity issues:
-Cannot ping its host
-CAN ping another VM on the same VLAN and same host (interesting)
-Cannot ping its gateway
-Cannot ping another VM on the same VLAN on a different host in the cluster
From another VM (let's say VM2) without network connectivity issues on the same host:
-CAN ping the problematic VM successfully
-----------------------------------
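For the host-side pings above I'm using the ESXi shell; something along the lines of the following lets you pick which vmkernel interface sources the ping, since host mgmt is on a different VLAN (vmk0 and the target address are just placeholders):

    vmkping -I vmk0 <VM1 IP address>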
esxtop in network view (n) on the host indicates VM1 (problematic) and VM2 (no problems) are both using vmnic4
vmnic4 shows 100 under %DRPTX, which on the surface seems alarming
vmnic5 shows 0.00 under %DRPTX
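In case the raw counters are more useful than the esxtop percentages, this is roughly what I've been pulling on the uplink in question (the batch-mode flags are just one way to watch %DRPTX across a vMotion):

    esxcli network nic stats get -n vmnic4      # transmit/receive packet and drop counters for vmnic4
    esxtop -b -d 5 -n 60 > /tmp/esxtop-net.csv  # 5-minute batch capture to review %DRPTX over time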