Has anyone else experienced an increase in CPU scheduler latency when moving from ESXi 6.0 to 6.5 (and 6.7)? I'm finding that the %RDY values are generally higher on 6.5 and 6.7 for equivalently loaded hosts. This may be specific to our workload, as the effect is only present on certain VMs - those which generate or consume multicast audio streams:
In this graph, each mark represents a VM that engages in audio streaming - blue marks are running on 6.0 hosts, red on 6.5 and green on the only 6.7 host (the letters distinguish the hosts). The horizontal axis is the host's 95th percentile usage (metric cpu.usage, average); the vertical axis is the VM's 95th percentile readiness (cpu.readiness, average). The 6.5 hosts are generally more lightly loaded as a consequence of the higher ready values - I don't run as many VMs on them because of this issue. I would like to be able to a) move to a recent ESXi release and b) potentially increase host usage to accommodate further services on this platform.
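In case anyone wants to plot their own hosts the same way, here is a minimal sketch of how one point on the graph could be computed. It assumes the cpu.usage and cpu.readiness samples have already been exported as plain lists of percentages. The function names and the nearest-rank percentile method are my own illustrative choices, not anything taken from vCenter.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the data
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def graph_point(host_usage_pct, vm_readiness_pct):
    """Hypothetical helper: (x, y) for one VM mark on the scatter plot."""
    x = percentile_nearest_rank(host_usage_pct, 95)    # host cpu.usage samples
    y = percentile_nearest_rank(vm_readiness_pct, 95)  # VM cpu.readiness samples
    return x, y
```

Other percentile definitions (for example linear interpolation) give slightly different values, so compare hosts using the same method throughout.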
I've tried contacting support, but I don't feel I've made much progress there, so I thought I'd throw it open to a wider pool of knowledge!
Thanks for any insights
John
PS Obviously this only looks at the average readiness - I am also trying to work on the application side to understand worst-case scheduling latency (even if it occurs only occasionally). ESXi stats are sampled too infrequently to spot a short-duration (30ms) scheduling stall for a VM.
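For the application-side measurement, one rough approach is a wake-up probe inside the guest: sleep for a short fixed interval in a loop and record how late each wake-up is. The sketch below is only that, a sketch. The function names, the 1ms interval and the 30ms threshold are assumptions for illustration, and any overshoot it reports also includes guest OS scheduling delay and timer granularity, so it cannot attribute a stall to the hypervisor on its own.

```python
import time

def sample_wakeups(duration_s, interval_s):
    """Sleep repeatedly for interval_s and return the monotonic wake-up times."""
    stamps = [time.monotonic()]
    end = stamps[0] + duration_s
    while stamps[-1] < end:
        time.sleep(interval_s)
        stamps.append(time.monotonic())
    return stamps

def find_stalls(timestamps, interval_s, threshold_s):
    """Return (wake_time, overshoot) for each wake-up late by >= threshold_s."""
    stalls = []
    for prev, now in zip(timestamps, timestamps[1:]):
        # Overshoot is how much longer the gap was than the requested sleep
        overshoot = (now - prev) - interval_s
        if overshoot >= threshold_s:
            stalls.append((now, overshoot))
    return stalls
```

A run might look like `find_stalls(sample_wakeups(60.0, 0.001), 0.001, 0.030)`, ideally left running on both a 6.0 and a 6.5 host under comparable load so that the counts of late wake-ups can be compared.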
