
strange esxtop output, excessive pktrx = slow iscsi


So I have a Nexenta Community 3.1.2 VM on my ESXi 4.1U2 box. The idea is that it takes my physical storage/datastores and puts them into a ZFS RAID layout so my data has some redundancy.  I then export that volume as an iSCSI LUN, re-mount it in ESXi, and format it as a "virtual" VMFS datastore so all my VMs are protected.
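
For reference, the re-mount step on the ESXi side looks roughly like this (typing from memory, so double-check the syntax; vmhba33 and the naa.* device ID are placeholders for whatever your software iSCSI adapter and LUN actually show up as, and it assumes the LUN already has a partition on it, which the vSphere Client normally creates when you add the datastore):

# Rescan the software iSCSI adapter so the Nexenta LUN shows up
esxcfg-rescan vmhba33

# Format the LUN's partition as a VMFS3 datastore
vmkfstools -C vmfs3 -S NexentaVMFS /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx:1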

 

I have done performance testing at the Nexenta VM OS level and can achieve 240 MB/s sequential write and 400 MB/s sequential read.  Unfortunately, performance on that same exact volume mounted back to ESXi via iSCSI is much slower: typically about 120 MB/s write, and 140 MB/s read is the max I can squeeze out of it.  Nexenta doesn't currently support VMXNET3 vNICs, so I have been experimenting with multiple 1 Gbit vNICs, each with a corresponding VMkernel port (vmk), using round-robin multipathing with IOPS=1 to try to push more than 1 Gbit/s over that iSCSI channel.  After many different test combinations I got maybe a 10% performance improvement over a single vNIC and vmk.  In general, more vNICs in multipath I/O with IOPS=1 gets me SLOWER IOPS but slightly higher total throughput than a single vNIC.
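
For what it's worth, the round-robin/IOPS=1 piece was set up roughly along these lines (again from memory, so treat the exact syntax as approximate; vmk1/vmk3, vmhba33 and the naa.* device ID are placeholders for my actual bindings and LUN):

# Bind the storage vmks to the software iSCSI adapter
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic add -n vmk3 -d vmhba33

# Set the LUN to round robin and switch paths after every single I/O
esxcli nmp device setpolicy --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
esxcli nmp roundrobin setconfig --device naa.xxxxxxxxxxxxxxxx --type iops --iops 1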

 

To reiterate: ALL of the performance described here is within a single VMware 4.1 U2 host; none of the traffic actually flows over a physical NIC.  I have isolated all of my storage vmks on a separate vSwitch, and my vNICs are each in separate port groups with a different non-routable IP subnet in each port group (matching the vmk IP addresses).
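
The storage networking was laid out roughly like this (sketch from memory; the portgroup names and 10.10.x.x subnets are just examples of the non-routable ranges I used, and the Nexenta vNICs sit in the same portgroups):

# Separate vSwitch for storage traffic only, no physical uplinks attached
esxcfg-vswitch -a vSwitch1

# One portgroup per path, each on its own non-routable subnet
esxcfg-vswitch -A iSCSI-PG1 vSwitch1
esxcfg-vswitch -A iSCSI-PG2 vSwitch1

# Matching vmkernel ports for the software iSCSI initiator
esxcfg-vmknic -a -i 10.10.1.10 -n 255.255.255.0 iSCSI-PG1
esxcfg-vmknic -a -i 10.10.2.10 -n 255.255.255.0 iSCSI-PG2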

 

While investigating these bottlenecks in esxtop, I have seen something strange and pasted an example below.

 

2:04:48pm up 2 days 11:27, 247 worlds, 8 VMs, 0 vCPUs; CPU load average: 0.45, 0.46, 0.46

 

   PORT-ID              USED-BY  TEAM-PNIC DNAME              PKTTX/s  MbTX/s    PKTRX/s  MbRX/s %DRPTX %DRPR
...snip...
  33554434                 vmk1       void vSwitch1            602.17  175.39    5596.39    3.03   0.00   0.0
  33554435                 vmk2       void vSwitch1              0.00    0.00       0.00    0.00   0.00   0.0
  33554436                 vmk3       void vSwitch1            598.94  175.11    5589.47    3.02   0.00   0.0
  33554437                 vmk4       void vSwitch1              0.00    0.00       0.00    0.00   0.00   0.0
  33554438                 vmk5       void vSwitch1              0.00    0.00       0.00    0.00   0.00   0.0
  33554439                 vmk6       void vSwitch1              0.00    0.00       0.00    0.00   0.00   0.0
  33554465    112883:VirtualSAN       void vSwitch1              0.00    0.00       0.00    0.00   0.00   0.0
  33554466    112883:VirtualSAN       void vSwitch1              0.00    0.00       0.00    0.00   0.00   0.0
  33554467    112883:VirtualSAN       void vSwitch1              0.00    0.00       0.00    0.00   0.00   0.0
  33554468    112883:VirtualSAN       void vSwitch1           5582.54    3.02   15986.13  183.35   0.00   0.0
  33554469    112883:VirtualSAN       void vSwitch1              0.00    0.00       0.00    0.00   0.00   0.0
  33554470    112883:VirtualSAN       void vSwitch1           5600.09    3.03   16013.84  183.64   0.00   0.0

 

In this configuration I have two iSCSI paths open and am monitoring network performance as a possible bottleneck.  You can see vmk1 and vmk3 active on the ESXi host, corresponding to two interfaces on the VirtualSAN VM.  I would expect the PKTTX/s on each vmk to roughly equal the PKTRX/s on the matching VirtualSAN interface, and the inverse to hold as well (PKTTX:VirtualSAN = PKTRX:vmk).

 

I have seen this numerous times, sustained for extended periods (e.g. a 40 GB write while iometer prepares a disk, so about 20 minutes): the RX and TX values are not even close.  In the example above the vmk PKTTX/s is about 600 while the VirtualSAN port's PKTRX/s is about 16000.  That is a huge difference and could be the source of my problems...

 

The one thing I have not done yet is to sniff the VirtualSAN's vNIC interfaces to see whether those extra PKTRX are retransmits from the vmk, or something else.
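
Roughly what I have in mind for that (untested sketch; tcpdump-uw on the ESXi side can only see vmkernel interfaces, so the VM side would have to be captured from inside Nexenta with snoop, and e1000g1 is a placeholder for whichever vNIC carries that path):

# On the ESXi host: capture the iSCSI conversation on one of the storage vmks
tcpdump-uw -i vmk1 -nn tcp port 3260

# Inside the Nexenta (OpenSolaris-based) VM: capture the same conversation
snoop -d e1000g1 port 3260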

 

Any words of wisdom?  Has anyone else run a purely virtual network (e.g. firewalling/routing between two portgroups) and seen the metrics this far off?

