{"id":515,"date":"2010-01-31T18:22:43","date_gmt":"2010-01-31T23:22:43","guid":{"rendered":"http:\/\/www.jasemccarty.com\/blog\/?p=515"},"modified":"2010-01-31T18:22:43","modified_gmt":"2010-01-31T23:22:43","slug":"nfs-and-iscsi-performance-gotcha-vmnic-autonegotiation-rx-tx","status":"publish","type":"post","link":"https:\/\/www.jasemccarty.com\/blog\/nfs-and-iscsi-performance-gotcha-vmnic-autonegotiation-rx-tx\/","title":{"rendered":"NFS performance gotcha: vmnic Autonegotiation, RX, &#038; TX"},"content":{"rendered":"<p>I recently migrated a production environment off of Fibre Channel over to NFS.  For anyone looking to implement either NFS or iSCSI in a vSphere or VI3 environment, I would definitely recommend reading the post <strong><a title=\"A &quot;Multivendor Post&quot; to help our mutual NFS customers using VMware\" href=\"http:\/\/virtualgeek.typepad.com\/virtual_geek\/2009\/06\/a-multivendor-post-to-help-our-mutual-nfs-customers-using-vmware.html\" target=\"_blank\">A \u201cMultivendor Post\u201d to help our mutual NFS customers using VMware<\/a><\/strong> hosted on <strong><a title=\"http:\/\/virtualgeek.typepad.com\/\" href=\"http:\/\/virtualgeek.typepad.com\/\" target=\"_blank\">Chad Sakac&#8217;s blog<\/a><\/strong> as well as <strong><a title=\"A \u201cMultivendor Post\u201d to help our mutual NFS customers using VMware\" href=\"http:\/\/blogs.netapp.com\/virtualstorageguy\/2009\/06\/a-multivendor-post-to-help-our-mutual-nfs-customers-using-vmware.html\" target=\"_blank\">here<\/a><\/strong> on Vaughn Stewart&#8217;s <strong><a title=\"The Virtual Storage Guy\" href=\"http:\/\/blogs.netapp.com\/virtualstorageguy\" target=\"_blank\">blog<\/a><\/strong>.\u00a0 It is a very good read, and it stresses the point that the &#8220;storage network&#8221; should be configured appropriately for storage traffic.<\/p>\n<p><strong>Best Practices<\/strong><br \/>\nThe second bullet point in the section <strong>Performance consideration #2: Design a \u201cBet the 
Business\u201d Ethernet Network<\/strong> talks about enabling flow control.\u00a0 In a NetApp environment, switches are typically set to receive on and NFS targets are set to transmit on.<\/p>\n<p>I don&#8217;t have any EMC equipment, but I do have a NetApp filer.\u00a0 So I used the NetApp Technical Report\u00a0 <strong><a title=\"TR-3749: NetApp and VMware vSphere Storage Best Practices\" href=\"http:\/\/media.netapp.com\/documents\/tr-3749.pdf\" target=\"_blank\">TR-3749: NetApp and VMware vSphere Storage Best Practices<\/a><\/strong> as a reference in configuring my environment.<\/p>\n<p>Scott Lowe posted an article today on some <strong><a title=\"EMC Celerra Optimizations for VMware on NFS\" href=\"http:\/\/blog.scottlowe.org\/2010\/01\/31\/emc-celerra-optimizations-for-vmware-on-nfs\/\" target=\"_blank\">EMC Celerra Optimizations for VMware on NFS<\/a><\/strong>, and it is a good read, but I could not find anything in it related to flow control at the ESX level.\u00a0 I have an open question with Chad Sakac about recommended flow control settings with EMC storage.<\/p>\n<p>On page 46 of TR-3749 (Section 9.3) the first paragraph reads: <em>Flow control is the process of managing the rate of data transmission between two nodes to prevent a fast sender from over running a slow receiver. Flow control can be configured on ESX servers, FAS storage arrays, and network switches. 
It is recommended to configure the end points, <strong>ESX servers<\/strong> and NetApp arrays with flow control set to &#8220;send on&#8221; and &#8220;receive off.&#8221;<\/em><\/p>\n<p><strong>Configuring Storage NICs<\/strong><br \/>\nConfiguring flow control from the Service Console is pretty straightforward.\u00a0 Use <strong>ethtool<\/strong> to adjust the flow control settings of a physical NIC.<\/p>\n<p>The basic syntax to view the current configuration of a vmnic is:<\/p>\n<blockquote><p>ethtool -a ethX<\/p><\/blockquote>\n<p>The syntax to change the configuration of a vmnic uses the uppercase -A option:<\/p>\n<blockquote><p>ethtool -A ethX [autoneg on|off] [rx on|off] [tx on|off]<\/p><\/blockquote>\n<p>So to change the settings of vmnic2, the syntax would be:<\/p>\n<blockquote><p>ethtool -A vmnic2 autoneg off rx off tx on<\/p><\/blockquote>\n<p>Upon initial setup I configured each of the storage NICs with autonegotiation off, receive off, and transmit on.\u00a0 So my hosts and my NetApp were set to transmit, and my switches were set to receive, per TR-3749.\u00a0 Performance was awesome, and NetApp filer CPU utilization was low as well.\u00a0 Things looked good.<\/p>\n<p><strong>The Gotcha<\/strong><br \/>\nIt is not unheard of to keep an ESX host up for months or years at a time, so the &#8220;gotcha&#8221; wasn&#8217;t apparent until several months after migrating VMs from our older FC SAN to the NFS datastores presented by the NetApp.\u00a0 With about 300 guests at the time of initial setup, watching CPU rise (somewhat) on my filer did not seem strange as I migrated more guests from FC to NFS.<\/p>\n<p>One of my hosts indicated a hardware issue, so I evacuated guests from it and took it offline.\u00a0 After careful investigation and a replacement part, the host was brought back online.\u00a0 I still didn&#8217;t notice my issue at this point.\u00a0 But the CPU utilization of this host was a little higher than it had been in the past, when loaded with the same 
number of VMs with about the same workload.<\/p>\n<p>Another couple of hosts needed to be moved from the temporary location they were in to a more permanent location.\u00a0 Again, I evacuated the VMs, powered the hosts down, moved them, reran connections, and put the hosts back into service.\u00a0 Again, the hosts behaved about the same as before, but I still didn&#8217;t notice the gain in CPU utilization.<\/p>\n<p>I was looking at my filer, and did notice that the CPU utilization had jumped by about 10% on average.\u00a0 I also noticed that the guests were restarting a little more slowly during the most recent boot storm after a patch window.\u00a0 Now, no additional VMs were added, and no other changes were made to the environment.\u00a0 The only change was that hosts had been rebooted.\u00a0 Keep in mind that it is not uncommon for me to run 70-80 guests per host.<\/p>\n<p>The &#8220;gotcha&#8221; was that the flow control settings are <strong><em>not persistent<\/em><\/strong> after reboots of an ESX host.<\/p>\n<p>Running ethtool -a against all of the vmnics on the moved and rebooted hosts showed that they were no longer set to autonegotiate off\/receive off\/transmit on.<\/p>\n<p><strong>The Fix<\/strong><br \/>\nTo ensure that all of my storage vmnics (4 per host) are properly configured, I modified <strong>\/etc\/rc.local<\/strong> to run the appropriate commands at startup after an ESX reboot.<\/p>\n<blockquote><p>ethtool -A vmnicW autoneg off rx off tx on<br \/>\nethtool -A vmnicX autoneg off rx off tx on<br \/>\nethtool -A vmnicY autoneg off rx off tx on<br \/>\nethtool -A vmnicZ autoneg off rx off tx on<\/p><\/blockquote>\n<p>Now every time a host is booted, the transmit configuration (per TR-3749) is restored.<\/p>\n<p>Note: This also works on ESXi, but will require modifying rc.local using the unsupported &#8220;<strong><a title=\"Tech Support Mode for Emergency Support\" href=\"http:\/\/kb.vmware.com\/kb\/1003677\" 
target=\"_blank\">Tech Support Mode.<\/a><\/strong>&#8221;<\/p>\n<p><strong>The Conclusion<\/strong><br \/>\nAfter configuring all hosts to reapply the &#8220;send on&#8221; settings on boot, VMs are much more responsive during boot storms, overall host CPU utilization is lower during normal operation, and the NetApp filer&#8217;s CPU utilization is lower as well.<\/p>\n<p>The point of the story is that initial configurations can be lost on reboot, depending on the &#8220;stickiness&#8221; of the configuration.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently migrated a production environment off of Fibre Channel over to NFS. For anyone looking to implement either NFS or iSCSI in a vSphere &hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[37,38,56,58],"class_list":["post-515","post","type-post","status-publish","format-standard","hentry","category-virtualization","tag-esx","tag-esxi","tag-netapp","tag-nfs"],"_links":{"self":[{"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/posts\/515","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/comments?post=515"}],"version-history":[{"count":0,"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/posts\/515\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/media?parent=515"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/categories?post=515"},{"taxonomy":"post_tag","embeddable":
true,"href":"https:\/\/www.jasemccarty.com\/blog\/wp-json\/wp\/v2\/tags?post=515"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}