Posts Tagged ‘performance’

Quick Way to Migrate VMs Between Standalone ESXi Hosts

September 26, 2017

Introduction

Since vSphere 5.1, VMware has offered an easy migration path for VMs running on hosts managed by vCenter. Using Enhanced vMotion, available in the Web Client, VMs can be migrated between hosts even if they don't have shared datastores. vSphere 6.0 introduced cross-vCenter vMotion (xVC-vMotion), which no longer requires the old and new hosts to be managed by the same vCenter.

But what if you don't have a vCenter and need to move VMs between standalone ESXi hosts? There are many tools that can do that. You can use V2V conversion in VMware Converter or the replication feature of the free version of Veeam Backup and Replication. But probably the easiest tool to use is OVF Tool.

Tool Overview

OVF Tool has been around since the Open Virtualization Format (OVF) was originally published in 2008. It is constantly being updated, and the latest version, 4.2.0, supports vSphere up to version 6.5. The only downside of the tool is that it can export only powered-off VMs. This may cause problems for big VMs that take a long time to export, but for small VMs the tool is priceless.

Installation

OVF Tool is a CLI tool that is distributed as an MSI installer and can be downloaded from the VMware website. One important thing to remember is that when you're migrating VMs, OVF Tool is in the data path. So make sure you install the tool as close to the workload as possible to guarantee the best possible throughput.

Usage Examples

After the tool is installed, open the Windows command line and change into the tool's installation directory. Below are three examples of the most common use cases: export, import and migration.

Exporting VM as an OVF image:

> ovftool "vi://username:password@source_host/vm_name" "vm_name.ovf"

Importing VM from an OVF image:

> ovftool -ds="destination_datastore" "vm_name.ovf" "vi://username:password@destination_host"

Migrating VM between ESXi hosts:

> ovftool -ds="destination_datastore" "vi://username:password@source_host/vm_name" "vi://username:password@destination_host"

When you are migrating, the machine the tool is running on is still used as a proxy between the two hosts. The only difference is that you are not saving the OVF image to disk and don't need disk space available on the proxy.
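
A few optional flags can make migrations easier. The following is only a sketch of flags I would consider (verify them against ovftool --help for your version); the datastore, host and VM names are placeholders, as in the examples above:

> ovftool --noSSLVerify --powerOffSource --diskMode=thin -ds="destination_datastore" "vi://username:password@source_host/vm_name" "vi://username:password@destination_host"

Here --noSSLVerify skips certificate validation for hosts with self-signed certificates, --powerOffSource powers the source VM off before the transfer starts, and --diskMode=thin makes the destination disks thin provisioned.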

This is what it looks like in the vSphere and HTML5 clients' task lists.

Observations

When planning migrations using OVF Tool, throughput is an important consideration, because migration requires downtime.

OVF Tool is quite efficient in how it does export/import. Even for thick-provisioned disks it reads only the consumed portion of the .vmdk. On top of that, the generated OVF package is compressed.

Due to compression, OVF Tool is typically bound by the speed of the ESXi host's CPU. In the screenshot below you can see how the export process takes 1 out of 2 CPU cores (compression is single-threaded).

While testing on a 2-core Intel Core i5, I was getting a 25MB/s read rate from disk and an average export throughput of 15MB/s, which is roughly a 1.6:1 compression ratio.

For a VM with a 100GB disk that has 20GB of space consumed, the export will take 20*1024/25 = 819 seconds, or about 14 minutes, which is not bad if you ask me. On a Xeon CPU I expect throughput to be even higher.

Caveats

There are a few well-known issues that you can potentially run into, but I think they are still worth mentioning here.

Special characters in URIs (strings starting with vi://) must be escaped. Use % followed by the character's hex code. You can find character hex codes here: http://www.techdictionary.com/ascii.html.

For example, use "vi://root:P%40ssword@10.0.1.10" instead of "vi://root:P@ssword@10.0.1.10", or you can get confusing errors similar to this:

Error: Could not lookup host: root

Disconnect ISO images from VMs before migrating them or you will get the following error:

Error: A general system error occurred: vim.fault.FileNotFound

Conclusion

OVF Tool requires downtime when exporting, importing or migrating VMs, which can be a deal-breaker for large-scale migrations. But when downtime is not a concern, or for VMs small enough for the outage to be minimal, OVF Tool will from now on be my migration tool of choice.


First Look at UCS Performance Manager

May 12, 2016

Overview

Cisco UCS has been in the market for seven years now. It was quite an expensive blade chassis when it was first introduced by Cisco in March 2009, but it has reached price parity with most of the server vendors these days.

Over the course of the last seven years Cisco has built a great set of products that help UCS customers in various areas:

  • UCS Central for configuration management across multiple Cisco UCS domains
  • UCS Director for infrastructure automation not only of UCS, but also network, storage and virtualization layers (don’t expect it to support any other vendors than Cisco for IP networks, though)
  • UCS Performance Manager for performance monitoring and capacity planning, which can also tap into your network, storage, virtualization and even individual virtual machines

UCS Performance Manager

UCS Performance Manager was first released in October 2014. The product comes in two versions: full and Express. PM Express covers only servers, hypervisors and operating systems. The full version on top of that supports storage and network devices. The product is licensed on a per-UCS-server basis, so you don't pay for additional network/storage devices or hypervisors.

PM supports the vSphere hypervisor (plus Hyper-V), Cisco networking and EMC VNX / EMC VMAX / NetApp FAS storage arrays. From the list of supported products you may quickly guess that the full version of Performance Manager is targeted mainly at NetApp FlexPod, VCE Vblock and EMC VSPEX customers.

Product architecture

UCS Performance Manager can be downloaded and quickly deployed as a virtual appliance. You might be shocked when you start it up for the first time, as the appliance by default comes configured with 8 vCPUs and 40GB of RAM. If you're using it for demo purposes you can safely reduce that to something like 2-4 vCPUs and 8-12GB of RAM. You will experience some slowdowns during startup, but performance will be acceptable overall.

UCS PM is built on Zenoss monitoring software and is essentially a customized version of Zenoss Service Dynamics with Cisco UCS ZenPacks. You may notice references to Zenoss throughout the management GUI.


Two main components of the solution are the Control Center and the Performance Manager itself. Control Center is a container orchestration product, which runs Performance Manager as an application in Docker containers (many containers).


When deploying Performance Manager you start with one VM and can then scale to up to four VMs in total. Each of the VMs can run in one of two modes: master or agent. When you deploy the first VM you will have to select its role at first login. You have to have one master host, which also runs an agent. If you need to scale, you can deploy three additional agent VMs and build a ZooKeeper cluster. One master host can support up to 500 UCS servers when configured with 8 vCPUs and 64GB of RAM. Depending on your deployment size you may never need to scale beyond one Performance Manager VM.

Installation

After you've deployed the OVA you will need to log in to the VM's CLI, change the password, configure the host as a master, set up a static IP, DNS, time zone and hostname, and reboot.

Then you connect to Control Center, click the "+ Application" button in the Applications section and deploy UCS PM on port 4979. For the hostname use Control Center's hostname.


Once the UCS PM application is deployed, click the Start button next to the UCS PM line in the Applications section.


Performance Manager is accessible via a separate link, which is Control Center's hostname prefixed with "ucspm". So if your CC hostname is ucspm01.domain.local, the UCS PM link will be https://ucspm.ucspm01.domain.local:443. You can see it in the Virtual Host Names column. You will have to add an alias in DNS pointing from ucspm.ucspm01.domain.local to ucspm01.domain.local, otherwise you won't be able to connect to it.
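
If your DNS runs on Windows, one way to add that alias is with dnscmd. A sketch using the example hostnames above; substitute your own zone, server and host names:

> dnscmd . /RecordAdd domain.local ucspm.ucspm01 CNAME ucspm01.domain.local

On BIND or any other DNS server, an equivalent CNAME record does the job.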

When you finally open UCS PM you will see a wizard which will ask you to add the licenses, set an admin account and add your UCS chassis, VMware vCenters and UCS Central, if you happen to have one. In the full version you will have a chance to add storage and network devices as well.


UCS performance monitoring

Probably the easiest way to start working with Performance Manager is to jump from the dashboard to the Topology view. Topology view shows your UCS domain topology and provides an easy way to look at various components from one screen.


Click on a fabric interconnect and you can quickly see the uplink utilization. Click on the chassis and you will get summarized FEX port statistics. How about drilling down to a particular port-channel, service profile or vNIC? UCS Performance Manager can give you the most comprehensive information about every UCS component, with historical data for up to one year based on the default storage configuration.


Another great feature you may want to drill into straight away is Bandwidth Usage, which gives you an overview of bandwidth utilization across all UCS components and which you can look at from a server or network perspective. This lets you quickly identify things such as uneven workload distribution between the blades, or uneven traffic distribution between the fabric interconnect A and B sides or the SAN/LAN uplinks going to the upstream switches.


You can of course also generate various reports to determine your total capacity utilization. Or, if you're planning to add memory to your blades, for example, you can quickly find the number of available DIMM slots in the corresponding report.


VMware performance monitoring

UCS Performance Manager is not limited to monitoring Cisco UCS blade chassis, even in the Express version. You can add your hypervisors and also individual virtual machines. Once you add your vCenter to the list of monitored devices you get a comprehensive list of VMware components, such as hosts, VMs, datastores, pNICs and vNICs, with the associated performance graphs, configuration information, events, etc.

Performance Manager can correlate VMware components to UCS components. For a given VM, for example, it can show the FC uplink utilization on the corresponding fabric interconnects of the chassis where this VM is running.


If you want to go further, you can add individual VMs to Performance Manager, connected via WinRM/SSH or SNMP. Some cool additional functionality you get, which is not available in the VMware section, is the Dynamic View. Dynamic View lets you see VM connectivity from the ESXi host it's running on all the way through the blade, chassis, vNIC, VIC, backplane port, I/O module and fabric interconnect, which is very helpful for troubleshooting connectivity issues.


Conclusion

UCS Performance Manager is not the only product for performance monitoring in virtualized environments. There are many others; VMware vRealize Operations Manager is one of the most popular of its kind. But if you're a Cisco UCS customer, you can definitely benefit from the rich functionality this product offers for monitoring UCS blade chassis. And if you are a lucky owner of a NetApp FlexPod, VCE Vblock or EMC VSPEX, UCS Performance Manager is a must for you.


Exporting Performance Data from NetApp DataFabric Manager

May 30, 2013

Quick post on how to export custom data from DataFabric Manager Performance Advisor.

NetApp Management Console gives convenient access to the Performance Advisor data and graphs for comprehensive analysis of NetApp performance. However, NMC only shows graphs and doesn't give access to the exact numbers. There is a way to export them for further analysis from the dfm CLI:

> dfm perf data retrieve -o filer_name -C disk:disk_busy -b "2013-05-23 12:00:00" -e "2013-05-23 17:00:00" -s 3600 -x TimeIndexed > C:\Temp\dfm_export.txt

The default sample rate for the performance data is 15 minutes, meaning you will get 20 lines of data for a 5-hour period. You can specify the sample rate in seconds with the '-s' key. A particular performance counter is specified with the '-C' key. To list all the available counters run:

> dfm perf export counter list

Data is exported in a list format; if you want it to look more like a spreadsheet, use '-x TimeIndexed'. And that's all for now.

Unexpected Deduplication Impact on VMware I/O Latency

May 28, 2013

NetApp deduplication is a post-process operation. During normal operation Data ONTAP only calculates hashes for the incoming data blocks; the actual deduplication is carried out off-hours according to the configured schedule. Hash calculation doesn't affect performance in most cases; I talked about that in my previous post. NetApp states in its documentation that deduplication is a low-priority process:

When one deduplication process is running, there is 0% to 15% performance degradation on other applications.

Once I faced a situation where deduplication was configured to run during business hours on one of the volumes. No one noticed that at some point the volume ran out of space and Data ONTAP was no longer able to perform deduplication. The situation became worse when Data ONTAP was upgraded from version 7.3.2 to 8.1.0. Now, during deduplication, the filer tried to upgrade the fingerprint metadata to a new version at 15:00 every day with the message "Fingerprint is being upgraded" and failed. It seems that the metadata upgrade is a very resource-intensive process and heavily affects I/O latency.

This volume was not a VMware datastore, but it sat on the same aggregate as several VMFS LUNs. Here is what happened to the VMware I/O latency every day at 15:00.

[Graph: VMware datastore I/O latency, daily spike at 15:00]

I removed the host name and the datastore names from the graph. You can see the large latency spike, which won't send your VMs into a kernel panic, but it's not something you want your production environment to experience every day.

The solution was simple. After space was added to the volume, the deduplication metadata upgrade completed successfully and the problem went away. Additionally, deduplication was shifted to off-hours.

The simple lesson to learn: don't schedule deduplication during the day; you never know what could possibly go wrong.

Connecting VMware ESXi Hosts to NetApp: MPIO Configuration

May 23, 2013

Overview

NetApp filers are active/active ALUA arrays. This means that you can access LUNs configured on one controller via the second one. But access to the partner's LUNs goes through the internal interconnect and is always slower. That's why the paths to a controller through its partner are called "unoptimized". Their primary purpose is to provide backup paths in case of a failover.

Fixed path selection

VMware hosts by default use the "VMW_SATP_DEFAULT_AA" Storage Array Type Policy (SATP) and the "Fixed" Path Selection Policy (PSP) for active/active arrays. If an ESXi host is configured with these SATP and PSP, it will access each LUN via one particular path, even if you have two FC ports on each of the controllers.

A VMware host can't automatically identify the optimized path, so you can either set it manually or use the NetApp Virtual Storage Console (VSC) plug-in for VMware. Just go to the Monitoring and Host Configuration -> Overview section of VSC, right-click the ESXi host and click "Set Recommended Values". If you don't do that, ESXi hosts will run I/O traffic through a randomly chosen path, which could turn out to be unoptimized. That means you will push heaps of I/O through the partner node and experience higher latencies.
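
If you prefer to set the optimized path manually, it can be done from the ESXi CLI. A minimal sketch; the device and path identifiers are placeholders that you would take from the path list output:

> esxcli storage nmp path list -d naa.xxxxxxxxxxxxxxxx
> esxcli storage nmp psp fixed deviceconfig set -d naa.xxxxxxxxxxxxxxxx -p vmhba2:C0:T1:L0

The first command lists all paths to the LUN; the second pins the Fixed PSP to the path you identified as optimized.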

You can check if you're using non-optimized paths by looking for warnings like these on the NetApp side:

filer_01> Mon May 6 10:30:45 EST [filer_01: ems.engine.inputSuppress:error]: Event ‘scsitarget.partnerPath.misconfigured’ suppressed 327 times since Mon May 6 09:30:48 EST 2013.
Mon May 6 10:30:45 EST [filer_01: scsitarget.partnerPath.misconfigured:error]: FCP Partner Path Misconfigured – Host I/O access through a non-primary and non-optimal path was detected.

Or run “lun stats -o” and look for huge numbers under “Partner Ops” and “Partner KBytes”.

ALUA configuration

If you want to utilize both links to the controller in a round-robin fashion, you need to do some additional configuration. You should enable ALUA on the NetApp initiator group for your VMware ESXi hosts:

igroup set <group> alua yes
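
To confirm the change took effect, you can display the initiator group in verbose mode (the group name is a placeholder); the output should show ALUA enabled:

> igroup show -v <group>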

Now you need to reboot the ESXi host. After the reboot it will see that the storage is ALUA-capable and change the SATP to VMW_SATP_ALUA and the PSP to "Most Recently Used". To utilize load balancing between the two controller paths you need to change the PSP to "Round Robin". Again, you can do that either manually or via VSC.
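
To do it manually from the ESXi CLI, something like the following should work on ESXi 5 (a sketch; the device identifier is a placeholder, and the second command changes the default PSP for all devices claimed by the ALUA SATP):

> esxcli storage nmp device set -d naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
> esxcli storage nmp satp set --satp VMW_SATP_ALUA --default-psp VMW_PSP_RR

You can verify the result with "esxcli storage nmp device list".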

Note: Don't ever use ALUA and VMW_SATP_ALUA if you have a Windows Server 2003 MSCS or Windows Server 2008 Failover Cluster with shared RDM LUNs. It's an unsupported configuration and you can run into a cluster failure situation. It's described in many places.

In this case leave the SATP as "VMW_SATP_DEFAULT_AA", the PSP as "Fixed" and make sure that you use the optimized paths.

NetApp Deduplication in a Nutshell

May 12, 2013

NetApp uses the Write Anywhere File Layout (WAFL) filesystem, which is the key to NetApp's efficient snapshot technology. If you're already familiar with how snapshots are implemented in Data ONTAP, then understanding the underlying mechanics of deduplication is simple. The filer calculates a hash for each data block it receives and preserves it as metadata at the volume level. Then, according to the deduplication schedule (usually on weekends), the filer processes this metadata and, for each duplicate hash, changes the data pointer to the original block of data.

Since the filer needs to calculate a hash for each data block on the fly, this has a penalty. On systems with the CPU loaded by less than 50%, the performance impact is negligible. For heavily loaded systems, where the CPU is nearly 100% busy, the performance impact can be around 15%. For high-end 6000 systems the penalty can jump up to 35% for random writes. Heavy sequential read operations can also suffer from deduplication, because reads can be scattered randomly across physical storage, depending on where the original data block actually is. In general, deduplication has a low impact on system performance, but you can't use it blindly and should keep in mind that in particular cases it can slow down your storage system.

Deduplication configuration is pretty simple. First of all, you need to activate deduplication on a particular volume:

> sis on "targetvol"

If you already have data on the volume, you need to scan it; otherwise, it won't be deduplicated. Forgetting this is a common mistake. Deduplication is a low-priority task, but keep in mind that it can slightly impact your storage performance when done during business hours, especially if you run deduplication for several volumes simultaneously.

> sis start -s "targetvol"

To show the status of deduplication for a particular volume:

> sis status "targetvol"

To see the deduplication schedule:

> sis config "targetvol"
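
The same command also sets the schedule, which is how you shift deduplication to off-hours, as recommended in the earlier post. A sketch; the day@hour schedule string and the volume name are only examples:

> sis config -s sat@23 "targetvol"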

And the most pleasant command to find out how much data you’ve saved:

> df -s "targetvol"

If you want to undo deduplication, first switch it off and then undo it using the following commands:

> sis off "targetvol"
> priv set advanced
*> sis undo "targetvol"
*> priv set

I personally was able to achieve a 40% deduplication rate for VMware VMFS datastores, which is rather impressive, considering these were mixed OS + application data LUNs.

As a final note, I would like to point out that deduplication is suitable only for environments with a high percentage of similar data. VMware is a good example of that. You won't get any significant deduplication ratio for swap file volumes, Exchange mailboxes or Symantec Enterprise Vault data, which is already deduplicated.

Further reading:

TR-3958: NetApp Data Compression and Deduplication Deployment and Implementation Guide: Data ONTAP 8.1 Operating in 7-Mode

Storwize V7000 with vSphere 5 storage configuration

December 1, 2012

Information on how to configure Storwize for optimal performance is very scarce. I'll try to build some understanding of it from bits and pieces gathered throughout the Internet and redbooks.

Barry Whyte gave many insights into Storwize internals in his blog, particularly in his "Configuring IBM Storwize V7000 and SVC for Optimal Performance" series of posts. I'll quote him here. The main Storwize redbook, "Implementing the IBM Storwize V7000 V6.3", is mostly an administration guide and gives no useful information on the topic. I find "SAN Volume Controller Best Practices and Performance Guidelines" way more helpful (Storwize firmware is built on SVC code).

Total Number of MDisks

Here is what Barry says:

… At the heart of each V7000 controller canister is an Intel Jasper Forrest (Sandy Bridge) based quad core CPU. … When we added the tried and trusted (SSA) DS8000 RAID functionality in 2010 (6.1.0) we therefore assigned RAID processing on a per mdisk basis to a single core. That means you need at least 4 arrays per V7000 to get maximal CPU core performance. …

Number of MDisks per Storage Pool

SVC Redbook:

The capability to stripe across disk arrays is the single most important performance advantage of the SVC; however, striping across more arrays is not necessarily better. The objective here is to only add as many arrays to a single Storage Pool as required to meet the performance objectives.

If the Storage Pool is already meeting its performance objectives, we recommend that, in most cases, you add the new MDisks to new Storage Pools rather than add the new MDisks to existing Storage Pools.

Table 5-1 shows the recommended number of arrays per Storage Pool that is appropriate for general cases.

Controller type       Arrays per Storage Pool
DS4000/DS5000         4 - 24
DS6000/DS8000         4 - 12
IBM Storwize V7000    4 - 12

The development recommendations for Storwize V7000 are summarized below:

  • One MDisk group per storage subsystem
  • One MDisk group per RAID array type (RAID 5 versus RAID 10)
  • One MDisk and MDisk group per disk type (10K versus 15K RPM, or 146 GB versus 300 GB)

There are situations where multiple MDisk groups are desirable:

  • Workload isolation
  • Short-stroking a production MDisk group
  • Managing different workloads in different groups

We recommend that you have at least two MDisk groups, one for key applications, another for everything else.

Number of LUNs per Storage Pool

SVC Redbook:

We generally recommend that you configure LUNs to use the entire array, which is especially true for midrange storage subsystems where multiple LUNs configured to an array have shown to result in a significant performance degradation. The performance degradation is attributed mainly to smaller cache sizes and the inefficient use of available cache, defeating the subsystem’s ability to perform “full stride writes” for Redundant Array of Independent Disks 5 (RAID 5) arrays. Additionally, I/O queues for multiple LUNs directed at the same array can have a tendency to overdrive the array.

Table 5-2 provides our recommended guidelines for array provisioning on IBM storage subsystems.

Controller type                     LUNs per array
IBM System Storage DS4000/DS5000    1
IBM System Storage DS6000/DS8000    1 - 2
IBM Storwize V7000                  1

General considerations

Let's take a look at a vSphere use case on top of Storwize with 16 x 600GB SAS drives in the control enclosure and 10 x 2TB NL-SAS drives in the expansion enclosure (our particular case).

First of all we need to decide how many arrays we need. Do we have different workloads? No. All storage will be assigned to virtual machines, which in general have the same random read/write access pattern. Do we need to isolate workloads? Probably yes; it's generally a good idea to separate highly critical production VMs from everything else. Do we have different drive types? Yes. Obviously we don't want to mix drive types in one RAID array. Are we going to use different RAID types? Again, yes. RAID 10 is appropriate on SAS and RAID 5 on NL-SAS. So two MDisks, one RAID 10 on SAS and one RAID 5 on NL-SAS, would be enough. Storwize nodes have 4 cores each. It may seem that you would benefit from 4 MDisks, but in fact you won't. Here is what Barry says:

In the case where you only have 1 or 2 HDD arrays, then the core stuff doesn’t really come into play. Its only when you get to larger systems, where you are driving more I/O than a single RAID core can handle that you need to spread them.

This is also true if you are running all SSD arrays, so 24x SSD would be best split into 4 arrays to get maximum IOPs, whereas 24x HDD are not going to saturate a single core, so (if you could create a 23+P! [ you can’t 15+P is largest we support ] then it would perform as well as 2x 11+P etc

Now to storage pools. In our example we have two MDisks, so you simply make two storage pools. In the future, if you hit a performance limit, you create additional MDisks and then have two options. If each MDisk on its own can sustain your performance requirements, you make additional storage pools and redistribute the workload between them. If the load on storage is huge and even redistribution of VMs between the two arrays doesn't help, then you are better off combining the two MDisks of each type into their own storage pool to stripe between MDisks.

It's the same story for the number of LUNs. IBM recommends a one-to-one LUN to MDisk relationship. But read carefully: the recommendation comes from the fact that different workloads can clash and degrade array performance. If we have generally the same I/O pattern coming to the array, it's safe to create several LUNs on it, as long as latency stays in the acceptable range. Moreover, when it comes to vSphere and VMFS, it's beneficial to have at least two volumes in terms of manageability. With several LUNs you will at least have the ability to move VMs between LUNs for reconfiguration purposes. Also keep in mind that the ESXi 5 hypervisor limits each host to a storage queue depth of 32 per LUN. It means that if you have one big LUN and many VMs running on the host, you can quickly hit the queue limit. On the other hand, do not create too many LUNs or you will oversubscribe the storage processors (SPs).

Sample configuration

IBM recommends constructing both RAID 10 and RAID 5 arrays from 8 drives + 1 spare drive. But since we have 16 SAS and 10 NL-SAS drives, I would launch the CLI and create two arrays: one RAID 10 of 14 drives + 2 spares and one RAID 5 of 8 drives + 2 spares (or 9 drives + 1 spare, but it's not a good idea to create a RAID array with an uneven number of drives). Each RAID array goes into its own pool, with several LUNs in each pool. I would go for 2TB LUNs.
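
For reference, the CLI sequence could look roughly like the sketch below. Pool names, extent size, volume size and the drive IDs (which you would take from lsdrive output) are all examples, not a tested configuration:

> chdrive -use spare 14
> chdrive -use spare 15
> mkmdiskgrp -name Pool_SAS -ext 256
> mkarray -level raid10 -drive 0:1:2:3:4:5:6:7:8:9:10:11:12:13 Pool_SAS
> mkvdisk -mdiskgrp Pool_SAS -iogrp 0 -size 2 -unit tb -name vmware_sas_01

The same pattern repeats for the NL-SAS pool with -level raid5.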

IBM DS4700 copyback failed

August 27, 2012

If you have a global hot spare (GHS) drive, then when one of the active hard drives fails, your data is reconstructed onto the GHS. Then, when you replace the failed drive, the storage system automatically initiates a copyback, which moves the data from the GHS back to the replacement drive. Sometimes that doesn't happen and the replacement drive stays in an Unassigned state. If that is the case, go to DS Storage Manager, right-click the RAID array and select Replace Drives. There you should see the failed drive. Choose a replacement from the unassigned drives and click Replace Drive. Copyback will start immediately.

Take into consideration that copyback can take a long time, depending on the array size. If it is a production system and its performance is critical, right-click the logical drive and choose Change -> Modification Priority. There you can set how many resources are allocated to modification operations (such as copyback, reconstruction, etc.) versus performance. Change it to Low for maximum performance.

Jumbo Frames justified?

March 27, 2012

When it comes to VMware on NetApp, boosting performance by implementing Jumbo Frames is always taken into consideration. However, it's not clear whether it really has any significant impact on latency and throughput.

Officially, VMware doesn't support Jumbo Frames for NAS and iSCSI. It means that using Jumbo Frames to transfer storage traffic from the VMkernel interface to your storage system is a solution that is not tested by VMware; however, it actually works. To use Jumbo Frames you need to activate them throughout the whole communication path: the guest OS, the virtual NIC (change to Enhanced vmxnet from E1000), the virtual switch and VMkernel interface, the physical Ethernet switch and the storage. It's a lot of work and it's disruptive at some points, which is not a good idea for production infrastructure. So I decided to take a look at benchmarks before deciding to spend a great amount of time and effort on it.
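
On the ESXi and NetApp ends of the path the changes themselves are short. A sketch for ESXi 5 and Data ONTAP 7-mode; the vSwitch, VMkernel and NetApp interface names are examples, and the NetApp change also needs to go into /etc/rc to survive a reboot:

> esxcli network vswitch standard set -v vSwitch1 -m 9000
> esxcli network ip interface set -i vmk1 -m 9000
filer_01> ifconfig e0a mtusize 9000

The disruptive parts are the guest vNIC change and the physical switch ports, which is exactly why the benchmarks matter.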

VMware and NetApp have a technical report, TR-3808-0110, called "VMware vSphere and ESX 3.5 Multiprotocol Performance Comparison Using FC, iSCSI, and NFS". Section 2.2 clearly states that:

  • Using NFS with jumbo frames enabled using both Gigabit and 10GbE generated overall performance that was comparable to that observed using NFS without jumbo frames and required approximately 6% to 20% fewer ESX CPU resources compared to using NFS without jumbo frames, depending on the test configuration.
  • Using iSCSI with jumbo frames enabled using both Gigabit and 10GbE generated overall performance that was comparable to slightly lower than that observed using iSCSI without jumbo and required approximately 12% to 20% fewer ESX CPU resources compared to using iSCSI without jumbo frames depending on the test configuration.

Another important statement here is:

  • Due to the smaller request sizes used in the workloads, it was not expected that enabling jumbo frames would improve overall performance.

I believe that the 4K and 8K request sizes are fair for virtual infrastructure. Maybe if you move large amounts of data through your virtual machines it will make sense for you, but I feel it's not reasonable to implement Jumbo Frames for virtual infrastructure in general.

Another finding of the report is that Jumbo Frames decrease CPU load, but if you use TOE NICs, then again there is little point.

VMware supports jumbo frames with the following NICs: Intel (82546, 82571), Broadcom (5708, 5706, 5709), Netxen (NXB-10GXxR, NXB-10GCX4), and Neterion (Xframe, Xframe II, Xframe E). We use Broadcom NetXtreme II BCM5708 and Intel 82571EB NICs, so Jumbo Frames implementation is not going to be a problem. Maybe I'll try to test it myself when I have some free time.

Links I found useful: