System Administration

Understanding VMware VMkernel Traffic Routing

As a basis for an upcoming post on splitting vmkernel traffic across layer 3 boundaries, I wanted to describe how vmkernel traffic is routed on an ESX host. There seems to be a lot of confusion in this area, and hopefully this will help to clear it up.

If you need a refresher on IP addresses, network masks, or subnets check out this Cisco article.

Directly Connected Networks

If a host is directly connected to a subnet it will use that interface to talk to devices in that subnet. For example if I have an interface with the IP 10.1.1.1 NETMASK 255.255.255.0, that interface will be used to talk to anything on the 10.1.1.0 network. This applies to every directly connected interface.

Suppose I have three vmkernel port groups defined with the following IP information:
vmk0: 10.1.0.1    255.255.255.0
vmk1: 10.1.1.1    255.255.255.0
vmk2: 10.1.2.1    255.255.255.0

Then vmk0 will be used to talk to everything on 10.1.0.0, vmk1 for 10.1.1.0, and vmk2 for 10.1.2.0.
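
The same selection can be sketched in a few lines of Python. This is only an illustration of the subnet check, not anything that runs on the host; the `VMK_PORTS` table and the `directly_connected_port` function are names I made up for the example, and the addresses are the ones from the list above.

```python
import ipaddress

# Hypothetical mirror of the three vmkernel ports listed above.
VMK_PORTS = {
    "vmk0": ipaddress.ip_interface("10.1.0.1/255.255.255.0"),
    "vmk1": ipaddress.ip_interface("10.1.1.1/255.255.255.0"),
    "vmk2": ipaddress.ip_interface("10.1.2.1/255.255.255.0"),
}


def directly_connected_port(destination):
    """Return the vmknic whose subnet contains destination, or None."""
    dest = ipaddress.ip_address(destination)
    for name, iface in VMK_PORTS.items():
        # iface.network is the subnet derived from the IP and netmask.
        if dest in iface.network:
            return name
    return None


print(directly_connected_port("10.1.1.50"))     # vmk1
print(directly_connected_port("192.168.5.10"))  # None (not directly connected)
```

A result of `None` is the case covered in the next section: the destination is on a remote network, so the routing table decides where the traffic goes.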

Remote Networks

So, what happens when the device I am talking to is on a subnet that I am not directly connected to? This is where the routing table really comes into play so let’s take a look at it using:

vicfg-route --list

VMkernel Routes:
Network             Netmask             Gateway
10.1.0.0            255.255.255.0       Local Subnet
10.1.1.0            255.255.255.0       Local Subnet
10.1.2.0            255.255.255.0       Local Subnet
default             0.0.0.0             10.1.0.254

We see the directly connected networks with a Gateway of Local Subnet. This describes the direct communication that we discussed in Directly Connected Networks.

The last line is a result of our configuration of the “VMkernel Default Gateway” when setting up the vmkernel port group. What it says is send everything else to the router at 10.1.0.254.

The router is in the 10.1.0.0 network, and since vmk0 is directly connected to that subnet, we know that it will be used for all non-local traffic.

A point of clarification

I have seen some confusing statements out there to the effect of “The vmkernel port group with the default gateway assigned will be used to send traffic.” As we have seen, this is not quite true.

All vmkernel ports use the same default gateway so there is no specific assignment per port group. The vmkernel port group that is directly connected to the specified gateway will be used. Unless specific routes are added that means the vmknic in the same subnet as the default gateway will be used for all routed vmkernel traffic.
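
To make that concrete, here is a rough Python sketch of the lookup using the routing table shown earlier. It assumes a plain longest-prefix match, which is how I understand the vmkernel to pick a route; the `ROUTES` and `LOCAL_PORTS` tables and the `lookup` function are hypothetical names for this example only.

```python
import ipaddress

# Routing table from the vicfg-route output: (network, gateway).
# A gateway of None stands for "Local Subnet".
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/24"), None),
    (ipaddress.ip_network("10.1.1.0/24"), None),
    (ipaddress.ip_network("10.1.2.0/24"), None),
    (ipaddress.ip_network("0.0.0.0/0"), "10.1.0.254"),  # default route
]

# Which vmknic sits on each local subnet (from the example addressing).
LOCAL_PORTS = {
    ipaddress.ip_network("10.1.0.0/24"): "vmk0",
    ipaddress.ip_network("10.1.1.0/24"): "vmk1",
    ipaddress.ip_network("10.1.2.0/24"): "vmk2",
}


def lookup(destination):
    """Return (vmknic, next_hop) for destination via longest-prefix match."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in ROUTES if dest in net]
    # The most specific (longest prefix) matching route wins.
    net, gateway = max(matches, key=lambda route: route[0].prefixlen)
    if gateway is None:
        # Local subnet: talk to the destination directly.
        return LOCAL_PORTS[net], str(dest)
    # Routed traffic leaves via the vmknic that can reach the gateway.
    vmknic, _ = lookup(gateway)
    return vmknic, gateway


print(lookup("10.1.2.50"))   # ('vmk2', '10.1.2.50')
print(lookup("172.16.9.9"))  # ('vmk0', '10.1.0.254')
```

Note that nothing in the table ties the default gateway to a port group. vmk0 is chosen for 172.16.9.9 only because 10.1.0.254 falls inside its subnet.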

The routing table can be customized using the vicfg-route command, but this should rarely be necessary. I will discuss one reason you might want to do that in my post on splitting vmkernel traffic when crossing layer 3 boundaries.

Side Note: Service Console vs. VMkernel
On the non-ESXi versions of vSphere, the service console and the vmkernel each have their own TCP/IP stack and therefore their own IP configuration, including routing tables. This means that any IP configuration of one has no effect on the other. The service console’s routing table can be viewed with the command “route” or “route -n”.

Unable to install ESX400-201002401 or ESX400-200912401

Update Manager 4.0 Update 1 was refusing to install either of these two patches. At first I was confused because I did not have Nexus deployed, but this was looking similar to the problem that Update 1 was supposed to fix.

I downloaded the bundle from VMware’s web site and tried to install the patch manually and got this error:

The following problems were encountered trying to resolve dependencies:
   cross_emulex-cim-provider_400.2.0.27.1-164009 provides 'emulex-cim-provider
   >= 400.2.0.27.1' (required by rpm_vmware-esx-cim_4.0.0-1.11.236512@i386), but
   is obsoleted by the host

The culprit is the HBAnywhere installation I loaded for HBA firmware management. I uninstalled HBAnywhere and the patches worked just fine.

$ sudo rpm -qa | grep elx
elxvmwarecorekit-esx40-4.0a44-1
$ sudo rpm -e elxvmwarecorekit-esx40-4.0a44-1

I originally had to follow this Emulex KB article to get the application installed.

I went to Emulex’s web site to find an HBAnywhere update package for ESX 4 U1, and now I cannot find a download for the original version, let alone an updated one. The firmware update manual still lists HBAnywhere as a utility for VMware though.

If anyone knows of an updated package or another way to do firmware updates online, I would appreciate it if you let me know in the comments.

UPDATE 5/6/2010
The current version of HBAnywhere resolves this issue. Here are the newest downloads for VMware.

VMware vCLI “persistent login”

Here is a short convenience script that will simulate a persistent login to a VMware host system when using the vCLI on Windows. Typically you have to specify a lot of parameters that include login information or a session file. With this method you just run the script and provide the hostname, username, and password for the connection.

After running this script you can run commands like “vicfg-mpath.pl --list” without additional parameters.

There is much that could be done to improve this script; this is just a quick and dirty version to make my life easier. If I improve it in the future I will post updates.

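#!/usr/bin/perl

use strict;
use warnings;

# Default vCLI install location on Windows; adjust if it is installed elsewhere.
my $vcli_install_dir = "C:\\Program Files\\VMware\\VMware vSphere CLI\\bin";
chdir($vcli_install_dir)
    or die "Could not change to the vCLI directory $vcli_install_dir: $!\n";

print "Hostname: ";
my $host_name = <STDIN>;
defined $host_name or die "No hostname provided.\n";
chomp($host_name);

# The vCLI commands pick up their connection settings from these variables.
$ENV{'VI_SERVER'} = $host_name;
my $session_file_name = $ENV{'TEMP'} . "\\vcli.session";
$ENV{'VI_SAVESESSIONFILE'} = $session_file_name;

# save_session.pl asks for the remaining login details and writes the session file.
system("..\\Perl\\apps\\session\\save_session.pl") == 0
    or die "save_session.pl failed; no session was saved.\n";
$ENV{'VI_SESSIONFILE'} = $session_file_name;

print "Spawning a logged in subshell.  Type exit to end the session.\n";
system("cmd.exe");

# Remove the session file.
unlink($session_file_name);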

VMware SRM – finding VMs in a recovery plan

I am evaluating VMware Site Recovery Manager (SRM) and there are some things that I need to do in testing that I cannot do in the base product. I am going to run some PowerShell to reconfigure the VMs and rename them. One of the first roadblocks is finding out which VMs are part of a particular recovery plan. Since the SRM API is limited, I went into the database and dug around. Here is what I came up with. Let me know how it works for you, especially if you have multiple protection groups; my configuration only has one.

Recommendations for better ways of doing this are appreciated.

USE [SRMDB01]
SELECT rp.plan_name, sv.shadowvmname AS shadowvm_name
FROM pdsr_shadowvm sv,
	(SELECT sg.mo_id AS groupmoid, CONVERT(VARCHAR(255), g.string_val) AS shadowvmmoid
		FROM pdsr_shadowgroup sg
		LEFT OUTER JOIN g_string_array g
		ON sg.vmmoids = g.seq_id) sg,
	(SELECT rp.name AS plan_name, CONVERT(VARCHAR(255), g.string_val) AS shadowgroupmoid
		FROM pdsr_recoveryprofile rp
		LEFT OUTER JOIN g_string_array g
		ON rp.shadowgroupmoids = g.seq_id) rp
WHERE sg.shadowvmmoid = sv.mo_id
	AND rp.shadowgroupmoid = sg.groupmoid
	AND rp.plan_name LIKE 'Recovery Plan 01'

Maximum vSwitches in vSphere

There appears to be a discrepancy in the VMware documentation regarding the maximum number of vSwitches. The Configuration Maximums document states that the limit is 248. The configuration guide lists the maximum as 127 which is what it was in 3.5. I am not sure if I misunderstand what they mean by “Standard switches per host 248”. If you see the error of my ways let me know.

Here is some code that I used to see how many vSwitches I could put on a host. This will create 126 vSwitches, each with 8 usable ports (I am assuming that vSwitch0 is already configured). Any attempt to add another one fails.

$vmhost = Get-VMhost <mytestHost>.local
1..126 | % {New-VirtualSwitch -VMHost $vmhost -Name vSwitch$_ -NumPorts 16}

Secure Credential Storage

I have added some code to my Scripts/Programs page for securely storing credentials to disk to be used by scripts at a later time.

I explored the VI Credential cmdlets and I was not 100% happy with them so I decided to implement my own version that uses the Microsoft DPAPI for encryption.  Make sure to look at the README as it covers usage and current limitations.

I wrote these for use in some automated VMware monitoring I am doing, but there is no reason they have to be used with the VI Toolkit.  There are no dependencies on the toolkit.

Please provide feedback if you have any issues with the scripts.

Using clusterssh to admin multiple service consoles

We have our service consoles set up to disallow root logins and use sudo (with password) for access. Every once in a while this causes us some pain. How do you update file /etc/filex or run some command across a cluster or a bunch of hosts when root privileges are needed?

There are multiple ways to approach this type of issue, but one I have not seen much of in the VMware blogosphere is clusterssh. It allows you to interactively type commands on a number of hosts at the same time. If something sticks out you can enter commands on any of the individual boxes.

Some Linux distributions include this in their repositories, so it can be pretty easy to give it a try (something like apt-get install clusterssh to install). It supposedly works on OS X too.

VI SDK – Beware cached data and assumptions

Applications often cache data, and that is good because it generally makes them perform better for users. However, if you assume that what you see is the absolute truth, you can run into trouble.

Connect-ViServer myVirtualCenter001
 
# Let's check the status of the server to make sure I can go home.
(Get-vmhost myVmHost001.local | Get-View).OverallStatus
green
 
#Looks good.  Well, maybe not.
(Get-vmhost myVmHost001.local |Get-View).runtime.ConnectionState
disconnected

I know for a fact that this server is not even running VMware at this point so how could I get a green status? Caching. I am not saying that VMware is doing anything wrong in this case, but be aware that your view of the data may not be the most recent.

Always check your assumptions when the data matters.