Sometimes a small misconfiguration can cause havoc in a complex distributed system. It becomes really annoying when log files and status reports provide no proper output. While investigating time issues in my lab I ran into the following error message while executing the ntpq -p command:
TL;DR
The NTP client is disabled; enable it via the GUI.
The standard NTP query program (ntpq) is one of the quickest ways to verify that the Network Time Protocol daemon (ntpd) is up and running. The command ntpq -p prints a list of peers known to the ESXi host as well as a summary of their state. Running the command on another ESXi host provided the following output.
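If you want to check the peer list from a script instead of by eye, a minimal sketch could look like the one below. It assumes the usual ntpq -p column layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter) with a one-character tally code in front of each peer, where * marks the peer ntpd has selected. The sample output and the function names parse_peers and selected_peer are made up for illustration; they are not taken from my hosts.

```python
# Sketch: parse `ntpq -p` style output and find the selected sync peer.
# SAMPLE_OUTPUT is invented illustrative data, not captured from a real host.

SAMPLE_OUTPUT = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.example.org 192.0.2.10       2 u   45   64  377    1.234    0.567   0.089
+ntp2.example.org 192.0.2.11       2 u   12   64  377    2.345   -0.321   0.120
"""


def parse_peers(output):
    """Return one dict per peer row found in the given output."""
    peers = []
    for line in output.splitlines():
        # Skip blank lines, the column header and the ==== separator.
        if not line.strip():
            continue
        if line.lstrip().startswith("remote") or line.startswith("="):
            continue
        # The first character is the tally code, the rest are the columns.
        tally, fields = line[0], line[1:].split()
        if len(fields) < 10:
            continue
        peers.append({
            "tally": tally,
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": int(fields[6], 8),  # reach is printed in octal
            "offset_ms": float(fields[8]),
        })
    return peers


def selected_peer(peers):
    """Return the peer marked with '*', or None if nothing is selected."""
    for peer in peers:
        if peer["tally"] == "*":
            return peer
    return None


if __name__ == "__main__":
    peer = selected_peer(parse_peers(SAMPLE_OUTPUT))
    print(peer["remote"] if peer else "no peer selected")
```

An empty peer list, or a list without a * entry, means the host is not synchronizing with any peer and is worth a closer look.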
Requesting the ntpd status on the host with the weird time issues shows that it is not running. The command line provides no proper feedback other than that the service is starting; no failure code is returned.
Management service initialisation, such as ntpd starting, is logged in the file /var/log/syslog.log in ESXi 5.1 and up. Unfortunately, nothing useful is logged in this file either.
I couldn’t find a command that provides accurate output whether the NTP client was enabled or not. Time to open up the web client. Host time configuration can be found when selecting the ESXi host, Manage, Time Configuration. Apparently NTP was not enabled.
A simple problem to fix. Unfortunately, there is no simple command-line function that lets you verify whether the NTP client is enabled (sans PowerCLI).
No network connection after re-registering VCSA using the “I copied it” answer
Paulo Coelho once stated: “Life moves very fast. It rushes from Heaven to Hell in a matter of seconds.” Well, I think he perfectly described a day working in the lab and rushing through a migration. I’m upgrading the lab and I moved the vCenter Server Appliance (VCSA) to its new home. While trying to do a million things all at once, I didn’t pay attention to the question whether I moved the virtual machine or whether I copied it. I selected the option “I copied it”. And that’s when the fun started: vCenter down.
TL;DR:
Selecting “I copied it” implies that this machine is a duplicate and that a new identity should be generated. This means that the VM gets a new UUID and a new MAC address. SUSE Linux Enterprise Server 11 detects this new MAC address and treats it as a new Ethernet device. The VCSA does not allow the creation of a new Ethernet controller. Rename the 70-persistent-net.rules file and reboot to have SUSE auto-generate a new 70-persistent-net.rules file with the correct MAC address, which allows you to restore network connectivity via the console.
Troubleshooting the problem
Both the web client and the VCSA config web page are unreachable, so it is time to open up the VM console (Alt-F1). After logging in and pinging the gateway, the system returns the error message “Network is unreachable”.
Before tinkering with the configuration files, I like to restart the services and see if the status report exposes interesting information.
“No configuration found for eth1”. The VCSA is configured with a single NIC, and SUSE Linux Enterprise Server 11, which is the OS of the appliance, assigns the label eth0 to the first Ethernet adapter. VCSA networking is configured through the Virtual Appliance Management Interface (VAMI). Executing the command /opt/vmware/share/vami/vami_config_net allows you to retrieve the current network configuration.
When selecting option 6 “IP Address Allocation for eth1”, VAMI reveals that it cannot read the interface files for ‘eth1’.
The networking interface files are stored in the directory /etc/sysconfig/networking/devices. When listing the files (ls) only ifcfg-eth0 shows up. Reviewing the ifcfg-eth0 file with cat shows that the correct networking configuration is still applied to eth0.
It looks like the problem occurs due to the way SUSE handles devices. The following text is copied directly from the SUSE documentation:
When the Kernel detects a network card and creates a corresponding network interface, it assigns the device a name depending on the order of device discovery, or order of the loading of the Kernel modules. The default Kernel device names are only predictable in very simple or tightly controlled hardware environments. Systems which allow adding or removing hardware during runtime or support automatic configuration of devices cannot expect stable network device names assigned by the Kernel across reboots.
However, all system configuration tools rely on persistent interface names. This problem is solved by udev. The udev persistent net generator (/lib/udev/rules.d/75-persistent-net-generator.rules) generates a rule matching the hardware (using its hardware address by default) and assigns a persistently unique interface for the hardware. The udev database of network interfaces is stored in the file /etc/udev/rules.d/70-persistent-net.rules. Every line in the file describes one network interface and specifies its persistent name.
Source: https://www.suse.com/documentation/sled11/book_sle_admin/data/sec_basicnet_manconf.html
When the ESXi host assigns the VM a new MAC address, SUSE assigns a new unique interface to this MAC address and stores this in the file /etc/udev/rules.d/70-persistent-net.rules.
It shows two Ethernet adapters, eth1 is using the MAC address currently assigned to the VM.
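To make the mapping explicit, here is a small sketch that reads rules in the 70-persistent-net.rules style and reports which interface name each MAC address is bound to. The rule lines follow the usual udev layout (an ATTR{address} match and a NAME assignment), but the MAC addresses, the comment line and the function name mac_to_interface are invented for illustration.

```python
import re

# Invented sample in the style of /etc/udev/rules.d/70-persistent-net.rules.
SAMPLE_RULES = '''\
# PCI device (hypothetical comment line)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:56:aa:bb:01", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:56:aa:bb:02", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
'''

# Capture the hardware address and the persistent name from one rule line.
RULE_RE = re.compile(
    r'ATTR\{address\}=="(?P<mac>[0-9A-Fa-f:]{17})".*?NAME="(?P<name>[^"]+)"'
)


def mac_to_interface(rules_text):
    """Return a dict mapping lower-case MAC address to interface name."""
    mapping = {}
    for line in rules_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comment lines carry no rule
        match = RULE_RE.search(line)
        if match:
            mapping[match.group("mac").lower()] = match.group("name")
    return mapping


if __name__ == "__main__":
    current_mac = "00:50:56:aa:bb:02"  # hypothetical MAC of Network Adapter 1
    print(mac_to_interface(SAMPLE_RULES).get(current_mac, "not listed"))
```

If the MAC address currently assigned to the VM maps to anything other than eth0, the existing ifcfg-eth0 configuration will not be applied to it, which matches the behaviour described here.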
We are now entering a twilight zone: the only interface configured with an IP address is eth0 (ifcfg-eth0), while SUSE applies all rules to eth1, a device it created for the MAC address assigned to the only NIC attached to the VM (Network Adapter 1). Time to clean up. Luckily, udev rules are generated automatically during boot. To fix the MAC address assignment quickly, rename the file 70-persistent-net.rules.
After rebooting the VCSA, review the 70-persistent-net.rules file to verify that SUSE assigned the MAC address to eth0.
You can now safely customize the system (press F2 in the console) and configure the management network.
A reboot of the VCSA is necessary as it appears that a restart of the management services is not enough to restore all services. Funny how times change, nowadays you get really happy seeing a blue screen.
Monitoring power consumption of your home lab with a smart plug
Home labs are interesting beasts. On the one hand you would love to have all the compute, storage and network power available; on the other hand you do not want a power bill similar to that of a Google data center.
I have a decent setup, with 4 Xeon servers, two Cisco 1Gb switches, a 10Gb switch and 3 Synologys, but I don’t keep everything on all the time. One server acts as the management server, running a Windows DC, the vCenter appliance, the PernixMS server and some other stuff. These machines are always on, not only to save time when I want to use my lab but for increased stability as well. Due to this, my network gear and storage systems are also on. That made me wonder how much the need for availability and stability costs me on a yearly basis. The big Xeon rigs equipped with multiple PCIe devices are usually shut down after tests because I expect them to consume lots of power. Time to stop guessing and start monitoring. As always, Home Lab Sensei Erik Bussink pointed me to a simple solution: the Edimax SP-2101W Smart Plug Switch. Please leave a comment if you are using a different solution that is a better alternative to this device.
The device
Nothing much to add about the device itself; it is sleek enough that it will not eat up multiple power outlets.
The device is managed via an Apple or Android app. The following screenshots are taken from an Apple device; you can monitor it with either your iPhone or iPad. You can manage multiple smart plugs from one device. As I’ve spread my lab over two power groups, I’ve installed two smart plugs to monitor my home lab.
Unfortunately, the app doesn’t allow displaying two smart plugs simultaneously; you have to open each individually. The monitor page shows the real-time power consumption registered by the plug. It displays amps and watts. Quite cool to see what happens when you power on devices or even a virtual machine: this monitored server generates a spike of 30 watts when powering on a VM, but it quickly returns to a steady state. Fun to see that ESXi hosts do not consume a steady high level of power.
The Now button shows the real-time power consumption and the total power consumption registered for today, this week and this month. If you provide the price of energy, it also calculates the total cost. Unfortunately I haven’t found the option to change the currency sign, so you are stuck with the dollar sign.
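The arithmetic behind such a cost estimate is simple enough to sketch yourself if you want a yearly projection: energy in kWh is average watts divided by 1000, multiplied by the hours of use, and the cost is that energy multiplied by the price per kWh. The wattage and price below are made-up example values, not measurements from my lab, and yearly_cost is a hypothetical helper, not something the app exposes.

```python
def yearly_kwh(average_watts, hours_per_day=24.0, days=365):
    """Energy used per year in kWh for a given average draw in watts."""
    return average_watts / 1000.0 * hours_per_day * days


def yearly_cost(average_watts, price_per_kwh, hours_per_day=24.0, days=365):
    """Projected yearly cost, in whatever currency price_per_kwh is given."""
    return yearly_kwh(average_watts, hours_per_day, days) * price_per_kwh


if __name__ == "__main__":
    # Example values only: an always-on load of 100 W at 0.25 per kWh.
    watts, price = 100.0, 0.25
    print(f"{yearly_kwh(watts):.0f} kWh per year")
    print(f"{yearly_cost(watts, price):.2f} per year")
```

Plugging in the average wattage reported by each smart plug gives a rough idea of what leaving the always-on part of the lab running costs per year.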
Selecting the Usage button provides a chart of the power consumption of that day.
The app allows you to analyze power consumption trends of your home lab by providing an overview based on 24 hours of data, a week, a month and a full year.
Conclusion
The smart plugs are a great addition to my home lab. They provide me with insight into the consumption and, for me personally, they have removed the reluctance to leave my full lab on. The answer to the question whether you need a smart plug if you run a home lab is, in my opinion, a straight and simple no. You can estimate the cost, or you can just ignore it and pay the bill when it arrives. I’m just curious about these things and it helps to clear my conscience.
Tracking down noisy neighbors
A big part of resource management is sizing the virtual machines. Right-sizing allows IT teams to optimize the resource utilization of the virtual machines. It has become a tactical tool for enterprise IT teams to ensure maximum workload performance and efficient use of the physical infrastructure. Another big part of resource management is keeping track of resource utilization; some of these processes are part of the daily operational tasks performed by specialized monitoring teams or by the administrators themselves. Service providers usually cannot influence the right-sizing element, therefore they focus more on the monitoring part. What is almost universal across virtual infrastructure owners is the incidental nature of tracking down ‘noisy neighbor’ VMs. Noisy neighbor VMs generate workload in such a way that they monopolize resources and have a negative impact on the performance of other virtual machines. Service providers and enterprise IT teams have to deal with these consumer outliers in order to meet the SLAs of existing workloads and to be able to satisfy the SLA requirements of new workloads.
It’s interesting that noisy neighbor tracking is an incidental activity, as it can be so detrimental to the performance of the virtual datacenter. Tools such as vSphere Storage IO Control (short-term focus) and vSphere Storage DRS (long-term focus) help relieve the infrastructure of the burden of noisy neighbors, but attacking this problem structurally is necessary to ensure consistent and predictable performance from your infrastructure. In the long term, noisy neighbor VMs impact the projected consolidation ratio, which in turn influences the growth rate of the infrastructure. I’ve seen plenty of knee-jerk reactions that created server and storage infrastructure sprawl due to the introduction of these outlier workloads.
Identifying noisy neighbors can become a valuable tool in both the strategic and tactical playbooks of the IT organization. Having insight into which VMs are monopolizing the resources allows IT teams to act appropriately. Similar to real life, the behavior of a noisy neighbor can often be changed, but sometimes that’s the nature of the beast and you just have to live with it. In that situation noisy neighbors become outliers of conduct and one has to make external adjustments. This insight allows IT teams to respond along the entire vertical axis of the virtual datacenter, from application to infrastructure choice. With the correct analysis, the IT team can provide insights to the application owner, allowing them to adjust accordingly. It helps the IT team to understand whether the environment can handle the workload and to make adjustments to the infrastructure if necessary. Sometimes the intensity of the workload is just what it is and hosting that workload is necessary to support the business. In that case the IT team has to understand whether the infrastructure is suitable to support the application. As most IT organizations have access to multiple platforms, accurate insight into the characteristics (and requirements) of the workload allows them to identify the correct platform.
Virtual datacenters are difficult to monitor. They are comprised of a disparate stack of components. Every component logs and presents data differently. Different granularity of information, different time frames, and different output formats make it extremely difficult to correlate data. In addition, you need to be able to correctly identify the workload characteristics and interpret the impact they have on the shared environment. We no longer live in a world where we have to deal with isolated technology stacks. Applications typically no longer run on a single box connected to a single, isolated RAID array. Today everything within the infrastructure is shared, and the level of hardware resource distribution is diluting with each introduction of new hardware. Where we used to run a single application in a VM on top of a server with ten other VMs, sharing a couple of NICs and HBAs, we slowly moved towards converged network platforms. In the last 10 years we shared more and more; the only monolith remaining is the application in the VM, and that is rapidly changing as well with the popularity of containers and microservices. Yet most of our testing mechanisms and monitoring efforts are still based on the architecture we left behind 10 years ago. Virtual datacenters require continuous analytics that fully comprehend the context of the environment, with the ability to zoom in and focus on outliers if necessary.
In the upcoming series I’m going to focus on how to explore cluster-level workloads and progressively zoom in on specific workloads based on IOPS, block size, throughput and unaligned IOs.
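As a very rough illustration of what “monopolizing resources” could mean in numbers, the sketch below flags VMs whose share of the total IOPS on a shared datastore exceeds a threshold. This is only one simple heuristic that I am assuming for the example, not the analysis method of the series; the VM names, IOPS figures, the 50% threshold and the function noisy_neighbors are all hypothetical.

```python
def noisy_neighbors(iops_by_vm, share_threshold=0.5):
    """Return (vm, share) pairs for VMs at or above the given share of total IOPS.

    iops_by_vm maps a VM name to its average IOPS over some interval.
    The result is sorted from the largest share to the smallest.
    """
    total = sum(iops_by_vm.values())
    if total == 0:
        return []  # idle datastore, nothing to flag
    flagged = [
        (vm, iops / total)
        for vm, iops in iops_by_vm.items()
        if iops / total >= share_threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented sample: three VMs sharing one datastore.
    sample = {"vm-a": 200, "vm-b": 300, "vm-c": 4500}
    for vm, share in noisy_neighbors(sample):
        print(f"{vm} generates {share:.0%} of the IOPS")
```

A share-of-total view like this only finds the obvious outliers; block size, throughput and alignment need to be looked at as well before drawing conclusions about a workload.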
Managing your virtual datacenter and home lab with a Mac
The majority of virtual datacenters are managed from Windows systems. When I started with virtualization I also used a Windows system; however, when I joined VMware I received a MacBook and this was the beginning of the end. Soon every Windows device in my home was replaced with an Apple device. The problem was that I still needed to manage my home lab. To circumvent this, I created a Windows admin VM and installed all my trusted Windows apps, such as PuTTY, the vSphere client and WinSCP. Works great! Until you want to rebuild your lab or restructure the environment. It always felt like a burden, and on top of that I didn’t want to spend CPU cycles and memory of my home lab on an admin VM. Throughout the years I discovered tools for Mac OS that replaced their trusted Windows equivalents, and the new release of the HTML5 Web Client fling removed the dependency on the Client Integration Plugin (CIP). Here is the list of programs, tips and tricks I use on my Mac to manage my home lab.
Putty > iTerm2
PuTTY is an SSH and telnet client for the Windows Platform allowing you to have access to the command line of the ESXi server. For the Mac platform I recommend iTerm2. Although Mac OS has a native terminal application, iTerm2 has a couple of cool features that I absolutely love. It can run multiple sessions, each in its own tab.
With profiles you can configure the connection settings to your ESXi host, and with a simple shortcut key combination (for example, Control-Command-1) you open a tab to the ESXi host.
Download iTerm2 here.
Remote Desktop > Royal TSX
MS Remote Desktop is available for Mac OS, but the one remote desktop application you want to get is Royal TSX. The free version allows up to ten remote desktop connections, typically more than enough for the majority of home labs. I bought a licensed version as I’m using more than ten profiles and like to keep the workload part of the lab in a separate configuration document from the management part of the lab. One of the cool features is the tabbed layout, allowing you to switch between remote desktops quite easily.
The screens at home have a minimum resolution of 2560 x 1440. Royal TSX allows for any resolution, even native Retina resolution. I like to use smart zoom together with the resolution set by the virtual machine, which gives you a proper environment to work in without hitting the time-consuming scroll bars.
If security isn’t a big concern for you, you can specify the user and password for the connection at multiple levels. Either on the remote desktop connection profile itself or specify it on the ‘connection document’ for the entire environment. A nice time saver!
If you are the complete opposite and you need higher levels of security, such as Network Level Authentication (NLA), Royal TSX is the application to get. NLA is enabled by default and you can configure it to use Transport Layer Security (TLS) as well.
Download Royal TSX here.
WinSCP > CyberDuck
WinSCP and Veeam Backup Free Edition (previously Veeam FastSCP) are the most popular secure FTP applications that allow you to copy files directly onto the ESXi host. Unfortunately the once-announced port of WinSCP to Mac OS never came to fruition and therefore I looked for alternatives. There are plenty of SFTP clients; the one I use and like is Cyberduck.
It lets you create connection profiles called bookmarks, allowing you to connect to the correct folder directly. It also supports various encryption ciphers and authentication algorithms if you operate in a secure environment. Cyberduck is, like all the other listed tools, free, but it occasionally asks for a donation.
Download Cyberduck here.
VMware ESXi Embedded Host Client fling
The ESXi Embedded Host Client fling allows you to manage the ESXi host directly through a web client. It’s fast, it’s easy to install and it provides most of the functionality you need when you are building your lab before deploying the vCenter Appliance. One of the great assets of this tool is the integrated VM console. It’s directly accessible within the browser and does not require any additional plugins or installers, solving the annoying Client Integration Plugin problem most Mac users faced when connecting to vCenter via the web client.
The fling currently only supports ESXi 6.0; however, William published a workaround for ESXi 5.x, found here: http://www.virtuallyghetto.com/2015/08/new-html5-embedded-host-client-for-esxi.html
Download the VMware ESXi Embedded Host Client fling here.
vSphere HTML5 Web Client Fling v1.2 (h5client)
This fling got released this week and it allows you to connect to the vCenter server with an HTML5-based web client. Be aware that this client is designed for managing vCenter only! This release focuses on removing the dependency on the Client Integration Plugin, allowing administrators to connect to the VM console via the web client and do the basic stuff. Combine that with the normal web client and you can execute the majority of operations needed to set up and deploy your home lab or virtual datacenter.
The client is deployed as a VIB on one of the ESXi hosts. For detailed install instructions visit the VMware vSphere blog.
Download the HTML5 Web Client Fling v1.2 here.
Function keys
Not a tool, but sometimes you are required to press a function key, such as F11 when installing ESXi. No problem when installing physical boxes, but a challenge when installing a nested ESXi system using a remote (VM) console. In order to send the correct key, press FN-CMD-F11. This works for most function keys and other non-alphanumeric keys.
Please leave a comment if you want to share your favorite tool or handy tips and tricks to save time.