Currently, I’m migrating the site to a new place. I hope everything will be back to normal at the beginning of next week.
A big part of resource management is sizing the virtual machines. Right-sizing virtual machines allows IT teams to optimize resource utilization, and it has become a tactical tool for enterprise IT teams to ensure maximum workload performance and efficient use of the physical infrastructure. Another big part of resource management is keeping track of resource utilization; some of these processes are part of the daily operational tasks performed by specialized monitoring teams or by the administrators themselves. Service providers usually cannot influence the right-sizing element, therefore they focus more on the monitoring part. What is almost universal across virtual infrastructure owners is the incidental nature of tracking down ‘noisy neighbor’ VMs. Noisy neighbor VMs generate workload in such a way that they monopolize resources and negatively impact the performance of other virtual machines. Service providers and enterprise IT teams have to deal with these consumer outliers in order to meet the SLAs of existing workloads and to satisfy the SLA requirements of new workloads.
It’s interesting that noisy neighbor tracking is an incidental activity, as it can be so detrimental to the performance of the virtual datacenter. Tools such as vSphere Storage IO Control (short-term focus) and vSphere Storage DRS (long-term focus) help alleviate the infrastructure from the burden of noisy neighbors, but attacking this problem structurally is necessary to ensure consistent and predictable performance from your infrastructure. In the long term, noisy neighbor VMs impact the projected consolidation ratio, which in turn influences the growth rate of the infrastructure. I’ve seen plenty of knee-jerk reactions, creating server and storage infrastructure sprawl due to the introduction of these outlier workloads.
Identifying noisy neighbors can become a valuable tool in both the strategic and tactical playbooks of the IT organization. Having insight into which VMs are monopolizing the resources allows IT teams to act appropriately. Similar to real life, the behavior of a noisy neighbor can often be changed, but sometimes that’s the nature of the beast and you just have to live with it. In that situation noisy neighbors become outliers of conduct and one has to make external adjustments. This insight allows IT teams to respond along the entire vertical axis of the virtual datacenter, from application to infrastructure choice. With the correct analysis, the IT team can provide insights to the application owner, allowing them to adjust accordingly. It helps the IT team understand whether the environment can handle the workload and to make the necessary adjustments to the infrastructure. Sometimes the intensity of the workload is just what it is, and hosting that workload is necessary to support the business. In that case the IT team has to understand whether the infrastructure is suitable to support the application. As most IT organizations have access to multiple platforms, accurate insight into the characteristics (and requirements) of the workload allows them to identify the correct platform.
Virtual datacenters are difficult to monitor. They are comprised of a disparate stack of components, and every component logs and presents data differently. Different granularity of information, different time frames, and different output formats make it extremely difficult to correlate data. In addition, you need to be able to correctly identify the workload characteristics and interpret the impact they have on the shared environment. We no longer live in a world where we deal with isolated technology stacks. Applications typically no longer run on a single box connected to a single, isolated RAID array. Today everything within the infrastructure is shared, and the level of hardware resource distribution is diluting with each introduction of new hardware. Where we used to run a single application in a VM on top of a server with ten other VMs, sharing a couple of NICs and HBAs, we slowly moved towards converged network platforms. In the last 10 years we shared more and more; the only monolith remaining is the application in the VM, and that is rapidly changing as well with the popularity of containers and microservices. Yet most of our testing mechanisms and monitoring efforts are still based on the architecture we left behind 10 years ago. Virtual datacenters require continuous analytics that fully comprehend the context of the environment, with the ability to zoom in and focus on outliers if necessary.
In the upcoming series I’m going to focus on how to explore cluster-level workloads and progressively zoom into specific workloads based on IOPS, block size, throughput, and unaligned IOs.
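To give a flavor of the kind of analysis the series will cover, here is a minimal sketch of flagging potential noisy-neighbor VMs from per-VM I/O statistics. The VM names and IOPS figures below are made up for illustration; in practice these numbers would come from your monitoring stack (vCenter performance stats, esxtop, vscsiStats, and so on), and the threshold factor is an assumption you would tune for your environment.

```python
# Sketch: flag VMs whose IOPS far exceed the cluster median.
# All data below is hypothetical, purely for illustration.
from statistics import median

# vm name -> average IOPS observed over the sampling window
vm_iops = {
    "vm-app01": 420,
    "vm-app02": 390,
    "vm-db01": 4100,   # monopolizes the shared datastore
    "vm-web01": 310,
    "vm-web02": 280,
}

def noisy_neighbors(samples, factor=4.0):
    """Return VMs whose IOPS exceed `factor` times the cluster median."""
    med = median(samples.values())
    return sorted(vm for vm, iops in samples.items() if iops > factor * med)

print(noisy_neighbors(vm_iops))  # ['vm-db01']
```

A median-based threshold is deliberately simple; the outlier itself barely moves the median, which is exactly why it works better here than a mean-based one.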
Just a short article: recently I discovered you can access Supermicro IPMI via SSH and power on the system by using the command:
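The exact session isn't reproduced here, but as a sketch: Supermicro IPMI modules typically expose a SMASH-CLP shell over SSH, where the power-management service can be started. The IP address and username below are placeholders; check your own IPMI configuration.

```shell
# SSH into the IPMI interface (address and credentials are placeholders)
ssh ADMIN@192.168.1.100

# Inside the SMASH-CLP shell, power the system on:
start /system1/pwrmgtsvc1

# The counterpart command powers it off:
# stop /system1/pwrmgtsvc1
```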
A nice short command that saves you a lot of time by eliminating the need to log in to the web UI and wait until the app responds.
Labs are a hot topic at the moment, and when meeting people at VMUGs or other tech conferences I get asked a lot about my lab configuration. I’m a big fan of labs and I think everybody who works in IT needs one, whether it’s at home or in a centralized location.
At PernixData we have two major labs, one on the East Coast and one on the West Coast of the U.S. of A. Both these labs are shared, which means you cannot do everything you like. However, sometimes you want to break stuff: you want to pull cables and disks, or kill an entire server or array, just to see what happens. For these reasons a lab that is 4000 miles away doesn’t really work, which was reason enough to build a small lab at home.
Nested or physical hardware?
To nest or not to nest, that’s not even the question. Nesting is amazing, and VMware itself spends a lot of energy and time on nested environments (think HOL). Recently the VMware Tools for Nested ESXi fling was released, and I assume more nested ESXi flings will follow after seeing the attention it received from the community.
But to run nested ESXi, you need physical hardware. Thanks to a generous donation I received six Dell R610s, and that covered my compute-level requirements. But sometimes you want to test the software only, and in those cases you do not need to fire up an incredibly loud semi-datacenter rig. For those situations I created an ESXi host that is near silent even when running at full speed. This ESXi server also hosts a nested ESXi environment and is just a white box with a simple ASUS motherboard, 24GB of RAM, and an Intel 1GbE NIC. Once this machine is due for renewal, a white box following the Baby Dragon design will replace it.
To test the software at enterprise level, you require multiple levels of bandwidth, sometimes the bare minimum and sometimes copious amounts of it. The R610 sports 4 x 1GbE connections, which allows me to test scenarios that can occur in a bandwidth-constrained environment. Usually the interesting cases happen when you have a lot of restrictions to deal with, and these 1GbE NICs are perfect for this. 10GbE connectivity is on my wish list, but to have a nice setup you still need to invest more than 1000 bucks to test it adequately.
A little bit over the top for my home lab, but the community came to the rescue and provided me with a solution: the InfiniBand hack. A special thanks goes out to Raphael Schitz and Erik Bussink for providing me with the software and the information to run my lab at 10Gbps and to provide incredibly low latencies to my virtual machines. With the InfiniBand setup I can test scenarios where bandwidth is not a restriction and investigate specific setups and configurations. For more info, listen to the vBrownBag TechTalk where Erik Bussink dives into the topic “InfiniBand in the Lab”.
The storage layer is provided by a number of storage virtual appliances, each backed by a collection of different SSDs and WD Caviar Black 750GB disks. Having multiple solutions allows me to test various scenarios, such as all-flash, hybrid, and all-magnetic-disk arrays. If I need to understand the specific dynamics of an array, I just log in to one of the two US-based labs.
My home office is designed to be an office and not a datacenter. So where do you place 19” rack servers without ruining the aesthetics of your minimalistic home office? ;) Well, you create a 19” rack on wheels so you can roll it out of sight and place it wherever you want. Introducing the portable IKEA LACK 19” datacenter rack.
Regular readers of my blog or Twitter followers know I’m a big fan of hacking IKEA furniture. I created a whiteboard desk that got the attention of multiple sites, and ikeahackers.net provided me with a lot of ideas on how to hack the famous LACK side table.
I bought two LACK tables, a couple of L-shaped brackets, 4 wheels, and nuts and bolts. The first LACK table provides the base platform. Only the tabletop is used; the legs were discarded but served as backups in case I made a mistake while drilling the holes in the legs of the second table.
I didn’t test the center of the tabletop, but the corners are solid and can be used to install wheels. I used heavy-duty ball-bearing wheels with an offset swivel caster design that permits ease of directional movement. Simple 5mm nuts and bolts hold the L-shaped brackets in place, but beware: the table legs are not made of solid wood. They are hollow! Only a few centimeters at the top of the leg are solid, to hold the screw that connects the tabletop and the leg. To avoid having the server pull the screw through the leg due to its weight, I used washers to keep the screws in place.
From a hardware perspective, 10GbE is still high on my wish list. Looking at the software layer, I want to create a more automated way of deploying and testing the PernixData FVP software. One of the things I’m looking into is incorporating Auto Deploy in the lab. But that’s another blog post.