PLATFORM 9 - TRANSFORM YOUR VIRTUAL INFRASTRUCTURE INTO A PRIVATE CLOUD WITHIN SECONDS

Recently I had the joy of reconnecting with some of my old VMware colleagues and learning that their new startup was coming out of stealth. Today Platform 9 announced their SaaS platform. In short, Platform 9 allows IT organisations to transform their local IT infrastructure into a self-service private cloud. The beauty of this product is that it can be implemented on existing infrastructure. There is no need to build a new infrastructure to introduce a private cloud within your organisation: just install the agent on your hypervisor layer, connect to the Platform 9 cloud management platform and you are off into the world of private clouds. The ease of integration is amazing and I believe Platform 9 will be an accelerator of private cloud adoption. No need to go to AWS, no migration to Azure; you manage your own resources while allowing your customers to provision their own virtual machines or containers. Today Platform 9 supports KVM, but they will support both VMware and Docker environments soon.

I could dive into the details of Platform 9, but Eric Wright has done a tremendous job of publishing an extensive write-up and I recommend reading his article to learn more about the Platform 9 private cloud offering. If you want to meet the Platform 9 team and hear their vision, visit booth #324 at the Solutions Exchange of VMworld 2014.

LIFE IN THE DATA CENTER - A STORY OF LOVE, BETRAYAL AND VIRTUALIZATION

I’m excited to announce the first ever “collective novel”, in which members of the virtualization community collaborated to create a book with intrigue, mystery, romance, and a whole lot of geeky data center references. The concept of the project is that one person writes a section and then passes it along. The writers don’t know their fellow contributors; they get an unfinished story in their mailbox and are allowed to take the story in whatever direction it needs to go. The only limitation is the author’s imagination.

For me it was a fun and interesting project. Writing a chapter for a novel is a whole different ballgame than writing technically focused content. As I rarely read novels, it was a challenge to properly describe the situations the protagonist gets himself into. On top of that I needed to figure out how to extend and expand the storyline set by the previous authors while steering the story in a direction I preferred. And to make it more challenging, you do not know what the next author will write, therefore your intention for the direction of the storyline may be ignored. All in all a great experience and I hope we can do a second collective novel. I’m already collecting ideas ☺

I would like to thank Jeff Aaron, who came up with the idea and guided the project perfectly. Once again Jon Atterbury did a tremendous job on the formatting and artwork of the book. And of course I would like to thank the authors for taking time out of their busy schedules to contribute to the book.

The authors: Jeff Aaron (@jeffreysaaron), Josh Atwell (@Josh_Atwell), Kendrick Coleman (@KendrickColeman), Amy Lewis (@commsNinja), Lauren Malhoit (@malhoit), Bob Plankers (@plankers), Satyam Vaghani (@SatyamVaghani) and Chris Wahl (@ChrisWahl).

LET CLOUDPHYSICS HELP RID YOURSELF OF HEARTBLEED

Unfortunately the OpenSSL Heartbleed bug (CVE-2014-0160) is present in the ESXi and vCenter 5.5 builds. VMware responded by incorporating a patch that resolves the vulnerability in the OpenSSL 1.0.1 library. For more info about the ESXi 5.5 patch read KB 2076665; VMware issued two releases for vCenter 5.5, see KB 2076692. Unfortunately some NFS environments experienced connection loss after applying the ESXi 5.5 patch, and VMware responded by releasing the patch described in KB 2077360 and, more recently, vCenter 5.5 Update 1b. The coverage of the NFS problems and the number of ESXi and vCenter update releases may have left organizations in the dark about whether they actually patched the Heartbleed vulnerability.

CloudPhysics released a free Heartbleed analytic card in their card store that helps identify which hosts in your environment are unprotected. Check out the recent article by CloudPhysics CTO Irfan Ahmad about their recently released Heartbleed analytic package. I recommend running the card and ridding yourself of this nasty bug.
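If you prefer to double-check from the vSphere API yourself, a quick inventory of host build numbers is enough to spot hosts that still run a pre-patch build. Below is a minimal pyVmomi sketch, not the CloudPhysics card itself; the vCenter address, credentials and the PATCHED_BUILD value are placeholders you need to replace with your own details and the exact post-patch build listed in KB 2076665.

```python
# Minimal sketch (not the CloudPhysics card): list ESXi hosts and their build numbers
# so they can be compared against the patched build documented in KB 2076665.
# vCenter address, credentials and PATCHED_BUILD are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

PATCHED_BUILD = 1746018  # example value; verify the exact post-patch build in KB 2076665

context = ssl._create_unverified_context()  # lab use only: skips certificate validation
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=context)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        build = int(host.config.product.build)
        status = "ok" if build >= PATCHED_BUILD else "check: possibly unpatched"
        print("%s: %s (build %d) - %s" % (host.name, host.config.product.fullName, build, status))
    view.Destroy()
finally:
    Disconnect(si)
```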

HOMELAB - POWER-ON YOUR SUPERMICRO SYSTEM BY SSH'ING INTO IPMI

Just a short article: recently I discovered you can access the Supermicro IPMI interface via SSH and power on the system by using the command: start /system1/pwrmgtsvc1 A nice short command that saves you a lot of time by eliminating the need to log in to the web UI and wait until the app responds.
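For completeness, here is a small Python sketch that sends the command over SSH with paramiko. This is my own illustration, not an official Supermicro tool; the IP address and credentials are placeholders, and some BMC firmware versions only offer an interactive SMASH shell rather than one-shot command execution.

```python
# Hedged sketch: power on a Supermicro system by sending the SMASH CLP command
# over SSH to the IPMI interface. IP address, username and password are placeholders.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # lab use: accept unknown host keys
client.connect("192.168.1.30", username="ADMIN", password="ADMIN")
try:
    # Some BMC firmware only provides an interactive shell; if exec_command fails,
    # use client.invoke_shell() and send the command followed by a newline instead.
    stdin, stdout, stderr = client.exec_command("start /system1/pwrmgtsvc1")
    print(stdout.read().decode())
finally:
    client.close()
```

From a regular shell the equivalent one-liner is simply: ssh ADMIN@<ipmi-ip> "start /system1/pwrmgtsvc1"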

WHICH HA ADMISSION CONTROL POLICY DO YOU USE?

Yesterday Duncan and I were discussing the 5.5 update of the vSphere Clustering Deepdive book and we were debating which HA admission control policy is the most popular. Last week I asked around on Twitter, but hopefully a short poll will give us better insights. Please cast your vote. [socialpoll id=“2195435”]

GOTCHA - DISABLE RESERVE ALL GUEST MEMORY SETTING DOES NOT REMOVE THE RESERVATION

A while ago I wrote about the nice feature “Reserve all guest memory” available in vSphere 5.1 and 5.5. The feature automatically adjusts the memory reservation when the memory configuration changes: increase the memory size and the memory reservation is automatically increased as well; reduce the memory size of the virtual machine and the reservation is immediately reduced.

This week I received an email from someone who used the setting temporarily, and when disabling it he was surprised that the reservation was not set to 0, reverting back to the default.

(Screenshot 1: expected behavior. Screenshot 2: actual product behavior.)

Although I understand his point of view, the reality is that when you enabled the feature your intent was to apply a memory reservation to the virtual machine. The primary function of this setting is to take away the responsibility of adjusting the reservation when you change the memory configuration. If your goal is to remove the memory reservation, disable the setting “Reserve all guest memory” and then change the memory reservation to 0.
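For those who prefer to script the cleanup, the sketch below shows one way to do both steps in a single reconfigure call with pyVmomi; in the vSphere API the checkbox corresponds to the memoryReservationLockedToMax property of the VM config. This is a hedged example, not an official procedure; the vCenter address, credentials and VM name are placeholders.

```python
# Hedged sketch: disable "Reserve all guest memory" and zero out the reservation in
# one ReconfigVM_Task call. Connection details and the VM name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab use only: skips certificate validation
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=context)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "test-vm-01")  # placeholder VM name

    spec = vim.vm.ConfigSpec()
    spec.memoryReservationLockedToMax = False                          # untick "Reserve all guest memory"
    spec.memoryAllocation = vim.ResourceAllocationInfo(reservation=0)  # explicitly remove the reservation
    vm.ReconfigVM_Task(spec=spec)  # waiting for task completion omitted for brevity
    view.Destroy()
finally:
    Disconnect(si)
```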

VSPHERE 5.5 HOME LAB

For a while I’ve been using three Dell R610 servers in my home lab. The machine specs are quite decent: each server is equipped with two Intel Xeon 5530 CPUs, 48GB of memory and four 1GbE NICs. With a total of 24 cores (48 HT threads) and 144GB of memory the cluster has more than enough compute power. However, from a bandwidth perspective they are quite limited; 3Gbit/s SATA and 1GbE network bandwidth is not really pushing the envelope. These limitations do not allow me to properly understand what a customer can expect when running FVP software. In addition I don’t have proper cooling to keep the machines cool, and their power consumption is troubling. Time for something new, but where to begin?

CPU
Looking at the current lineup of CPUs doesn’t make it easier. Within the same vendor’s product line multiple CPU socket types exist, and multiple processor series offer comparable performance levels. I think I spent most of my time figuring out which processor to select. Some selection criteria were quite straightforward: I want a single-CPU system with at least 6 cores and Hyper-Threading technology, and the CPU must have a high clock speed, preferably above 3GHz. Intel ARK (Automated Relational Knowledgebase) provided me the answer. Two candidates stood out: the Intel Core i7-4930 and the Intel Xeon E5-1650 v2. Both 6-core, both HT-enabled, both supporting advanced technologies such as VT-x, VT-d and EPT. http://ark.intel.com/compare/77780,75780 The difference between the two CPUs that matters most to me is the larger amount of memory supported by the Intel Xeon E5. However, the i7-4930 supports 64GB, which should be enough for a long time. In the end the motherboard provided me the answer.

Motherboard
Contrary to the variety of choices at the CPU level, there is currently one motherboard that stands out for me, and it looks almost too good to be true: the Supermicro X9SRH-7TF. This board has it all, for an unbelievable price. The most remarkable features are the on-board Intel X540 dual-port 10GbE NIC and the LSI 2308 SAS controller. Eight DIMM slots, the Intel C602J chipset and a dedicated IPMI LAN port complete the story. And the best part is that its price is similar to that of a PCIe version of the Intel X540 dual-port 10GbE NIC alone. The motherboard only supports Intel E5 Xeons, therefore the CPU selection is narrowed down to one choice: the Intel Xeon E5-1650 v2.

CPU cooler
The Supermicro X9SRH-7TF contains an Intel LGA2011 socket with Narrow ILM (Independent Loading Mechanism) mounting. This requires a cooler designed to fit this narrow socket. The goal is to create silent machines, and the listed maximum acoustical noise of 17.6 dB(A) of the Noctua NH-U9DX i4 “sounds” promising.

Memory
The server will be equipped with 64GB: four 16GB DDR3-1600 modules, allowing for a future memory upgrade. The full product name: Kingston ValueRAM KVR16R11D4/16HA modules.

Network
Although two 10GbE NICs provide more than enough bandwidth, I need to test scenarios where 1GbE is used. Unfortunately vSphere 5.5 does not support the 82571 chipset used by the Intel PRO/1000 PT dual-port server adapter currently inserted in my Dell servers. I need to find an alternative 1GbE NIC; recommendations are welcome.

Power supply
I prefer a power supply that is low noise and fully modular, therefore I selected the Corsair RM550.
Besides a noise-reducing fan the PSU has a Zero RPM Fan Mode, which does not spin the fan until the unit is under heavy load, reducing the overall noise level of my lab when I’m not stressing the environment.

Case
The case of choice is the Fractal Design Define R4: a simple but elegant design, with enough space inside and some sound-reducing features. Instead of the standard black color, I decided to order the titanium grey.

SSD
Due to the PernixDrive program I have access to many different SSD devices. Currently my lab contains Intel DC S3700 100GB and Kingston SSDNow E100 200GB drives. Fusion-io, currently not (yet) in the PernixDrive program, was so kind to lend me a 3.2TB ioDrive; unfortunately I need to return it someday.

Overview
CPU: Intel Xeon E5-1650 v2
Motherboard: Supermicro X9SRH-7TF
CPU cooler: Noctua NH-U9DX i4
Memory: 4 x Kingston ValueRAM KVR16R11D4/16HA (64GB total)
Power supply: Corsair RM550
Case: Fractal Design Define R4

HELP MY DRS CLUSTER IS NOT LOAD BALANCING!

Unfortunately I still see this cry for help appearing on the VMTN forums and on Twitter, usually accompanied by screenshots like this:

This screen doesn’t really show you whether your DRS cluster is balanced or not. It just shows whether the virtual machines receive the resources they are entitled to. The reason I don’t use the word demand is that DRS calculates priority based on virtual machine and resource pool resource settings and resource availability. To understand if a virtual machine received the resources it requires, hover over the bar and find the virtual machine. A new window is displayed with the metric “Entitled Resources Delivered”.

DRS attempts to provide the resources requested by the virtual machine. If the current host is not able to provide those resources, DRS moves the virtual machine to another host that can. If the virtual machine is receiving the resources it requires, there is no need to move it to another host. Moves by DRS consume resources as well, and you don’t want to waste resources on unnecessary migrations. To avoid wasting resources, DRS calculates two metrics: the current host load standard deviation and the target host load standard deviation. These metrics indicate how far the current load of each host is removed from the ideal load. The migration threshold determines how far these two metrics can lie apart before the distribution of virtual machines needs to be reviewed. The web client contains this cool water-level image that indicates the overall cluster balance. It can be found on the cluster summary page and should be used as the default indicator of the cluster resource status.

One of the main arguments is that a host contains more than CPU and memory resources alone. Multiple virtual machines located on one host can stress or saturate the network and storage paths extensively, whereas a better distribution of virtual machines across the hosts would also result in a better distribution of load at the storage and network path layer. This is a very valid argument; however, DRS is designed to take care of CPU and memory resource distribution and is therefore unable to take these other resource consumption constraints into account. In reality DRS takes a lot of metrics into account during its load-balancing task. For more in-depth information I recommend reading the articles “DRS and memory balancing in non-overcommitted clusters” and “Disabling mingoodness and costbenefit”.
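To make the standard deviation idea a bit more concrete, here is a simplified illustration, my own sketch rather than the actual DRS algorithm or its real formulas: express each host’s load as the sum of the entitlements of its virtual machines divided by host capacity, then look at how far the hosts deviate from the mean. The host names and numbers are made up.

```python
# Simplified illustration (not the actual DRS algorithm): express each host's load as
# the sum of VM entitlements divided by host capacity and measure how far the hosts
# deviate from the mean. All names and numbers below are made up.
from statistics import mean, pstdev

hosts = {
    "esx01": {"entitled_mhz": 18000, "capacity_mhz": 24000},
    "esx02": {"entitled_mhz": 12000, "capacity_mhz": 24000},
    "esx03": {"entitled_mhz": 21000, "capacity_mhz": 24000},
}

loads = [h["entitled_mhz"] / h["capacity_mhz"] for h in hosts.values()]
print("normalized host loads:", [round(l, 2) for l in loads])
print("mean load:", round(mean(loads), 3))
print("current host load standard deviation:", round(pstdev(loads), 3))

# DRS compares a metric like this against a target value derived from the migration
# threshold; only when the imbalance exceeds the target (and a move passes the
# cost/benefit analysis) does DRS recommend migrations.
```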

CONSUMER GRADE SSD VERSUS ENTERPRISE GRADE SSD, WHICH ONE TO PICK?

Should I use consumer grade SSD drives or should I use enterprise grade SSD drives? This is a very popular question and I receive it almost on a daily basis. Lab or production environment, my answer is always the same: enterprise grade without a doubt! Why? Enterprise grade drives have a higher endurance level, they contain power loss data protection features and they consistently provide a high level of performance. All of this aligns with a strategy of ensuring reliable and consistent performance. Let’s expand on these three key features.

Endurance
Recently a lot of information has been released about the endurance levels of consumer grade SSDs, and tests show that they operate well beyond their claimed endurance levels. Exciting news, as it shows how much progress has been made during the last few years. But be aware that vendors test their consumer grade SSDs with client workloads, while enterprise grade SSDs are tested with worst-case data center workloads. The interesting question is whether the SSD vendor rates a drive’s DWPD (drive writes per day) in a conservative manner or an aggressive manner. As I don’t want to gamble with customers’ data, I’m not planning to find out the hard way whether a consumer SSD can sustain high levels of continuous data center load. I believe vSphere architectures have high endurance requirements; therefore use enterprise drives, as they are specifically designed and tested for this use.

Power loss data protection features
Not often highlighted, but most enterprise SSDs contain power loss data protection features. These SSDs typically contain a small buffer or cache in which data is stored before it’s written to flash. Enterprise SSDs leverage various on-board capacitance solutions to provide enough energy for the SSD to move the data from the cache to the drive itself, protecting both the drive and the data. It protects the drive because a partially written sector becomes unreadable, which can lead to performance problems, as the drive will perform time-consuming error recovery on that sector. Select enterprise drives with power loss data protection features; it avoids erratic performance levels or even drive failure after a power loss.

Consistent performance
Last but certainly not least is the fact that enterprise SSDs are designed to provide a consistent level of performance. SSD vendors expect their enterprise disks to be used intensively for an extended period of time. This means the possibility of a full disk increases dramatically compared to a consumer grade SSD. As data can only be written to a cell that is in an erased state, high levels of write amplification are expected. Please read this article to learn more about write amplification (write amp). Write amp impacts the ratio of drive writes to host writes: when write amp occurs, the number of writes a drive needs to make in order to execute the host writes increases considerably. One way to reduce this strain is to “over-provision” the drive. Vendors such as Intel allocate a large amount of spare flash to allow the drive to absorb these write amp operations. This results in a more consistent and predictable rate of IOPS.
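To get a feel for what these ratings mean, here is a small back-of-the-envelope sketch; the capacity, DWPD rating, warranty period and write amplification factor are made-up example numbers, not the specs of any of the drives mentioned in this post.

```python
# Back-of-the-envelope sketch: what a DWPD rating translates to in total writes, and
# how write amplification multiplies the work the flash itself has to absorb.
# All numbers are made-up examples, not specs of any particular drive.

capacity_gb = 200          # drive capacity
dwpd = 10                  # rated drive writes per day (enterprise drives are typically
                           # rated much higher than consumer drives)
warranty_years = 5
write_amplification = 3.0  # NAND writes per host write once the drive fills up and
                           # garbage collection kicks in

# Host writes the rating allows over the warranty period, in TB
rated_host_writes_tb = capacity_gb * dwpd * 365 * warranty_years / 1000
print("rated host writes over the warranty period: %.0f TB" % rated_host_writes_tb)

# With write amplification the flash performs considerably more writes than the host issues
nand_writes_tb = rated_host_writes_tb * write_amplification
print("writes the NAND actually absorbs:           %.0f TB" % nand_writes_tb)

# Over-provisioning (spare flash) lowers the write amplification factor, which is one
# reason enterprise drives sustain a steadier rate of IOPS as they fill up.
```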
Impact on IOPS and Latency
I’ve done some testing in my lab, using two enterprise flash drives, an Intel DC S3700 and a Kingston E100, and two different consumer grade flash devices. I refrain from listing the type and vendor name of those disks. I ran the first test from 11:30 to 11:50 on an enterprise grade SSD; the rate of IOPS was consistent and predictable. The VM was then migrated to the host with the consumer grade SSD and the same test was run again; at no moment did the disk provide a steady rate of I/Os. Anandtech.com performed similar tests and witnessed similar behaviour; they published their results in the article “Exploring the Relationship Between Spare Area and Performance Consistency in Modern SSDs”. An excellent read, highly recommended.

(Picture courtesy of Anandtech.com)

Click on the different drive sizes to view their default performance and the impact of spare flash resources on the ability to provide consistent performance. The next step was to determine latency behaviour. Both enterprise grade SSDs provided extremely predictable latency. To create an even playing field I ran read tests instead of write-centric tests. The first graph shows a read test on the Kingston E100: latency was consistent, providing predictable and consistent application response times. The consumer grade drive performance charts were not as pretty. The virtual machine running the read test was the only workload hitting the drive, and yet the drive had trouble providing steady response times. Please note that the tests were run multiple times and the graphs shown are the most positive ones for the consumer grade drives. Multiple (enterprise-level) controllers were used to avoid any impact from that layer.

As more and more SSD drives hit the market, we decided to help determine which drives fit in a strategy of ensuring reliable and consistent performance. Therefore PernixData started the PernixDrive initiative, in which we test and approve flash devices.

Conclusion
Providing consistent performance is key for predictable application behaviour. This applies to many levels of operation. First of all it benefits day-to-day customer satisfaction and helps you reduce time spent troubleshooting application performance. Power loss data protection features help you cope with short-term service loss and avoid continuous performance loss, as the drive can survive power-loss situations. Reverting applications to a non-accelerated state due to the complete loss of an SSD drive can result in customer dissatisfaction or a violated SLA. Higher drive-writes-per-day ratings help you ensure high levels of consistent performance for the longer term. In short, use the correct tool for the job and go for enterprise SSD drives.

WHO TO VOTE FOR?

This week Eric Siebert opened up the 2014 edition of the top virtualization blog contest. For the industry this is one of the highlights of the year, and I applaud the effort Eric and his team of volunteers put in to make this work. I cannot wait to watch the show in which they unveil this year’s top 25 winners. A big thank you to Eric and the team!

Most of the time you will see blog articles that highlight this year’s efforts, and I think they are great. As there are so many great bloggers writing and sharing their thoughts and ideas, it’s very easy to miss out on some brilliant posts. A quick scan of these posts helps to (re)discover the wealth of information that is out there. Last year I was voted number 2; however, this year the frequency (hopefully not the quality) of my blog articles went down. This was due to my career change and the new responsibilities my job role encompasses. Plus creating the vSphere design book took a lot of time and effort. For this year’s VMworld we have planned something even better, so please stay tuned for this year’s VMworld book!

But this post is not about me as a blogger and my material. It is meant to highlight some of the bloggers who help the community understand the products better and comprehend the behavior of the complex systems we work with every day, and who provide these insights by spending a lot of their (spare) time writing and creating great articles. By voting for them you will help them understand that their time and effort is well spent!

First of all, guys like Duncan Epping, Cormac Hogan, William Lam and Eric Sloof relentlessly churn out great collateral, whether it is a written article, podcast or video. It keeps the community well fed when it comes to quality information. Writing a great article is a challenge; doing this on a continuous basis is even more impressive! But I would also like to highlight some of the guys that are considered the “new” guys. They are all industry veterans, but they decided to pick up blogging recently. There are many more of course, but these stand out to me.

Pete Koehler - vmpete.com
Pete writes a lot about PernixData, but that’s not the reason I want to highlight him. His articles are quite in-depth and I love reading them, as I learn something every time Pete decides to post his most recent insights. For example, in the article “Observations of PernixData in a Production environment” he covers the IOPS, throughput and latency relationship in great detail. In this exercise he discovers that applications do not use a static block size, something you don’t read that often. He correlates specific output and explains how each metric interacts with the others, educating you along the way and helping you do a better and more effective job in your own environment.

Josh Odgers - joshodgers.com
Josh is listed both on the general blogging list and as a newcomer, and I think he deserves to be “rookie of the year”. Josh’s insights are very valuable and it’s always a joy to read his articles. His VCDX articles are top notch and a must-read for every aspiring VCDX candidate. Just too bad he decided to join Nutanix ;).

Luca Dell’Oca - virtualtothecore.com
Dropping knowledge in both English and Italian, Luca covers new technologies as well as insightful tips and tricks on a frequent basis, ranging from reclaiming space on a Windows 2012 installation to a complete write-up on how to create a valuable I/O test virtual machine. A blog that should be visited regularly.
Willem ter Harmsel - willemterharmsel.nl
Not your average virtualization blog: Willem covers the startup world by interviewing CEOs and CTOs of the hottest and newest startups this world currently has to offer. Willem provides insight into upcoming technology and allows his readers to place and compare different technologies. A welcome change of pace after spending a day knee-deep in the bits and bytes.

Do you consume these stories and articles on a daily basis, and are they helpful in your daily work? Please show your appreciation and vote today for your favorite blogs! Thanks! Please vote now!