VIRTUALLY SPEAKING PODCAST: HOST DEEP DIVE

Last Friday I had the honor of joining Pete Flecha a.k.a. Pedro Arrow and John Nicholson on their always fantastic podcast Virtually Speaking. Niels and I talked with them about what it takes to write a book such as the VMware vSphere 6.5 Host Resources Deep Dive. Thanks John and Pete for having me on again. Check it out: https://soundcloud.com/virtuallyspeakingpodcast/episode-49-host-resources-deep-dive

EXPLORING THE CORE MOTIVATION OF WRITING A BOOK

More than a week ago, Niels and I released the VMware vSphere 6.5 Host Resources Deep Dive, and the community has welcomed it with open arms. The book is finding its way across the globe, from Argentina to New Zealand. Seeing the massive number of tweets praising the book brings us pride and joy. Over the last couple of days, I have received many inquiries about what it takes to write a book and whether I could provide some hints and tips. I thought it might make an interesting blog post: three questions you need to ask yourself.

HOST DEEP DIVE STICKERS AND MORE

Last week we released the VMware vSphere 6.5 Host Resources Deep Dive book, and Twitter and Facebook exploded. We’ve seen some pretty bad-ass pictures on our Twitter feeds, such as this one by Jamie Girdwood (@creamcookie). It’s always nice to hear some praise after spending more than 800 hours on something. (When writing and self-publishing a book, expect to spend over 90 minutes on a single page.) Thanks! The top three most frequently asked questions were:

WHY THE RECENT REPORTED INTEL HT BUG IS NOT IN YOUR DATA CENTER

Yesterday I tweeted out the warning message about the HT bug in Skylake and Kaby Lake processors posted on debian.org: https://lists.debian.org/debian-devel/2017/06/msg00308.html My tweet got a LOT of retweets, and many replied with concerns about their systems. I believe most data centers will not suffer from this bug, as it is only present on E3 Xeons of the Skylake and Kaby Lake microarchitectures.

What is the bug? According to the warning: unfixed Skylake and Kaby Lake processors could, in some situations, dangerously misbehave when hyper-threading is enabled. Disable hyper-threading immediately in BIOS/UEFI to work around the problem. Read this advisory for instructions about an Intel-provided fix: https://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v5-spec-update.html

Unlikely to be present in your data center: the reason I believe most systems in data centers are not hit by this bug is that it solely applies to E3 Xeons of the Skylake microarchitecture. E3 CPUs are designed to operate in a single-socket system; they have no QuickPath Interconnect and are therefore unable to form a symmetric multiprocessing system. The current E5 (dual-socket) systems are based on the Broadwell microarchitecture. The Skylake-based E5 parts are expected to appear within the next couple of months and, according to the report, will include the fix when the product launches.

If you are running a NUC in your lab, you might want to check whether your system is hit by this bug: http://ark.intel.com/products/codename/82879/Kaby-Lake http://ark.intel.com/products/codename/37572/Skylake

Uwe Kleine-König suggested and wrote a perl script that can help detect whether your system is affected. Many thanks to him: https://lists.debian.org/debian-devel/2017/06/msg00309.html
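The official detector is the perl script from the Debian advisory linked above, but the gist of the check can be sketched in a few lines of Python. This is a rough sketch, not a replacement for the advisory's script: the model numbers below (78 and 94 for Skylake, 142 and 158 for Kaby Lake, all CPU family 6) are my assumptions based on that advisory, and the function simply parses `/proc/cpuinfo` text.

```python
# Sketch of the affected-CPU check (assumption: Skylake = models 78/94,
# Kaby Lake = models 142/158, all family 6, per the Debian advisory).
AFFECTED_MODELS = {78, 94, 142, 158}

def possibly_affected(cpuinfo_text):
    """Return True if /proc/cpuinfo text describes a possibly affected CPU
    with hyper-threading enabled (siblings > cpu cores)."""
    family = model = siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "cpu family":
            family = int(value)
        elif key == "model":          # "model name" will not match this key
            model = int(value)
        elif key == "siblings":
            siblings = int(value)
        elif key == "cpu cores":
            cores = int(value)
    hyperthreading = siblings is not None and cores is not None and siblings > cores
    return family == 6 and model in AFFECTED_MODELS and hyperthreading

with open("/proc/cpuinfo") as f:
    print(possibly_affected(f.read()))
```

Note that even on an affected model, the problem only manifests when hyper-threading is enabled, which is why the sketch compares the siblings count against the core count.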

KEYNOTING DEUTSCHE VMUG AND LONDON VMUG

Later this month, I will be attending the Deutsche VMUG and the London VMUG. As part of these events, I have the opportunity to deliver the keynote on the upcoming service VMware Cloud on AWS. Many of you will already be aware that Niels and I are releasing the VMware vSphere 6.5 Host Resources Deep Dive. Together we will present a session at both events zooming into ESXi host design, highlighting some interesting behavior from a component and VMkernel perspective.

DEUTSCHE VMUG USERCON 2017
14 June 2017
KAP Europa, Kongresshaus der Messe Frankfurt
Osloerstrasse 5
Frankfurt am Main, 60327 DE

LONDON VMUG
22 June 2017, 10:00 AM - 5:15 PM (UTC)
TechUK
10 St Bride Street
London, EC4A 4AD

If you can’t make it to the London VMUG, join us at vBeers that night. We will be heading over to the Fourpure Brewing Company at 22 Bermondsey Trading Estate, Rotherhithe New Road, London. Hope to see you at one of these events!

MEMORY-LIKE STORAGE MEANS FILE SYSTEMS MUST CHANGE - MY TAKE

I’m an avid reader of thenextplatform.com; they always provide great insights into new technology. This week they published the article “Memory-Like Storage Means File Systems Must Change” and, as usual, it is full of good stuff. The article focuses on the upcoming non-volatile memory technologies that leverage the memory channel to provide incredible amounts of bandwidth to the storage medium. I can’t wait to see this happen, so we can start to build systems with performance characteristics that weren’t conceivable half a decade ago. The article mentions 3D XPoint; Apache Pass is Intel’s codename for 3D XPoint in DIMM format. It could be NVDIMM, it could be something else, we don’t know yet. The article argues that storage systems need to change, and I fully agree. If you consider the current performance overhead on recently released PCIe NVMe 3D XPoint devices, it is clear that the system and the software have the largest impact on latency. The device characteristics are pretty much solved; it’s now the PCIe bus and the software stack that delay the I/O. Moving to the memory bus makes sense: less overhead and almost five times the bandwidth. For example, four-lane PCIe 3.0 provides a theoretical bandwidth of close to 4 GB/s, while 2400 MHz memory has a peak transfer rate of close to 19 GB/s. This sounds great and very promising, but I do wonder how it will impact memory operations. The key is to deliver an additional level of memory hierarchy, increasing capacity while abstracting the behavior of the new media. It’s key to understand that memory is accessed after an L3 miss, and the CPU can spend a lot of time waiting on DRAM. A number often heard is that a core can spend 19 out of every 20 instruction slots waiting on data from memory. This figure seems accurate, as the latency of an instruction inside a CPU register is one ns while memory latency is close to 15 ns. Each core requires memory bandwidth, and this impacts the average memory bandwidth available per core.
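The bandwidth figures quoted above can be checked with some back-of-the-envelope arithmetic. The sketch below assumes PCIe 3.0 signals at 8 GT/s per lane with 128b/130b line encoding, and a standard 64-bit (8-byte) DDR4 channel; these are the common published figures, not numbers from the article itself.

```python
# Back-of-the-envelope check of the PCIe 3.0 x4 vs DDR4-2400 figures.
# Assumptions: 8 GT/s per PCIe 3.0 lane, 128b/130b encoding, 64-bit DDR channel.

def pcie3_bandwidth_gbs(lanes):
    """Theoretical PCIe 3.0 bandwidth in GB/s for a given lane count."""
    gt_per_s = 8.0               # 8 GT/s per lane
    encoding = 128.0 / 130.0     # 128b/130b line-encoding efficiency
    return lanes * gt_per_s * encoding / 8.0  # bits -> bytes

def ddr4_bandwidth_gbs(mt_per_s, channels=1):
    """Peak DDR4 transfer rate in GB/s (MT/s x 8 bytes per channel)."""
    bus_bytes = 8                # 64-bit channel
    return mt_per_s * bus_bytes * channels / 1000.0

print(round(pcie3_bandwidth_gbs(4), 2))    # x4 link: ~3.94 GB/s
print(round(ddr4_bandwidth_gbs(2400), 1))  # one DDR4-2400 channel: 19.2 GB/s
```

So a single DDR4-2400 channel offers roughly 19.2 / 3.94 ≈ 4.9 times the peak bandwidth of a four-lane PCIe 3.0 link, which is where the “almost five times” figure comes from.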
Introducing a medium that is orders of magnitude slower than DRAM can negatively affect overall system performance, as more cycles are wasted waiting on memory media. Please remember that not every workload is storage I/O bound. Great system design is not only about making I/O faster; it’s about removing bottlenecks in a balanced manner. It’s essential that storage I/O does not interrupt DRAM traffic. An analogy would be a car that can go 65 MPH stuck behind a car driving 55 MPH. By selecting another lane, the slower car no longer interferes, and the driver can go as fast as he wants. The problem is that in this new lane, cars typically drive 200 MPH. The key point for both NVDIMM and Intel Apache Pass is that adding storage on the memory bus to improve I/O latency should not interfere with DRAM operations. This content is an excerpt of the upcoming vSphere 6.5 Host Resources Deep Dive book.

VIRTUALLY SPEAKING PODCAST: VMWARE CLOUD ON AWS & HOSTDEEPDIVE

Last Friday I had the honor of joining Pete Flecha a.k.a. Pedro Arrow and John Nicholson on their always fantastic podcast Virtually Speaking. Unfortunately, John was ill that morning, but Duncan helped us out by taking a break from his vacation. We spoke about the upcoming service VMware Cloud on AWS (#VMWonAWS): why it brings such tremendous value for customers who are in the process of building a hybrid cloud, and how it can help organizations that are already customers of both VMware and AWS. To close, we touched upon the progress of the upcoming book ‘vSphere 6.5 Host Resources Deep Dive’. I had a blast being a guest again, enjoy the show! https://soundcloud.com/virtuallyspeakingpodcast/episode-43-vmware-cloud-on-aws

IMPACT OF CPU HOT ADD ON NUMA SCHEDULING

On a regular basis, I receive the question whether CPU Hot Add impacts the CPU performance of the VM. It depends on the vCPU configuration of the VM. CPU Hot Add is not compatible with vNUMA: if Hot Add is enabled, the virtual NUMA topology is not exposed to the guest OS, and this may impact application performance. Please note that the vNUMA topology is only exposed when the vCPU count of the VM exceeds the core count of a single physical CPU package. Thus, if the ESXi host contains two CPU packages with 10 cores each, the vNUMA topology is presented to the VM only if the vCPU count equals 11 or more.
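The default exposure rule described above can be sketched as a small helper. This is a hypothetical illustration, not a VMware API, and it only models the two conditions named in the text: the Hot Add setting and the vCPU count relative to the core count of one physical package.

```python
# Hypothetical helper modelling the default vNUMA exposure rule described
# above: vNUMA is exposed only when the vCPU count exceeds the core count
# of a single physical CPU package AND CPU Hot Add is disabled.

def vnuma_exposed(vcpu_count, cores_per_package, cpu_hot_add):
    if cpu_hot_add:
        return False  # Hot Add hides the virtual NUMA topology entirely
    return vcpu_count > cores_per_package

# Dual-socket host, 10 cores per package:
print(vnuma_exposed(11, 10, cpu_hot_add=False))  # True:  VM spans packages
print(vnuma_exposed(11, 10, cpu_hot_add=True))   # False: Hot Add enabled
print(vnuma_exposed(8, 10, cpu_hot_add=False))   # False: fits in one package
```

The second call is the case that surprises people: an 11-vCPU VM that would normally receive a vNUMA topology loses it the moment Hot Add is enabled, and the guest OS then sees a flat topology.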

PERFORMANCE STUDY: DRS CLUSTER MANAGEMENT WITH RESERVATION AND SHARES

Last Friday a new performance study about DRS cluster management was published. This paper covers the behavior of reservations and shares within a DRS cluster in depth. It’s a great read! And to be honest, it’s always awesome to see a reference to the vSphere Clustering Deep Dive in official documentation. Download it here: http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/drs-cluster-mgmt-perf.pdf

VSPHERE 6.5 HOST DEEP DIVE UPDATE

Maybe you have noticed that no new content has appeared on the site for a while. The upcoming book “vSphere 6.5 Host Resources Deep Dive” is to blame for this situation. Last year, Niels Hagoort and I started working on the companion book to the highly successful vSphere Clustering Deep Dive. We set out to write this book to refocus on the fundamental component of the virtual data center: the ESXi host. Today’s focal point is on the upper layers and overlays (SDDC stack, NSX, cloud). These topics are exciting and take IT services to the next level, but we also understand that proper host design and management forms the foundation for success. As a result, this book explores the host resources (CPU, memory, storage, and network) in depth. Our goal is to provide you with an in-depth view of these four major host resources. Instead of showing you where to click to achieve a certain configuration, we explain the inner workings of these components and how various physical and virtual constructs interact with each other. We believe that this method provides a basis, a foundation of its own, that helps you design and build the best possible architecture, one that aligns with the customer requirements each and every time.

As you can imagine, trying to write a fitting companion to the cluster deep dive is no small feat. Research, reverse engineering, and reading through a lot of academic papers consume most of our time besides our day jobs, hence the progress is not as fast as we would like. Expect the book to be released in print in the April/May timeframe; an ebook version is scheduled to appear at the end of this year.

Working on this book reminds me of the African proverb: “If you want to go quickly, go alone. If you want to go far, go together.” Although Niels and I generate the content, a lot of people are involved in ensuring the quality is up to par. Both Niels and I would like to acknowledge the following persons:

Jane Rimmer (who has the challenging task of restructuring our content into proper English)
Chris Gianos (Lead Engineer of Intel Xeon microarchitecture)
Haoqiang Zheng (Principal Engineer, CPU Scheduler, VMkernel)
Valentin Bondzio (All-Star Badass GSS, VMware)
Duncan Epping (Chief Technologist, Storage BU, VMware)
Marco van Baggum (Architect, ITQ)
Myles Gray (Infrastructure Engineer, Novosco)
Rutger Kosters (Solution Architect, Rubrik)
Anthony Spiteri (Technical Evangelist, Veeam Software)
Joop Carels (Sr. Solution Integrator, Ericsson)

Throughout the writing process, we update the book’s Twitter account (@HostDeepDive) and Facebook page with sneak peeks and interesting reference material such as academic papers. Please subscribe to these channels to receive updates.