
Free vSphere 6.5 Host Resources Deep Dive E-Book

In June of this year, Niels and I published the vSphere 6.5 Host Resources Deep Dive, and the community was buzzing. Twitter exploded, and many community members provided rave reviews.

This excitement caught Rubrik's attention, and they decided to support the community by giving away 2,000 free copies of the printed version at VMworld. The interest was overwhelming; we ran out of books before the end of the second signing session in Barcelona.

A lot of people reached out to Rubrik and us to ask if they could get a free book as well. This gave us an idea, and we sat down with Rubrik and the VMUG organization to determine how best to cater to the community.

We are proud to announce that you can download the e-book version (PDF only) for free at rubrik.com. Just sign up and download your full e-book copy here.

Spread the word! And if you like, thank @Rubrik and @myVMUG for their efforts to help the VMware community advance.

Get your Free Book at VMworld

At VMworld, the presenters of the following sessions will be giving away free copies of the Host Deep Dive book to the audience.

Saturday
Performance Bootcamp
Mark Achtemichuk
Saturday, Aug 26, 8:00 a.m. – 5:00 p.m.
More information about pre-VMworld Performance Bootcamp

Sunday
An Introduction to VMware Software-Defined Storage [STO2138QU]
Lee Dilworth, Principal Systems Engineer, VMware
Sunday, Aug 27, 4:00 p.m. – 4:30 p.m. | Oceanside C, Level 2

Monday
A Deep Dive into vSphere 6.5 Core Storage Features and Functionality [SER1143BU]
Cody Hosterman, Technical Director–VMware Solutions, Pure Storage
Cormac Hogan, Director – Chief Technologist, VMware
Monday, Aug 28, 11:30 a.m. – 12:30 p.m. | Mandalay Bay Ballroom G, Level 2

Extreme Performance Series: Benchmarking 101 [SER2723BUR]
Joshua Schnee, Senior Staff Engineer @ VMware Performance, VMware
Mark Achtemichuk, Staff Engineer, Performance, VMware
Monday, Aug 28, 4:00 p.m. – 5:00 p.m. | Mandalay Bay Ballroom B, Level 2

Maximum Performance with Mark Achtemichuk [VIRT2368GU]
Mark Achtemichuk, Staff Engineer, Performance, VMware
Monday, Aug 28, 5:30 p.m. – 6:30 p.m. | Reef E, Level 2

The Top 10 Things to Know About vSAN [STO1264BU]
Duncan Epping, Chief Technologist, VMware
Cormac Hogan, Director – Chief Technologist, VMware
Monday, Aug 28, 5:30 p.m. – 6:30 p.m. | Mandalay Bay Ballroom H, Level 2

VMware vSAN: From 2 Nodes to 64 Nodes, Architecting and Operating vSAN Like a VCDX for Scalability and Simplicity [STO2114BU]
Greg Mulholland, Principal Systems Engineer, VMware
Jeff Wong, Customer Success Architect, VMware
Monday, Aug 28, 5:30 p.m. – 6:30 p.m. | Surf E, Level 2

Tuesday
Extreme Performance Series: Performance Best Practices [SER2724BU]
Reza Taheri, Principal Engineer, VMware
Mark Achtemichuk, Staff Engineer, Performance, VMware
Tuesday, Aug 29, 2:30 p.m. – 3:30 p.m. | Oceanside D, Level 2

Wednesday
vSphere 6.5 Host Resources Deep Dive: Part 2 [SER1872BU]
Frank Denneman, Senior Staff Architect, VMware
Niels Hagoort, Owner, HIC (Hagoort ICT Consultancy)
Wednesday, Aug 30, 8:30 a.m. – 9:30 a.m. | Breakers E, Level 2

Extreme Performance Series: Benchmarking 101 [SER2723BUR]
Joshua Schnee, Senior Staff Engineer @ VMware Performance, VMware
Mark Achtemichuk, Staff Engineer, Performance, VMware
Wednesday, Aug 30, 8:30 a.m. – 9:30 a.m. | Lagoon L, Level 2

vSAN Networking and Design Best Practices [STO3276GU]
John Nicholson, Senior Technical Marketing Manager, VMware
Wednesday, Aug 30, 11:30 a.m. – 12:30 p.m. | Reef C, Level 2

vSAN Hardware Deep Dive Panel [STO1540PU]
Ed Goggin, Staff Engineer 2, VMware
David Edwards, Principal Engineer, Director Solutions, Resurgent Technology
Ken Werneburg, Group Manager Technical Marketing, VMware
Jeffrey Taylor, Technical Director, VMware
Ron Scott-Adams, Hyper-Converged Systems Engineer, VMware
Wednesday, Aug 30, 1:00 p.m. – 2:00 p.m. | Mandalay Bay Ballroom D, Level 2

A Closer Look at vSAN Networking Design and Configuration Considerations [STO1193BU]
Cormac Hogan, Director – Chief Technologist, VMware
Andreas Scherr, Senior Solution Architect, VMware
Wednesday, Aug 30, 4:00 p.m. – 5:00 p.m. | Mandalay Bay Ballroom G, Level 2

Thursday
Virtual Volumes Technical Deep Dive [STO2446BU]
Patrick Dirks, Sr. Manager, VMware
Pete Flecha, Sr Technical Marketing Architect, VMware
Thursday, Aug 31, 10:30 a.m. – 11:30 a.m. | Oceanside B, Level 2

Book Signing
We will be doing two book signing sessions as well.
At the Rubrik booth #412 on Monday, Aug 28, 2:00 p.m. – 3:00 p.m.
At the VMworld Bookstore on Tuesday, Aug 29, 11:30 a.m. – 12:00 p.m.
Or just feel free to approach us when you see us walking by.

Host Deep Dive Stickers and More

Last week we released the VMware vSphere 6.5 Host Resources Deep Dive book, and Twitter and Facebook exploded. We’ve seen some pretty bad-ass pictures in our Twitter feeds, such as this one by Jamie Girdwood (@creamcookie).

It’s always nice to hear some praise after spending more than 800 hours on something. (When writing and self-publishing a book, expect to spend over 90 minutes per page.) Thanks!

The three most frequently asked questions were:

  1. When will you release an ebook version?
  2. Do you have any stickers?
  3. When is Niels joining VMware?

When will you release an ebook version?
We hope to get the ebook finalized after VMworld. Vacation time is coming up, and we also need to prep for VMworld (vSphere 6.5 Host Resources Deep Dive: Part 2 [SER1872BU]). It might happen sooner, but that depends on the process of creating an eBook itself. Unfortunately, it’s not as easy as sharing a PDF online. Please stay tuned.

Do you have any stickers?
We’ve got you covered. We met up with our designer over at digitalmaterial.nl and explained what we were looking for. We received a lot of comments on the depth of the book, such as this one from Duncan’s article Must have book: Host Resources Deep Dive:

As most of you know, I wrote the Clustering Deepdive series together with Frank, which means I kinda knew what to expect in terms of level of depth. Kinda, as this is a whole new level of depth. I don’t think I have ever seen (for example) topics like NUMA or NIC drivers explained at this level of depth. If you ask me, it is fair to say that Frank and Niels redefined the term “deep dive”.

So instead of snorkeling and hovering a bit below sea level, we help you get into the depths of the material. What better way to express this than a diver’s helmet? We will bring 250 stickers to VMworld. First come, first served. If you can’t wait, download the 800 DPI PNG here and create one for yourself.

White Background

Transparent Background

I think the design rocks, so much so that Niels and I decided to put it on some t-shirts as well. We are not backed by a vendor, so we can’t give away shirts, but similar to the book, we kept the price low. We created two campaigns, one for the US and one for the EU. This allows you to get your order as fast as possible. The shirts and hoodies come in various colors.

When is Niels joining VMware?
I don’t know; he should, though!

Memory-Like Storage Means File Systems Must Change – My Take

I’m an avid reader of thenextplatform.com. They always provide great insights into new technology. This week they published the article “Memory-Like Storage Means File Systems Must Change,” and as usual it’s full of good stuff. The article focuses on the upcoming non-volatile memory technologies that leverage the memory channel to provide incredible amounts of bandwidth to the storage medium. I can’t wait to see this happen so we can start to build systems with performance characteristics that weren’t conceivable half a decade ago.

The article mentions 3D XPoint; Intel Apache Pass is the codename for 3D XPoint in a DIMM format. It could be NVDIMM, it could be something else, we don’t know yet. The article argues that storage systems need to change, and I fully agree. If you consider the current performance overhead on the recently released PCIe NVMe 3D XPoint devices, it is clear that the system and the software have the largest impact on latency. The device characteristics are pretty much solved; it’s now the PCIe bus and the software stack that delay the I/O. Moving to the memory bus makes sense: less overhead and almost five times the bandwidth. For example, four-lane PCIe 3.0 provides a theoretical bandwidth of close to 4 GB/s, while 2400 MHz memory has a peak transfer rate of close to 19 GB/s.
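To put those numbers side by side, here is a quick back-of-the-envelope calculation in Python. It only restates the theoretical peak figures mentioned above (PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding, and a 64-bit DDR4-2400 channel); it is not a benchmark.

    # Back-of-the-envelope comparison of theoretical peak bandwidth.

    # PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, four lanes.
    pcie3_lane_bytes_per_s = 8e9 * (128 / 130) / 8
    pcie3_x4_gbs = 4 * pcie3_lane_bytes_per_s / 1e9      # ~3.94 GB/s

    # DDR4-2400: 2400 MT/s on a 64-bit (8-byte) channel.
    ddr4_2400_gbs = 2400e6 * 8 / 1e9                     # ~19.2 GB/s per channel

    print(f"PCIe 3.0 x4 : {pcie3_x4_gbs:.2f} GB/s")
    print(f"DDR4-2400   : {ddr4_2400_gbs:.1f} GB/s")
    print(f"Ratio       : {ddr4_2400_gbs / pcie3_x4_gbs:.1f}x")   # almost five times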

This sounds great and very promising, but I do wonder how it will impact memory operations. The key is to deliver an additional level of memory hierarchy, increasing capacity while abstracting the behavior of the new media.

It’s key to understand that memory is accessed after an L3 miss, and the CPU can spend a lot of time waiting on DRAM. A number often heard is that a core can spend 19 out of every 20 instruction slots waiting on data from memory. This figure seems accurate, as the latency of an instruction working on data in a CPU register is about one ns, while memory latency is closer to 15 ns. Each core requires memory bandwidth, which drives down the average memory bandwidth available per core. Introducing a medium that is orders of magnitude slower than DRAM can negatively affect overall system performance: more cycles are wasted waiting on the memory medium.
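To get a feel for those numbers, the following sketch uses the illustrative figures from this paragraph (about 1 ns of useful work per slot, roughly 15 ns waiting on DRAM) in a deliberately naive model with no overlapping of misses. Real CPUs hide part of this latency, so treat it as rough intuition only.

    # Naive stall model using the illustrative figures above.
    useful_ns = 1.0    # work on data already in a register
    memory_ns = 15.0   # waiting on DRAM after an L3 miss

    stalled = memory_ns / (useful_ns + memory_ns)
    print(f"Slots spent waiting on DRAM: {stalled:.0%}")   # ~94%, close to 19 out of 20

    # A storage-class medium ten times slower than DRAM makes it far worse.
    slow_ns = 10 * memory_ns
    print(f"Slots spent waiting on the slower medium: {slow_ns / (useful_ns + slow_ns):.0%}")   # ~99%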

Please remember that not every workload is storage I/O bound. Great system design is not only about making I/O faster; it’s about removing bottlenecks in a balanced manner. It’s essential that storage I/O does not interrupt DRAM traffic.

An analogy would be a car that can go 65 MPH stuck behind a car driving 55 MPH. By moving to another lane, the slower car no longer interferes, and the driver can go as fast as he wants. The problem is that in this new lane, cars typically drive 200 MPH.

The key point for both NVDIMM and Intel Apache Pass is that adding storage on the memory bus to improve I/O latency should not interfere with DRAM operations.

This content is an excerpt of the upcoming vSphere 6.5 Host Resources Deep Dive book.

Impact of CPU Hot Add on NUMA scheduling

On a regular basis, I receive the question whether CPU Hot-Add impacts the CPU performance of a VM. It depends on the vCPU configuration of the VM. CPU Hot-Add is not compatible with vNUMA: if Hot-Add is enabled, the virtual NUMA topology is not exposed to the guest OS, and this may impact application performance.

Please note that the vNUMA topology is only exposed when the vCPU count of the VM exceeds the core count of a physical CPU package. Thus, if the ESXi host contains two CPU packages with 10 cores each, the vNUMA topology is presented to the VM when the vCPU count is 11 or more.
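To make that rule explicit, here is a small sketch of the default behavior as described above. It is a simplified model only; advanced settings such as numa.vcpu.min or numa.vcpu.maxPerVirtualNode can change the outcome in practice.

    def vnuma_exposed(vcpu_count: int, cores_per_package: int, cpu_hot_add: bool) -> bool:
        """Simplified model of the default behavior described above."""
        # CPU Hot-Add disables vNUMA entirely.
        if cpu_hot_add:
            return False
        # Otherwise vNUMA is only presented when the vCPU count exceeds
        # the core count of a physical CPU package.
        return vcpu_count > cores_per_package

    # Dual-socket host with 10 cores per package:
    print(vnuma_exposed(8,  10, cpu_hot_add=False))   # False
    print(vnuma_exposed(12, 10, cpu_hot_add=False))   # True
    print(vnuma_exposed(12, 10, cpu_hot_add=True))    # False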

vNUMA in a Nutshell
The benefit of a wide-VM is that the guest OS is informed about the physical grouping of the vCPUs. In the example of a 12 vCPU VM on a dual 10-core system, the NUMA scheduler creates two virtual proximity domains (VPDs), better known as NUMA clients, and distributes the 12 vCPUs equally across them. As a result, a load-balancing group is created containing 6 vCPUs that are scheduled on a physical CPU package. A load-balancing group is internally referred to as a physical proximity domain (PPD). Please note that the PPD does not determine the scheduling of a vCPU on a specific HT or full core; a PPD can be seen as a vCPU-to-CPU affinity group.

From a memory perspective, the guest OS is presented with vNUMA-node-sized, separate address spaces. These address spaces are local to the subset of the vCPUs. As a result, a 12 vCPU 32 GB VM detects a system with two NUMA nodes, each containing 6 CPUs and a local address space of 16 GB. Contrary to popular belief, vNUMA does not expose the full CPU and memory architecture; a better way to describe it is that vNUMA presents a tailor-made world to the VM.

vNUMA to Physical mapping-1
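To make the sizing concrete, the sketch below splits a VM configuration evenly across virtual proximity domains, the way the 12 vCPU / 32 GB example above describes. It is a rough illustration of the even distribution, not the actual NUMA scheduler logic.

    import math

    def split_into_vpds(vcpu_count: int, mem_gb: float, cores_per_package: int):
        """Evenly divide a wide-VM across virtual proximity domains (VPDs)."""
        vpd_count = math.ceil(vcpu_count / cores_per_package)
        return [{"vpd": i,
                 "vcpus": vcpu_count // vpd_count,
                 "mem_gb": mem_gb / vpd_count}
                for i in range(vpd_count)]

    # 12 vCPU / 32 GB VM on a dual 10-core system:
    print(split_into_vpds(12, 32, 10))
    # [{'vpd': 0, 'vcpus': 6, 'mem_gb': 16.0}, {'vpd': 1, 'vcpus': 6, 'mem_gb': 16.0}]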

But what happens when the VM is configured with fewer vCPUs than the core count of the physical CPU package and CPU Hot-Add is enabled? Will there be a performance impact? The answer is no. The VPD configured for the VM fits inside a single NUMA node, and thus the CPU scheduler and the NUMA scheduler can optimize memory operations. It’s all about memory locality. Let’s use some application workload tests to determine the behavior of VMkernel CPU scheduling.

For this test, I installed DVD Store 3.0 and ran some test loads on the MS-SQL server. To determine the baseline, I logged in to the ESXi host via an SSH session and executed the command sched-stats -t numa-pnode. This command shows the CPU and memory configuration of each NUMA node in the system. The screenshot shows that the system is only running the ESXi operating system; hardly any memory is consumed. TotalMem indicates the total amount of physical memory in the NUMA node in KB, and FreeMem indicates the amount of free physical memory in the NUMA node in KB.

01-Unload-ESXi-Host

An 8 vCPU 32 GB VM is created with CPU Hot-Add disabled. The NUMA scheduler has selected NUMA node 1 for initial placement, and the system now consumes ~13,759 MB (67,108,864 KB - 53,019,184 KB = 14,089,680 KB ≈ 13,759 MB).

02-8vCPU

The command memstats -r vm-stats -s name:memSize:allocTgt:mapped:consumed:touched -u mb allows us to verify the memory consumption of the VM.

03-VM memstats

The numbers are a close match. Please note that vm-stats does not include overhead memory and that the VMkernel can consume some additional overhead in the same NUMA node for other processes.

When Hot-Add is enabled (the VM must be powered down to enable this feature), nothing really changes. The memory for this VM is still allocated from a single NUMA node.

04-8vCPU-hot-add

To get a better understanding of the CPU scheduling constructs at play here, the following command provides detailed insight into all the NUMA-related settings of the VM (command courtesy of Valentin Bondzio):

vmdumper -l | cut -d \/ -f 2-5 | while read path; do egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"; echo -e; done

05-vmdumper

It shows that Hot-Add is enabled and that the VM is configured with a single VPD that is scheduled on a single PPD. In plain language, the vCPUs of the VM are contained within a single physical NUMA node, and it’s the responsibility of the NUMA scheduler to ensure that local physical memory is consumed. To verify whether the VM is consuming local memory, esxtop can be used (memory view, f, NUMA stats). However, sched-stats -t numa-clients also provides a lot of insight.

06-8vCPU-hot-add-numa-client

As a result, you can conclude that enabling Hot-Add on a NUMA system does not lead to performance degradation as long as the vCPU count does not exceed the core count of the CPU package. That means Hot-Add can be enabled on VMs, but the instruction must be clear that vCPUs should only be added up to the threshold of the physical core count. Beyond that point, the VM becomes a wide-VM and vNUMA comes into play; with CPU Hot-Add enabled, it’s sidelined.

What’s the impact of disregarding the physical NUMA topology? The key lies in the message that’s entered in the vmware.log of the VM after boot.

07-Forcing UMA

The VMkernel is forced into using UMA (Uniform Memory Access) on a NUMA architecture. As a result, memory is interleaved between the two physical NUMA nodes. In essence, it’s load-balancing memory across two nodes while ignoring the vCPU location. Let’s explore this behavior a bit more.
Christmas is coming early for this VM, and it gets another 4 vCPUs. Hot-Add is disabled again, and thus vNUMA is fully in play. The vmdumper command reveals the following:

08-12vCPU-vNUMA

The vCPUs are split across two virtual nodes (VPD0 and VPD1), each containing 6 vCPUs. After running the DVD Store query, the following memory allocation happened:

09-Non-Uniform Memory Allocation

The guest OS (Windows 2012 R2) consumed some memory from node 1; SQL consumed all of its memory from node 0. For people familiar with SQL Server resource management this might look like strange behavior, and that’s true. To demonstrate memory management at the VMkernel layer, I had to restrict SQL to run on only a subset of the CPUs: I allowed SQL to run on the first 4 vCPUs, all of which were mapped to CPUs located in NUMA node 0. The NUMA scheduler ensured these CPUs consumed local memory.

After powering down the VM and enabling Hot-Add, the same test was run again. No NUMA architecture is exposed to the guest OS, and therefore a single memory address space is used by Windows. The memory scheduler follows the rules of UMA and interleaves memory between the two physical nodes. As the output shows, memory is consumed from both NUMA nodes in a very balanced manner. The problem is that the executing vCPUs are all located in NUMA node 0, so they have to fetch a lot of memory from the remote node, resulting in inconsistent, lower application performance.

10-UMA
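A toy model illustrates why this hurts. With all executing vCPUs on node 0 and memory interleaved 50/50 across both nodes, roughly half of the accesses pay the remote-node penalty. The latency values below are made-up placeholders purely for illustration; measure your own platform.

    # Toy model of average memory latency; latencies are placeholder values.
    local_ns, remote_ns = 80.0, 130.0

    # vNUMA (Hot-Add disabled): vCPUs and their memory share the same node.
    vnuma_avg = local_ns

    # UMA interleaving (Hot-Add enabled): memory is spread 50/50 across two
    # nodes while the executing vCPUs all sit on node 0.
    uma_avg = 0.5 * local_ns + 0.5 * remote_ns

    print(f"Local only : {vnuma_avg:.0f} ns")
    print(f"Interleaved: {uma_avg:.0f} ns ({uma_avg / vnuma_avg - 1:.0%} higher on average)")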

Conclusion
Hot-Add is a great feature when you stay within the confines of the CPU package, but expect performance degradation, or at least inconsistent performance, when going beyond the CPU core count.

This content will appear in the upcoming vSphere 6.5 Host Resources Deep Dive book I’m writing with Niels Hagoort (expected in the May time frame). For updates about the book, please follow us on Twitter @hostdeepdive or like our page on Facebook.
