<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Sizing VMs and NUMA nodes</title>
	<atom:link href="http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/feed/" rel="self" type="application/rss+xml" />
	<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/</link>
	<description></description>
	<lastBuildDate>Sun, 25 Jul 2010 20:31:07 +0200</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: VMware ESX &#38; ESXi Error &#8211; Can&#8217;t boot system as genuine NUMA &#124; TechHead.co.uk</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-777</link>
		<dc:creator>VMware ESX &#38; ESXi Error &#8211; Can&#8217;t boot system as genuine NUMA &#124; TechHead.co.uk</dc:creator>
		<pubDate>Sun, 25 Jul 2010 20:31:07 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-777</guid>
		<description>[...] of physical memory to which a VM and its vCPUs and memory are then allocated.&#160; There is an excellent article from Frank Denneman which covers the topic of sizing CPU and NUMA nodes in which he goes into a good level of detail [...]</description>
		<content:encoded><![CDATA[<p>[...] of physical memory to which a VM and its vCPUs and memory are then allocated.&#160; There is an excellent article from Frank Denneman which covers the topic of sizing CPU and NUMA nodes in which he goes into a good level of detail [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sean Clark</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-646</link>
		<dc:creator>Sean Clark</dc:creator>
		<pubDate>Wed, 16 Jun 2010 22:18:04 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-646</guid>
		<description>Frank,
Great article. I&#039;m looking for any whitepapers that show the difference in VM performance between VMs accessing only local memory on a NUMA node and VMs that have to access remote memory, or a mix of both. Have you encountered such a study? I think it would help in deciding when and where to use 8-vCPU VMs. Based on my understanding of NUMA systems, you would ideally want ESX servers with 8-core or 12-core processors if you want to offer 8-vCPU VMs. Otherwise, if I offer 8-vCPU VMs but only have NUMA nodes with 4 cores, my VMs would never experience the full benefit of NUMA optimizations.

-Sean Clark</description>
		<content:encoded><![CDATA[<p>Frank,<br />
Great article. I&#8217;m looking for any whitepapers that show the difference in VM performance between VMs accessing only local memory on a NUMA node and VMs that have to access remote memory, or a mix of both. Have you encountered such a study? I think it would help in deciding when and where to use 8-vCPU VMs. Based on my understanding of NUMA systems, you would ideally want ESX servers with 8-core or 12-core processors if you want to offer 8-vCPU VMs. Otherwise, if I offer 8-vCPU VMs but only have NUMA nodes with 4 cores, my VMs would never experience the full benefit of NUMA optimizations.</p>
<p>-Sean Clark</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anton Zhbankov</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-570</link>
		<dc:creator>Anton Zhbankov</dc:creator>
		<pubDate>Thu, 27 May 2010 09:20:21 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-570</guid>
		<description>Frank, here is a Russian translation of the article: http://blog.vadmin.ru/2010/05/numa.html</description>
		<content:encoded><![CDATA[<p>Frank, here is a Russian translation of the article: <a href="http://blog.vadmin.ru/2010/05/numa.html" rel="nofollow">http://blog.vadmin.ru/2010/05/numa.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Disabling TPS hurting performance? » Yellow Bricks</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-539</link>
		<dc:creator>Disabling TPS hurting performance? » Yellow Bricks</dc:creator>
		<pubDate>Tue, 11 May 2010 15:01:42 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-539</guid>
		<description>[...] be shared between NUMA nodes. Another thing that Frank Denneman already described in his article here is that when memory pages are allocated remotely there is a memory penalty associated with it. (Did [...]</description>
		<content:encoded><![CDATA[<p>[...] be shared between NUMA nodes. Another thing that Frank Denneman already described in his article here is that when memory pages are allocated remotely there is a memory penalty associated with it. (Did [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon Seagrave</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-381</link>
		<dc:creator>Simon Seagrave</dc:creator>
		<pubDate>Sun, 21 Mar 2010 10:38:04 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-381</guid>
		<description>Hi Frank,

A great article on a topic that is often unknown or overlooked by many.

Keep up the good work!  

Cheers,


Simon (TechHead)</description>
		<content:encoded><![CDATA[<p>Hi Frank,</p>
<p>A great article on a topic that is often unknown or overlooked by many.</p>
<p>Keep up the good work!  </p>
<p>Cheers,</p>
<p>Simon (TechHead)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Re: Memory Compression &#124; blindpete.com</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-309</link>
		<dc:creator>Re: Memory Compression &#124; blindpete.com</dc:creator>
		<pubDate>Tue, 02 Mar 2010 13:51:18 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-309</guid>
		<description>[...] &#8211; By default there is no inter node transparent page sharing (read Frank&#8217;s article for more info on this [...]</description>
		<content:encoded><![CDATA[<p>[...] &#8211; By default there is no inter node transparent page sharing (read Frank&#8217;s article for more info on this [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Re: Memory Compression » Yellow Bricks</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-308</link>
		<dc:creator>Re: Memory Compression » Yellow Bricks</dc:creator>
		<pubDate>Tue, 02 Mar 2010 13:40:00 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-308</guid>
		<description>[...] &#8211; By default there is no inter node transparent page sharing (read Frank&#8217;s article for more info on this [...]</description>
		<content:encoded><![CDATA[<p>[...] &#8211; By default there is no inter node transparent page sharing (read Frank&#8217;s article for more info on this [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: frankdenneman</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-289</link>
		<dc:creator>frankdenneman</dc:creator>
		<pubDate>Tue, 23 Feb 2010 09:55:14 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-289</guid>
		<description>Thanks,

Can you tell me a bit more about the memory pressure? How are the VMs configured (memory-wise)?
A customer of mine uses the DL785 as well and they don&#039;t have a problem; they don&#039;t create VMs with more than 4GB of memory.</description>
		<content:encoded><![CDATA[<p>Thanks,</p>
<p>Can you tell me a bit more about the memory pressure? How are the VMs configured (memory-wise)?<br />
A customer of mine uses the DL785 as well and they don&#8217;t have a problem; they don&#8217;t create VMs with more than 4GB of memory.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brandon</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-288</link>
		<dc:creator>Brandon</dc:creator>
		<pubDate>Tue, 23 Feb 2010 08:15:35 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-288</guid>
		<description>Good article. I stumbled across it because I&#039;m finding that ESX isn&#039;t so great at balancing a NUMA box when it&#039;s only moderately loaded (it seems OK when there is a decent load on the ESX hardware).

On our 24-core BL685 (4 x 6), we find that NUMA nodes 0 and 1 are pretty busy (unfortunately resulting in elevated CPU ready times on the VMs), whilst NUMA nodes 2 and 3 are almost unused.

I have a colleague who has a 32-core (DL785) farm with similar issues. ESX seems to have a weakness when it comes to balancing lightly loaded NUMA boxes; ESX 4 seems less willing to balance out a box than 3.5...</description>
		<content:encoded><![CDATA[<p>Good article. I stumbled across it because I&#8217;m finding that ESX isn&#8217;t so great at balancing a NUMA box when it&#8217;s only moderately loaded (it seems OK when there is a decent load on the ESX hardware).</p>
<p>On our 24-core BL685 (4 x 6), we find that NUMA nodes 0 and 1 are pretty busy (unfortunately resulting in elevated CPU ready times on the VMs), whilst NUMA nodes 2 and 3 are almost unused.</p>
<p>I have a colleague who has a 32-core (DL785) farm with similar issues. ESX seems to have a weakness when it comes to balancing lightly loaded NUMA boxes; ESX 4 seems less willing to balance out a box than 3.5&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Collin C. MacMillan</title>
		<link>http://frankdenneman.nl/2010/02/sizing-vms-and-numa-nodes/comment-page-1/#comment-245</link>
		<dc:creator>Collin C. MacMillan</dc:creator>
		<pubDate>Mon, 15 Feb 2010 14:28:52 +0000</pubDate>
		<guid isPermaLink="false">http://frankdenneman.nl/?p=620#comment-245</guid>
		<description>Frank:

Good article, but a clarification needs to be noted. In your Pitfall #2 section, you lead with the following statement:

&quot;Typically each socket will get assigned the same amount of memory; the physical memory (minus service console memory) is divided between the sockets. For example 16GB will be assigned to each NUMA node on a two socket server with 32GB total physical.&quot;

The above reads as if some logical apportionment of memory takes place in NUMA systems. Node memory is typically allocated on a PHYSICAL basis and assigned to the NUMA node based on the DIMM slot/bank configuration of the system board. In 2P systems, each bank of slots is mastered by a single (local) CPU, and the amount of memory &quot;assigned&quot; to that node depends entirely on the number of slots filled and the size of the DIMMs. Each CPU&#039;s memory node is likewise constrained. (Note: some size-constrained NUMA systems provide memory slots to only one CPU node.)

This distinction is necessary because non-uniform distribution of memory across the nodes in a NUMA system can be a physical provisioning issue - sometimes an intentional one. Someone asked about the penalty incurred for remote node memory: for testing purposes, that question can be answered easily by (physically) assigning 100% of memory to a single CPU&#039;s bank (usually CPU0) and using CPU affinity to measure the performance penalty for remote memory access (move the VM from CPU0 to CPU1 and compare the results of STREAM or another memory benchmark).

The physical allocation distinction also matters for memory configurations that ship from vendors without regard to NUMA balancing. For instance, a 32GB (8x 4GB DIMM) system could be shipped in any of the following configurations (DPC = DIMMs per channel):

CPU0 = 3DPC x 2-Channel plus 2DPC x 1-channel and CPU1 = 0 DIMM
CPU0 = 2DPC x 3-Channel and CPU1 = 2DPC x 1-Channel
CPU0 = 2DPC x 3-Channel and CPU1 = 1DPC x 2-Channel
CPU0 = 2DPC x 2-Channel and CPU1 = 2DPC x 2-Channel

While more permutations exist, only one of these is balanced (i.e. a 50-50 distribution of memory). The other configurations, while valid, result in different system assumptions and VM placement. For instance, the 2DPC x 3-channel node will have greater memory bandwidth than the 2- or 1-channel configurations. Placing VMs on this node (via affinity) would result in better performance for memory-bound workloads. It is not uncommon to find systems shipped with CPU0 full and CPU1 partially populated.

While NUMA allows memory imbalances to be handled gracefully in ESX, there may be hidden costs, as you point out, in terms of remote-node memory latency and its performance ramifications. Economics could suggest that an intentionally imbalanced node configuration is necessary, e.g. placing 72GB (8GB DIMMs, 3DPC x 3-channel) in one &quot;super&quot; node (accommodating a 64GB VM) and 36GB (4GB DIMMs, 3DPC x 3-channel) in another (for several 2-6GB VMs).

The above &quot;real world&quot; example would result in a savings of about $3,200 in CAPEX - per system - versus the &quot;balanced&quot; configuration of 18x 8GB DIMMs. ESX&#039;s NUMA balancing algorithms should be able to place those VMs that need the larger contiguous bank of memory in the proper CPU node. A cluster with many 48-64GB VMs in such a configuration would likely need one or more DRS anti-affinity rules to prohibit co-placement across the cluster. However, combined with ESX&#039;s NUMA scheduler, the &quot;optimal&quot; balance of performance should be achieved automatically...

Like you said, ESX servers are not black boxes, and understanding the system architecture is key in extracting performance and economies of scale. As multi-node, multi-hop systems come on-line (AMD Magny-Cours, Intel&#039;s Nehalem-EX, etc.) in 2010, ESX admins will explore intentionally non-balanced &quot;super node&quot; memory configurations to enable placement of &quot;outlier&quot; VMs without wrecking CAPEX models. Do you happen to know how Xen/Hyper-V would handle the same NUMA situation?</description>
		<content:encoded><![CDATA[<p>Frank:</p>
<p>Good article, but a clarification needs to be noted. In your Pitfall #2 section, you lead with the following statement:</p>
<p>&#8220;Typically each socket will get assigned the same amount of memory; the physical memory (minus service console memory) is divided between the sockets. For example 16GB will be assigned to each NUMA node on a two socket server with 32GB total physical.&#8221;</p>
<p>The above reads as if some logical apportionment of memory takes place in NUMA systems. Node memory is typically allocated on a PHYSICAL basis and assigned to the NUMA node based on the DIMM slot/bank configuration of the system board. In 2P systems, each bank of slots is mastered by a single (local) CPU, and the amount of memory &#8220;assigned&#8221; to that node depends entirely on the number of slots filled and the size of the DIMMs. Each CPU&#8217;s memory node is likewise constrained. (Note: some size-constrained NUMA systems provide memory slots to only one CPU node.)</p>
<p>This distinction is necessary because non-uniform distribution of memory across the nodes in a NUMA system can be a physical provisioning issue &#8211; sometimes an intentional one. Someone asked about the penalty incurred for remote node memory: for testing purposes, that question can be answered easily by (physically) assigning 100% of memory to a single CPU&#8217;s bank (usually CPU0) and using CPU affinity to measure the performance penalty for remote memory access (move the VM from CPU0 to CPU1 and compare the results of STREAM or another memory benchmark).</p>
<p>The physical allocation distinction also matters for memory configurations that ship from vendors without regard to NUMA balancing. For instance, a 32GB (8x 4GB DIMM) system could be shipped in any of the following configurations (DPC = DIMMs per channel):</p>
<p>CPU0 = 3DPC x 2-Channel plus 2DPC x 1-channel and CPU1 = 0 DIMM<br />
CPU0 = 2DPC x 3-Channel and CPU1 = 2DPC x 1-Channel<br />
CPU0 = 2DPC x 3-Channel and CPU1 = 1DPC x 2-Channel<br />
CPU0 = 2DPC x 2-Channel and CPU1 = 2DPC x 2-Channel</p>
<p>While more permutations exist, only one of these is balanced (i.e. a 50-50 distribution of memory). The other configurations, while valid, result in different system assumptions and VM placement. For instance, the 2DPC x 3-channel node will have greater memory bandwidth than the 2- or 1-channel configurations. Placing VMs on this node (via affinity) would result in better performance for memory-bound workloads. It is not uncommon to find systems shipped with CPU0 full and CPU1 partially populated.</p>
<p>While NUMA allows memory imbalances to be handled gracefully in ESX, there may be hidden costs, as you point out, in terms of remote-node memory latency and its performance ramifications. Economics could suggest that an intentionally imbalanced node configuration is necessary, e.g. placing 72GB (8GB DIMMs, 3DPC x 3-channel) in one &#8220;super&#8221; node (accommodating a 64GB VM) and 36GB (4GB DIMMs, 3DPC x 3-channel) in another (for several 2-6GB VMs).</p>
<p>The above &#8220;real world&#8221; example would result in a savings of about $3,200 in CAPEX &#8211; per system &#8211; versus the &#8220;balanced&#8221; configuration of 18x 8GB DIMMs. ESX&#8217;s NUMA balancing algorithms should be able to place those VMs that need the larger contiguous bank of memory in the proper CPU node. A cluster with many 48-64GB VMs in such a configuration would likely need one or more DRS anti-affinity rules to prohibit co-placement across the cluster. However, combined with ESX&#8217;s NUMA scheduler, the &#8220;optimal&#8221; balance of performance should be achieved automatically&#8230;</p>
<p>Like you said, ESX servers are not black boxes, and understanding the system architecture is key in extracting performance and economies of scale. As multi-node, multi-hop systems come on-line (AMD Magny-Cours, Intel&#8217;s Nehalem-EX, etc.) in 2010, ESX admins will explore intentionally non-balanced &#8220;super node&#8221; memory configurations to enable placement of &#8220;outlier&#8221; VMs without wrecking CAPEX models. Do you happen to know how Xen/Hyper-V would handle the same NUMA situation?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
