
Home Lab Fundamentals: DNS Reverse Lookup Zones

June 13, 2016 by frankdenneman

When starting your home lab, all hints and tips are welcome. The community is full of wisdom, yet certain topics are sometimes taken for granted or perceived as common knowledge. The Home Lab Fundamentals series focuses on these subjects, helping you avoid common pitfalls that cause headaches and waste incredible amounts of time.
One thing we keep learning about vSphere is that both time and DNS need to be correct. DNS resolution is important to many vSphere components. You can go a long way using only IP addresses within your lab, but at some point you will experience weird behavior, or installs will just stop without any clear explanation. In reality, vSphere is built for professional environments where a proper networking structure, physical and logical, is expected to be in place.

When reviewing community questions, blog posts, and tweets, it appears that DNS is often only partially set up, i.e. only forward lookup zones are configured. And although that appears to be just enough DNS to get things going, many have experienced that their labs start to behave differently when no reverse lookup zones are present. Time-outs and delays become more frequent, and the whole environment stops feeling snappy. Ill-configured DNS might give you the idea that the software is bad, when in reality it's the environment that is badly configured.

When using DNS, follow the four golden rules: forward, reverse, short, and full. DNS in a lab environment isn't difficult to set up, and if you want to simulate a properly working vSphere environment, invest time in setting up a DNS structure. It's worth it! Besides expanding your knowledge, your systems will feel more robust and, believe me, you will spend a lot less time waiting on systems to respond.
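A quick way to verify all four rules from a management machine is a short resolution check. The sketch below uses only the Python standard library; the hostnames and IP addresses are hypothetical placeholders for your own lab entries, and the short-name lookup assumes your resolver has a matching search domain configured.

import socket

# Hypothetical lab entries: FQDN -> expected IP address.
HOSTS = {
    "vcsa.homelab.local": "192.168.1.10",
    "esxi01.homelab.local": "192.168.1.11",
}

for fqdn, expected_ip in HOSTS.items():
    short_name = fqdn.split(".")[0]
    try:
        ip = socket.gethostbyname(fqdn)              # forward: FQDN -> IP
        ptr_name = socket.gethostbyaddr(ip)[0]       # reverse: IP -> FQDN (needs a PTR record)
        short_ip = socket.gethostbyname(short_name)  # short name, via the resolver's search domain
    except (socket.gaierror, socket.herror) as err:
        print(f"{fqdn}: lookup FAILED ({err})")
        continue
    ok = ip == expected_ip == short_ip and ptr_name == fqdn
    print(f"{fqdn}: forward={ip} reverse={ptr_name} short={short_ip} {'OK' if ok else 'CHECK'}")

If the reverse step raises an error or returns a different name, you are missing exactly the PTR records this article is about.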
vCenter and DNS
vCenter inventory and search rely heavily on DNS. And since the introduction of the vCenter Single Sign-On service (SSO) as part of the vCenter Server management infrastructure, DNS has become a crucial element. SSO is an authentication broker and security token exchange infrastructure. As described in the KB article Upgrading to vCenter Server 5.5 best practices (2053132):

With vCenter Single Sign-On, local operating system users become far less important than the users in a directory service such as Active Directory. As a result, it is not always possible, or even desirable, to keep local operating system users as authenticated users.

This means that you are somewhat pressured into using an 'external' identity source for user authentication, even in your lab environment. One of the most popular configurations is the use of Active Directory as an identity source. Active Directory itself uses DNS as the location mechanism for domain controllers and services. If you have configured SSO to use Microsoft Active Directory for authentication, you might have seen some weird behavior if you haven't created a reverse DNS lookup zone.

Installation of vCenter Server (Appliance) fails if the FQDN and IP addresses used are not resolvable by the DNS server specified during the deployment process. The vSphere DNS requirements in the vSphere 6.0 Documentation Center state the following:

Ensure that DNS reverse lookup returns a Fully Qualified Domain Name (FQDN) when queried with the IP address of the host machine on which vCenter Server is installed. When you install or upgrade vCenter Server, the installation or upgrade of the Web server component that supports the vSphere Web Client fails if the installer cannot look up the fully qualified domain name of the vCenter Server host machine from its IP address. Reverse lookup is implemented using PTR records.
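To illustrate what the installer expects, here is a minimal pair of BIND-style records tying an appliance's FQDN and IP address together in both directions. The domain homelab.local and the address 192.168.1.10 are hypothetical examples, not values from this article; on a Windows DNS server you would create the equivalent A and PTR records through the DNS Manager console.

; forward lookup zone: homelab.local
vcsa    IN  A    192.168.1.10

; reverse lookup zone: 1.168.192.in-addr.arpa
10      IN  PTR  vcsa.homelab.local.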

Before deploying vCenter, I recommend deploying a virtual machine running a DNS server on the first host. The ESXi Embedded Host Client allows you to deploy a virtual machine on an ESXi host without the need for an operational vCenter first. As I use Active Directory as the identity source for authentication, I deploy a Windows AD server with DNS before deploying the vCenter Server Appliance (VCSA). Tom's IT Pro has a great article on how to configure DNS on a Windows 2012 server, but if you want to configure a lightweight DNS server running on Linux, follow the steps Brandon Lee has documented. If you want to explore the interesting world of DNS, you can also opt to use Dynamic DNS to automatically register both the VCSA and the ESXi hosts in the DNS server. Dynamic DNS registration is the process by which a DHCP client registers its DNS records with a name server. For more information, please check out William Lam's article "Does ESXi Support DDNS (Dynamic DNS)?". Although he published it in 2013, it's still a valid configuration in ESXi 6.0.
Flexibility of using DNS
Interestingly enough, having a proper DNS structure in place before deploying the virtual infrastructure also provides future flexibility. One of the more annoying time wasters is the result of using an IP address instead of a Fully Qualified Domain Name (FQDN) during setup of the VCSA. When you use only an IP address during setup, changing the hostname or IP address later will produce this error:
IPv4 configuration for nic0 of this node cannot be edited post deployment.
KB article 2124422 states the following:

Attempting to change the IP address of the VMware vCenter Server Appliance 6.0 fails with the error: IPv4 configuration for nic0 of this node cannot be edited post deployment. (2124422)

This occurs when the VMware vCenter Server Appliance 6.0 is deployed using an IP address. During the initial configuration of the VMware vCenter Server Appliance, the system name is used as the Primary Network Identifier. If the Primary Network Identifier is an IP address, it cannot be changed after deployment.
This is an expected behavior of the VMware vCenter Server Appliance 6.0. To change the IP address for the VMware vCenter Server Appliance 6.0 that was deployed using an IP address, not a Fully Qualified Domain Name, you must redeploy the appliance with the new IP address information.

Changing the hostname will cause the Platform Services Controller (responsible for SSO) to fail. According to KB article 2130599:

Changing the IP address or host name of the vCenter Server or Platform Service controller cause services to fail (2130599)

Changing the Primary Network Identifier (PNID) of the vCenter Server or PSC is currently not supported and will cause the vSphere services to fail to start. If the vCenter Server or PSC has been deployed with an FQDN or IP as the PNID, you will not be able to change this configuration.
To resolve this issue, use one of these options:

  • Revert to a snapshot or backup prior to the IP address or hostname change.
  • Redeploy the vSphere environment.

This means that you cannot change the IP address or the hostname of the vCenter Server Appliance after deployment. Yet another reason to set up a proper DNS structure before deploying the VCSA in your lab.
FQDN and vCenter permissions
Even if you have managed to install vCenter without a reverse lookup zone, the absence of DNS pointer records can obstruct proper permission configuration, according to KB article 2127213:

Unable to add Active Directory users or groups to vCenter Server Appliance or vRealize Automation permissions 

Attempting to browse and add users to the vCenter Server permissions (Local Permissions: Hosts and Clusters > vCenter > Manage > Permissions; Global Permissions: Administration > Global Permissions) fails with the error:

Cannot load the users for the selected domain

A workaround for this issue is to ensure that all DNS servers have the Reverse Lookup Zone configured and Active Directory Domain Controller (AD DC) Pointer (PTR) records present. Please note that allowing domain authentication (assuming AD) on the ESXi host does not automatically add it to an AD-managed DNS zone. You'll need to create the forward lookup record manually (which also offers the option to create the matching reverse lookup record).
SSH session password delay
When running multiple hosts, most of you will recognize the time wasted when (quickly) wanting to log in to ESXi via an SSH session. Typically this happens when you start a test and want to monitor ESXTOP output. You start your SSH session and, to save time, type ssh root@esxi.homelab.com on the command line, and then you have to wait more than 30 seconds to get a password prompt back. Especially funny when you are chasing a VM and DRS decided to move it to another server while you weren't paying attention. To get rid of this annoying time waster forever:

DNS name resolution using nslookup takes up to 40 seconds on an ESXi host (KB article 2070192)

When you do not have a reverse lookup zone configured, you may experience a delay of several seconds when logging in to hosts via SSH.

When your management machine is not using the same DNS structure, you can apply the quick hack of adding "UseDNS no" to the /etc/ssh/sshd_config file on the ESXi host to avoid the 30-second password delay.
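For reference, a sketch of that workaround from the ESXi shell (SSH or the local shell must be enabled first; the restart command below is how the SSH service is typically bounced on ESXi, so verify it on your build):

# append to /etc/ssh/sshd_config on the ESXi host
UseDNS no

# then restart the SSH service for the change to take effect
/etc/init.d/SSH restart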
Troubleshoot DNS
BuildVirtual.net published an excellent article on how to troubleshoot ESXi host DNS and routing related issues. For more information about setting the DNS configuration from the command line, review this section of the VMware vSphere 6.0 Documentation Center.
vSphere components moving away from DNS
As DNS is an extra dependency, a lot of newer technologies try to avoid incorporating DNS dependencies. One of those is VMware HA. HA has been redesigned, and the new FDM architecture avoids DNS dependencies. Unfortunately, not all official VMware documentation has been updated to reflect this: https://kb.vmware.com/kb/1003735 states that ESX 5.x also has this problem, but that is not true. Simply put, VMware HA in vSphere 5.x and above does not depend on DNS for operations or configuration.
Home Lab Fundamentals Series:

  • Time Sync
  • DNS Reverse Lookup Zones

Up next in this series: vSwitch0 routing


Insights into VM density

February 15, 2016 by frankdenneman

For the last three months my main focus within PernixData has been (and still is) the PernixCloud program. In short, PernixData Cloud is the next logical progression of PernixData Architect and provides visibility into and analytics of virtual datacenters, their infrastructure, and their applications. By providing facts on the various elements of the virtual infrastructure, architects and administrators can design their environment in a data-driven way.
The previous article "Insights into CPU and Memory configurations of ESXi hosts" zoomed in on the compute configuration of 8,000 ESXi hosts and helped us understand which is the most popular system in today's datacenter running VMware vSphere. The obvious next step was to determine the average number of virtual machines on these systems. Since that time the dataset has expanded, and it now contains data on more than 25,000 ESXi hosts.
An incredible dataset to explore, I can tell you, and it's growing each day. Learning how to deal with these vast quantities of data is incredibly interesting. Extracting various metrics from a dataset this big is challenging. Most commercial tools are not designed to cope with this amount of data, so you have to custom-build everything. And with the dataset growing at a rapid pace, you are constantly exploring the boundaries of what's possible with software and hardware.
VM Density
After learning which system configurations are popular, you immediately wonder how many virtual machines are running on those systems. But what level of detail do you want to know? Will this info be usable for architects and administrators to compare their systems, and practical enough to help them design their new datacenter?
One of the most sought-after metrics is the virtual CPU to physical CPU ratio. A very interesting one, but unfortunately, to get a result that is actually meaningful you have to take multiple metrics into account. Sure, you can map out the vCPU to pCPU ratio, but how do you deal with the oversizing of virtual machines that has been happening since the birth of virtualization? What about all those countless discussions on whether the system needs one CPU or two because it's running a single-threaded program? How many times have you heard the remark that the vendor explicitly states that the software requires at least 8 CPUs? Therefore you need to add CPU utilization to get an accurate view, which in turn leads to the question of what timeframe you need to use to understand whether the VM is accurately sized or whether the vCPUs are just idling most of the time. You are now mixing static data (inventory) and transient data (utilization). The same story applies to memory.
Consequently, I focused just on the density of virtual machines per host. The whole premise of virtualization is to exploit the variation in activity of applications; combined with distribution mechanisms such as DRS and VMTurbo, you can argue that virtual and physical compute configurations will be matched properly. Therefore it's interesting to see how far datacenters stretch their systems and understand the consolidation ratio of virtual machines. Can we determine a sweet spot for the number of virtual machines per host?
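To make the distinction concrete, here is a small sketch of the naive vCPU:pCPU ratio computation described above; the host records are hypothetical, and the comments note why inventory data alone is not enough:

# Naive vCPU:pCPU ratio per host, from hypothetical inventory records.
# The ratio alone says nothing about oversizing: an idle 8-vCPU VM
# inflates it just as much as a busy one, which is why utilization
# data over a representative timeframe would be needed as well.
hosts = [
    {"name": "esx01", "pcores": 16, "vcpus_assigned": 96},
    {"name": "esx02", "pcores": 20, "vcpus_assigned": 60},
]
for h in hosts:
    ratio = h["vcpus_assigned"] / h["pcores"]
    print(f'{h["name"]}: {ratio:.1f} vCPU per pCPU')

Counting VMs per host, by contrast, uses only static inventory data, which is what the rest of this article does.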
The numbers
As discovered earlier, dual-socket systems are the most popular system configuration in virtual datacenters, therefore I focused on these systems only. With the dataset now containing more than 25,000 ESXi hosts, it's interesting to see what the popular CPU types are.
[Chart: distribution of CPU types across dual-socket systems]
The popular systems contained 12, 16, 20, or 24 cores in total. Therefore the popular CPUs of today have 6, 8, 10, or 12 cores. But since we typically see a host as a "closed" system and trust the host-local CPU scheduler to distribute the vCPUs amongst the available pCPUs, all charts use the total cores per system instead of a per-CPU count. For example, a 16-core system is an ESXi host containing two 8-core CPUs.
Before selecting a subset of CPU configurations let’s determine the overall distribution of VM density.
[Chart: overall distribution of VM density per host]
Interesting to see that it's all across the board, with VM density ranging from 0-10 VMs per host up to more than 250. There were some outliers, but I haven't included them. One system runs over 580 VMs; this system contains 16 cores and 192 GB. Let's dissect the VM density per CPU configuration.
Dissecting it per CPU configuration
[Chart: VM density per CPU configuration]
Instead of focusing on all dual-socket CPU configurations, I narrowed it down to three popular ones: the 16-core configuration as it's the most popular today, and the 20- and 24-core configurations as I expect these to be the default choice for new systems this year. This allows us to compare the current systems in today's datacenters to the average numbers and helps you understand what VM density possible future systems will run.
Memory
Since host memory is an integral part of providing performance to virtual machines, it's only logical to determine VM density based on both CPU and memory configurations. What is the distribution of memory configurations of dual-socket systems in today's virtual datacenters?
[Chart: memory configuration distribution of dual-socket systems]
Memory config and VM density of 16 cores systems
30% of all 16-core ESXi hosts are equipped with 384 GB of memory. Within this configuration, 21 to 30 VMs is the most popular VM density.
[Charts: memory configuration and VM density of 16-core systems]
Memory config and VM density of 20 cores systems
50% of all 20-core ESXi hosts are equipped with 256 GB of memory. Within this configuration, 31 to 40 VMs is the most popular VM density.
[Charts: memory configuration and VM density of 20-core systems]
Interesting to see that these systems, on average, have to cope with less memory per core than the 16-core systems (12.8 GB per core versus 24 GB per core).
Memory config and VM density of 24 cores systems
39% of all 24-core ESXi hosts are equipped with 384 GB of memory. Within this configuration, 101 to 150 VMs is the most popular VM density.
[Charts: memory configuration and VM density of 24-core systems]
101 to 150 VMs sounds like VDI platform usage. Are these systems the sweet spot for virtual desktop environments?
Conclusion
Not only do we have actual data on VM density now, other interesting facts were discovered as well. When I was crunching these numbers, one thing that stood out to me was the memory configurations used. Most architects I speak with tend to configure hosts with as much memory as possible and swap out the systems when their financial lifespan has ended. However, I saw some interesting outliers, for example memory configurations of 104 GB or 136 GB per system. How do you even get 104 GB of memory in such a system? Did someone actually find a 4 GB DIMM lying around and decide to stick it in the system? More memory = better performance, right? Please read my Memory Deep Dive series on how this hurts your overall performance. But I digress.

Another interesting fact is that 4% of all 24-core systems in our database are equipped with 128 GB of memory. That is an average of 5.3 GB per core, or 64 GB per NUMA node. Which immediately raises further questions, such as the average host memory per VM or the VM density per NUMA node. The more we look at the data, the more questions arise. Please let me know what questions you have!
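For reference, a quick sketch of the arithmetic behind those last numbers, assuming one NUMA node per socket on a dual-socket host (the values are the 24-core / 128 GB example from the text):

# Memory per core and per NUMA node for a dual-socket host.
total_cores = 24
total_mem_gb = 128
sockets = 2  # dual-socket system; one NUMA node per socket assumed

mem_per_core = total_mem_gb / total_cores   # 128 / 24 ~= 5.3 GB
mem_per_numa_node = total_mem_gb / sockets  # 128 / 2 = 64 GB
print(f"{mem_per_core:.1f} GB per core, {mem_per_numa_node:.0f} GB per NUMA node")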


Don’t backup. Go forward with Rubrik

March 24, 2015 by frankdenneman

Rubrik has set out to build a time machine for cloud infrastructures. I like the message, as it shows that they are focused on bringing simplicity to the enterprise backup world. Last week I had the opportunity to catch up with them, and they had some great news to share, as they were planning to come out of stealth this week. And that day is today.
Rubrik platform
This time machine is delivered as a 2U commodity appliance that runs the Rubrik software. By installing this appliance, you greatly reduce the number of machines necessary to provide backup and restore services today. Reducing the number of machines simplifies the infrastructure for architects and support staff, while the Rubrik UI simplifies the day-to-day operations of administrators.
User Interface
No agents are needed in the virtual datacenter to discover the workload, and the user interface is centered on policy-driven SLAs. Unfortunately I can't show the user interface, but trust me, this is something you have longed for for a long time. Given the pedigree of the co-founders, it comes as no surprise that the Rubrik platform is fully programmable with REST APIs.
Typically, moving to a new backup system introduces risk and cost. Learning curves are steep, and misconfigured backups possibly risk data loss. Being policy-driven with the ability to use REST APIs ensures that the platform integrates easily into every environment. The policies are so easy to use that no training is necessary, which reduces the impact of transitioning to a new backup system. The low learning curve means that no countless hours are lost figuring out how to safely back up your data, while the REST APIs allow advanced tech crews to integrate Rubrik into their highly automated service offerings.
Architecture
One thing that made me very happy to see is Rubrik's ability to "cloud-out" your data. Rubrik provides a gateway to AWS, allowing you to send "aged data" to the cloud in a very secure way. This feature reduces the complexity of the local architecture. Instead of having to incorporate a tape library, you now only need an Internet connection. Having worked with big tape libraries myself for years, I know this will not only free up a lot of datacenter space and reduce your energy bill, you WILL have way less heat to cool.
As the team understands the concept of distributed architectures thoroughly (more about that in the next paragraph), it doesn't come as a surprise that the platform scales very well. The architecture can scale to thousands of nodes. What's interesting is that it can mount snapshots directly on the Rubrik platform, allowing virtual machines to run directly on the appliance. Think about the possibilities for development: snapshot your current production workload and test your new code instantly without any impact on active services.
Rubrik starts off by supporting VMware vSphere, and it makes sense for a startup to focus on the biggest market out there. But support for other hypervisors and cloud infrastructures (to cloud-out data) will follow.
I expect Rubrik to become a success; the product aligns with today's enterprise datacenter requirements, and the pedigree of the team is amazing. As mentioned before, the co-founders have a very rich background in distributed systems. Arvind Jain was the founding engineer of Riverbed and a Distinguished Engineer at Google before founding Rubrik with the other three members. Interestingly enough (for me at least), there are strong ties with PernixData. Prior to Rubrik, Bipul Sinha was a partner at LightSpeed. I had the great pleasure of talking to Bipul often, as he is the initial investor in PernixData. Funnily enough, I can recall a conversation where Bipul asked me for my view of the backup world. I believe "boring and totally not sexy" was my initial reply. Guess he is setting out to change that fast! The CTO of Rubrik, Arvind Nithrakashyap, worked at Oracle, where he co-founded Oracle Exadata. The other co-founder of Exadata is PernixData CEO Poojan Kumar. Last but certainly not least is Soham Mazumdar, who worked at Google on the search engine and founded Tagtile.
As of today you can sign up for the early access program. Go visit the website to read more, and follow them on Twitter.
Exciting times ahead for Rubrik! Don’t Backup. Go Forward!

