frankdenneman.nl


Database workload characteristics and their impact on storage architecture design – part 1

September 23, 2014 by frankdenneman

Frequently PernixData FVP is used to accelerate databases. For many, databases are a black box: we all know they consume resources like there is no tomorrow, but can we make some general statements about database resource consumption from a storage technology perspective? I asked Bala Narasimhan, our Director of Products, a couple of questions to get a better understanding of database operations and how FVP can help provide the performance the business needs.
The reason why I asked Bala about databases is his rich background in database technology. After spending some time at HP writing kernel memory management software, he moved to Oracle, where he was responsible for SGA and PGA memory management. One of his proudest achievements was building the automatic memory management feature in Oracle 10g. He then went on to work at a startup where he rewrote the open source database Postgres to be a scale-out, columnar relational database for data warehousing and analytics. Bala recently recorded a webinar on eliminating performance bottlenecks in virtualized databases. Bala’s Twitter account can be found here. As databases are an extensive topic, this article is split up into a series of smaller articles, making it more digestible.

Question 1: What are the various database use cases one typically sees?

There is a spectrum of use cases, with OLTP, reporting, OLAP and analytics being the common ones. Reporting, OLAP (online analytical processing) and analytics can be seen as part of the data warehousing family. OLTP (online transaction processing) databases are typically aligned with a single application and act as an input source for data warehouses. A data warehouse can therefore be seen as a layer on top of the OLTP databases, optimized for reporting and analytics.
When setting up architectures for databases you have to ask yourself: what am I trying to solve? What is the technical requirement of the workload? Do you need to retrieve single records quickly, or do you want to read a lot of data as fast as possible? In other words, is the application latency sensitive or throughput bound? Going from left to right in the spectrum below, the average block size grows. Hint: a larger average block size means you are dealing with a more throughput-bound workload instead of a latency-sensitive one. From left to right, the database design also goes from normalized to denormalized.

OLTP → Reporting → OLAP → Analytics
Database Schema Design

OLTP is an excellent example of a normalized schema. A database schema can be seen as a container of objects; it allows you to logically group objects such as tables, views and stored procedures. When using a normalized schema, you split a table into smaller tables. For example, let’s assume a bank database has only one table that logs all activities by all its customers. This means there are multiple rows in this table for each customer. Now if a customer updates her address, you need to update many rows in the database for the database to be consistent. This can impact the performance and concurrency of the database. Instead, you could build out a schema for the database such that there are multiple tables and only one table holds the customer details. This way, when the customer changes her address you only need to update one row in that table, which improves concurrency and performance. If you normalize your database enough, every insert, delete and update statement will only hit a single table: very small updates that require fast responses, therefore small blocks and very latency-sensitive I/O.
While OLTP databases tend to be normalized, data warehouses tend to be denormalized and therefore have a smaller number of tables. For example, querying the database to find out who owns account 1234 requires joining two tables: the Account table with the Customer table. In this example it is a two-way join, but it is common for data warehousing systems to do many-way joins (that is, joining multiple tables at once), and these are generally throughput bound.
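To make this concrete, here is a minimal sketch in Python with SQLite. The schema, table and column names are hypothetical, invented purely for illustration:

import sqlite3

con = sqlite3.connect(":memory:")

# Normalized schema: customer details live in exactly one table,
# and accounts reference the customer by key.
con.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT,
        address     TEXT
    );
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        balance     REAL
    );
""")
con.execute("INSERT INTO customer VALUES (1, 'Alice', '1 Old Street')")
con.execute("INSERT INTO account VALUES (1234, 1, 500.00)")

# Normalized: an address change touches exactly one row in one table.
con.execute("UPDATE customer SET address = '2 New Street' WHERE customer_id = 1")

# Warehouse-style question "who owns account 1234?" needs a two-way join.
owner = con.execute("""
    SELECT c.name
    FROM account a
    JOIN customer c ON a.customer_id = c.customer_id
    WHERE a.account_id = 1234
""").fetchone()
print(owner[0])  # Alice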

Business Processes

An interesting way to look at a database is its place in a business process. This gives you insight into the availability, concurrency and response requirements of the database. Typically OLTP databases are at the front of the process, in customer-facing processes; dramatically put, they are in the line of fire. You want fast responses: you want to read, insert and update data as fast as possible, and therefore the databases are heavily normalized for the reasons described above. When the OLTP database is performing slowly or is unavailable, it will typically impact revenue-generating processes. Data warehousing operations generally occur away from customer-facing operations. Data is typically loaded into the data warehouse from multiple sources to give the business insight into its day-to-day operations. For example, a business may want to understand from its data how it can drive quality and cost improvements. While we talk about a data warehouse as a single entity, this is seldom the case. Many times you will find that a business has one large data warehouse and many so-called ‘data marts’ that hang off it. Database proliferation is a real problem in the enterprise, and managing all these databases and providing them the storage performance they need can be challenging.
Let’s dive into the four database types to understand their requirements and the impact on architecture design:

OLTP

OLTP workloads have a good mix of read and write operations. They are latency sensitive and require support for high levels of concurrency. A good example of concurrency is ATMs. Each customer at an ATM generates a connection executing a few simple statements, but a bank typically has a lot of ATMs servicing many customers concurrently. If a customer wants to withdraw money, the process needs to read the customer’s records in the database, confirm that he or she is allowed to withdraw the money, and then record (write) the transaction. In DBA jargon, that is a SQL SELECT statement followed by an UPDATE statement. A proper OLTP database should be able to handle a lot of users at the same time, preferably with low latency. It is interactive in nature, meaning that latency impacts user experience: you cannot keep the customer waiting long at the ATM or at a bank teller. From an availability perspective you cannot afford to have the database go down; connections cannot be lost; it needs to be up and running all the time (24×7).
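As a minimal sketch of that SELECT-then-UPDATE pattern, reusing the hypothetical SQLite schema from the schema design example above (a real OLTP engine adds locking and durability guarantees this toy skips):

def withdraw(con, account_id, amount):
    # One ATM interaction: a SELECT to check the balance, followed by an
    # UPDATE to record the withdrawal, executed as one short transaction.
    with con:  # sqlite3 commits on success, rolls back on an exception
        (balance,) = con.execute(
            "SELECT balance FROM account WHERE account_id = ?",
            (account_id,),
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        con.execute(
            "UPDATE account SET balance = balance - ? WHERE account_id = ?",
            (amount, account_id),
        )

withdraw(con, 1234, 100.00)  # one small read plus one small write, latency sensitive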

                     OLTP     Reporting  OLAP  Analytics
Availability         +++
Concurrency          +++
Latency sensitivity  +++
Throughput oriented  +
Ad hoc               +
I/O Operations       Mix R/W
Reporting

Reporting databases experience predominantly read-intensive operations and require throughput more than anything else. Concurrency and availability are not as important for reporting databases as they are for OLTP. Characteristically, the workload consists of repeated reads of data. Reporting is usually done when users want to understand the performance of the business: how many accounts were opened this week, how many accounts were closed, is the private banking account team hitting its quota of acquiring new customers? Think of reporting as predictable requests: the user knows what data he wants to see and has a specific report design that structures the data in the order needed to understand these numbers. This means the report is repetitive, which allows the DBA to design and optimize the database and schema so that the query executes predictably and efficiently. Typical database schema designs for reporting include the star schema and the snowflake schema.
As it serves back-office processes, availability and concurrency are not strict requirements for this kind of database, as long as the database is available when the report is required. Enhanced throughput helps tremendously.
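As a sketch of the star schema idea, with hypothetical, invented table names: a central fact table surrounded by dimension tables, so the weekly report always reduces to the same predictable join-and-aggregate pattern:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_date   (date_id INTEGER PRIMARY KEY, week INTEGER, year INTEGER);
    CREATE TABLE dim_branch (branch_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_account_openings (
        date_id         INTEGER REFERENCES dim_date(date_id),
        branch_id       INTEGER REFERENCES dim_branch(branch_id),
        accounts_opened INTEGER
    );
""")

# The report is always the same shape: scan the fact table, join the
# dimensions, aggregate. Because it is predictable, the DBA can optimize
# for it, and throughput (large sequential reads) dominates the I/O profile.
report = con.execute("""
    SELECT d.week, b.region, SUM(f.accounts_opened)
    FROM fact_account_openings f
    JOIN dim_date d   ON f.date_id   = d.date_id
    JOIN dim_branch b ON f.branch_id = b.branch_id
    WHERE d.year = 2014
    GROUP BY d.week, b.region
""").fetchall()  # empty here; a real warehouse scans millions of fact rows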

                     OLTP     Reporting       OLAP  Analytics
Availability         +++      +
Concurrency          +++      +
Latency sensitivity  +++      +
Throughput oriented  +        +++
Ad hoc               +        +
I/O Operations       Mix R/W  Read Intensive
OLAP

OLAP can be seen as the analytical counterpart of OLTP. Where OLTP is the original source of data, OLAP is the consolidation of data, typically originating from various OLTP databases. A common remark in the database world is that OLAP provides a multi-dimensional view, meaning that you drill down into the data coming from various sources and analyze it across different attributes. This workload is more ad hoc in nature than reporting, as you slice and dice the data in different ways depending on the nature of the query. The workload is primarily read intensive and can run complex queries involving aggregations across multiple databases; it is therefore throughput oriented. An example of an OLAP query would be the number of additional insurance services that gold credit card customers signed up for during the summer months.

                     OLTP     Reporting       OLAP            Analytics
Availability         +++      +               +
Concurrency          +++      +               +
Latency sensitivity  +++      +               ++
Throughput oriented  +        +++             +++
Ad hoc               +        +               ++
I/O Operations       Mix R/W  Read Intensive  Read Intensive
Analytics

Analytical workloads are truly ad hoc in nature. Whereas reporting aims to provide perspective on the numbers being presented, analytics provides insight into why the numbers are what they are. Reporting tells you how many new accounts were acquired by the private banking account team; analytics aims to explain why the team did not hit its quota in the last quarter. Analytics can query multiple databases and can involve multi-step processes. Typically, analytic queries write out large temporary results, potentially generating large intermediate results before slicing and dicing the temporary data again. This means the data needs to be written out as fast as possible, and it is read again by the next query, so read performance is crucial as well. The output of one query is the input of the next, and this can happen multiple times, requiring both fast read and write performance; otherwise your query will slow down dramatically.
Another problem is the sort process: for example, you retrieve data that needs to be sorted, but the dataset is so large that you cannot hold everything in memory during the sort, resulting in spilling data to disk.
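Here is a minimal sketch of that spill-to-disk pattern, a generic external merge sort in Python. It is not how any specific database implements it, but it shows why such a sort generates heavy disk writes followed by heavy reads:

import heapq
import tempfile

def spill(sorted_chunk):
    # Write one sorted run to a temporary file on disk.
    f = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
    f.writelines(f"{v}\n" for v in sorted_chunk)
    f.close()
    return f.name

def external_sort(values, chunk_size=100_000):
    # Phase 1 (write heavy): sort chunks that fit in memory, spill each to disk.
    runs, chunk = [], []
    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            runs.append(spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(spill(sorted(chunk)))
    # Phase 2 (read heavy): k-way merge of the sorted runs read back from disk.
    files = [open(r) for r in runs]
    try:
        for line in heapq.merge(*files, key=float):
            yield float(line)
    finally:
        for f in files:
            f.close()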
Because analytics queries can be truly ad hoc in nature, it is difficult to design an efficient schema for them upfront. This makes analytics an especially difficult use case from a performance perspective.

                     OLTP     Reporting       OLAP            Analytics
Availability         +++      +               +               +
Concurrency          +++      +               +               +
Latency sensitivity  +++      +               ++              +++
Throughput oriented  +        +++             +++             +++
Ad hoc               +        +               ++              +++
I/O Operations       Mix R/W  Read Intensive  Read Intensive  Mix R/W
Designing and testing your storage architecture in line with the database workload

With a better grasp of the storage performance requirements of each specific database, you can design your environment to suit its needs. Understanding these requirements also helps you test the infrastructure with a focus on the expected workload.
Instead of running “your average DB workload” in Iometer, you can test toward latency-oriented or throughput-oriented workloads once you know which type of database will be used. The next article in this series dives into understanding whether tuning databases or storage architectures can solve performance problems.
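To illustrate the two profiles, here is a crude, hypothetical Python probe; page caches will flatter the numbers, so treat it purely as a sketch and do real validation with a dedicated tool such as Iometer or fio:

import os
import random
import time

def probe(path, block_size, iterations=1000, random_io=True):
    # OLTP-like test: small blocks, random access, judge by average latency.
    # Warehouse-like test: large blocks, sequential access, judge by MB/s.
    size = os.path.getsize(path)
    latencies = []
    offset = 0
    with open(path, "rb", buffering=0) as f:
        for _ in range(iterations):
            if random_io:
                offset = random.randrange(0, size - block_size)
            elif offset + block_size > size:
                offset = 0  # wrap around for the sequential pass
            f.seek(offset)
            start = time.perf_counter()
            f.read(block_size)
            latencies.append(time.perf_counter() - start)
            offset += block_size
    total = sum(latencies)
    print(f"avg latency {total / iterations * 1000:.3f} ms, "
          f"throughput {block_size * iterations / total / 2**20:.1f} MB/s")

# probe("testfile.bin", block_size=8 * 1024, random_io=True)     # latency profile
# probe("testfile.bin", block_size=512 * 1024, random_io=False)  # throughput profile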

Other parts of this series

Part 2 – Data pipelines
Part 3 – Ancillary structures for tuning databases

Filed Under: Miscellaneous

Improve public speaking by reading a book?

September 8, 2014 by frankdenneman

Although it sounds like an oxymoron, I do have the feeling that books on this topic can help you become a better public speaker, or, as a matter of fact, more skillful at driving home your message.

After our talk at VMworld, a lot of friends complimented me not only on the talk itself but also on the improvements I’ve made in public speaking. My first public speaking engagement was VMworld 2010 in Vegas, 8 o’clock Monday morning, in front of 1,200 people. Talk about a challenge! Since then I have been slowly improving my skills. Last year I did more talks than in the previous three years combined. Although Malcolm Gladwell’s 10,000-hour rule is heavily debated nowadays, I do believe that practice is by far the best way to improve a skill. Getting 10,000 hours of public speaking time is rather a challenge in itself, and just going through the motions alone would be very inefficient. To maximize efficiency I started to dive into the theory behind public speaking, or more broadly, the theory of communicating. Over the years I read a decent stack of books, but these four stood out the most.

1: Confessions of a Public Speaker by Scott Berkun
Funny and highly practical. If you want to buy only one book, this should be it. The book helps you with the act of public speaking: how to deal with stage fright, how to work a tough room, and what you need to take care of to make your talk go smoothly.

2: Made to Stick by Chip and Dan Heath
This book helps you structure the message you want to convey. It helps you dive into the core of your message and communicate it in a memorable way. It’s a great read, with lots of interesting stories, and it’s one of those books you should read multiple times to keep refining your skill set.

3: Talk Like TED by Carmine Gallo
To some extent a combination of the first two books. The interesting part is its focus on the listener’s experience and their capacity to stay focused for 18 minutes. In addition, it gives you insight into some of the greatest TED talks.

4: Pitch Perfect by Bill McGowan and Alisa Bowman
This book helps you enhance your communication skills. It dives deeper into verbal and non-verbal language. It helps you become cognizant of mistakes everyone makes, yet which can be avoided quite easily. The book helps you drive your point home in a more confident, persuasive manner.

The beauty of these books is that you can use them and learn from them even if you are not a public speaker. In everyday life we all need to communicate; we all want our ideas to be heard and, ideally, to get buy-in from others. I believe these books will help you achieve this. If you have found other books useful and interesting, please leave a comment.

Filed Under: Miscellaneous

Virtual machines versus containers: who will win?

August 21, 2014 by frankdenneman

Ah, round X in the battle of who will win: which technology will prevail, and when will the displacement happen? Can we stop with this nonsense, this everlasting tug-of-war mimicking a schoolyard battle? And I can’t wait to hear these conversations at VMworld.
In reality there aren’t that many technologies that completely displaced a prevailing technology. We all remember the birth of the CD and the promise of revolutionising music carriers. In a large way it did, yet there are still many people who prefer to listen to vinyl and experience the subtle sounds of the medium, which give it more warmth and character. The only example I can think of where the dominant technology was truly displaced is the video disc (DVD and Blu-ray) rendering video tape (VHS/Betamax) completely obsolete. There isn’t anybody (well, let’s limit ourselves to the subset of sane people) who prefers a good old VHS tape over a Blu-ray disc. The dialog of “Nah, let’s leave the Blu-ray for what it is and pop in the VHS tape, because I like that blocky, grainy experience” will not happen very often, I expect. So in reality, most technologies coexist.
Fast forward to today. Docker’s popularity put Linux containers on the map for the majority of the IT population. A lot of people are talking about them and see the merits of leveraging a container instead of a virtual machine. To me the choice seems to stem from the layer at which you present and manage your services. If your application is designed to provide high availability and scalability itself, then a container may be the best fit. If your application isn’t, then place it in a virtual machine and leverage the services provided by the virtual infrastructure. Sure, there are many other requirements and constraints to incorporate into your decision tree, but I believe the service availability argument should be one of the first steps.
The next step is: where do you want to run your container environment? If you are a VMware shop, are you going to invest time and money to expand your IT services with containers, or are you going to leverage an online PaaS provider? Introducing an apps-centric solution into an organisation that has years of experience managing infrastructure-centric platforms might require a shift of perspective.
Just my two cents.

Filed Under: Miscellaneous

Disable vMotion for a single VM

August 18, 2014 by frankdenneman

This question pops up regularly on the VMTN forums and Reddit. It’s a valid question, but the admins who request this feature usually don’t want to break maintenance mode or any other feature that helps them manage large-scale environments. When you drill down, you discover that they only want to prevent a manual vMotion triggered by an administrator.
Instead of configuring complex DRS rules, connecting the VM to a unique portgroup, or using bus-sharing configurations, you just have to add an extra permission to the VM.
The key is all about context and permission structures. When executing maintenance mode, the move of a virtual machine is done under a different context (System) than when the VM is manually migrated by the administrator. And because a permission defined directly on an object overrides permissions propagated from parent objects, you can still execute a maintenance mode operation on a host while a given user is unable to migrate a specific VM.
Here is how you disable vMotion for a single VM via the Webclient:
Step 1: Add a new role; let’s call it No-vMotion

  1. Log in as a vCenter administrator
  2. Go to the home screen
  3. Select Roles in the Administration screen
  4. Select Create Role Action (Green plus icon)
  5. Add Role name (No-vMotion)
  6. Select All Privileges
  7. Scroll down to Resource
  8. Deselect the following Privileges:
  • Migrate powered off virtual machine
  • Migrate powered on virtual machine
  • Query vMotion

Step 2: Restrict the user’s privileges on the VM

  1. Select the “Hosts and Clusters” or “VMs and Templates” view, whichever you feel comfortable with.
  2. Select the VM and click the Manage tab.
  3. Select Permissions.
  4. Click “Add Permission” (green plus icon).
  5. Click Add and select the user or group you want to restrict.
  6. In my example I selected the user FrankD and clicked Add, then OK.
  7. On the right side of the screen, in the pulldown menu, select the role “No-vMotion” and click OK.

Ensure that the role is applied to This object.
FrankD is a member of the vCenterAdmins group, which has Administrator privileges propagated through the virtual datacenter and all its children. However, FrankD has an additional role on this object: No-vMotion. Let’s check whether it works. Log in with the user ID you restricted and right-click the VM. The option Migrate is greyed out. The VM is running on host ESX01.
The option Maintenance Mode is still available for host ESX01. Click “More Tasks” in the Recent Tasks window; there you can verify that FrankD was the initiator of the maintenance mode operation, and that System migrated the virtual machine.
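If you prefer to script these two steps, below is a hedged pyVmomi sketch. The privilege IDs, inventory path and connection details are assumptions I have not verified against every vCenter version, so check them against your environment before use:

from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="***")  # hypothetical connection details
authz = si.content.authorizationManager

# Step 1: create the No-vMotion role with every privilege except the three
# migrate-related ones. The privilege IDs below are assumed to map to the
# privileges deselected in the Web Client; verify them on your vCenter.
blocked = {"Resource.HotMigrate", "Resource.ColdMigrate", "Resource.QueryVMotion"}
priv_ids = [p.privId for p in authz.privilegeList if p.privId not in blocked]
role_id = authz.AddAuthorizationRole(name="No-vMotion", privIds=priv_ids)

# Step 2: apply the role to a single VM for a single user, without
# propagation, so it only overrides the inherited role on this object.
vm = si.content.searchIndex.FindByInventoryPath("Datacenter/vm/MyVM")  # hypothetical path
perm = vim.AuthorizationManager.Permission(
    principal="DOMAIN\\FrankD", group=False, roleId=role_id, propagate=False)
authz.SetEntityPermissions(entity=vm, permission=[perm])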

Filed Under: vMotion

Platform 9 – transform your virtual infrastructure into a private cloud within seconds

August 12, 2014 by frankdenneman

Recently I had the joy of reconnecting with some of my old VMware colleagues and learned that their new startup was coming out of stealth. Today Platform 9 announced their SaaS platform.
In short, Platform 9 allows IT organisations to transform their local IT infrastructure into a self-service private cloud. The beauty of this product is that it can be implemented on existing infrastructure; there is no need to build a new infrastructure to introduce a private cloud within your organisation. Just install the agent on your hypervisor layer, connect it to the Platform 9 cloud management platform, and you are off into the world of private clouds. The ease of integration is amazing, and I believe Platform 9 will be an accelerator of private cloud adoption. No need to go to AWS, no migration to Azure: you manage your own resources while allowing your customers to provision their own virtual machines or containers. Today Platform 9 supports KVM, but they will support both VMware and Docker environments soon.
I could dive into the details of Platform 9, but Eric Wright has done a tremendous job publishing an extensive write-up, and I recommend reading his article to learn more about the Platform 9 private cloud offering. If you want to meet the Platform 9 team and hear their vision, visit booth #324 at the Solutions Exchange at VMworld 2014.

Filed Under: Miscellaneous

