Know your workload environment
When looking at the workload from a performance angle, it is useful to note that there are broadly three types of performance environments. Each type is a bit different when it comes to monitoring performance, so I will discuss the problems of each of them.
The first one I dub “Webshop”: a database that has a very large
amount of tiny snappy SQLstatements. Think of geospacial databases,
webshops or queue applications such as Biztalk. In such environment
there are measures taken to control the performance. For example: the
snappy SQL’s have their executionplans frozen, statistics are not on
autoupdate and – in the case of the MSSQL types of databases – a rigor
testing on locking and blocking was done during the codewriting.
Due to these measures the performance stability is high, but a problem that can still arise in such system is a capacity
problem, so any sign of upcomming componentstress is a call to action
for the DBA to avoid a meltdown. Be aware of the difference between the
two: you can have a capacity problem without performance problems and
vise versa. Anyways, the motto of these systems is: Responsetime is king.
The second environment is its absolute counterpart, the Data Warehouse. The workload consists of chained batches which need to be processed as quickly as possible. Componentstress is not alarming, it is by design. But as long this workload of chained batches has a stable duration and consistently stays within the allocated timewindow there is little concern for performance tuning. Motto of these systems: throughput is king.
The third environment one I dub “OLTP”, which has a mix of the above workloads: long reports, small transactions and everything in between. Differentiate the workload in priorities is key here. This is sometimes taking the shape of SLA’s, formally or informally defined.
The Webshop type: a closer look
One of the problems of a webshop type is the monitoring overhead. Most RDMS systems I know can add an inhibitive footprint to the SQL statement, which can bring a systems to a halt, I noticed to my dismay. So how to trendwatch the performance if you can’t monitor the activity properly? I employed two different solutions here. In the first, I figure out a threshold value for monitoring. Any SQL above this threshold is captured and the majority of statements go under the radar. This has the disadvantage that, in case of an incident, the SQL statements will exceed the threshold en masse and skew the performance graph. But still it has for me the merit that I am aware of how many thresholds violations took place and therefor how stable the performance is. An alternative is to check if I can measure performance by proxy, that is, look at the resource consumption. After a while of investigating a system you may get a feel on how well the performance is controlled and therefor how rigid the relation is between performance and resource consumption. For example, you may end up using CPU consumption and Batches per Second (as the frequency).
The DWH: a closer look
Monitoring a DWH is a lot easier. On systems where there are longrunning processes there is often already a repository of runtimes to tap into. Even reporttools such Microsofts Reportserver have tables where you can retrieve the information for building a workloadtable. So check if there is such repository, as the information can more easily harvest here. For example, on a database you can capture the SQL statements, but an enduser is waiting for a report, which can consist of several statements. An application-level repository groups this nicely together. Also the batch and report parameters are stored, which is convenient for anomaly investigation.
In my experience, a performance trend on a DWH exposes non-scalable processes, such as using full loads instead of loading delta’s, delete * from a table instead of truncates etc.
The OLTP: a closer look
The OLTP type can be a mixed bag with longrunning reports, snappy SQLstatements and everything in between. What is special to this environment is that you need to divide the workload to priority more than in the other two environments. Let the workload reflect of what you want to know; waste – such as unimportant workload or workload at unimportant times – is not uncommon.
Now we have seen what this method of measuring TET can do, let’s turn to the technical details on how to gather data to create the workload tables, ranking reports and the trendlines. I will give every RDBMS its own post.