October 1995
Welcome Back
Last month, I described some basic performance rules that can be used to monitor the behavior of a computer system. I also made available a performance toolkit that includes an implementation of the rules, and a GUI front end called ruletool.se. We also held a competition to give ruletool.se a better name; the winning name will be announced in a future column. This month, I will take a closer look at the virtual_adrian.se script and explain how the SE performance toolkit works so you can build your own scripts.
Look Out, There's a Guru About
Sysadmin: What's wrong with this system? Can you tune it, please? Here's the root password.
virtual_adrian.se
Performance Tuner and Monitor
Check and Tune the System
I'm always asked for a magic bullet: the secret tweak to a kernel variable that will miraculously make the whole machine run much faster. I'm sorry, but tuning the kernel should be the last thing you try. It is very rare for this kind of change to make a difference that can be reliably measured at all. That said, there are a small number of common kernel tuning variables that I like to tune. I generally bring older releases in line with later ones, so most tweaks are for Solaris 2.3 and are not needed for Solaris 2.4 or 2.5.
Kernel variables are set in /etc/system. The problem is that these variables may be incorrectly set due to folklore. They may also be unnecessary or harmful in later releases, and the /etc/system file is often propagated intact to very different systems. Deciding when a tweak is useful and what the value should be for a particular release is complex. Your motto should be, 'If in doubt, do without!'
The maxusers tunable automatically scales as you add memory, so you don't need to set it unless you have several GB of RAM and are short on kernel memory. I implemented a check-and-tune routine in virtual_adrian.se that checks and tunes some values directly. It tells you what it does, and tells you to set variables in /etc/system for the next reboot if they cannot be set online. The check is performed only once (at start-up) and is implemented by a routine called static_check().
Note that the current value of each tunable is checked by my script. Please do not blindly set all these values in your own /etc/system file, as you may decrease something that you thought you were increasing.
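The "check before you set" idea can be sketched outside SE. The Python fragment below is only an illustration of the principle, not the static_check() routine: the function names are hypothetical, it handles only decimal "set name=value" lines (with an optional module prefix), and it assumes you already have the current values in hand. The 5000 figures are the ones quoted in this column for ufs_ninode and ncsize.

```python
import re

# Matches "set [module:]name=value"; comment lines in /etc/system start with "*".
SET_LINE = re.compile(r"^\s*set\s+(?:\w+:)?(\w+)\s*=\s*(\d+)\s*$")

def parse_system_file(text):
    """Return {name: value} for each decimal 'set' line in /etc/system-style text."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # skip comments
        match = SET_LINE.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings

def advise(current, minimums):
    """Return only the tunables whose known current value is below the minimum.

    Values that are already as big or bigger are left alone, so a
    suggestion can never decrease a setting.
    """
    advice = {}
    for name, minimum in minimums.items():
        value = current.get(name)
        if value is not None and value < minimum:
            advice[name] = minimum
    return advice

if __name__ == "__main__":
    sample = "* example settings\nset ncsize=8000\nset ufs:ufs_ninode=2000\n"
    current = parse_system_file(sample)
    print(advise(current, {"ncsize": 5000, "ufs_ninode": 5000}))
```

In this sample, ncsize is already above 5000 and is left untouched; only ufs_ninode is flagged.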
The tuning actions performed for Solaris 2.4 and 2.5 are:
Scale lotsfree as CPUs are added, to 128 pages per CPU. This has no effect on uniprocessor systems, which default to 128. A larger free list helps prevent transient kernel memory allocation failures. If the free list is at zero when the kernel needs memory and can't wait, kmem errors will be reported by the kmem error rule.
Increase ufs_ninode to 5000. Request that the DNLC size be increased to 5000 by adding set ncsize=5000 in /etc/system. (If the values are already this big, leave them alone.)
Watch fsflush. It works through memory on a cycle set by a kernel variable (autoup), and runs every five seconds (tune_t_fsflushr). Fsflush checks each page of memory in turn, so on systems with a lot of memory it can waste CPU time. If you are running a raw disk resident database, fsflush does not need to run often. In virtual_adrian.se, a configurable process monitor watches a single process and complains if it uses too much CPU. By default, process ID 3 (fsflush) is monitored, and a complaint occurs if it takes more than five percent of one CPU.
NFS round trip timers are shown by nfsstat -m on the NFS clients. Note that Solaris 2.5 supports NFS over TCP/IP; the timers are not needed and are reported as zero in this case. The timers can only be read when running as root, which is the main reason this check is not part of ruletool.se. I check that the overall smoothed round trip time is under 50ms (srtt for All:). This is similar in concept to checking that disk I/O service time is better than 50ms in the disk rule.
Figure 1 Example output from nfsstat -m
-------------------------------------------------------------------------
/home/username from server:/export/home3/username
 Flags:   vers=2,hard,intr,down,dynamic,rsize=8192,wsize=8192,retrans=5
 Lookups: srtt=7 (17ms), dev=4 (20ms), cur=2 (40ms)
 Reads:   srtt=16 (40ms), dev=8 (40ms), cur=6 (120ms)
 Writes:  srtt=19 (47ms), dev=3 (15ms), cur=6 (120ms)
 All:     srtt=15 (37ms), dev=8 (40ms), cur=5 (100ms)
-------------------------------------------------------------------------
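The 50ms check on the All: line can be illustrated with a few lines of Python. This is a sketch, not the code in virtual_adrian.se: the function name is hypothetical, and it assumes the nfsstat -m output has the layout shown in Figure 1, with a "mount from server:path" line preceding each set of timers.

```python
import re

MOUNT_LINE = re.compile(r"^(\S+) from (\S+)")
ALL_LINE = re.compile(r"All:\s+srtt=\d+\s+\((\d+)ms\)")

def slow_nfs_mounts(nfsstat_output, limit_ms=50):
    """Return (mount point, srtt in ms) for mounts whose All: srtt is not under the limit."""
    slow = []
    mount = None
    for line in nfsstat_output.splitlines():
        line = line.strip()
        found = MOUNT_LINE.match(line)
        if found:
            mount = found.group(1)  # remember which mount the timers belong to
            continue
        found = ALL_LINE.search(line)
        if found and mount is not None:
            srtt_ms = int(found.group(1))
            if srtt_ms >= limit_ms:
                slow.append((mount, srtt_ms))
    return slow

if __name__ == "__main__":
    figure_1 = (
        "/home/username from server:/export/home3/username\n"
        " Flags: vers=2,hard,intr,down,dynamic,rsize=8192,wsize=8192,retrans=5\n"
        " Lookups: srtt=7 (17ms), dev=4 (20ms), cur=2 (40ms)\n"
        " All: srtt=15 (37ms), dev=8 (40ms), cur=5 (100ms)\n"
    )
    print(slow_nfs_mounts(figure_1))
```

The mount in Figure 1 has an overall srtt of 37ms, so it passes. Timers reported as zero (NFS over TCP/IP) also pass, since zero is under any positive limit.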
virtual_adrian.se
-------------------------------------------------------------------------
% virtual_adrian.se
Warning: Cannot init kvm: Permission denied
Warning: Kvm not initialized: Variables will have invalid values
Adrian is monitoring your system starting at: Thu Sep 21 00:37:26 1995
Warning: Cannot get info for pid 3. superuser permissions are needed to access every process
Using predefined rules for disk, net, rpcc, swap, ram, kmem, cpu, mutex, dnlc and inode
Checking the system every 30 seconds...
-------------------------------------------------------------------------
The warning about kvm (the raw kernel data interface) can be ignored, as the script does not attempt to use any kvm data unless it runs as root. When you run it as root, you can see that the extra rules are configured.
-------------------------------------------------------------------------
# /opt/RICHPse/examples/virtual_adrian.se
Adrian is monitoring your system starting at: Thu Sep 21 00:46:01 1995
Process watcher pid set to 3, process name fsflush, max CPU usage 5.0%
NFS client threshold set at All: srtt=20 (50ms) max NFS round trip
Minimum client NFS ops/sec considered active 2.00/s
Using predefined rules for disk, net, rpcc, swap, ram, kmem, cpu, mutex, dnlc and inode
Checking the system every 30 seconds...
-------------------------------------------------------------------------
I ran the command
% find / -ls >/dev/null
which makes the disk and the name caches rather busy, and got this output on a 32MB SPARCstation IPX.
-------------------------------------------------------------------------
Adrian detected slow disk(s): Thu Sep 21 00:48:33 1995
Move load from busy disks to idle disks
State    disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b   delay
red      c0t3d0   23.9  5.6   81.1   44.5  2.6  1.3  130.7  17  78  3856.1
amber    c0t5d0    1.3  1.9    7.1   35.2  0.0  0.1   30.3   0   6    94.9

Adrian detected Directory Name Cache problem (amber): Thu Sep 21 00:48:33 1995
Poor DNLC hitrate, increase ncsize
DNLC hitrate 44.1%, reference rate 125.50/s
DNLC has 617 entries, try increasing it (and inode cache) to 1234

Adrian detected Inode Cache problem (amber): Thu Sep 21 00:48:33 1995
Poor inode cache hitrate, increase ufs_ninode
Inode hitrate 26.9%, reference rate 64.63/s

Adrian detected RAM shortage (amber): Thu Sep 21 00:49:05 1995
The system is getting short on RAM, perhaps add some more
 procs     memory        page             faults           cpu
 r b w   swap  free  pi  po  sr   in   sy   cs smtx us sy wt id
 0 0 0  43940   700   8   2  55  342  842  205   11 23 18 20 39
-------------------------------------------------------------------------
As you can see, the output is somewhat verbose, and is based on extended versions of familiar command output where appropriate.
The whole point of the SE toolkit is that you can customize the tools very easily. To encourage you to do this, the rest of this column is a tour of the SE language and toolkit classes.
To illustrate the language, let's walk through this script:
Figure 2 Code for iostat.se
-------------------------------------------------------------------------
#! /opt/RICHPse/bin/se

#include <stdio.se>
#include <stdlib.se>
#include <unistd.se>
#include <string.se>
#include <kstat.se>
#include <sysdepend.se>
#include <p_iostat_class.se>
#include <dirent.se>
#include <inst_to_path_class.se>

#define SAMPLE_INTERVAL 5

main(int argc, string argv[2])
{
  p_iostat p_iostat$disk;
  p_iostat tmp_disk;
  int i;
  int interval = SAMPLE_INTERVAL;
  int ndisks;

  /* an optional argument overrides the sampling interval */
  switch(argc) {
  case 1:
    break;
  case 2:
    interval = atoi(argv[1]);
    break;
  default:
    printf("use: %s [interval]\n", argv[0]);
    exit(1);
  }

  ndisks = p_iostat$disk.disk_count;
  for(;;) {
    sleep(interval);
    printf("extended disk statistics\n");
    printf("disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %%w  %%b\n");
    for(i=0; i < ndisks; i++) {
      p_iostat$disk.number$ = i;
      /* copy the active variable into a plain structure before printing */
      tmp_disk = p_iostat$disk;
      printf("%-8.8s %4.1f %4.1f %6.1f %6.1f %4.1f %4.1f %6.1f %3.0f %3.0f\n",
        tmp_disk.name$,
        tmp_disk.reads, tmp_disk.writes,
        tmp_disk.kreads, tmp_disk.kwrites,
        tmp_disk.avg_wait, tmp_disk.avg_run,
        tmp_disk.service,
        tmp_disk.wait_percent, tmp_disk.run_percent);
    }
  }
}
-------------------------------------------------------------------------
-------------------------------------------------------------------------
% iostat.se
extended disk statistics
disk      r/s  w/s   Kr/s   Kw/s wait actv  svc_t  %w  %b
sd3       0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0
sd5       0.0  4.6    0.0   29.2  0.0  0.1   18.3   1   5
-------------------------------------------------------------------------
States and Actions
I decided to extend the usual red/amber/green conditions that most tools implement. I wanted to indicate a few extra conditions that sometimes occur:
-------------------------------------------------------------------------
lr_disk_t lr_disk$dr;
lr_disk_t tmp_dr;

/* use the live disk rule */
tmp_dr = lr_disk$dr;
if ( tmp_dr.state > ST_GREEN) {
  printf("The disks are in the %s state: %s\n",
    state_string(tmp_dr.state), tmp_dr.action);
}
-------------------------------------------------------------------------
Rule Definition
Each rule was initially defined in terms of the output of standard system commands. For example, the shorthand vmstat30.r means: Run the command vmstat with a 30-second interval, and look at the column labelled with an r. The number of CPUs appears as ncpus in the rules. The run queue length is divided by the number of CPUs. This is based on the assumption that every CPU takes a job off the run queue in each time slice.
Table 1 CPU Rule
-------------------------------------------------------------------------
CPU RULE                                LEVEL    ACTION
-------------------------------------------------------------------------
0 == vmstat30.r                         White    1. CPU Idle
0 < (vmstat30.r / ncpus) < 3.0          Green    No Problem
3.0 <= (vmstat30.r / ncpus) < 5.0       Amber    2. CPU Busy
5.0 <= (vmstat30.r / ncpus)             Red      2. CPU Busy
-------------------------------------------------------------------------
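As a worked example of Table 1, the thresholds can be written as a small Python function. This is a sketch for experimenting with the numbers, not part of the SE toolkit; the function and constant names are made up for the illustration, and the action strings are abbreviated from the table.

```python
# Thresholds from Table 1, expressed as run queue length per CPU.
RUNQ_IDLE = 0.0
RUNQ_BUSY = 3.0
RUNQ_OVERLOAD = 5.0

def cpu_rule(runque, ncpus):
    """Return (level, action) for a vmstat run queue length and a CPU count."""
    load = runque / ncpus
    if load <= RUNQ_IDLE:
        return ("white", "CPU idle")
    if load < RUNQ_BUSY:
        return ("green", "No problem")
    if load < RUNQ_OVERLOAD:
        return ("amber", "CPU busy")
    return ("red", "CPU busy")

if __name__ == "__main__":
    for runque, ncpus in [(0, 1), (2, 1), (7, 2), (10, 2)]:
        print(runque, ncpus, cpu_rule(runque, ncpus))
```

A run queue of 7 on a two-CPU machine works out to 3.5 jobs per CPU, which falls in the amber band; the same run queue on a uniprocessor would be red.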
Figure 3 Code for Pure CPU Rule
-------------------------------------------------------------------------
rule_thresh_dbl cpu_runq_idle = {"RUNQ_IDLE", 0.0, "", 4, 1,
  "Spare CPU capacity" };
rule_thresh_dbl cpu_runq_busy = {"RUNQ_BUSY", 3.0, "", 4, 1,
  "OK up to this level" };
rule_thresh_dbl cpu_runq_overload = {"RUNQ_OVERLOAD", 5.0, "", 4, 1,
  "Warning up to this level" };

print_pr_cpu(ulong file) {
  print_thresh_dbl(file, cpu_runq_idle);
  print_thresh_dbl(file, cpu_runq_busy);
  print_thresh_dbl(file, cpu_runq_overload);
}

class pr_cpu_t {
  /* output variables */
  int state;
  string action;
  /* input variables */
  ulong timestamp;
  int runque;       /* i.e. p_vmstat.runque load level */
  int ncpus;        /* i.e. sysconf(_SC_NPROCESSORS_ONLN) */
  /* threshold variables */
  double cpu_idle;
  double cpu_busy;
  double cpu_overload;

  pr_cpu$()
  {
    double cpu_load;
    ulong lasttime;   /* previous timestamp */

    if (timestamp == 0) {   /* reset defaults */
      cpu_idle = get_thresh_dbl(cpu_runq_idle);
      cpu_busy = get_thresh_dbl(cpu_runq_busy);
      cpu_overload = get_thresh_dbl(cpu_runq_overload);
      return;
    }
    if (timestamp != lasttime) {
      cpu_load = runque;
      cpu_load /= ncpus;
      if (cpu_load <= cpu_idle) {
        state = ST_WHITE;
        action = "There is more CPU power configured than you need right now";
      } else {
        if (cpu_load < cpu_busy) {
          state = ST_GREEN;
          action = "No problem";
        } else {
          if (cpu_load < cpu_overload) {
            state = ST_AMBER;
            action = "The CPU is quite busy, perhaps add more CPU power";
          } else {
            state = ST_RED;
            action = "CPU overload, add more power or quit some programs";
          }
        }
      }
      lasttime = timestamp;
    }
  }
};
-------------------------------------------------------------------------
The code defines three threshold structures and a function to print them to a file descriptor. The function is provided for convenient use in scripts.
CPU Live Rule Code
The live rule combines the pure rule with the code needed to read the current values of the required input variables. As shown in Figure 4 below, the definition is simpler than that of the pure rule. The only data items defined are a state code and an action string. Again, the class function name is used as a prefix for active variables. An active instance of the pure rule is defined, along with a temporary copy that just holds the defined data.
The live rule initializes itself when it is first declared, i.e. before the script starts to run, by reading the time, updating the global copy of the vmstat class data, setting the number of CPUs correctly in the pure rule, then resetting the pure rule while initializing the temporary copy. The state code and action string are set up.
-------------------------------------------------------------------------
class lr_cpu_t {
  /* output variables */
  int state;
  string action;

  lr_cpu$()
  {
    ulong lasttime = 0;   /* previous timestamp */
    ulong timestamp = 0;
    pr_cpu_t pr_cpu$cpu;
    pr_cpu_t tmp_cpu;

    if (timestamp == 0) {
      timestamp = time();
      pvm_update(timestamp);
      pr_cpu$cpu.ncpus = GLOBAL_pvm_ncpus;
      pr_cpu$cpu.timestamp = 0;
      tmp_cpu = pr_cpu$cpu;   /* reset pure rule */
      action = uninit;
      state = ST_WHITE;
      lasttime = timestamp;
      return;
    }
    timestamp = time();
    if (timestamp == lasttime) {
      return;
    }
    /* use the rule */
    pvm_update(timestamp);
    pr_cpu$cpu.runque = GLOBAL_pvm[0].runque;
    pr_cpu$cpu.timestamp = timestamp;
    tmp_cpu = pr_cpu$cpu;
    state = tmp_cpu.state;
    action = tmp_cpu.action;
    lasttime = timestamp;
  }
};
-------------------------------------------------------------------------
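The shape of the live rule (initialize once, skip the work if the clock has not moved on, otherwise sample the inputs and hand them to the pure rule) can be sketched in Python. This is an analogy under stated assumptions, not a translation of the SE classes: the class name, the injected read_runque and clock callables, and the stand-in pure rule are all hypothetical, and Python has no equivalent of SE's active variables.

```python
import time

def pure_cpu_state(runque, ncpus):
    """Stand-in for the pure rule: run queue per CPU against 3.0 and 5.0."""
    load = runque / ncpus
    if load <= 0.0:
        return "white"
    if load < 3.0:
        return "green"
    if load < 5.0:
        return "amber"
    return "red"

class LiveCpuRule:
    """Wrap the pure rule with the code that collects its inputs."""

    def __init__(self, read_runque, ncpus, clock=time.time):
        self._read_runque = read_runque    # supplies the current run queue length
        self._ncpus = ncpus
        self._clock = clock
        self._lasttime = int(clock())      # first use only initializes
        self.state = "white"

    def evaluate(self):
        """Re-run the rule at most once per second; otherwise return the cached state."""
        now = int(self._clock())
        if now == self._lasttime:
            return self.state
        self.state = pure_cpu_state(self._read_runque(), self._ncpus)
        self._lasttime = now
        return self.state

if __name__ == "__main__":
    rule = LiveCpuRule(read_runque=lambda: 8, ncpus=2)
    time.sleep(1.1)
    print(rule.evaluate())
```

Passing the sampler and the clock in as arguments is a choice made for this sketch so that the caching behavior can be exercised without a real system to measure.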
That's All Folks!
Thank you for reading to the end of this column. I've completed the introduction to the SE toolkit that I started last month. I will return to the subject of tools in the future.
Send your comments and questions to adrian.cockcroft@sun.com.