CURT (CPU Utilization Reporting Tool) User's Guide

Overview of CURT

Curt is a tool that takes an AIX trace file as input and produces statistics on processor (CPU) utilization and process/thread activity. It works with both uniprocessor and multiprocessor AIX traces, provided the processor clocks are properly synchronized.

The AIX trace file, which is gathered using the trace command, should contain at least the trace events (trace hooks) listed below. These are the events curt uses to calculate its statistics:

  HKWD_KERN_SVC, HKWD_KERN_SYSCRET, HKWD_KERN_FLIH, HKWD_KERN_SLIH,
  HKWD_KERN_SLIHRET, HKWD_KERN_DISPATCH, HKWD_KERN_RESUME, HKWD_KERN_IDLE,
  HKWD_SYSC_FORK, HKWD_SYSC_EXECVE, HKWD_KERN_PIDSIG, HKWD_SYSC__EXIT,
  HKWD_SYSC_CRTHREAD

This means that, if you specify the -j flag on your trace command, you must include these numbers for curt:

-j 100,101,102,103,104,106,10C,119,134,135,139,200,465

Or, you can use -J curt instead.
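
If you maintain your own -j list, it can be worth checking that none of the hooks curt needs has been dropped. The following is a minimal sketch of such a check; the helper name missing_hooks is hypothetical and is not part of curt or trace. The hook IDs are copied from the list above.

```python
# Hook IDs copied from the -j list given in this guide.
REQUIRED_HOOKS = {
    "100", "101", "102", "103", "104", "106", "10C",
    "119", "134", "135", "139", "200", "465",
}

def missing_hooks(j_argument):
    """Return the required hook IDs absent from a comma-separated -j value.

    Hypothetical helper: IDs are compared case-insensitively.
    """
    given = {h.strip().upper() for h in j_argument.split(",") if h.strip()}
    return sorted(REQUIRED_HOOKS - given)

# A list that forgot hook 465:
print(missing_hooks("100,101,102,103,104,106,10c,119,134,135,139,200"))
# → ['465']
```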

For versions of AIX prior to 5.2.0, curt requires the ptools library to execute, so be sure that you have installed the proper version of ptools.utilities. For AIX 5.2.0, curt uses the ptools library, which is part of the bos.perf.tools fileset.

Syntax

curt -i inputfile [-o outputfile] [-n gennamesfile] [-m trcnmfile] [-a pidnamefile] [-f timestamp] [-l timestamp] [-ehpst]

Flags

-i inputfile Specify the input AIX trace file to be analyzed.
-o outputfile Specify the output file (default is stdout).
-n gennamesfile Specify a names file produced by gennames.
-m trcnmfile Specify a names file produced by trcnm.
-a pidnamefile Specify a pid->process name mapping file.
-f timestamp Start processing trace at "timestamp" seconds.
-l timestamp Stop processing trace at "timestamp" seconds.
-e Output elapsed time information for system calls.
-h Display usage text (this information).
-p Output detailed process information.
-s Output information about errors returned by system calls.
-t Output detailed thread by thread information.
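
As an illustration of how the flags combine, the sketch below assembles a curt command line from the options listed above. The function name curt_command and the file names trace.raw and curt.out are hypothetical; only flags documented in this guide are used, and the command is printed rather than executed.

```python
import shlex

def curt_command(inputfile, outputfile=None, pidnamefile=None, switches=""):
    """Build an argv list for curt (hypothetical helper, does not run curt)."""
    # Only the single-letter switches documented in the Syntax line are accepted.
    unknown = set(switches) - set("ehpst")
    if unknown:
        raise ValueError(f"unsupported switches: {sorted(unknown)}")
    argv = ["curt", "-i", inputfile]
    if outputfile:
        argv += ["-o", outputfile]
    if pidnamefile:
        argv += ["-a", pidnamefile]
    if switches:
        argv.append("-" + switches)
    return argv

# Elapsed syscall times plus detailed process and thread sections:
print(shlex.join(curt_command("trace.raw", "curt.out", switches="ept")))
# → curt -i trace.raw -o curt.out -ept
```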

If the trace process name table is not accurate, or if more descriptive names are desired, use the -a flag to specify a Pid to process name mapping file. Each line of this file consists of a process ID (in decimal), followed by a space, followed by an ASCII string to use as the name for that process.
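
A minimal sketch of producing such a mapping file follows. The helper name format_pid_names and the example Pids and process names are invented for illustration; the only thing taken from this guide is the line format (decimal Pid, one space, name).

```python
def format_pid_names(mapping):
    """Render {pid: name} as '<decimal pid> <name>' lines, one per process."""
    lines = []
    for pid, name in sorted(mapping.items()):
        # int() forces decimal output even if the Pid was supplied as 0x... .
        lines.append(f"{int(pid)} {name}")
    return "\n".join(lines) + "\n"

# Hypothetical Pids and names:
text = format_pid_names({4242: "db_writer", 516: "batch_loader"})
print(text, end="")
# → 516 batch_loader
# → 4242 db_writer
```

The resulting text could be written to a file and passed to curt with the -a flag.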

Report Contents

curt and AIX Trace Information

The first lines in the curt report give the time when the curt program was executed and the command line used to invoke curt. Following that is this information about the AIX trace file being processed by curt: name, size, creation date, and the command used to gather the trace file.

System Summary

The first major section of the report is the System Summary. This section describes the time spent by the system as a whole (all processors) in various execution modes. These modes are as follows:

APPLICATION
The sum of times spent by all processors in User (i.e. non-privileged) mode.
SYSCALL
The sum of times spent by all processors doing System Calls. This is the portion of time that a processor spends executing in the kernel code providing services directly requested by a user process.
KPROC
The sum of times spent by all processors executing kernel processes other than the IDLE process. This is the portion of time that a processor spends executing specially created dispatchable processes which only execute kernel code.
FLIH
The sum of times spent by all processors in FLIHs (first level interrupt handlers).
SLIH
The sum of times spent by all processors in SLIHs (second level interrupt handlers).
DISPATCH
The sum of times spent by all processors in the AIX dispatch code. This sum includes the time spent in dispatching all threads (i.e. it includes the dispatches of the IDLE process).
IDLE DISPATCH
The sum of times spent by all processors in the AIX dispatch code where the process being dispatched was the IDLE process. Because the DISPATCH category includes the IDLE DISPATCH category's time, the IDLE DISPATCH category's time is not separately added to calculate either CPU(s) busy time or TOTAL (see below).
CPU(s) busy time
The sum of times spent by all processors executing in APPLICATION, SYSCALL, KPROC, FLIH, SLIH, and DISPATCH modes.
IDLE
The sum of times spent by all processors executing the IDLE process.
TOTAL
The sum of CPU(s) busy time and IDLE. This number is referred to as "total processing time."

The column labeled processing total time (msec) gives the total time in milliseconds for the corresponding processing category. The column labeled percent total time gives that time as a percentage of the processing total time for TOTAL. The column labeled percent busy gives that time as a percentage of the processing total time for CPU(s) busy time.
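
The arithmetic behind these columns can be sketched as follows. This is an illustration based on the definitions above, not curt's actual code; the function name summarize and the sample times are invented, and leaving percent busy undefined for IDLE is an assumption of this sketch.

```python
BUSY_MODES = ("APPLICATION", "SYSCALL", "KPROC", "FLIH", "SLIH", "DISPATCH")

def summarize(times_ms):
    """Return (busy, total, rows) where rows[mode] = (msec, pct_total, pct_busy)."""
    # IDLE DISPATCH is already contained in DISPATCH, so it is not added again.
    busy = sum(times_ms[m] for m in BUSY_MODES)
    total = busy + times_ms["IDLE"]
    rows = {}
    for mode in BUSY_MODES + ("IDLE",):
        t = times_ms[mode]
        pct_total = 100.0 * t / total
        # Assumption: percent busy is only meaningful for the busy categories.
        pct_busy = 100.0 * t / busy if mode != "IDLE" else None
        rows[mode] = (t, pct_total, pct_busy)
    return busy, total, rows

# Invented sample times in milliseconds:
sample = {"APPLICATION": 400, "SYSCALL": 200, "KPROC": 50, "FLIH": 20,
          "SLIH": 10, "DISPATCH": 20, "IDLE DISPATCH": 5, "IDLE": 300}
busy, total, rows = summarize(sample)
print(busy, total)                  # → 700 1000
print(rows["APPLICATION"][1])       # → 40.0
```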

The Avg. Thread Affinity is the probability that a thread was dispatched to the same processor that it last executed on.
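
One way to read this definition, for a single thread, is as the fraction of dispatches that landed on the processor the thread last ran on. The sketch below computes that fraction from a list of CPU numbers. It is an assumption-laden illustration: the function name thread_affinity is hypothetical, the first dispatch is skipped because it has no predecessor, and this guide does not state how curt averages the per-thread values into the system-wide figure.

```python
def thread_affinity(cpu_sequence):
    """Fraction of dispatches that went to the CPU the thread last ran on."""
    if len(cpu_sequence) < 2:
        return None  # no previous CPU to compare against
    same = sum(1 for prev, cur in zip(cpu_sequence, cpu_sequence[1:])
               if prev == cur)
    return same / (len(cpu_sequence) - 1)

# A thread dispatched to CPUs 0, 0, 1, 1, 1 in that order:
print(thread_affinity([0, 0, 1, 1, 1]))  # → 0.75
```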

Per Processor Summary

Following the System Summary is the Per Processor Summary, which is essentially the same information but broken down on a processor by processor basis. In the description given for the System Summary, the phrase "sum of times spent by all processors" can be replaced by "time spent by this processor". The Total number of process dispatches refers to how many times AIX dispatched any non-IDLE process on this processor, while Total number of idle dispatches gives the count of IDLE process dispatches.

Application Summary

The second major section of the report is the Application Summary. The first part of this section summarizes the total system processing time on a per-thread basis (by Tid). For each thread, identified by Process ID (and name if available) and Thread ID, the summary gives the total "application" (same as APPLICATION above) and "syscall" (same as SYSCALL above) processing time in milliseconds and as the percentage of the total system processing time for all processors in the trace. In addition, the summary gives the sum of those two times, both as raw time, and as a percentage of the total processing time.

The second part of this section gives the same information on a per-process ID (by Pid) basis. The third part of this section gives the same information on a per-process name (by process type) basis.

The fourth part of this section gives similar information for kernel process threads (Kproc Summary). Since most kprocs provide a specific kernel service, the total processing time is split into two categories, "operation" and "kernel," which loosely correspond to "syscall" and "application" for a process which always runs in kernel code. Each kproc thread is identified by name, Process ID, Thread ID and type of kproc if known. The kproc types are listed and described in a table immediately following this summary.

All four sections of the Summary are presented in sorted order from most combined processing time to least.

Note: Pid's and Tid's (Process and Thread ID's) are always given in decimal.

System Calls Summary

The third major section of the report is the System Calls Summary. This section summarizes the processing time spent in system calls. For each system call (SVC), identified by kernel address (and name if available), the summary gives the number of times the SVC was called and the total processor time for all calls in milliseconds and as a percentage of total system processing time for all processors in the trace. In addition, the summary gives the average, minimum and maximum times for one call to the SVC. If the -e flag is specified, the summary gives the total elapsed time for all calls to the SVC and the average, minimum and maximum times for one call. "Elapsed time" is the wall-clock time from when the process starts executing the SVC in kernel mode until the process resumes executing in application mode. The Summary is presented in sorted order from most total processor time to least. If the -s flag is specified, the summary gives the number of times each error code (errno) was returned by each System Call.
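
The per-call statistics described above amount to a simple aggregation. The sketch below shows the idea for completed calls only; it is not curt's implementation, the function name svc_summary is hypothetical, and the call names and times in the example are invented.

```python
from collections import defaultdict

def svc_summary(calls):
    """calls: iterable of (svc_name, cpu_msec) pairs for completed calls.

    Returns rows of (name, count, total, average, minimum, maximum),
    sorted from most total processor time to least.
    """
    per_svc = defaultdict(list)
    for name, msec in calls:
        per_svc[name].append(msec)
    rows = []
    for name, times in per_svc.items():
        total = sum(times)
        rows.append((name, len(times), total, total / len(times),
                     min(times), max(times)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

# Invented sample data:
for row in svc_summary([("kread", 0.5), ("kwrite", 3.0), ("kread", 1.5)]):
    print(row)
```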

The second part of this section is the Pending System Calls Summary. This part lists the System Calls which have started but not completed. The time that is given is included in the SYSCALL time for the system and the various processors and is included in the syscall time for the thread and process which issued the SVC, but is not included in the processing time for the system call in the first part of this section. The pending call is also not included in the count given in the first part of this section.

Note: System Call Addresses are always given in hexadecimal. Pid's and Tid's are always given in decimal.

Flih Summary

The fourth major section of the report is the Flih Summary. This section summarizes the amount of time spent in first level interrupt handlers (Flih). The first part of the summary gives the total number of entries to each Flih in the trace, as well as the total processor time for all executions of the Flih by all processors in milliseconds. In addition, the summary gives the average, minimum and maximum times for one execution. Each Flih is identified by a system-defined Flih type and a corresponding Flih name, if known.

The second part is the same information broken down on a processor by processor basis. It is possible that not all Flihs which occurred on the system will have occurred on each processor, so the Global Flih list may not be the same as the Flih list for each processor.

This section may also include the Pending Flih Summary. This is a list of the Flihs which have started but not completed. The time that is given is included in the FLIH time for the system and the affected processor, but is not included in the processing time for the Flih in either part of this section. The pending Flih is also not included in the counts given in either part of this section.

Slih Summary

The fifth major section of the report is the Slih Summary. This section summarizes the amount of time spent in second level interrupt handlers (Slih). The first part of the summary gives the total number of entries to each Slih in the trace, as well as the total processor time for all executions of the Slih by all processors in milliseconds. In addition, the summary gives the average, minimum and maximum times for one execution. Each Slih is identified by kernel address and Slih function or module name, if known.

The second part is the same information broken down on a processor by processor basis. It is possible that not all Slihs which occurred on the system will have occurred on each processor, so the Global Slih list may not be the same as the Slih list for each processor.

This section may also include the Pending Slih Summary. This is a list of the Slihs which have started but not completed. The time that is given is included in the SLIH time for the system and the affected processor, but is not included in the processing time for the Slih in either part of this section. The pending Slih is also not included in the counts given in either part of this section.

Detailed Process Information

This section of the report is produced when the -p flag is specified. It gives detailed information about each process found in the trace. This information is as follows:

  1. The Process ID (Pid) for that process as well as the process name if known.
  2. A count and a list of the Thread ID's (Tids) for that process.
  3. The time spent in application (user) mode and system call mode is shown. For kprocs, the time spent in kernel mode and operation mode is shown instead.
  4. Information on what System Calls were made by threads of this process. The -e flag also affects this output.

The processes are presented in sorted order from most combined application and syscall processing time to least.

Detailed Thread Information

This section of the report is produced when the -t flag is specified. It gives detailed information about each thread found in the trace. This information is as follows:

  1. The Thread ID (Tid) and Process ID (Pid) for that thread as well as the process name if known.
  2. The time spent in application (user) mode and system call mode is shown. For kprocs, the time spent in kernel mode and operation mode is shown instead.
  3. Information on what System Calls were made by this thread, including information on errors returned by the System Calls if the -s flag was specified. The -e flag also affects this output.
  4. The processor affinity is the probability that, for any dispatch of the thread, the thread was dispatched to the same processor that it last executed on.
  5. The "Dispatch Histogram" shows the number of times the thread was dispatched to each CPU in the system.
  6. The total number of times the thread was dispatched (not including redispatches described in 7 below).
  7. The number of redispatches due to interrupts being disabled indicates that the same thread which just ran was dispatched again because that thread has set the interrupt mask to INTMAX. This is shown only if non-zero.
  8. The average dispatch wait time is the average elapsed time since the thread was last undispatched (i.e. average elapsed time since the thread last stopped executing).
  9. How many times each type of Flih occurred while this thread was executing. Some of these types may be caused by the thread (such as DSI or ISI), while other types (such as IO) can occur when this thread just happens to be running and are not necessarily caused by the thread itself.

The threads are presented in sorted order from most combined application and syscall processing time to least.
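
Items 5 and 8 above can be illustrated with a short sketch. It assumes each run of a thread is available as a (cpu, start, stop) interval in seconds and in time order, which is a simplification of what the trace contains; the function name dispatch_stats and the sample intervals are invented, and redispatches (item 7) are not modeled.

```python
from collections import Counter

def dispatch_stats(intervals):
    """intervals: time-ordered list of (cpu, start_sec, stop_sec) for one thread.

    Returns (histogram, avg_wait): dispatch counts per CPU, and the mean
    time between the thread stopping and being dispatched again.
    """
    histogram = Counter(cpu for cpu, _, _ in intervals)
    waits = [nxt[1] - cur[2] for cur, nxt in zip(intervals, intervals[1:])]
    avg_wait = sum(waits) / len(waits) if waits else None
    return histogram, avg_wait

# Invented sample: one run on CPU 0, then two runs on CPU 1.
histogram, avg_wait = dispatch_stats([(0, 0.0, 1.0), (1, 3.0, 4.0), (1, 5.0, 6.0)])
print(dict(histogram))  # → {0: 1, 1: 2}
print(avg_wait)         # → 1.5
```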

Related Information

AIX 5L Version 5.2 Performance Tools Guide and Reference