z/OS Problem Determination Series - Day 10
Posted by Ralph Johnson on Fri, May 15, 2009 @ 12:00 PM
Thus far, we have focused entirely on S0C7 abends in VERY simple programs written in ALC, COBOL, and C. In these abend situations, we have been 100% focused on the state of our application program at the time of the abend.
Today, I will touch on a bit more advanced subject that may be of value to you at some point in the future. One of the things we have been omitting from the SYSUDUMP up to this point is the System Trace. It is near the end of the SYSUDUMP, and looks like this:
--------------------------------------------------- SYSTEM TRACE TABLE ----------------------------
PR ASID WU-ADDR- IDENT CD/D PSW----- ADDRESS- UNIQUE-1 UNIQUE-2 UNIQUE-3 PSACLHS- PSALOCAL UNIQUE-4 UNIQUE-5 UNIQUE-6
00 002C 007FF350 SVCR B 070C0000 8323FF90 00000000 00FC3EB0 00000004
00 002C 007FF350 SVC 5F 070C1000 83240846 00000000 00000008 7FF6FCB8
The system trace displays information regarding the PSW, registers, and timestamps for every SVC (supervisor call) and dispatch event issued by your program and the operating system.
Knowing where you last were, in the event you abend with a S0C4/S0C1 and your PSW is all zeroes (you branched outside your program) is especially useful.
To find the point in the System Trace where your program failed, I search/scan for the "RCVY PROG" entry. This will always be near the end of the System Trace and looks like this:
00 002C 007FF048 *RCVY PROG 940C7000 00000000 00000000 00000000 00000000 002C 002C C41D9090EFE8AEF2
Since this test was run on a lightly loaded system and it was a simple program, you can follow the entire execution from where the TCB was attached to program failure. This rarely happens on a medium to heavy loaded system!
A full copy of the System Trace for today's discussion is located here.
I find the following columns useful in simple diagnosis situations:
- WU-ADDR- is the TCB address for the entry
- IDENT lets you know what is happening
- CD/D will further identify what is going on (i.e the SVC #, ...)
- PSW----- ADDRESS- is the full 8-byte PSW at the time of the entry
- UNIQUE-x are the register values starting at R14 at the time of the entry
- TIMESTAMP-RECORD is the time of each entry in the trace table
To follow the flow of the failing program get the TCB address from column 3 of the RCVY PROG entry, which is 007FF048 in this case. From the top of the System Trace search for this address. If you find the "SVCR 2A" (ATTACH SVC) where this this TCB was created, you have everything from the time the TCB was created (attached) until program failure.
In our example, we see the following:
- TCB was dispatched at C41D9090EF25D808
- PGM asked for x'650' bytes of storage (SVC 78) at C41D9090EF28A508 and returned at C41D9090EF28E130 with storage at address 7F6479B0
- PGM asked for x'60' bytes of storage (SVC 78) at C41D9090EF28F148
- Issued SYNCH (SVC 12) at C41D9090EF292C74
- and many other others, before...
- Received PGM 007 at C41D9090EFE5F712
While I could explain each and every entry for this TCB, this is just not typically necessary. Knowing what your program has executed just prior to a program failure is sometimes key to understanding program failures.
Next week, we will talk about different types of abends, and how to avoid them.