z/OS Problem Determination Series - Day 9
Posted by Ralph Johnson on Thu, May 14, 2009 @ 10:00 AM
Today we take a look at the failing C program from last week, but we are going to start with just a Language Environment dump and see if that is sufficient to diagnose the problem.
Up front, I must state my bias: I typically find that Language Environment does not dump enough information to meet my needs. Perhaps it's the code I'm working with, or the various environments this code is running in. Hopefully, you will have better luck than I have had with this in the past. The same goes for Abend-Aid.
The files we will be working with today include:
The first thing we notice on today's S0C7 abend is that there is no Symptom Dump in the JOBLOG. No worries, though: all of the same information is contained in the Language Environment dump output.
One thing that I really like about Language Environment dumps is shown on the first page of the LE dump. LE breaks down the calling sequence in the Traceback area. The routine at the top of the chain is the routine that produces the LE dump. The second line indicates that our main, or failing, program was at offset +F6 when it entered CEEHDSP. This chain of calls is especially useful when you are 10 calls deep and your program fails. Notice that the storage near each register address, the PSW address, and the information typically found in the Symptom Dump are displayed for each DSA.
First, we can see that the "main" routine experienced the exception condition based on the information found in the traceback area.
Next, we need to find the Condition Information for this DSA. In this case, the DSA address we are looking for is 18F18208 (DSA 2), with an entry address of 18F004C0. Typically this section is just below the Traceback information.
Here we find the following information:
- PSW = 18F005BC
- ILC = 6
- INTC = 7
- OFFSET = F6
LE automatically adjusts the OFFSET to the real failing instruction offset. The failing instruction is at 18F005B6 (PSW 18F005BC minus ILC 6).
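The address arithmetic above can be sketched in C. This is only an illustration: the function names are hypothetical, and the values are the ones shown in this dump.

```c
#include <stdint.h>

/* For this program check the PSW address points just past the failing
   instruction, so back up by the instruction length code (ILC). */
uint32_t failing_instruction_address(uint32_t psw_address, uint32_t ilc)
{
    return psw_address - ilc;
}

/* Offset of an address from a routine's entry point. */
uint32_t offset_from_entry(uint32_t address, uint32_t entry_address)
{
    return address - entry_address;
}
```

With the values from this dump, failing_instruction_address(0x18F005BC, 6) gives 0x18F005B6, and offset_from_entry(0x18F005B6, 0x18F004C0) gives 0xF6, which matches the OFFSET that LE reports.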
Just below the Condition Information, we find the Storage Area near condition. This is the area near where the program failed. In this case, the storage at 18F005B6 is FA 22 D0 CA 40 00. By now, you have already noticed this is our dreaded Add Decimal (AP) instruction.
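If you want to see how the six bytes of an SS-format instruction such as AP break down, here is a minimal sketch in C. The struct and function names are my own, not anything LE provides; the field layout is the standard two-length SS format (opcode, L1/L2, B1/D1, B2/D2).

```c
#include <stdint.h>

/* Decoded fields of a two-length SS-format instruction (hypothetical type). */
struct ss_instruction {
    uint8_t  opcode; /* 0xFA is AP (Add Decimal) */
    unsigned len1;   /* operand 1 length in bytes (encoded length + 1) */
    unsigned len2;   /* operand 2 length in bytes (encoded length + 1) */
    unsigned base1;  /* base register for operand 1 */
    unsigned disp1;  /* 12-bit displacement for operand 1 */
    unsigned base2;  /* base register for operand 2 */
    unsigned disp2;  /* 12-bit displacement for operand 2 */
};

/* Split the 6 instruction bytes into their fields. */
struct ss_instruction decode_ss(const uint8_t b[6])
{
    struct ss_instruction in;
    in.opcode = b[0];
    in.len1   = (unsigned)(b[1] >> 4) + 1u;
    in.len2   = (unsigned)(b[1] & 0x0Fu) + 1u;
    in.base1  = (unsigned)(b[2] >> 4);
    in.disp1  = ((unsigned)(b[2] & 0x0Fu) << 8) | b[3];
    in.base2  = (unsigned)(b[4] >> 4);
    in.disp2  = ((unsigned)(b[4] & 0x0Fu) << 8) | b[5];
    return in;
}
```

Feeding it FA 22 D0 CA 40 00 yields opcode FA, two 3-byte operands, operand 1 at displacement CA from register 13, and operand 2 at displacement 000 from register 4.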
Operand 1 is located at +CA from where register 13 is pointing. In this case operand 1 is located at 18F182D2 (18F18208 + CA). The length of the operand is 3 bytes.
Unfortunately, LE only dumps +20 bytes before and +40 bytes after each register (my biggest objection to LE!). In some cases, you are stuck at this point. One method I have used to overcome this is scanning through the storage dumped for all of the other registers, looking for the storage that is not included in the R13 information. In this case, we find 18F182D2 is NOT included elsewhere. This operand becomes an open issue!
Operand 2 is located at +00 from where register 4 is pointing. Operand 2 is located at 18F006A0 and is also 3 bytes in length. The value located at 18F006A0 is 00 00 00. This is not a valid packed decimal field (the low-order nibble of the last byte must be a sign code, A through F), so this is part of the problem.
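The check for whether an operand falls inside the storage LE dumps around a register can be sketched as follows. The function names are hypothetical, and I am assuming the 20 and 40 are hex byte counts and that the window runs from (register - before) up to, but not including, (register + after); adjust to whatever your dump actually shows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operand address = contents of the base register + displacement. */
uint32_t operand_address(uint32_t base_reg_value, uint32_t displacement)
{
    return base_reg_value + displacement;
}

/* True if addr lies in the assumed window [reg - before, reg + after). */
bool in_dump_window(uint32_t reg_value, uint32_t addr,
                    uint32_t before, uint32_t after)
{
    /* Work in 64 bits so the bounds cannot wrap around. */
    uint64_t low  = (reg_value >= before) ? (uint64_t)reg_value - before : 0;
    uint64_t high = (uint64_t)reg_value + after;
    return (uint64_t)addr >= low && (uint64_t)addr < high;
}
```

For R13 = 18F18208, operand_address gives 18F182D2 for displacement CA, and in_dump_window reports that this address is outside the window, which is why operand 1 cannot be read from the R13 storage.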
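To see why 00 00 00 fails the packed decimal test, here is a small validity check in C. The function name is hypothetical; the rule it applies is the usual one for packed decimal data: every nibble is a digit 0-9 except the low-order nibble of the last byte, which must be a sign code A-F.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if field[0..len-1] is a well-formed packed decimal value. */
bool is_valid_packed(const uint8_t *field, size_t len)
{
    if (field == NULL || len == 0) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        unsigned high = (unsigned)(field[i] >> 4);
        unsigned low  = (unsigned)(field[i] & 0x0Fu);
        if (high > 9u) {
            return false;            /* high nibble is always a digit */
        }
        if (i + 1 < len) {
            if (low > 9u) {
                return false;        /* low nibble is a digit here */
            }
        } else if (low < 0xAu) {
            return false;            /* last low nibble must be a sign, A-F */
        }
    }
    return true;
}
```

The bytes 00 00 00 are rejected because the final nibble is 0 rather than a sign code, while 00 00 2C (packed +2) and 00 00 0C (packed zero) pass.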
Looking at the compile listing, we find that there is no instruction at +F6. The failing instruction FA 22 D0 CA 40 00 can be found at +186. The reason for this discrepancy is that the program base starts at +90. So if 90 is added to the F6 offset, we get 186 (all values in hex). Now the program listing agrees with the LE dump.
A review of our program source shows that counter2 was initially set to a value of 2, but was then overwritten with binary zeros by the following instruction:
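The offset adjustment is plain hex addition. As a tiny sketch (the function name is hypothetical, and the 90 is simply the base seen in this particular listing):

```c
#include <stdint.h>

/* Convert an offset reported by LE (relative to the routine's entry point)
   into an offset in the compile listing, given where the code starts there. */
uint32_t listing_offset(uint32_t dump_offset, uint32_t code_base_in_listing)
{
    return dump_offset + code_base_in_listing;
}
```

Here listing_offset(0xF6, 0x90) gives 0x186, the offset at which the AP instruction appears in the listing.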
memset(&counter2,0,sizeof(counter2));
Removing this statement, recompiling, and rerunning the program resolves the problem.
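The effect of that memset on a packed field can be illustrated with ordinary byte storage. This is a stand-in only: the struct below is hypothetical and is not how counter2 is declared in the failing program; it just models 3 bytes holding packed +2.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 3-byte packed decimal field. */
struct packed3 {
    uint8_t bytes[3];
};

/* Packed +2: digits 0 0 0 0 2 followed by the sign nibble C. */
struct packed3 packed3_plus_two(void)
{
    struct packed3 p = {{0x00, 0x00, 0x2C}};
    return p;
}

/* The sign is the low-order nibble of the last byte. */
unsigned sign_nibble(const struct packed3 *p)
{
    return (unsigned)(p->bytes[2] & 0x0Fu);
}

/* Same operation the failing program performed on counter2. */
void clear_with_memset(struct packed3 *p)
{
    memset(p, 0, sizeof *p);
}
```

Before the memset the sign nibble is C; afterwards it is 0, which is not a valid sign, so the next decimal instruction that touches the field takes a data exception (S0C7).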
Tomorrow we will address some other common abends, and some best practices on how to resolve them.