z/OS Problem Determination Series - Day 1
Posted by Ralph Johnson on Mon, May 04, 2009 @ 08:00 AM
Welcome to the z/OS Problem Determination Series
Welcome to the z/OS Problem Determination Series. This month we will be
focusing on z/OS abend and error troubleshooting. If you are abend
challenged and have never received any formal debugging training, this
series is for you!
Since many of our clients and associates have expressed an interest in
this subject in the past, we believe it will be of value to anyone that
deals with z/OS troubleshooting and wants a better understanding of the
process.
This Week
This week we will be focusing on gathering the materials needed for
problem diagnosis. Not having these materials in place before you start
typically results in wasted time. At the very minimum, I suggest
gathering the following items for z/OS problem diagnosis:
- Program compile & link listings
- Complete job log from the failing job
- SYSUDUMP from the failing job
- Access to IBM system codes documentation
- Access to IBM error messages documentation
- Access to z/Architecture Reference Summary & Principles of Operation manuals
By the end of this week, you will have examples of each of these, as well
as links to all the IBM documentation (and some idea of how to use it).
Today
Today, we will focus on writing an Assembler Language Code (ALC) program,
compiling and linking it, and running a job to force the abend and
dump. This will fulfill the first three elements (outlined in the list
above) that I believe are necessary for efficient problem
determination.
Patience to the COBOL & C participants: we will gather examples of COBOL & C dumps later this week.
Building Your ALC Program
The first phase in building our (failing) program is to write the ALC
source code. For our illustration, I'm using the following program (download here via right-click->save as):
DAY01PGM CSECT                         STANDARD ALC ENTRY CODE
         USING *,12                    STANDARD ALC ENTRY CODE
         STM   14,12,12(13)            STANDARD ALC ENTRY CODE
         LR    12,15                   STANDARD ALC ENTRY CODE
         LR    14,13                   STANDARD ALC ENTRY CODE
         LA    13,SAVE                 STANDARD ALC ENTRY CODE
         ST    13,8(14)                STANDARD ALC ENTRY CODE
         ST    14,SAVE+4               STANDARD ALC ENTRY CODE
         WTO   'HELLO ABEND! (#1)'     MESSAGE - BEFORE COMPUTATION
         AP    COUNTER1,COUNTER2       ADD TWO PACKED FIELDS
         WTO   'HELLO ABEND! (#2)'     MESSAGE - AFTER COMPUTATION
EXIT0    L     13,SAVE+4               STANDARD ALC EXIT CODE
         LM    14,12,12(13)            STANDARD ALC EXIT CODE
         LA    15,0                    STANDARD ALC EXIT CODE
         BR    14                      STANDARD ALC EXIT CODE
SAVE     DC    18F'0'                  OUR SAVE AREA
COUNTER1 DC    PL4'1'                  OUR 1ST COUNTER SET TO 1
COUNTER2 DS    PL4                     OUR 2ND COUNTER SET TO 2
         END   ,                       END OF PROGRAM
I saved this ALC program in a partitioned dataset called IBMUSER.ZOSPD.CNTL with a member name of DAY01SRC.
Next we need some JCL that will compile this program into object code, and
then link it into an executable load module. Since our code is written
in assembler, I'm calling a PROC (canned set of JCL) to perform the
compile (assemble - for you sticklers on vocabulary) and link the
program. I'm using the following JCL (download here via right-click->save as):
//DAY01JCL JOB (),CLASS=A,MSGCLASS=K,NOTIFY=&SYSUID
//*********************************************************************
//* JCL TO ASSEMBLE & LINK ALC VERSION OF "HELLO ABEND!" *
//*********************************************************************
//ASMTEST1 EXEC ASMACL
//C.SYSIN DD DSN=IBMUSER.ZOSPD.CNTL(DAY01SRC),DISP=SHR
//L.SYSLMOD DD DSN=IBMUSER.ZOSPD.LOADLIB(DAY01PGM),DISP=SHR
Note: You will most likely need to tailor the JOB card to your
site's requirements. You will also need to replace IBMUSER with your
TSO Userid on each of the DD statements. The CNTL dataset that I'm
using is allocated as a PDS with LRECL=80, BLKSIZE=27920, and RECFM=FB.
The LOADLIB dataset is allocated as a PDS with LRECL=0, BLKSIZE=27998,
and RECFM=U.
Once your CNTL & LOADLIB datasets are
allocated, or you point your JCL to existing datasets, you are ready to
submit the JCL for execution from TSO/ISPF (the UI for developers on a
z/OS system).
Once you submit your JCL, JES (most likely JES2, but possibly JES3)
takes over and prioritizes your job based on its job class (controlled
by the CLASS= parameter in your JCL) and priority (we didn't override
priority, so you are getting a default value). Once yours is the
highest priority job in its class, JES will assign it to an initiator
for execution. Once your job has completed execution, you will be
notified the next time you hit Enter in TSO.
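To make the class-and-priority idea concrete, here is a deliberately simplified toy model in Python. It is not how JES2 or JES3 is actually implemented, and real job selection involves more factors than this; the class name ToyJobQueue, the DEFAULT_PRIORITY value, and the job names RUSHJOB and OTHERJOB are all hypothetical. It only shows the rule described above: within one job class, the highest priority job is handed to an initiator first.

```python
import heapq
from itertools import count

DEFAULT_PRIORITY = 0  # hypothetical stand-in for the site default priority


class ToyJobQueue:
    """Toy model only: one priority queue of waiting jobs per job class."""

    def __init__(self):
        self._queues = {}        # job class -> heap of waiting jobs
        self._arrival = count()  # tie-breaker: earlier submissions go first

    def submit(self, name, job_class, priority=DEFAULT_PRIORITY):
        heap = self._queues.setdefault(job_class, [])
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(heap, (-priority, next(self._arrival), name))

    def select(self, job_class):
        """Return the next job an initiator serving job_class would run."""
        heap = self._queues.get(job_class)
        if not heap:
            return None
        return heapq.heappop(heap)[2]


queue = ToyJobQueue()
queue.submit("DAY01JCL", "A")              # CLASS=A, default priority
queue.submit("RUSHJOB", "A", priority=9)   # same class, higher priority
queue.submit("OTHERJOB", "B", priority=15) # different class, not a competitor

print(queue.select("A"))  # RUSHJOB
print(queue.select("A"))  # DAY01JCL
print(queue.select("A"))  # None
```

Note that OTHERJOB never delays DAY01JCL in this model, because it waits in a different class.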
Once you get the notification that your job is complete, go to SDSF to
review the job output. The job log should look similar to this (download here via right-click->save as):
12.08.40 JOB02182 ---- THURSDAY, 30 APR 2009 ----
12.08.40 JOB02182 IRR010I USERID IBMUSER IS ASSIGNED TO THIS JOB.
12.08.40 JOB02182 $HASP375 DAY01JCL ESTIMATED CARDS EXCEEDED
12.08.41 JOB02182 IEF677I WARNING MESSAGE(S) FOR JOB DAY01JCL ISSUED
12.08.41 JOB02182 $HASP375 DAY01JCL ESTIMATED CARDS EXCEEDED
12.08.41 JOB02182 ICH70001I IBMUSER LAST ACCESS AT 12:07:58 ON THURSDAY, APRIL
12.08.41 JOB02182 $HASP373 DAY01JCL STARTED - INIT 1 - CLASS A - SYS S0W1
12.08.41 JOB02182 - --TIMINGS (MIN
12.08.41 JOB02182 -STEPNAME PROCSTEP RC EXCP CONN TCB SRB CLOCK
12.08.41 JOB02182 -ASMTEST1 C 00 218 91 .04 .00 .0
12.08.41 JOB02182 -ASMTEST1 L 00 26 13 .04 .00 .0
12.08.41 JOB02182 -DAY01JCL ENDED. NAME- TOTAL TCB CPU TIM
12.08.41 JOB02182 $HASP395 DAY01JCL ENDED
The critical thing to look for here is that BOTH steps received a zero
(00) return code. Typically, a return code of four (04) is a warning,
but you should review and understand it before moving on. If you
receive a return code of eight (08) or higher, your job has failed and
you need to investigate. I suspect some will receive this error. If you
can't figure it out, post a comment below and I will either answer
offline or post a reply for everyone.
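If you save the job log as text, the return code check can be sketched in a few lines of Python. This is a minimal sketch and not part of the original series: it assumes the step summary lines look like the ones in the job log above, and the names STEP_LINE, classify and check_job_log are hypothetical. Return codes between 01 and 07 are all reported as "review", which is my assumption based on the rule of thumb for 04.

```python
import re

# Matches step summary lines such as
#   "12.08.41 JOB02182 -ASMTEST1 C 00 218 91 .04 .00 .0"
# The procstep is optional so that steps run without a PROC also match.
# Abended steps (shown with a code such as *S0C7) are not matched here.
STEP_LINE = re.compile(
    r"-(?P<step>[A-Z$#@][A-Z0-9$#@]{0,7})\s+"
    r"(?:(?P<procstep>[A-Z$#@][A-Z0-9$#@]{0,7})\s+)?"
    r"(?P<rc>\d{2})\s"
)


def classify(rc):
    """Apply the rule of thumb: 0 is clean, 8 or higher is a failure."""
    if rc == 0:
        return "ok"
    if rc < 8:
        return "review"  # e.g. 04, typically a warning
    return "failed"


def check_job_log(text):
    """Return (step, procstep, rc, verdict) for every step line found."""
    results = []
    for line in text.splitlines():
        match = STEP_LINE.search(line)
        if match:
            rc = int(match.group("rc"))
            results.append(
                (match.group("step"), match.group("procstep"), rc, classify(rc))
            )
    return results


JOB_LOG = """\
12.08.41 JOB02182 -STEPNAME PROCSTEP RC EXCP CONN TCB SRB CLOCK
12.08.41 JOB02182 -ASMTEST1 C 00 218 91 .04 .00 .0
12.08.41 JOB02182 -ASMTEST1 L 00 26 13 .04 .00 .0
12.08.41 JOB02182 -DAY01JCL ENDED. NAME- TOTAL TCB CPU TIM
"""

for step, procstep, rc, verdict in check_job_log(JOB_LOG):
    print(step, procstep, rc, verdict)
# ASMTEST1 C 0 ok
# ASMTEST1 L 0 ok
```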
Let the Execution Begin!
If your compile and link were successful, you are now ready to
execute your new program. Once again, we need JCL to run the new
program as a batch job. I'm using the following JCL for this example (download here via right-click->save as):
//DAY01RUN JOB (),CLASS=A,MSGCLASS=K,NOTIFY=&SYSUID
//*********************************************************************
//* JCL TO EXECUTE THE ALC VERSION OF "HELLO ABEND!" *
//*********************************************************************
//ASMTEST1 EXEC PGM=DAY01PGM
//STEPLIB DD DSN=IBMUSER.ZOSPD.LOADLIB,DISP=SHR
//SYSUDUMP DD SYSOUT=*
Once you submit this job, and it executes, you will be notified it
is finished. This notification will differ slightly from the one for
the compile & link: you will be notified that the program abended
with a S0C7 abend code.
Before someone gets cute and blurts out the "obvious cause" of the abend, I
will ask that you not do this. The purpose of this educational
experience is to know how to dig to find answers.
The failing joblog should look something like this (download here via right-click->save as):
J E S 2 J O B L O G -- S Y S T E M S 0 W 1 -- N O D E
12.08.55 JOB02185 ---- THURSDAY, 30 APR 2009 ----
12.08.55 JOB02185 IRR010I USERID IBMUSER IS ASSIGNED TO THIS JOB.
12.08.55 JOB02185 $HASP375 DAY01RUN ESTIMATED CARDS EXCEEDED
12.08.56 JOB02185 ICH70001I IBMUSER LAST ACCESS AT 12:08:41 ON THURSDAY, APRIL
12.08.56 JOB02185 $HASP373 DAY01RUN STARTED - INIT 1 - CLASS A - SYS S0W1
12.08.56 JOB02185 +HELLO ABEND! (#1)
12.08.56 JOB02185 IEA995I SYMPTOM DUMP OUTPUT 601
601 SYSTEM COMPLETION CODE=0C7 REASON CODE=00000000
601 TIME=12.08.56 SEQ=00073 CPU=0000 ASID=002C
601 PSW AT TIME OF ERROR 078D0000 00007F7E ILC 6 INTC 07
601 ACTIVE LOAD MODULE ADDRESS=00007F48 OFFSET=00000036
601 NAME=DAY01PGM
601 DATA AT PSW 00007F78 - FA33C0AC C0B00700 A715000D
601 AR/GR 0: 96685D16/00011000 1: 00000000/650015DA
601 2: 00000000/00000040 3: 00000000/007D99D4
601 4: 00000000/007D99B0 5: 00000000/007FF350
601 6: 00000000/007CAFE0 7: 00000000/FD000000
601 8: 00000000/007FC018 9: 00000000/007D3CC8
601 A: 00000000/00000000 B: 00000000/007FF350
601 C: 00000000/00007F48 D: 00000000/00007FAC
601 E: 00000000/00006F60 F: 00000000/00000000
601 END OF SYMPTOM DUMP
12.08.56 JOB02185 IEF450I DAY01RUN ASMTEST1 - ABEND=S0C7 U0000 REASON=00000000
12.08.56 JOB02185 - --TIMINGS (MIN
12.08.56 JOB02185 -STEPNAME PROCSTEP RC EXCP CONN TCB SRB CLOCK
12.08.56 JOB02185 -ASMTEST1 *S0C7 88 78 .21 .00 .0
12.08.56 JOB02185 -DAY01RUN ENDED. NAME- TOTAL TCB CPU TIM
12.08.56 JOB02185 $HASP395 DAY01RUN ENDED
If your job completed without an abend, you probably cheated. If you
didn't receive the "HELLO ABEND! (#1)" message, you probably aren't
executing the right program. Once again, post a comment if you are
having issues. I will help you... Fail!
Keep the output from both jobs that you created today. Review them and
start becoming familiar with what is there. We will come back to these
next week when we start the debugging process.
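One way to start getting familiar with the symptom dump is to pull out a few of its fields and check that they agree with each other. The Python sketch below does only that, using lines copied from the IEA995I output above. It is my own illustration rather than part of the series, the function names are hypothetical, and it makes no attempt to diagnose the abend.

```python
import re

SYMPTOM_DUMP = """\
601 SYSTEM COMPLETION CODE=0C7 REASON CODE=00000000
601 PSW AT TIME OF ERROR 078D0000 00007F7E ILC 6 INTC 07
601 ACTIVE LOAD MODULE ADDRESS=00007F48 OFFSET=00000036
601 NAME=DAY01PGM
601 DATA AT PSW 00007F78 - FA33C0AC C0B00700 A715000D
"""


def field(pattern, text):
    """Return the first capture group of pattern, or fail loudly."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return match.group(1)


def parse_symptom_dump(text):
    """Extract a few fields from the symptom dump; addresses become ints."""
    hex8 = r"([0-9A-F]{8})"
    return {
        "completion_code": field(r"SYSTEM COMPLETION CODE=([0-9A-F]{3})", text),
        "psw_address": int(
            field(r"PSW AT TIME OF ERROR\s+[0-9A-F]{8}\s+" + hex8, text), 16
        ),
        "ilc": int(field(r"ILC\s+(\d+)", text)),
        "load_address": int(field(r"LOAD MODULE\s+ADDRESS=" + hex8, text), 16),
        "offset": int(field(r"OFFSET=" + hex8, text), 16),
        "module": field(r"NAME=(\S+)", text),
        "data_address": int(field(r"DATA AT PSW\s+" + hex8, text), 16),
    }


dump = parse_symptom_dump(SYMPTOM_DUMP)

# The system completion code 0C7 is what IEF450I reports as ABEND=S0C7.
print(dump["module"], "S" + dump["completion_code"])      # DAY01PGM S0C7

# PSW address minus load module address gives the reported OFFSET.
print(hex(dump["psw_address"] - dump["load_address"]))    # 0x36

# In this dump, the DATA AT PSW address is the PSW address minus the ILC.
print(dump["psw_address"] - dump["ilc"] == dump["data_address"])  # True
```

The numbers are consistent: 00007F7E minus 00007F48 is the OFFSET of 36 shown in the dump, and backing up by the ILC of 6 lands on 00007F78, the address where the displayed data starts.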
Conclusion
Congratulations on completing Day 1 of the z/OS Problem Determination
Series. Tomorrow we will be doing almost exactly the same thing, except
this time with a COBOL program.
As always, your feedback is welcomed and appreciated. If you know of
someone who is interested in this material, it is not too late to have
them join us!