Posted by Ralph Johnson on Tue, Jun 09, 2009 @ 10:00 AM
Overview
As new technology makes its way to the mainframe, ISVs and corporate IT managers face new challenges when it comes to securing mainframe-based data. Information security threats are increasing from both internal and external sources.
Internally, unauthorized employee access to customer data is increasing due to increased economic pressures. Unsecured corporate data is just too tempting when exposed to those wanting to "turn the easy buck". I always think it is ridiculous when my phone company representative asks if they can access my records. Then, I realize what information they have and that this process is in place to prevent fraudulent internal activity. One representative went as far as apologizing for asking me for my account number several times recently, as they were not allowed to write it down because someone could dig through the trash and find it.
Externally, offsite data has a knack for falling into the wrong hands. Documents must be shredded before they can be placed into the trash. Hackers have better tools for listening to public internet traffic. I sat at a small conference recently with a sniffer running on my laptop, and picked up 4 or 5 e-mail passwords. For many, a single password is used for banking, investments, and even their e-mail accounts. Think of this the next time you log on for e-mail from your laptop or iPhone while on the road or at your favorite wireless hotspot.
Physical security and network security measures will need to increase as technology evolves. Here are a few areas that need to be addressed in many organizations...
File Security
For years, automatic dataset protection (ADSP) was not available in RACF. Products such as ACF2 have used this as a default (and competitive advantage claim) since product inception. Today, I still find shops where ADSP is not enabled in RACF. Therefore, any new datasets that are created are not automatically protected as a default. For ISVs, source code and product libraries are potentially available to unauthorized employees or contractors. Although the penalty is stiffer these days, there have been instances of commercial ISVs having their entire source code library posted on a bulletin board. For corporate IT, how much of your customer data is available to programmers & consultants in production datasets, not to mention the lesser controlled test files? Is there a value that can be placed on your source code? Your customer information? Can you ignore the risk and exposure?
Naming standards go a long way toward making sure sensitive data is secured properly. Security administrators can easily control who can read, write, and control datasets when good naming practices are in use. Good naming standards are also necessary for good catalog management and system managed storage (SMS).
Organizing users into groups eases the nightmare of administering external security software such as RACF, ACF2, and Top Secret. Once a group is given a proper level of access, members of that group inherit those security rights.
If ADSP is enabled, good naming standards are in place, and an adequate level of grouping users into controlled groups is in place, this forms a minimal level of security to help protect your organization. As every shop is unique in their security requirements, additional controls may be needed.
Client / Server Security
As we tend to have more and more client-server applications deployed on the mainframe, the issue of server security comes into play. Who should have access to the server? Is the application written in a manner that excludes unauthorized access? Many server applications interrogate the incoming IP address to authenticate valid server activity. Others provide a login process to validate the user. Some go as far as doing a RACROUTE check with RACF to validate the "current" status (revoked or not) & password of the incoming connections. This year alone, I have had to add code to several products to address sniffer and denial of service (DoS) attacks on mainframe servers.
Another good practice is to log incoming connections to a server. First, this aids in any short-term diagnosis activities when problems occur. Second, if there is a compromise of data, good records help you determine the level of the exposure.
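To make the two ideas above concrete (checking the incoming IP address and logging every connection), here is a minimal Python sketch. The network ranges, logger name, and function names are all hypothetical, chosen only for illustration. An address check on its own is a weak form of authentication, so treat this as one layer, not the whole answer.

```python
import ipaddress
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("server.connections")  # hypothetical logger name

# Hypothetical allow list; a real server would load this from configuration.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("192.168.5.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True when the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


def admit(client_ip: str, port: int) -> bool:
    """Log every connection attempt, accepted or not, and return the decision."""
    allowed = is_allowed(client_ip)
    if allowed:
        log.info("accepted connection from %s:%d", client_ip, port)
    else:
        log.warning("rejected connection from %s:%d", client_ip, port)
    return allowed
```

Logging the rejected attempts as well as the accepted ones is deliberate: after a compromise, the rejected entries are often what show how long someone was probing.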
Lastly, unless you have a controlled network where there is no way a person can sniff for packets, client-server communications should be encrypted from the "wandering eye". Secure Sockets Layer (SSL) is a quick remedy to this problem. Depending on the information, especially customer data, a strong "standardized" encryption algorithm should be used. ICSF provides a variety of methods to accomplish this process.
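As a rough sketch of what encrypting the connection looks like on the server side, the Python below builds a context with the standard ssl module. TLS is the successor to SSL, so the sketch sets TLS 1.2 as the floor. The function name and the certificate and key file arguments are assumptions for this example, and this is not how ICSF or any mainframe product is configured.

```python
import ssl


def make_server_context(cert_file=None, key_file=None):
    """Build a server-side TLS context that refuses anything older than TLS 1.2.

    cert_file and key_file are hypothetical paths to a PEM certificate and
    private key; they are optional here only so the sketch runs without them.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A listening socket would then be wrapped with ctx.wrap_socket(sock, server_side=True) before any application data is exchanged.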
Securing Customer Information
Whether in a file, or on the network, encryption of data must increase in the future. Internal and external identity theft is the biggest catalyst for this movement. Regulatory compliance will become mandatory, especially where customer data is present, even for smaller businesses.
We have developed generic encryption routines for several clients in the past few years. One example was to encrypt all credit card information during a DB2 database load. My belief is that AES with a key length of 128 bits or more is the best way to go. I also believe that if you have an ICSF capable processor, building a generic encryption & decryption routine is the way to go when encrypting fields within a file, or an entire packet of network data.
Lastly, secure the usage of your encryption & decryption routine. In ICSF, only the security administrator will know the encryption key, offering the maximum in peace of mind. If exchanging information with outside entities, ICSF supports public and private keys, which let you share a key with outside parties for the purpose of encrypting your data.
Conclusion
As we innovate on the mainframe, our world is also going to get more complicated. As we design new systems we need to take an extended look at data security. Plan for the worst-case scenario; it will cost you far less in the long run.
As always, I encourage your questions and feedback! Please share your insight and experiences, as I know I have only touched the tip of the iceberg here...
Ralph Johnson is the owner and founder of Techsys Software Services LLC, a Dallas based mainframe software consulting firm that specializes in system programming and system software development. Techsys provides development & technical support for over a dozen commercial mainframe software applications.
Posted by Ralph Johnson on Wed, May 27, 2009 @ 08:00 AM
Several people contacted me offline over the past 3 weeks and asked me for recommendations for tools and products as related to the z/OS Problem Determination process.
Since I agreed I would not promote products in this series, I'm just going to throw the question out for discussion to the group.
For the next few days, I wanted to get some feedback on:
- What tool(s)?
- Easy to use?
- Are there GOTCHAs?
- Would you recommend?
- Is the value really there?
Today, let's concentrate on debugging tools you use during the development process. Submit your favorite (or not so favorite) debugging tools in the comments section below. Address as many of the questions listed above as possible.
I'll be working on my list!
Posted by Ralph Johnson on Fri, May 22, 2009 @ 11:00 AM
Please use the E-mail, StumbleUpon, Facebook, LinkedIn, Twitter, etc. buttons above to tell your friends about this series. If you got value from the series, let someone else know. I really do appreciate it!
Just a few final words as related to abends that you may encounter at some point in the future. If you can think of others that should be included, that have not been previously mentioned, please send a comment to this post. I can always append the information to the post so that others have access to the information.
The first abend for discussion today is the S001 abend. This abend is associated with a variety of file oriented issues. A few of the conditions that lead to this abend are:
- Wrong length record specified in DCB, FD, or JCL
- Reading a dataset that has been allocated, but never written to (you are getting the leftovers from whatever was on the disk at that location)
- Writing to an input file
- Concatenation of files with unlike LRECL and/or RECFM
- Reading after EOF has been reached
Always look for additional error messages in the system log when receiving this abend. The error messages will identify the DD name involved and get you pointed in the right direction.
The next abend that you may see as an application programmer, and almost certainly if you are a systems programmer who writes programs, is the S047 abend code. This indicates that the program you are attempting to execute contains an authorized function and one of the following conditions is true:
- Program was not linked with AC=1
- Program is not being executed from an APF authorized library. Remember, if you have a concatenation - all libraries in the concatenation must be authorized! Also, the authorization of a dataset is volume specific - make sure the "authorized dataset" is on the "authorized volume".
Lastly, the S913 abend is issued when you are not authorized by RACF (or equivalent external security manager) for a dataset, resource, or program. This message is accompanied by the ICH408I message that identifies the resource that you do not have access to. Contact your security administrator to rectify this situation.
Conclusion
As this is the last day of the series, I want to thank all that participated! I want to thank all that have contacted me online and offline for details, clarifications, and issues with the posts. A total of about 250 people participated in my first adventure into the world of blogging. I hope you have learned as much as I did.
This 3-week blog series will become an onsite educational offering from Techsys in early July. If you know someone that would benefit from this information in a classroom environment, please have them contact us.
The next series will begin June 1st. I'm still taking ideas on what the June subject will be. Please vote by clicking here. I do know that I will probably back off to 2-3 posts per week, as a daily post gets in the way of real work. I have also seen an increase in the Ask The Expert posts that take some additional time to respond to.
Have a great Memorial Day weekend! Take a few moments to reflect on why we have this 3-day weekend and those who have fallen for our Great Land over the past 200+ years. Until we meet again...
Posted by Ralph Johnson on Thu, May 21, 2009 @ 12:00 PM
Today's discussion is related to the x22 abend codes. Up to this point, we have dealt with exception conditions that were caused by the programmer or the person running the program. The x22 abends are typically caused by an outside force.
The x22 abend series is not one that seems to correlate with a single SVC that initiates these abends. If someone can make sense of how SVC 22 (MGCR/QEDIT) is related to any of these, please educate me!
The first two x22 abends that we want to discuss are S122 and S222. These abends are both initiated as a result of the master console operator, or someone via SDSF, cancelling your job. The S222 abend is a straight operator cancel command (C JOBNAME), while the S122 abend is a cancel with a DUMP command (C JOBNAME,DUMP). It is very important to remember that if you cancel your job using the first method, you will NOT receive a SYSUDUMP. You must use the second method to get the S122 abend and the corresponding SYSUDUMP.
The S322 abend is issued when the time limit for your job has been exceeded. The system programmer at your shop sets artificial limits, by JOBCLASS and other criteria in the JES parameters and SMF exit, on the maximum CPU time allowed for your program. Therefore, we have to choose a JOBCLASS to correspond with our anticipated CPU usage. If we exceed this value, for example due to a program error that results in a loop, our job is cancelled with the S322 abend. This is done to preserve the integrity of the other users, jobs, and workload running on z/OS - not to punish you. Some shops let you override this value in your JCL, allowing you to alter the value without making a change to the system configuration.
The S522 abend is caused when your job has been waiting longer than a preset time. In most systems this is 15 minutes, but I have seen it as high as an hour. The most common occurrence of the S522 abend is a TSO timeout. If you don't hit enter in TSO within 15 minutes, you are logged off automatically to free up resources. Another common cause, although I haven't used a real tape drive in quite some time, is waiting for the tape operator to MOUNT your tape on a tape drive. If the tape is not mounted within 15 minutes, your job is cancelled.
The S722 abend is generated when the output limit is exceeded by your job. For instance, if you are generating a report and your program starts looping you will ultimately receive this abend if limits are in place. Some shops let you override this value in your JCL, allowing you to alter the value without making a change to the system configuration.
The S822 abend occurs when the REGION requested by your job is not available to the JES initiator. If the private area size (where your job runs) is 8-10MB below the line, and your job requests 12MB REGION, you will receive this error. The typical solution is to make sure your job can run above the line (where you have close to 2GB of REGION available). The other alternative is to lower your REGION request to the 8-10MB range that is available on your system.
An SA22 abend occurs when the master console operator or system programmer has to FORCE your job out of the system. Most likely a CANCEL command was issued and the operating system couldn't get rid of your job. There is usually a system problem when an application program cannot be removed via the CANCEL command. Other possibilities include an operator reply that wasn't responded to, or a system software problem. It is usually good to get a standalone dump (dump the entire operating system) when this occurs, and have IBM review the dump.
The remaining x22 abends are rare and there is not a high likelihood that you will ever see any of these. If you know of one that wasn't mentioned, let me know. As always, you can go to IBM's LookAt web page to get further information.
Tomorrow, we will wrap up the z/OS Problem Determination (Introduction) Series. There are still a few common abends that didn't fit into any of the groups we discussed this week.
Posted by Ralph Johnson on Wed, May 20, 2009 @ 09:57 AM
One of the biggest changes in applications development in my 30 years in the mainframe business has been the drop in memory costs. Without this drop, there would be no C++, JAVA, or TCP/IP on the mainframe (or any other platform???). Memory, among other things, helps provide the infrastructure that we take for granted these days.
Today, medium complexity applications regularly use 1-2MB when written in C++ and JAVA. I have seen vendor applications that use upwards of 500MB in order to get acceptable performance. In contrast, a heavily loaded CICS with 500+ users ran on 8-10MB of storage 20 years ago. Perhaps a bit of a rant, but as long as memory is cheap and we are not running short of it, my desire to minimize storage utilization is a low priority issue.
Today's topic has to do with x78 abends. As we learned yesterday, abends are sometimes grouped by the SVC that generates the error. SVC 78, GETMAIN & FREEMAIN, is the source of all x78 abends.
First, a bit of a discussion on GETMAIN processing. The parameters required to complete a GETMAIN request are:
- Type of storage to allocate
- Where to allocate it
- Length of storage
- Where to return the storage address
For most applications, the type of storage is private storage either below or above the 16MB line. Typically, there is 8-10 MB of storage available below the 16MB line for application usage. The remainder of the storage below the 16MB line is reserved for the operating system. Above the 16MB line, the vast majority of the address range from 16MB to 2GB (31-bit addressing) is available for application usage.
Because the integrity of the z/OS operating system environment depends on it, there are a variety of system exits & parameters to control how much storage a single user can allocate. Typically, when you get an x78 abend while trying to GETMAIN memory, these artificial limitations are what is stopping you.
The parameters required to perform a FREEMAIN request are as follows:
- Type of storage
- Address of the storage
- Length of the storage
As a rule of thumb, I always try to maintain a 1:1 ratio between my GETMAIN & FREEMAIN requests. In other words, partial FREEMAINs are not a good idea, unless you have a really good memory management process in place. IBM may prevent you from doing this anyway, since the operating system has to maintain awareness of who owns all storage and what storage is free at any point in time.
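The 1:1 rule above can be illustrated with a toy bookkeeping model. This is a plain Python simulation, not a z/OS interface: the class and method names are invented for the example, and the address and length are simply sample hex values.

```python
class StorageTracker:
    """Toy model of GETMAIN/FREEMAIN bookkeeping (illustration only)."""

    def __init__(self):
        self._owned = {}  # address -> length obtained

    def getmain(self, address: int, length: int) -> None:
        """Record that 'length' bytes were obtained at 'address'."""
        self._owned[address] = length

    def freemain(self, address: int, length: int) -> None:
        """Release storage only if address and length match the original request."""
        if address not in self._owned:
            raise ValueError(f"address {address:08X} was never obtained")
        obtained = self._owned[address]
        if obtained != length:
            raise ValueError(
                f"length mismatch at {address:08X}: "
                f"obtained x'{obtained:X}', freeing x'{length:X}'"
            )
        del self._owned[address]

    def outstanding(self) -> dict:
        """Return storage that was obtained but never freed."""
        return dict(self._owned)


tracker = StorageTracker()
tracker.getmain(0x7F6479B0, 0x650)
tracker.freemain(0x7F6479B0, 0x650)   # matches the request, so it succeeds
```

Wrapping requests this way during testing makes a mismatched length or a forgotten release show up as an exception or a non-empty outstanding() result, rather than as an abend later.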
Now that we have a better idea of how storage is obtained and freed, let's look at the errors that may occur during this process. The x78 abends that can occur during this process are:
- S078 - The master catalog cannot be opened.
- S178 - Unable to satisfy system subpool GETMAIN request.
- S278 - There is no central storage available to process a LSQA request. Not one that application programmers would ever see.
- S378 - An error occurred during FREEMAIN processing. Double-check all values on the FREEMAIN to ensure they match the associated GETMAIN request.
- S478 - You are attempting to FREEMAIN storage from a subpool that is ineligible for FREEMAIN.
- S778 - A machine error has occurred. I would double-check everything, then contact your systems programmer. I have never seen this one.
- S878 - Hands down the most common x78 abend! Increasing region size is the most common remedy. I personally try to go to REGION=0M and run (virtually) unlimited. I say virtually since JES parameters, SMF exits, and z/OS exits can be in place to "limit" the amount of storage available. If you think you are being restricted, and need more memory, talk with your systems programmer to find out what your limits are.
- S978 - The storage you are attempting to FREEMAIN is not on a doubleword (8 byte) boundary. Double-check the address of the storage being freed.
- SA78 - Nine times out of ten, in my experience, the length of the storage being freed is incorrect and there is an overlap with freed storage. The system trace is sometimes useful in this case to make sure the GETMAIN & FREEMAIN lengths are identical.
- SB78 - The majority of the errors for this abend are related to invalid subpools, trying to free storage from a system subpool, or lack of authorization to free storage. See the corresponding return code in the MVS Systems Codes doc for more information.
- SC78 - A CPOOL get/free request failed (it is not likely that you will observe this in an application program).
- SD78 - A FREEMAIN request was issued for LSQA storage that is not owned by your task (you are attempting to free another task's storage).
While this may be a good quick reference, the MVS System Codes manual (via LookAt) is the place to go when you receive any sort of x78 abend.
Tomorrow we will look at the final series of abends - the x22 series.
Posted by Ralph Johnson on Wed, May 20, 2009 @ 08:00 AM
As May draws to a close, I wanted to open it up to the subscribers to choose the subject for June's blogging activities. The only limitation is that it has to be mainframe oriented, and should be of value to the majority of the audience. I will evaluate the feedback and decide on the topic early next week. Just post your suggestion in the comments below.
Beginning in July, I'm planning a series of introductory blogs related to z/OS, TSO/ISPF/SDSF, JCL, Utilities, VSAM, and CICS. I'll keep you informed as information becomes available.
Posted by Ralph Johnson on Tue, May 19, 2009 @ 10:55 AM
Today we look at the most common of the x37 type abends. All are related to EOV (end of volume) processing and can typically be avoided by proper dataset allocation.
Did you know?
Before we get into proper dataset allocation techniques, note that the "37" in all the x37 abends is related to EOV processing. This also corresponds to the EOV SVC being x'37'. Sometimes knowing the SVC function when looking at an abend code can help determine the function that was being performed when the failure occurred. For instance, x78 abends all have to do with GETMAIN/FREEMAIN (SVC 78) as we will focus on tomorrow.
A word on dataset allocation!
OK, back to dataset allocation problems... z/OS allocates non-VSAM datasets much differently than files are allocated on other platforms. Some will argue this is better, others say it is much worse. Once you figure it out, the only thing that matters - it works.
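The SVC numbers mentioned above are hexadecimal, which is easy to miss when checking them against a manual that lists SVCs in decimal. As a small sketch, the Python helper below (its name is made up for this example) pulls the last two hex digits from an abend code and converts them to a decimal SVC number. The relationship only holds for abend families that really are tied to one SVC, so it is a hint, not a rule.

```python
def svc_from_abend(abend_code: str) -> int:
    """Return the decimal SVC number implied by the last two hex digits
    of a system abend code such as 'SB37' or 'S878'.

    Only meaningful for abend families tied to a single SVC; program
    interruption codes (S0Cx), for example, are not SVC related.
    """
    return int(abend_code[-2:], 16)


for code in ("SB37", "S878"):
    print(code, "-> SVC", svc_from_abend(code), "decimal")
```

So x'37' is SVC 55 (EOV) and x'78' is SVC 120 (GETMAIN/FREEMAIN) when written in decimal.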
Each time we allocate a dataset, we should consider the following:
* How many records will be placed in the dataset?
* What is the record length?
* What kind of growth is anticipated in the dataset?
Once this information is gathered, we can determine how to allocate the dataset. For an example, I want to build a dataset with 80 byte records, with 50,000 records, and a 20% annual growth.
First we need to decide how many tracks or cylinders we need to hold the dataset. A 3390 model 3 track holds 56,664 bytes. I personally always use 56,000 as a maximum because I can remember this value. This means that 700 records could fit on each track (56,000 / 80 = 700). Since we cannot place partial records on tracks, always round this down to the next lowest whole number. Based on 700 records per track, it will take 71.43 tracks to hold our anticipated 50,000 records (50,000 / 700 = 71.43). We want to round this number up to 72 tracks.
Anytime the number of tracks is greater than 15, which is the number of tracks per cylinder, I typically try to convert my allocations to cylinders. In this case, I would want 5 (72 / 15 = 4.8, then rounded up) cylinders to hold my primary extent.
Based upon my anticipated growth of 20% annually, I would allocate the secondary extents at 1 cylinder each. z/OS will take up to 16 extents to extend a non-VSAM dataset. Once z/OS has exhausted the 16 extents, you will receive an x37 abend.
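The sizing arithmetic for this example can be written out as a short Python sketch. It uses the same rounded 56,000 bytes per track, 15 tracks per cylinder, and 16-extent limit quoted in the text. The function name and the returned field names are invented for illustration, and the sketch ignores blocking overhead and assumes the primary allocation fits in a single extent.

```python
import math

TRACK_BYTES = 56_000         # rounded-down capacity of a 3390 track, as above
TRACKS_PER_CYLINDER = 15
MAX_EXTENTS = 16             # extent limit for a non-VSAM dataset on one volume


def size_dataset(record_length: int, record_count: int, secondary_cyls: int = 1) -> dict:
    """Estimate primary space and the ceiling reached at the extent limit."""
    records_per_track = TRACK_BYTES // record_length          # round down
    tracks = math.ceil(record_count / records_per_track)      # round up
    primary_cyls = math.ceil(tracks / TRACKS_PER_CYLINDER)    # round up
    # One primary extent plus up to 15 secondary extents.
    max_cyls = primary_cyls + (MAX_EXTENTS - 1) * secondary_cyls
    max_records = max_cyls * TRACKS_PER_CYLINDER * records_per_track
    return {
        "records_per_track": records_per_track,
        "tracks": tracks,
        "primary_cyls": primary_cyls,
        "max_cyls": max_cyls,
        "max_records": max_records,
    }


print(size_dataset(record_length=80, record_count=50_000))
```

For the 80-byte, 50,000-record example this gives 700 records per track, 72 tracks, and a 5 cylinder primary, with a ceiling of 20 cylinders once all 16 extents are used.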
Another thing to remember is that z/OS will take up to 4 extents to satisfy the primary allocation. This can happen when disk volumes become highly fragmented, and there is not enough contiguous space to satisfy the primary allocation request. Therefore, highly fragmented disks can and will cause premature x37 abends as a result of this "feature".
So, to avoid problems I recommend:
* Always attempt to fit the file in one extent.
* Occasionally review extent counts if you don't have a full-time storage administrator.
* Re-allocate, or modify the JCL for new allocations, once datasets get over 8-10 extents
I know all this is a hassle, but it has serious performance & recovery implications.
Common x37 Abends
The most common x37 abends are as follows:
- B37 - The error was detected by the end-of-volume routine. This system completion code is accompanied by message IEC030I. When reviewing the IEC030I message, the most common problem is the RC=4 condition. In most cases this is as a result of the "The data set already had 16 extents, but required more space" condition. The solution is to re-calculate the values for the primary & secondary extents, then re-submit the job.
- D37 - The error occurred when an output operation to a direct access device was requested. This system completion code is accompanied by message IEC031I. The most common finding after reviewing the IEC031I message is that the dataset was allocated with no secondary extent. In this case, reallocate the dataset with a larger primary extent, or add secondary extents. Keep in mind that some applications cannot handle secondary extents, but these are rare.
- E37 - The error occurred when an output operation was requested. The data set was on a direct access or magnetic tape device. This system completion code is accompanied by message IEC032I. The RC=4 is the most common here also. Typically, you have either hit the 16 extent limit or the volume was full and could not create another extent on the volume.
While it seems to be a lot of work to keep the primary and secondary extent information optimized, the time to fix x37 abends and recover your batch processing is far greater. There are tools to make this much easier. There are numerous tools to report file extent information before problems occur. There is also system software to intercept the x37 abends and correct them automatically, allowing batch processing to continue without the need for a recovery scenario.
Tomorrow, we will look into the x78 GETMAIN/FREEMAIN related failures...
Posted by Ralph Johnson on Mon, May 18, 2009 @ 11:00 AM
While our previous efforts in this series have been focused on S0C7 abends, today we look into the other program interruption abends.
The group of abend codes ranging from S0C1 to S0CF occurs when an exception is raised and there is no recovery routine in your program to handle that type of abend. For instance, you can put an ESTAE routine in place to handle conditions such as the S0C7 (data exception). This was quite common in the early mainframe days, when data entry (keypunch) errors would cause an abend. Recovery routines were put in place to fix the data, if possible, so that processing could continue. This was primarily utilized by assembler language programs.
The abend codes associated with program interruptions are:
- S0C1 - Operation exception. This is caused by executing an instruction that is an invalid op-code. For instance, in an assembler language program, you can add the following code to force a S0C1 abend:
ABEND0C1 DC X'0000' Define 2 null bytes to force an abend
When your program executes the op-code of hex 00, the abend will occur. An easy way to force an abend during program testing...
- S0C2 - Privileged-operation exception. The reason code is 2.
- S0C3 - Execute exception. The reason code is 3.
- S0C4 - This is another common program interruption. Although the IBM documentation describes 8 different conditions where the S0C4 abend will occur, the two most common conditions are:
  - Your program tried to modify a storage area where the storage protection key did not match the storage protect key in your PSW. This condition has a return code value of 4. Applications typically run with a storage protection key of 8. The operating system's storage has a storage protection key of 0. So, an application program running in key 8 cannot modify operating system storage, and the attempt will result in a S0C4 abend. This is how the operating system protects itself from application programs (and programmers). Application programs can modify operating system storage if they can get into supervisor mode (via MODESET), but this requires the program to be placed in an authorized library & linked with AC=1.
  - The other common S0C4 abend involves trying to access a virtual storage address that doesn't exist. A common problem is an uninitialized field, where you try to access a storage address of x'40404040'. Since x'40' is a space in EBCDIC, the address field either was never initialized or spaces were accidentally moved to this field. The return code of 11 is returned for this condition.
- S0C5 - Addressing exception. The reason code is 5.
- S0C6 - Specification exception. The reason code is 6.
- S0C7 - Data exception. This condition occurs most often when one, or both operands, of a decimal operation do not contain valid packed decimal data. If the low-order half of the final byte of the field does not contain xF for unsigned fields or xC/xD for signed fields, a S0C7 abend will occur. See last week's examples for additional information.
- S0C8 - Fixed-point-overflow exception. The reason code is 8.
- S0C9 - Fixed-point-divide exception. This condition occurs when you attempt to divide a number by 0. This condition happens not only with decimal operation instructions, but also in register-to-register instructions as well as storage-to-storage instructions on binary fields.
- S0CA - Decimal-overflow exception. The reason code is A.
- S0CB - Decimal-divide exception. The reason code is B.
- S0CC - Exponent-overflow exception. The reason code is C.
- S0CD - Exponent-underflow exception. The reason code is D.
- S0CE - Significance exception. The reason code is E.
- S0CF - Floating-point-divide exception. The reason code is F.
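To make the S0C7 condition in the list above more tangible, here is a small Python sketch that checks whether a byte string is valid packed decimal. The function name is made up for this example. It treats any of xA through xF in the last half-byte as a sign (xC, xD, and xF being the usual ones) and requires every other half-byte to be a digit 0-9. It is a teaching aid, not a replacement for the hardware check.

```python
def is_valid_packed(field: bytes) -> bool:
    """Return True if 'field' looks like valid packed decimal data."""
    if not field:
        return False
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)      # high-order half-byte
        nibbles.append(byte & 0x0F)    # low-order half-byte
    *digits, sign = nibbles            # last half-byte is the sign
    return all(d <= 9 for d in digits) and sign >= 0xA


print(is_valid_packed(bytes.fromhex("12345C")))   # signed positive value
print(is_valid_packed("   ".encode("cp037")))     # three EBCDIC spaces (x'404040')
```

The second call shows the classic case: a numeric field that still contains EBCDIC spaces ends in a half-byte of 0, which is not a valid sign, so arithmetic on it would raise a data exception.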
As you can see, only a few of the abends require additional information as they are the most common. For the "not so common" abends I have included the IBM explanation for the abend strictly to illustrate the possibility of other S0Cx abends. I seriously doubt you will generate any of these during the course of a normal programming assignment. If you happen to generate one of these, on purpose or accidentally, please forward the information to me so that I can include it here!
Tomorrow, we will focus on the Sx37 abends that are a result of improperly allocated datasets.
Posted by Ralph Johnson on Fri, May 15, 2009 @ 12:00 PM
Thus far, we have focused entirely on S0C7 abends in VERY simple programs written in ALC, COBOL, and C. In these abend situations, we have been 100% focused on the state of our application program at the time of the abend.
Today, I will touch on a bit more advanced subject that may be of value to you at some point in the future. One of the things we have been omitting from the SYSUDUMP up to this point is the System Trace. It is near the end of the SYSUDUMP, and looks like this:
--------------------------------------------------- SYSTEM TRACE TABLE ----------------------------
PR ASID WU-ADDR- IDENT CD/D PSW----- ADDRESS- UNIQUE-1 UNIQUE-2 UNIQUE-3 PSACLHS- PSALOCAL UNIQUE-4 UNIQUE-5 UNIQUE-6
00 002C 007FF350 SVCR B 070C0000 8323FF90 00000000 00FC3EB0 00000004
00 002C 007FF350 SVC 5F 070C1000 83240846 00000000 00000008 7FF6FCB8
The system trace displays information regarding the PSW, registers, and timestamps for every SVC (supervisor call) and dispatch event issued by your program and the operating system.
Knowing where you last were is especially useful when you abend with a S0C4/S0C1 and your PSW is all zeroes (you branched outside your program).
To find the point in the System Trace where your program failed, I search/scan for the "RCVY PROG" entry. This will always be near the end of the System Trace and looks like this:
00 002C 007FF048 *RCVY PROG 940C7000 00000000 00000000 00000000 00000000 002C 002C C41D9090EFE8AEF2
Since this test was run on a lightly loaded system and it was a simple program, you can follow the entire execution from where the TCB was attached to program failure. This rarely happens on a medium to heavy loaded system!
A full copy of the System Trace for today's discussion is located here.
I find the following columns useful in simple diagnosis situations:
- WU-ADDR- is the TCB address for the entry
- IDENT lets you know what is happening
- CD/D will further identify what is going on (i.e., the SVC #, ...)
- PSW----- ADDRESS- is the full 8-byte PSW at the time of the entry
- UNIQUE-x are the register values starting at R14 at the time of the entry
- TIMESTAMP-RECORD is the time of each entry in the trace table
To follow the flow of the failing program, get the TCB address from column 3 of the RCVY PROG entry, which is 007FF048 in this case. From the top of the System Trace, search for this address. If you find the "SVCR 2A" (ATTACH SVC) where this TCB was created, you have everything from the time the TCB was created (attached) until program failure.
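If the System Trace has been saved to a text file, the search described above can be sketched in a few lines of Python. The function names are invented for this example, and the sketch assumes the columns are separated by whitespace with WU-ADDR in the third column, as in the excerpts shown earlier. A real trace has many more entry types, so treat this as a starting point.

```python
SAMPLE_TRACE = [
    "00 002C 007FF350 SVCR B 070C0000 8323FF90 00000000 00FC3EB0 00000004",
    "00 002C 007FF350 SVC 5F 070C1000 83240846 00000000 00000008 7FF6FCB8",
    "00 002C 007FF048 *RCVY PROG 940C7000 00000000 00000000 00000000 "
    "00000000 002C 002C C41D9090EFE8AEF2",
]


def failing_tcb(lines):
    """Return the TCB address (column 3) of the last *RCVY PROG entry, or None."""
    for line in reversed(lines):
        if "*RCVY PROG" in line:
            return line.split()[2]
    return None


def entries_for_tcb(lines, tcb):
    """Return every trace entry whose third column matches the TCB address."""
    matches = []
    for line in lines:
        columns = line.split()
        if len(columns) > 2 and columns[2] == tcb:
            matches.append(line)
    return matches


tcb = failing_tcb(SAMPLE_TRACE)
print("Failing TCB:", tcb)
for entry in entries_for_tcb(SAMPLE_TRACE, tcb):
    print(entry)
```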
In our example, we see the following:
- TCB was dispatched at C41D9090EF25D808
- PGM asked for x'650' bytes of storage (SVC 78) at C41D9090EF28A508 and returned at C41D9090EF28E130 with storage at address 7F6479B0
- PGM asked for x'60' bytes of storage (SVC 78) at C41D9090EF28F148
- Issued SYNCH (SVC 12) at C41D9090EF292C74
- and many others, before...
- Received PGM 007 at C41D9090EFE5F712
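The search step described above can be sketched in C. This is only an illustration: it assumes the formatted trace has been saved as plain text and that the TCB address (WU-ADDR) is the third whitespace-separated field, as in the sample lines in this post. The function names `trace_line_matches_tcb` and `print_tcb_entries` are hypothetical, not part of IPCS or any IBM tool.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the third whitespace-separated field of a formatted
 * system trace line equals tcb, else 0. Treating field 3 as WU-ADDR
 * is an assumption based on the sample lines shown in this post. */
int trace_line_matches_tcb(const char *line, const char *tcb)
{
    char pr[16], asid[16], wu_addr[16];

    if (sscanf(line, "%15s %15s %15s", pr, asid, wu_addr) != 3)
        return 0;
    return strcmp(wu_addr, tcb) == 0;
}

/* Copies every trace line for the given TCB from in to out and
 * returns the number of matching lines. Lines longer than the
 * buffer are read in pieces, so only the first piece is tested. */
int print_tcb_entries(FILE *in, const char *tcb, FILE *out)
{
    char line[512];
    int matches = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (trace_line_matches_tcb(line, tcb)) {
            fputs(line, out);
            matches++;
        }
    }
    return matches;
}
```

Calling `print_tcb_entries(fp, "007FF048", stdout)` on a saved copy of the trace would list only the entries for the failing TCB, in the order they occurred.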
While I could explain each and every entry for this TCB, this is typically not necessary. Knowing what your program executed just prior to a failure is sometimes key to understanding that failure.
Next week, we will talk about different types of abends, and how to avoid them.
Posted by Ralph Johnson on Thu, May 14, 2009 @ 10:00 AM
Today we take a look at the failing C program from last week, but we are going to start with just a Language Environment dump and see if that is sufficient to diagnose the problem.
Up front, I must state my bias: I typically find that the Language Environment does not dump enough information to meet my needs. Perhaps it's the code I'm working with, or the various environments this code is running in. Hopefully, you will have better luck than I have had with this in the past. The same goes for Abend-Aid.
The files we will be working with today include:
The first thing we notice on today's S0C7 abend is that there is no Symptom Dump in the JOBLOG. But, no worries as all the same information is contained in the Language Environment dump output.
One thing that I really like about Language Environment dumps is shown on the first page of the LE dump. LE breaks down the calling sequence in the Traceback area. The routine at the top of the chain is the routine that produces the LE dump. The second line indicates that our main, or failing, program was at offset +F6 when it entered CEEHDSP. This chain of calls is especially useful when you are 10 calls deep and your program fails. Notice that the storage near each register address, the PSW address, and the information typically found in the Symptom Dump are displayed for each DSA.
First, we can see that the "main" routine experienced the exception condition based on the information found in the traceback area.
Next, we need to find the area for this DSA. In this case the Condition Information for the DSA address we are looking for is 18F18208 (DSA 2) with an entry address of 18F004C0. Typically this is just below the Traceback info.
Here we find the following information:
- PSW = 18F005BC
- ILC = 6
- INTC = 7
- OFFSET = F6
LE automatically adjusts the OFFSET to the real failing instruction offset. The failing instruction is at 18F005B6 (PSW 18F005BC minus ILC 6).
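The arithmetic above can be checked with a short C sketch. The function names are hypothetical. The values are the ones from this dump: PSW address 18F005BC, ILC 6, and entry address 18F004C0.

```c
#include <stdint.h>

/* For this program check the PSW points just past the failing
 * instruction, so backing up by the instruction length code (ILC)
 * gives the address of the instruction itself. */
uint32_t failing_instruction_address(uint32_t psw_address, uint32_t ilc)
{
    return psw_address - ilc;
}

/* Offset of an address from the entry point of its routine. */
uint32_t offset_from_entry(uint32_t address, uint32_t entry_address)
{
    return address - entry_address;
}
```

With the values from the dump, `failing_instruction_address(0x18F005BC, 6)` gives 18F005B6, and `offset_from_entry(0x18F005B6, 0x18F004C0)` gives F6, which agrees with the OFFSET that LE reports.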
Just below the Condition Information, we find the Storage Area near condition. This is the area near where the program failed. In this case, the storage at 18F005B6 is FA 22 D0 CA 40 00. By now, you have already noticed this is our dreaded Add Decimal (AP) instruction.
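For readers who do not decode instructions by eye, here is a minimal C sketch of how those six bytes break down. It assumes the storage-to-storage format with two length fields that AP uses (opcode, L1/L2, B1/D1, B2/D2). The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of a six-byte SS-format instruction with two lengths. */
struct ss_instruction {
    uint8_t  opcode;
    uint8_t  len1;   /* operand 1 length in bytes */
    uint8_t  len2;   /* operand 2 length in bytes */
    uint8_t  base1;  /* base register for operand 1 */
    uint16_t disp1;  /* displacement for operand 1 */
    uint8_t  base2;  /* base register for operand 2 */
    uint16_t disp2;  /* displacement for operand 2 */
};

struct ss_instruction decode_ss(const uint8_t bytes[6])
{
    struct ss_instruction ins;

    ins.opcode = bytes[0];
    /* The instruction holds length minus one in each nibble. */
    ins.len1  = (uint8_t)((bytes[1] >> 4) + 1);
    ins.len2  = (uint8_t)((bytes[1] & 0x0F) + 1);
    ins.base1 = (uint8_t)(bytes[2] >> 4);
    ins.disp1 = (uint16_t)(((bytes[2] & 0x0F) << 8) | bytes[3]);
    ins.base2 = (uint8_t)(bytes[4] >> 4);
    ins.disp2 = (uint16_t)(((bytes[4] & 0x0F) << 8) | bytes[5]);
    return ins;
}
```

Decoding FA 22 D0 CA 40 00 this way gives opcode FA (AP), two 3-byte operands, operand 1 at displacement CA from register 13, and operand 2 at displacement 0 from register 4.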
Operand 1 is located at +CA from where register 13 is pointing. In this case operand 1 is located at 18F182D2 (18F18208 + CA). The length of the operand is 3 bytes. Unfortunately, LE only dumps 20 bytes before and 40 bytes after each register address (my biggest objection to LE!). In some cases, you are stuck at this point. One method I have used to overcome this is scanning through all register storage looking for the storage that is not included in the R13 information. In this case, we find 18F182D2 is NOT included elsewhere. This operand becomes an open issue!
Operand 2 is located at +00 from where register 4 is pointing. Operand 2 is located at 18F006A0 and is also 3 bytes in length. The value located at 18F006A0 is 00 00 00. This is not a valid packed decimal field, because the last nibble must be a sign code (A through F), so this is part of the problem.
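Both checks in the last two paragraphs, the operand address and the validity of the packed field, can be sketched in C. This is an illustration only, with hypothetical function names. It uses the usual packed decimal rules: every nibble is a digit 0 through 9 except the last, which must be a sign code A through F.

```c
#include <stddef.h>
#include <stdint.h>

/* Operand address = contents of the base register + displacement. */
uint32_t operand_address(uint32_t base_register_value, uint32_t displacement)
{
    return base_register_value + displacement;
}

/* Returns 1 if field holds a valid packed decimal value, else 0. */
int is_valid_packed(const uint8_t *field, size_t len)
{
    size_t i;

    if (len == 0)
        return 0;
    for (i = 0; i < len; i++) {
        uint8_t high = (uint8_t)(field[i] >> 4);
        uint8_t low  = (uint8_t)(field[i] & 0x0F);

        if (high > 9)
            return 0;            /* high nibble is always a digit */
        if (i + 1 < len) {
            if (low > 9)
                return 0;        /* digit nibble out of range */
        } else if (low < 0x0A) {
            return 0;            /* last nibble must be a sign, A-F */
        }
    }
    return 1;
}
```

Here `operand_address(0x18F18208, 0xCA)` gives 18F182D2 for operand 1, and `is_valid_packed` rejects the 00 00 00 found at operand 2 while accepting a properly signed value such as 00 00 2C.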
Looking at the program listing, we find that there are no instructions at +F6 in the compile listing. The failing instruction FA 22 D0 CA 40 00 can be found at +186. The reason for this discrepancy is that the program base starts at +90. So if +90 is added to the +F6 offset we get +186. Now the program listing agrees with the LE dump.
Review of our program source shows that counter2 was initially set to a value of 2, but was then overwritten with binary zeros (which is not a valid packed decimal value) by the following instruction:
memset(&counter2,0,sizeof(counter2));
By removing this instruction, re-compiling, and re-running the failing program this problem is resolved.
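If the intent had been to reset the counter to zero rather than leave it at 2, clearing the bytes is not enough, because a packed field needs a sign nibble. Here is a minimal portable sketch, assuming the field is handled as raw bytes. The helper name `set_packed_zero` is hypothetical, and the original program may well use the compiler's own decimal type instead.

```c
#include <stddef.h>
#include <string.h>

/* Stores a valid packed decimal zero in field: all digit nibbles 0,
 * with the positive sign code C in the last nibble. */
void set_packed_zero(unsigned char *field, size_t len)
{
    if (len == 0)
        return;
    memset(field, 0x00, len);
    field[len - 1] = 0x0C;   /* digit 0, sign C */
}
```

For a 3-byte field this produces 00 00 0C, which AP accepts, whereas `memset` alone leaves 00 00 00 and leads to the S0C7 seen above.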
Tomorrow we will address some other common abends, and some best practices on how to resolve them.