Tracing alignment faults on OpenVMS IA64

The following was written for a COBOL development group to make them aware of alignment issues on OpenVMS IA64 and how to correct them.

Introduction

Alignment faults occur when references are made to data that is not naturally aligned. Data is naturally aligned when its address is an integral multiple of the size of the data in bytes. For example, a longword is naturally aligned at any address that is a multiple of four, and a quadword is naturally aligned at any address that is a multiple of eight. A structure is naturally aligned when all its members are naturally aligned.
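The definition above reduces to a single modulo test. The following is a minimal sketch in C, not taken from any OpenVMS source; the helper name is_naturally_aligned is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when `address` is an integral
 * multiple of `size`, which is the definition of natural alignment
 * for a data item that is `size` bytes long. */
static bool is_naturally_aligned(uintptr_t address, size_t size)
{
    assert(size > 0);   /* a zero-sized item has no meaningful alignment */
    return address % size == 0;
}
```

For example, a longword (size 4) at address 0x1000 passes the test, while the same longword at address 0x1002 does not.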

Unaligned data on OpenVMS Alpha and IA64 systems can seriously degrade performance, particularly on IA64. An HP representative has commented that catching and resolving an alignment fault on Alpha can take approximately 100 instructions within the operating system (PALcode), while a similar alignment fault on an Itanium system could (at a guess) take the equivalent of 10,000 to 15,000 instructions.

Developers designing or porting code to IA64 systems should be aware of alignment fault issues, and be able to detect and correct them.

Detecting Alignment Faults

There are several ways to detect alignment faults. If you have OpenVMS 8.3 installed, the quickest way to determine whether an issue exists is to issue the MONITOR ALIGN command, which produces a display similar to the following. In this example, a high rate of user-mode faults is evident, most likely occurring in user-written applications.

                            OpenVMS Monitor Utility
                           ALIGNMENT FAULT STATISTICS
                                 on node XXXXXX
                            15-AUG-2008 12:57:36.04

                                       CUR        AVE        MIN        MAX


    Kernel Fault Rate                25.00      25.33      24.33      26.33
    Exec   Fault Rate                 0.00       0.00       0.00       0.00
    Super  Fault Rate                 0.00       0.00       0.00       0.00
    User   Fault Rate              1164.66    1137.66    1074.33    1243.33

    Total  Fault Rate              1189.66    1163.00    1098.66    1268.66

At a more granular level, alignment faults can be detected by calling a system service that enables alignment fault reporting. An example of this implementation can be found on this site. If this source is saved, compiled, and linked, it produces an image that allows alignment fault reporting to be enabled and disabled. For example:


$ cc sys_perm_align_fault
$ link sys_perm_align_fault
$ mcr []sys_perm_align_fault on
Alignment fault reporting now on
$ run image_with_suspected_alignment_faults.exe
 %SYSTEM-I-ALIGN, data alignment trap, virtual address=0000000000040009, function=00000001,
PC=0000000000030154, PS=0000001B
$ mcr []sys_perm_align_fault off
Alignment fault reporting now off

This indicates that the suspected image did encounter an alignment fault at PC (program counter) 30154 while referencing virtual address 40009. To locate the failing instruction (and the reference to the data), you can use the techniques described in Reading a traceback dump, or you can use the OpenVMS Debugger and issue the SET BREAK/UNALIGNED_DATA command.

Correcting Alignment Faults

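In a perfect world, there would be no alignment faults. Some alignment faults are easy to rectify, and some are nearly impossible. There are three general approaches to correcting alignment faults: align the data, hint to the compiler that the data may be unaligned, or copy the data into an aligned structure.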

Let's look at these solutions individually.

Align the Data

Obviously, aligning the data is the best way to avoid alignment faults.

Sometimes it is not possible to align the data. An example of this would be writing a fixed format file that must be exported to another operating system.

The easiest way to align data is to ensure that data items with the largest size are defined first. For example, this COBOL fragment shows unaligned data:


DATA DIVISION.
WORKING-STORAGE SECTION.
01 SOME-STRUCTURE.
  03 WORD-EXAMPLE PIC S9(04) COMP.
  03 LONG-EXAMPLE PIC S9(09) COMP.
  03 STRG-EXAMPLE PIC X(7).
  03 QUAD-EXAMPLE PIC S9(15) COMP.

In this example, the only aligned binary item is WORD-EXAMPLE, because it is defined first. LONG-EXAMPLE begins on a word boundary (offset 2) rather than a longword boundary, STRG-EXAMPLE also starts on a word boundary (offset 6, although alignment faults are only of concern with atomic binary data), and QUAD-EXAMPLE starts on a “longword plus one” boundary (offset 13). Each access to LONG-EXAMPLE or QUAD-EXAMPLE can therefore incur an alignment fault. Compare the following rearrangement of the code.


DATA DIVISION.
WORKING-STORAGE SECTION.
01 SOME-STRUCTURE.
  03 QUAD-EXAMPLE PIC S9(15) COMP.
  03 LONG-EXAMPLE PIC S9(09) COMP.
  03 WORD-EXAMPLE PIC S9(04) COMP.
  03 STRG-EXAMPLE PIC X(7).

In this example, the same data is present, but because the largest binary items are defined first, no alignment faults will occur. Strings are just arrays of bytes, so they can start at any address and are placed last.
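The effect of field order can be checked by adding up offsets. The sketch below is in C rather than COBOL and is illustrative only: it assumes the record itself starts on a quadword boundary, that fields are laid out back to back with no padding, and that the word, longword, string, and quadword items occupy 2, 4, 7, and 8 bytes as described in the text. The struct field type and the count_misaligned helper are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one field in a packed record. */
struct field {
    const char *name;
    size_t size;      /* size in bytes */
    int is_binary;    /* 0 for character strings, which need no alignment */
};

/* Lay the fields out back to back (no padding) and count the binary
 * fields whose offset is not a multiple of their own size. */
static size_t count_misaligned(const struct field *fields, size_t count)
{
    size_t offset = 0;
    size_t misaligned = 0;

    for (size_t i = 0; i < count; i++) {
        assert(fields[i].size > 0);
        if (fields[i].is_binary && offset % fields[i].size != 0) {
            misaligned++;
        }
        offset += fields[i].size;
    }
    return misaligned;
}

/* Field order of the first fragment: offsets 0, 2, 6, 13. */
static const struct field original_order[] = {
    {"WORD-EXAMPLE", 2, 1},
    {"LONG-EXAMPLE", 4, 1},
    {"STRG-EXAMPLE", 7, 0},
    {"QUAD-EXAMPLE", 8, 1},
};

/* Field order of the rearranged fragment: offsets 0, 8, 12, 14. */
static const struct field largest_first[] = {
    {"QUAD-EXAMPLE", 8, 1},
    {"LONG-EXAMPLE", 4, 1},
    {"WORD-EXAMPLE", 2, 1},
    {"STRG-EXAMPLE", 7, 0},
};
```

Under these assumptions, count_misaligned reports two misaligned binary fields for original_order (LONG-EXAMPLE and QUAD-EXAMPLE) and none for largest_first.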

As an alternative to hand-aligning data structures, you can get the effect of the compiler’s /ALIGNMENT qualifier for specific sections of code by including a COBOL alignment directive.

COBOL alignment directives are in the form of a structured comment. To achieve the same result (at the cost of memory for padding) in our example above, we could code:


DATA DIVISION.
WORKING-STORAGE SECTION.
*DC SET ALIGNMENT
01 SOME-STRUCTURE.
  03 WORD-EXAMPLE PIC S9(04) COMP.
  03 LONG-EXAMPLE PIC S9(09) COMP.
  03 STRG-EXAMPLE PIC X(7).
  03 QUAD-EXAMPLE PIC S9(15) COMP.
*DC END-SET ALIGNMENT

See HELP COBOL DIRECTIVES ALIGNMENT for further information.
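To get a feel for the memory cost of padding, the offsets can be recomputed with each binary item rounded up to its natural boundary. The following C sketch is an approximation under stated assumptions, not a description of what the COBOL compiler actually emits: it assumes each binary item is padded to a multiple of its own size, that character data needs no padding, and that the item sizes are 2, 4, 7, and 8 bytes. The names align_up, padded_record_size, and struct item are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one item in the record. */
struct item {
    size_t size;       /* bytes occupied by the item */
    size_t alignment;  /* required boundary; 1 for character data */
};

/* Round `offset` up to the next multiple of `alignment`. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}

/* Offset just past the last item once every item has been padded
 * forward to its required boundary. */
static size_t padded_record_size(const struct item *items, size_t count)
{
    size_t offset = 0;

    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, items[i].alignment) + items[i].size;
    }
    return offset;
}

/* Word, longword, 7-byte string, quadword, in their original order. */
static const struct item some_structure[] = {
    {2, 2}, {4, 4}, {7, 1}, {8, 8},
};
```

Under these assumptions the items land at offsets 0, 4, 8, and 16, so the record grows from 21 bytes unpadded to 24 bytes padded.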

If possible, data files under the direct control of the development group should be rearranged in this fashion to eliminate potential alignment faults while performing I/O.

Hint to the compiler

This method allows you to indicate to the compiler that the data about to be referenced is, or may be, unaligned. The compiler then emits instructions that align the data on the fly. These additional instructions add overhead, but they are far cheaper than incurring an alignment fault.
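To illustrate the idea in a language where the access can be spelled out by hand, the C sketch below builds a longword from four single-byte loads. It is not actual compiler output; it only shows why such a sequence cannot fault: a byte is always naturally aligned. The function name load_u32_le is hypothetical, and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit little-endian value from four byte loads.
 * Each load touches a single byte, so the result is the same, and no
 * alignment fault can occur, whatever address `bytes` points at. */
static uint32_t load_u32_le(const unsigned char *bytes)
{
    assert(bytes != NULL);
    return  (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

The cost is four loads plus shifts and ORs in place of one load, which is the kind of fixed overhead the paragraph above describes.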

Unfortunately, this method is unavailable in COBOL at present.

Copying the Data

This method is suitable when you cannot align the data but need to reference it frequently. For example, if you have an unaligned external file that needs to be read once (into memory for a cache), it would make sense to read each unaligned record and then copy the individual data fields into an aligned cache record.
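The copy step can be sketched as follows, in C for illustration. The packed layout (offsets 0, 2, 6, and 13, 21 bytes in total) mirrors the unaligned example structure used earlier; struct cache_record, copy_to_cache, and the offset constants are hypothetical names, and the file is assumed to use the same byte order as the host. memcpy is used because it makes no alignment assumption about its source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets of the fields inside the packed (unaligned) external record. */
enum {
    WORD_OFFSET = 0,
    LONG_OFFSET = 2,
    STRG_OFFSET = 6,
    QUAD_OFFSET = 13,
    PACKED_RECORD_SIZE = 21
};

/* Aligned in-memory copy: largest members first, string last. */
struct cache_record {
    int64_t quad_example;
    int32_t long_example;
    int16_t word_example;
    char    strg_example[7];
};

/* Copy each field out of the packed record once; all later references
 * go to the naturally aligned members of `out`. */
static void copy_to_cache(const unsigned char *packed, struct cache_record *out)
{
    assert(packed != NULL && out != NULL);
    memcpy(&out->word_example, packed + WORD_OFFSET, sizeof out->word_example);
    memcpy(&out->long_example, packed + LONG_OFFSET, sizeof out->long_example);
    memcpy(out->strg_example,  packed + STRG_OFFSET, sizeof out->strg_example);
    memcpy(&out->quad_example, packed + QUAD_OFFSET, sizeof out->quad_example);
}
```

The cost of the byte-wise copy is paid once per record when the cache is loaded, rather than on every reference to the data.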

Summary

Alignment faults are very expensive on IA64 processors. Any porting project must address this issue where it exists. Developers designing new code for IA64-based systems must be aware of alignment faults, and be able to detect and correct them.