15-Aug-2011

Another crash

This time on the development box. A little back story: the system this occurred on runs on a blade system, and the management processor in the blade system had been acting flaky. HP decided to swap out the blade, but the on-site engineer forgot to back up the NVRAM before removing the old one, and we ended up with an unbootable box. After much stuffing around, we managed to boot the box in a non-standard way. A day later, we saw this crash.



Crashdump Summary Information:
------------------------------
Crash Time:        12-AUG-2011 13:18:50.79
Bugcheck Type:     PGFIPLHI, Pagefault with IPL too high
Node:              xxxxxx  (Cluster)
CPU Type:          HP BL860c  (1.59GHz/9.0MB)
VMS Version:       V8.3-1H1
Current Process:   NULL
Current Image:     <not available>
Failing PC:        FFFFFFFF.800F9ED1    EXE$PAL_REMQUEQ_C+001C1
Failing PS:        00000000.00000608
Module:            SYSTEM_PRIMITIVES_MIN    (Link Date/Time:  2-JUL-2010 17:35:27.47)
Offset:            000C9ED1

Boot Time:         10-AUG-2011 15:26:14.00
System Uptime:               1 21:52:36.79
Crash/Primary CPU: 0./0.
System/CPU Type:   4020
Saved Processes:   394
Pagesize:          8 KByte (8192 bytes)
Physical Memory:   16383 MByte (134742016 PFNs, discontiguous memory)
Dumpfile Pagelets: 3497462 blocks
Dump Flags:        olddump,writecomp,errlogcomp
Dump Type:         compressed,selective,dosd,shared_mem
EXE$GL_FLAGS:      poolpging,init,bugdump,tbchk
Paging Files:      2 Pagefiles and 1 Swapfile installed

Stack Pointers:
KSP = FFFFFFFF.B34C57D0   ESP = FFFFFFFF.B281B000   SSP = FFFFFFFF.B280F000
USP = FFFFFFFF.B280F000

General Registers:
R0  = 00000000.00000000   GP  = FFFFFFFF.AD4D8600   R2  = FFFFFFFF.AD04CAE8
R3  = 80000000.00000006   R4  = FFFFFFFF.70A94000   R5  = 00000000.00000000
R6  = FFFFFFFF.B34C5800   R7  = 00000000.00000006   R8  = 00000000.00000207
R9  = 00000000.00000009   R10 = 00000000.00000001   R11 = 00000000.00000000
SP  = 00000000.00000000   TP  = 00000000.00000000   R14 = FFFFFFFF.FFFFFFFC
R15 = FFFFFFFF.AD1E7F88   R16 = FFFFFFFF.80000CE0   R17 = 00000000.0000035C
R18 = 00000000.00000002   R19 = 00000000.00000000   R20 = 00000000.00000000
R21 = 00000000.7FFF0278   R22 = 00000000.00000358   R23 = FFFFFFFF.B34C58C0
R24 = 00000000.00000600   AI  = 00000000.00000006   RA  = FFFFFFFF.B34C57E8
PV  = FFFFFFFF.B34C57E0   R28 = 00000000.00000000   FP  = FFFFFFFF.B34C57D0
R30 = FFFFFFFF.81977C10   R31 = 30000000.00000000

Pagefault Information:
Faulting Virtual Address        FFFFFFFF.70A94000
Memory Management Flags         00000000.00000000   Read Data Fault

Exception Frame:
Exception taken at IP FFFFFFFF.800F9ED0, slot 01 from Kernel mode
Trap Type   00000009 (Translation not valid fault)
IVT Offset  00000800 (Data TLB Fault)

Control Registers:
CR0   Default Control Register (DCR)         00000000.00007F00
CR16  Processor Status Register (IPSR)       00001210.08022030
CR17  Interrupt Status Register (ISR)        00000A04.00000000
CR19  Instruction Pointer (IIP)              FFFFFFFF.800F9ED0
CR20  Faulting Address (IFA)                 FFFFFFFF.70A94000
CR21  TLB Insertion Register (ITIR)          00000000.00000334
CR22  Instruction Previous Address (IIPA)    FFFFFFFF.800F9ED0
CR23  Function State (IFS)                   80000000.00000006
CR24  Instruction immediate (IIM)            00000000.00000000
CR25  VHPT Hash Address (IHA)                FFFFFFFF.7FFF2920

Application Registers:
AR16  Register Stack Config Reg (RSC)        00000000.00000003
AR17  Backing Store Pointer (BSP)            FFFFFFFF.70A88328
AR18  Backing Store for Mem Store (BSPSTORE) FFFFFFFF.70A88168
AR19  RSE NaT Collection Register (RNAT)     00000000.00000000
AR32  Compare/Exchange Comp Value Reg (CCV)  FFFFFFFF.00000000
AR36  User NaT Collection Register (UNAT)    00000000.00000000
AR64  Previous Function State (PFS)          00000000.00000C9F
AR65  Loop Count Register (LC)               00000000.00000000
AR66  Epilog Count Register (EC)             00000000.00000000

Processor Status Register (IPSR):
AC = 0   MFL= 1   MFH= 1   IC = 1   I  = 0   DT = 1
DFL= 0   DFH= 0   RT = 1   CPL= 0   IT = 1   MC = 0   RI = 1
Interrupt Status Register (ISR):
Code 00000000     X  = 0   W  = 0   R  = 1   NA = 0   SP = 0
RS = 0   IR = 0   NI = 0   SO = 0   EI = 1   ED = 1

Branch Registers:
B0        FFFFFFFF.819BAF90
B1        00000000.00000000
B2        00000000.00000000
B3        00000000.00000000
B4        00000000.00000000
B5        00000000.00000000
B6        FFFFFFFF.800F9D20
B7        FFFFFFFF.81977C10

Floating Point Registers:          FPSR      0009804C.8A70033F
F6        00000000.0001003E.00000000.00016A76
F7        00000000.0001003E.00000000.00000407
F8        00000000.0001003E.00000000.0000005A
F9        00000000.0001003E.0000E7F9.999A6494
F10       00000000.0001003E.00000000.00016A76
F11       00000000.0001003E.00000000.A3D70A3E

Miscellaneous Registers:
Interrupt Priority Level (IPL)                        00000006
Stack Align                                           000002D0
NaT Mask                                                  0000
PPrev Mode                                                  00
Previous Stack                                              00
Interrupt Depth                                             03
Preds                                        40000000.0001F059
Nats                                         00000000.00000000
Context                                      40000000.0001F20B

General Registers:
R0   00000000.00000000     GP   FFFFFFFF.AD3E5C00     R2   00000000.0000023C
R3   FFFFFFFF.88C75574     R4   00000000.7FF43B20     R5   00000000.7FF43B40
R6   0009804C.0270033F     R7   00000000.0000003E     R8   00000000.00000006
R9   FFFFFFFF.88C75570     R10  00000000.00000001     R11  00000000.00000001
SP   FFFFFFFF.B34C5AD0     TP   00000000.00000000     R14  00000000.00000006
R15  FFFFFFFF.AD735790     R16  FFFFF804.09C00A00     R17  00000000.00000000
R18  00000000.00000000     R19  FFFFFFFF.800F9D10     R20  FFFFFFFF.70A94000
R21  00000000.00000000     R22  FFFFFFFF.00000000     R23  00000000.00000000
R24  00000000.00000000     R25  00000000.00000001     R26  00000000.00000000
R27  00000000.00000003     R28  FFFFFFFF.893A0074     R29  FFFFFFFF.B34C5AD0
R30  FFFFFFFF.89645C30     R31  00000000.00000000


System Registers:
Page Table Base Register (PTBR)                           00000000.00000000
Processor Base Register (PRBR)                            FFFFFFFF.88050000
Privileged Context Block Base (PCBB)                      FFFFFFFF.88050080
System Control Block Base (SCBB)                          6D6D6D6D.6D6D6D6D
Software Interrupt Summary Register (SISR)                00000000.00000000
Address Space Number (ASN)                                00000000.00000000
AST Summary / AST Enable (ASTSR_ASTEN)                    00000000.00000000
Floating-Point Enable (FEN)                               00000000.00000001
Interrupt Priority Level (IPL)                            00000000.00000006
Machine Check Error Summary (MCES)                        00000000.00000000
Virtual Page Table Base Register (VPTB)                   00000000.00000000

Failing Instruction:
EXE$PAL_REMQUEQ_C+001C1:              ld8         r30 = [r20], 008

Instruction Stream (last 20 instructions):
EXE$PAL_REMQUEQ_C+00170:              nop.m       000000
EXE$PAL_REMQUEQ_C+00171:              cmp.eq      p0, p6 = r14, r0
EXE$PAL_REMQUEQ_C+00172:         (p6) br.cond.spnt.few 1FFF400
EXE$PAL_REMQUEQ_C+00180:              tak         r22 = r30 ;;
EXE$PAL_REMQUEQ_C+00181:              nop.m       000000
EXE$PAL_REMQUEQ_C+00182:              cmp.eq      p6, p0 = 01, r22 ;;
EXE$PAL_REMQUEQ_C+00190:              nop.m       000000
EXE$PAL_REMQUEQ_C+00191:         (p6) mov         r17 = r30
EXE$PAL_REMQUEQ_C+00192:         (p6) br.cond.spnt.few 00000D0
EXE$PAL_REMQUEQ_C+001A0:              probe.w     r22 = r30, r31 ;;
EXE$PAL_REMQUEQ_C+001A1:              nop.m       000000
EXE$PAL_REMQUEQ_C+001A2:              cmp.eq      p6, p0 = r22, r0 ;;
EXE$PAL_REMQUEQ_C+001B0:         (p6) mov         r17 = r30
EXE$PAL_REMQUEQ_C+001B1:         (p6) br.cond.spnt.few 1FFF330
EXE$PAL_REMQUEQ_C+001B2:              br.few      0000030
EXE$PAL_REMQUEQ_C+001C0:              rsm         004000
EXE$PAL_REMQUEQ_C+001C1:              ld8         r30 = [r20], 008
EXE$PAL_REMQUEQ_C+001C2:              nop.i       000000 ;;
EXE$PAL_REMQUEQ_C+001D0:              ld8         r10 = [r20], 1F8
EXE$PAL_REMQUEQ_C+001D1:              nop.f       000000
EXE$PAL_REMQUEQ_C+001D2:              nop.i       000000
EXE$PAL_REMQUEQ_C+001E0:              add         r14 = 0008, r30 ;;
EXE$PAL_REMQUEQ_C+001E1:              st8         [r14] = r10
EXE$PAL_REMQUEQ_C+001E2:              nop.i       000000
EXE$PAL_REMQUEQ_C+001F0:              st8         [r10] = r30
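To make the failing sequence easier to follow, here is a toy Python model of what the five instructions from +001C1 to +001F0 appear to do: a classic doubly-linked queue removal, with the forward link at offset 0 and the backward link at offset 8 of the address in R20. This is only a sketch under those assumptions. The real EXE$PAL_REMQUEQ_C does more (the probes, the rsm, the branch paths above), and the names `Memory`, `TranslationNotValid` and `remque` are hypothetical. The relevant point from the dump is that R20 in the exception-frame registers and the faulting address (IFA) are both FFFFFFFF.70A94000. As a result, the very first ld8 fails before any link is touched.

```python
class TranslationNotValid(Exception):
    """Hypothetical stand-in for the 'Translation not valid' / Data TLB fault."""


class Memory:
    """Toy memory model: a dict of mapped quadword addresses -> 64-bit values."""

    def __init__(self):
        self.quads = {}

    def ld8(self, addr):
        if addr not in self.quads:
            raise TranslationNotValid(hex(addr))
        return self.quads[addr]

    def st8(self, addr, value):
        if addr not in self.quads:
            raise TranslationNotValid(hex(addr))
        self.quads[addr] = value


def remque(mem, entry):
    """Unlink the element at 'entry' (flink at +0, blink at +8), as assumed above."""
    flink = mem.ld8(entry)        # ld8 r30 = [r20], 008   <- the faulting load
    blink = mem.ld8(entry + 8)    # ld8 r10 = [r20], 1F8
    mem.st8(flink + 8, blink)     # add r14 = 0008, r30 ;; st8 [r14] = r10
    mem.st8(blink, flink)         # st8 [r10] = r30
    return entry


# Build a small circular queue: HEAD <-> A <-> B <-> HEAD (addresses are arbitrary)
mem = Memory()
HEAD, A, B = 0x1000, 0x2000, 0x3000
mem.quads.update({HEAD: A, HEAD + 8: B,
                  A: B,    A + 8: HEAD,
                  B: HEAD, B + 8: A})

remque(mem, A)                    # HEAD <-> B <-> HEAD

# A corrupt pointer like the one in R20 faults on the first load
try:
    remque(mem, 0xFFFFFFFF70A94000)
except TranslationNotValid as e:
    print("fault at", e)          # fault at 0xffffffff70a94000
```

In the real system the fault itself would normally be resolved by paging. Here it could not be, because the processor was running at IPL 6 (see the IPL registers above), which is why the result was a PGFIPLHI bugcheck rather than a page-in.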


Posted at August 15, 2011 5:59 PM
Comments

This crash footprint is known to HP. The crash happens when the Network File System (NFS) accesses a corrupt TCPIP queue header. The problem is resolved by installing HP-I64VMS-TCPIP-V0506-9ECO5-1.

Posted by: Brodders at August 16, 2011 7:01 PM

Thanks John,

A patching we will go.

Posted by: Jim Duff at August 16, 2011 9:37 PM
