05-Dec-2013

CPUSPINWAIT crash

A couple of weeks ago, we experienced a CPUSPINWAIT crash. Initial investigation indicates that the crash occurred in a call to SYS$ICC_ACCEPT() while the CPU was waiting to acquire the IOLOCK8 spinlock.
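To make the failure mode concrete, here is a minimal C sketch of a spinlock acquire with a bounded wait. It is an illustration under stated assumptions, not the OpenVMS code. `demo_spinlock`, `spin_acquire_timed()` and the attempt-count limit are all hypothetical. On OpenVMS the wait is bounded by a timer, assumed here to be tuned by SYSGEN parameters such as SMP_SPINWAIT. When that timer expires the system bugchecks with CPUSPINWAIT rather than returning an error.

```c
#include <stdatomic.h>

/* Hypothetical lock for illustration only, not the OpenVMS spinlock layout.
   owner holds the id of the CPU holding the lock, or -1 when it is free. */
typedef struct {
    atomic_int owner;
} demo_spinlock;

typedef enum { SPIN_ACQUIRED, SPIN_TIMEOUT } spin_status;

static void spin_init(demo_spinlock *lock)
{
    atomic_init(&lock->owner, -1);
}

/* Make at most max_spins attempts to take the lock for cpu_id.
   On success *owner_out is set to cpu_id; on timeout it is set to the
   owner observed on the last attempt. OpenVMS instead bounds the wait by
   a timer and bugchecks (CPUSPINWAIT) on expiry; this sketch only
   reports SPIN_TIMEOUT and leaves the decision to the caller. */
static spin_status spin_acquire_timed(demo_spinlock *lock, int cpu_id,
                                      unsigned long max_spins, int *owner_out)
{
    int observed = -1;
    for (unsigned long spins = 0; spins < max_spins; spins++) {
        int expected = -1; /* the lock can only be taken if it is free */
        if (atomic_compare_exchange_strong(&lock->owner, &expected, cpu_id)) {
            *owner_out = cpu_id;
            return SPIN_ACQUIRED;
        }
        observed = expected; /* on failure, the CAS stores the current owner here */
    }
    *owner_out = observed;
    return SPIN_TIMEOUT;
}

static void spin_release(demo_spinlock *lock)
{
    atomic_store(&lock->owner, -1);
}
```

Suppose CPU 2 takes the lock and never releases it. A later call from CPU 3 then returns SPIN_TIMEOUT and reports the owner as 2. That is the same owner/waiter pair (02/03) shown in the CPUSPINWAIT section of the footprint below.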

Because the crash occurred in a call to the Intra-Cluster Communication (ICC) services, HP initially recommended applying patch VMS84I_IPC-V0200. However, the release notes for that patch describe "waiting for the SCHED spinlock", not the IOLOCK8 spinlock. After I pointed this out, I insisted that further crash analysis be performed.

At the moment, the crash appears to be closely related to the one fixed by that patch, but it is certainly not the same crash.

Engineering is presently investigating.

Here's the crash footprint:

Crash Time:        15-NOV-2013 16:31:10.90
Bugcheck Type:     CPUSPINWAIT, CPU spinwait timer expired
Node:              xxxxxx  (Cluster)
CPU Type:          HP BL860c  (1.59GHz/9.0MB)
VMS Version:       V8.4
Current Process:   BATCH_1008254
Current Image:     DSA34:[EXE]xxxxx.EXE;5
Failing PC:        FFFFFFFF.80263C20    SMP$TIMEOUT_C+00170
Failing PS:        00000000.00000800
Module:            SYSTEM_SYNCHRONIZATION_MIN    (Link Date/Time:  3-SEP-2010 12:46:50.40)
Offset:            00010F20

Boot Time:         21-OCT-2013 09:50:42.00
System Uptime:              25 06:40:28.90
Crash/Primary CPU: 3./0.
System/CPU Type:   4020
Saved Processes:   1056
Pagesize:          8 KByte (8192 bytes)
Physical Memory:   20479 MByte (134742016 PFNs, discontiguous memory)
Dumpfile Pagelets: 5243425 blocks
Dump Flags:        olddump,writecomp,errlogcomp
Dump Type:         compressed,selective,dosd,shared_mem
EXE$GL_FLAGS:      poolpging,init,bugdump,tbchk
Paging Files:      1 Pagefile and 0 Swapfiles installed

Stack Pointers:
KSP = 00000000.7FF43E10   ESP = 00000000.7FF68000   SSP = 00000000.7FFAC000
USP = 00000000.7AB0B960

General Registers:
R0  = 00000000.00000000   GP  = FFFFFFFF.AD8EE800   R2  = 00000000.7FF43E00
R3  = 00000007.57A0E823   R4  = 00000000.00000043   R5  = FFFFFFFF.8C9CB080
R6  = 00000000.885A49B8   R7  = FFFFFFFF.896B2D00   R8  = 00000000.00000000
R9  = 00000000.00000002   R10 = 00000000.8813C470   R11 = FFFFFFFF.8825CC00
SP  = 00000000.00000000   TP  = 00000000.7B30E1C8   R14 = 00000000.00000000
R15 = FFFFFFFF.AD6EE968   R16 = FFFFFFFF.8019A6C0   R17 = 00000000.0000078C
R18 = 00000000.00000000   R19 = 00000000.0000078C   R20 = FFFFFFFF.AD6EE300
R21 = 00000000.7FF43E38   R22 = FFFFFFFF.8825E1A8   R23 = FFFFFFFF.AD022EA0
R24 = 00000000.00000000   AI  = 00000000.00000003   RA  = 00000000.8813A480
PV  = 00000000.0000FBA6   R28 = FFFFFFFF.8A5D6EC0   FP  = 00000000.7FF43EC0
R30 = FFFFFFFF.AD6EE300   R31 = 00000000.00000000

CPUSPINWAIT Bugcheck:
Cause:                  timeout processing IPINT and/or acquiring spinlock
Spinlock name:          IOLOCK8/SCS
Spinlock address:       AD6EE300
Spinlock owner CPU Id:  02
Crash CPU Id:           03

CPU Id    CPUDB       BugCode            State       WorkReq                     Interrupted PC
------    --------    ---------------    --------    ------------------------    ---------------------------------------
  00      880E2000    CPUSPINWAIT        Run         bugchk
  01      88258C80    CPUSPINWAIT        Stopped     bugchk
  02      8825AC00    CPUEXIT            Stopped     <none>
  03      8825CC00    CPUSPINWAIT        Stopped     <none>

System Registers:
Page Table Base Register (PTBR)                           00000000.0010D950
Processor Base Register (PRBR)                            FFFFFFFF.8825CC00
Privileged Context Block Base (PCBB)                      FFFFFFFF.B0142080
System Control Block Base (SCBB)                          00000000.00000000
Software Interrupt Summary Register (SISR)                00000000.00000180
Address Space Number (ASN)                                00000000.002788F6
AST Summary / AST Enable (ASTSR_ASTEN)                    00000000.0000000F
Floating-Point Enable (FEN)                               00000000.00000001
Interrupt Priority Level (IPL)                            00000000.00000008
Machine Check Error Summary (MCES)                        00000000.00000000
Virtual Page Table Base Register (VPTB)                   00000000.00000000

Failing Instruction:
SMP$TIMEOUT_C+00170:              break.m     100002

Instruction Stream (last 20 instructions):
SMP$TIMEOUT_C+00120:              mov         r8 = r58
SMP$TIMEOUT_C+00121:              mov.i       ar.pfs = r56
SMP$TIMEOUT_C+00122:              nop.b       000000 ;;
SMP$TIMEOUT_C+00130:              nop.m       000000
SMP$TIMEOUT_C+00131:              nop.f       000000
SMP$TIMEOUT_C+00132:              br.ret.sptk.many b0 ;;
SMP$TIMEOUT_C+00140:              add         r19 = 200140, r1
SMP$TIMEOUT_C+00141:              mov         r22 = r17
SMP$TIMEOUT_C+00142:              nop.i       000000 ;;
SMP$TIMEOUT_C+00150:              ld8         r19 = [r19] ;;
SMP$TIMEOUT_C+00151:              or          r19 = 04, r19
SMP$TIMEOUT_C+00152:              nop.i       000000 ;;
SMP$TIMEOUT_C+00160:              nop.m       000000
SMP$TIMEOUT_C+00161:              sxt4        r17 = r19
SMP$TIMEOUT_C+00162:              nop.b       000000 ;;
SMP$TIMEOUT_C+00170:              break.m     100002
SMP$TIMEOUT_C+00171:              mov         r17 = r22
SMP$TIMEOUT_C+00172:              nop.i       000000 ;;
SMP$TIMEOUT_C+00180:              break.m     100003
SMP$TIMEOUT_C+00181:              nop.f       000000
SMP$TIMEOUT_C+00182:              nop.i       000000 ;;
SMP$INIT_SANITY_C:                alloc       r41 = ar.pfs, 11, 00, 00
SMP$INIT_SANITY_C+00001:          add         r15 = 2000B0, r1
SMP$INIT_SANITY_C+00002:          mov         r47 = r7
SMP$INIT_SANITY_C+00010:          mov         r46 = r6 ;;
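The tail of the instruction stream also shows where R17's value comes from. Below is a hand translation into C of the instructions leading up to the failing break.m. The Failing PC, SMP$TIMEOUT_C+00170, is that break.m itself. This is only a sketch: the function name `r17_before_break` is hypothetical. Reading r17 as the bugcheck code passed to the trap, and bit 2 as a severity flag, is an assumption the dump does not state. What the dump does confirm is the result: the register listing shows R17 = R19 = 00000000.0000078C at the time of the crash.

```c
#include <stdint.h>

/* Hand translation of SMP$TIMEOUT_C+00150 through +00161 (sketch only):
 *   ld8  r19 = [r19]    ; 'loaded' is the 64-bit value fetched here
 *   or   r19 = 04, r19  ; set bit 2
 *   sxt4 r17 = r19      ; sign-extend the low 32 bits into r17
 * The return value is what r17 holds when break.m at +00170 traps. */
static int64_t r17_before_break(uint64_t loaded)
{
    uint64_t r19 = loaded | 0x4;        /* or   r19 = 04, r19 */
    uint64_t low = r19 & 0xFFFFFFFFu;   /* sxt4 only looks at bits 0..31 */
    /* Portable sign extension: bit 31 set means a negative 32-bit value. */
    if (low & 0x80000000u)
        return (int64_t)low - INT64_C(0x100000000);
    return (int64_t)low;
}
```

A loaded value of either 0x788 or 0x78C yields the 0x78C seen in R17. The dump alone doesn't say which of the two was in memory.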

Posted at December 5, 2013 12:21 PM