A couple of weeks ago, we experienced a CPUSPINWAIT crash. Initial investigation indicates that the crash occurred in a call to SYS$ICC_ACCEPT() while waiting to acquire the IOLOCK8 spinlock.
Because of the call to Intra-Cluster Communication services, HP initially recommended applying patch VMS84I_IPC-V0200. However, the release notes for that patch mention "waiting for the SCHED spinlock", not the IOLOCK8 spinlock. I pointed this out and insisted that further crash analysis be performed.
At the moment, the crash appears to be closely related to the one fixed by that patch, but it is certainly not the same crash.
Engineering are presently investigating.
Here's the crash footprint:
Crash Time:        15-NOV-2013 16:31:10.90
Bugcheck Type:     CPUSPINWAIT, CPU spinwait timer expired
Node:              xxxxxx (Cluster)
CPU Type:          HP BL860c (1.59GHz/9.0MB)
VMS Version:       V8.4
Current Process:   BATCH_1008254
Current Image:     DSA34:[EXE]xxxxx.EXE;5
Failing PC:        FFFFFFFF.80263C20 SMP$TIMEOUT_C+00170
Failing PS:        00000000.00000800
Module:            SYSTEM_SYNCHRONIZATION_MIN (Link Date/Time: 3-SEP-2010 12:46:50.40)
Offset:            00010F20

Boot Time:         21-OCT-2013 09:50:42.00
System Uptime:     25 06:40:28.90
Crash/Primary CPU: 3./0.
System/CPU Type:   4020
Saved Processes:   1056
Pagesize:          8 KByte (8192 bytes)
Physical Memory:   20479 MByte (134742016 PFNs, discontiguous memory)
Dumpfile Pagelets: 5243425 blocks
Dump Flags:        olddump,writecomp,errlogcomp
Dump Type:         compressed,selective,dosd,shared_mem
EXE$GL_FLAGS:      poolpging,init,bugdump,tbchk
Paging Files:      1 Pagefile and 0 Swapfiles installed

Stack Pointers:
KSP = 00000000.7FF43E10  ESP = 00000000.7FF68000  SSP = 00000000.7FFAC000  USP = 00000000.7AB0B960

General Registers:
R0  = 00000000.00000000  GP  = FFFFFFFF.AD8EE800  R2  = 00000000.7FF43E00
R3  = 00000007.57A0E823  R4  = 00000000.00000043  R5  = FFFFFFFF.8C9CB080
R6  = 00000000.885A49B8  R7  = FFFFFFFF.896B2D00  R8  = 00000000.00000000
R9  = 00000000.00000002  R10 = 00000000.8813C470  R11 = FFFFFFFF.8825CC00
SP  = 00000000.00000000  TP  = 00000000.7B30E1C8  R14 = 00000000.00000000
R15 = FFFFFFFF.AD6EE968  R16 = FFFFFFFF.8019A6C0  R17 = 00000000.0000078C
R18 = 00000000.00000000  R19 = 00000000.0000078C  R20 = FFFFFFFF.AD6EE300
R21 = 00000000.7FF43E38  R22 = FFFFFFFF.8825E1A8  R23 = FFFFFFFF.AD022EA0
R24 = 00000000.00000000  AI  = 00000000.00000003  RA  = 00000000.8813A480
PV  = 00000000.0000FBA6  R28 = FFFFFFFF.8A5D6EC0  FP  = 00000000.7FF43EC0
R30 = FFFFFFFF.AD6EE300  R31 = 00000000.00000000

CPUSPINWAIT Bugcheck:
Cause:                  timeout processing IPINT and/or acquiring spinlock
Spinlock name:          IOLOCK8/SCS
Spinlock address:       AD6EE300
Spinlock owner CPU Id:  02
Crash CPU Id:           03

CPU Id  CPUDB     BugCode          State     WorkReq                   Interrupted PC
------  --------  ---------------  --------  ------------------------  ---------------------------------------
  00    880E2000  CPUSPINWAIT      Run       bugchk
  01    88258C80  CPUSPINWAIT      Stopped   bugchk
  02    8825AC00  CPUEXIT          Stopped   <none>
  03    8825CC00  CPUSPINWAIT      Stopped   <none>

System Registers:
Page Table Base Register (PTBR)              00000000.0010D950
Processor Base Register (PRBR)               FFFFFFFF.8825CC00
Privileged Context Block Base (PCBB)         FFFFFFFF.B0142080
System Control Block Base (SCBB)             00000000.00000000
Software Interrupt Summary Register (SISR)   00000000.00000180
Address Space Number (ASN)                   00000000.002788F6
AST Summary / AST Enable (ASTSR_ASTEN)       00000000.0000000F
Floating-Point Enable (FEN)                  00000000.00000001
Interrupt Priority Level (IPL)               00000000.00000008
Machine Check Error Summary (MCES)           00000000.00000000
Virtual Page Table Base Register (VPTB)      00000000.00000000

Failing Instruction:
SMP$TIMEOUT_C+00170:      break.m 100002

Instruction Stream (last 20 instructions):
SMP$TIMEOUT_C+00120:      mov r8 = r58
SMP$TIMEOUT_C+00121:      mov.i ar.pfs = r56
SMP$TIMEOUT_C+00122:      nop.b 000000 ;;
SMP$TIMEOUT_C+00130:      nop.m 000000
SMP$TIMEOUT_C+00131:      nop.f 000000
SMP$TIMEOUT_C+00132:      br.ret.sptk.many b0 ;;
SMP$TIMEOUT_C+00140:      add r19 = 200140, r1
SMP$TIMEOUT_C+00141:      mov r22 = r17
SMP$TIMEOUT_C+00142:      nop.i 000000 ;;
SMP$TIMEOUT_C+00150:      ld8 r19 = [r19] ;;
SMP$TIMEOUT_C+00151:      or r19 = 04, r19
SMP$TIMEOUT_C+00152:      nop.i 000000 ;;
SMP$TIMEOUT_C+00160:      nop.m 000000
SMP$TIMEOUT_C+00161:      sxt4 r17 = r19
SMP$TIMEOUT_C+00162:      nop.b 000000 ;;
SMP$TIMEOUT_C+00170:      break.m 100002
SMP$TIMEOUT_C+00171:      mov r17 = r22
SMP$TIMEOUT_C+00172:      nop.i 000000 ;;
SMP$TIMEOUT_C+00180:      break.m 100003
SMP$TIMEOUT_C+00181:      nop.f 000000
SMP$TIMEOUT_C+00182:      nop.i 000000 ;;
SMP$INIT_SANITY_C:        alloc r41 = ar.pfs, 11, 00, 00
SMP$INIT_SANITY_C+00001:  add r15 = 2000B0, r1
SMP$INIT_SANITY_C+00002:  mov r47 = r7
SMP$INIT_SANITY_C+00010:  mov r46 = r6 ;;
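The check that prompted the further analysis (does the spinlock named in the footprint match the one named in the patch release notes?) can be sketched in a few lines of Python. This is only an illustrative sketch, not an HP or SDA tool: the helper names `parse_fields` and `spinlock_matches` are hypothetical, and the text being parsed is simply the "CPUSPINWAIT Bugcheck" portion of the footprint above.

```python
# Illustrative sketch only: parse_fields and spinlock_matches are
# hypothetical helpers, not part of OpenVMS, SDA, or any HP tooling.

# The "CPUSPINWAIT Bugcheck" portion of the crash footprint.
FOOTPRINT = """\
CPUSPINWAIT Bugcheck:
Cause:                  timeout processing IPINT and/or acquiring spinlock
Spinlock name:          IOLOCK8/SCS
Spinlock address:       AD6EE300
Spinlock owner CPU Id:  02
Crash CPU Id:           03
"""


def parse_fields(text):
    """Return a dict of 'Label: value' pairs, skipping bare headings."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip()] = value.strip()
    return fields


def spinlock_matches(fields, patch_spinlock):
    """True if the footprint's spinlock is the one the patch notes name."""
    names = fields["Spinlock name"].split("/")
    return patch_spinlock in names


fields = parse_fields(FOOTPRINT)
print("Spinlock:", fields["Spinlock name"])
print("Owner CPU:", fields["Spinlock owner CPU Id"])
print("Crash CPU:", fields["Crash CPU Id"])
print("Same spinlock as patch notes (SCHED):", spinlock_matches(fields, "SCHED"))
```

With the footprint above, the comparison against SCHED fails, which is the mismatch between the patch release notes and this crash.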
SYS$ACM changed behaviour?: Has SYS$ACM(W) changed behavior in recent versions of OpenVMS? (180 words)
ACLSEARCH X01-07: A new version of ACLSEARCH has been released. A fix suggested by Tony McGrath has been incorporated to handle long ACLs correctly, and I've done some reworking of the "Does this ACE match?" logic. (213 words)
3PAR now fully supported: HP announce full support for 3PAR on IA64 OpenVMS. (45 words)
Bespoke dashboard: A description of a little application and infrastructure dashboard I whipped up. (667 words)
PHP Caching: How I reduced the run time of my code examples script from 5 seconds to less than half a second. (203 words)
Last of the Alphas: Today I shut down the last of the AlphaServers here. I worked here on contract to port to Itanium quite some time ago. Five years later, the business has finally shut down the third party tool that forced them to keep an Alpha in production. (46 words)
SHARED_IMAGES.COM: A DCL command procedure that produces a report of all executables open on a specified disk, sorted by the number of processes with open channels to them. You can specify the disk and the nodes to query, or accept the default of the system disk cluster wide. (802 words)
General purpose RTL examples: After a couple of recent requests for examples of calling the OTS$ routines, I've found some time and written some. Examples are now available on the Code examples page. (81 words)