DaemonForums  

#1, hulten, 6th December 2014: 5.6 crashes

Both 5.6-release and 5.6-stable give kernel panics on my amd64 system. They seem to be network-related.

An example of a crash of -release (shortly after logging into a virtual console):
Code:
# pkg_add vim
kernel: type 692267296 trap, code=0
Stopped at      0x6f43204149444956:panic: uvm_fault: fault on non-pagable map(0xffffffff81d7bb60, 0xffff80000015b000)
     Stopped at      Debugger+0x9:   leave

ddb{0}> trace
Debugger() at Debugger+0x9
panic() at panic+0xfe
uvm_fault() at uvm_fault+0xcc4
trap() at trap+0x62f
--- trap (number 6) ---
(null)() at 0xffff80000015ba80
db_get_value() at db_get_value+0x34
db_disasm() at db_disasm+0x42
db_trap() at db_trap+0x90
kdb_trap() at kdb_trap+0xf0
end of kernel
end trace frame: 0x96ae10b000000001, count: -9
ddb{0}> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
  2047  23251   6419      0  3        0x83  poll          ftp
 23251   6419   6419      0  3        0x8b  pause         sh
  6419  21029   6419      0  3        0x83  piperd        perl
 18846      1  22182     35  3        0x90  poll          xconsole
 30272      1  22182      0  3        0x80  netio         xconsole
   383  27898    383      0  3        0x80  poll          xdm
 15980  21909  21909H(  .e   3        0x
After this, about 20 lines of apparently random characters appear. When I enter another command (ps), the only readable messages are these (kernel output, in blue):

Code:
          kernel: protection fault trap, code=0
Faulted in DDB; continuing...
Then the machine locks up completely: no keys respond (not even NumLock), and it no longer answers ICMP pings.
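
A side note on the numbers in the first panic line: both the trap type (692267296) and the "Stopped at" address (0x6f43204149444956) consist entirely of printable ASCII bytes. The short Python sketch below only decodes the two values as little-endian bytes (the amd64 byte order). The helper name `as_text` is made up for illustration, and reading the result as a fragment of a longer string is an interpretation, not something the trace proves.

```python
def as_text(value: int, width: int) -> str:
    """Return the little-endian bytes of `value` as text, '.' for non-printables."""
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


stopped_at = 0x6F43204149444956  # "Stopped at" address from the panic above
trap_type = 692267296            # "type ... trap" value from the panic above

print(repr(as_text(stopped_at, 8)))  # 'VIDIA Co'
print(repr(as_text(trap_type, 4)))   # ' )C)'
```

A code pointer and a trap number that both read as text suggest that string data ended up where the kernel expected a trap frame. That would be consistent with memory corruption in general, and does not by itself point at a particular driver.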

Most often the crash already happens while the kernel is loading or during init. An example of a crash during init:

Code:
...
DHCPACK from 192.168.0.11 (00:a0:24:f0:fb:11)
kernel: double fault trap, code=0
Stopped at      0:
An example of a -stable crash (upgraded from 5.6-release following release(8)):

Code:
starting network
DHCPREQUEST on sk0 to 255.255.255.255
DHCPACK from 192.168.0.11 (00:40:24:f0:fb:11)
and then the computer hangs (neither keyboard nor network responds). Ideas are welcome.

#2, ocicat, 6th December 2014

Quote:
Originally Posted by hulten
Ideas are welcome.
Please provide the complete output of dmesg(8).

#3, J65nko, 6th December 2014

Read http://www.openbsd.org/faq/faq2.html#Bugs and report the relevant info (inline, not as an attachment) to the OpenBSD misc mailing list.

#4, hulten, 7th December 2014

The full dmesg of bsd.sp is here:
Code:
OpenBSD 5.6 (GENERIC) #310: Fri Aug  8 00:14:24 MDT 2014
    deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
real mem = 2129526784 (2030MB)
avail mem = 2064158720 (1968MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.2 @ 0xf0000 (43 entries)
bios0: vendor Phoenix Technologies, LTD version "6.00 PG" date 03/29/2006
bios0: DFI Corp,LTD LP NF4 Series
acpi0 at bios0: rev 0
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP MCFG APIC
acpi0: wakeup devices HUB0(S5) XVR0(S5) XVR1(S5) XVR2(S5) XVR3(S5) USB0(S3) USB2(S3) MMAC(S5) MMCI(S5) UAR1(S5) PS2M(S4) PS2K(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimcfg0 at acpi0 addr 0xe0000000, bus 0-255
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Athlon(tm) 64 X2 Dual Core Processor 4600+, 2411.39 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW,LAHF,CMPLEG
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 200MHz
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 11, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 2
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (HUB0)
acpicpu0 at acpi0
acpitz0 at acpi0: critical temperature is 69 degC
acpibtn0 at acpi0: PWRB
pci0 at mainbus0 bus 0
"NVIDIA nForce4 DDR" rev 0xa3 at pci0 dev 0 function 0 not configured
pcib0 at pci0 dev 1 function 0 "NVIDIA nForce4 ISA" rev 0xa3
nviic0 at pci0 dev 1 function 1 "NVIDIA nForce4 SMBus" rev 0xa2
iic0 at nviic0
spdmem0 at iic0 addr 0x50: 1GB DDR SDRAM non-parity PC3200CL3.0
spdmem1 at iic0 addr 0x51: 1GB DDR SDRAM non-parity PC3200CL3.0
iic1 at nviic0
iic1: addr 0x4e 00=02 02=28 03=28 04=2a 12=be 13=0e 20=01 28=83 29=12 2a=12 2b=28 words 00=0200 01=0028 02=2828 03=282a 04=2a00 05=0000 06=0000 07=0000
ohci0 at pci0 dev 2 function 0 "NVIDIA nForce4 USB" rev 0xa2: apic 2 int 20, version 1.0, legacy support
ehci0 at pci0 dev 2 function 1 "NVIDIA nForce4 USB" rev 0xa3: apic 2 int 20
ehci0: timed out waiting for BIOS
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "NVIDIA EHCI root hub" rev 2.00/1.00 addr 1
pciide0 at pci0 dev 6 function 0 "NVIDIA nForce4 IDE" rev 0xa2: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <HL-DT-ST, CDRW/DVD GCC4482, E107> ATAPI 5/cdrom removable
atapiscsi1 at pciide0 channel 0 drive 1
scsibus2 at atapiscsi1: 2 targets
cd1 at scsibus2 targ 0 lun 0: <LG, CD-ROM CRD-8522B, 2.03> ATAPI 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
cd1(pciide0:0:1): using PIO mode 4, DMA mode 2
pciide0: channel 1 disabled (no drives)
pciide1 at pci0 dev 7 function 0 "NVIDIA nForce4 SATA" rev 0xa3: DMA
pciide1: using apic 2 int 20 for native-PCI interrupt
pciide2 at pci0 dev 8 function 0 "NVIDIA nForce4 SATA" rev 0xa3: DMA
pciide2: using apic 2 int 20 for native-PCI interrupt
wd0 at pciide2 channel 0 drive 0: <Maxtor 6B120M0>
wd0: 16-sector PIO, LBA, 117246MB, 240121728 sectors
wd0(pciide2:0:0): using PIO mode 4, Ultra-DMA mode 6
ppb0 at pci0 dev 9 function 0 "NVIDIA nForce4" rev 0xa2
pci1 at ppb0 bus 1
"NVIDIA Vanta" rev 0x15 at pci1 dev 6 function 0 not configured
emu0 at pci1 dev 7 function 0 "Creative Labs SoundBlaster Live" rev 0x06: apic 2 int 3
ac97: codec id 0x54524123 (TriTech Microelectronics TR28602)
audio0 at emu0
"Creative Labs PCI Gameport Joystick" rev 0x06 at pci1 dev 7 function 1 not configured
"VIA VT6306 FireWire" rev 0x80 at pci1 dev 9 function 0 not configured
skc0 at pci1 dev 10 function 0 "Marvell Yukon 88E8001/8003/8010" rev 0x13, Yukon Lite (0x9): apic 2 int 5
sk0 at skc0 port A: address 00:01:29:fc:35:59
eephy0 at sk0 phy 0: 88E1011 Gigabit PHY, rev. 5
nfe0 at pci0 dev 10 function 0 "NVIDIA CK804 LAN" rev 0xa3: apic 2 int 20, address 00:01:29:fc:34:f1
ciphy0 at nfe0 phy 1: CS8201 10/100/1000TX PHY, rev. 3
ppb1 at pci0 dev 11 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
pci2 at ppb1 bus 2
ppb2 at pci0 dev 12 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
pci3 at ppb2 bus 3
ppb3 at pci0 dev 13 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
pci4 at ppb3 bus 4
ppb4 at pci0 dev 14 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
pci5 at ppb4 bus 5
vga1 at pci5 dev 0 function 0 "NVIDIA GeForce 6800 GT" rev 0xa2
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pchb0 at pci0 dev 24 function 0 "AMD AMD64 0Fh HyperTransport" rev 0x00
pchb1 at pci0 dev 24 function 1 "AMD AMD64 0Fh Address Map" rev 0x00
pchb2 at pci0 dev 24 function 2 "AMD AMD64 0Fh DRAM Cfg" rev 0x00
kate0 at pci0 dev 24 function 3 "AMD AMD64 0Fh Misc Cfg" rev 0x00
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
it0 at isa0 port 0x2e/2: IT8712F rev 7, EC port 0x290
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 "NVIDIA OHCI root hub" rev 1.00/1.00 addr 1
uhidev0 at uhub1 port 2 configuration 1 interface 0 "Logitech USB-PS/2 Optical Mouse" rev 2.00/21.00 addr 2
uhidev0: iclass 3/1
ums0 at uhidev0: 8 buttons, Z dir
wsmouse0 at ums0 mux 0
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on wd0a (106e6dba438758b0.a) swap on wd0b dump on wd0b

I am afraid I cannot provide the dmesg from bsd (multiprocessor) yet, because I do not have local access to the machine. As soon as I do (around 23 December), I will try to provide that output, if the machine doesn't crash before I can enter dmesg or before it completes. At that point, I will also send a bug report to misc.

Is the enclosed dmesg output (from the stable bsd.sp) useful at all?

#5, jggimi, 7th December 2014

Hello, and welcome!

This dmesg tells us quite a bit, since it lays out your hardware for us. I note that your system uses a DFI "LanParty" NF4 series motherboard with what appears to be an up-to-date BIOS, assuming I understand correctly that the same "Revision A" BIOS was packaged in both March and April of 2006. DFI's website isn't completely clear.

If the problem only occurs with the MP kernel, please report that when you are able to post to the mailing list. That information should help.

#6, hulten, 24th December 2014: apparently fixed

Edit: The system crashed shortly after this post, and again after a reboot; the problem still exists.

Thank you for the welcome, jggimi, and thank you all for the suggestions.

The problem appears to be solved with the GENERIC.MP kernel that I compiled and installed today (after updating to the newest sources from the CVS repository). I have run a "stress" test (from the pre-compiled packages), scp'ed a big file, run "pkg_add -u", done some web browsing, and am now compiling userland. No crash so far. With the previous kernel my system crashed during, or shortly after, init.

If I understand the evolution of the OpenBSD system correctly, there have been three changes in the code since my previous multiprocessor kernel, namely errata 012, 013 and 014. As 012 has something to do with attacks (and I am on a private subnet that should be secure), and 014 is a security patch for X and previously my system crashed often before X started, I conclude that reliability patch 013, fixing hangs with the virtio device, has solved it. But does my system use a VirtIO device? I don't know.

For completeness my dmesg, this time of the patched MP kernel:
Code:
OpenBSD 5.6-stable (GENERIC.MP) #1: Wed Dec 24 15:10:55 CET 2014
    root@gluon.instanton:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 2129526784 (2030MB)
avail mem = 2064109568 (1968MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.2 @ 0xf0000 (43 entries)
bios0: vendor Phoenix Technologies, LTD version "6.00 PG" date 03/29/2006
bios0: DFI Corp,LTD LP NF4 Series
acpi0 at bios0: rev 0
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP MCFG APIC
acpi0: wakeup devices HUB0(S5) XVR0(S5) XVR1(S5) XVR2(S5) XVR3(S5) USB0(S3) USB2(S3) MMAC(S5) MMCI(S5) UAR1(S5) PS2M(S4) PS2K(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimcfg0 at acpi0 addr 0xe0000000, bus 0-255
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Athlon(tm) 64 X2 Dual Core Processor 4600+, 2411.40 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW,LAHF,CMPLEG
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 200MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD Athlon(tm) 64 X2 Dual Core Processor 4600+, 2411.11 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW,LAHF,CMPLEG
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache
cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu1: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 11, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 2
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (HUB0)
acpicpu0 at acpi0
acpicpu1 at acpi0
acpitz0 at acpi0: critical temperature is 69 degC
acpibtn0 at acpi0: PWRB
pci0 at mainbus0 bus 0
"NVIDIA nForce4 DDR" rev 0xa3 at pci0 dev 0 function 0 not configured
pcib0 at pci0 dev 1 function 0 "NVIDIA nForce4 ISA" rev 0xa3
nviic0 at pci0 dev 1 function 1 "NVIDIA nForce4 SMBus" rev 0xa2
iic0 at nviic0
spdmem0 at iic0 addr 0x50: 1GB DDR SDRAM non-parity PC3200CL3.0
spdmem1 at iic0 addr 0x51: 1GB DDR SDRAM non-parity PC3200CL3.0
iic1 at nviic0
iic1: addr 0x4e 00=02 02=28 03=28 04=2a 12=be 13=0e 20=01 28=83 29=12 2a=12 2b=28 words 00=0200 01=0028 02=2828 03=282a 04=2a00 05=0000 06=0000 07=0000
ohci0 at pci0 dev 2 function 0 "NVIDIA nForce4 USB" rev 0xa2: apic 2 int 20, version 1.0, legacy support
ehci0 at pci0 dev 2 function 1 "NVIDIA nForce4 USB" rev 0xa3: apic 2 int 20
ehci0: timed out waiting for BIOS
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "NVIDIA EHCI root hub" rev 2.00/1.00 addr 1
pciide0 at pci0 dev 6 function 0 "NVIDIA nForce4 IDE" rev 0xa2: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus1 at atapiscsi0: 2 targets
cd0 at scsibus1 targ 0 lun 0: <HL-DT-ST, CDRW/DVD GCC4482, E107> ATAPI 5/cdrom removable
atapiscsi1 at pciide0 channel 0 drive 1
scsibus2 at atapiscsi1: 2 targets
cd1 at scsibus2 targ 0 lun 0: <LG, CD-ROM CRD-8522B, 2.03> ATAPI 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
cd1(pciide0:0:1): using PIO mode 4, DMA mode 2
pciide0: channel 1 disabled (no drives)
pciide1 at pci0 dev 7 function 0 "NVIDIA nForce4 SATA" rev 0xa3: DMA
pciide1: using apic 2 int 20 for native-PCI interrupt
pciide2 at pci0 dev 8 function 0 "NVIDIA nForce4 SATA" rev 0xa3: DMA
pciide2: using apic 2 int 20 for native-PCI interrupt
wd0 at pciide2 channel 0 drive 0: <Maxtor 6B120M0>
wd0: 16-sector PIO, LBA, 117246MB, 240121728 sectors
wd0(pciide2:0:0): using PIO mode 4, Ultra-DMA mode 6
ppb0 at pci0 dev 9 function 0 "NVIDIA nForce4" rev 0xa2
pci1 at ppb0 bus 1
"NVIDIA Vanta" rev 0x15 at pci1 dev 6 function 0 not configured
emu0 at pci1 dev 7 function 0 "Creative Labs SoundBlaster Live" rev 0x06: apic 2 int 3
ac97: codec id 0x54524123 (TriTech Microelectronics TR28602)
audio0 at emu0
"Creative Labs PCI Gameport Joystick" rev 0x06 at pci1 dev 7 function 1 not configured
"VIA VT6306 FireWire" rev 0x80 at pci1 dev 9 function 0 not configured
skc0 at pci1 dev 10 function 0 "Marvell Yukon 88E8001/8003/8010" rev 0x13, Yukon Lite (0x9): apic 2 int 5
sk0 at skc0 port A: address 00:01:29:fc:35:59
eephy0 at sk0 phy 0: 88E1011 Gigabit PHY, rev. 5
nfe0 at pci0 dev 10 function 0 "NVIDIA CK804 LAN" rev 0xa3: apic 2 int 20, address 00:01:29:fc:34:f1
ciphy0 at nfe0 phy 1: CS8201 10/100/1000TX PHY, rev. 3
ppb1 at pci0 dev 11 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
pci2 at ppb1 bus 2
ppb2 at pci0 dev 12 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
pci3 at ppb2 bus 3
ppb3 at pci0 dev 13 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
pci4 at ppb3 bus 4
ppb4 at pci0 dev 14 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
pci5 at ppb4 bus 5
vga1 at pci5 dev 0 function 0 "NVIDIA GeForce 6800 GT" rev 0xa2
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
pchb0 at pci0 dev 24 function 0 "AMD AMD64 0Fh HyperTransport" rev 0x00
pchb1 at pci0 dev 24 function 1 "AMD AMD64 0Fh Address Map" rev 0x00
pchb2 at pci0 dev 24 function 2 "AMD AMD64 0Fh DRAM Cfg" rev 0x00
kate0 at pci0 dev 24 function 3 "AMD AMD64 0Fh Misc Cfg" rev 0x00
isa0 at pcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
it0 at isa0 port 0x2e/2: IT8712F rev 7, EC port 0x290
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 "NVIDIA OHCI root hub" rev 1.00/1.00 addr 1
uhidev0 at uhub1 port 2 configuration 1 interface 0 "Logitech USB-PS/2 Optical Mouse" rev 2.00/21.00 addr 2
uhidev0: iclass 3/1
ums0 at uhidev0: 8 buttons, Z dir
wsmouse0 at ums0 mux 0
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on wd0a (106e6dba438758b0.a) swap on wd0b dump on wd0b
uid 0 on /usr: file system full

Last edited by hulten; 24th December 2014 at 04:39 PM. Reason: errors

#7, jggimi, 24th December 2014

I couldn't tell you why the problems have apparently resolved. Should it happen again, perhaps a backtrace of the system core dump will help isolate the issue. See crash(8) for sysctl settings you may want to deploy on the remote platform, as well as debugging guidance.

#8, hulten, 25th December 2014

The crashes are happening again. I was probably just lucky that I could use my system for several hours instead of the usual minute or so.

First some standard bug report information:
Code:
...
DHCPACK from 192.168.0.11 (00:a0:24:f0:fb:11)
kernel: type 692267296 trap, code=1
Stopped at      0x6f43204140444056:panic: attempt to execute user adé   s 0xb in
 sup _  É(  %       ä    >           _ ...
        panic: attempt to execute user address 0xb in supervisor mode       Pß
Stopped at   Debugger+0x9:   leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0}> trace
Debugger() at Debugger+0x9
panic() at panic+0xfe
trap() at trap+0x85d
--- trap (number 6) ---
end of kernel
end trace frame: 0x176e176f17691774, count: -3
0xb:
ddb{0}> machine ddbcpu 1
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{1}> trace
Debugger() at Debugger+0x9
x86_ipi_handler at x86_ipi_handler+0x64
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1b
--- interrupt ---
Bad frame pointer: 0xffff8000212e2f10
end trace frame: 0xffff8000212e2f10, count: -3
cpu_idle_cycle+0x13:
ddb{1}> show panic
attempt to execute user address 0xb in supervisor mode
ddb{1}> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT         COMMAND
 16610      1  16610      0  3        0x80  poll         dhclient
 15412  18760  18760     77  3        0x13  biowait      dhclient
 18760      1  18760      0  3        0x8b  pause        sh
  1024      0      0      0  3     0x14200  aiodoned     aiodoned
  5077      0      0      0  3     0x14200  syncer       update
 19163      0      0      0  3     0x14200  cleaner      cleaner
 10500      0      0      0  3     0x14200  reaper       reaper
 14850      0      0      0  3     0x14200  pgdaemon     pagedaemon
  7097      0      0      0  3     0x14200  bored        crypto
 29987      0      0      0  3     0x14200  pftm         pfpurge
    98      0      0      0  3     0x14200  bored        sensors
 27672      0      0      0  3     0x14200  usbtsk       usbtask
 28808      0      0      0  3     0x14200  usbatsk      usbatsk
 31061      0      0      0  3  0x40014200  acpi0        acpi0
*16000      0      0      0  7  0x40014200               idle1
 15840      0      0      0  3     0x14200  bored        systqmp
 28757      0      0      0  3     0x14200  bored        systq
 19413      0      0      0  3     0x14200  bored        syswq
 11589      0      0      0  3     0x14200               idle0
     1      0      0      0  3     0x14200  wait         init
     0      0      0      0  3     0x14200  scheduler    swapper
ddb {1}>
Now, for the core dump, I tried this:
Code:
...
DHCPACK from 192.168.0.11 (00:a0:24:f0:fb:11)
kernel: type 692267296 trap, code=1
Stopped at      0x6f43204140444056:     kernel: protection fault trap, code=0
Stopped at      db_read_bytes+0x22:     movzbl  0(%rdi,%rcx,1),%eax
ddb{0}> boot dump
°(   (  πë  s...ä       `\    (     itok: want -1 have 2
splassert: aä   twaitok: ~  ...
        t: assertwaitok: want -1 have 2
...
kernel: type 269 trap, code=0
Faulted in DDB; continuing...
ddb{0}> boot sync
Faulted in DDB ...
If I understand crash(8) correctly, it should save files in /var/crash/ (which it didn't) if "boot dump" were to be executed correctly.

I have local access to this machine now.

The machine ran/runs without notable problems with several versions of Debian GNU/Linux.
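
One more observation on the cpu0 trace above, offered as a guess rather than a diagnosis: the "end trace frame" value 0x176e176f17691774 has the shape of four VGA text-mode cells, where each cell is a character byte followed by an attribute byte. The sketch below splits the value that way; `text_cells` is a hypothetical helper name used only here.

```python
def text_cells(value: int) -> list:
    """Split a 64-bit value into four little-endian (character, attribute) pairs."""
    raw = value.to_bytes(8, "little")
    return [(chr(raw[i]), raw[i + 1]) for i in range(0, 8, 2)]


frame = 0x176E176F17691774  # "end trace frame" value from the cpu0 trace

cells = text_cells(frame)
print("".join(ch for ch, _ in cells))           # tion
print(sorted({hex(attr) for _, attr in cells})) # ['0x17']
```

Attribute 0x17 means light grey on blue in VGA text mode, which would match the blue kernel messages on the console. If that reading is right, screen contents ended up in a saved frame pointer, which again looks more like random memory corruption than like a bug in one code path.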

Last edited by hulten; 25th December 2014 at 10:13 PM. Reason: quick additions

#9, ocicat, 25th December 2014

While some members here may have a suggestion, recognize that this site is not officially affiliated with the OpenBSD project proper. Users wanting to submit formal bug reports should study the information found at the following link for the submission protocol:

http://www.openbsd.org/report.html

Completeness is considered a good thing. You will make friends by providing a thorough explanation.

jggimi, 26th December 2014

Quote:
Originally Posted by hulten
If I understand crash(8) correctly, it should save files in /var/crash/ (which it didn't) if "boot dump" were to be executed correctly
Correct. The state of the machine after this panic is such that ddb cannot take a core dump. The traceback you have reproduced here is the only available information.

The x86_ipi_handler mentioned in the traceback is in sys/arch/amd64/amd64/ipi.c and that module has not been changed since 5.6-release.

I agree with ocicat that this is best reported to the Project via its misc@ mailing list.

hulten, 26th December 2014

Before I report this to misc@ or bugs@, I would like to exclude the following potential hardware failure.

I have run Memtest86 (both v1.65 from the BIOS, and v4.20 through "boot memtest"). In these tests, test 5 [Block move, 64 moves] gives many errors, as does test 7 from v4.20. I have read on several internet fora that test 5 creates a lot of heat, which would make this test untrustworthy.

Any ideas about this? My first guess is bad memory, and I should try to exclude or confirm this first.
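
For readers unfamiliar with what a block-move style test checks, the idea can be sketched in user space: fill memory with a known pattern, move blocks around, and verify that nothing changed in transit. The Python below is only an illustration of that idea. It is not Memtest86's algorithm, the function name and parameters are invented here, and it only touches whatever pages the operating system happens to hand out, so a clean run proves nothing about the hardware.

```python
def block_move_check(size=1 << 20, block=4096, rounds=4):
    """Fill a buffer with a pattern, move blocks around, count mismatching bytes.

    `size` must be a multiple of `block`, and `block` a multiple of 256.
    """
    pattern = bytes(range(256)) * (block // 256)  # one block of known data
    nblocks = size // block
    buf = bytearray(pattern * nblocks)
    for _ in range(rounds):
        # Rotate the buffer by one block so every block lands at a new address.
        buf[:] = buf[block:] + buf[:block]
    # Every block holds the same pattern, so any difference is a corrupted byte.
    errors = 0
    for i in range(nblocks):
        chunk = buf[i * block:(i + 1) * block]
        if chunk != pattern:
            errors += sum(a != b for a, b in zip(chunk, pattern))
    return errors


print("mismatching bytes:", block_move_check())
```

On healthy memory the count should stay at zero; a dedicated tester run from boot remains the better tool because it controls physical addresses and caching.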

jggimi, 26th December 2014

My own experience with memtest86 and memtest86+ is that I have never seen false positives. I have seen plenty of false negatives, since the tests do not catch every problem: they cannot prove hardware is good; they only help identify bad hardware.

I've never had heat problems during the running of these tests. If I wanted to test power supplies and heat management, I'd run stress testers instead.

You are more likely to have a memory problem than a heat problem, based upon your results... of course, a heat problem could manifest as a memory problem.

Last edited by jggimi; 26th December 2014 at 12:44 PM. Reason: nothing is ever definitive .. :)

hulten, 26th December 2014: it must be the memory

Thanks, jggimi. I will try to fix the memory problem. Also, none of the temperature sensors that are displayed in the BIOS show very high temperatures (nothing higher than 55°C), so what I read on those other fora on unreliable memtests was probably nonsense.

Still, it is unexpected, though not impossible, that I found the problem when running OpenBSD. I do not remember experiencing problems under GNU/Linux.

jggimi, 26th December 2014

My son's monster game PC recently had a heat problem. It has fans the size of a Buick ... but they became clogged with dust, as did the heat sinks.

Good luck!!

hulten, 27th December 2014

First I cleaned my computer with a vacuum cleaner (and connected a case fan that should have been connected before).

With Memtest86 I was able to identify one of my two DIMMs as bad. The memory test reports no problems with the other DIMM, so I assume that one is okay (though this may be a false negative). I have repeated the tests often enough that I consider this result reliable.

With presumably my only good DIMM installed, stress tests with stress(1) crash the system under both OpenBSD and GNU/Linux. Yet Memtest86 still reports no errors in this last hardware configuration (only one DIMM).

OpenBSD:
Code:
# stress --cpu 2 --io 2 --vm 2 --timeout 5m
stress: info [21433] dispatching hogs: 2 cpu, 2 io, 2 vm, 0 hdd
kernel: protection fault trap, code=0
Stopped at      pmap_page_remove+0x75:  movq    0(%rbx),%rax
ddb{0}> trace
pmap_page_remove() at pmap_page_remove+0x75
uvm_anfree() at uvm_anfree+0xbe
amap_wipeout() at amap_wipeout+0xb9
uvm_unmap_detach() at uvm_unmap_detach+0x52
sys_munmap() at sys_munmap+0x14b
syscall() at syscall+0x297
--- syscall (number 73) ---
end of kernel
end trace frame: 0x49a122e000, count: -6
0x4989e097ea:
ddb{0}>
Under GNU/Linux, a stress test as above:
Code:
# stress --cpu 2 --io 2 --vm 2 --timeout 5m
stress: info: [1897] dispatching hogs: 2 cpu, 2 io, 2 vm, 0 hdd
[  344.076011] BUG: soft lockup - CPU#0 stuck for 22s! [stress:1899]
[  360.156009] BUG: soft lockup - CPU#1 stuck for 22s! [cupsd:687]
[  344.076010] BUG: soft lockup - CPU#0 stuck for 22s! [stress:1899]
[  360.156008] BUG: soft lockup - CPU#1 stuck for 22s! [cupsd:687]
...
I still get ICMP ping replies from the computer, but cannot log in (ssh).
None of the following worked: Ctrl+C, NumLock, switching to other virtual console (or graphical).
This worked: SysRq+U,S,B (it initiated umount and sync, and the system rebooted).

Then I decided to strip my computer of everything (e.g. the audio card) not needed for the stress tests.
Again stress(1) under GNU/Linux, now in single user mode (so, for example, no cupsd):
Code:
# stress --cpu 2 --io 2 --vm 2 --timeout 5m
stress: info [669] dispatching hogs: 2 cpu, 2 io, 2 vm, 0 hdd
Crash! The CPU#0 backtrace is no longer readable on the screen, so I am left with a backtrace of CPU#1:
Code:
WARNING: CPU: 1 PID: 100 at /build/linux-CMiYW9/linux-3.16.7-ckt2/kernel/watchdog.c:265 watchdog_overflow_callback+0x98/0xc0()
Watchdog detected hard LOCKUP on cpu 1
Modules linked in: ...
CPU: 1 PID: 100 Comm: kworker/1:1H Tainted: G       D W     3.16.0-4-amd64 #1 Debian 3.16.7-ckt2-1
Hardware name:    /LP NF4 Series, BIOS 6.00 PG 03/29/2006
...
Computer crashed: SysRq not responsive.

Apparently I have another hardware problem.

ocicat, 27th December 2014

In re-reading this thread, you mention testing with both 5.6-release & 5.6-stable, & it appears you have spent a considerable time thus far, & it does appear this needs to be brought to the attention of the project developers. However, OpenBSD 5.6 is several months old, & the source code under scrutiny by the developers today is nearing six months past the time 5.6 was tagged in CVS. Formally reporting with 5.6-release or 5.6-stable as test cases, while useful, is not as important as testing with a recent snapshot of -current.

I would suggest the following action:
  • Install a recent snapshot of -current & test again. The first question the project developers will ask is, "What is the behavior seen in an installation of -current?".
  • Thoroughly report to the developers the results seen in -current. Again, the relevant page to look at is:

    http://www.openbsd.org/report.html
Good luck.

Last edited by ocicat; 28th December 2014 at 12:06 AM. Reason: grammar

IdOp, 29th December 2014

Quote:
Originally Posted by hulten
Apparently I have another hardware problem.
As this seems to be an "older" computer (2006 BIOS in your dmesg), it's worth asking: have you visually checked the capacitors on the motherboard for signs of bulging tops and/or leakage, especially the larger caps near the CPU? Bad caps can definitely cause these random kinds of crashes, and they can also be replaced if desired.

ADDED: Capacitor plague

hulten, 30th December 2014

Checking the capacitors is a good idea; I hadn't thought of it yet. I looked carefully at all capacitors, but they all look like nice cylinders without any bulging or leakage.

I will install a recent OpenBSD snapshot and try to reproduce and report the problem.

hulten, 30th December 2014

I wanted to install the latest snapshot, so I went to http://ftp.nluug.nl/pub/OpenBSD/snapshots/amd64 and dd'ed install56.fs to a USB stick. However, booting it gives an "ERR M"; probably my BIOS cannot handle it. The USB stick boots fine in my newer computers. I also have problems booting CDs on which I burned install56.iso (even though they, again, boot fine in my other computers).

Is it fine to boot from the official installation medium (5.6 release), do a clean install and select the software sets from http, pointing to the snapshot (same url as above)? Or should I expect the installation software (5.6 release) to be incompatible with newer software sets (snapshot 2014-12-27)?

Carpetsmoker, 30th December 2014

Quote:
Originally Posted by jggimi
My own experience with memtest86 and memtest+ is that I have never seen false positives.
I have :-)

I remember a certain mainboard (Asus or Intel, don't quite remember) that always gave errors at the same address when PXE was enabled.

I also remember having a certain model of HP workstation (don't remember which model) that gave a whole bunch of errors with memtest86; HP's own memory test ran fine, and so did the system ... I'm pretty sure the memory was fine.
Content copyright © 2007-2010, the authors
Daemon image copyright ©1988, Marshall Kirk McKusick