How to troubleshoot a hang in OpenBSD?
I'm trying to get OpenBSD 5.4-release amd64 working on a MacbookAir5,1. So far it runs pretty well...that is, until I use Gnome. After 1-3 hours of Gnome use my machine hangs entirely. I cannot switch to a different virtual console, move the mouse cursor, or do anything besides power cycle the machine. I get the same behavior using Xfce instead of Gnome.
I've searched for answers but haven't turned up much. I'd like to narrow down the problem: either confirm that this is user error, or confirm that it's a bug and then gather enough information to file a report the devs will find useful. I've also looked for documentation on troubleshooting hangs in OpenBSD, and that search hasn't turned up much either. So I thought I'd ask y'all: do you know how to go about troubleshooting a hang in OpenBSD? Is there any logging I can turn on, or are there clues left behind when the system freezes up? Or should I just experiment with turning various subsystems or hardware on and off to see where the trouble lies?
Last edited by quisquous; 23rd December 2013 at 02:44 AM. Reason: removing most caps in title to be more like the other post titles

"Hang" when using X can be difficult to diagnose. A kernel panic would produce the same symptoms: by default the kernel enters the ddb(4) kernel debugger, which cannot be seen while X is running. If the keyboard is active in ddb (and it may not be), you can blindly type ddb commands such as
Code:
ddb> boot crash
to force a kernel core dump and reboot. Besides needing a working keyboard connection to a ddb you cannot see, you'll need enough default swap space to hold a kernel core dump, which means swap should be larger than your physical RAM. You won't know whether your boot crash command is working unless you can monitor disk I/O, and writing a kernel core dump to swap takes time. You can also disable ddb, which avoids the blind typing and obtains a kernel dump automatically. See crash(8) as well. If the MacbookAir has a 9-pin serial port, you could use a null-modem cable to set up another computer as a serial console, which has two advantages.
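A minimal sketch of the "disable ddb" route, assuming OpenBSD's `ddb.panic` sysctl as described in ddb(4): with `ddb.panic=0`, a panicking kernel should skip the debugger and go straight to dumping core to the default swap device and rebooting, after which savecore(8) should copy the dump into /var/crash on the next boot. `persist_setting` is a hypothetical helper name; it appends the setting to a sysctl.conf-style file only if no line for that key exists yet, so running the script twice does not duplicate it.

```sh
#!/bin/sh
# Sketch: persist ddb.panic=0 so a panic dumps core and reboots instead of
# sitting in an invisible ddb prompt behind X. Assumes OpenBSD's ddb.panic
# sysctl; persist_setting is a hypothetical helper name.

# persist_setting FILE KEY VALUE
# Append "KEY=VALUE" to FILE unless FILE already has a line starting with
# "KEY=". An existing line is left untouched, even if its value differs.
persist_setting() {
	file=$1 key=$2 value=$3
	if [ -f "$file" ]; then
		# "|| [ -n ... ]" also checks a last line lacking a newline.
		while IFS= read -r line || [ -n "$line" ]; do
			case $line in
			"$key="*) return 0 ;;
			esac
		done < "$file"
	fi
	printf '%s=%s\n' "$key" "$value" >> "$file"
}

# Only touch the live system when running as root on OpenBSD.
if [ "$(uname -s)" = OpenBSD ] && [ "$(id -u)" -eq 0 ]; then
	sysctl ddb.panic=0                            # takes effect immediately
	persist_setting /etc/sysctl.conf ddb.panic 0  # survives a reboot
fi
```

The dump still has to fit on the default swap device, so the swap-larger-than-RAM requirement above applies unchanged.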
Thanks jggimi! My swap is twice as big as my RAM, so I should be good there. I'll try typing 'boot crash' and waiting 10 minutes next time I get a proper hang.
That said, last night, while turning off various services to see whether one of them was the culprit, I ran plain startx with the default desktop (no Gnome) and played a video in Totem for 20 minutes or so. The screen froze as before, but the audio kept playing, I believe through the end of the video. Pressing keys and moving the mouse didn't do anything, at least not that I could see. So perhaps I have a video issue and not a hang. I'm going to try this again in Gnome: play the video in a loop, see whether the machine hangs or just looks like it's hung, and try to SSH into the box if the video keeps playing. If there's a way to reset the video driver while SSHed in, that would be something. Also, a couple of lines in the -current release notes caught my eye:
This is not a Haswell laptop, but perhaps the issue I'm encountering is fixed in -current. It seems I'm going to have to bite the bullet and install -current.
I installed -current, but it still hangs every couple of hours or so. Sometimes I can still SSH into the box during a hang; other times I can neither SSH in nor ping it. I have a new appreciation for the package maintainers: building Gnome takes a long time.
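To tell a display-only freeze apart from a hang where the box drops off the network entirely, it may help to log reachability from a second machine. This is a sketch under stated assumptions, not a tested tool: `probe` and `watch_host` are hypothetical helper names, and `probe` relies on a plain `ping -c 1`, whose timeout behaviour differs between systems. Replacing `probe` with an SSH check would test the other symptom described above.

```sh
#!/bin/sh
# Sketch: run on a SECOND machine to timestamp when the laptop stops
# (or resumes) answering. probe and watch_host are hypothetical helpers.

# probe HOST: succeed if HOST answers a single ping.
probe() {
	ping -c 1 "$1" >/dev/null 2>&1
}

# watch_host HOST [INTERVAL] [COUNT]
# Probe HOST every INTERVAL seconds (default 30) and print a timestamped
# line only when its state changes between "up" and "down". COUNT limits
# the number of probes; 0 (the default) means run until interrupted.
watch_host() {
	host=$1 interval=${2:-30} count=${3:-0}
	prev="" n=0
	while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
		if probe "$host"; then state=up; else state=down; fi
		if [ "$state" != "$prev" ]; then
			printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$host" "$state"
			prev=$state
		fi
		n=$((n + 1))
		sleep "$interval"
	done
}
```

For example, `watch_host my.host.name 30 | tee hang.log` would leave a record of the moment the box stopped answering, which can be compared against the time the screen froze.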
Gnome3 adds many layers to the base OS, such as gstreamer, pulseaudio and video compositing. Some of these layers are not BSD-friendly. If you are using Totem in fvwm, it likely uses pulseaudio and gstreamer. The Parole video player in XFCE4, which is based on Totem, also uses gstreamer but not pulseaudio. I believe that in 5.4 and -current two versions of gstreamer are in use, because gnome3 required the newer version. Also, if you are starting fvwm from gdm, some of these services may be running in the background.
Running top in the fvwm xterm will let you know. Another option is to try different video players. Both VLC and mplayer use sndio directly and do not depend on gstreamer. If neither of these two media players hangs, that would help focus debugging efforts and also help you choose a desktop.
Last edited by shep; 31st December 2013 at 04:15 PM. Reason: added comment on compositing
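As an alternative to eyeballing top, a small filter can list which of these layers are actually running. This is a sketch: `match_layers` is a hypothetical helper, and the name patterns (`pulseaudio*`, `gst-*`, `gdm*`) are assumptions. gstreamer mostly runs inside the player process itself, so only helper processes such as gst-plugin-scanner would show up.

```sh
#!/bin/sh
# Sketch: report running processes that belong to the layers discussed
# above. The name patterns are assumptions; match_layers is hypothetical.

# match_layers: read one command name per line on stdin and print the
# names matching any pattern, sorted, each name at most once.
match_layers() {
	while IFS= read -r name; do
		case $name in
		pulseaudio*|gst-*|gdm*) printf '%s\n' "$name" ;;
		esac
	done | sort -u
}
```

On the laptop this would be fed with something like `ps -axo comm | match_layers`. The `COMMAND` header line matches none of the patterns, so it is dropped.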
The screen is frozen now, in fact, but I can still ssh in. Here's what those commands show:
Code:
$ top
load averages:  1.21,  1.13,  1.09                     my.host.name 07:43:29
75 processes: 74 idle, 1 on processor
CPU0 states:  0.2% user,  1.1% nice,  0.6% system,  0.3% interrupt, 97.8% idle
CPU1 states:  0.3% user,  2.0% nice,  0.9% system,  0.0% interrupt, 96.8% idle
CPU2 states:  0.5% user,  1.5% nice,  0.9% system,  0.0% interrupt, 97.1% idle
CPU3 states:  0.8% user,  1.5% nice,  0.9% system,  0.0% interrupt, 96.8% idle
Memory: Real: 738M/1375M act/tot Free: 6493M Cache: 328M Swap: 0K/8339M

  PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
 1883 root       2    0 3628K 2996K sleep/2   poll      0:00  0.15% sshd
10182 scott      2    0 3532K 2320K sleep/2   select    0:00  0.05% sshd
26650 scott      2   10 6972K   21M idle      select   19:22  0.00% deja-dup
 9362 scott      2    0  315M  305M idle      select    5:39  0.00% firefox
29829 scott      2    0  150M  167M idle      select    2:35  0.00% gnome-shell
29592 _x11      10    0   16M   18M idle      acpilk    1:55  0.00% Xorg
13140 scott      2    0  102M  114M idle      poll      1:31  0.00% evolution-c
19990 scott      2    0   10M   28M idle      select    0:05  0.00% gnome-setti
14059 scott      2    0 7984K 6224K idle      poll      0:05  0.00% gnome-keyri
15729 scott      2    0 6640K   31M idle      poll      0:04  0.00% goa-daemon
19072 scott      2    0 8004K   14M idle      poll      0:03  0.00% gnome-shell
30873 root       2    0  424K  848K sleep/3   kqread    0:02  0.00% apmd
21552 scott      2    0   11M   18M idle      poll      0:02  0.00% evolution-s
23271 scott      2   19 6996K   14M sleep/1   poll      0:02  0.00% tracker-min
29741 _colord    2    0 3072K 6152K sleep/3   poll      0:02  0.00% colord
Code:
$ systat
    1 users    Load 1.34 1.56 1.41                      Tue Dec 31 07:56:30 2013

            memory totals (in KB)            PAGING   SWAPPING     Interrupts
           real   virtual     free           in  out   in  out      408 total
Active   757388    757388  6647696   ops                            399 clock
All     1409320   1409320 15186900   pages                            1 ipi
                                                                        acpi0
Proc:r  d  s  w    Csw   Trp   Sys   Int   Sof   Flt       forks       inteldrm
        5 15       234    16   163     8   108    27       fkppw       ehci0
                                                           fksvm       azalia0
   0.0%Int   0.0%Sys   0.0%Usr   0.0%Nic 100.0%Idle        pwait     8 ehci1
|    |    |    |    |    |    |    |    |    |    |        relck       ahci0
                                                           rlkok
                                                           noram
Namei         Sys-cache    Proc-cache    No-cache        1 ndcpy
    Calls     hits    %    hits     %    miss   %          fltcp
       22       22  100                                  3 zfod
                                                           cow
Disks   sd0   sd1                                    67141 fmin
seeks                                                89521 ftarg
xfers                                                      itarg
speed                                                35823 wired      IPKTS
  sec                                                      pdfre      OPKTS
Code:
$ vmstat
 procs   memory         page                   disks  traps          cpu
 r b w    avm     fre  flt  re  pi  po  fr  sr sd0 sd1  int   sys   cs us sy id
 1 5 0 755904 6649180  128   0   0   0   0   0  27  27   33  2671  394  2  1 97
Only one thing stands out to me in your status reports, and it comes from top(1): Xorg is waiting on an ACPI lock (the acpilk wait channel). This was previously reported to the misc@ mailing list by someone with similar hardware, and the discussion continued on tech@. I did not see a resolution in either thread.
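To check whether Xorg stays parked on that lock across several snapshots (rather than reading one top screen), a filter over `ps` output can pick out every process sleeping on a given wait channel. This is a sketch: `on_wchan` is a hypothetical helper, and it assumes input in the column order produced by `ps -axo pid,wchan,comm` with a single header line. Wait-channel names may be truncated in the display, as "acpilk" itself is.

```sh
#!/bin/sh
# Sketch: pick out processes sleeping on a given wait channel.
# on_wchan is a hypothetical helper; the column order is an assumption.

# on_wchan CHANNEL: read "PID WCHAN COMMAND" lines on stdin, skip the
# header line, and print the PID and command of every process whose
# WCHAN equals CHANNEL exactly.
on_wchan() {
	awk -v chan="$1" 'NR > 1 && $2 == chan { print $1, $3 }'
}
```

For example, `ps -axo pid,wchan,comm | on_wchan acpilk`, repeated a few times over an SSH session while the screen is frozen.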
I noticed that your swap size and RAM appear to be 1:1, not 2:1. At 1:1, swap may not be large enough to store a core dump, whether from a kernel panic or one you force through ddb. Note that kernel core dumps are stored only on the default swap device, should you have more than one.
Last edited by jggimi; 31st December 2013 at 07:33 PM. Reason: clarity
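The swap-versus-RAM comparison can be scripted as a quick sanity check. This is a sketch under assumptions: it reads physical memory from the `hw.physmem` sysctl (in bytes), takes the size column of the first device listed by `swapctl -lk` (in 1K blocks) as the default swap device, and uses the "larger than RAM" rule of thumb from this thread. `dump_fits` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: does the default swap device look big enough for a kernel core
# dump? Rule of thumb from this thread: swap should be larger than RAM.

# dump_fits RAM_KB SWAP_KB: succeed if SWAP_KB is strictly greater
# than RAM_KB, fail otherwise.
dump_fits() {
	[ "$2" -gt "$1" ]
}

if [ "$(uname -s)" = OpenBSD ]; then
	ram_kb=$(( $(sysctl -n hw.physmem) / 1024 ))
	# Assumption: the first device swapctl lists is the default swap
	# (dump) device; check "swapctl -l" if you have more than one.
	swap_kb=$(swapctl -lk | awk 'NR == 2 { print $2 }')
	if dump_fits "$ram_kb" "$swap_kb"; then
		echo "swap ($swap_kb KB) > RAM ($ram_kb KB): a dump should fit"
	else
		echo "swap ($swap_kb KB) <= RAM ($ram_kb KB): a dump may not fit"
	fi
fi
```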
Hmm... I neglected to create a big enough swap when I switched to -current. I'll repartition and reinstall so I can capture core dumps. I posted to the tech list in reply to the existing thread, though I botched a couple of things in my post (I didn't reply to the thread properly and didn't wrap the lines), so it's less likely to be useful.
Yes! Mark's suggestions get me further along before the machine freezes. I suspect his workaround gets me to the next layer of the onion.
http://marc.info/?l=openbsd-tech&m=138852967629606&w=2
Next, I think I need to expand my swap and figure out how to get a crash dump.
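Once a dump has been captured, the first things worth pulling out of it are the panic message and the process table. The following sketch assumes savecore(8) has saved each dump as a `bsd.N` / `bsd.N.core` pair in /var/crash and that `dmesg` and `ps` accept `-M core -N kernel` to read from a dump (see crash(8)). `show_dumps` is a hypothetical helper that only prints the commands so they can be checked before running them.

```sh
#!/bin/sh
# Sketch: list saved kernel dumps and print commands that read the panic
# message and process table out of each one. Assumes savecore's
# bsd.N / bsd.N.core naming; show_dumps is a hypothetical helper.

# show_dumps [DIR]: for every DIR/bsd.N.core that has a matching
# DIR/bsd.N kernel, print a dmesg and a ps command for that pair.
show_dumps() {
	dir=${1:-/var/crash}
	for core in "$dir"/bsd.*.core; do
		[ -f "$core" ] || continue   # no match: the glob stays literal
		kernel=${core%.core}         # bsd.N.core -> bsd.N
		[ -f "$kernel" ] || continue
		printf 'dmesg -M %s -N %s\n' "$core" "$kernel"
		printf 'ps -M %s -N %s -axl\n' "$core" "$kernel"
	done
}
```

The dmesg output from the dump should end with the panic message and trace, which is the kind of detail a bug report to the developers would need.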