DaemonForums > OpenBSD > OpenBSD General

#1  quisquous, 23rd December 2013
How to troubleshoot a hang in OpenBSD?

I'm trying to get OpenBSD 5.4-release amd64 working on a MacbookAir5,1. So far it runs pretty well...that is, until I use Gnome. After 1-3 hours of Gnome use my machine hangs entirely. I cannot switch to a different virtual console, move the mouse cursor, or do anything besides power cycle the machine. I get the same behavior using Xfce instead of Gnome.

I've searched for answers but I haven't turned up much. I'd like to narrow down the problem further: figure out whether this is user error or a bug, and if it's a bug, gather what information I can so I can file a report the devs will find useful. So I've been looking for documentation on how to troubleshoot hangs in OpenBSD, and that search hasn't turned up much either.

I thought I'd ask y'all--do you know how to go about troubleshooting a hang in OpenBSD? Is there any logging I can turn on or clues left behind when the system freezes up? Or should I just experiment with turning various subsystems or hardware on and off to see if I can figure out where the trouble lies that way?

#2  jggimi, 23rd December 2013

"Hang" when using X can be difficult to diagnose. A kernel panic would produce the same symptoms: by default the kernel enters the ddb(4) kernel debugger, and while X is running you cannot see it. If the keyboard is still active in ddb (it may not be), you can blindly type ddb commands such as ddb> boot crash to force a kernel core dump and reboot. Besides a working keyboard connection to a ddb you can't see, you'll need enough space on the default swap device to hold a kernel core dump, which means swap should be larger than your physical RAM. You won't know whether your boot crash command is working unless you can monitor disk I/O, and writing a kernel core dump to swap takes time.

You can also disable entry into ddb on a panic, which avoids the blind typing: the kernel then writes a core dump and reboots on its own. See crash(8) as well.
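A minimal sketch of that setting, assuming the usual /etc/sysctl.conf mechanism (one variable=value per line, applied at boot); as root, sysctl ddb.panic=0 sets the same value immediately. After a panic, the dump sits in swap until the next boot, when savecore(8) normally copies it into /var/crash, as crash(8) describes.

```
# /etc/sysctl.conf
# On panic, write a kernel core dump to swap and reboot instead of
# stopping in ddb(4), whose prompt is invisible while X is running.
ddb.panic=0
```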

If the MacbookAir has a 9-pin serial port, you could connect another computer to it with a null-modem cable and use that computer as a serial console. This has two advantages:
  1. If there is a kernel panic, you can actually see it.
  2. If there is a real hang, you can force the kernel to enter ddb. See ddb(4).
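For reference, a sketch of the OpenBSD side of a serial console, assuming the first serial port (com0) at 9600 baud; these are boot(8) commands placed in /etc/boot.conf. On the other computer any terminal program set to the same speed should do; from another OpenBSD machine that would be something like cu -l cua00 -s 9600 (the device name there depends on which port that machine uses).

```
# /etc/boot.conf
# Set the serial port speed, then send the console (including ddb
# output and panic messages) to the first serial port.
stty com0 9600
set tty com0
```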

#3  quisquous, 23rd December 2013

Thanks jggimi! My swap is twice as big as my RAM, so I should be good there. I'll try typing 'boot crash' and waiting 10 minutes next time I get a proper hang.

That said, last night, while I was turning off various services to see if one of them was the culprit, I tried running plain startx with the default desktop (no Gnome), then played a video for about 20 minutes using Totem. The screen froze as before, but the audio kept playing, I believe through to the end of the video. Pressing keys and moving the mouse didn't do anything, at least nothing that I could see.

So...perhaps I have a video issue and not a hang. I'm going to try this again in Gnome: get the video to play in a loop, see whether it hangs or just looks like it's hung, and try to SSH into the box if the video keeps playing. Maybe there's a way I can restart the video driver while SSHed in; that would be something.

Also, a couple lines in the -current release notes caught my eye:
  • "Made intel(4) clflush() flush the correct cache line on i386/amd64. Fixes gnome screen corruption and hangs."
  • "Bugfixes to drm(4) i915 code to avoid possible Haswell system hangs and GPU locks."

This is not a Haswell laptop, but perhaps the issue I'm encountering is fixed in -current. Seems I'm going to have to bite the bullet and install -current.

#4  quisquous, 30th December 2013

I installed -current, but it still hangs every couple of hours or so. Sometimes during a hang I can still SSH into the box; other times I can neither SSH to it nor ping it. I have a new appreciation for the package maintainers--building Gnome takes a long time.

#5  jggimi, 30th December 2013

In those times when you are able to reach the workstation via SSH, what have you been able to discover? (e.g.: top(1), systat(1), vmstat(8), etc.)

In those times when you are not able to use SSH, what have you been able to discover? (e.g. setting ddb.panic=0 to avoid dropping into ddb, or setting ddb.console=1, invoking ddb through one of the console methods, and blindly typing boot crash.)
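A sketch of the second approach, again assuming /etc/sysctl.conf. As far as I know, ddb.console can only be turned off, not on, once the securelevel has been raised, so setting it in that file (read at boot) is the safe route. With it enabled, the usual PC console key sequence for entering ddb on amd64 is Ctrl+Alt+Esc; whether the keyboard still reaches ddb while X owns the display is the same uncertainty mentioned earlier in the thread.

```
# /etc/sysctl.conf
# Allow entering ddb(4) from the console keyboard (Ctrl+Alt+Esc on a
# PC console), so a hung machine can be told to "boot crash".
ddb.console=1
```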

#6  shep, 31st December 2013

Gnome3 adds many additional layers to the base OS, such as gstreamer, pulseaudio and video compositing. Some of these layers are not BSD friendly. If you are using Totem in fvwm, it likely uses pulseaudio and gstreamer. The Parole video player in XFCE4, which is based on Totem, also uses gstreamer but not pulseaudio. I believe both 5.4 and -current ship two versions of gstreamer, because Gnome3 required the newer one. Also, if you are starting fvwm from gdm, some of these services may be running in the background; running top in an xterm under fvwm will tell you.

Another option is to try different video players. Both VLC and mplayer use sndio directly and do not depend on gstreamer. If neither of these two media players hangs, that would help focus your debugging efforts and also help you choose a desktop.

#7  quisquous, 31st December 2013

The screen is frozen now, in fact, but I can still ssh in. Here's what those commands show:

Code:
$ top
load averages:  1.21,  1.13,  1.09                       my.host.name 07:43:29
75 processes: 74 idle, 1 on processor
CPU0 states:  0.2% user,  1.1% nice,  0.6% system,  0.3% interrupt, 97.8% idle
CPU1 states:  0.3% user,  2.0% nice,  0.9% system,  0.0% interrupt, 96.8% idle
CPU2 states:  0.5% user,  1.5% nice,  0.9% system,  0.0% interrupt, 97.1% idle
CPU3 states:  0.8% user,  1.5% nice,  0.9% system,  0.0% interrupt, 96.8% idle
Memory: Real: 738M/1375M act/tot Free: 6493M Cache: 328M Swap: 0K/8339M

  PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
 1883 root       2    0 3628K 2996K sleep/2   poll      0:00  0.15% sshd
10182 scott      2    0 3532K 2320K sleep/2   select    0:00  0.05% sshd
26650 scott      2   10 6972K   21M idle      select   19:22  0.00% deja-dup
 9362 scott      2    0  315M  305M idle      select    5:39  0.00% firefox
29829 scott      2    0  150M  167M idle      select    2:35  0.00% gnome-shell
29592 _x11      10    0   16M   18M idle      acpilk    1:55  0.00% Xorg
13140 scott      2    0  102M  114M idle      poll      1:31  0.00% evolution-c
19990 scott      2    0   10M   28M idle      select    0:05  0.00% gnome-setti
14059 scott      2    0 7984K 6224K idle      poll      0:05  0.00% gnome-keyri
15729 scott      2    0 6640K   31M idle      poll      0:04  0.00% goa-daemon
19072 scott      2    0 8004K   14M idle      poll      0:03  0.00% gnome-shell
30873 root       2    0  424K  848K sleep/3   kqread    0:02  0.00% apmd
21552 scott      2    0   11M   18M idle      poll      0:02  0.00% evolution-s
23271 scott      2   19 6996K   14M sleep/1   poll      0:02  0.00% tracker-min
29741 _colord    2    0 3072K 6152K sleep/3   poll      0:02  0.00% colord
Code:
$ systat
    1 users    Load 1.34 1.56 1.41                     Tue Dec 31 07:56:30 2013

            memory totals (in KB)            PAGING   SWAPPING     Interrupts
           real   virtual     free           in  out   in  out      408 total
Active   757388    757388  6647696   ops                            399 clock
All     1409320   1409320 15186900   pages                            1 ipi
                                                                        acpi0
Proc:r  d  s  w    Csw   Trp   Sys   Int   Sof  Flt       forks         inteldrm
        5 15       234    16   163     8   108   27       fkppw         ehci0
                                                          fksvm         azalia0
   0.0%Int   0.0%Sys   0.0%Usr   0.0%Nic 100.0%Idle       pwait       8 ehci1
|    |    |    |    |    |    |    |    |    |    |       relck         ahci0
                                                          rlkok
                                                          noram
Namei         Sys-cache    Proc-cache    No-cache       1 ndcpy
    Calls     hits    %    hits     %    miss   %         fltcp
       22       22  100                                 3 zfod
                                                          cow
Disks   sd0   sd1                                   67141 fmin
seeks                                               89521 ftarg
xfers                                                     itarg
speed                                               35823 wired         IPKTS
  sec                                                     pdfre         OPKTS
Code:
$ vmstat
procs    memory       page                    disks    traps          cpu
 r b w    avm     fre  flt  re  pi  po  fr  sr sd0 sd1  int   sys   cs us sy id
 1 5 0 755904 6649180  128   0   0   0   0   0  27  27   33  2671  394  2  1 97

#8  quisquous, 31st December 2013

Thanks shep. I'm able to reproduce the problem with gdm not running: I run startx with the default fvwm, start Firefox, and leave it open for a couple of hours. So it's not specific to playing video. Basically, running X for a couple of hours, regardless of what I'm doing within X, leads to the screen locking up.

The screen does not freeze when I'm outside of X, i.e. on a virtual console, even if gdm *is* running. It seems like something that happens while X controls the screen leads to the video freeze. When the screen is frozen, CTRL-ALT-F1 does not unfreeze it and take me to a virtual console. BUT, notably, when the screen is frozen and I ssh in and issue the reboot command, the screen unfreezes right before the shutdown sequence finishes and the machine reboots, and the last lines of shutdown output appear. So there may be some way short of a power cycle to get the screen unfrozen.

#9  jggimi, 31st December 2013

Only one thing stands out for me in your status reports, and that is from top(1): Xorg's WAIT column shows acpilk, i.e. it is sleeping while waiting on an ACPI lock. This has previously been reported to the misc@ mailing list by someone with similar hardware, and the discussion continued on tech@. I did not see a resolution in either thread.
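To check for this from an SSH session without reading the whole top display, ps(1) can print each process's wait channel with ps -ax -o pid,wchan,comm. The small filter below is a sketch; the function name waiting_on is made up for illustration, and it just picks out the processes sleeping on a given channel name such as acpilk.

```shell
#!/bin/sh
# waiting_on CHANNEL
# Hypothetical helper: reads "PID WAIT COMMAND" lines (as printed by
# ps -ax -o pid,wchan,comm) on stdin, skips the header line, and prints
# the PID and command name of every process sleeping on CHANNEL.
waiting_on() {
    awk -v chan="$1" 'NR > 1 && $2 == chan { print $1, $3 }'
}
```

Usage: ps -ax -o pid,wchan,comm | waiting_on acpilk. On a box in the state shown in post #7 this would be expected to list the Xorg process.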

I noted that your swap size and RAM appear to be 1:1, not 2:1, and at 1:1 it is possible that swap is not large enough to store a core dump in the event of a kernel panic or by your forcing one through ddb. Note that kernel core dumps will only be stored on the default swap device, should you have more than one.
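To check the ratio, compare physical memory (sysctl -n hw.physmem, in bytes) with the size of the primary swap device as listed by swapctl -l (see swapctl(8) for the units it reports). The comparison itself is trivial; the sketch below, with the made-up function name dump_fits, only encodes the rule of thumb from this thread that swap should be larger than RAM. Both arguments must be integers in the same unit.

```shell
#!/bin/sh
# dump_fits RAM SWAP
# Hypothetical helper: succeeds (exit 0) if the swap size is strictly
# larger than physical RAM, the rule of thumb used in this thread for
# holding a full kernel core dump. Both arguments must be integers in
# the same unit, e.g. megabytes.
dump_fits() {
    ram=$1
    swap=$2
    if [ "$swap" -gt "$ram" ]; then
        echo "ok: swap ($swap) is larger than RAM ($ram)"
        return 0
    fi
    echo "too small: swap ($swap) is not larger than RAM ($ram)"
    return 1
}
```

For example, a hypothetical machine with 8192 MB of RAM and a 16384 MB primary swap partition passes with dump_fits 8192 16384, while an exact 1:1 layout fails.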

#10  quisquous, 31st December 2013

Hmm...I neglected to create a big enough swap when I switched to -current. I'll repartition and reinstall so I can capture core dumps. I posted to the tech list in reply to the existing thread, though I botched a couple of things in my post (I didn't reply to the thread properly and didn't wrap the lines properly), so there's a lower chance of it being useful.

#11  jggimi, 1st January 2014

It looks like Mark K. has a circumvention for you.

#12  quisquous, 1st January 2014

Yes! Mark's suggestions get me further along before I freeze. I suspect his workaround gets me to the next layer of the onion.

http://marc.info/?l=openbsd-tech&m=138852967629606&w=2

Next I think I need to expand my swap and figure out how to get a crash dump.