DaemonForums  


OpenBSD Installation and Upgrading Installing and upgrading OpenBSD.

#1
25th December 2015
i3luefire
New User
 
Join Date: Dec 2015
Posts: 9
Can't install 5.8, can install 5.7

I get the error message below when I try to install 5.8.
I have completely installed 5.7 on the exact same hardware, both before and after the failed install of 5.8.
I tried to install from install58.fs and from miniroot58.fs, and I tried to upgrade a 5.7 install to 5.8.
Every time I get the same error.
Also, if I try to ping from the command prompt on the 5.8 install image, the first ping goes through and then I get the "Illegal instruction" error. I assume this would be true for a number of commands, since it seems to happen both when the installer tries to unzip something and when I try to ping something.

Code:
zip: stdin: Input/output error
tar: End of archive volume 1 reached
Illegal instruction
ftp: Can't open file ///mnt/usr/share/sysmerge/etc.tgz: No such file or directory
gzip: stdin: unrecognized file format
tar: End of archive volume 1 reached
tar: Sorry, unable to determine archive format.
Installation of base58.tgz failed. Continue anyway? [no]
Below is a log of what some people in the freenode #openbsd channel had me try, and some things I tried along the way.
Then a link to some pictures of the problem happening.
Then a link to my dmesg output running 5.7.
If I could figure out how to copy and paste from the install image to the internet, I would also give a paste of the dmesg output from the 5.8 install image.
https://gist.github.com/i3luefire/9fd73e1b7f284bc6ca16
https://imgur.com/a/1HJgP
https://gist.github.com/i3luefire/623b62a44affdc47ad44
#2
25th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

Hello, and welcome!

The illegal instruction error is likely indicative of the root cause. Are you able to install an i386 system successfully?
#3
25th December 2015
i3luefire
New User
 
Join Date: Dec 2015
Posts: 9

Yes. And I just tried the 5.9 snapshot of amd64, and the problem still exists. An even weirder thing: when I use ping, it gets one result before failing with "Illegal instruction".
If I do ping -c 1 google.com, there is no error,
but with ping -c 2 google.com I get the error.

These are some pictures and a dmesg from the i386 install:
https://drive.google.com/folderview?...U0&usp=sharing

Last edited by i3luefire; 25th December 2015 at 07:09 AM. Reason: add link.
#4
25th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

This is very strange. The Celeron G1610 is 64-bit capable, and should be able to run either architecture. Both fail with similar issues, and it appears you've attempted to install from local media as well as from a nearby mirror.

I recommend taking this to the Project for analysis and review. A -current dmesg, and links to photos should be sent to the bugs@ mailing list.

(Now is a good time to do so, if the source of the problem happens to be a software bug. The Project just entered beta testing for 5.9, and they're asking for -current bug reports to be sent in.)
#5
26th December 2015
i3luefire
New User
 
Join Date: Dec 2015
Posts: 9

Okay, I reported it.
But now I have a new bit of info.
I used my Intel i5 laptop to install 5.9 amd64 to an external hard drive, and when I put it on the Celeron computer it boots fine, just like the install media... but I still get the same "Illegal instruction" message when I try to ping with more than -c 1.
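If waiting between packets is what separates ping -c 1 from ping -c 2, the symptom might be reproducible without the network at all. Below is a minimal sketch that could be compiled on the installed system, assuming (not confirmed anywhere in this thread) that the fault is tied to returning from a blocking wait such as poll(2). `timed_wait_rounds()` is a hypothetical helper, not part of OpenBSD.

```c
#include <poll.h>
#include <stdio.h>

/*
 * Hypothetical diagnostic: block in poll(2) with no descriptors, which
 * makes it a plain timed wait, `rounds` times. If returning from a
 * blocking syscall is what triggers the fault, the process should die
 * with SIGILL after the first round, much like ping -c 2.
 * Returns the number of rounds completed.
 */
int
timed_wait_rounds(int rounds, int timeout_ms)
{
    int done;

    for (done = 0; done < rounds; done++) {
        /* nfds == 0: poll() just sleeps for timeout_ms milliseconds */
        if (poll(NULL, 0, timeout_ms) == -1) {
            perror("poll");
            break;
        }
        printf("round %d survived\n", done + 1);
        fflush(stdout);
    }
    return done;
}
```

Calling it from a small main() with, say, 5 rounds of 1000 ms and watching whether the program dies right after "round 1 survived" would show whether a timed wait alone is enough to hit the illegal instruction.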
#6
26th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

When you run ping(8) from the installed system, does it create a .core file from the illegal instruction error?
#7
26th December 2015
i3luefire
New User
 
Join Date: Dec 2015
Posts: 9

No. Strangely, it does not, or at least I can't find one. But I have *.core files from ftp, ntpd, and tmux. I sent these along with my last update to the mailing list.
Here are some core dumps related to this problem:
https://github.com/i3luefire/openbsd...ive/master.zip
And here is the gdb output from those core dumps:
https://gist.github.com/i3luefire/3b1177deef1ef473735b
#8
27th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

I built ntpd (since that was your first core file) with debugging symbols, and ran gdb against your core file, hoping for a match. If this is correct, the failure is in line 262 of ntpd.c:
Code:
if ((nfds = poll(pfd, i, timeout)) == -1)
This is the syscall poll(2).
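For readers less familiar with poll(2): it returns the number of descriptors that have events pending, 0 on timeout, or -1 with errno set, which is why the ntpd line compares the result with -1. A minimal sketch of that calling pattern follows; the EINTR retry and the name `poll_ready()` are assumptions for illustration, not ntpd's actual code.

```c
#include <errno.h>
#include <poll.h>

/*
 * Sketch of the pattern on line 262 of ntpd.c: poll(2) returns the
 * number of ready descriptors, 0 on timeout, or -1 with errno set.
 * Retrying when a signal interrupts the wait (EINTR) is an assumption
 * of this sketch.
 */
int
poll_ready(struct pollfd *pfd, nfds_t n, int timeout)
{
    int nready;

    while ((nready = poll(pfd, n, timeout)) == -1) {
        if (errno != EINTR)
            return -1;  /* real error: caller inspects errno */
    }
    return nready;
}
```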

Last edited by jggimi; 27th December 2015 at 12:21 AM. Reason: typos
#9
27th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

OK, that syscall is defined in /usr/src/sys/kern/syscalls.master as sys_poll. That function is in /usr/src/sys/kern/sys_generic.c.

The revision of sys_generic.c in OpenBSD 5.7 was 1.96. Looking through the syscall and the subfunction that does the work, doppoll(), I can see that revision 1.98 added POLLNOHUP handling to the loop that initializes the pollfd array. I don't know if that's relevant to the problem or not, but it's the only apparent change since 5.7 to my unskilled eyes.

The commit log says:
Code:
revision 1.98
date: 2015/05/10 22:35:38;  author: millert;  state: Exp;  lines: +5 -3;  commitid: rtX5Mpzd4CgHtDmM;
Set POLLHUP even if no valid events were specified as per POSIX.
Since we use the poll backend for select(2), care must be taken not
to set the fd's bit in writefds in this case.  A kernel-only flag,
POLLNOHUP, is used by selscan() to tell the poll backend not to
return POLLHUP on EOF.  This is currently only used by fifo_poll().
The fifofs regress now passes.  OK guenther@
Here's an excerpt of the diff between 1.96 and 1.98, just within the doppoll() function:
Code:
@@ -953,8 +940,10 @@ doppoll(struct proc *p, struct pollfd *f
    if ((error = copyin(fds, pl, sz)) != 0)
        goto bad;

-    for (i = 0; i < nfds; i++)
+    for (i = 0; i < nfds; i++) {
+        pl[i].events &= ~POLLNOHUP;
        pl[i].revents = 0;
+    }

    if (tsp != NULL) {
        getnanouptime(&rts);

Last edited by jggimi; 27th December 2015 at 01:00 AM. Reason: typos
27th December 2015
i3luefire
New User
 
Join Date: Dec 2015
Posts: 9

That doesn't help me, because it is a bit over my head, but thank you for your response. If you think it may help the people on the mailing list solve the problem, I hope you will send that reply to them at bugs@.

27th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

No, I don't think it will help -- the .core file needs to match the symbols in the source code exactly, and it doesn't. There's nothing in that section of the poll() syscall code that indicates to me anything very special -- the change which touched the code only runs through the array of pl structures, setting variables.

So this morning (my time, just now) I ran a backtrace against your tmux core file, and can see that it's out-of-sync with the source code more clearly. It indicated a library error with event management, but the function noted in the stack was at a different location in source code -- so the symbols were misaligned.

---

The problems are occurring due to an illegal instruction, but I cannot locate its source with the information I have. Illegal instruction errors have been reported before with virtual Celeron G1610s, as the Xen hypervisor can present this model to guest virtual machines... but I didn't find any reports with real Celeron hardware.

I can build you a system from -current source code, and then we'd know that any .core file you create will match that source code exactly. You'd have to install it from your working hardware, and then test again from the non-working hardware, capturing .core files once more.

But you'd have to trust some random guy on the Internet to provide kernels and filesets. Let me know if you'd like to give that a try -- and I'll build a system from source, and retain that source for debugging.
27th December 2015
i3luefire
New User
 
Join Date: Dec 2015
Posts: 9

I will do it. Just let me know when the img is ready to install.
27th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

Building begins. I'll have a matching source tarball available to you as well as the release(8). It'll be a few hours.
27th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

Build of kernels, userland, and xenocara complete. Links provided via PM.
28th December 2015
i3luefire
New User
 
Join Date: Dec 2015
Posts: 9

Ohhhhh kayyyy. Well, I am starting to notice a pattern: at least in some circumstances, the core dump can be brought on by attempting an exit. E.g., if I start tmux and then type "exit", tmux core dumps; if I exit from my ssh session, sshd core dumps; and if I ^C out of top, I get a core dump. I have been trying to get more info, but I had to keep rebooting the machine, because if I sshed in and then exited, sshd would core dump and I could not get back into the machine.

28th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

i3luefire has sent me a lot of core files, and I have matching source code. I started with ntpd, as it was discussed earlier. Frame #0 is the failure in the poll(2) syscall, and Frame #1 is the syscall to poll() at line 262 of ntpd.c:

Code:
if ((nfds = poll(pfd, i, timeout)) == -1)
The arguments passed to poll(2) are: a valid pointer to the pollfd structure array pfd, and two variables: i = 3 and timeout = -1.

The variable i defines the number of structures in the pollfd array. The core file shows them:

pfd[0]: fd = 3, events = 1, revents = 0
pfd[1]: fd = 4, events = 1, revents = 0
pfd[2]: fd = 7, events = 1, revents = 0

events = 1 is POLLIN per /usr/include/sys/poll.h, which is defined in the man page as "Data other than high-priority data may be read without blocking."

If the timeout argument is set to -1, poll() blocks indefinitely until one of the requested events occurs (or a signal interrupts it).
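The call captured in the core file can be rebuilt in userland to confirm it is ordinary: three pollfd entries asking for POLLIN, with a timeout of -1. In this sketch, pipes stand in for ntpd's real descriptors (an assumption), and one pipe is written to first so the indefinite wait returns immediately. `poll_like_ntpd()` is a hypothetical name.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Rebuilds the shape of the call seen in the ntpd core file: three
 * pollfd entries, each asking only for POLLIN (events = 1), passed with
 * timeout = -1. Returns poll()'s result, or -1 if setup fails.
 */
int
poll_like_ntpd(void)
{
    int p[3][2];
    struct pollfd pfd[3];
    int i, opened = 0, n = -1;

    for (i = 0; i < 3; i++) {
        if (pipe(p[i]) == -1)
            goto out;
        opened++;
        pfd[i].fd = p[i][0];        /* watch the read end */
        pfd[i].events = POLLIN;     /* value 1 on OpenBSD */
        pfd[i].revents = 0;
    }

    /* Make only pfd[1] readable, so the -1 (infinite) wait returns. */
    if (write(p[1][1], "x", 1) == 1)
        n = poll(pfd, 3, -1);

out:
    for (i = 0; i < opened; i++) {
        close(p[i][0]);
        close(p[i][1]);
    }
    return n;
}
```

On a healthy system this should return 1, since only the second entry has data waiting.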

This syscall looks valid to me. The failing frame only provides an address ... and as I have the kernel source to match, I should be able to find it with a kernel built with makeoptions DEBUG="-g".

Last edited by jggimi; 28th December 2015 at 02:20 PM. Reason: typo
28th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

OK, that failed. The backtrace has only two frames:
Code:
(gdb) bt
#0  0x00000ee8802c4dda in poll () at <stdin>:2
#1  0x00000ee64bf05e8f in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/usr.sbin/ntpd/ntpd.c:262
and the doppoll function is located in the kernel much further away.
Code:
(gdb) file bsd.gdb
Reading symbols from bsd.gdb...done.
(gdb) info address doppoll
Symbol "doppoll" is a function at address 0xffffffff811a97f0.
I don't know how to debug syscalls, obviously. All I know of them is on page 15 of this presentation.
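The two addresses above fall in different halves of the amd64 address space. Frame #0 (0x00000ee8802c4dda) is a user-space address, presumably the libc stub that traps into the kernel, while doppoll at 0xffffffff811a97f0 sits in the kernel's upper half. A process core file records only the user-space side, which would explain why the backtrace stops at poll(). Below is a small sketch for classifying such addresses, assuming 48-bit virtual addressing; `amd64_addr_half()` is a hypothetical helper.

```c
#include <stdint.h>

/*
 * amd64 "canonical" addresses: bits 63..47 are all zero (lower half,
 * user space) or all one (upper half, where the kernel is mapped).
 * Returns 0 for lower half, 1 for upper half, -1 if non-canonical.
 * Assumes 48-bit virtual addresses.
 */
int
amd64_addr_half(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 bits 63..47 */

    if (top == 0)
        return 0;
    if (top == 0x1ffff)
        return 1;
    return -1;
}
```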

I'm going to look through the other core files today, and see if I can find other types of errors.

Last edited by jggimi; 28th December 2015 at 03:12 PM. Reason: added link
28th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

I've looked at these core files. All are failing inside of syscalls, though the syscalls vary: poll(2) twice, kevent(2), read(2), waitpid(2).

I'll post findings to bugs@ later today, and ask for assistance. I'm sure there's something easy and obvious which I'm missing regarding syscall debugging.
28th December 2015
i3luefire
New User
 
Join Date: Dec 2015
Posts: 9

Thanks for all your help so far.

Last edited by i3luefire; 28th December 2015 at 05:37 PM. Reason: simplification
28th December 2015
jggimi
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 5,742

I've posted to bugs@. Hopefully, we'll get some direction to narrow this down.
All times are GMT.


Powered by vBulletin® Version 3.8.4
Copyright ©2000 - 2017, Jelsoft Enterprises Ltd.
Content copyright © 2007-2010, the authors
Daemon image copyright ©1988, Marshall Kirk McKusick