|||
Can't install 5.8 can install 5.7
I get the below error message when I try to install 5.8.
I have completely installed 5.7 on the exact same hardware, both before and after the failed install of 5.8. I tried to install from install58.fs and from miniroot58.fs, and I tried to upgrade a 5.7 install to 5.8. Every time I get the same error. Also, if I try to ping from the command prompt on the 5.8 install image, the first ping goes through and then I get the "Illegal instruction" error. I assume this would be true for a number of commands, since it seems to happen both when the installer tries to unzip something and when I try to ping something.
Code:
zip: stdin: Input/output error
tar: End of archive volume 1 reached
Illegal instruction
ftp: Can't open file ///mnt/usr/share/sysmerge/etc.tgz: No such file or directory
gzip: stdin: unrecognized file format
tar: End of archive volume 1 reached
tar: Sorry, unable to determine archive format.
Installation of base58.tgz failed. Continue anyway? [no]
Then a link to some pictures of the problem happening, then a link to my dmesg output running 5.7. If I could figure out how to copy and paste from the install image to the internet, I would paste the dmesg output from the 5.8 install image.
https://gist.github.com/i3luefire/9fd73e1b7f284bc6ca16
https://imgur.com/a/1HJgP
https://gist.github.com/i3luefire/623b62a44affdc47ad44
|
|||
Yes. Actually, I just tried the 5.9 snapshot of amd64 and the problem still exists.
An even weirder thing: when I use ping, it gets one result before failing with "Illegal instruction". If I run ping -c 1 google.com there is no error, but with ping -c 2 google.com the error appears. These are some pictures and a dmesg from the i386 install:
https://drive.google.com/folderview?...U0&usp=sharing
Last edited by i3luefire; 25th December 2015 at 07:09 AM. Reason: add link.
|
|||
Okay, I reported it.
But now I have a new bit of info. I used my Intel i5 laptop to install 5.9 amd64 to an external hard drive, and when I put it on the Celeron computer it boots fine, like the install media does. But I still get the same "Illegal instruction" message when I try to ping with more than -c 1.
|
|||
No. Strangely it does not, or at least I can't find it. But I have *.core files from ftp, ntpd and tmux. I sent this along with my last update on the mailing list.
Here are some core dumps related to this problem:
https://github.com/i3luefire/openbsd...ive/master.zip
And here is the gdb output from those core dumps:
https://gist.github.com/i3luefire/3b1177deef1ef473735b
|
||||
I built ntpd (since that was your first core file) with debugging symbols, and ran gdb against your core file, hoping for a match. If this is correct, the failure is in line 262 of ntpd.c:
Code:
if ((nfds = poll(pfd, i, timeout)) == -1)
Last edited by jggimi; 27th December 2015 at 12:21 AM. Reason: typos
|
||||
OK, that syscall is defined in /usr/src/sys/kern/syscalls.master as sys_ppoll. That function is in /usr/src/sys/kern/sys_generic.c.
The revision of sys_generic.c shipped with OpenBSD 5.7 was 1.96. Looking through the syscall and the subfunction that does the work, doppoll(), I can see the addition of a POLLNOHUP loop at 1.98. I don't know if that's applicable to the problem or not, but it's the only apparent change since 5.7 to my unskilled eyes. The commit log says:
Code:
revision 1.98
date: 2015/05/10 22:35:38; author: millert; state: Exp; lines: +5 -3; commitid: rtX5Mpzd4CgHtDmM;
Set POLLHUP even if no valid events were specified as per POSIX.
Since we use the poll backend for select(2), care must be taken
not to set the fd's bit in writefds in this case. A kernel-only
flag, POLLNOHUP, is used by selscan() to tell the poll backend not
to return POLLHUP on EOF. This is currently only used by fifo_poll().
The fifofs regress now passes. OK guenther@
Code:
@@ -953,8 +940,10 @@ doppoll(struct proc *p, struct pollfd *f
 	if ((error = copyin(fds, pl, sz)) != 0)
 		goto bad;
 
-	for (i = 0; i < nfds; i++)
+	for (i = 0; i < nfds; i++) {
+		pl[i].events &= ~POLLNOHUP;
 		pl[i].revents = 0;
+	}
 
 	if (tsp != NULL) {
 		getnanouptime(&rts);
Last edited by jggimi; 27th December 2015 at 01:00 AM. Reason: typos
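For readers who find the diff hard to follow, here is a rough userland sketch of what the changed loop does: it walks the array of pollfd structures, strips one flag bit from each requested events mask, and zeroes revents. This is not the kernel code. The function name sanitize_pollfds and the constant SKETCH_POLLNOHUP (with its value) are made up for illustration, since the real POLLNOHUP flag is kernel-only.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Assumed value, for illustration only; the real flag is kernel-internal. */
#define SKETCH_POLLNOHUP 0x1000

/*
 * Hypothetical helper mirroring the loop in the diff: clear the
 * internal flag from every caller-supplied events mask and reset
 * revents before any polling is done.
 */
void
sanitize_pollfds(struct pollfd *pl, size_t nfds)
{
	size_t i;

	assert(pl != NULL || nfds == 0);
	for (i = 0; i < nfds; i++) {
		pl[i].events &= ~SKETCH_POLLNOHUP;	/* drop the internal bit */
		pl[i].revents = 0;			/* nothing reported yet */
	}
}
```

Nothing in this loop does more than integer masking and assignment on memory that was already copied in, which is why it looks like an unlikely source of an illegal instruction.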
|
|||
That does not help me, because it is a bit over my head, but thank you for your response. If you think it may help the people on the mailing list solve the problem, I hope you will send that reply to them at bugs@.
|
|
||||
No, I don't think it will help -- the .core file needs to match the symbols in the source code exactly, and it doesn't. There's nothing in that section of the poll() syscall code that indicates to me anything very special -- the change which touched the code only runs through the array of pl structures, setting variables.
So this morning (my time, just now) I ran a backtrace against your tmux core file, and can see more clearly that it's out of sync with the source code. It indicated a library error with event management, but the function noted in the stack was at a different location in the source code -- so the symbols were misaligned.
---
The problems are occurring due to an illegal instruction, but I cannot locate the source of it with the information I have. There have been illegal instructions previously reported with virtual Celeron G1610s, as the Xen hypervisor can indicate this model to guest virtual machines... but I didn't find any reported with real Celeron hardware.
I can build you a system from -current source code, and then we'd know that any .core file you create will match that source code exactly. You'd have to install it from your working hardware, and then test again from the non-working hardware, capturing .core files once more. But you'd have to trust some random guy on the Internet to provide kernels and filesets.
Let me know if you'd like to give that a try -- and I'll build a system from source, and retain that source for debugging.
|
|||
I will do it. Just let me know when the img is ready to install.
|
|
|||
Ohhhhh kayyyy. Well, I am starting to notice a pattern. At least in some circumstances, the core dump can be brought on by attempting an exit. For example, if I type tmux and then try "exit", tmux core dumps; if I exit from my ssh session, sshd core dumps; and if I ^C out of top, I get a core dump. I have been trying to get info, but I had to keep rebooting the machine, because whenever I ssh'd in and exited the session, sshd would core dump and I could not get back into the machine.
|
|
||||
i3luefire has sent me a lot of core files, and I have matching source code. I started with ntpd, as it was discussed earlier. Frame #0 is the failure in the poll(2) syscall, and Frame #1 is the call to poll() at line 262 of ntpd.c:
Code:
if ((nfds = poll(pfd, i, timeout)) == -1)
The variable i defines the number of structures in the pollfd array. The core file shows them:
pfd[0]: fd = 3, events = 1, revents = 0
pfd[1]: fd = 4, events = 1, revents = 0
pfd[2]: fd = 7, events = 1, revents = 0
events = 1 is POLLIN per /usr/include/sys/poll.h, which is defined in the man page as "Data other than high-priority data may be read without blocking." If the timeout argument is set to -1, poll() blocks until the condition is met.
This syscall looks valid to me. The failing frame only provides an address ... and as I have the kernel source to match, I should be able to find it with a kernel built with makeoptions DEBUG="-g".
Last edited by jggimi; 28th December 2015 at 02:20 PM. Reason: typo
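To make the shape of that call concrete, here is a minimal sketch of setting up a pollfd array the way the core file shows it (every entry with events = POLLIN) and handing it to poll(2). It is not ntpd's code. The helper name wait_for_input and the MAX_FDS limit are hypothetical; only poll(), struct pollfd and POLLIN come from the system headers.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

#define MAX_FDS 8	/* hypothetical limit for this sketch */

/*
 * Watch up to MAX_FDS descriptors for readable data.
 * timeout_ms = -1 blocks until at least one descriptor is ready;
 * 0 returns immediately. Returns what poll() returns: the number
 * of ready descriptors, 0 on timeout, or -1 on error. On success
 * the per-descriptor revents are copied into revents_out.
 */
int
wait_for_input(const int *fds, size_t n, int timeout_ms, short *revents_out)
{
	struct pollfd pfd[MAX_FDS];
	size_t i;
	int nfds;

	assert(n <= MAX_FDS);
	for (i = 0; i < n; i++) {
		pfd[i].fd = fds[i];
		pfd[i].events = POLLIN;	/* normal data may be read */
		pfd[i].revents = 0;
	}

	if ((nfds = poll(pfd, (nfds_t)n, timeout_ms)) == -1)
		return -1;

	for (i = 0; i < n; i++)
		revents_out[i] = pfd[i].revents;
	return nfds;
}
```

With the three descriptors from the core file this would be called as wait_for_input(fds, 3, -1, revents), assuming fds holds 3, 4 and 7. On a healthy system such a call can only block, return a count, or fail with -1 and errno set, so a process dying inside it with an illegal instruction points away from the caller's arguments.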
|
||||
OK, that failed. The backtrace has only two frames:
Code:
(gdb) bt
#0  0x00000ee8802c4dda in poll () at <stdin>:2
#1  0x00000ee64bf05e8f in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/usr.sbin/ntpd/ntpd.c:262
Code:
(gdb) file bsd.gdb
Reading symbols from bsd.gdb...done.
(gdb) info address doppoll
Symbol "doppoll" is a function at address 0xffffffff811a97f0.
I'm going to look through the other core files today, and see if I can find other types of errors.
Last edited by jggimi; 28th December 2015 at 03:12 PM. Reason: added link
|
||||
I've looked at these core files. All are failing inside of syscalls, though the syscalls vary: poll(2) twice, kevent(2), read(2), waitpid(2).
I'll post findings to bugs@ later today, and ask for assistance. I'm sure there's something easy and obvious which I'm missing regarding syscall debugging. |
|
|||
Thanks for all your help so far.
Last edited by i3luefire; 28th December 2015 at 05:37 PM. Reason: simplification |