DaemonForums  

#1  MatthiasKoch, 15th November 2016
Real Name: Matthias Koch
Port Guard
Join Date: Mar 2016
Location: Germany
Posts: 37

NTP 4.2.8 loses sync, system time far out

Good $TIMEOFDAY,

I am running a number of virtual OpenBSD-Hosts on a VMware-vSphere, with
the stock ntpd from packages. Their ntpd.confs are pretty straightforward:
Code:
# $OpenBSD: ntpd.conf,v 1.14 2015/07/15 20:28:37 ajacoutot Exp $
#
# See ntpd.conf(5) and /etc/examples/ntpd.conf

servers pool.ntp.org
constraints from "https://www.google.com"
While all the other machines sync properly, one of them (OpenBSD 6.0) keeps
losing sync and drifts waaay out. Restarting ntpd brings the system
time back on track, but after a few minutes it loses sync again and
drifts off.

ntpd is started with ntpd_flags='-s -v'.

Here's a typical output:
Code:
Nov 15 13:57:24 n2 ntpd[9622]: ntp engine ready
Nov 15 13:57:51 n2 ntpd[94990]: set local clock to Tue Nov 15 13:57:51 CET 2016 (offset 26.320751s)
Nov 15 13:57:52 n2 ntpd[9622]: constraint reply from 172.217.16.36: offset -0.877704
Nov 15 13:58:09 n2 ntpd[9622]: peer 52.59.88.68 now valid
Nov 15 13:58:13 n2 ntpd[9622]: peer 176.9.253.75 now valid
Nov 15 13:58:15 n2 ntpd[9622]: peer 78.46.189.152 now valid
Nov 15 13:58:18 n2 ntpd[9622]: peer 87.106.126.46 now valid
Nov 15 14:02:22 n2 ntpd[39746]: adjusting local clock by 3.956680s
Nov 15 14:02:22 n2 ntpd[9622]: clock is now synced
Nov 15 14:02:34 n2 ntpd[9622]: peer 52.59.88.68 now invalid
Nov 15 14:06:11 n2 ntpd[39746]: adjusting local clock by 3.816125s
Nov 15 14:06:11 n2 ntpd[9622]: clock is now unsynced
This is where the first "unsynced" message appears, less than 10 minutes
after restarting ntpd. From that point on, the clock stays out of sync
and veers off.

An almost identical virtual machine, running on the same host, is
unaffected.

I'm particularly wondering what's happening here:
Code:
Nov 15 14:06:11 n2 ntpd[9622]: clock is now unsynced
Any input would be greatly appreciated.

Matthias

Last edited by ocicat; 16th November 2016 at 10:08 PM. Reason: Please use [code] & [/code] tags when posting file contents.

#2  TronDD, 15th November 2016
Spam Deminer
Join Date: Sep 2014
Posts: 307

Uh...things to look at, off the top of my head:

VMware can set time on an OpenBSD guest via a sensor. Check your VM settings to see if you have that enabled/disabled on the affected VM vs the others. Do you enable the sensor in any ntpd.conf? I think "sensor *" is in the default config.

You can also check the current status of servers and sensors and constraints with "ntpctl -sa".

#3  MatthiasKoch, 15th November 2016

The ntpd.conf files contain the directive
Code:
sensor *
but it makes no difference if it's activated or not.

I have two more or less identical OpenBSD-6.0 virtual machines with 100% identical ntpd.conf files. Currently I am watching both. One is drifting around no more than 200 millisecs (normally way below that), the other is losing sync within minutes, and hardly ever gets back into sync (probably because the max allowed deviation is exceeded).

Last edited by ocicat; 16th November 2016 at 12:00 PM. Reason: Please use [code] & [/code] tags when posting file contents.

#4  TronDD, 15th November 2016

In the VM's configuration on the host, you can enable or disable controlling the guest's clock. That's the only other thing I can think to check.

Also try finding a local timeserver (or two) and using that instead of the pool servers. The way the pool works, each system can get different servers at any time. There is no consistency there for comparison.

EDIT: Also, you said "stock ntpd from packages". Did you install NTPd from packages or do you mean the base openntpd?

#5  jggimi, 15th November 2016
More noise than signal
Join Date: May 2008
Location: USA
Posts: 8,019

Do both guests have vmt(4) devices in their dmesg? This is the VMWare Tools driver used as the timedelta sensor.

#6  MatthiasKoch, 15th November 2016

Quote:
Originally Posted by jggimi View Post
Do both guests have vmt(4) devices in their dmesg? This is the VMWare Tools driver used as the timedelta sensor.
Odd...

dmesg|grep vmt has
Code:
vmt0 at pvbus0
vmt0 at pvbus0
on the defective host,

three of the above entries on the reference host,
one entry on another, also unaffected (properly working) host.


EDIT:
Code:
sysctl | grep hw
...
hw.sensors.vmt0.timedelta0=0.042866 secs, OK, Tue Nov 15 17:12:41.440
...
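
To compare the vmt(4) timedelta reading across guests without scanning the whole sysctl output, a small filter can pull out just the number. This is only a sketch: the function name timedelta_secs is made up for illustration, and it assumes the sensor line has the same shape as the one quoted above.

```sh
# Hypothetical helper: read sysctl output on stdin and print the
# vmt0 timedelta value in seconds (the first field after the '=').
timedelta_secs() {
    awk -F'[= ]' '/^hw\.sensors\.vmt0\.timedelta0=/ { print $2 }'
}
```

Run as `sysctl hw.sensors | timedelta_secs` on each guest, it should print a single number per machine that can be compared side by side.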

Last edited by ocicat; 16th November 2016 at 06:41 PM. Reason: Please use [code] & [/code] tags when posting file contents.

#7  MatthiasKoch, 15th November 2016

Quote:
Originally Posted by TronDD View Post
In the VM's configuration on the host, you can enable or disable controlling the guest's clock. That's the only other thing I can think to check.
I'll check for that. However, the machines have been installed from templates, and should have the same virtual hardware. Anyway, worth a try.

Quote:
Originally Posted by TronDD View Post
Also try finding a local timeserver (or two) and using that instead of the pool servers. The way the pool works, each system can get different servers at any time. There is no consistency there for comparison.
That's understood. This is mainly to show that all hosts are working properly while one is not.

Quote:
Originally Posted by TronDD View Post
EDIT: Also, you said "stock ntpd from packages". Did you install NTPd from packages or do you mean the base openntpd?
It's the ntpd that comes with OpenBSD, from the packages.


EDIT: In the VM configuration I have found a "synchronize guest time with host" option. However, it is unchecked in ALL machines.

Last edited by MatthiasKoch; 15th November 2016 at 04:32 PM.

#8  jggimi, 15th November 2016

Do not count the number of vmt0 occurrences.

The dmesg buffer wraps, and if the platform does not clear memory on boot, multiple dmesg entries may be stored in the buffer. See the dmesg(8) man page, and then scan the buffers with less(1) or more(1).

It appears that your guests all have an active vmt(4) driver.
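
One way to avoid counting stale entries is to trim the buffer to the most recent boot before grepping. The following is a rough sketch, not a standard tool: the name last_boot is made up, and it assumes every boot's messages contain a banner line beginning with "OpenBSD " followed by a version number.

```sh
# Hypothetical helper: keep only the messages of the most recent boot.
# Assumes each boot has a banner line such as "OpenBSD 6.0 (GENERIC) ...".
last_boot() {
    awk '
        /^OpenBSD [0-9]/ { buf = "" }   # a new boot begins: forget older output
        { buf = buf $0 "\n" }           # collect the current line
        END { printf "%s", buf }        # print what belongs to the last boot
    '
}
```

With that in place, `dmesg | last_boot | grep vmt` should show each device once per boot instead of once per copy left in the buffer.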

#9  jggimi, 15th November 2016

Quote:
Originally Posted by MatthiasKoch View Post
It's the ntpd that comes with OpenBSD, from the packages.
The built-in ntpd(8) is not the same as the net/ntp package.

Packages are third party applications that are not included with OpenBSD, but have been ported to OpenBSD.

If you are using the built-in ntpd, the daemon is running /usr/sbin/ntpd. If you are using the third party net/ntp package, the daemon is running /usr/local/sbin/ntpd.

Which are you using?
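
The two paths above are enough to tell the daemons apart. As a small sketch (the function name classify_ntpd is made up for illustration), a path can be mapped to its origin like this:

```sh
# Hypothetical helper: map an ntpd binary path to where it comes from,
# using the two locations named in the post above.
classify_ntpd() {
    case "$1" in
        /usr/sbin/ntpd)       echo "built-in ntpd (base system)" ;;
        /usr/local/sbin/ntpd) echo "net/ntp package" ;;
        *)                    echo "unknown" ;;
    esac
}
```

Something like `classify_ntpd "$(command -v ntpd)"` shows what the shell would start, which depends on PATH. To see what is actually running, check the command column of `ps -ax` for the ntpd processes instead.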

MatthiasKoch, 15th November 2016

Quote:
Originally Posted by jggimi View Post
The built-in ntpd(8) is not the same as the net/ntp package.
If you are using the built-in ntpd, the daemon is running /usr/sbin/ntpd.
Looks like I confused the terminology here... it's the built-in ntpd, running from /usr/sbin/ntpd.

MatthiasKoch, 15th November 2016

Quote:
Originally Posted by jggimi View Post
Do not count the number of vmt0 occurrences.
It appears that your guests all have an active vmt(4) driver.
They obviously do. With
Code:
sensor *
in the configuration, it should already be in use?

Last edited by ocicat; 16th November 2016 at 12:05 PM. Reason: Please use [code] & [/code] tags when posting file contents.

MatthiasKoch, 15th November 2016

No idea if this is related... I notice that the reference machines readjust their clock frequencies from time to time, like
Code:
adjusting clock frequency by 0.061162 to -9.519647ppm
and update their ntpd.drift files. I haven't seen the defector do that yet. The driftfile is present but empty.

Last edited by ocicat; 16th November 2016 at 12:08 PM. Reason: Please use [code] & [/code] tags when posting file contents.

jggimi, 15th November 2016

Quote:
Originally Posted by MatthiasKoch View Post
Looks like I confused the terminology here... it's the built-in ntpd, running from /usr/sbin/ntpd.
Are you absolutely certain which ntp application you are using? Your subject of this thread refers to NTP 4.2.8. That is the version of the third party net/ntp. The built-in OpenNTPd daemon does not report a version number. So you may indeed be using different ntp applications on your different systems, as you are getting different results.
Quote:
Originally Posted by MatthiasKoch View Post
With

sensor *

in the configuration, it should already be in use?
To find out, use ntpctl(8) as TronDD has previously recommended.

Quote:
Originally Posted by MatthiasKoch View Post
No idea if this is related... I notice that the reference machines readjust their clock frequencies from time to time, like

adjusting clock frequency by 0.061162 to -9.519647ppm

and update their ntpd.drift files. I haven't seen the defector do that yet. The driftfile is present but empty.
Look carefully. The most likely cause is a difference between these systems.
  • Different ntp applications
  • Different ntp configurations
  • Different VMWare guest configurations
If there were no difference, you would not be getting different results.

Is it possible you have discovered a bug? Yes. If so, is it a bug in OpenNTPd? net/ntp? VMWare? vmt(4)? I couldn't even begin to guess.

MatthiasKoch, 16th November 2016

Quote:
Originally Posted by jggimi View Post
Are you absolutely certain which ntp application you are using? Your subject of this thread refers to NTP 4.2.8. That is the version of the third party net/ntp. The built-in OpenNTPd daemon does not report a version number.
I took the version number from the packages list on an OpenBSD mirror, assuming that this would be the ntpd that is installed by default. It is the built-in ntpd on all machines, then.

What confuses me most is that both the reference machine and the defective one have been created from the same template. I have already tried moving them from one physical machine to another in the cluster... no effect.

A year ago someone reported a very similar problem with FreeBSD which indicates a virtual hardware problem. However it wasn't explained what had been done to the hardware to fix it.

jggimi, 16th November 2016

If the problem moves with the guest from host to host, then there is something about that virtual machine which is different than the others. Determine exactly what is different.
  1. Compare dmesg(8) reports. Remember to remove any extraneous leading information from the dmesg buffers with an editor before comparing them. Don't use your eyes, use diff(1).
  2. Compare installed third party package lists:

    * On guest A: $ pkg_info -q > guest.A.package.list
    * On guest B: $ pkg_info -q > guest.B.package.list
    * Compare these two files with diff(1).
  3. Compare running daemons. On OpenBSD, we use rcctl(8) to enable and disable both built-in daemons and daemons from packages. This program edits /etc/rc.conf.local for us. Compare this file's contents on your two guests with diff(1).
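
The three comparisons above all come down to capturing a file on each guest and running diff(1) on the pair. A minimal sketch of that, where the helper name compare_files is made up and any non-zero exit from diff (including an error such as a missing file) is reported as "different":

```sh
# Hypothetical helper: diff two captured files and say whether they match.
# diff(1) exits 0 when the files are identical, non-zero otherwise.
compare_files() {
    if diff -u "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}
```

After copying both captures to one machine, `compare_files guest.A.package.list guest.B.package.list` prints the differing lines, if any, followed by a one-word verdict. The same call works for the trimmed dmesg output and for the two copies of /etc/rc.conf.local.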

MatthiasKoch, 16th November 2016

A short update: I have installed net/ntp from packages and found that this one, too, is unable to keep the time. Most likely this narrows it down to a hardware problem. I have removed it again and put the original ntpd back into action.

A typical log excerpt shows that after starting ntpd, it manages to set the clock once. After a short time, the clock loses sync, sometimes (for some reason) gets back into sync before losing it for good.
Code:
Nov 16 16:17:09 n2 ntpd[13697]: ntp engine ready
Nov 16 16:17:21 n2 ntpd[96214]: set local clock to Wed Nov 16 16:17:21 CET 2016 (offset 12.031074s)
Nov 16 16:17:22 n2 ntpd[13697]: constraint reply from 216.58.208.36: offset -0.700312
Nov 16 16:17:46 n2 ntpd[13697]: peer 192.168.2.10 now valid
Nov 16 16:18:35 n2 ntpd[16182]: adjusting local clock by 12.031074s
Nov 16 16:19:07 n2 ntpd[16182]: adjusting local clock by 0.841569s
Nov 16 16:22:53 n2 ntpd[13697]: clock is now synced
Nov 16 16:24:28 n2 ntpd[16182]: adjusting local clock by 1.007848s
Nov 16 16:30:48 n2 ntpd[16182]: adjusting local clock by 3.983608s
Nov 16 16:35:00 n2 ntpd[16182]: adjusting local clock by 3.734702s
Nov 16 16:35:00 n2 ntpd[13697]: clock is now unsynced
Nov 16 16:39:16 n2 ntpd[16182]: adjusting local clock by 5.437277s
Nov 16 16:39:48 n2 ntpd[16182]: adjusting local clock by 6.278187s
Nov 16 16:40:50 n2 ntpd[16182]: adjusting local clock by 8.963168s
Nov 16 16:44:30 n2 ntpd[16182]: adjusting local clock by 7.865542s
Nov 16 16:45:32 n2 ntpd[16182]: adjusting local clock by 7.565163s
After losing sync, ntpd reports the adjustment of the local clock with what would be the correct amount of time while actually not doing anything. I restart it and get the correct time back once.
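
To follow how the reported offsets develop over a longer run, the relevant lines can be filtered out of the log. This is a sketch only: the function name clock_events is made up, and it assumes syslog lines shaped like the excerpt above (on a default install they would probably be in /var/log/daemon, but that location is an assumption too).

```sh
# Hypothetical helper: read ntpd syslog lines on stdin and print one line
# per clock adjustment or sync state change, prefixed with the time of day.
clock_events() {
    awk '
        /adjusting local clock by/ {
            off = $NF
            sub(/s$/, "", off)          # strip the trailing "s" unit
            print $3, "adjust", off
        }
        /clock is now / { print $3, $NF }   # "synced" or "unsynced"
    '
}
```

Fed the excerpt above, this prints lines such as `16:30:48 adjust 3.983608` and `16:35:00 unsynced`, which makes it easier to see the offsets growing after sync is lost.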

Last edited by ocicat; 16th November 2016 at 06:36 PM. Reason: Please use [code] & [/code] tags when posting file contents.

MatthiasKoch, 17th November 2016

For a final check I have disabled ntp on both the defector and the reference machine. After 18 hrs of running without being synced, the defector is off by 19 minutes, the reference machine by 3 seconds.
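
Expressed as a rate, those two numbers are far apart. A back-of-the-envelope sketch (the helper name drift_ppm is made up; it simply divides the accumulated offset by the elapsed time, both given in seconds):

```sh
# Hypothetical helper: drift in parts per million,
# given the accumulated offset and the elapsed time in seconds.
drift_ppm() {
    awk -v off="$1" -v span="$2" 'BEGIN { printf "%.0f\n", off / span * 1e6 }'
}
```

With 19 minutes (1140 s) over 18 hours (64800 s), `drift_ppm 1140 64800` gives about 17593 ppm, or roughly 1.76 %. For the reference machine, `drift_ppm 3 64800` gives about 46 ppm. For comparison, the frequency correction logged on the reference machines earlier in the thread was around -9.5 ppm.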

MatthiasKoch, 17th November 2016

Quote:
Originally Posted by jggimi View Post
VMWare? vmt(4)?
This one could be interesting.

vmt0 is present on all of my 6.0 and 5.9 machines (6 machines altogether):

# dmesg | grep vmt
vmt0 at pvbus0


According to vmt(4),
vmt reports the guest's hostname and first non-loopback IP address to the host.

On my vSphere Web Client, I see this in the summary panel for every machine:

<machinename>
Guest OS: OpenBSD6.0
Compatibility: ESXi 5.5 and later (VM version 10)
VMware Tools: Running, version:2147483647 (Guest Managed)
DNS name: <machine's FQDN>
IP Addresses: <machine's IP>
Host: <name of physical host>

This is consistent for all machines, with the exception of the one that's drifting. In that machine's summary panel, the entry for VMware Tools says
VMware Tools: Not running, version:2147483647 (Guest Managed)

and the DNS Name and IP Addresses entries are missing.

This would indicate that VMware Tools is not running, because a) the entry on the summary panel says so and b) hostname and IP address aren't reported back.

Last edited by MatthiasKoch; 17th November 2016 at 01:42 PM. Reason: typo

jggimi, 17th November 2016

I'll ask again, then.

What can you discover is different about the virtual machine that has a malfunctioning clock?

Does diff(1) show a difference in kernel messages? In packages installed? In daemons provisioned?

If there is no apparent difference, consider comparing all of the configuration files in /etc.

MatthiasKoch, 17th November 2016

Quote:
Originally Posted by jggimi View Post
Does diff(1) show a difference in kernel messages? In packages installed? In daemons provisioned?
So far I don't think that it's a software configuration issue. I am favouring a problem with the virtual hardware. I am currently digging through the kernel messages.

# sysctl | grep hw

among other things, reveals this:
Code:
hw.model=Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
No idea if it makes a difference, but on all other machines the CPU type is reported as X5550. This is definitely a difference. The full output of the defector is

Code:
# sysctl | grep hw
hw.machine=amd64
hw.model=Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
hw.ncpu=1
hw.byteorder=1234
hw.pagesize=4096
hw.disknames=cd0:,sd0:264b5329c6cf44b1,fd0:
hw.diskcount=3
hw.sensors.acpiac0.indicator0=On (power supply)
hw.sensors.vmt0.timedelta0=-102.742591 secs, OK, Thu Nov 17 15:17:59.983
hw.cpuspeed=2666
hw.vendor=VMware, Inc.
hw.product=VMware Virtual Platform
hw.version=None
hw.serialno=VMware-42 0d 04 ba d8 72 18 e4-f8 30 99 33 56 cb 69 7d
hw.uuid=420d04ba-d872-18e4-f830-993356cb697d
hw.physmem=4278059008
hw.usermem=4278046720
hw.ncpufound=4
hw.allowpowerdown=1
The different CPU types show up in dmesg as well. Along with the difference in the CPU type, the defector shows these lines:

Code:
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
while the reference machine doesn't. The reference machine, however, has only 1 CPU, while the defector has four. The no. of CPUs shouldn't make a difference, though, because most of my BSD vms are multi-CPU machines.

By the way, I do appreciate your patience. Seriously.

EDIT: It appears that the virtual CPU is originally created as Xeon X5550, and after adding more cores it reports as X5650. Adding more cores to the reference machine changed the type to 5650 there too, but it still runs properly. The change of the CPU type appears to be unrelated, as both machines are running on 5650 now, with their behaviour unchanged.

Last edited by MatthiasKoch; 18th November 2016 at 12:26 PM. Reason: change of CPU type probably unrelated