DaemonForums  

  #1   (View Single Post)  
Old 17th November 2017
MatthiasKoch MatthiasKoch is offline
Real Name: Matthias Koch
Port Guard
 
Join Date: Mar 2016
Location: Germany
Posts: 37
Default 5.9: /var filling

Good afternoon all,

I'm running a virtual (ESXi 5.5) OpenBSD 5.9 logstash server with logstash-2.1.1p0v0 from packages. It only listens for data from remote machines and forwards it to an Elasticsearch cluster (no other local application is running).

The /var partition is small (1.7G) and keeps filling up for a reason I cannot determine.

Code:
[root@ymir]:/> uname -var
OpenBSD ymir 5.9 GENERIC.MP#1888 amd64

[root@ymir]:/> pkg_info logstash
Information for inst:logstash-2.1.1p0v0

[root@ymir]:/> df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
...
/dev/sd0e      1.7G    1.3G    343M    80%    /var

[root@ymir]:/> du -hs /var
10.8M   /var
This means 1.3G out of 1.7G is used, but du only finds 10.8M (!). So where is the rest?
I'm a bit stuck at the moment... any input is welcome.

Matthias
Reply With Quote
  #2   (View Single Post)  
Old 17th November 2017
ibara ibara is offline
OpenBSD language porter
 
Join Date: Jan 2014
Posts: 783
Default

Code:
$ ls -lh /var
?
Reply With Quote
  #3   (View Single Post)  
Old 20th November 2017
MatthiasKoch MatthiasKoch is offline
Real Name: Matthias Koch
Port Guard
 
Join Date: Mar 2016
Location: Germany
Posts: 37
Default

Nothing uncommon or unexpected. /var contains only directories, no files.
Code:
$ du -hs /var/*
lists the directories and their content size, which is in line with

Code:
 $ du -hs /var
10.8M     /var
/var/db being the biggest directory with 4.7M.
Code:
$ du -hs /var/*
2.0K    /var/account
2.0K    /var/audit
2.0K    /var/authpf
634K    /var/backups
1.2M    /var/cache
4.0K    /var/crash
22.0K   /var/cron
4.7M    /var/db
2.0K    /var/empty
44.0K   /var/games
450K    /var/log
14.0K   /var/lost+found
1.8M    /var/mail
18.0K   /var/nsd
2.0K    /var/quotas
116K    /var/run
554K    /var/spool
218K    /var/sysmerge
0B      /var/tmp
8.0K    /var/unbound
1.1M    /var/www
32.0K   /var/yp
Reply With Quote
  #4   (View Single Post)  
Old 20th November 2017
jggimi's Avatar
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 8,032
Default

If du(1) is not finding the root cause, I will guess there is a filesystem problem.

I recommend running fsck(8). To do this, /var must be unmounted. As it is the storage for most daemons, the easiest way to fsck /var is to boot in single-user mode (boot> -s). You may want to back up the filesystem before running fsck, as its repair procedures may remove files. See fsck_ffs(8), which fsck calls, for the repair procedures used on FFS filesystems.
Reply With Quote
  #5   (View Single Post)  
Old 21st November 2017
MatthiasKoch MatthiasKoch is offline
Real Name: Matthias Koch
Port Guard
 
Join Date: Mar 2016
Location: Germany
Posts: 37
Default

fsck_ffs -f says the filesystem is OK... rebooting now.
Reply With Quote
  #6   (View Single Post)  
Old 21st November 2017
MatthiasKoch MatthiasKoch is offline
Real Name: Matthias Koch
Port Guard
 
Join Date: Mar 2016
Location: Germany
Posts: 37
Default

Must have been a filesystem issue. Although fsck and fsck_ffs found no apparent error, after a reboot the values look reasonable:

Code:
# df -h
...
/dev/sd0e      1.7G   10.9M    1.6G     1%    /var

# du -hs /var
11.0M   /var
In other words, the problem is solved for now, but I still don't know what caused it. Quite unpleasant. Need to watch this...

Thanks so far
Matthias
Reply With Quote
  #7   (View Single Post)  
Old 21st November 2017
jggimi's Avatar
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 8,032
Default

Then no, it wasn't a filesystem problem, as fsck didn't report an issue and made no repair. Instead, it appears to have been an issue in the running system. A reboot resolved it, though it is possible a remount would have resolved it as well.

(The soft dependencies option delays some metadata writes. I thought there might have been some sort of issue with it on your system.)

Last edited by jggimi; 21st November 2017 at 11:47 AM. Reason: clarity
Reply With Quote
  #8   (View Single Post)  
Old 21st November 2017
TronDD TronDD is offline
Spam Deminer
 
Join Date: Sep 2014
Posts: 307
Default

Often when I see this, it's because some deleted files were unlinked from the directory but were still held open by a running process and therefore not actually removed from the filesystem.

It can happen if you rotate log files and delete them without restarting the application that is appending to that log.
Reply With Quote
  #9   (View Single Post)  
Old 21st November 2017
jggimi's Avatar
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 8,032
Default

Do you use the softdep mount option? If so, consider disabling it.
Reply With Quote
Old 21st November 2017
MatthiasKoch MatthiasKoch is offline
Real Name: Matthias Koch
Port Guard
 
Join Date: Mar 2016
Location: Germany
Posts: 37
Default

No, the mount options are ffs rw,nodev,nosuid 1 2.
Reply With Quote
Old 22nd November 2017
MatthiasKoch MatthiasKoch is offline
Real Name: Matthias Koch
Port Guard
 
Join Date: Mar 2016
Location: Germany
Posts: 37
Default

It's filling again. While du shows 10.8M of used space, df reports about 40M in use.
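
To watch how the difference develops over time, the two numbers can be compared in one step. This is only a minimal sketch: it assumes a POSIX shell and that `df -kP` and `du -skx` are available, and the function name `df_du_gap` is made up for illustration. It is only meaningful when the argument is a mount point such as /var, because df always reports on the whole filesystem.

```shell
#!/bin/sh
# df_du_gap MOUNTPOINT   (hypothetical helper name)
# Print the KB that df counts as used but du cannot reach by walking the tree.
# A large or steadily growing value hints at space held by something that
# has no directory entry.
df_du_gap() {
    mp=$1
    # -P forces one output line per filesystem; column 3 is "Used" in 1K blocks.
    df_used=$(df -kP "$mp" | awk 'NR == 2 { print $3 }')
    # -s prints a single total; -x keeps du on this filesystem.
    du_used=$(du -skx "$mp" 2>/dev/null | awk '{ print $1 }')
    echo $((df_used - du_used))
}

# Example: df_du_gap /var
```

Run from cron or a loop, this would show whether the gap keeps growing while logstash is up. A small constant gap is normal, since df also counts filesystem metadata.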
Reply With Quote
Old 22nd November 2017
TronDD TronDD is offline
Spam Deminer
 
Join Date: Sep 2014
Posts: 307
Default

On Linux, you can use lsof to find deleted files, but that isn't an option on OpenBSD.

Try `fstat -f /var` and see if there is any obvious large file. Unfortunately, you'll only get an inode and not a file name. But are we looking for one runaway file or are we looking for a bunch of non-obvious smaller files?

Since this is a log server, we might be able to assume a log is the problem, and you can try `fstat /var/log/*` and see if it points you to a named file.
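
If the fstat listing is long, sorting it by size makes a large file easier to spot. The sketch below assumes the column layout that OpenBSD's fstat prints (USER CMD PID FD MOUNT INUM MODE R/W SZ|DV); the function name `largest_open_files` is invented here, and the column numbers would need adjusting if your output differs.

```shell
#!/bin/sh
# largest_open_files   (hypothetical helper name)
# Read `fstat -f MOUNT` output on stdin and print "SIZE INUM PID CMD"
# for regular files only, biggest first.
largest_open_files() {
    # Column 7 is the mode string (regular files start with "-"),
    # column 9 is the size in bytes for regular files.
    awk '$7 ~ /^-/ && $9 ~ /^[0-9]+$/ { print $9, $6, $3, $2 }' |
        sort -rn
}

# Example (OpenBSD): fstat -f /var | largest_open_files | head
```

The header line and device entries are skipped because their mode column does not start with "-".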
Reply With Quote
Old 22nd November 2017
MatthiasKoch MatthiasKoch is offline
Real Name: Matthias Koch
Port Guard
 
Join Date: Mar 2016
Location: Germany
Posts: 37
Default

Quote:
Originally Posted by TronDD View Post
Try `fstat -f /var` and see if there is any obvious large file. Unfortunately, you'll only get an inode and not a file name. But are we looking for one runaway file or are we looking for a bunch of non-obvious smaller files?
At this point, no idea. Something simply fills /var, and the regular tools cannot find anything.

Quote:
Originally Posted by TronDD View Post
Since this is a log server, we might be able to assume a log is the problem, and you can try `fstat /var/log/*` and see if it points you to a named file.
It isn't actually a log server. Logstash receives data from remote machines on port 5044, processes it, and passes it on to an Elasticsearch cluster.

Anyway, it's getting interesting here:
Code:
# /etc/logstash> fstat -f /var
USER     CMD          PID   FD MOUNT        INUM MODE       R/W    SZ|DV
_logstas java       28153    5 /var        26020 -rw-r-----   w 36022383
This file is owned by the logstash process (java), and it keeps growing. Its size roughly equals the current difference between the output of du and df. For some reason I just cannot see the file, and du cannot see it either.

The file disappears when I stop logstash, and reappears when I start it, growing steadily. And it keeps growing as long as logstash runs, even when it is not receiving and processing any data (I've redirected the input to another machine).

I have set up a similar system with the same config, running OBSD 6.2 and logstash-2.4.0p1v0. Logstash creates a file too, but it does not seem to grow.

I currently think that the problem relates to logstash, but the question remains why du cannot see the file (the default behaviour of du -s is to summarize the contents of all objects and directories).
Reply With Quote
Old 22nd November 2017
TronDD TronDD is offline
Spam Deminer
 
Join Date: Sep 2014
Posts: 307
Default

Quote:
I currently think that the problem relates to logstash, but the question remains why du cannot see the file (the default behaviour of du -s is to summarize the contents of all objects and directories).
It is a problem with logstash.

The file has been unlinked from its directory, and du only walks directory entries, so it cannot see it. The file still takes up space and can still be used by logstash because logstash holds an open file descriptor.

This may be a mistake in logstash, but I have seen temp files used like this: the file is created, opened, then "deleted" (unlinked), and then used by the program. This way, if the program exits unexpectedly, the file descriptor goes away and the file is cleaned up automatically.

See the first paragraph of unlink(2)
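
The create, open, unlink pattern is easy to reproduce in a shell. This is a rough sketch in a scratch directory made by mktemp, not what logstash actually does internally; the file name `scratch` and the use of file descriptor 3 are arbitrary choices.

```shell
#!/bin/sh
# Demonstrate a file that is unlinked while a descriptor is still open.
dir=$(mktemp -d)
exec 3>"$dir/scratch"        # open fd 3 for writing (creates the file)
rm "$dir/scratch"            # unlink: the name is gone, the inode is not
# Write 1 MB through the still-open descriptor.
dd if=/dev/zero bs=1024 count=1024 >&3 2>/dev/null
# Count the directory entries that ls (and therefore du) can still find.
visible=$(ls -A "$dir" | wc -l | tr -d ' ')
echo "directory entries: $visible"   # prints "directory entries: 0"
exec 3>&-                    # closing the last descriptor frees the blocks
rmdir "$dir"
```

Between the `rm` and the final `exec 3>&-`, the megabyte is allocated on the filesystem but has no name, which is the state the java process in this thread keeps /var in for as long as it runs.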
Reply With Quote
Old 22nd November 2017
jggimi's Avatar
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 8,032
Default

The lsof utility was removed about a year ago, for 6.1.
Code:
Move lsof to the Attic.

Requires kmem access, is so coupled to the system internals that it
needs a /usr/src/sys checkout, and breaks regularly due to changes in
base.  People used to it should be told to use fstat(1) & friends
instead.

ok landry@ sthen@ dcoppa@
The discussion about it was archived here: https://marc.info/?t=148059362500004&r=1&w=2
Reply With Quote
Old 22nd November 2017
jggimi's Avatar
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 8,032
Default

You have an inode number, so there is a file assigned to the space being consumed. I will guess that fsdb(8) could tell you more, if ls(1) -i isn't sufficiently forthcoming.
Reply With Quote
Old 23rd November 2017
J65nko J65nko is offline
Administrator
 
Join Date: May 2008
Location: Budel - the Netherlands
Posts: 4,164
Default

Just wild guessing how logstash works....

If it collects data and writes everything to a named pipe, then it needs a mechanism, i.e. an internet socket, to read from that named pipe and send the data out to the collecting server. A named pipe that is only written to without being read from just accumulates data in the named pipe file, and it will become bigger and bigger.

So assuming a named pipe is being used, you need to verify whether data is read from that named pipe. Only when this named pipe file has a reader it will not grow bigger and bigger.

With tools like netstat or tcpdump you can verify whether there is communication or a socket between the logstash sender and the Elasticsearch cluster.
Maybe pf is blocking?


With the inode number you should be able to find the file or named pipe. Something like # ls -liR /var | grep 26020
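
A slightly more precise variant of the same search, since grep on the ls listing can also match sizes or dates that happen to contain the same digits. It is a sketch that relies on the `-xdev` and `-inum` primaries of find(1); the wrapper name `find_by_inum` is made up.

```shell
#!/bin/sh
# find_by_inum DIR INUM   (hypothetical helper name)
# Print every path under DIR (same filesystem only) whose inode number is INUM.
# No output means no directory entry refers to that inode any more.
find_by_inum() {
    find "$1" -xdev -inum "$2" 2>/dev/null
}

# Example: find_by_inum /var 26020
```

If this prints nothing for an inode that fstat still lists as open, the file has been unlinked and exists only through the open descriptor.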
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump
Reply With Quote
Old 23rd November 2017
MatthiasKoch MatthiasKoch is offline
Real Name: Matthias Koch
Port Guard
 
Join Date: Mar 2016
Location: Germany
Posts: 37
Default

Quote:
Originally Posted by J65nko View Post
you can verify whether there is communication or a socket between the logstash sender and the Elasticsearch cluster.
This has definitely been working, as the machine in question is the only node that collects data and forwards it to ES.
Reply With Quote
Old 23rd November 2017
jggimi's Avatar
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 8,032
Default

Quote:
Originally Posted by J65nko View Post
...Only when this named pipe file has a reader it will not grow bigger and bigger.
This has not been my experience when using them. Instead, I have seen that writes will not complete when the reader is disconnected or reads are pending.
Reply With Quote
Old 23rd November 2017
e1-531g e1-531g is offline
ISO Quartermaster
 
Join Date: Mar 2014
Posts: 636
Default

Quote:
Originally Posted by jggimi View Post
This has not been my experience when using them. Instead, I have seen that writes will not complete when the reader is disconnected or reads are pending.
This behavior would be consistent with what I read about named pipes in GNU/Linux in a general book ("Zrozumieć programowanie" by Gynvael Coldwind) about OSes from a programmer's perspective. There is a buffer in the kernel. When the reader is not reading and the writer keeps writing, the buffer fills up and the write does not complete.
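
The blocking behaviour can be tried out with mkfifo. A small sketch, assuming a POSIX shell; the one-second sleep is just a crude way to give the background writer time to block, and the path names are arbitrary.

```shell
#!/bin/sh
# Show that a writer to a FIFO blocks until a reader appears,
# and that the FIFO itself does not grow on disk.
dir=$(mktemp -d)
mkfifo "$dir/pipe"
( echo "hello" > "$dir/pipe" ) &     # blocks in open(2): no reader yet
writer=$!
sleep 1
if kill -0 "$writer" 2>/dev/null; then
    writer_state=blocked             # still waiting for a reader
else
    writer_state=finished
fi
# Size column of ls -l for the FIFO; expected to be 0.
fifo_size=$(ls -l "$dir/pipe" | awk '{ print $5 }')
msg=$(cat "$dir/pipe")               # a reader arrives; the write completes
wait "$writer"
echo "writer was $writer_state, fifo size $fifo_size, read: $msg"
rm -r "$dir"
```

Here the writer does not even get past opening the FIFO until `cat` opens the other end, so nothing accumulates in the filesystem in the meantime.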
__________________
Signature: Furthermore, I consider that systemd must be destroyed.
Based on Latin oratorical phrase

Last edited by e1-531g; 23rd November 2017 at 12:18 PM.
Reply With Quote

Tags
df, du, logstash
