Discussion:
lightly loaded system eats swap space
tech-lists
2018-06-17 22:19:02 UTC
Hello list,

Context (server):
freebsd-11-stable r333874, ZFS raidz1-0 (3x4TB disks), 128GB RAM,
E5-2630 @2.3GHz, generic kernel.

There's one bhyve guest on this server (4 vCPUs and 16GB RAM, also
running freebsd-11-stable).

There have been no special options for zfs configuration on the server,
apart from several datasets having the compressed property set (lz4).

The server runs nothing else really apart from sshd and it uses ntpd to
sync local time.

How come such a lightly loaded server with plenty of resources is eating
up swap? If I run two bhyve instances, i.e. two of the same size as
indicated above, so 32GB used for the bhyves, I'll get out-of-swapspace
errors in the daily logs:

+swap_pager_getswapspace(24): failed
+swap_pager_getswapspace(24): failed
+swap_pager_getswapspace(24): failed

Here's top, with one bhyve instance running:

last pid: 49494;  load averages: 0.12, 0.13, 0.88    up 29+11:36:06  22:52:45
54 processes: 1 running, 53 sleeping
CPU: 0.4% user, 0.0% nice, 0.4% system, 0.3% interrupt, 98.9% idle
Mem: 8664K Active, 52M Inact, 4797M Laundry, 116G Wired, 1391M Buf, 4123M Free
ARC: 108G Total, 1653M MFU, 105G MRU, 32K Anon, 382M Header, 632M Other
103G Compressed, 104G Uncompressed, 1.00:1 Ratio
Swap: 4096M Total, 3502M Used, 594M Free, 85% Inuse

PID   USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
49491 root 1 4 0 16444K 12024K select 9 0:12 6.49% ssh
32868 root 12 20 0 9241M 4038M kqread 2 23.2H 1.30% bhyve
49490 root 1 20 0 10812K 6192K sbwait 5 0:02 0.88% sftp

From the looks of it, a huge amount of RAM is wired. Why is that, and
how would I debug it?

A server of similar spec which is running freebsd-current with seven
bhyve instances doesn't have this issue:

last pid: 41904;  load averages: 0.26, 0.19, 0.15    up 17+01:06:11  23:14:13
27 processes: 1 running, 26 sleeping
CPU: 0.1% user, 0.0% nice, 0.3% system, 0.0% interrupt, 99.6% idle
Mem: 17G Active, 6951M Inact, 41G Laundry, 59G Wired, 1573M Buf, 1315M Free
ARC: 53G Total, 700M MFU, 52G MRU, 512K Anon, 182M Header, 958K Other
53G Compressed, 69G Uncompressed, 1.30:1 Ratio, 122M Overhead
Swap: 35G Total, 2163M Used, 33G Free, 6% Inuse

thanks,
--
J.
Michael Schuster
2018-06-18 04:47:30 UTC
I'd suggest - if you haven't done so yet - you familiarize yourself with
DTrace and write a probe that fires when swap_pager_getswapspace() fails,
to print execname and both kernel and userland stacks (or aggregations
thereof). That should give you a starting idea of what's going on.
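A minimal D sketch of such a probe (untested; it assumes the fbt provider exposes swap_pager_getswapspace() on this kernel, and it fires on every return rather than only on failures, since the failure return convention can differ between versions):

```d
/* swapfail.d - run with: dtrace -s swapfail.d
   aggregate by process name plus kernel and user stacks */
fbt::swap_pager_getswapspace:return
{
    @[execname, stack(), ustack()] = count();
}
```

The aggregation prints automatically when the script exits; adding a predicate on the return value would narrow it to failed calls once you know the failure value for your kernel.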

HTH
Michael
--
Michael Schuster
http://recursiveramblings.wordpress.com/
recursion, n: see 'recursion'
Shane Ambler
2018-06-18 06:14:30 UTC
Post by tech-lists
How come such a lightly loaded server with plenty of resources is eating
up swap? If I run two bhyve instances, i.e. two of the same size as
indicated above, so 32GB used for the bhyves, I'll get out-of-swapspace
errors in the daily logs:
+swap_pager_getswapspace(24): failed
Out-of-swap messages can be misleading: you can also get them when too
much RAM is wired, since wired memory is never swapped out. Too much
wired memory can also force everything else to be paged out.

How do you start your bhyve instances? A previous issue involved one of
the bhyve helper ports passing -S by default, which wires all guest memory.
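A quick way to check for that flag (a sketch; the sample command line below is hypothetical, not taken from the thread — on the host you would inspect the real one from `ps -axww`):

```shell
#!/bin/sh
# Sketch: detect the -S (wire all guest memory) flag in a bhyve
# command line. A hypothetical captured line stands in for live
# `ps -axww` output here.
cmdline='bhyve -c 4 -m 16G -S -A -H -P guest0'
case "$cmdline" in
    *" -S "*) status="wired" ;;
    *)        status="not wired" ;;
esac
echo "guest memory is ${status}"
```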
Post by tech-lists
Mem: 8664K Active, 52M Inact, 4797M Laundry, 116G Wired, 1391M Buf, 4123M Free
You have 116G wired out of 128G; this would be the cause of everything
else going to swap. 108G of that wired memory looks to be ARC.
Post by tech-lists
ARC: 108G Total, 1653M MFU, 105G MRU, 32K Anon, 382M Header, 632M Other
     103G Compressed, 104G Uncompressed, 1.00:1 Ratio
Swap: 4096M Total, 3502M Used, 594M Free, 85% Inuse
There is `sysctl vm.max_wired`, which limits kernel wired memory. The
ARC is also wired, but on top of that setting, so if your guests then
wire more RAM you can end up with too much wired memory.

There is a patch in review that attempts to improve ARC release when
free RAM is low. See if this helps:

https://reviews.freebsd.org/D7538

Other than that, check that guest RAM + max_wired + arc_max adds up to
less than your physical RAM, minus some headroom for the host system.
Note that max_wired is a page count, so multiply it by hw.pagesize.
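To illustrate that page-count conversion (a sketch with hypothetical values; on the host the two numbers would come from `sysctl -n hw.pagesize` and `sysctl -n vm.max_wired`):

```shell
#!/bin/sh
# Sketch: vm.max_wired is a page count; multiply by hw.pagesize
# to get bytes. Hypothetical values for illustration only.
pagesize=4096        # sysctl -n hw.pagesize
max_wired=7864320    # sysctl -n vm.max_wired (pages)
bytes=$((max_wired * pagesize))
gib=$((bytes / 1024 / 1024 / 1024))
echo "vm.max_wired limit: ${gib} GiB"
```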
--
FreeBSD - the place to B...Software Developing

Shane Ambler
tech-lists
2018-06-18 12:15:19 UTC
Post by Shane Ambler
There is a patch in review that attempts to improve arc releasing when
free ram is low. See if this helps -
https://reviews.freebsd.org/D7538
Other than that check that your guest ram + max_wired + arc_max adds up
to less than your physical ram minus some host system ram.
Thanks for this. I can't apply the diff just yet due to the machine
being in service but will let you know how I get on when I can.
--
J.
Erich Dollansky
2018-06-18 08:08:55 UTC
Hi,

On Sun, 17 Jun 2018 23:19:02 +0100
Post by tech-lists
freebsd-11-stable r333874, ZFS raidz1-0 (3x4TB disks), 128GB RAM,
Swap: 4096M Total, 3502M Used, 594M Free, 85% Inuse
this might not be related, but I noticed that your swap space is small
compared to RAM size. I noticed on a much smaller Raspberry Pi that it
runs into trouble when there is no swap, even when there is enough RAM
available. Is it easily possible for you to add some GB of swap space
and let the machine run then?
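For reference, a sketch of one way to add file-backed swap on FreeBSD (path and size are hypothetical; see fstab(5) and the Handbook's swap-space section):

```
# create the backing file once (commands shown as comments):
#   truncate -s 32G /usr/swap0
#   chmod 0600 /usr/swap0
# then enable it at boot with an /etc/fstab entry:
md99    none    swap    sw,file=/usr/swap0,late    0    0
```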

How much swap do the other machines have?

Erich
Adam
2018-06-18 12:45:31 UTC
Post by Erich Dollansky
this might not be related but I noticed that your swap space is small
compared to RAM size. Is it easily possible for you to add some GB of
swap space and let the machine run then?
How much swap do the other machines have?
Post by tech-lists
Hi,
Yes, the machine with the problem uses the default 4GB swap. That's all
the swap it has. The machine without the issue has a swapfile on an
SSD in addition to the default 4GB swap.
Device          512-blocks    Used   Avail  Capacity
/dev/ada0p3        8388608    3.3G    714M       83%

Device          512-blocks    Used   Avail  Capacity
/dev/ada0s1b       8262248    1.7G    2.2G       44%
/dev/md0          65536000    1.9G     29G        6%
Total             73798248    3.7G     32G       10%
I added the swapfile a long time ago on this machine due to the same issue.
But my problem isn't so much an out-of-swapspace problem; that is only
a symptom. My problem is "why is it swapping out at all on a 128GB
system, and why is what's swapped out not being swapped back in again".
What is the output of sysctl vm.overcommit? If this system is intended
to be a VM host, then why don't you limit ARC to something reasonable,
like: Total Mem - Projected VM Mem - Overhead = Ideal ARC.
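As a sketch of that arithmetic in /boot/loader.conf (all numbers hypothetical: 128G total, 32G projected for two guests, roughly 8G host overhead):

```
# 128G total - 32G guest RAM - 8G host overhead = 88G for ARC
vfs.zfs.arc_max="88G"
```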
--
Adam
Kevin Oberman
2018-06-18 16:08:14 UTC
Post by tech-lists
I added the swapfile a long time ago on this machine due to the same issue.
But my problem isn't so much an out of swapspace problem; all this is, is
a symptom. My problem is "why is it swapping out at all on a 128GB system
and why is what's swapped out not being swapped back in again".
Small correction. Your problem is "why is it swapping out at all on a
128GB system." Once pages are swapped out, they are never swapped back
in until/unless they are needed. There is no reason to waste time and
disk activity swapping pages back into memory unless they are required.
RAM is always more valuable than swap.

It is easy to write a process that eats up a large amount of memory,
then goes idle without freeing it. The memory will get paged out, fill
swap, and, unless the process terminates or becomes active again, will
consume a great deal of swap space "forever". Firefox is a good example
of this. I have to restart it every day or two, and occasionally it will
run out of swap, which results in a nearly deadlocked system. It can
take many minutes just to kill firefox.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: ***@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
Stefan Esser
2018-06-19 07:06:42 UTC
Post by Erich Dollansky
A very long time ago - and not on FreeBSD but maybe on a real BSD - I
worked with a system that swapped pages out just to bring them back as
one contiguous block. This made a difference in those days. I do not
know if the code made it out of the university I was working at. I just
imagine now that the code made it out and is still in use with the
opposite effect.
If this was on a VAX, then it was due to a shortcoming of the
MMU of the VAX, which used one linear array (in system virtual
memory) to hold the physical addresses of the user pages of all
programs. Each user program had 2 slices in this array (1 growing
up, 1 growing down for the stack), and whenever a program allocated
a new page, its slice needed to grow. That leads to fragmentation
(the same problem as with realloc() for an ever-growing array), and
when there was no contiguous free space in the array for a grown
slice, all processes were swapped out (resulting in this whole
page table array being cleared and thus free of fragmentation,
since swapped-out processes needed no space in this array).

This was a solution that worked without the table walk used in
today's VM systems. System pages were mapped by a linear page table
in physical memory, while user programs used the above-described
linear page table in system virtual memory.

Nothing of the above applies to any other architecture than the
VAX and thus the swap-out of all user processes serves no purpose
on any other system. It was an implementation detail of the VAX
VM code, not a BSD Unix feature.

Regards, Stefan
Erich Dollansky
2018-06-19 08:08:22 UTC
Hi,

On Tue, 19 Jun 2018 09:06:42 +0200
Post by Stefan Esser
If this was on a VAX, then it was due to a shortcoming of the
MMU of the VAX, which used one linear array (in system virtual
memory) to hold the physical addresses of the user pages of all programs.
this could have been the case as they have had many DEC systems.
Post by Stefan Esser
Nothing of the above applies to any other architecture than the
VAX and thus the swap-out of all user processes serves no purpose
on any other system. It was an implementation detail of the VAX
VM code, not a BSD Unix feature.
I know how an MMU works, but I also know how caches work. It could
still give a tiny advantage if memory is in one piece, even if that is
not a requirement for functioning. Anyway, I do not work in this area
anymore.

Erich
