Discussion:
Is it normal that a user can take down the whole system by using too much memory?
Brennan Vincent
2018-06-02 22:16:10 UTC
The attached program `eatmem.c` is a simple example to waste N gigs of memory as quickly as possible.
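For reference, here is a minimal sketch of the idea; the real attachment may differ in the details, but it just allocates the requested number of gigabytes and touches every page:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Allocate N gigabytes and dirty every page, so the kernel has to
       back the memory with real RAM or swap, not lazy zero-fill pages. */
    int
    main(int argc, char *argv[])
    {
        const size_t gig = 1024UL * 1024 * 1024;
        size_t i, n;

        if (argc != 2) {
            fprintf(stderr, "usage: %s ngigs\n", argv[0]);
            return 1;
        }
        n = (size_t)atoi(argv[1]);
        for (i = 0; i < n; i++) {
            char *p = malloc(gig);
            if (p == NULL) {
                perror("malloc");
                return 1;
            }
            memset(p, 1, gig); /* touch the pages */
        }
        pause(); /* hold on to the memory */
        return 0;
    }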

When I run something like `eatmem 32` (on a system with less than 32GB of RAM), about half the time everything works fine: the system quickly runs out of RAM and swap, the kernel kills `eatmem`, and everything recovers. However, the other half of the time, the system becomes completely unusable: my ssh session is killed, important processes like `init` and `getty` are killed, and it's impossible to even log into the system (the local terminal is unresponsive, and I can't ssh in because sshd is killed immediately whenever it tries to run). The only way to recover is by rebooting.

Is this expected behavior?

My system details are as follows:
FreeBSD 12-CURRENT x86_64 guest on VMware Fusion.
RAM: 8 GB
Swap: 1 GB
Host: MacBook Pro running macOS.
John Howie
2018-06-02 22:40:52 UTC
Hi Brennan,

Do ‘man -k limit’ for details of the mechanisms available for restricting resource consumption. In particular, check out limits(1) and rctl(8).
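The same mechanism is also reachable from C via setrlimit(2), which, as far as I know, is what limits(1) builds on. A minimal sketch (the 1 GB cap is an arbitrary example, and note RLIMIT_AS caps address space, not resident memory):

    #include <sys/time.h>
    #include <sys/resource.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        /* Cap this process's address space at 1 GB; allocations past
           the cap fail with ENOMEM instead of dragging the box into
           heavy swapping. */
        struct rlimit rl = { .rlim_cur = 1UL << 30, .rlim_max = 1UL << 30 };

        if (setrlimit(RLIMIT_AS, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }
        if (malloc((size_t)2 << 30) == NULL) /* 2 GB, should fail now */
            perror("malloc");
        return 0;
    }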

Variations of this problem have been around forever. An oldie but goldie is:

main () { while (1) { fork (); } }

I cannot say why you are getting the results you see on your specific system. I would check limits to see where they are set, and tweak them.

Cheers,

John




Sent from my iPhone
Post by Brennan Vincent
The attached program `eatmem.c` is a simple example to waste N gigs of memory as quickly as possible.
When I run something like `eatmem 32` (on a system with less than 32GB of RAM), about half the time everything works fine: the system quickly runs out of RAM and swap, the kernel kills `eatmem`, and everything recovers. However, the other half of the time, the system becomes completely unusable: my ssh session is killed, important processes like `init` and `getty` are killed, and it's impossible to even log into the system (the local terminal is unresponsive, and I can't ssh in because sshd is killed immediately whenever it tries to run). The only way to recover is by rebooting.
Is this expected behavior?
FreeBSD 12-CURRENT x86_64 guest on VMware Fusion.
RAM: 8 GB
Swap: 1 GB
Host: MacBook Pro running macOS.
<eatmem.c>
Brennan Vincent
2018-06-02 23:25:31 UTC
Thanks John for the response -- this should help me solve my practical needs.

I'm also curious, however, to learn more from an OS design perspective. Why isn't it possible for the kernel to realize it should kill `eatmem` rather than make the system unusable?

Is this a hard problem in general, or just a missing feature in FreeBSD specifically?
Post by John Howie
Hi Brennan,
Do ‘man -k limit’ for details of means to restrict resource consumption.
In particular, check out limits(1) and rctl(8).
main () { while (1) { fork (); } }
I cannot say why you are getting the results you see on your specific
system. I would check limits to see where they are set, and tweak them.
Cheers,
John
Sent from my iPhone
Post by Brennan Vincent
The attached program `eatmem.c` is a simple example to waste N gigs of memory as quickly as possible.
When I run something like `eatmem 32` (on a system with less than 32GB of RAM), about half the time everything works fine: the system quickly runs out of RAM and swap, the kernel kills `eatmem`, and everything recovers. However, the other half of the time, the system becomes completely unusable: my ssh session is killed, important processes like `init` and `getty` are killed, and it's impossible to even log into the system (the local terminal is unresponsive, and I can't ssh in because sshd is killed immediately whenever it tries to run). The only way to recover is by rebooting.
Is this expected behavior?
FreeBSD 12-CURRENT x86_64 guest on VMware Fusion.
RAM: 8 GB
Swap: 1 GB
Host: MacBook Pro running macOS.
<eatmem.c>
RW via freebsd-questions
2018-06-02 23:46:34 UTC
On Sat, 02 Jun 2018 19:25:31 -0400
Post by Brennan Vincent
I'm also curious, however, to learn more from an OS design
perspective. Why isn't it possible for the kernel to realize it
should kill `eatmem` rather than make the system unusable?
I did something similar a few years ago, and the process was reliably
killed.
RW via freebsd-questions
2018-06-02 23:53:11 UTC
On Sun, 3 Jun 2018 00:46:34 +0100
Post by RW via freebsd-questions
On Sat, 02 Jun 2018 19:25:31 -0400
Post by Brennan Vincent
I'm also curious, however, to learn more from an OS design
perspective. Why isn't it possible for the kernel to realize it
should kill `eatmem` rather than make the system unusable?
I did something similar a few years ago, and the process was reliably
killed.
I should have added that that was without virtualization or ZFS.
Valeri Galtsev
2018-06-03 02:26:39 UTC
Post by RW via freebsd-questions
On Sat, 02 Jun 2018 19:25:31 -0400
Post by Brennan Vincent
I'm also curious, however, to learn more from an OS design
perspective. Why isn't it possible for the kernel to realize it
should kill `eatmem` rather than make the system unusable?
I did something similar a few years ago, and the process was reliably
killed.
Indeed, you sound as if you are saying: no, a regular user bringing the
system to its knees just by grabbing all memory is not normal. And I
would second that. I haven't run such a test against FreeBSD (how
thoughtless of me), but in the past, when I was mostly a Linux guy, I
did that to Linux systems routinely. Red Hat (and clones), as well as
the latest vanilla, unpatched Linux kernel built with the default
configuration, consistently invoked the OOM (Out Of Memory) killer,
which consistently killed the offender, even in the presence of other
processes holding large chunks of memory. Once I tried SUSE (version
7), and its stock kernel was crashed by the little memory-leak program
I used for these tests. I gave up on SUSE once and for all.

So, killing the offender is a must, IMHO. However, on some systems the
machine may be on its knees for quite a while before the offender gets
killed, and how long that lasts depends mostly on how the sysadmin set
things up. Say there is a very large swap space: until it is exhausted,
the situation is not considered critical enough to kill any process.
And a machine that is swapping heavily will effectively be thousands
of times slower.

Valeri
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
Polytropon
2018-06-03 19:09:15 UTC
Post by Brennan Vincent
I'm also curious, however, to learn more from an OS design perspective.
Why isn't it possible for the kernel to realize it should kill `eatmem`
rather than make the system unusable?
Is this a hard problem in general, or just a missing feature in FreeBSD specifically?
It's more about what the kernel's basic function is: to control
the interactions of hardware and software. In the case of "eatmem"
or any other program that allocates memory, there _might_ be a
totally valid and legitimate (!) reason for the program
to do so. If physical RAM has been consumed, swap will be added.
This is still within normal margins of system use. Just consider
today's memory-intensive user applications like web browsers.
At which threshold should the kernel say: "Okay, no more RAM
for you, please go kill yourself, or I will do it in 60 seconds.",
which is (more or less) what you seem to expect?

Resource control mechanisms are present in FreeBSD, but it takes
a system administrator with sufficient knowledge and experience
to decide about the values that will apply - how much RAM can be
consumed, how many child processes can be spawned, how many files
can be held open... and so on. There just is no system default
that can be preconfigured, calculated, assumed, guessed, or
arbitrarily chosen. No "one size fits all" solution is possible.

Just take the following example: a program does some database
lookup, stores lots of data in memory, then calculates something
from it, builds another set of memory-intensive structures, and
then starts heavy disk I/O to write lots of result files. This
might require more RAM than the system has. Let's say the system
has 64 GB. The program now needs 80 GB, so swap will be used.
After a period of heavy system activity, memory will be released
again. Load values go back to normal.

For this example, try to imagine at what stage the kernel should
interfere. And _should_ it interfere? From the practical point
of view, everything worked as it was intended and expected.
--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
RW via freebsd-questions
2018-06-03 22:51:02 UTC
On Sun, 3 Jun 2018 21:09:15 +0200
Post by Polytropon
Post by Brennan Vincent
I'm also curious, however, to learn more from an OS design
perspective. Why isn't it possible for the kernel to realize it
should kill `eatmem` rather than make the system unusable?
Is this a hard problem in general, or just a missing feature in FreeBSD specifically?
Resource control mechanisms are present in FreeBSD, but it takes
a system administrator with sufficient knowledge and experience
to decide about the values that will apply - how much RAM can be
consumed, how many child processes can be spawned, how many files
can be held open... and so on. There just is no system default
that can be preconfigured, calculated, assumed, guessed, or
arbitrarily chosen. No "one size fits all" solution is possible.
Configuration isn't relevant to the question as it's just a way to avoid
reproducing the test conditions.
Post by Polytropon
Just take the following example:...
For this example, try to imagine at what stage the kernel should
interfere. And _should_ it interfere? From the practical point
of view, everything worked as it was intended and expected.
The OP's program is attempting to dirty many more pages than will fit
in swap + RAM. Eventually, as a last resort, the kernel has no
alternative but to start killing processes. The last time I looked, it
killed the largest process first, which has the best chance of killing
a runaway process first.

When I ran a similar test a few years ago my program would get killed
as expected, large amounts of swap and memory would be freed and the
system would return quickly to normal. The issue here is that the OP
is only seeing this behaviour half the time.
Steve O'Hara-Smith
2018-06-04 09:43:34 UTC
On Sat, 02 Jun 2018 19:25:31 -0400
Post by Brennan Vincent
Thanks John for the response -- this should help me solve my practical needs.
I'm also curious, however, to learn more from an OS design perspective.
Best way to do that is look at the code.
Post by Brennan Vincent
Why isn't it possible for the kernel to realize it should kill `eatmem`
rather than make the system unusable?
The code in vm_pageout.c that handles out of swap conditions tries
to kill the biggest process - by that it means the one estimated to release
the most memory. To that end it skips processes in various states - killed,
protected, system, non-running and then picks the biggest of the remaining
processes to kill.
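
In outline, the selection is something like the following sketch. This is a paraphrase from memory with made-up field and helper names, not the literal vm_pageout.c code:

    #include <stddef.h>

    struct proc {
        int killed, is_protected, is_system, runnable;
        size_t resident_pages; /* stand-in for the release estimate */
        struct proc *next;
    };

    /* Walk the process list, skip the unkillable, return the biggest. */
    static struct proc *
    pick_oom_victim(struct proc *allprocs)
    {
        struct proc *victim = NULL;
        size_t biggest = 0;
        struct proc *p;

        for (p = allprocs; p != NULL; p = p->next) {
            /* Skip killed, protected, system and non-running processes. */
            if (p->killed || p->is_protected || p->is_system || !p->runnable)
                continue;
            if (p->resident_pages > biggest) {
                biggest = p->resident_pages;
                victim = p;
            }
        }
        return victim; /* the biggest remaining process gets SIGKILL */
    }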

One thing that may cause the random chaos you see is that killing a
process takes time, especially if that process has swapped-out pages.
If something else calls for memory before it has finished dying, then
the next biggest process will get killed, even though the process
responsible has already been killed (after all, it might be locked up
and failing to die).
--
Steve O'Hara-Smith <***@sohara.org>
Frank Leonhardt
2018-06-08 14:18:29 UTC
Post by Steve O'Hara-Smith
On Sat, 02 Jun 2018 19:25:31 -0400
Post by Brennan Vincent
Thanks John for the response -- this should help me solve my practical needs.
I'm also curious, however, to learn more from an OS design
perspective.
Best way to do that is look at the code.
Post by Brennan Vincent
Why isn't it possible for the kernel to realize it should kill `eatmem`
rather than make the system unusable?
The code in vm_pageout.c that handles out of swap conditions tries
to kill the biggest process - by that it means the one estimated to release
the most memory. To that end it skips processes in various states - killed,
protected, system, non-running and then picks the biggest of the remaining
processes to kill.
One thing that may cause the random chaos you see is that killing a
process takes time, especially if that process has swapped-out pages.
If something else calls for memory before it has finished dying, then
the next biggest process will get killed, even though the process
responsible has already been killed (after all, it might be locked up
and failing to die).
You might want to look at this:

http://blog.frankleonhardt.com/2011/large-swap-files-on-freebsd-die-with-mystery-killed-howto-add-lots-of-swap-space/

Regards, Frank.
