Running Out of Swap on Linux
Over the years I've run into all manner of problems where systems have run out of memory and swap space. With the latest versions of the Linux kernel there are some new tools that allow you to control what the system does when this happens.
A recent discussion on the Exim (mail transport agent) mailing list got me looking around a bit, as I've had a problem with my workstation running out of swap/RAM (and it has lots of both) when I keep lots of Firefox windows open. One of the comments led me to search for "overcommit" on Google, and that led me to an article in Red Hat's magazine, and from there things got interesting.
I grew up on Unix and Xenix from SCO and Radio Shack (TRS-80 Model 16) when the disk allocated for swap really was swap. At that time you needed to ensure you had more swap than physical RAM because there were times when memory became so fragmented that the system literally pushed it ALL out to swap and then pulled it back in, reallocating it into larger chunks as it went. Whole programs were swapped out when RAM became full - and since RAM prices were in the thousands per megabyte, there rarely was more than "just enough".
Today, swap isn't really swap - it is paging space. Instead of swapping out whole programs when RAM is used up, the system pushes out "least recently used" pages of RAM from any/all processes (except pages locked in RAM), using the memory management unit in the CPU and a bunch of controls in the kernel. In Linux, there have been a number of such "knobs and switches" in the /proc/sys/vm area. This area has grown over the versions, and as of the release of man-pages-3.04 in July 2008, there are a number of new controls documented in the /proc area (man proc) that make life even better.
Of course the best cure for heavy use of swap is to have lots of RAM - and with today's RAM prices in the dollars per Gigabyte range, it is usually only the motherboard's capacity that really limits what your system has or should have. On the other hand, there is always the possibility of something "running away" and eating up RAM and swap - so what do you allow your system to do when this happens?
But no matter how much RAM and swap you have, there will come a time when something starts eating it, and pretty soon the "OOM-Killer" (Out Of Memory-killer) will kick in and start killing what look like random processes. Oh, its documentation says it will try to find and kill things that look like they are the culprit - big RAM users and "runaway" processes - but many times it will kill something vital and then the only thing you can do is reboot.
In fact, the typical system allows for what is called "overcommit" because of a neat feature of the way Linux (and other Unix-like OSs) starts a sub-process - with the use of the "fork()" call. In Linux, fork() makes a copy of a process's memory and starts the copy running - but the feature that makes it really neat is what is called "copy on write": only if the copy (or the originating) process actually writes to a piece of RAM does the actual copy take place, requiring a new piece of RAM; otherwise both the old and new process simply read from the original page, saving both the cost of the copy and the extra RAM. So it is possible for literally thousands of fork()s to take place with only a small amount of real RAM needed for scratch-pad and stack space for each - the rest is just pointers to the same piece of RAM for all of them.
The system keeps track of how much RAM has been saved in this way - the overcommit amount. If all these processes suddenly need to write to all their RAM, the system will quickly run out of real RAM/swap and things will grind to a halt.
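If you want to see the kernel's own bookkeeping, /proc/meminfo has two lines worth watching. This is only a small sketch, assuming a Linux /proc filesystem; the function name show_commit is something I made up for illustration. Committed_AS is the total memory the kernel has promised to processes, and CommitLimit is the ceiling that applies in strict mode.

```shell
#!/bin/sh
# show_commit [FILE]: print the kernel's commit accounting lines.
# FILE defaults to /proc/meminfo; the function name is illustrative only.
show_commit() {
    # CommitLimit  = ceiling enforced when overcommit_memory is 2
    # Committed_AS = memory currently promised to all processes
    grep -E '^(CommitLimit|Committed_AS):' "${1:-/proc/meminfo}"
}

# Only run against the live system if /proc/meminfo is there to read
if [ -r /proc/meminfo ]; then
    show_commit
fi
```

When Committed_AS is larger than your RAM plus swap, the system has handed out more memory than it could back if everyone wrote to all of it at once.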
If your machine either spawns a RAM eater or is suddenly hit by all its processes needing to actually write to RAM, it will need a reboot. If it is on your desktop, this usually isn't a problem. But what if it is in a cage somewhere and the company scrimped on the remote administration hardware, so you have to go on site or pay someone at the co-lo to push the big red switch?
Is there a way to stop the OOM-killer? How about not letting that one final process even start - or telling whatever process that wants the last piece of RAM that you are simply out of memory? That's the function of a couple of switches added to the 2.6 kernels and a couple of new ones added since 2.6.24.
/proc/sys/vm/overcommit_memory was given a third option - #2 (programmers of course always start counting at zero) - such that, when set, the virtual memory system will always check to ensure there is sufficient RAM/swap available to allocate the next piece requested. It works along with /proc/sys/vm/overcommit_ratio (defaults to 50), which determines just how much overcommit is allowed.
From the proc(5) man page on /proc/sys/vm/overcommit_memory:
"... In mode 2 (available since Linux 2.6), the total virtual address space on
the system is limited to (SS + RAM*(r/100)), where SS is the size of
the swap space, and RAM is the size of the physical memory, and r is
the contents of the file /proc/sys/vm/overcommit_ratio."
This means that if you set overcommit_memory to 2 and set overcommit_ratio to 0 you will NEVER overcommit (by the formula, the limit is then just the size of your swap) - the last process that tries to fork and potentially use the last RAM/swap (remember, it will only actually use it if it writes to all the RAM it copies) will fail with an out of memory error.
In reality the chance of every fork() using all the RAM it is allocated is exceedingly small - so let's set the ratio to something more reasonable; in fact the default of 50 is likely just fine. This means that if (taking the example from the Red Hat Magazine) your system has 1 Gig of RAM and 1 Gig of swap and the ratio is set to 50(%), then by the formula above it appears to your system as if you have 1.5 Gigs of memory available (1 Gig of swap plus half of the 1 Gig of RAM) before the system will stop handing out more.
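The arithmetic is easy to check by plugging numbers into the formula quoted from the man page. Here is a rough sketch; the function name commit_limit and the choice of megabytes as the unit are mine, not anything the kernel provides.

```shell
#!/bin/sh
# commit_limit SWAP_MB RAM_MB RATIO
# Applies the man page formula SS + RAM*(r/100) using integer arithmetic.
# The function name and the megabyte unit are illustrative choices.
commit_limit() {
    echo $(( $1 + $2 * $3 / 100 ))
}

commit_limit 1024 1024 50    # 1 Gig swap, 1 Gig RAM, ratio 50 -> prints 1536
commit_limit 1024 1024 0     # ratio 0: the limit is the swap alone -> prints 1024
```

On a live system you don't need to do the sum yourself, since the kernel reports the result as CommitLimit in /proc/meminfo.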
OK - so what exactly happens when we really do run out? This is, as of the 2.6.24 kernel, the area of oom_kill_allocating_task - which, when set to something other than 0, does exactly what its name suggests: it kills the task that asked for the memory that wasn't there, instead of going through the task list looking for a victim.
An even newer switch (2.6.25 kernels and later) allows you to dump a list of processes and their information any time an OOM kill occurs. This one is called oom_dump_tasks, and again it is activated if its content is non-zero.
And since kernel 2.6.18 we have had the trump flag that overrides all the others: panic_on_oom - which does exactly that and halts the system. It has three settings: 0 (the default) lets the OOM-killer function normally as set by the other flags; 1 panics on a system-wide OOM, but in the multi-CPU case of a local out of memory (a process whose memory is restricted by cpusets or a memory policy) it lets the OOM-killer deal with it instead; and 2 simply forces any OOM to panic and halt the system.
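Before changing anything it is worth seeing what your kernel currently has set. This is just a sketch that reads the files named in this article; the function name show_oom_knobs is made up for illustration, and on older kernels some of the files simply won't exist.

```shell
#!/bin/sh
# show_oom_knobs [DIR]: print each knob discussed here with its value.
# DIR defaults to /proc/sys/vm; the function name is illustrative.
show_oom_knobs() {
    dir="${1:-/proc/sys/vm}"
    for knob in overcommit_memory overcommit_ratio \
                oom_kill_allocating_task oom_dump_tasks panic_on_oom; do
        if [ -r "$dir/$knob" ]; then
            printf '%s = %s\n' "$knob" "$(cat "$dir/$knob")"
        else
            # Older kernels lack the newer files
            printf '%s = (not available)\n' "$knob"
        fi
    done
}

show_oom_knobs
```

Reading these files needs no special privileges; only writing to them requires root.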
As with any of the knobs and switches in /proc, you can set these parameters (as root) simply by pushing a new value into them with echo or cat:
echo "2" > /proc/sys/vm/overcommit_memory
and you can use /etc/sysctl.conf to ensure that the setting survives a reboot:
# strict memory overcommit accounting (mode 2)
vm.overcommit_memory=2
I've set my workstation to allow it to overcommit only 50%, and it will kill any offending process as soon as it tries to go past that. Let's see if that helps my problem with X-windows eating all of RAM when I leave too many Firefox browser windows open.
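For the record, here is roughly what that setup looks like written out as sysctl settings. Treat it as a sketch of my reading of the paragraph above: the three values are the ones discussed in this article, but the helper name write_overcommit_conf is made up, and the script only writes to a scratch file so you can preview the result before touching /etc/sysctl.conf.

```shell
#!/bin/sh
# write_overcommit_conf FILE: append the three settings to FILE.
# The function name is illustrative; point it at /etc/sysctl.conf as root
# for real use, or at a scratch file to preview the result.
write_overcommit_conf() {
    cat >> "$1" <<'EOF'
# strict commit accounting: refuse allocations past the commit limit
vm.overcommit_memory = 2
# count 50% of physical RAM, plus all of swap, toward that limit
vm.overcommit_ratio = 50
# on OOM, kill the task that asked for the memory
vm.oom_kill_allocating_task = 1
EOF
}

preview=$(mktemp)
write_overcommit_conf "$preview"
cat "$preview"
rm -f "$preview"
# As root, after appending to /etc/sysctl.conf, load it with:  sysctl -p
```

Running sysctl -p applies the file right away, so you don't have to reboot to find out whether the settings suit your workload.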


