linux/Documentation
David Rientjes a63d83f427 oom: badness heuristic rewrite
This a complete rewrite of the oom killer's badness() heuristic which is
used to determine which task to kill in oom conditions.  The goal is to
make it as simple and predictable as possible so the results are better
understood and we end up killing the task which will lead to the most
memory freeing while still respecting the fine-tuning from userspace.

Instead of basing the heuristic on mm->total_vm for each task, the task's
rss and swap space is used instead.  This is a better indication of the
amount of memory that will be freeable if the oom killed task is chosen
and subsequently exits.  This helps specifically in cases where KDE or
GNOME is chosen for oom kill on desktop systems instead of a memory
hogging task.

The baseline for the heuristic is a proportion of memory that each task is
currently using in memory plus swap compared to the amount of "allowable"
memory.  "Allowable," in this sense, means the system-wide resources for
unconstrained oom conditions, the set of mempolicy nodes, the mems
attached to current's cpuset, or a memory controller's limit.  The
proportion is given on a scale of 0 (never kill) to 1000 (always kill),
roughly meaning that if a task has a badness() score of 500 that the task
consumes approximately 50% of allowable memory resident in RAM or in swap
space.

The proportion is always relative to the amount of "allowable" memory and
not the total amount of RAM systemwide so that mempolicies and cpusets may
operate in isolation; they shall not need to know the true size of the
machine on which they are running if they are bound to a specific set of
nodes or mems, respectively.

Root tasks are given 3% extra memory just like __vm_enough_memory()
provides in LSMs.  In the event of two tasks consuming similar amounts of
memory, it is generally better to save root's task.

Because of the change in the badness() heuristic's baseline, it is also
necessary to introduce a new user interface to tune it.  It's not possible
to redefine the meaning of /proc/pid/oom_adj with a new scale since the
ABI cannot be changed for backward compatability.  Instead, a new tunable,
/proc/pid/oom_score_adj, is added that ranges from -1000 to +1000.  It may
be used to polarize the heuristic such that certain tasks are never
considered for oom kill while others may always be considered.  The value
is added directly into the badness() score so a value of -500, for
example, means to discount 50% of its memory consumption in comparison to
other tasks either on the system, bound to the mempolicy, in the cpuset,
or sharing the same memory controller.

/proc/pid/oom_adj is changed so that its meaning is rescaled into the
units used by /proc/pid/oom_score_adj, and vice versa.  Changing one of
these per-task tunables will rescale the value of the other to an
equivalent meaning.  Although /proc/pid/oom_adj was originally defined as
a bitshift on the badness score, it now shares the same linear growth as
/proc/pid/oom_score_adj but with different granularity.  This is required
so the ABI is not broken with userspace applications and allows oom_adj to
be deprecated for future removal.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:45:02 -07:00
..
ABI Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 2010-08-06 11:44:36 -07:00
accounting
acpi ACPI, APEI, EINJ injection parameters support 2010-05-19 22:42:08 -04:00
aoe Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
arm Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
auxdisplay
blackfin Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
block nick piggin: change email address 2010-08-05 08:45:20 -07:00
blockdev Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
cdrom Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
cgroups Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
connector Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
console doc: fix console doc typo 2010-02-24 13:51:32 +01:00
cpu-freq
cpuidle
cris
crypto
development-process Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
device-mapper Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
DocBook Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 2010-08-06 11:36:30 -07:00
driver-model
dvb Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
early-userspace
fault-injection lkdtm: add debugfs access and loosen KPROBE ties 2010-03-06 11:26:32 -08:00
fb Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
filesystems oom: badness heuristic rewrite 2010-08-09 20:45:02 -07:00
firmware_class firmware: Update hotplug script 2010-08-05 13:53:34 -07:00
frv
hwmon Merge branch 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-08-06 10:02:58 -07:00
i2c Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
i2o
ia64 Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
ide
infiniband Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
input Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
ioctl Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 2010-08-07 17:09:24 -07:00
isdn Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
ja_JP Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
kbuild Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2010-08-05 14:12:07 -07:00
kdump
ko_KR Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
kvm KVM: Document KVM_GET_SUPPORTED_CPUID2 ioctl 2010-08-02 06:40:50 +03:00
laptops Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
lguest Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
m68k
make
mips
misc-devices Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
mn10300
mtd Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
namespaces
netlabel Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
networking DNS: Fixes for the DNS query module 2010-08-06 02:26:27 +00:00
parisc
PCI Documentation: pci.txt: fix typo 2010-07-11 22:17:45 +02:00
pcmcia pcmcia: do not use io_req_t when calling pcmcia_request_io() 2010-08-03 09:04:11 +02:00
power Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
powerpc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2010-08-05 09:03:46 -07:00
pps
prctl
RCU Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
s390 Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
scheduler sched: Remove USER_SCHED from documentation 2010-04-02 20:12:01 +02:00
scsi Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
serial Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
sh
sound Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 2010-08-07 17:07:31 -07:00
sparc
spi spi/ep93xx: implemented driver for Cirrus EP93xx SPI controller 2010-05-25 00:23:16 -06:00
sysctl oom: enable oom tasklist dump by default 2010-08-09 20:44:56 -07:00
telephony Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
thermal
timers Documentation: Add timers/timers-howto.txt 2010-08-04 11:00:45 +02:00
trace vmscan: tracing: add a postprocessing script for reclaim-related ftrace events 2010-08-09 20:45:00 -07:00
uml Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
usb Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
video4linux Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-08-04 15:31:02 -07:00
vm Documentation/vm: fix spelling in page-types.c 2010-08-05 13:21:23 -07:00
w1 Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
watchdog watchdog: docs: add an entry for imx2_wdt 2010-07-01 16:02:55 +00:00
wimax
x86 x86, olpc: Add support for calling into OpenFirmware 2010-06-18 14:54:36 -07:00
zh_CN Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
.gitignore add random binaries to .gitignore 2010-04-08 11:34:34 +02:00
00-INDEX documentation: fix almost duplicate filenames (IO/io-mapping.txt) 2010-07-20 17:49:30 +00:00
apparmor.txt AppArmor: update Maintainer and Documentation 2010-08-02 15:35:15 +10:00
applying-patches.txt
atomic_ops.txt Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
bad_memory.txt
basic_profiling.txt
binfmt_misc.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt documentation: fix almost duplicate filenames (IO/io-mapping.txt) 2010-07-20 17:49:30 +00:00
cachetlb.txt Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
Changes Documentation update broken web addresses 2010-07-11 21:55:42 +02:00
circular-buffers.txt Document Linux's circular buffering capabilities 2010-03-24 16:31:22 -07:00
coccinelle.txt Documentation: fix ubuntu distro name 2010-06-29 15:27:00 +02:00
CodingStyle
cpu-hotplug.txt
cpu-load.txt
cputopology.txt
credentials.txt CRED: Fix __task_cred()'s lockdep check and banner comment 2010-07-29 15:16:18 -07:00
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
DMA-API-HOWTO.txt Documentation/DMA-API-HOWTO: add ARCH_KMALLOC_MINALIGN description 2010-05-27 09:12:53 -07:00
DMA-API.txt Documentation: rename PCI-DMA-mapping.txt to DMA-API-HOWTO.txt 2010-03-12 15:52:43 -08:00
DMA-attributes.txt
DMA-ISA-LPC.txt
dmaengine.txt
dontdiff Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig 2010-02-25 14:43:57 -08:00
dynamic-debug-howto.txt
edac.txt Documentation/edac.txt: Reflect the sysfs changes at the document 2010-05-10 11:49:30 -03:00
eisa.txt
email-clients.txt Documentation/email-clients.txt: update gmail information 2010-03-12 15:52:35 -08:00
feature-removal-schedule.txt Merge branch 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-08-06 13:18:29 -07:00
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gpio.txt gpio: introduce gpio_request_one() and friends 2010-03-06 11:26:48 -08:00
highuid.txt
HOWTO Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
hw_random.txt
init.txt init/main.c: improve usability in case of init binary failure 2010-03-06 11:26:29 -08:00
initrd.txt
Intel-IOMMU.txt
intel_txt.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
io-mapping.txt
io_ordering.txt
iostats.txt
IPMI.txt ipmi: add parameter to limit CPU usage in kipmid 2010-03-12 15:52:39 -08:00
IRQ-affinity.txt
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kernel-doc-nano-HOWTO.txt
kernel-docs.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
kernel-parameters.txt Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 2010-08-07 17:07:31 -07:00
keys-request-key.txt
keys.txt
kmemcheck.txt
kmemleak.txt
kobject.txt kobject: documentation: Update to refer to kset-example.c. 2010-03-19 07:12:20 -07:00
kprobes.txt Documentation: Mention that KProbes is supported on MIPS 2010-08-05 13:26:30 +01:00
kref.txt
ldm.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
leds-class.txt
leds-lp3944.txt
local_ops.txt
lockdep-design.txt
lockstat.txt
logo.gif
logo.txt
magic-number.txt
Makefile Documentation/fs/: split txt and source files 2010-03-12 15:52:35 -08:00
ManagementStyle
mca.txt
md.txt Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
memory-barriers.txt Document Linux's circular buffering capabilities 2010-03-24 16:31:22 -07:00
memory-hotplug.txt
memory.txt
mono.txt
mutex-design.txt Rename .text.lock to .text..lock. 2010-03-03 11:26:00 +01:00
nmi_watchdog.txt
nommu-mmap.txt
numastat.txt
oops-tracing.txt panic: Add taint flag TAINT_FIRMWARE_WORKAROUND ('I') 2010-05-19 08:37:43 +01:00
padata.txt padata: update API documentation 2010-07-31 19:53:07 +08:00
parport-lowlevel.txt
parport.txt
pi-futex.txt
pnp.txt
preempt-locking.txt
printk-formats.txt
prio_tree.txt
rbtree.txt rbtree: Add support for augmented rbtrees 2010-02-18 15:40:56 -08:00
rfkill.txt Document the rfkill sysfs ABI 2010-03-10 17:09:33 -05:00
robust-futex-ABI.txt
robust-futexes.txt
rt-mutex-design.txt variable name fix to Documentation/rt-mutex-design.txt 2010-06-05 17:39:09 +02:00
rt-mutex.txt
rtc.txt
SAK.txt
SecurityBugs
SELinux.txt
serial-console.txt
sgi-ioc4.txt
sgi-visws.txt
SM501.txt
Smack.txt Documentation/: it's -> its where appropriate 2010-04-23 02:09:52 +02:00
sparse.txt update email address 2010-07-19 10:56:54 +02:00
spinlocks.txt
stable_api_nonsense.txt
stable_kernel_rules.txt Documentation: -stable rules: upstream commit ID requirement reworded 2010-04-22 15:24:56 -07:00
SubmitChecklist Documentation: update SubmitChecklist for O=objdir and kconfig testing 2010-05-24 07:31:20 -07:00
SubmittingDrivers Documentation: update broken web addresses. 2010-08-04 15:21:40 +02:00
SubmittingPatches
svga.txt
sysfs-rules.txt Fix typos in comments 2010-03-16 11:47:56 +01:00
sysrq.txt Input: Documentation/sysrq.txt - update KEY_SYSRQ info 2010-05-19 10:14:15 -07:00
tomoyo.txt TOMOYO: Update version to 2.3.0 2010-08-02 15:35:10 +10:00
unaligned-memory-access.txt
unicode.txt
unshare.txt
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
volatile-considered-harmful.txt Documentation/volatile-considered-harmful.txt: correct cpu_relax() documentation 2010-03-24 16:31:20 -07:00
zorro.txt