linux/Documentation/filesystems
Vlastimil Babka bf9683d699 mm, documentation: clarify /proc/pid/status VmSwap limitations for shmem
This series is based on Jerome Marchand's [1] so let me quote the first
paragraph from there:

There are several shortcomings with the accounting of shared memory
(sysV shm, shared anonymous mapping, mapping to a tmpfs file).  The
values in /proc/<pid>/status and statm don't allow to distinguish
between shmem memory and a shared mapping to a regular file, even though
their implications on memory usage are quite different: at reclaim, file
mapping can be dropped or written back on disk while shmem needs a place
in swap.  As for shmem pages that are swapped-out or in swap cache, they
aren't accounted at all.

The original motivation for myself is that a customer found (IMHO
rightfully) confusing that e.g.  top output for process swap usage is
unreliable with respect to swapped out shmem pages, which are not
accounted for.

The fundamental difference between private anonymous and shmem pages is
that the latter has PTE's converted to pte_none, and not swapents.  As
such, they are not accounted to the number of swapents visible e.g.  in
/proc/pid/status VmSwap row.  It might be theoretically possible to use
swapents when swapping out shmem (without extra cost, as one has to
change all mappers anyway), and on swap in only convert the swapent for
the faulting process, leaving swapents in other processes until they
also fault (so again no extra cost).  But I don't know how many
assumptions this would break, and it would be too disruptive change for
a relatively small benefit.

Instead, my approach is to document the limitation of VmSwap, and
provide means to determine the swap usage for shmem areas for those who
are interested and willing to pay the price, using /proc/pid/smaps.
Because outside of ipcs, I don't think it's possible to currently to
determine the usage at all.  The previous patchset [1] did introduce new
shmem-specific fields into smaps output, and functions to determine the
values.  I take a simpler approach, noting that smaps output already has
a "Swap: X kB" line, where currently X == 0 always for shmem areas.  I
think we can just consider this a bug and provide the proper value by
consulting the radix tree, as e.g.  mincore_page() does.  In the patch
changelog I explain why this is also not perfect (and cannot be without
swapents), but still arguably much better than showing a 0.

The last two patches are adapted from Jerome's patchset and provide a
VmRSS breakdown to RssAnon, RssFile and RssShm in /proc/pid/status.
Hugh noted that this is a welcome addition, and I agree that it might
help e.g.  debugging process memory usage at albeit non-zero, but still
rather low cost of extra per-mm counter and some page flag checks.

[1] http://lwn.net/Articles/611966/

This patch (of 6):

The documentation for /proc/pid/status does not mention that the value
of VmSwap counts only swapped out anonymous private pages, and not
swapped out pages of the underlying shmem objects (for shmem mappings).
This is not obvious, so document this limitation.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14 16:00:49 -08:00
..
caching FS-Cache: Count the number of initialised operations 2015-04-02 14:28:53 +01:00
cifs
configfs configfs: implement binary attributes 2016-01-04 12:31:46 +01:00
nfs ipconfig: send Client-identifier in DHCP requests 2015-10-18 19:23:52 -07:00
pohmelfs
.gitignore
00-INDEX dax: replace XIP documentation with DAX documentation 2015-02-16 17:56:03 -08:00
9p.txt
adfs.txt
affs.txt
afs.txt
autofs4-mount-control.txt
autofs4.txt
automount-support.txt Documentation: remove outdated information from automount-support.txt 2015-05-15 01:10:38 -04:00
befs.txt
bfs.txt
btrfs.txt Documentation: filesystems: btrfs: Fixed typos and whitespace 2015-07-09 14:31:06 -06:00
ceph.txt
coda.txt
cramfs.txt
dax.txt dax: add huge page fault support 2015-09-08 15:35:28 -07:00
debugfs.txt debugfs: Pass bool pointer to debugfs_create_bool() 2015-10-04 11:36:07 +01:00
devpts.txt
directory-locking
dlmfs.txt ocfs2: update web page + git tree in documentation 2015-02-28 09:57:50 -08:00
dnotify.txt
dnotify_test.c
ecryptfs.txt
efivarfs.txt
exofs.txt
ext2.txt fs: Remove ext3 filesystem driver 2015-07-23 20:59:40 +02:00
ext3.txt fs: Remove ext3 filesystem driver 2015-07-23 20:59:40 +02:00
ext4.txt ext4: add DAX functionality 2015-02-16 17:56:04 -08:00
f2fs.txt f2fs: introduce new option for controlling data flush 2015-12-16 09:25:48 -08:00
fiemap.txt
files.txt
fuse.txt
gfs2-glocks.txt gfs2: Remove gl_spin define 2015-10-29 12:57:48 -05:00
gfs2-uevents.txt
gfs2.txt
hfs.txt
hfsplus.txt
hpfs.txt
inotify.txt
isofs.txt
jfs.txt
Locking switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
locks.txt
logfs.txt
Makefile configfs: remove old API 2015-10-13 22:17:57 -07:00
mandatory-locking.txt
ncpfs.txt
nilfs2.txt
ntfs.txt
ocfs2.txt ocfs2: update web page + git tree in documentation 2015-02-28 09:57:50 -08:00
omfs.txt
overlayfs.txt Remove email address from Documentation/filesystems/overlayfs.txt 2015-11-11 10:04:53 -07:00
path-lookup.md Documentation: add new description of path-name lookup. 2015-11-02 18:18:25 -07:00
path-lookup.txt Documentation: add new description of path-name lookup. 2015-11-02 18:18:25 -07:00
porting switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
proc.txt mm, documentation: clarify /proc/pid/status VmSwap limitations for shmem 2016-01-14 16:00:49 -08:00
qnx6.txt
quota.txt quota: Update documentation 2015-05-18 11:23:07 +02:00
ramfs-rootfs-initramfs.txt
relay.txt
romfs.txt
seq_file.txt
sharedsubtree.txt
spufs.txt
squashfs.txt
sysfs-pci.txt
sysfs-tagging.txt sysfs-tagging.txt: fix pre-kernfs references 2015-09-13 14:38:51 -06:00
sysfs.txt sysfs.txt: mention that store method buffers are null-terminated 2015-09-13 14:38:51 -06:00
sysv-fs.txt
tmpfs.txt
ubifs.txt
udf.txt
ufs.txt
vfat.txt
vfs.txt switch ->get_link() to delayed_call, kill ->put_link() 2015-12-30 13:01:03 -05:00
xfs-delayed-logging-design.txt
xfs-self-describing-metadata.txt
xfs.txt xfs: fix kernel version in docs 2015-06-01 07:15:38 +10:00