mirror of
				git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
				synced 2025-10-31 08:44:41 +00:00 
			
		
		
		
	cgroup: Add documentation for cgroup namespaces
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
		
							parent
							
								
									ed82571b1a
								
							
						
					
					
						commit
						d4021f6cd4
					
				
					 1 changed files with 147 additions and 0 deletions
				
			
		|  | @ -47,6 +47,11 @@ CONTENTS | ||||||
|   5-3. IO |   5-3. IO | ||||||
|     5-3-1. IO Interface Files |     5-3-1. IO Interface Files | ||||||
|     5-3-2. Writeback |     5-3-2. Writeback | ||||||
|  | 6. Namespace | ||||||
|  |   6-1. Basics | ||||||
|  |   6-2. The Root and Views | ||||||
|  |   6-3. Migration and setns(2) | ||||||
|  |   6-4. Interaction with Other Namespaces | ||||||
| P. Information on Kernel Programming | P. Information on Kernel Programming | ||||||
|   P-1. Filesystem Support for Writeback |   P-1. Filesystem Support for Writeback | ||||||
| D. Deprecated v1 Core Features | D. Deprecated v1 Core Features | ||||||
|  | @ -1085,6 +1090,148 @@ writeback as follows. | ||||||
| 	vm.dirty[_background]_ratio. | 	vm.dirty[_background]_ratio. | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
|  | 6. Namespace | ||||||
|  | 
 | ||||||
|  | 6-1. Basics | ||||||
|  | 
 | ||||||
|  | cgroup namespace provides a mechanism to virtualize the view of the | ||||||
|  | "/proc/$PID/cgroup" file and cgroup mounts.  The CLONE_NEWCGROUP clone | ||||||
|  | flag can be used with clone(2) and unshare(2) to create a new cgroup | ||||||
|  | namespace.  The process running inside the cgroup namespace will have | ||||||
|  | its "/proc/$PID/cgroup" output restricted to cgroupns root.  The | ||||||
|  | cgroupns root is the cgroup of the process at the time of creation of | ||||||
|  | the cgroup namespace. | ||||||
|  | 
 | ||||||
|  | Without cgroup namespace, the "/proc/$PID/cgroup" file shows the | ||||||
|  | complete path of the cgroup of a process.  In a container setup where | ||||||
|  | a set of cgroups and namespaces are intended to isolate processes, the | ||||||
|  | "/proc/$PID/cgroup" file may leak system-level information to the | ||||||
|  | isolated processes.  For example: | ||||||
|  | 
 | ||||||
|  |   # cat /proc/self/cgroup | ||||||
|  |   0::/batchjobs/container_id1 | ||||||
|  | 
 | ||||||
|  | The path '/batchjobs/container_id1' can be considered as system-data | ||||||
|  | and undesirable to expose to the isolated processes.  cgroup namespace | ||||||
|  | can be used to restrict visibility of this path.  For example, before | ||||||
|  | creating a cgroup namespace, one would see: | ||||||
|  | 
 | ||||||
|  |   # ls -l /proc/self/ns/cgroup | ||||||
|  |   lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] | ||||||
|  |   # cat /proc/self/cgroup | ||||||
|  |   0::/batchjobs/container_id1 | ||||||
|  | 
 | ||||||
|  | After unsharing a new namespace, the view changes. | ||||||
|  | 
 | ||||||
|  |   # ls -l /proc/self/ns/cgroup | ||||||
|  |   lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] | ||||||
|  |   # cat /proc/self/cgroup | ||||||
|  |   0::/ | ||||||
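The before and after listings above can also be compared programmatically. The following is a minimal sketch, not part of the patch, assuming Python 3; the helper names parse_cgroup_line and parse_ns_link are hypothetical. It parses one line of "/proc/$PID/cgroup" and the target of a "/proc/$PID/ns/cgroup" symlink, relying only on the formats shown in the examples above.

```python
import re
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    """One line of /proc/$PID/cgroup: hierarchy-ID:controller-list:path."""
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_line(line: str) -> CgroupEntry:
    # Split on the first two colons only; the path is kept whole.
    hier, controllers, path = line.rstrip("\n").split(":", 2)
    # On the v2 hierarchy the controller list is empty ("0::/...").
    names = controllers.split(",") if controllers else []
    return CgroupEntry(int(hier), names, path)


def parse_ns_link(target: str) -> int:
    """Extract the number from a symlink target such as 'cgroup:[4026531835]'."""
    match = re.fullmatch(r"cgroup:\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"not a cgroup namespace link: {target!r}")
    return int(match.group(1))
```

Two processes are in the same cgroup namespace when parse_ns_link returns the same number for both of their "/proc/$PID/ns/cgroup" links.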
|  | 
 | ||||||
|  | When some thread from a multi-threaded process unshares its cgroup | ||||||
|  | namespace, the new cgroupns gets applied to the entire process (all | ||||||
|  | the threads).  This is natural for the v2 hierarchy; however, for the | ||||||
|  | legacy hierarchies, this may be unexpected. | ||||||
|  | 
 | ||||||
|  | A cgroup namespace is alive as long as there are processes inside or | ||||||
|  | mounts pinning it.  When the last usage goes away, the cgroup | ||||||
|  | namespace is destroyed.  The cgroupns root and the actual cgroups | ||||||
|  | remain. | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | 6-2. The Root and Views | ||||||
|  | 
 | ||||||
|  | The 'cgroupns root' for a cgroup namespace is the cgroup in which the | ||||||
|  | process calling unshare(2) is running.  For example, if a process in | ||||||
|  | /batchjobs/container_id1 cgroup calls unshare, cgroup | ||||||
|  | /batchjobs/container_id1 becomes the cgroupns root.  For the | ||||||
|  | init_cgroup_ns, this is the real root ('/') cgroup. | ||||||
|  | 
 | ||||||
|  | The cgroupns root cgroup does not change even if the namespace creator | ||||||
|  | process later moves to a different cgroup. | ||||||
|  | 
 | ||||||
|  |   # ~/unshare -c # unshare cgroupns in some cgroup | ||||||
|  |   # cat /proc/self/cgroup | ||||||
|  |   0::/ | ||||||
|  |   # mkdir sub_cgrp_1 | ||||||
|  |   # echo 0 > sub_cgrp_1/cgroup.procs | ||||||
|  |   # cat /proc/self/cgroup | ||||||
|  |   0::/sub_cgrp_1 | ||||||
|  | 
 | ||||||
|  | Each process gets its namespace-specific view of "/proc/$PID/cgroup". | ||||||
|  | 
 | ||||||
|  | Processes running inside the cgroup namespace will be able to see | ||||||
|  | cgroup paths (in /proc/self/cgroup) only inside their root cgroup. | ||||||
|  | From within an unshared cgroupns: | ||||||
|  | 
 | ||||||
|  |   # sleep 100000 & | ||||||
|  |   [1] 7353 | ||||||
|  |   # echo 7353 > sub_cgrp_1/cgroup.procs | ||||||
|  |   # cat /proc/7353/cgroup | ||||||
|  |   0::/sub_cgrp_1 | ||||||
|  | 
 | ||||||
|  | From the initial cgroup namespace, the real cgroup path will be | ||||||
|  | visible: | ||||||
|  | 
 | ||||||
|  |   $ cat /proc/7353/cgroup | ||||||
|  |   0::/batchjobs/container_id1/sub_cgrp_1 | ||||||
|  | 
 | ||||||
|  | From a sibling cgroup namespace (that is, a namespace rooted at a | ||||||
|  | different cgroup), the cgroup path relative to its own cgroup | ||||||
|  | namespace root will be shown.  For instance, if PID 7353's cgroup | ||||||
|  | namespace root is at '/batchjobs/container_id2', then it will see | ||||||
|  | 
 | ||||||
|  |   # cat /proc/7353/cgroup | ||||||
|  |   0::/../container_id2/sub_cgrp_1 | ||||||
|  | 
 | ||||||
|  | Note that the relative path always starts with '/' to indicate that | ||||||
|  | it is relative to the cgroup namespace root of the caller. | ||||||
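The path rule described in this section can be modelled in a few lines. This is an illustrative sketch only, assuming Python 3; the function name cgroup_view is hypothetical and the code is a model of the documented behaviour, not the kernel's implementation. It takes the real cgroup path and the reader's cgroupns root and returns the path as the reader would see it, always prefixed with '/'.

```python
import posixpath


def cgroup_view(real_path: str, ns_root: str) -> str:
    """Return real_path as seen from a cgroup namespace rooted at ns_root.

    Both arguments are absolute paths in the initial cgroup namespace.
    """
    rel = posixpath.relpath(real_path, ns_root)
    # relpath yields "." when the cgroup is the namespace root itself.
    return "/" if rel == "." else "/" + rel


# Inside the namespace rooted at /batchjobs/container_id1:
print(cgroup_view("/batchjobs/container_id1/sub_cgrp_1",
                  "/batchjobs/container_id1"))   # /sub_cgrp_1
# From the initial namespace, whose root is the real root:
print(cgroup_view("/batchjobs/container_id1/sub_cgrp_1", "/"))
```

A cgroup outside the reader's root shows up with leading '..' components, which is why such paths still begin with '/'.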
|  | 
 | ||||||
|  | 
 | ||||||
|  | 6-3. Migration and setns(2) | ||||||
|  | 
 | ||||||
|  | Processes inside a cgroup namespace can move into and out of the | ||||||
|  | namespace root if they have proper access to external cgroups.  For | ||||||
|  | example, from inside a namespace with cgroupns root at | ||||||
|  | /batchjobs/container_id1, and assuming that the global hierarchy is | ||||||
|  | still accessible inside cgroupns: | ||||||
|  | 
 | ||||||
|  |   # cat /proc/7353/cgroup | ||||||
|  |   0::/sub_cgrp_1 | ||||||
|  |   # echo 7353 > batchjobs/container_id2/cgroup.procs | ||||||
|  |   # cat /proc/7353/cgroup | ||||||
|  |   0::/../container_id2 | ||||||
|  | 
 | ||||||
|  | Note that this kind of setup is not encouraged.  A task inside cgroup | ||||||
|  | namespace should only be exposed to its own cgroupns hierarchy. | ||||||
|  | 
 | ||||||
|  | setns(2) to another cgroup namespace is allowed when both of the | ||||||
|  | following hold: | ||||||
|  | 
 | ||||||
|  | (a) the process has CAP_SYS_ADMIN against its current user namespace | ||||||
|  | (b) the process has CAP_SYS_ADMIN against the target cgroup | ||||||
|  |     namespace's userns | ||||||
|  | 
 | ||||||
|  | No implicit cgroup changes happen with attaching to another cgroup | ||||||
|  | namespace.  It is expected that someone moves the attaching | ||||||
|  | process under the target cgroup namespace root. | ||||||
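Because attaching does not move the process, the two steps have to be done explicitly. The following is a rough sketch under stated assumptions, not a reference implementation: it assumes Python 3 on Linux with a libc that exports setns(2), the function name join_cgroup_ns and both of its parameters are hypothetical, and target_cgroup_dir is assumed to be the target cgroupns root as visible through a cgroup2 mount in the caller's mount namespace. The caller still needs the capabilities listed above.

```python
import ctypes
import os

CLONE_NEWCGROUP = 0x02000000  # namespace type flag from <linux/sched.h>


def join_cgroup_ns(target_pid: int, target_cgroup_dir: str) -> None:
    """Attach to target_pid's cgroup namespace, then move into its root."""
    fd = os.open(f"/proc/{target_pid}/ns/cgroup", os.O_RDONLY)
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        # setns(2) only switches the namespace; cgroup membership is unchanged.
        if libc.setns(fd, CLONE_NEWCGROUP) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
    # Explicit migration: nothing moves the process implicitly.
    with open(os.path.join(target_cgroup_dir, "cgroup.procs"), "w") as procs:
        procs.write(str(os.getpid()))
```

Either step can fail with OSError (for example EPERM without CAP_SYS_ADMIN), so a caller would normally wrap the call and report the error.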
|  | 
 | ||||||
|  | 
 | ||||||
|  | 6-4. Interaction with Other Namespaces | ||||||
|  | 
 | ||||||
|  | A namespace-specific cgroup hierarchy can be mounted by a process | ||||||
|  | running inside a non-init cgroup namespace. | ||||||
|  | 
 | ||||||
|  |   # mount -t cgroup2 none $MOUNT_POINT | ||||||
|  | 
 | ||||||
|  | This will mount the unified cgroup hierarchy with cgroupns root as the | ||||||
|  | filesystem root.  The process needs CAP_SYS_ADMIN against its user and | ||||||
|  | mount namespaces. | ||||||
|  | 
 | ||||||
|  | The virtualization of the /proc/self/cgroup file, combined with | ||||||
|  | restricting the view of the cgroup hierarchy through a | ||||||
|  | namespace-private cgroupfs mount, provides a properly isolated cgroup | ||||||
|  | view inside the container. | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
| P. Information on Kernel Programming | P. Information on Kernel Programming | ||||||
| 
 | 
 | ||||||
| This section contains kernel programming information in the areas | This section contains kernel programming information in the areas | ||||||
|  |  | ||||||