An analysis of the Linux single-user mode patch

Published 2019-07-12 18:38

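In an earlier post (https://blog.csdn.net/cui841923894/article/details/81568351) I mentioned that the Linux 4.1 kernel supports a single-user mode. In this mode every UID and GID is 0 and the kernel no longer distinguishes between users (everything runs with root-like privileges). It is meant for small systems such as embedded devices.
Next, let's look at how the patch implements a single-user kernel.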

Kernel patch analysis

Patch URL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2813893f8b197a14f1e1ddb04d99bce46817c84a

1. Commit message

    kernel: conditionally support non-root users, groups and capabilities

    There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary.

On many embedded systems everything runs as root:root, so multi-user support is dead weight there.

    This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under CONFIG_EXPERT menu.

    When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities.

    The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset.

    Also, groups.c is compiled out completely.

    In kernel/capability.c, capable function was moved in order to avoid adding two ifdef blocks.

Note that capable() is not removed. It is moved so that a single #ifdef block decides between the normal implementation and a stub that returns immediately.

    This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much.)

On a defconfig build (the kernel's default config) the change saves about 25 KB of kernel binary. That is significant because the smallest kernels have a total text size of several hundred KB rather than several MB. With allnoconfig the saving is slightly below 25 KB.

    The kernel was booted in Qemu. All the common functionalities work. Adding users/groups is not possible, failing with -ENOSYS.

    Bloat-o-meter output:
    add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Iulia Manda
    Reviewed-by: Josh Triplett
    Acked-by: Geert Uytterhoeven
    Tested-by: Paul E. McKenney
    Reviewed-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

2. Patch content analysis
The patch touches many lines, and many of the changes serve the same purpose, so only the key parts are covered here.

a. Some features and architectures gain a dependency on the MULTIUSER config option:

    diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
    index a5ced5c..de2726a 100644
    --- a/arch/s390/Kconfig
    +++ b/arch/s390/Kconfig
    @@ -328,6 +328,7 @@ config COMPAT
         select COMPAT_BINFMT_ELF if BINFMT_ELF
         select ARCH_WANT_OLD_COMPAT_IPC
         select COMPAT_OLD_SIGACTION
    +    depends on MULTIUSER

    diff --git a/drivers/staging/lustre/lustre/Kconfig b/drivers/staging/lustre/lustre/Kconfig
    index 6725467..62c7bba 100644
    --- a/drivers/staging/lustre/lustre/Kconfig
    +++ b/drivers/staging/lustre/lustre/Kconfig
    @@ -10,6 +10,7 @@ config LUSTRE_FS
         select CRYPTO_SHA1
         select CRYPTO_SHA256
         select CRYPTO_SHA512
    +    depends on MULTIUSER
    …

b. #ifdef CONFIG_MULTIUSER selects between two variants of each function:

    diff --git a/include/linux/capability.h b/include/linux/capability.h
    index aa93e5e..af9f0b9 100644
    --- a/include/linux/capability.h
    +++ b/include/linux/capability.h
    @@ -205,6 +205,7 @@ static inline kernel_cap_t cap_raise_nfsd_set(const kernel_cap_t a,
                    cap_intersect(permitted, __cap_nfsd_set));
     }

    +#ifdef CONFIG_MULTIUSER // multi-user enabled: declare the real functions
     extern bool has_capability(struct task_struct *t, int cap);
     extern bool has_ns_capability(struct task_struct *t,
                       struct user_namespace *ns, int cap);
    @@ -213,6 +214,34 @@ extern bool has_ns_capability_noaudit(struct task_struct *t,
                       struct user_namespace *ns, int cap);
     extern bool capable(int cap);
     extern bool ns_capable(struct user_namespace *ns, int cap);
    +#else // single-user mode: capability checks become stubs that always return true
    +static inline bool has_capability(struct task_struct *t, int cap)
    +{
    +    return true;
    +}
    …
    +static inline bool ns_capable(struct user_namespace *ns, int cap)
    +{
    +    return true;
    +}
    +#endif /* CONFIG_MULTIUSER */

    diff --git a/include/linux/cred.h b/include/linux/cred.h
    index 2fb2ca2..8b6c083 100644
    --- a/include/linux/cred.h
    +++ b/include/linux/cred.h
    @@ -62,9 +62,27 @@ do {
             groups_free(group_info);
     } while (0)

    -extern struct group_info *groups_alloc(int);
     extern struct group_info init_groups;
    +#ifdef CONFIG_MULTIUSER // in single-user mode, in_group_p(), in_egroup_p() etc. are replaced by stubs
    +extern struct group_info *groups_alloc(int);
     extern void groups_free(struct group_info *);
    +
    +extern int in_group_p(kgid_t);
    +extern int in_egroup_p(kgid_t);
    +#else
    +static inline void groups_free(struct group_info *group_info)
    +{
    +}
    +
    +static inline int in_group_p(kgid_t grp)
    +{
    +    return 1;
    +}
    +static inline int in_egroup_p(kgid_t grp)
    +{
    +    return 1;
    +}
    +#endif

    diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
    index 2d1f9b6..0ee05da 100644
    --- a/include/linux/uidgid.h
    +++ b/include/linux/uidgid.h
    @@ -29,6 +29,7 @@ typedef struct {
     #define KUIDT_INIT(value) (kuid_t){ value }
     #define KGIDT_INIT(value) (kgid_t){ value }

    +#ifdef CONFIG_MULTIUSER // in single-user mode, __kuid_val() and __kgid_val() are stubbed to return 0
     static inline uid_t __kuid_val(kuid_t uid)
     {
         return uid.val;
    @@ -38,6 +39,17 @@ static inline gid_t __kgid_val(kgid_t gid)
     {
         return gid.val;
     }
    +#else
    +static inline uid_t __kuid_val(kuid_t uid)
    +{
    +    return 0;
    +}
    +
    +static inline gid_t __kgid_val(kgid_t gid)
    +{
    +    return 0;
    +}
    +#endif

c. init/Kconfig gains the MULTIUSER option, so it shows up in the kernel's make menuconfig:

    …
    +config MULTIUSER
    +    bool "Multiple users, groups and capabilities support" if EXPERT
    +    default y
    +    help
    +      This option enables support for non-root users, groups and
    +      capabilities.
    +
    +      If you say N here, all processes will run with UID 0, GID 0, and all
    +      possible capabilities. Saying N here also compiles out support for
    +      system calls related to UIDs, GIDs, and capabilities, such as setuid,
    +      setgid, and capset.
    +
    +      If unsure, say Y here.
    +

d. kernel/Makefile gains MULTIUSER support:

    diff --git a/kernel/Makefile b/kernel/Makefile
    index 1408b33..0f8f8b0 100644
    --- a/kernel/Makefile
    +++ b/kernel/Makefile
    @@ -9,7 +9,9 @@ obj-y     = fork.o exec_domain.o panic.o \
             extable.o params.o \
             kthread.o sys_ni.o nsproxy.o \
             notifier.o ksysfs.o cred.o reboot.o \
    -        async.o range.o groups.o smpboot.o
    +        async.o range.o smpboot.o
    +
    +obj-$(CONFIG_MULTIUSER) += groups.o

With this change, groups.c is compiled only when CONFIG_MULTIUSER is selected.

e. In capability.c, #ifdef CONFIG_MULTIUSER is added at line 35 and #endif /* CONFIG_MULTIUSER */ after line 386, so the functions between those two lines are defined and implemented only when CONFIG_MULTIUSER is selected. capable() is moved up into the same block.

    diff --git a/kernel/capability.c b/kernel/capability.c
    index 989f5bf..45432b5 100644
    --- a/kernel/capability.c
    +++ b/kernel/capability.c
    @@ -35,6 +35,7 @@ static int __init file_caps_disable(char *str)
     }
     __setup("no_file_caps", file_caps_disable);

    +#ifdef CONFIG_MULTIUSER
     /*
      * More recent versions of libcap are available from:
      *
    @@ -386,6 +387,24 @@ bool ns_capable(struct user_namespace *ns, int cap)
     }
     EXPORT_SYMBOL(ns_capable);

    +
    +/**
    + * capable - Determine if the current task has a superior capability in effect
    + * @cap: The capability to be tested for
    + *
    + * Return true if the current task has the given superior capability currently
    + * available for use, false if not.
    + *
    + * This sets PF_SUPERPRIV on the task if the capability is available on the
    + * assumption that it's about to be used.
    + */
    +bool capable(int cap)
    +{
    +    return ns_capable(&init_user_ns, cap);
    +}
    +EXPORT_SYMBOL(capable);
    +#endif /* CONFIG_MULTIUSER */

f. sys_ni.c gains entries for the system calls listed above.
A word on what sys_ni.c is for: when a system call is retired or not built into the kernel, its service routine is set to sys_ni_syscall. The "ni" in sys_ni_syscall stands for "not implemented". Each cond_syscall() line makes the named entry point fall back to sys_ni_syscall, which returns -ENOSYS, whenever the real implementation is not compiled in.

    diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
    index 5adcb0a..7995ef5 100644
    --- a/kernel/sys_ni.c
    +++ b/kernel/sys_ni.c
    @@ -159,6 +159,20 @@ cond_syscall(sys_uselib);
     cond_syscall(sys_fadvise64);
     cond_syscall(sys_fadvise64_64);
     cond_syscall(sys_madvise);
    +cond_syscall(sys_setuid);
    +cond_syscall(sys_setregid);
    +cond_syscall(sys_setgid);
    +cond_syscall(sys_setreuid);
    +cond_syscall(sys_setresuid);
    +cond_syscall(sys_getresuid);
    +cond_syscall(sys_setresgid);
    +cond_syscall(sys_getresgid);
    +cond_syscall(sys_setgroups);
    +cond_syscall(sys_getgroups);
    +cond_syscall(sys_setfsuid);
    +cond_syscall(sys_setfsgid);
    +cond_syscall(sys_capget);
    +cond_syscall(sys_capset);

In short, the patch does the following:
    The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset.

    Also, groups.c is compiled out completely.

    In kernel/capability.c, capable function was moved in order to avoid adding two ifdef blocks.
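
The two mechanisms behind this summary can be imitated in user space. The sketch below is not kernel code. DEMO_MULTIUSER, demo_kuid_t, demo_kuid_val(), demo_capable(), demo_sys_ni_syscall(), DEMO_COND_SYSCALL and demo_self_check() are hypothetical stand-ins for CONFIG_MULTIUSER, kuid_t, __kuid_val(), capable(), sys_ni_syscall() and cond_syscall(). The weak-alias attribute assumes GCC or Clang on an ELF platform such as Linux, and demo_capable() takes the UID as a parameter only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for kuid_t: a UID wrapped in a struct. */
typedef struct { unsigned int val; } demo_kuid_t;

/*
 * Mechanism 1: compile-time stubs.
 * DEMO_MULTIUSER plays the role of CONFIG_MULTIUSER. Build with
 * -DDEMO_MULTIUSER to get the "real" branch; without it, every UID reads
 * back as 0 and every capability check succeeds.
 */
#ifdef DEMO_MULTIUSER
static inline unsigned int demo_kuid_val(demo_kuid_t uid)
{
    return uid.val;
}

/* Placeholder policy for this sketch only: UID 0 holds every capability. */
static inline bool demo_capable(demo_kuid_t uid, int cap)
{
    (void)cap;
    return uid.val == 0;
}
#else
static inline unsigned int demo_kuid_val(demo_kuid_t uid)
{
    (void)uid;
    return 0;
}

static inline bool demo_capable(demo_kuid_t uid, int cap)
{
    (void)uid;
    (void)cap;
    return true;
}
#endif

/*
 * Mechanism 2: weak fallback for compiled-out system calls.
 * Every name passed to DEMO_COND_SYSCALL becomes a weak alias of the
 * "not implemented" routine. A strong definition in another object file
 * would take precedence at link time; here there is none.
 */
long demo_sys_ni_syscall(void)
{
    return -ENOSYS;
}

#define DEMO_COND_SYSCALL(name) \
    long name(void) __attribute__((weak, alias("demo_sys_ni_syscall")))

DEMO_COND_SYSCALL(demo_sys_setgroups);

/* Exercises both mechanisms; call it from main() in a test program. */
void demo_self_check(void)
{
    demo_kuid_t uid = { 1000 };

#ifdef DEMO_MULTIUSER
    assert(demo_kuid_val(uid) == 1000);
    assert(!demo_capable(uid, 21));
#else
    assert(demo_kuid_val(uid) == 0);   /* the stored 1000 is ignored */
    assert(demo_capable(uid, 21));     /* any capability number passes */
#endif
    assert(demo_sys_setgroups() == -ENOSYS);
}
```

The first half costs nothing at run time because the stubs are inline constants, which is where much of the size saving reported in the commit message would be expected to come from. The second half explains why user space sees "Function not implemented" rather than a link failure or a crash.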

Runtime results

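1. Build a bzImage from the v4.18 kernel

    # git branch
    * (HEAD detached at v4.18)
    # cp arch/x86/configs/x86_64_defconfig ./.config
    # make menuconfig    (disable MULTIUSER; the option is only visible once EXPERT is enabled)
    # make bzImage -j8

When the build finishes, the kernel image is at arch/x86/boot/bzImage.

2. Boot it with qemu

    / # adduser cuibixuan    (why can a user still be added here?)
    adduser: /home/cuibixuan: No such file or directory
    passwd: unknown uid 0
    / # su cuibixuan
    su: can't set groups: Function not implemented

As the output shows, groups-related operations now fail with "Function not implemented". This confirms that the sys_setgroups entry added to kernel/sys_ni.c (+cond_syscall(sys_setgroups);) has taken effect. As for the question above, a likely explanation is that adduser only writes entries to /etc/passwd and /etc/group, which are ordinary file operations in user space, so the kernel option does not stop it.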
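
The same observation can be made from a small program instead of from su. The following is a minimal sketch, assuming a Linux userland whose C library passes getgroups() straight to the kernel. demo_multiuser_supported() and demo_report() are hypothetical names. getgroups() is used because it is on the list of compiled-out system calls and, unlike setgroups(), does not change any process state.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Returns 1 if the kernel implements getgroups(), 0 if the call fails with
 * ENOSYS (as expected on a kernel built without CONFIG_MULTIUSER), and -1
 * on any other error.
 */
int demo_multiuser_supported(void)
{
    /* With size 0, getgroups() only reports how many groups there are. */
    if (getgroups(0, NULL) >= 0)
        return 1;
    return errno == ENOSYS ? 0 : -1;
}

/* Prints the result of the probe; call it from main() in a test program. */
void demo_report(void)
{
    int rc = demo_multiuser_supported();

    /* With size 0 and a NULL list, no other failure is expected. */
    assert(rc == 0 || rc == 1);
    printf("getgroups: %s\n",
           rc ? "implemented (multi-user kernel)"
              : "Function not implemented (single-user kernel)");
}
```

On the single-user kernel built above, the expectation is that the probe takes the ENOSYS branch, matching the error that su prints.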

Follow-up

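In my opinion, simply dropping the uid/gid, group and capability functions is not enough for Linux to support single-user systems. For example, what should happen when the filesystem already has several users configured (in /etc/passwd and /etc/group) before boot? And what about (security-related) system calls that are recommended to run with an individual user's privileges? The discussion at https://lwn.net/Articles/631853/ raises further issues, such as multiple processes and scheduling: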
Come to think of it, I look forward to the next tinification patch
that removes support for multiple processes, scheduling, and makes the
only running process always have pid 1.
Or, on the subject of threads:
The problem is then that the single userspace task can prevent
necessary kernel threads from running.