Docker setuid & setgid weirdness

Posted on 20 February 2020

During some work on a project I came across some strange behaviour in how Docker handles setuid and setgid. In Linux, the setuid() and setgid() C calls change either the running user (setuid) or the current primary group (setgid). These calls can only be used by a user with the relevant permissions (usually root). However, there is another way in Linux to change your user and group: the setuid and setgid bits in a file's permissions.

For example, if you have a binary executable you can chmod 4555 it to set the special setuid bit. When the file is executed, the program runs as the user who owns the file on the file system, not as the calling user. Note that this only works with binary files and will not work with scripts.

Docker has an option to run a container with a reduced set of capabilities, designed to prevent the container from accessing specific functions. This is combined with the seccomp filter to “hide” certain syscalls from the container and prevent them from being executed. Outside a container, a user who lacks the setuid capability receives an “Operation not permitted” error when calling setuid(). Seccomp appears to complicate this, as it filters the call rather than denying access, which would essentially make the call a no-op. These two systems working together cause some strange behaviour, which may not give you the security you expect.

Side note: the bash shell has a built-in security feature that prevents it from being run as a setuid program. If bash detects this, it throws away its privileges, which is not very helpful when trying to debug a privilege bug. You can override this behaviour with the -p flag, which is what I have done in each C program.

Linux uses three separate values for a user ID:

EUID - “Effective” UserID = Used by the OS to perform permission checks.
RUID - “Real” UserID = Shows the “real” user of the application. For example, when running a setuid binary it shows the user who executed the binary.
SUID - “Saved” UserID = Used to put an identity to one side so it can be loaded and used again in the future. Useful for temporarily dropping privileges and then using the saved ID to go back to root.

The setuid() C call sets these IDs. From the man page:

setuid() sets the effective user ID of the calling process. If the calling process is privileged (more precisely: if the process has the CAP_SETUID capability in its user namespace), the real UID and saved set-user-ID are also set.

Consider the two C programs below:

suid_binary_only.c

A simple program that only executes bash. It relies entirely on the setuid bit being set on the file.

#include <stdio.h>
#include <unistd.h>

int main(void) {
      /* -p stops bash from dropping the privileges gained via the setuid bit */
      execl("/bin/bash", "bash", "-p", (char *)NULL);
      /* execl only returns on failure */
      perror("execl");
      return 1;
}

suid_binary_with_c_code.c

Same as above, but calls setgid() and setuid() before executing bash.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
      /* Try to become group root, then user root, before starting the shell */
      if (setgid(0) != 0) {
        printf("Error in set gid %s\n", strerror(errno));
        return 1;
      }
      if (setuid(0) != 0) {
        printf("Error in set uid %s\n", strerror(errno));
        return 1;
      }
      /* -p stops bash from dropping the privileges gained via the setuid bit */
      execl("/bin/bash", "bash", "-p", (char *)NULL);
      /* execl only returns on failure */
      perror("execl");
      return 1;
}

Dockerfile

FROM registry.access.redhat.com/ubi7
COPY suid_binary_only /sbin/suid_binary_only
COPY suid_binary_with_c_code /sbin/suid_binary_with_c_code
RUN chmod 4555 /sbin/suid_binary_only
RUN chmod 4555 /sbin/suid_binary_with_c_code

Using Red Hat UBI as a base, this places both C programs into /sbin and sets the setuid permission bit on each file. When either program runs, it should run as root.

Normal Docker run Command

In this example we run the container as a low privilege user (1000) and test each of the binaries.

docker run -u 1000:1000 --cap-add=SYS_PTRACE -it cap-test:latest
bash-4.2$ /sbin/suid_binary_only
bash-4.2# ps -f -p $$ -o euid,ruid,suid
 EUID  RUID  SUID
    0  1000     0
bash-4.2# head -1 /etc/shadow
root:locked::0:99999:7:::
bash-4.2# 
bash-4.2# exit
exit
bash-4.2$ /sbin/suid_binary_with_c_code 
bash-4.2#  ps -f -p $$ -o euid,ruid,suid
 EUID  RUID  SUID
    0     0     0
bash-4.2# head -1 /etc/shadow
root:locked::0:99999:7:::
bash-4.2#

In this example the setuid binary on its own gives root access, and the added C code additionally sets the “Real” UID.

Docker run dropping both setuid and setgid

docker run -u 1000:1000 --cap-add=SYS_PTRACE --cap-drop=setuid --cap-drop=setgid -it cap-test:latest
bash-4.2$ /sbin/suid_binary_only
bash-4.2# ps -f -p $$ -o euid,ruid,suid
 EUID  RUID  SUID
    0  1000     0
bash-4.2# head -1 /etc/shadow
root:locked::0:99999:7:::
bash-4.2# exit
exit
bash-4.2$ /sbin/suid_binary_with_c_code
Error in set gid Operation not permitted
bash-4.2$

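Interestingly, even with the setuid and setgid capabilities dropped for this container, the setuid bit on the binary bypasses these controls! However, when using the C code, the setgid() call is blocked. The binary only carries the setuid bit, so the process has no group ID of 0 to switch to without the setgid capability.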

Docker run dropping setuid only

docker run -u 1000:1000 --cap-add=SYS_PTRACE --cap-drop=setuid -it cap-test:latest
bash-4.2$ /sbin/suid_binary_only
bash-4.2#  ps -f -p $$ -o euid,ruid,suid
 EUID  RUID  SUID
    0  1000     0
bash-4.2#  head -1 /etc/shadow
root:locked::0:99999:7:::
bash-4.2# exit
exit
bash-4.2$ /sbin/suid_binary_with_c_code
bash-4.2# ps -f -p $$ -o euid,ruid,suid
 EUID  RUID  SUID
    0  1000     0
bash-4.2# head -1 /etc/shadow
root:locked::0:99999:7:::
bash-4.2# exit
exit
bash-4.2$ exit
exit

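What is strange here is that even though the setuid capability has been dropped for this container, the setuid() call returns without an error, “bypassing” (in air quotes) the missing capability. The output does show one difference: the real UID stays at 1000, which fits the man page excerpt above, where only a process with CAP_SETUID gets its real and saved UIDs changed as well. Either way, you can still get root access even if the setuid capability is removed!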

TL;DR

The setuid and setgid capabilities can be bypassed in a Docker container through the setuid bit in a file's permissions. To prevent this, use the --security-opt="no-new-privileges" flag when running a container.