Landlock: unprivileged access control

Author

Mickaël Salaün

Date

November 2020

The goal of Landlock is to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM, it makes it possible to create safe security sandboxes as new security layers in addition to the existing system-wide access controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user space applications. Landlock empowers any process, including unprivileged ones, to securely restrict itself.

Landlock rules

A Landlock rule describes an action on an object. An object is currently a file hierarchy, and the related filesystem actions are defined in Access rights. A set of rules is aggregated in a ruleset, which can then restrict the thread enforcing it, and its future children.

Defining and enforcing a security policy

We first need to create the ruleset that will contain our rules. For this example, the ruleset will contain rules that only allow read actions; write actions will be denied. The ruleset therefore needs to handle both of these kinds of actions.

int ruleset_fd;
struct landlock_ruleset_attr ruleset_attr = {
    .handled_access_fs =
        LANDLOCK_ACCESS_FS_EXECUTE |
        LANDLOCK_ACCESS_FS_WRITE_FILE |
        LANDLOCK_ACCESS_FS_READ_FILE |
        LANDLOCK_ACCESS_FS_READ_DIR |
        LANDLOCK_ACCESS_FS_REMOVE_DIR |
        LANDLOCK_ACCESS_FS_REMOVE_FILE |
        LANDLOCK_ACCESS_FS_MAKE_CHAR |
        LANDLOCK_ACCESS_FS_MAKE_DIR |
        LANDLOCK_ACCESS_FS_MAKE_REG |
        LANDLOCK_ACCESS_FS_MAKE_SOCK |
        LANDLOCK_ACCESS_FS_MAKE_FIFO |
        LANDLOCK_ACCESS_FS_MAKE_BLOCK |
        LANDLOCK_ACCESS_FS_MAKE_SYM,
};

ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
if (ruleset_fd < 0) {
    perror("Failed to create a ruleset");
    return 1;
}

We can now add a new rule to this ruleset thanks to the returned file descriptor referring to it. The rule will only allow reading the file hierarchy /usr. Without another rule, write actions would then be denied by the ruleset. To add /usr to the ruleset, we open it with the O_PATH flag and fill the struct landlock_path_beneath_attr with this file descriptor.

int err;
struct landlock_path_beneath_attr path_beneath = {
    .allowed_access =
        LANDLOCK_ACCESS_FS_EXECUTE |
        LANDLOCK_ACCESS_FS_READ_FILE |
        LANDLOCK_ACCESS_FS_READ_DIR,
};

path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
if (path_beneath.parent_fd < 0) {
    perror("Failed to open file");
    close(ruleset_fd);
    return 1;
}
err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
                        &path_beneath, 0);
close(path_beneath.parent_fd);
if (err) {
    perror("Failed to update ruleset");
    close(ruleset_fd);
    return 1;
}

We now have a ruleset with one rule allowing read access to /usr while denying all other handled accesses for the filesystem. The next step is to restrict the current thread from gaining more privileges (e.g. thanks to a SUID binary).

if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
    perror("Failed to restrict privileges");
    close(ruleset_fd);
    return 1;
}
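As a minimal sketch, the no_new_privs step can be wrapped in a helper that also reads the flag back with PR_GET_NO_NEW_PRIVS, which makes it easy to confirm that the thread is in the expected state before enforcing a ruleset. The helper name set_and_verify_no_new_privs is hypothetical and not part of any Landlock API; only prctl(2) and its two options are real.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: set no_new_privs on the calling thread and confirm
 * that it took effect.
 * Returns 0 on success, -1 on failure (errno is set by prctl(2) when the
 * call itself fails).
 */
int set_and_verify_no_new_privs(void)
{
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
        return -1;
    /* PR_GET_NO_NEW_PRIVS returns 1 once the flag is set. */
    if (prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) != 1)
        return -1;
    return 0;
}
```

Once set, the flag cannot be cleared and is inherited across fork(2), clone(2) and execve(2), so calling the helper more than once is harmless.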

The current thread is now ready to sandbox itself with the ruleset.

if (landlock_enforce_ruleset_current(ruleset_fd, 0)) {
    perror("Failed to enforce ruleset");
    close(ruleset_fd);
    return 1;
}
close(ruleset_fd);

If the landlock_enforce_ruleset_current system call succeeds, the current thread is now restricted and this policy will be enforced on all its subsequently created children as well. Once a thread is landlocked, there is no way to remove its security policy; only adding more restrictions is allowed. These threads are now in a new Landlock domain, which is the merge of their parent's domain (if any) with the new ruleset.

Full working code can be found in samples/landlock/sandboxer.c.

Inheritance

Every new thread resulting from a clone(2) inherits Landlock domain restrictions from its parent. This is similar to the seccomp inheritance (cf. Seccomp BPF (SECure COMPuting with filters)) or any other LSM dealing with a task’s credentials(7). For instance, one thread of a process may apply Landlock rules to itself, but they will not be automatically applied to its sibling threads (unlike POSIX thread credential changes, cf. nptl(7)).

When a thread sandboxes itself, we have the guarantee that the related security policy will stay enforced on all this thread’s descendants. This allows creating standalone and modular security policies per application, which will automatically be composed between themselves according to their runtime parent policies.
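The composition described above can be pictured with a small user space model. This is only a conceptual sketch, not the kernel's implementation: the names demo_layer and demo_domain_allows and the DEMO_ACCESS_* bit values are hypothetical, and each layer is reduced to what it says about one single file hierarchy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; real code must use <linux/landlock.h>. */
#define DEMO_ACCESS_READ_FILE  (1ULL << 0)
#define DEMO_ACCESS_WRITE_FILE (1ULL << 1)

/* One policy layer, as seen for a single file hierarchy (hypothetical). */
struct demo_layer {
    uint64_t handled; /* accesses handled by the layer's ruleset */
    uint64_t allowed; /* accesses a rule of that layer allows on the path */
};

/*
 * A request passes only if every layer either does not handle the requested
 * access or explicitly allows it. Stacking a new layer can therefore only
 * remove accesses; it never grants back what a parent layer denies.
 */
bool demo_domain_allows(const struct demo_layer *layers, size_t nb_layers,
                        uint64_t request)
{
    assert(layers || nb_layers == 0);
    for (size_t i = 0; i < nb_layers; i++) {
        uint64_t denied = request & layers[i].handled & ~layers[i].allowed;

        if (denied)
            return false;
    }
    return true;
}
```

In this model, a child layer that allows writing does not help if the parent layer handles write access without allowing it: the request is still denied.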

Ptrace restrictions

A sandboxed process has fewer privileges than a non-sandboxed process and must therefore be subject to additional restrictions when manipulating another process. To be allowed to use ptrace(2) and related syscalls on a target process, a sandboxed process should have a subset of the target process's rules, which means the tracee must be in a sub-domain of the tracer.
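The sub-domain condition can be illustrated with a toy model in which each domain points to the domain it was created from. This is a sketch under assumptions, not the kernel's code: the names demo_domain and demo_may_ptrace are hypothetical, a NULL pointer stands for a non-sandboxed process, and a domain is assumed to count as a sub-domain of itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a Landlock domain and its ancestry. */
struct demo_domain {
    const struct demo_domain *parent; /* NULL for a first-level domain */
};

/*
 * Returns true if, in this model, the tracer may ptrace the tracee as far
 * as Landlock is concerned.
 */
bool demo_may_ptrace(const struct demo_domain *tracer,
                     const struct demo_domain *tracee)
{
    /* A non-sandboxed tracer is not restricted by Landlock. */
    if (!tracer)
        return true;
    /* Walk up from the tracee's domain, looking for the tracer's domain. */
    for (; tracee; tracee = tracee->parent) {
        if (tracee == tracer)
            return true;
    }
    return false;
}
```

With a domain and two domains created from it, the first may trace either of the other two, while neither of those may trace the first or each other.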

Kernel interface

Access rights

A set of actions on kernel objects may be defined by an attribute (e.g. struct landlock_path_beneath_attr) including a bitmask of access.

Filesystem flags

These flags make it possible to restrict a sandboxed process to a set of actions on files and directories. Files or directories opened before the sandboxing are not subject to these restrictions.

A file can only receive these access rights:

  • LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.

  • LANDLOCK_ACCESS_FS_WRITE_FILE: Open a file with write access.

  • LANDLOCK_ACCESS_FS_READ_FILE: Open a file with read access.

A directory can receive access rights related to files or directories. The following access right is applied to the directory itself, and the directories beneath it:

  • LANDLOCK_ACCESS_FS_READ_DIR: Open a directory or list its content.

However, the following access rights only apply to the content of a directory, not the directory itself:

  • LANDLOCK_ACCESS_FS_REMOVE_DIR: Remove an empty directory or rename one.

  • LANDLOCK_ACCESS_FS_REMOVE_FILE: Unlink (or rename) a file.

  • LANDLOCK_ACCESS_FS_MAKE_CHAR: Create (or rename or link) a character device.

  • LANDLOCK_ACCESS_FS_MAKE_DIR: Create (or rename) a directory.

  • LANDLOCK_ACCESS_FS_MAKE_REG: Create (or rename or link) a regular file.

  • LANDLOCK_ACCESS_FS_MAKE_SOCK: Create (or rename or link) a UNIX domain socket.

  • LANDLOCK_ACCESS_FS_MAKE_FIFO: Create (or rename or link) a named pipe.

  • LANDLOCK_ACCESS_FS_MAKE_BLOCK: Create (or rename or link) a block device.

  • LANDLOCK_ACCESS_FS_MAKE_SYM: Create (or rename or link) a symbolic link.
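Because a file can only receive the first three access rights, a program that builds rules from arbitrary paths may want to drop the directory-only rights when the path is not a directory. The following is a minimal sketch of that filtering. The DEMO_ACCESS_* names and bit values are illustrative stand-ins for the LANDLOCK_ACCESS_FS_* flags (real code must take them from the kernel headers), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Illustrative bit values only; real code must use <linux/landlock.h>. */
#define DEMO_ACCESS_EXECUTE    (1ULL << 0)
#define DEMO_ACCESS_WRITE_FILE (1ULL << 1)
#define DEMO_ACCESS_READ_FILE  (1ULL << 2)
#define DEMO_ACCESS_READ_DIR   (1ULL << 3)
#define DEMO_ACCESS_MAKE_REG   (1ULL << 8)

/* The three rights that a file can receive. */
#define DEMO_ACCESS_FILE \
    (DEMO_ACCESS_EXECUTE | DEMO_ACCESS_WRITE_FILE | DEMO_ACCESS_READ_FILE)

/* Drop directory-only rights when the object is not a directory. */
uint64_t demo_access_for_object(uint64_t wanted, int is_dir)
{
    return is_dir ? wanted : (wanted & DEMO_ACCESS_FILE);
}

/*
 * Compute the rights to request for a path. Returns 0 and stores the result
 * in *access, or returns -1 if the path cannot be inspected.
 *
 * stat(2) keeps the sketch short; real code would rather fstat(2) the O_PATH
 * descriptor that it passes as parent_fd, so that the check and the rule
 * refer to the same object.
 */
int demo_access_for_path(const char *path, uint64_t wanted, uint64_t *access)
{
    struct stat st;

    assert(access);
    if (stat(path, &st))
        return -1;
    *access = demo_access_for_object(wanted, S_ISDIR(st.st_mode));
    return 0;
}
```

The result would then be used as the allowed_access value of the rule for that path.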

Warning

It is currently not possible to restrict some file-related actions accessible through these syscall families: chdir(2), truncate(2), stat(2), flock(2), chmod(2), chown(2), setxattr(2), ioctl(2), fcntl(2). Future Landlock evolutions will make it possible to restrict them.

Creating a new ruleset

long sys_landlock_create_ruleset(const struct landlock_ruleset_attr __user *const attr, const size_t size, const __u32 flags)

Create a new ruleset

Parameters

const struct landlock_ruleset_attr __user *const attr

Pointer to a struct landlock_ruleset_attr identifying the scope of the new ruleset.

const size_t size

Size of the pointed struct landlock_ruleset_attr (needed for backward and forward compatibility).

const __u32 flags

Must be 0.

Description

This system call creates a new Landlock ruleset and returns the related file descriptor on success.

Possible returned errors are:

  • EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;

  • EINVAL: flags is not 0, or unknown access, or too small size;

  • E2BIG or EFAULT: attr or size inconsistencies;

  • ENOMSG: empty landlock_ruleset_attr.handled_access_fs.
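The role of the size argument can be illustrated with a user space model of how an extensible structure is typically validated. This is a sketch under assumptions: it follows the usual kernel convention for extensible structures (reject a structure that is too small, accept a larger one only if the bytes unknown to the kernel are zero), and it is an assumption that Landlock applies exactly these checks in this order. The names demo_ruleset_attr and demo_copy_attr are hypothetical, and EFAULT (an invalid user pointer) is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the structure version known to the "kernel". */
struct demo_ruleset_attr {
    uint64_t handled_access_fs;
};

/*
 * Copy a caller-provided attribute of usize bytes into *dst.
 * Returns 0 on success or a negative errno value.
 */
int demo_copy_attr(struct demo_ruleset_attr *dst, const void *src,
                   size_t usize)
{
    const unsigned char *bytes = src;
    const size_t ksize = sizeof(*dst);

    assert(dst && src);
    /* Too small to hold the first version of the structure. */
    if (usize < sizeof(dst->handled_access_fs))
        return -EINVAL;
    /* A larger (newer) structure is fine only if the unknown tail is zero. */
    for (size_t i = ksize; i < usize; i++) {
        if (bytes[i])
            return -E2BIG;
    }
    /* Zero-fill first so that fields missing from the caller read as 0. */
    memset(dst, 0, ksize);
    memcpy(dst, src, usize < ksize ? usize : ksize);
    /* An empty handled_access_fs is reported as ENOMSG. */
    if (!dst->handled_access_fs)
        return -ENOMSG;
    return 0;
}
```

Under this convention, a program built against a newer, larger structure keeps working on an older kernel as long as it leaves the newer fields set to zero.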

struct landlock_ruleset_attr

Ruleset definition

Definition

struct landlock_ruleset_attr {
  __u64 handled_access_fs;
};

Members

handled_access_fs

Bitmask of actions (cf. Filesystem flags) that are handled by this ruleset and should then be forbidden if no rule explicitly allows them. This is needed for backward compatibility reasons.

Description

Argument of sys_landlock_create_ruleset(). This structure can grow in future versions.

Extending a ruleset

long sys_landlock_add_rule(const int ruleset_fd, const enum landlock_rule_type rule_type, const void __user *const rule_attr, const __u32 flags)

Add a new rule to a ruleset

Parameters

const int ruleset_fd

File descriptor tied to the ruleset which should be extended with the new rule.

const enum landlock_rule_type rule_type

Identify the structure type pointed to by rule_attr (only LANDLOCK_RULE_PATH_BENEATH for now).

const void __user *const rule_attr

Pointer to a rule (only of type struct landlock_path_beneath_attr for now).

const __u32 flags

Must be 0.

Description

This system call defines a new rule and adds it to an existing ruleset.

Possible returned errors are:

  • EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;

  • EINVAL: flags is not 0, or inconsistent access in the rule (i.e. landlock_path_beneath_attr.allowed_access is not a subset of the ruleset’s handled accesses);

  • EBADF: ruleset_fd is not a file descriptor for the current thread;

  • EBADFD: ruleset_fd is not a ruleset file descriptor;

  • EPERM: ruleset_fd has no write access to the underlying ruleset;

  • EFAULT: rule_attr inconsistency.
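The inconsistent-access case of EINVAL can be read as a subset check of the rule's allowed_access against the handled_access_fs of the ruleset it is added to. The following sketch models that reading in user space; the function name demo_check_rule_access is hypothetical, and the kernel may perform additional validation that is not shown here.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 if allowed_access only names accesses that the ruleset handles,
 * or -EINVAL if the rule names at least one unhandled access.
 */
int demo_check_rule_access(uint64_t handled_access_fs,
                           uint64_t allowed_access)
{
    if (allowed_access & ~handled_access_fs)
        return -EINVAL;
    return 0;
}
```

In the example from the beginning of this document, the rule for /usr passes this check because EXECUTE, READ_FILE and READ_DIR are all part of the ruleset's handled_access_fs.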

enum landlock_rule_type

Landlock rule type

Constants

LANDLOCK_RULE_PATH_BENEATH

Type of a struct landlock_path_beneath_attr.

Description

Argument of sys_landlock_add_rule().

struct landlock_path_beneath_attr

Path hierarchy definition

Definition

struct landlock_path_beneath_attr {
  __u64 allowed_access;
  __s32 parent_fd;
};

Members

allowed_access

Bitmask of allowed actions for this file hierarchy (cf. Filesystem flags).

parent_fd

File descriptor, opened with O_PATH, which identifies the parent directory of a file hierarchy, or just a file.

Description

Argument of sys_landlock_add_rule().

Enforcing a ruleset

long sys_landlock_enforce_ruleset_current(const int ruleset_fd, const __u32 flags)

Enforce a ruleset on the current task

Parameters

const int ruleset_fd

File descriptor tied to the ruleset to merge with the target.

const __u32 flags

Must be 0.

Description

This system call enforces a Landlock ruleset on the current thread. Enforcing a ruleset requires that the task has CAP_SYS_ADMIN in its namespace or is running with no_new_privs. This avoids scenarios where unprivileged tasks can affect the behavior of privileged children.

Possible returned errors are:

  • EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;

  • EINVAL: flags is not 0;

  • EBADF: ruleset_fd is not a file descriptor for the current thread;

  • EBADFD: ruleset_fd is not a ruleset file descriptor;

  • EPERM: ruleset_fd has no read access to the underlying ruleset, or the current thread is not running with no_new_privs, or it doesn’t have CAP_SYS_ADMIN in its namespace.

Current limitations

File renaming and linking

Because Landlock targets unprivileged access controls, it needs to properly handle the composition of rules. Such a property also implies rule nesting. Properly handling multiple layers of rulesets, each of them able to restrict access to files, also implies inheriting the ruleset restrictions from a parent to its hierarchy. Because files are identified and restricted by their hierarchy, moving or linking a file from one directory to another implies propagating the hierarchy constraints. To protect against privilege escalations through renaming or linking, and for the sake of simplicity, Landlock currently limits linking and renaming to the same directory. Future Landlock evolutions will enable more flexibility for renaming and linking, with dedicated ruleset flags.

OverlayFS

An OverlayFS mount point consists of upper and lower layers. It is currently not possible to reliably infer which underlying file hierarchy matches an OverlayFS path composed of such layers. It is therefore not currently possible to track the source of an indirect access request, nor to properly identify and allow a unified OverlayFS hierarchy. Restricting files in an OverlayFS mount point works, but files allowed in one layer may not be allowed in a related OverlayFS mount point. A future Landlock evolution will make it possible to properly work with OverlayFS, according to a dedicated ruleset flag.

Special filesystems

Access to regular files and directories can be restricted by Landlock, according to the handled accesses of a ruleset. However, files which do not come from a user-visible filesystem (e.g. pipe, socket), but can still be accessed through /proc/self/fd/, cannot currently be restricted. Likewise, some special kernel filesystems such as nsfs, which can be accessed through /proc/self/ns/, cannot currently be restricted. For now, these kinds of special paths are therefore always allowed. Future Landlock evolutions will make it possible to restrict such paths, with dedicated ruleset flags.

Questions and answers

What about user space sandbox managers?

Using a user space process to enforce restrictions on kernel resources can lead to race conditions or inconsistent evaluations (i.e. incorrect mirroring of the OS code and state).

What about namespaces and containers?

Namespaces can help create sandboxes but they are not designed for access control and therefore lack useful features for such a use case (e.g. no fine-grained restrictions). Moreover, their complexity can lead to security issues, especially when untrusted processes can manipulate them (cf. Controlling access to user namespaces).