Re: [PATCH bpf-next v9 2/4] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs

From: Kumar Kartikeya Dwivedi
Date: Mon Nov 21 2022 - 11:07:17 EST


On Mon, Nov 21, 2022 at 09:01:16PM IST, David Vernet wrote:
> On Mon, Nov 21, 2022 at 01:15:48AM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Sun, Nov 20, 2022 at 10:40:02AM IST, David Vernet wrote:
> > > Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
> > > to the verifier that it should enforce that a BPF program passes it a
> > > "safe", trusted pointer. Currently, "safe" means that the pointer is
> > > either PTR_TO_CTX, or is refcounted. There may be cases, however, where
> > > the kernel passes a BPF program a safe / trusted pointer to an object
> > > that the BPF program wishes to use as a kptr, but because the object
> > > does not yet have a ref_obj_id from the perspective of the verifier, the
> > > program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
> > > kfunc.
> > >
> > > The solution is to expand the set of pointers that are considered
> > > trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
> > > with these pointers without getting rejected by the verifier.
> > >
> > > There is already a PTR_UNTRUSTED flag that is set in some scenarios,
> > > such as when a BPF program reads a kptr directly from a map
> > > without performing a bpf_kptr_xchg() call. These pointers of course can
> > > and should be rejected by the verifier. Unfortunately, however,
> > > PTR_UNTRUSTED does not cover all the cases for safety that need to
> > > be addressed to adequately protect kfuncs. Specifically, pointers
> > > obtained by a BPF program "walking" a struct are _not_ considered
> > > PTR_UNTRUSTED according to BPF. For example, say that we were to add a
> > > kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
> > > acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
> > > that a task was unsafe to pass to a kfunc, the verifier would mistakenly
> > > allow the following unsafe BPF program to be loaded:
> > >
> > > SEC("tp_btf/task_newtask")
> > > int BPF_PROG(unsafe_acquire_task,
> > > struct task_struct *task,
> > > u64 clone_flags)
> > > {
> > > struct task_struct *acquired, *nested;
> > >
> > > nested = task->last_wakee;
> > >
> > > /* Would not be rejected by the verifier. */
> > > acquired = bpf_task_acquire(nested);
> > > if (!acquired)
> > > return 0;
> > >
> > > bpf_task_release(acquired);
> > > return 0;
> > > }
> > >
> > > To address this, this patch defines a new type flag called PTR_TRUSTED
> > > which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
> > > KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
> > > passed directly from the kernel as a tracepoint or struct_ops callback
> > > argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
> > > pointer is no longer PTR_TRUSTED. From the example above, the struct
> > > task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
> > > obtained from 'task->last_wakee' is not PTR_TRUSTED.
> > >
> > > A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
> > > and then another patch will add selftests to validate.
> > >
> > > Signed-off-by: David Vernet <void@xxxxxxxxxxxxx>
> > > ---
> >
> > Sorry that I couldn't look at it earlier.
> >
> > > [...]
> > > @@ -5884,8 +5889,18 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
> > > static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
> > > static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } };
> > > static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
> > > -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
> > > -static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
> > > +static const struct bpf_reg_types btf_ptr_types = {
> > > + .types = {
> > > + PTR_TO_BTF_ID,
> > > + PTR_TO_BTF_ID | PTR_TRUSTED,
> > > + },
> > > +};
> > > +static const struct bpf_reg_types percpu_btf_ptr_types = {
> > > + .types = {
> > > + PTR_TO_BTF_ID | MEM_PERCPU,
> > > + PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
> >
> > Where is PTR_TRUSTED set for MEM_PERCPU?
>
> We set PTR_TRUSTED in btf_ctx_access() for all PTR_TO_BTF_ID regs for
> BPF_PROG_TYPE_TRACING and BPF_PROG_TYPE_STRUCT_OPS. See [0]. Let me know
> if I've misunderstood anything.
>
> [0]: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/bpf/btf.c#n5972
>

Ah, I see. Makes sense.

> > > + }
> > > +};
> > > static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> > > static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> > > static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > > @@ -5973,7 +5988,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > > return -EACCES;
> > >
> > > found:
> > > - if (reg->type == PTR_TO_BTF_ID) {
> > > + if (reg->type == PTR_TO_BTF_ID || reg->type & PTR_TRUSTED) {
> >
> > Now, earlier MEM_ALLOC was supposed to not enter this branch. If your patch
> > allows MEM_ALLOC | PTR_TRUSTED (but I don't think it does), it will enter this
> > branch. I think it is better to just be explicit and say PTR_TO_BTF_ID ||
> > PTR_TO_BTF_ID | PTR_TRUSTED.
>
> Currently I don't believe we set PTR_TRUSTED | MEM_ALLOC, so this won't
> happen. I originally had this code doing:
>
> if (reg->type == PTR_TO_BTF_ID || reg->type & BPF_REG_TRUSTED_MODIFIERS) {
>
> and it caused a bunch of the linked list tests to fail with:
>
> verifier internal error: R0 has non-overwritten BPF_PTR_POISON type
>

Yes, because that will make MEM_ALLOC enter this branch for
bpf_spin_lock/bpf_spin_unlock, which is what shouldn't be happening. The else if
(type_is_alloc) is precisely to handle MEM_ALLOC case.

> Checking just PTR_TRUSTED avoids this (which I assume is what you were
> worried about?). I'm happy to respin a patch that applies your
> suggestion to do || PTR_TO_BTF_ID | PTR_TRUSTED, but to be honest I
> don't think it buys us anything. That whole codepath where we take it
> only in the event of no modifiers is kind of sketchy. Consider, e.g.,
> that we're skipping this check if we don't take that path:

It should be taken for PTR_TO_BTF_ID | PTR_TRUSTED, but not those with
MEM_ALLOC.
>
> if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> btf_vmlinux, *arg_btf_id,
> strict_type_match)) {
> verbose(env, "R%d is of type %s but %s is expected\n",
> regno, kernel_type_name(reg->btf, reg->btf_id),
> kernel_type_name(btf_vmlinux, *arg_btf_id));
> return -EACCES;
> }
>

That's because we shouldn't take that path. MEM_ALLOC is for prog BTF
PTR_TO_BTF_ID, matching with btf_vmlinux types is incorrect.

You won't see errors now because that case of MEM_ALLOC | PTR_TRUSTED is not
happening.

> I know we check it elsewhere such as in map_kptr_match_type() and
> process_kf_arg_ptr_to_list_node(), but it feels pretty brittle to say:
> "Check it only if there are no modifiers set, else check it later in
> some helper-specific logic". I'd prefer to keep the check as broad as
> possible for now, and then refactor and clean this up. Lmk if you
> disagree.
>

I think this one needs to be fixed, both MEM_ALLOC and MEM_ALLOC | PTR_TRUSTED
should go to that else if branch. This should only be taken for PTR_TO_BTF_ID
and PTR_TO_BTF_ID | PTR_TRUSTED.

> >
> > > /* For bpf_sk_release, it needs to match against first member
> > > * 'struct sock_common', hence make an exception for it. This
> > > * allows bpf_sk_release to work for multiple socket types.
> > > @@ -6055,6 +6070,8 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> > > */
> > > case PTR_TO_BTF_ID:
> > > case PTR_TO_BTF_ID | MEM_ALLOC:
> > > + case PTR_TO_BTF_ID | PTR_TRUSTED:
> > > + case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED:
> >
> > This and the one below:
> >
> > > @@ -8366,6 +8402,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_
> > > ptr = reg->map_ptr;
> > > break;
> > > case PTR_TO_BTF_ID | MEM_ALLOC:
> > > + case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED:
> >
> > I think this will never be set, based on my reading of the code.
> > Is the case with MEM_ALLOC | PTR_TRUSTED ever possible?
> > And if this is needed here, why not update btf_struct_access?
> > And KF_ARG_PTR_TO_ALLOC_BTF_ID is not updated either?
> > Let me know if I missed something.
>
> These are all reasonable observations, but we went into them
> intentionally. Eventually the goal is to have PTR_TRUSTED be the single
> source of truth for whether a pointer is trusted or not. See [1] for the
> thread with the discussions.
>
> I agree that I don't believe that MEM_ALLOC | PTR_TRUSTED can be set
> together yet, but eventually they should and will be. Conceptually, the
> behavior of check_func_arg_reg_off() should be the same for
> PTR_TO_BTF_ID, PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED, PTR_TO_BTF_ID |
> PTR_TRUSTED, etc, so IMO it's correct to add that case to
> check_func_arg_reg_off() even if it's not yet used. Not adding it
> because no callers currently happen to require it is IMO a bit brittle.
>

I don't have a problem with PTR_TRUSTED being the source of truth. I think it's
fine.

I was just pointing out that the checks are there in some places but not all,
even if there are no users, you should be accounting for MEM_ALLOC | PTR_TRUSTED
either everywhere or nowhere. It was a bit confusing to see it in
check_reg_allocation_locked right now but not in check_ptr_to_btf_access (e.g.
it would disallow writes for MEM_ALLOC | PTR_TRUSTED), or in kfunc handling.

But I guess you plan to address that in a follow up, so it's not a big deal.
It would be a great improvement over the status quo if we can make this work
properly, and then finally flip KF_TRUSTED_ARGS eventually to default on.

> [1]: https://lore.kernel.org/all/20221119164855.qvhgdpg5axa7kzey@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> > > /* When referenced PTR_TO_BTF_ID is passed to release function,
> > > * it's fixed offset must be 0. In the other cases, fixed offset
> > > * can be non-zero.
> > > @@ -7939,6 +7956,25 @@ static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
> > > return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
> > > }
> > >
> > > +static bool is_trusted_reg(const struct bpf_reg_state *reg)
> > > +{
> > > + /* A referenced register is always trusted. */
> > > + if (reg->ref_obj_id)
> > > + return true;
> > > +
> > > + /* If a register is not referenced, it is trusted if it has either the
> > > + * MEM_ALLOC or PTR_TRUSTED type modifiers, and no others. Some of the
> > > + * other type modifiers may be safe, but we elect to take an opt-in
> > > + * approach here as some (e.g. PTR_UNTRUSTED and PTR_MAYBE_NULL) are
> > > + * not.
> > > + *
> > > + * Eventually, we should make PTR_TRUSTED the single source of truth
> > > + * for whether a register is trusted.
> > > + */
> > > + return type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS &&
> > > + !bpf_type_has_unsafe_modifiers(reg->type);
> > > +}
> > > +
> > > static bool __kfunc_param_match_suffix(const struct btf *btf,
> > > const struct btf_param *arg,
> > > const char *suffix)
> > > @@ -8220,7 +8256,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
> > > const char *reg_ref_tname;
> > > u32 reg_ref_id;
> > >
> > > - if (reg->type == PTR_TO_BTF_ID) {
> > > + if (base_type(reg->type) == PTR_TO_BTF_ID) {
> > > reg_btf = reg->btf;
> > > reg_ref_id = reg->btf_id;
> > > } else {
> > > ptr = reg->btf;
> > > break;
> > > default:
> > > @@ -8596,8 +8633,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> > > case KF_ARG_PTR_TO_BTF_ID:
> > > if (!is_kfunc_trusted_args(meta))
> > > break;
> > > - if (!reg->ref_obj_id) {
> > > - verbose(env, "R%d must be referenced\n", regno);
> > > +
> > > + if (!is_trusted_reg(reg)) {
> > > + verbose(env, "R%d must be referenced or trusted\n", regno);
> > > return -EINVAL;
> > > }
> > > fallthrough;
> > > @@ -8702,9 +8740,13 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> > > break;
> > > case KF_ARG_PTR_TO_BTF_ID:
> > > /* Only base_type is checked, further checks are done here */
> > > - if (reg->type != PTR_TO_BTF_ID &&
> > > - (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) {
> > > - verbose(env, "arg#%d expected pointer to btf or socket\n", i);
> > > + if ((base_type(reg->type) != PTR_TO_BTF_ID ||
> > > + bpf_type_has_unsafe_modifiers(reg->type)) &&
> > > + !reg2btf_ids[base_type(reg->type)]) {
> > > + verbose(env, "arg#%d is %s ", i, reg_type_str(env, reg->type));
> > > + verbose(env, "expected %s or socket\n",
> > > + reg_type_str(env, base_type(reg->type) |
> > > + (type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS)));
> > > return -EINVAL;
> > > }
> > > ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i);
> > > @@ -14713,6 +14755,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> > > break;
> > > case PTR_TO_BTF_ID:
> > > case PTR_TO_BTF_ID | PTR_UNTRUSTED:
> > > + case PTR_TO_BTF_ID | PTR_TRUSTED:
> > > /* PTR_TO_BTF_ID | MEM_ALLOC always has a valid lifetime, unlike
> > > * PTR_TO_BTF_ID, and an active ref_obj_id, but the same cannot
> > > * be said once it is marked PTR_UNTRUSTED, hence we must handle
> > > @@ -14720,6 +14763,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> > > * for this case.
> > > */
> > > case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED:
> > > + case PTR_TO_BTF_ID | PTR_UNTRUSTED | PTR_TRUSTED:
> >
> > I feel this is confusing. What do we mean with PTR_UNTRUSTED | PTR_TRUSTED?
>
> 100% agreed. There are plans to clean this up, see the link above.

Great, looking forward.