mirror of synced 2025-03-06 20:59:54 +01:00
linux/kernel
Yan Zhai 5644c6b50f bpf: skip non exist keys in generic_map_lookup_batch
generic_map_lookup_batch currently returns EINTR if bpf_map_copy_value
still fails with ENOENT after several retries. The next batch then
starts from the same location, on the presumption that the failure was
transient. This is incorrect if a map can legitimately have "holes",
i.e. if "get_next_key" can return a key that does not point to a valid
value. At least the array-of-maps type may contain such holes. Right
now, once such a hole shows up, generic batch lookup can no longer make
progress: it will always fail with EINTR.

Instead, do not retry in generic_map_lookup_batch. If it finds a
non-existent element, skip to the next key. This simple solution comes
at a price: transient errors may not be recovered from, and the
iteration might cycle back to the first key under concurrent deletion.
For example, Hou Tao <houtao@huaweicloud.com> pointed out the following
scenario:

For LPM trie map:
(1) ->map_get_next_key(map, prev_key, key) returns a valid key

(2) bpf_map_copy_value() returns -ENOENT
It means the key must have been deleted concurrently.

(3) goto next_key
It swaps the prev_key and key

(4) ->map_get_next_key(map, prev_key, key) again
prev_key points to a non-existent key. LPM trie treats this just like
the prev_key=NULL case, so the returned key will be a duplicate.

With the retry logic, the iteration can continue to the key next to the
deleted one. But if we directly skip to the next key, the iteration loop
would restart from the first key for the lpm_trie type.

However, not all races can be recovered from, even with the retry
logic. For example, if the current key is deleted after
bpf_map_copy_value instead of before it, or if prev_key also gets
deleted, the loop will still restart from the first key for lpm_trie.
For generic lookup it might be better to stay simple, i.e. just skip to
the next key. To guarantee that the output keys are not duplicated, it
is better to implement map-type-specific batch operations, which can
properly lock the trie and synchronize with concurrent mutators.

Fixes: cb4d03ab49 ("bpf: Add generic support for lookup batch op")
Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/85618439eea75930630685c467ccefeac0942e2b.1739171594.git.yan@cloudflare.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-18 17:27:37 -08:00
bpf bpf: skip non exist keys in generic_map_lookup_batch 2025-02-18 17:27:37 -08:00
cgroup drm next for 6.14-rc1 2025-01-21 16:09:47 -08:00
configs configs/debug: make sure PROVE_RCU_LIST=y takes effect 2024-10-28 10:21:09 -07:00
debug kdb: Remove unused flags stack 2025-01-25 08:22:26 +00:00
dma dma-debug: fix physical address calculation for struct dma_debug_entry 2024-11-28 10:19:16 +01:00
entry sched: Add TIF_NEED_RESCHED_LAZY infrastructure 2024-11-05 12:55:37 +01:00
events Performance events changes for v6.14: 2025-01-21 10:52:03 -08:00
futex Mainly individually changelogged singleton patches. The patch series in 2025-01-26 17:50:53 -08:00
gcov gcov: clang: use correct function param names 2025-01-24 22:47:27 -08:00
irq Updates for the interrupt subsystem: 2025-01-21 13:51:07 -08:00
kcsan kcsan: Remove redundant call of kallsyms_lookup_name() 2024-10-14 16:44:56 +02:00
livepatch livepatch: Add stack_order sysfs attribute 2024-12-09 11:44:03 +01:00
locking RCU pull request for v6.14 2025-01-21 14:39:21 -08:00
module Driver core and debugfs updates 2025-01-28 12:25:12 -08:00
power The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
printk Merge branch 'for-6.14-cpu_sync-fixup' into for-linus 2025-01-20 13:40:52 +01:00
rcu The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
sched The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
time Updates for timers and timekeeping: 2025-01-21 13:16:00 -08:00
trace rv: tools/rtla: Updates for 6.14 2025-01-26 14:25:58 -08:00
.gitignore
acct.c acct: avoid pointless reference count bump 2024-12-02 11:25:13 +01:00
async.c async: Use a dedicated unbound workqueue with raised min_active 2024-02-09 11:13:59 -10:00
audit.c lsm: replace context+len with lsm_context 2024-12-04 14:42:31 -05:00
audit.h audit: change context data from secid to lsm_prop 2024-10-11 14:34:16 -04:00
audit_fsnotify.c
audit_tree.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
audit_watch.c fsnotify: create a wrapper fsnotify_find_inode_mark() 2024-04-04 16:24:16 +02:00
auditfilter.c audit: fix suffixed '/' filename matching 2024-12-05 19:22:38 -05:00
auditsc.c lsm/stable-6.14 PR 20250121 2025-01-21 20:03:04 -08:00
backtracetest.c backtracetest: add MODULE_DESCRIPTION() 2024-06-24 22:24:55 -07:00
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-04-29 08:29:29 -07:00
capability.c kernel: remove get_task_comm() and print task comm directly 2025-01-12 20:21:15 -08:00
cfi.c
compat.c
configs.c
context_tracking.c context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching 2024-08-15 21:30:43 +05:30
cpu.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
cpu_pm.c
crash_core.c kexec/crash: no crash update when kexec in progress 2024-11-05 17:12:27 -08:00
crash_reserve.c crash: fix crash memory reserve exceed system memory bug 2024-09-01 20:43:30 -07:00
cred.c cred: remove old {override,revert}_creds() helpers 2024-12-02 11:25:09 +01:00
delayacct.c delayacct: add delay min to record delay peak 2025-01-12 20:21:16 -08:00
dma.c
elfcorehdr.c crash: remove dependency of FA_DUMP on CRASH_DUMP 2024-02-23 17:48:22 -08:00
exec_domain.c
exit.c remove pointless includes of <linux/fdtable.h> 2024-10-07 13:34:41 -04:00
exit.h
extable.c
fail_function.c
fork.c Mainly individually changelogged singleton patches. The patch series in 2025-01-26 17:50:53 -08:00
freezer.c sched/fair: Fix external p->on_rq users 2024-10-14 09:14:35 +02:00
gen_kheaders.sh kheaders: Ignore silly-rename files 2024-12-20 22:07:55 +01:00
groups.c
hung_task.c hung_task: add task->flags, blocked by coredump to log 2025-01-24 22:47:24 -08:00
iomem.c
irq_work.c kasan: make kasan_record_aux_stack_noalloc() the default behaviour 2025-01-13 22:40:36 -08:00
jump_label.c jump_label: Fix static_key_slow_dec() yet again 2024-09-10 11:57:27 +02:00
kallsyms.c kallsyms: Match symbols exactly with CONFIG_LTO_CLANG 2024-08-15 09:33:35 -07:00
kallsyms_internal.h kallsyms: get rid of code for absolute kallsyms 2024-07-20 16:33:21 +09:00
kallsyms_selftest.c kallsyms: Use kthread_run_on_cpu() 2025-01-02 22:12:12 +01:00
kallsyms_selftest.h
kcmp.c get rid of ...lookup...fdget_rcu() family 2024-10-07 13:34:41 -04:00
Kconfig.freezer
Kconfig.hz
Kconfig.kexec crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32 2024-11-14 22:43:48 -08:00
Kconfig.locks
Kconfig.preempt sched: No PREEMPT_RT=y for all{yes,mod}config 2024-11-07 15:25:05 +01:00
kcov.c kcov: mark in_softirq_really() as __always_inline 2024-12-30 17:59:08 -08:00
kexec.c crash: add a new kexec flag for hotplug support 2024-04-23 14:59:01 +10:00
kexec_core.c kexec_core: Add and update comments regarding the KEXEC_JUMP flow 2025-01-14 13:03:34 +01:00
kexec_elf.c
kexec_file.c kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y 2024-09-01 17:59:01 -07:00
kexec_internal.h kexec: use atomic_try_cmpxchg_acquire() in kexec_trylock() 2024-09-01 20:43:23 -07:00
kheaders.c kheaders: Simplify attribute through __BIN_ATTR_SIMPLE_RO() 2024-12-24 09:46:49 +01:00
kprobes.c kprobes: Remove remaining gotos 2025-01-10 09:00:13 +09:00
ksyms_common.c
ksysfs.c kernel/ksysfs.c: simplify bin_attribute definition 2025-01-07 16:59:15 +01:00
kthread.c Mainly individually changelogged singleton patches. The patch series in 2025-01-26 17:50:53 -08:00
latencytop.c latencytop: use correct kernel-doc format for func params 2025-01-24 22:47:27 -08:00
Makefile mm: move kernel/numa.c to mm/ 2024-09-03 21:15:26 -07:00
module_signature.c
notifier.c reboot: move reboot_notifier_list to kernel/reboot.c 2024-11-05 17:12:31 -08:00
nsproxy.c fdget(), trivial conversions 2024-11-03 01:28:06 -05:00
padata.c padata: avoid UAF for reorder_work 2025-01-19 12:44:28 +08:00
panic.c drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
params.c module: Constify 'struct module_attribute' 2025-01-26 13:05:23 +01:00
pid.c kernel-6.14-rc1.pid 2025-01-20 10:29:11 -08:00
pid_namespace.c pid: allow pid_max to be set per pid namespace 2024-12-02 11:25:25 +01:00
pid_sysctl.h sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
profile.c profiling: remove profile=sleep support 2024-08-04 13:36:28 -07:00
ptrace.c ptrace_attach: shift send(SIGSTOP) into ptrace_set_stopped() 2024-02-22 15:38:52 -08:00
range.c
reboot.c kernel/reboot: replace sprintf() with sysfs_emit() 2024-11-11 17:17:05 -08:00
regset.c regset: use kvzalloc() for regset_get_alloc() 2024-04-25 21:07:03 -07:00
relay.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
resource.c kernel/resource: simplify API __devm_release_region() implementation 2025-01-12 20:20:58 -08:00
resource_kunit.c resource, kunit: fix user-after-free in resource_test_region_intersects() 2024-10-09 12:47:19 -07:00
rseq.c rseq: Fix rseq unregistration regression 2025-01-21 08:10:51 +01:00
scftorture.c scftorture: Handle NULL argument passed to scf_add_to_free_list(). 2024-11-14 16:09:51 -08:00
scs.c
seccomp.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
signal.c signal/posixtimers: Handle ignore/blocked sequences correctly 2025-01-15 18:08:01 +01:00
smp.c CSD-lock pull request for v6.14 2025-01-28 11:34:03 -08:00
smpboot.c
smpboot.h
softirq.c softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel 2024-12-02 12:01:27 +01:00
stackleak.c stackleak: Use str_enabled_disabled() helper in stack_erasing_sysctl() 2024-12-22 20:28:11 -08:00
stacktrace.c stacktrace: fix kernel-doc typo 2023-12-29 12:22:29 -08:00
static_call.c
static_call_inline.c x86/static-call: provide a way to do very early static-call updates 2024-12-13 09:28:32 +01:00
stop_machine.c stop_machine: Fix rcu_momentary_eqs() call in multi_cpu_stop() 2024-12-11 20:50:47 -08:00
sys.c tracing: Add task_prctl_unknown tracepoint 2024-12-22 20:28:11 -08:00
sys_ni.c Probes updates for v6.11: 2024-07-18 12:19:20 -07:00
sysctl-test.c sysctl: Add module description to sysctl-testing 2024-06-03 15:20:37 +02:00
sysctl.c pid: allow pid_max to be set per pid namespace 2024-12-02 11:25:25 +01:00
task_work.c kasan: make kasan_record_aux_stack_noalloc() the default behaviour 2025-01-13 22:40:36 -08:00
taskstats.c fdget(), more trivial conversions 2024-11-03 01:28:06 -05:00
torture.c torture: Add MODULE_DESCRIPTION() 2024-05-30 15:31:38 -07:00
tracepoint.c tracing: Fix syscall tracepoint use-after-free 2024-11-01 14:37:31 -04:00
tsacct.c tsacct: replace strncpy() with strscpy() 2024-07-12 16:39:53 -07:00
ucount.c ucounts: move kfree() out of critical zone protected by ucounts_lock 2025-01-12 20:21:00 -08:00
uid16.c
uid16.h
umh.c remove pointless includes of <linux/fdtable.h> 2024-10-07 13:34:41 -04:00
up.c
user-return-notifier.c
user.c uidgid: make sure we fit into one cacheline 2024-09-12 12:16:09 +02:00
user_namespace.c user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation 2024-09-09 16:47:42 -07:00
usermode_driver.c
utsname.c
utsname_sysctl.c sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
vhost_task.c vhost_task: Handle SIGKILL by flushing work and exiting 2024-05-22 08:31:15 -04:00
vmcore_info.c mm: support only one page_type per page 2024-09-03 21:15:43 -07:00
watch_queue.c watch_queue: Use page->private instead of page->index 2024-12-22 11:29:51 +01:00
watchdog.c watchdog: output this_cpu when printing hard LOCKUP 2025-01-12 20:21:05 -08:00
watchdog_buddy.c
watchdog_perf.c watchdog/perf: properly initialize the turbo mode timestamp and rearm counter 2024-07-17 21:11:34 -07:00
workqueue.c The various patchsets are summarized below. Plus of course many 2025-01-26 18:36:23 -08:00
workqueue_internal.h