mirror of synced 2025-03-06 20:59:54 +01:00
linux/net/ipv6/netfilter
Florian Westphal 18685451fc inet: inet_defrag: prevent sk release while still in use
ip_local_out() and other functions can pass skb->sk as function argument.

If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.

This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.

Eric Dumazet made an initial analysis of this bug.  Quoting Eric:
  Calling ip_defrag() in output path is also implying skb_orphan(),
  which is buggy because output path relies on sk not disappearing.

  A relevant old patch about the issue was :
  8282f27449 ("inet: frag: Always orphan skbs inside ip_defrag()")

  [..]

  net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
  inet socket, not an arbitrary one.

  If we orphan the packet in ipvlan, then downstream things like FQ
  packet scheduler will not work properly.

  We need to change ip_defrag() to only use skb_orphan() when really
  needed, ie whenever frag_list is going to be used.

Eric suggested stashing sk in the fragment queue and made an initial patch.
However, there is a problem with this:

If the skb is refragmented right after, ip_do_fragment() will copy
head->sk to the new fragments and set their destructor to sock_wfree.
IOW, we have no choice but to fix up sk_wmem accounting to reflect the
fully reassembled skb, else wmem will underflow.

This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.
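
The aliasing problem described above can be illustrated with a small userspace sketch. This is not kernel code: struct mock_skb, struct mock_frag_cb and the helper names are hypothetical stand-ins, and the layout is reduced to the two fields that matter here. It only shows why an offset that shares storage with the sk pointer clobbers it, and why keeping the offset in the control block does not.

```c
#include <assert.h>
#include <string.h>

struct mock_sock { int id; };

/* Simplified stand-in for struct sk_buff (assumption: only these fields). */
struct mock_skb {
	union {
		struct mock_sock *sk;   /* owning socket */
		int ip_defrag_offset;   /* old layout: shares storage with sk */
	} u;
	char cb[48];                    /* per-layer control block */
};

/* Hypothetical per-fragment control block, standing in for FRAG_CB. */
struct mock_frag_cb {
	int ip_defrag_offset;
};

/* Old scheme: the offset is written over the bytes that hold skb->sk. */
static void store_offset_aliased(struct mock_skb *skb, int offset)
{
	assert(offset >= 0);
	skb->u.ip_defrag_offset = offset;
}

/* New scheme: the offset lives in the control block, sk is untouched. */
static void store_offset_cb(struct mock_skb *skb, int offset)
{
	struct mock_frag_cb fcb = { offset };

	assert(offset >= 0);
	memcpy(skb->cb, &fcb, sizeof(fcb));
}

static int load_offset_cb(const struct mock_skb *skb)
{
	struct mock_frag_cb fcb;

	memcpy(&fcb, skb->cb, sizeof(fcb));
	return fcb.ip_defrag_offset;
}

/* Returns 1 if the bytes of skb->u.sk still encode the expected pointer. */
static int sk_intact(const struct mock_skb *skb, const struct mock_sock *expected)
{
	return memcmp(&skb->u.sk, &expected, sizeof(expected)) == 0;
}
```

With the control-block variant, sk_intact() still holds after the offset is stored, which is the property the delayed orphaning relies on.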

This allows delaying the orphaning long enough to learn whether the skb
has to be queued or whether it completes the reasm queue.

In the former case, things work as before, skb is orphaned.  This is
safe because skb gets queued/stolen and won't continue past reasm engine.

In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accounting when inet_frag inflates truesize.
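
The accounting fixup for the latter case can be sketched as a userspace model. Everything here is an assumption made for illustration: the mock_* types and functions are hypothetical, wmem_alloc is a plain signed counter so that an underflow shows up as a negative value, and the real inet_frag code differs in detail. The model covers only the arithmetic: the socket is charged for the last fragment alone, the head's truesize grows to cover the whole datagram, and a sock_wfree-style release later subtracts the inflated truesize.

```c
#include <assert.h>
#include <stddef.h>

struct mock_sock {
	long wmem_alloc;        /* signed so that an underflow is visible */
};

struct mock_skb {
	struct mock_sock *sk;
	unsigned int truesize;
};

/* Charge the skb to the socket, as taking write ownership would. */
static void mock_set_owner_w(struct mock_skb *skb, struct mock_sock *sk)
{
	skb->sk = sk;
	sk->wmem_alloc += skb->truesize;
}

/* Drop ownership: a sock_wfree-style release of the skb's charge. */
static void mock_orphan(struct mock_skb *skb)
{
	if (skb->sk) {
		skb->sk->wmem_alloc -= skb->truesize;
		skb->sk = NULL;
	}
}

/*
 * `last` completes the queue; `head` was queued (and orphaned) earlier.
 * Steal last->sk, attach it to head, and inflate head->truesize.
 * With fix_wmem set, the socket charge is raised to match the new truesize.
 */
static void mock_reasm_finish(struct mock_skb *head, struct mock_skb *last,
			      unsigned int total_truesize, int fix_wmem)
{
	struct mock_sock *sk = last->sk;
	unsigned int charged = last->truesize;

	assert(total_truesize >= charged);

	last->sk = NULL;                /* charge for `last` stays on sk */
	head->truesize = total_truesize;
	head->sk = sk;
	if (sk && fix_wmem)
		sk->wmem_alloc += (long)total_truesize - (long)charged;
}
```

In this model, releasing the head after mock_reasm_finish() with fix_wmem set returns wmem_alloc to zero, while skipping the fixup leaves it negative by the truesize of the fragments that were orphaned earlier.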

Fixes: 7026b1ddb6 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 12:06:22 +01:00
ip6_tables.c xtables: move icmp/icmpv6 logic to xt_tcpudp 2023-03-22 21:48:59 +01:00
ip6t_ah.c netfilter: ip6tables: Remove redundant null checks 2020-07-29 20:39:43 +02:00
ip6t_eui64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
ip6t_frag.c netfilter: ip6tables: Remove redundant null checks 2020-07-29 20:39:43 +02:00
ip6t_hbh.c netfilter: ip6tables: Remove redundant null checks 2020-07-29 20:39:43 +02:00
ip6t_ipv6header.c netfilter: move inline nf_ip6_ext_hdr() function to a more appropriate header. 2019-09-13 12:34:09 +02:00
ip6t_mh.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
ip6t_NPT.c netfilter: ip6t_NPT: rewrite addresses in ICMPv6 original packet 2020-08-28 19:18:48 +02:00
ip6t_REJECT.c netfilter: use actual socket sk for REJECT action 2020-12-01 14:33:55 +01:00
ip6t_rpfilter.c netfilter: ip6t_rpfilter: Fix regression with VRF interfaces 2023-02-22 00:22:20 +01:00
ip6t_rt.c netfilter: ip6t_rt: fix rt0_hdr parsing in rt_mt6 2021-10-14 23:08:35 +02:00
ip6t_srh.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ip6t_SYNPROXY.c netfilter: Add MODULE_DESCRIPTION entries to kernel modules 2020-06-25 00:50:31 +02:00
ip6table_filter.c netfilter: ip6tables: allow use of ip6t_do_table as hookfn 2021-10-14 23:06:53 +02:00
ip6table_mangle.c netfilter: xt_mangle: only check verdict part of return value 2023-10-18 10:26:43 +02:00
ip6table_nat.c netfilter: add missing module descriptions 2023-11-08 13:52:32 +01:00
ip6table_raw.c netfilter: add missing module descriptions 2023-11-08 13:52:32 +01:00
ip6table_security.c netfilter: ip6tables: allow use of ip6t_do_table as hookfn 2021-10-14 23:06:53 +02:00
Kconfig netfilter: xtables: allow xtables-nft only builds 2024-01-29 15:43:21 +01:00
Makefile netfilter: xtables: allow xtables-nft only builds 2024-01-29 15:43:21 +01:00
nf_conntrack_reasm.c inet: inet_defrag: prevent sk release while still in use 2024-03-28 12:06:22 +01:00
nf_defrag_ipv6_hooks.c netfilter: add missing module descriptions 2023-11-08 13:52:32 +01:00
nf_dup_ipv6.c netfilter: drop bridge nf reset from nf_reset 2019-10-01 18:42:15 +02:00
nf_reject_ipv6.c ipv6: annotate data-races around cnf.hop_limit 2024-03-01 08:42:31 +00:00
nf_socket_ipv6.c tcp: Access &tcp_hashinfo via net. 2022-09-20 10:21:49 -07:00
nf_tproxy_ipv6.c netfilter: tproxy: fix deadlock due to missing BH disable 2023-03-06 12:09:48 +01:00
nft_dup_ipv6.c netfilter: nf_tables: Extend nft_expr_ops::dump callback parameters 2022-11-15 10:46:34 +01:00
nft_fib_ipv6.c netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces. 2022-10-19 08:46:48 +02:00
nft_reject_ipv6.c netfilter: nf_tables: do not reduce read-only expressions 2022-03-20 00:29:46 +01:00