1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

135935 commits

Author SHA1 Message Date
Greg Kroah-Hartman
7a19006b60 kernfs: remove unneeded #if 0 guard
Commit f2eb478f2f ("kernfs: move struct kernfs_root out of the public
view.") moved kernfs_root out of kernfs.h, but my debugging code of a
 #if 0 was left in accidentally.  Fix that up by removing the guards.

Fixes: f2eb478f2f ("kernfs: move struct kernfs_root out of the public view.")
Cc: Tejun Heo <tj@kernel.org>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20220318073452.1486568-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-18 09:47:04 +01:00
Greg Kroah-Hartman
4a248f85b3 Merge 5.17-rc6 into driver-core-next
We need the driver core fix in here as well for future changes.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-28 07:45:41 +01:00
Linus Torvalds
98f3e84f8d dma-mapping fix for Linux 5.17
- fix a swiotlb info leak (Halil Pasic)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmIbvm8LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMcBg/8DD+QsKEEv+D/+bCSWZVbW1ekFDiDyEdO9xuSymxT
 pf56Dy693G0BoPZ1JFiN17LIOp3vsHQfwE4ZDo/OHmyDTcziepD40SZiHD3eqEDB
 cl7RXhAcoxnUMkgK+hIb6Q7t/4dXaC3+rVXUuL//xUjXZu7GML46tRzuDoVr9MZp
 MBSGdZlXIzfvpzNf+yVzpk7sZP/w4nrzyuzeE5T6fdipjN2o78PhWmZzqsHGujeI
 ptMuZQwwDuXXetQNiA/t0+DkeWCz/+ERI7k/6lAma14aKHjM1Jn8HcNqW0UGUuGV
 148V4wKFqhaOXnoWuodrgjo9AJkOVqh+YH4ui60MMOUvZ4MPh4+5muobybzndtNy
 3bAM3JbcNUH1/vyyazvwr22My/TapbydwPuXda63Ls/2WbnV66xbgFEm6S9GXC4X
 6+ynK5CzWaTF5INgpd6WjrEMPXGCi0hXM9yQmJvlI1I0muFv9mBXhW/wP7OE3Na8
 eQXuP/v+QqYaDVTsGTtySNPkwBstFa9GQUTghrnxKzk3giLVZfGPX5Bs4/+gl4gl
 rrGBNhUAnMOdvBSkLUf4hDgyGvsCiYtB1YX9QYxq2aRCbIT03164VnXB0lQRKnmz
 Yu547NO96tBnoIT9/gQYMTHvwPU1WjWi3jVbT2zpMKgG4DBUEEIH33P9S1vbgecY
 vLY=
 =bR6d
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:

 - fix a swiotlb info leak (Halil Pasic)

* tag 'dma-mapping-5.17-1' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: fix info leak with DMA_FROM_DEVICE
2022-02-27 12:42:37 -08:00
Linus Torvalds
2293be58d6 Tracing fixes for 5.17:
- rtla (Real-Time Linux Analysis tool): fix typo in man page
 
  - rtla: Update API -e to -E before it is released
 
  - rlla: Error message fix and memory leak fix
 
  - Partially uninline trace event soft disable to shrink text
 
  - Fix function graph start up test
 
  - Have triggers affect the trace instance they are in and not top level
 
  - Have osnoise sleep in the units it says it uses
 
  - Remove unused ftrace stub function
 
  - Remove event probe redundant info from event in the buffer
 
  - Fix group ownership setting in tracefs
 
  - Ensure trace buffer is minimum size to prevent crashes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYho7XBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qiOhAQDbCbEjIYwkGCpckuGgSQiMU4bAWUzk
 jCz9PoaTxoIWJwEAsLWrAPb0pDzNwdEKjiC3fJoUJhz3NwlEjJ7hQ3BxzAI=
 =iXOQ
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - rtla (Real-Time Linux Analysis tool):
    - fix typo in man page
    - Update API -e to -E before it is released
    - Error message fix and memory leak fix

 - Partially uninline trace event soft disable to shrink text

 - Fix function graph start up test

 - Have triggers affect the trace instance they are in and not top level

 - Have osnoise sleep in the units it says it uses

 - Remove unused ftrace stub function

 - Remove event probe redundant info from event in the buffer

 - Fix group ownership setting in tracefs

 - Ensure trace buffer is minimum size to prevent crashes

* tag 'trace-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  rtla/osnoise: Fix error message when failing to enable trace instance
  rtla/osnoise: Free params at the exit
  rtla/hist: Make -E the short version of --entries
  tracing: Fix selftest config check for function graph start up test
  tracefs: Set the group ownership in apply_options() not parse_options()
  tracing/osnoise: Make osnoise_main to sleep for microseconds
  ftrace: Remove unused ftrace_startup_enable() stub
  tracing: Ensure trace buffer is at least 4096 bytes large
  tracing: Uninline trace_trigger_soft_disabled() partly
  eprobes: Remove redundant event type information
  tracing: Have traceon and traceoff trigger honor the instance
  tracing: Dump stacktrace trigger to the corresponding instance
  rtla: Fix systme -> system typo on man page
2022-02-26 12:10:17 -08:00
Linus Torvalds
2800b6d0fc Power management fixes for 5.17-rc6
Fix the throttle IRQ handling during cpufreq initialization on Qualcomm
 platforms (Bjorn Andersson).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmIY9IwSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxPikP/Apg+pjSPt6OaPB/LxZUbL7OMMn6YkPn
 YUuxvpw3zznKA/eB6KBnLiwAal5IhRyPUGBRSPq/dKNmd+7+3oJID6xAqcxBGUuI
 GaA2d4HAqq2k7jDDiGLCiDwIxYPHfeuW6T++dJGQ11QBM1CZYkPXKliUhi0GDFd9
 j1aA6Yfo4CHKU7tN97fnCe0sGrxU7GGAUCg5S6w5CC5YRvR0uIb/+gbMwcscS0zi
 TLHhIARY62nUI4yciqT8enRl8rRyN/wHud2SzKP+Nvt2bktWSIzSbngHe9mxhw19
 hw4kGKti/hjlppfvmx0fGRiWYvFaQcccS183WxZmAcUxWNwxB8vghNf19qsRVOIa
 csbi5WJ73Tnp/ZlcK4etSMXSV8rIWwhker+n5aROL7Nh29qG4vJ3C/XtJPySof+P
 kMv0NMLyrt4AKiVvC7ncH7XCJMCC3ypUkm27fhW4c4y2n2rv9OFNAu1ieQL2DiIK
 K1xRwnkO2aWsjiu8BX3UcjwooE4qvPKCCq5pMFp/7tvlwNWLss7gg2pTZF4jkL8b
 joboftsZqxXSZ5G9U0I18btzJtRuasvxKnpeGG74+H2bhwraqyuilLsiOrVymhyG
 0CFrj6ZUKQKwhNvkSjVpHRqMT8HdI/hIkWviHfTDC95/d0ZZ2Pss9YSqKFp/BXz5
 v7tRN7mHwZAw
 =Pi5/
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Fix the throttle IRQ handling during cpufreq initialization on
  Qualcomm platforms (Bjorn Andersson)"

* tag 'pm-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: qcom-hw: Delay enabling throttle_irq
  cpufreq: Reintroduce ready() callback
2022-02-25 12:17:20 -08:00
Linus Torvalds
c47658311d Char/Misc driver fixes for 5.17-rc6
Here are a few small driver fixes for 5.17-rc6 for reported issues.  The
 majority of these are IIO fixes for small things, and the other two are
 a mvmem and mtd core conflict fix.
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYhjVGA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymtYQCgwSOl1N4iPmv2lpaHmofd+BF4frgAn1DJ0q/A
 OCG5siwdf3wMxPjL9HIC
 =AUOQ
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are a few small driver fixes for 5.17-rc6 for reported issues.

  The majority of these are IIO fixes for small things, and the other
  two are a mvmem and mtd core conflict fix.

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mtd: core: Fix a conflict between MTD and NVMEM on wp-gpios property
  nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property
  iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot
  iio: Fix error handling for PM
  iio: addac: ad74413r: correct comparator gpio getters mask usage
  iio: addac: ad74413r: use ngpio size when iterating over mask
  iio: addac: ad74413r: Do not reference negative array offsets
  iio: adc: men_z188_adc: Fix a resource leak in an error handling path
  iio: frequency: admv1013: remove the always true condition
  iio: accel: fxls8962af: add padding to regmap for SPI
  iio:imu:adis16480: fix buffering for devices with no burst mode
  iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits
  iio: adc: tsc2046: fix memory corruption by preventing array overflow
2022-02-25 12:12:06 -08:00
Christophe Leroy
bc82c38a69 tracing: Uninline trace_trigger_soft_disabled() partly
On a powerpc32 build with CONFIG_CC_OPTIMISE_FOR_SIZE, the inline
keyword is not honored and trace_trigger_soft_disabled() appears
approx 50 times in vmlinux.

Adding -Winline to the build, the following message appears:

	./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline]

That function is rather big for an inlined function:

	c003df60 <trace_trigger_soft_disabled>:
	c003df60:	94 21 ff f0 	stwu    r1,-16(r1)
	c003df64:	7c 08 02 a6 	mflr    r0
	c003df68:	90 01 00 14 	stw     r0,20(r1)
	c003df6c:	bf c1 00 08 	stmw    r30,8(r1)
	c003df70:	83 e3 00 24 	lwz     r31,36(r3)
	c003df74:	73 e9 01 00 	andi.   r9,r31,256
	c003df78:	41 82 00 10 	beq     c003df88 <trace_trigger_soft_disabled+0x28>
	c003df7c:	38 60 00 00 	li      r3,0
	c003df80:	39 61 00 10 	addi    r11,r1,16
	c003df84:	4b fd 60 ac 	b       c0014030 <_rest32gpr_30_x>
	c003df88:	73 e9 00 80 	andi.   r9,r31,128
	c003df8c:	7c 7e 1b 78 	mr      r30,r3
	c003df90:	41 a2 00 14 	beq     c003dfa4 <trace_trigger_soft_disabled+0x44>
	c003df94:	38 c0 00 00 	li      r6,0
	c003df98:	38 a0 00 00 	li      r5,0
	c003df9c:	38 80 00 00 	li      r4,0
	c003dfa0:	48 05 c5 f1 	bl      c009a590 <event_triggers_call>
	c003dfa4:	73 e9 00 40 	andi.   r9,r31,64
	c003dfa8:	40 82 00 28 	bne     c003dfd0 <trace_trigger_soft_disabled+0x70>
	c003dfac:	73 ff 02 00 	andi.   r31,r31,512
	c003dfb0:	41 82 ff cc 	beq     c003df7c <trace_trigger_soft_disabled+0x1c>
	c003dfb4:	80 01 00 14 	lwz     r0,20(r1)
	c003dfb8:	83 e1 00 0c 	lwz     r31,12(r1)
	c003dfbc:	7f c3 f3 78 	mr      r3,r30
	c003dfc0:	83 c1 00 08 	lwz     r30,8(r1)
	c003dfc4:	7c 08 03 a6 	mtlr    r0
	c003dfc8:	38 21 00 10 	addi    r1,r1,16
	c003dfcc:	48 05 6f 6c 	b       c0094f38 <trace_event_ignore_this_pid>
	c003dfd0:	38 60 00 01 	li      r3,1
	c003dfd4:	4b ff ff ac 	b       c003df80 <trace_trigger_soft_disabled+0x20>

However it is located in a hot path so inlining it is important.
But forcing inlining of the entire function by using __always_inline
leads to increasing the text size by approx 20 kbytes.

Instead, split the fonction in two parts, one part with the likely
fast path, flagged __always_inline, and a second part out of line.

With this change, on a powerpc32 with CONFIG_CC_OPTIMISE_FOR_SIZE
vmlinux text increases by only 1,4 kbytes, which is partly
compensated by a decrease of vmlinux data by 7 kbytes.

On ppc64_defconfig which has CONFIG_CC_OPTIMISE_FOR_SPEED, this
change reduces vmlinux text by more than 30 kbytes.

Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-25 12:07:01 -05:00
Yong Wu
2502960fba component: Add common helper for compare/release functions
The component requires the compare/release functions, there are so many
copies in current kernel. Just define four common helpers for them.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Link: https://lore.kernel.org/r/20220214060819.7334-2-yong.wu@mediatek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-25 12:16:12 +01:00
Linus Torvalds
1f840c0ef4 x86 host:
* Expose KVM_CAP_ENABLE_CAP since it is supported
 
 * Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode
 
 * Ensure async page fault token is nonzero
 
 * Fix lockdep false negative
 
 * Fix FPU migration regression from the AMX changes
 
 x86 guest:
 
 * Don't use PV TLB/IPI/yield on uniprocessor guests
 
 PPC:
 * reserve capability id (topic branch for ppc/kvm)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmIXyQAUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroPKJQf/T9NeXOFIPIIlH4ZKM7155qlwX8dx
 NR2YV+RNYd27MDkaEm9w4ucXacGpPuBPPx9v7UiLlAqAN+NP7nF3rQKC0SpQMC6H
 EKFtm+8al8EzyDYP36fqnwDne/xWHlOeGXRRJMKPGhXBSoXoY5cK35IXmNZjfteQ
 hK7siBs2saJ2VFqMCbJ9Pqdu1NDO6OEt8HWz2Dnx6EUd90O0pHWZy5JvWOYfyLjL
 Y2pP0dZQxuB/PmqkpVj2gV9jK2Zhj33eerzDV4tVXPV7le8fgGeTaJ8ft+SUIizS
 YCcPR89+u5c9yzlwY2i7mvloayKnuqkECiGtRG6VHNlrPZTPijems8tH1w==
 =lWjy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "x86 host:

   - Expose KVM_CAP_ENABLE_CAP since it is supported

   - Disable KVM_HC_CLOCK_PAIRING in TSC catchup mode

   - Ensure async page fault token is nonzero

   - Fix lockdep false negative

   - Fix FPU migration regression from the AMX changes

  x86 guest:

   - Don't use PV TLB/IPI/yield on uniprocessor guests

  PPC:

   - reserve capability id (topic branch for ppc/kvm)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled
  KVM: x86/mmu: make apf token non-zero to fix bug
  KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3
  x86/kvm: Don't use pv tlb/ipi/sched_yield if on 1 vCPU
  x86/kvm: Fix compilation warning in non-x86_64 builds
  x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0
  x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0
  kvm: x86: Disable KVM_HC_CLOCK_PAIRING if tsc is in always catchup mode
  KVM: Fix lockdep false negative during host resume
  KVM: x86: Add KVM_CAP_ENABLE_CAP to x86
2022-02-24 14:05:49 -08:00
Linus Torvalds
f672ff9123 Networking fixes for 5.17-rc6, including fixes from bpf and netfilter.
Current release - regressions:
 
  - bpf: fix crash due to out of bounds access into reg2btf_ids
 
  - mvpp2: always set port pcs ops, avoid null-deref
 
  - eth: marvell: fix driver load from initrd
 
  - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"
 
 Current release - new code bugs:
 
  - mptcp: fix race in overlapping signal events
 
 Previous releases - regressions:
 
  - xen-netback: revert hotplug-status changes causing devices to
    not be configured
 
  - dsa:
    - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held
    - fix panic when removing unoffloaded port from bridge
 
  - dsa: microchip: fix bridging with more than two member ports
 
 Previous releases - always broken:
 
  - bpf:
   - fix crash due to incorrect copy_map_value when both spin lock
     and timer are present in a single value
   - fix a bpf_timer initialization issue with clang
   - do not try bpf_msg_push_data with len 0
   - add schedule points in batch ops
 
  - nf_tables:
    - unregister flowtable hooks on netns exit
    - correct flow offload action array size
    - fix a couple of memory leaks
 
  - vsock: don't check owner in vhost_vsock_stop() while releasing
 
  - gso: do not skip outer ip header in case of ipip and net_failover
 
  - smc: use a mutex for locking "struct smc_pnettable"
 
  - openvswitch: fix setting ipv6 fields causing hw csum failure
 
  - mptcp: fix race in incoming ADD_ADDR option processing
 
  - sysfs: add check for netdevice being present to speed_show
 
  - sched: act_ct: fix flow table lookup after ct clear or switching
    zones
 
  - eth: intel: fixes for SR-IOV forwarding offloads
 
  - eth: broadcom: fixes for selftests and error recovery
 
  - eth: mellanox: flow steering and SR-IOV forwarding fixes
 
 Misc:
 
  - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
    friends not report freed skbs as drops
 
  - force inlining of checksum functions in net/checksum.h
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIX2ssACgkQMUZtbf5S
 IrvImQ//b+JILp0M/jz6q25n5U7qxuNmJypq659kR19jnwGH520XTwnFE9/FB3gw
 UnlCb28+jdMX1HHQJaUKkKYTilfFvyMoRPAMbLFO51Y02dVALTjD7C2wJ1AyEiTV
 eKhOcGHLbDzLom3+FnK566adOlGsIZfr4bR4zlGcthU0wTvU6S2K3WTkVJMASJzJ
 JizNgN+SvpdpmnYj+wsg2cj/5W4R/IPdxCrkZMkEMomJnVxA61RV+wsCcsT+Cjrf
 wu+cknUiVIGQNtCT4hz8VZ3tOoAeX+Xg/4YbaxVxnvunTQh+D+eIza40IEqewlEq
 KFOXGuPXsse6ZJ7IqVZt1hgBxJ8bpItxEBNSgU3KqJKMTTKOpWWjZxkTYeIERMry
 Ywb/ciZ7pwbo2CNhICh6+xefQvGbU0jgsiMgSkQvXZ9b9IsdPM4bwgvjFsyqnEMz
 0HVpqN02F7MM44mD4P0TQct9OSemu6sVqQFrpk8+CvPfaSEctCv/iJ6WR/xxUgSp
 uPvKYlv7BqOKZtqzGOk215WEvTUf8dy9cxcQwoYBOBxs8h2XQSRXEWCsGWCOg5+V
 xLnlnreXHXKWcUrAmsJlZh6XmWGk9lBDqLX7hKCYZzMgU8nNopSDKKcDpVDkaBzC
 DrK8Y3y+lBhpBwCHt/GZw8Qg9aDDsczFpOfPZBVJy+jH+7AGK7M=
 =LT/x
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - bpf: fix crash due to out of bounds access into reg2btf_ids

   - mvpp2: always set port pcs ops, avoid null-deref

   - eth: marvell: fix driver load from initrd

   - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"

  Current release - new code bugs:

   - mptcp: fix race in overlapping signal events

  Previous releases - regressions:

   - xen-netback: revert hotplug-status changes causing devices to not
     be configured

   - dsa:
      - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
        held
      - fix panic when removing unoffloaded port from bridge

   - dsa: microchip: fix bridging with more than two member ports

  Previous releases - always broken:

   - bpf:
      - fix crash due to incorrect copy_map_value when both spin lock
        and timer are present in a single value
      - fix a bpf_timer initialization issue with clang
      - do not try bpf_msg_push_data with len 0
      - add schedule points in batch ops

   - nf_tables:
      - unregister flowtable hooks on netns exit
      - correct flow offload action array size
      - fix a couple of memory leaks

   - vsock: don't check owner in vhost_vsock_stop() while releasing

   - gso: do not skip outer ip header in case of ipip and net_failover

   - smc: use a mutex for locking "struct smc_pnettable"

   - openvswitch: fix setting ipv6 fields causing hw csum failure

   - mptcp: fix race in incoming ADD_ADDR option processing

   - sysfs: add check for netdevice being present to speed_show

   - sched: act_ct: fix flow table lookup after ct clear or switching
     zones

   - eth: intel: fixes for SR-IOV forwarding offloads

   - eth: broadcom: fixes for selftests and error recovery

   - eth: mellanox: flow steering and SR-IOV forwarding fixes

  Misc:

   - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
     friends not report freed skbs as drops

   - force inlining of checksum functions in net/checksum.h"

* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: mv643xx_eth: process retval from of_get_mac_address
  ping: remove pr_err from ping_lookup
  Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
  openvswitch: Fix setting ipv6 fields causing hw csum failure
  ipv6: prevent a possible race condition with lifetimes
  net/smc: Use a mutex for locking "struct smc_pnettable"
  bnx2x: fix driver load from initrd
  Revert "xen-netback: Check for hotplug-status existence before watching"
  Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
  net/mlx5e: Fix VF min/max rate parameters interchange mistake
  net/mlx5e: Add missing increment of count
  net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
  net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
  net/mlx5e: Add feature check for set fec counters
  net/mlx5e: TC, Skip redundant ct clear actions
  net/mlx5e: TC, Reject rules with forward and drop actions
  net/mlx5e: TC, Reject rules with drop and modify hdr action
  net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
  net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
  net/mlx5: Fix possible deadlock on rule deletion
  ...
2022-02-24 12:45:32 -08:00
Linus Torvalds
73878e5eb1 block-5.17-2022-02-24
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmIXp/gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppUzD/9wm9OHebVIjFPDCb7pznbrc7zvUXTXMLGJ
 dFh/Un8NVXXoOMppkMV2rXyJhhKb2kcjaqa5OlT64/3SJkmESHbSpQT1lbDcQfGu
 Yu4K2JU5KwSMNBNbSJJo4G4gM6VcgD22k4d6PbeTYZVuXY0rzk/KsG5lZZZFrQzG
 mQnobyPtsjtOwUMWAlpABYqVU9sZU4o0YUkHUaAHn+Xsm13CDJDsDPZx7/XNctvp
 mdoWI5RsffWxeVt3iB/f1AVBuczgsu4NO5aofF4XCjmnj5r1tGsXWqFFP4aiHP2i
 Iiu6twdA1KAgSihEHrsmDSiwwG3B4rLmtu739f5AEi8w3Haj9QBK/tgADRI2XLGq
 ulRMIHEE2qCApy9NWigyMn6fxt9ne8njAH8coljFsb1sV+2a+mn47f434Gzqo5Jl
 cKZeYElrJGfeNdjDTeG76fVGR89fKEKBvhml8cf/jph8RcPmLSNsTHPxOZcVSLKU
 X46FYi5rj9Y9VcPlF8UXmoDZt8+cBVDA12OzxIiKQDMzWeQkb4ShqE3IPYcpZ+f0
 dJN6oZTIMc283kHWva5ok9egSnC7Ly6xv3+Mf6jBVgAuefgVAncRGBCLKFTLZL6D
 CprhBmby/ubMkWstB2t9o0pFBxC7ttrpGexig/wvEe/OpDz+1d4rXEIEEwRFDLmR
 x+XsGKgOKQ==
 =7WMB
 -----END PGP SIGNATURE-----

Merge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request:
    - send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
    - fix passthrough to namespaces with unsupported features (Christoph
      Hellwig)

 - Clear iocb->private at poll completion (Stefano)

* tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block:
  nvme-tcp: send H2CData PDUs based on MAXH2CDATA
  nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
  nvme: don't return an error from nvme_configure_metadata
  block: clear iocb->private in blkdev_bio_end_io_async()
2022-02-24 11:15:10 -08:00
Rafael J. Wysocki
c5eb92f57d Merge branch 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull ARM cpufreq fixes for 5.18-rc6 from Viresh Kumar:

"This fixes issues related to throttle IRQ for Qcom SoCs."

* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  cpufreq: qcom-hw: Delay enabling throttle_irq
  cpufreq: Reintroduce ready() callback
2022-02-24 19:54:59 +01:00
Paul Blakey
d9b5ae5c1b openvswitch: Fix setting ipv6 fields causing hw csum failure
Ipv6 ttl, label and tos fields are modified without first
pulling/pushing the ipv6 header, which would have updated
the hw csum (if available). This might cause csum validation
when sending the packet to the stack, as can be seen in
the trace below.

Fix this by updating skb->csum if available.

Trace resulted by ipv6 ttl dec and then sending packet
to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
[295241.900063] s_pf0vf2: hw csum failure
[295241.923191] Call Trace:
[295241.925728]  <IRQ>
[295241.927836]  dump_stack+0x5c/0x80
[295241.931240]  __skb_checksum_complete+0xac/0xc0
[295241.935778]  nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
[295241.953030]  nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
[295241.958344]  __ovs_ct_lookup+0xac/0x860 [openvswitch]
[295241.968532]  ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
[295241.979167]  do_execute_actions+0x54a/0xaa0 [openvswitch]
[295242.001482]  ovs_execute_actions+0x48/0x100 [openvswitch]
[295242.006966]  ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
[295242.012626]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
[295242.028763]  netdev_frame_hook+0xc0/0x180 [openvswitch]
[295242.034074]  __netif_receive_skb_core+0x2ca/0xcb0
[295242.047498]  netif_receive_skb_internal+0x3e/0xc0
[295242.052291]  napi_gro_receive+0xba/0xe0
[295242.056231]  mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
[295242.062513]  mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
[295242.067669]  mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
[295242.077958]  net_rx_action+0x149/0x3b0
[295242.086762]  __do_softirq+0xd7/0x2d6
[295242.090427]  irq_exit+0xf7/0x100
[295242.093748]  do_IRQ+0x7f/0xd0
[295242.096806]  common_interrupt+0xf/0xf
[295242.100559]  </IRQ>
[295242.102750] RIP: 0033:0x7f9022e88cbd
[295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
[295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
[295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
[295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
[295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40

Fixes: 3fdbd1ce11 ("openvswitch: add ipv6 'set' action")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24 09:16:21 -08:00
Linus Torvalds
4eb0a7c8e1 slab fixes for 5.17-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmIV/fQACgkQ4CHKc/GJ
 qRBhaAgAoz81fjhlkcCaHdgxVTEx6L93iJQJiZWoE3gTKk2jruun3sIYmPSOiY+b
 bWR1datDnvaS/Xv04rZ6pm6XPjCT+LmrOQCOlZMjptc6HKoKuDZTvcQ0u5CYOfQS
 I9ZRtPaHjSmhntS8BErxGes5+PF1hz/q2rGuODt4/DQCNPZNdHXMdym9w4Z4xHXm
 TuH2VXzv5JXhYlUEDz2HP8LXmbvxA9rGaMgngpX92pCL8uTLqANoZCT+zHEj3cKw
 db6/A8S7Y4PsfF0JphNup+wcsWj+yfIrfAQwnTgNXR4hlhbUxOHTJqXlQGK/NW7C
 tg2nXxQQn14MwPlkatdxFYzg1TbMYg==
 =U6Fu
 -----END PGP SIGNATURE-----

Merge tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fixes from Vlastimil Babka:

 - Build fix (workaround) for clang.

 - Fix a /proc/kcore based slabinfo script broken by struct slab changes
   in 5.17-rc1.

* tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  tools/cgroup/slabinfo: update to work with struct slab
  slab: remove __alloc_size attribute from __kmalloc_track_caller
2022-02-23 11:33:12 -08:00
Greg Kroah-Hartman
f2eb478f2f kernfs: move struct kernfs_root out of the public view.
There is no need to have struct kernfs_root be part of kernfs.h for
the whole kernel to see and poke around it.  Move it internal to kernfs
code and provide a helper function, kernfs_root_to_node(), to handle the
one field that kernfs users were directly accessing from the structure.

Cc: Imran Khan <imran.f.khan@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220222070713.3517679-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23 15:46:34 +01:00
Varun Prakash
c2700d2886 nvme-tcp: send H2CData PDUs based on MAXH2CDATA
As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3)
Maximum Host to Controller Data length (MAXH2CDATA): Specifies the
maximum number of PDU-Data bytes per H2CData PDU in bytes. This value
is a multiple of dwords and should be no less than 4,096.

Current code sets H2CData PDU data_length to r2t_length,
it does not check MAXH2CDATA value. Fix this by setting H2CData PDU
data_length to min(req->h2cdata_left, queue->maxh2cdata).

Also validate MAXH2CDATA value returned by target in ICResp PDU,
if it is not a multiple of dword or if it is less than 4096 return
-EINVAL from nvme_tcp_init_connection().

Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-23 14:43:11 +01:00
Nicholas Piggin
93b71801a8 KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3
Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL
resource mode to 3 with the H_SET_MODE hypercall. This capability
differs between processor types and KVM types (PR, HV, Nested HV), and
affects guest-visible behaviour.

QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and
use the KVM CAP if available to determine KVM support[2].

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-22 09:06:54 -05:00
David S. Miller
5663b85462 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

This is fixing up the use without proper initialization in patch 5/5

-o-

Hi,

The following patchset contains Netfilter fixes for net:

1) Missing #ifdef CONFIG_IP6_NF_IPTABLES in recent xt_socket fix.

2) Fix incorrect flow action array size in nf_tables.

3) Unregister flowtable hooks from netns exit path.

4) Fix missing limit object release, from Florian Westphal.

5) Memleak in nf_tables object update path, also from Florian.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-22 11:00:51 +00:00
Christophe Kerello
f6c052afe6 nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property
Wp-gpios property can be used on NVMEM nodes and the same property can
be also used on MTD NAND nodes. In case of the wp-gpios property is
defined at NAND level node, the GPIO management is done at NAND driver
level. Write protect is disabled when the driver is probed or resumed
and is enabled when the driver is released or suspended.

When no partitions are defined in the NAND DT node, then the NAND DT node
will be passed to NVMEM framework. If wp-gpios property is defined in
this node, the GPIO resource is taken twice and the NAND controller
driver fails to probe.

It would be possible to set config->wp_gpio at MTD level before calling
nvmem_register function but NVMEM framework will toggle this GPIO on
each write when this GPIO should only be controlled at NAND level driver
to ensure that the Write Protect has not been enabled.

A way to fix this conflict is to add a new boolean flag in nvmem_config
named ignore_wp. In case ignore_wp is set, the GPIO resource will
be managed by the provider.

Fixes: 2a127da461 ("nvmem: add support for the write-protect pin")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220220151432.16605-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-21 17:59:25 +01:00
Greg Kroah-Hartman
93dd04ab0b slab: remove __alloc_size attribute from __kmalloc_track_caller
Commit c37495d625 ("slab: add __alloc_size attributes for better
bounds checking") added __alloc_size attributes to a bunch of kmalloc
function prototypes.  Unfortunately the change to __kmalloc_track_caller
seems to cause clang to generate broken code and the first time this is
called when booting, the box will crash.

While the compiler problems are being reworked and attempted to be
solved [1], let's just drop the attribute to solve the issue now.  Once
it is resolved it can be added back.

[1] https://github.com/ClangBuiltLinux/linux/issues/1599

Fixes: c37495d625 ("slab: add __alloc_size attributes for better bounds checking")
Cc: stable <stable@vger.kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: llvm@lists.linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20220218131358.3032912-1-gregkh@linuxfoundation.org
2022-02-21 11:32:44 +01:00
Linus Torvalds
0b0894ff78 - Fix task exposure order when forking tasks
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmISLs0ACgkQEsHwGGHe
 VUrq+g/+NxLVl3QTmteNh4jEzA/3FDccRKbALNs09YX116nCOZ1sv1Z5FgtdOLUT
 ukIujAm1SKDXQrXoglmm3eH6ylWZ54DQo6qrP+vzavtIKSgbt+vPS63qDR4ehF8N
 Kh04qssWy+AzOSeZRQ6lXTiGei2nBKFbiwlwPnV5vbQmLBFvzv2dEErVFYkHwWN6
 foygRUOhslv5aizwpDriAvDq9ZTVUPXqzIi5zWnTON8y8Vy32eZjSJKez8SQH57w
 vbEZkJTw33tCzG/5+J5mh3UmP9Mcj34W/GDCKHxjO1Y38SbLTMy2Agl5hbRKjVZ8
 W4wElyXyXtLx1RsrZpOJ90mhqwADcxLgkq4ipl6uzikeQlQVnOOBsHZOi3NkTorg
 1YhpxZhqzWWotdGUQwzfZYSuoqQAyUxeqZKwqUD+dYsEqQWhTWSa41IWcfz+r8UD
 uD+pO03EbAEzBB+BSJL9xh0xnchdOnHbpTSFo1AJeQ+LUKanXxO2tXk9/3SwliMj
 M751vPtQH6DiVukSfywxJym+aaLbjeju6RvA69Atap51roYXPSsaCHGDwLeNUS+w
 5D6KC0IuCxsliLpAD4c4PXkRdTvwnG/0lUt+s83bGnyfh5nQzfXXbBp3I5+o0Fcq
 jIQ37uH2Mj5MKtmC8o4ekCtNTtWM8QNZvKzMzoju616Wqee0Ag4=
 =RagE
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Borislav Petkov:
 "Fix task exposure order when forking tasks"

* tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix yet more sched_fork() races
2022-02-20 12:40:20 -08:00
Pablo Neira Ayuso
b1a5983f56 netfilter: nf_tables_offload: incorrect flow offload action array size
immediate verdict expression needs to allocate one slot in the flow offload
action array, however, immediate data expression does not need to do so.

fwd and dup expression need to allocate one slot, this is missing.

Add a new offload_action interface to report if this expression needs to
allocate one slot in the flow offload action array.

Fixes: be2861dc36 ("netfilter: nft_{fwd,dup}_netdev: add offload support")
Reported-and-tested-by: Nick Gregory <Nick.Gregory@Sophos.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-02-20 01:22:20 +01:00
Christophe Leroy
5486f5bf79 net: Force inlining of checksum functions in net/checksum.h
All functions defined as static inline in net/checksum.h are
meant to be inlined for performance reason.

But since commit ac7c3e4ff4 ("compiler: enable
CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
uninline functions when it wants.

Fair enough in the general case, but for tiny performance critical
checksum helpers that's counter-productive.

The problem mainly arises when selecting CONFIG_CC_OPTIMISE_FOR_SIZE,
Those helpers being 'static inline' in header files you suddenly find
them duplicated many times in the resulting vmlinux.

Here is a typical exemple when building powerpc pmac32_defconfig
with CONFIG_CC_OPTIMISE_FOR_SIZE. csum_sub() appears 4 times:

	c04a23cc <csum_sub>:
	c04a23cc:	7c 84 20 f8 	not     r4,r4
	c04a23d0:	7c 63 20 14 	addc    r3,r3,r4
	c04a23d4:	7c 63 01 94 	addze   r3,r3
	c04a23d8:	4e 80 00 20 	blr
		...
	c04a2ce8:	4b ff f6 e5 	bl      c04a23cc <csum_sub>
		...
	c04a2d2c:	4b ff f6 a1 	bl      c04a23cc <csum_sub>
		...
	c04a2d54:	4b ff f6 79 	bl      c04a23cc <csum_sub>
		...
	c04a754c <csum_sub>:
	c04a754c:	7c 84 20 f8 	not     r4,r4
	c04a7550:	7c 63 20 14 	addc    r3,r3,r4
	c04a7554:	7c 63 01 94 	addze   r3,r3
	c04a7558:	4e 80 00 20 	blr
		...
	c04ac930:	4b ff ac 1d 	bl      c04a754c <csum_sub>
		...
	c04ad264:	4b ff a2 e9 	bl      c04a754c <csum_sub>
		...
	c04e3b08 <csum_sub>:
	c04e3b08:	7c 84 20 f8 	not     r4,r4
	c04e3b0c:	7c 63 20 14 	addc    r3,r3,r4
	c04e3b10:	7c 63 01 94 	addze   r3,r3
	c04e3b14:	4e 80 00 20 	blr
		...
	c04e5788:	4b ff e3 81 	bl      c04e3b08 <csum_sub>
		...
	c04e65c8:	4b ff d5 41 	bl      c04e3b08 <csum_sub>
		...
	c0512d34 <csum_sub>:
	c0512d34:	7c 84 20 f8 	not     r4,r4
	c0512d38:	7c 63 20 14 	addc    r3,r3,r4
	c0512d3c:	7c 63 01 94 	addze   r3,r3
	c0512d40:	4e 80 00 20 	blr
		...
	c0512dfc:	4b ff ff 39 	bl      c0512d34 <csum_sub>
		...
	c05138bc:	4b ff f4 79 	bl      c0512d34 <csum_sub>
		...

Restore the expected behaviour by using __always_inline for all
functions defined in net/checksum.h

vmlinux size is even reduced by 256 bytes with this patch:

	   text	   data	    bss	    dec	    hex	filename
	6980022	2515362	 194384	9689768	 93daa8	vmlinux.before
	6979862	2515266	 194384	9689512	 93d9a8	vmlinux.now

Fixes: ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly")
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-19 16:07:12 +00:00
Peter Zijlstra
b1e8206582 sched: Fix yet more sched_fork() races
Where commit 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an
invalid sched_task_group") fixed a fork race vs cgroup, it opened up a
race vs syscalls by not placing the task on the runqueue before it
gets exposed through the pidhash.

Commit 13765de814 ("sched/fair: Fix fault in reweight_entity") is
trying to fix a single instance of this, instead fix the whole class
of issues, effectively reverting this commit.

Fixes: 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: Zhang Qiao <zhangqiao22@huawei.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net
2022-02-19 11:11:05 +01:00
Linus Torvalds
b9889768bd block-5.17-2022-02-17
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmIPE0oQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpq56D/9yUd82CT1c9drWjbANnIuOxxOAauNwDCdH
 wLAz5FptjRC5ju3Ai9FzklfeXlU8SEBR4/TLp+lO8elA/NS9tOAW3+v/0pZ4oZ4G
 MwxnE7Zj00AiRZ60w9zcb3Vq5TTXV71kcqbcx8YAh7b8niMDow8boLBY/FkB1EZc
 1pwCLjF5I5USmU8zlST5G6ctqcqHmV8TxIU3wijlYmZnkyxMo6UxqCpO5BOZRgGt
 S5VDvjoVDIQO757wLnaB5tA2NzEmaBG44yFoGkJqlChP7IgJrjTZQGKsqsIKLZzH
 Btot/FpEjRYzFdgumHxJ9Zjv/5QlzFT115H14e85CtfdeIhfifd8G4JKtPGU24IV
 5p/czOP8SA8qnEL6ply6osltzA9OkQBt1xZ6JW+0lK/izcHTIxVevrD8KsNHzh2z
 lskEn0Sowb1viaGrwuQfWyX7xtJCX2hGOsD2gH95B4FcsPNbfp0u4uGimSYS02bB
 2jJIhrhvQZTFyd/qeOGQ4MnJxoLF4Q63l+NzCqVSTBdSTJK/iGiofyqxL0WHB65L
 /q8J4fTEvDK6Fc3GETwHP70yoGitJ1p2+5n7uB/Ynfh7RHq/b7Ispjv5AenAyo52
 2GtY476aaYMdHVOM8h1Fnc8q1Qu4vwFY3Ly23FAhmRxilAXk5roPB6TP6Q6puCKe
 wqdhwjgVAA==
 =f22m
 -----END PGP SIGNATURE-----

Merge tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Surprise removal fix (Christoph)

 - Ensure that pages are zeroed before submitted for userspace IO
   (Haimin)

 - Fix blk-wbt accounting issue with BFQ (Laibin)

 - Use bsize for discard granularity in loop (Ming)

 - Fix missing zone handling in blk_complete_request() (Pankaj)

* tag 'block-5.17-2022-02-17' of git://git.kernel.dk/linux-block:
  block/wbt: fix negative inflight counter when remove scsi device
  block: fix surprise removal for drivers calling blk_set_queue_dying
  block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
  block: loop:use kstatfs.f_bsize of backing file to set discard granularity
  block: Add handling for zone append command in blk_complete_request
2022-02-18 09:27:10 -08:00
Eric Dumazet
a1cdec57e0 net-timestamp: convert sk->sk_tskey to atomic_t
UDP sendmsg() can be lockless, this is causing all kinds
of data races.

This patch converts sk->sk_tskey to remove one of these races.

BUG: KCSAN: data-race in __ip_append_data / __ip_append_data

read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
 __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
 ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
 udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
 __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
 ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
 udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
 __do_sys_sendmmsg net/socket.c:2582 [inline]
 __se_sys_sendmmsg net/socket.c:2579 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000054d -> 0x0000054e

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85fa6f-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 09c2d251b7 ("net-timestamp: add key to disambiguate concurrent datagrams")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-18 11:14:52 +00:00
Jakub Kicinski
7a2fb91285 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2022-02-17

We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 8 files changed, 119 insertions(+), 15 deletions(-).

The main changes are:

1) Add schedule points in map batch ops, from Eric.

2) Fix bpf_msg_push_data with len 0, from Felix.

3) Fix crash due to incorrect copy_map_value, from Kumar.

4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.

5) Fix a bpf_timer initialization issue with clang, from Yonghong.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add schedule points in batch ops
  bpf: Fix crash due to out of bounds access into reg2btf_ids.
  selftests: bpf: Check bpf_msg_push_data return value
  bpf: Fix a bpf_timer initialization issue
  bpf: Emit bpf_timer in vmlinux BTF
  selftests/bpf: Add test for bpf_timer overwriting crash
  bpf: Fix crash due to incorrect copy_map_value
  bpf: Do not try bpf_msg_push_data with len 0
====================

Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 12:01:55 -08:00
Linus Torvalds
8b97cae315 Networking fixes for 5.17-rc5, including fixes from wireless and
netfilter.
 
 Current release - regressions:
 
  - dsa: lantiq_gswip: fix use after free in gswip_remove()
 
  - smc: avoid overwriting the copies of clcsock callback functions
 
 Current release - new code bugs:
 
  - iwlwifi:
    - fix use-after-free when no FW is present
    - mei: fix the pskb_may_pull check in ipv4
    - mei: retry mapping the shared area
    - mvm: don't feed the hardware RFKILL into iwlmei
 
 Previous releases - regressions:
 
  - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
 
  - tipc: fix wrong publisher node address in link publications
 
  - iwlwifi: mvm: don't send SAR GEO command for 3160 devices,
    avoid FW assertion
 
  - bgmac: make idm and nicpm resource optional again
 
  - atl1c: fix tx timeout after link flap
 
 Previous releases - always broken:
 
  - vsock: remove vsock from connected table when connect is
    interrupted by a signal
 
  - ping: change destination interface checks to match raw sockets
 
  - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
    semantics (and null-deref) after SO_RESERVE_MEM was added
 
  - ipv6: make exclusive flowlabel checks per-netns
 
  - bonding: force carrier update when releasing slave
 
  - sched: limit TC_ACT_REPEAT loops
 
  - bridge: multicast: notify switchdev driver whenever MC processing
    gets disabled because of max entries reached
 
  - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
 
  - iwlwifi: fix locking when "HW not ready"
 
  - phy: mediatek: remove PHY mode check on MT7531
 
  - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
 
  - dsa: lan9303:
    - fix polarity of reset during probe
    - fix accelerated VLAN handling
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIOm44ACgkQMUZtbf5S
 IruWWBAAmNJBxVoVUahidwIVKHnKYeHClzDee3B6sYSupRW22Eeuh7Q8fqPMO4J7
 KO9nP/vGibFuhKfjApvS6wPvpYWCuXAoSozfOa+JWNrFg9uVpdyHIhfdXl0WPZqx
 A4+p2vIs1ldV0yOac/7ZGMWg57dzlYUurkld3xFwf7KyOOhV5/PkxkxpGN9eFkv1
 NTFWTaIsVzFMXMjNXkGfGHmt/8mSmZHgsH+tYd+KXsjbs2UpbGM3SyfHBlUf3aA0
 bceT4h07xA6C4rlUCbmalRqwvtcdM15MwlDBtSBXm5fXy0c59XxQOqj/dLhPuAO4
 42sQlO2MhqDrZjR0tOjmuP2cpc7llj1lIZe1Qs3nKiNFHcuOJEHGw1PYCO85jKdn
 xiWquuoe3G5YkQeoOoi+HqmXcP6aBZpUbROvYNjSJhcci4Ck0Qjna5J1rk8IRjb8
 AkDf68dodn8I+W5dx/EnopH/ShPQcqGw1+tH4215UB7b40Ecpc+laqFAHRcgs654
 ONuJVdRC4k3TyES1B9z8vawLcGYWa06fz8Mh/dS3gnLDphe5ZiH2tTrESfBdYixH
 idmuO5C/YDhsVelVuO+B0RT/yziPb3Lr+BTplSfkODCXT6LuOdCYUcHx5nGZ1TYW
 EeZ9hMSaxp2E06llEyD6JQQ+0Q17wnDGjLxtOMk+A8fmNX2F17g=
 =Nrzq
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless and netfilter.

  Current release - regressions:

   - dsa: lantiq_gswip: fix use after free in gswip_remove()

   - smc: avoid overwriting the copies of clcsock callback functions

  Current release - new code bugs:

   - iwlwifi:
      - fix use-after-free when no FW is present
      - mei: fix the pskb_may_pull check in ipv4
      - mei: retry mapping the shared area
      - mvm: don't feed the hardware RFKILL into iwlmei

  Previous releases - regressions:

   - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()

   - tipc: fix wrong publisher node address in link publications

   - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
     assertion

   - bgmac: make idm and nicpm resource optional again

   - atl1c: fix tx timeout after link flap

  Previous releases - always broken:

   - vsock: remove vsock from connected table when connect is
     interrupted by a signal

   - ping: change destination interface checks to match raw sockets

   - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
     semantics (and null-deref) after SO_RESERVE_MEM was added

   - ipv6: make exclusive flowlabel checks per-netns

   - bonding: force carrier update when releasing slave

   - sched: limit TC_ACT_REPEAT loops

   - bridge: multicast: notify switchdev driver whenever MC processing
     gets disabled because of max entries reached

   - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found

   - iwlwifi: fix locking when "HW not ready"

   - phy: mediatek: remove PHY mode check on MT7531

   - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN

   - dsa: lan9303:
      - fix polarity of reset during probe
      - fix accelerated VLAN handling"

* tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
  bonding: force carrier update when releasing slave
  nfp: flower: netdev offload check for ip6gretap
  ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
  ipv4: fix data races in fib_alias_hw_flags_set
  net: dsa: lan9303: add VLAN IDs to master device
  net: dsa: lan9303: handle hwaccel VLAN tags
  vsock: remove vsock from connected table when connect is interrupted by a signal
  Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
  ping: fix the dif and sdif check in ping_lookup
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
  net: sched: limit TC_ACT_REPEAT loops
  tipc: fix wrong notification node addresses
  net: dsa: lantiq_gswip: fix use after free in gswip_remove()
  ipv6: per-netns exclusive flowlabel checks
  net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
  CDC-NCM: avoid overflow in sanity checking
  mctp: fix use after free
  net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
  bonding: fix data-races around agg_select_timer
  dpaa2-eth: Initialize mutex used in one step timestamping path
  ...
2022-02-17 11:33:59 -08:00
Eric Dumazet
d95d6320ba ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().

BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt

read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0:
 fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline]
 fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033
 fib6_del_route net/ipv6/ip6_fib.c:1983 [inline]
 fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028
 __ip6_del_rt net/ipv6/route.c:3876 [inline]
 ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891
 __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374
 ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline]
 __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207
 ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220
 inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476
 __sock_release net/socket.c:650 [inline]
 sock_close+0x6c/0x150 net/socket.c:1318
 __fput+0x295/0x520 fs/file_table.c:280
 ____fput+0x11/0x20 fs/file_table.c:313
 task_work_run+0x8e/0x110 kernel/task_work.c:164
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
 exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207
 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
 syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae

write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
 fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
 nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
 nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
 nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
 nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
 nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x22 -> 0x2a

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work

Fixes: 0c5fcf9e24 ("IPv6: Add "offload failed" indication to routes")
Fixes: bb3c4ab93e ("ipv6: Add "offload" and "trap" indications to routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amit Cohen <amcohen@nvidia.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 09:48:24 -08:00
Christoph Hellwig
7a5428dcb7 block: fix surprise removal for drivers calling blk_set_queue_dying
Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit 8e141f9eb8 that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.

Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.

Fixes: 8e141f9eb8 ("block: drain file system I/O on del_gendisk")
Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-17 07:54:03 -07:00
Willem de Bruijn
0b0dff5b3b ipv6: per-netns exclusive flowlabel checks
Ipv6 flowlabels historically require a reservation before use.
Optionally in exclusive mode (e.g., user-private).

Commit 59c820b231 ("ipv6: elide flowlabel check if no exclusive
leases exist") introduced a fastpath that avoids this check when no
exclusive leases exist in the system, and thus any flowlabel use
will be granted.

That allows skipping the control operation to reserve a flowlabel
entirely. Though with a warning if the fast path fails:

  This is an optimization. Robust applications still have to revert to
  requesting leases if the fast path fails due to an exclusive lease.

Still, this is subtle. Better isolate network namespaces from each
other. Flowlabels are per-netns. Also record per-netns whether
exclusive leases are in use. Then behavior does not change based on
activity in other netns.

Changes
  v2
    - wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled

Fixes: 59c820b231 ("ipv6: elide flowlabel check if no exclusive leases exist")
Link: https://lore.kernel.org/netdev/MWHPR2201MB1072BCCCFCE779E4094837ACD0329@MWHPR2201MB1072.namprd22.prod.outlook.com/
Reported-by: Congyu Liu <liu3101@purdue.edu>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Congyu Liu <liu3101@purdue.edu>
Link: https://lore.kernel.org/r/20220215160037.1976072-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16 20:37:47 -08:00
Linus Torvalds
c24449b321 hyperv-fixes for 5.17-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmILjZgTHHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXuPAB/96ls6ZaUrvAEseODu9N+qbCKCjqjGx
 LBqP5gzzG71LoFLlvvGp7quwDXkSlfzMmZM1oxBrsRN4TiexdQ4b9tR2jNqxUBS0
 pR7efRpR4YN1TrvxCr0lgfbKV6F3EC4YqMwK2Tf7zMRkzAbMQmw500JTkAdtplUT
 opSZKlWt1Q66Iudz7vwz2e2D/NnAg03icB7lZUNulV1WTDfDfBUZyAphAnOqpty9
 Y9UlVjFYSozOA6dv3HClYiJPjxhzUev88W/Sld81+2+gGCngWRT4a2OyUvCqNWXD
 P8EswLHuqECpkz6uxnbQOLDBgdiMXigtANNky/E4Za8r9MqpZT3+ueLf
 =/jBG
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

 - Rework use of DMA_BIT_MASK in vmbus to work around a clang bug
   (Michael Kelley)

 - Fix NUMA topology (Long Li)

 - Fix a memory leak in vmbus (Miaoqian Lin)

 - One minor clean-up patch (Cai Huoqing)

* tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: utils: Make use of the helper macro LIST_HEAD()
  Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64)
  Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
  PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
2022-02-15 09:05:01 -08:00
Eric Dumazet
9ceaf6f76b bonding: fix data-races around agg_select_timer
syzbot reported that two threads might write over agg_select_timer
at the same time. Make agg_select_timer atomic to fix the races.

BUG: KCSAN: data-race in bond_3ad_initiate_agg_selection / bond_3ad_state_machine_handler

read to 0xffff8881242aea90 of 4 bytes by task 1846 on cpu 1:
 bond_3ad_state_machine_handler+0x99/0x2810 drivers/net/bonding/bond_3ad.c:2317
 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
 worker_thread+0x616/0xa70 kernel/workqueue.c:2454
 kthread+0x1bf/0x1e0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30

write to 0xffff8881242aea90 of 4 bytes by task 25910 on cpu 0:
 bond_3ad_initiate_agg_selection+0x18/0x30 drivers/net/bonding/bond_3ad.c:1998
 bond_open+0x658/0x6f0 drivers/net/bonding/bond_main.c:3967
 __dev_open+0x274/0x3a0 net/core/dev.c:1407
 dev_open+0x54/0x190 net/core/dev.c:1443
 bond_enslave+0xcef/0x3000 drivers/net/bonding/bond_main.c:1937
 do_set_master net/core/rtnetlink.c:2532 [inline]
 do_setlink+0x94f/0x2500 net/core/rtnetlink.c:2736
 __rtnl_newlink net/core/rtnetlink.c:3414 [inline]
 rtnl_newlink+0xfeb/0x13e0 net/core/rtnetlink.c:3529
 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00000050 -> 0x0000004f

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 25910 Comm: syz-executor.1 Tainted: G        W         5.17.0-rc4-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-15 14:35:18 +00:00
Eric Dumazet
5891cd5ec4 net_sched: add __rcu annotation to netdev->qdisc
syzbot found a data-race [1] which lead me to add __rcu
annotations to netdev->qdisc, and proper accessors
to get LOCKDEP support.

[1]
BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu

write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1:
 attach_default_qdiscs net/sched/sch_generic.c:1167 [inline]
 dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221
 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416
 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139
 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150
 __rtnl_newlink net/core/rtnetlink.c:3489 [inline]
 rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529
 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0:
 qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323
 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050
 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211
 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585
 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:705 [inline]
 sock_sendmsg net/socket.c:725 [inline]
 ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
 ___sys_sendmsg net/socket.c:2467 [inline]
 __sys_sendmsg+0x195/0x230 net/socket.c:2496
 __do_sys_sendmsg net/socket.c:2505 [inline]
 __se_sys_sendmsg net/socket.c:2503 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 470502de5b ("net: sched: unlock rules update API")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 13:36:36 +00:00
Vladimir Oltean
a2614140dc net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
mv88e6xxx is special among DSA drivers in that it requires the VTU to
contain the VID of the FDB entry it modifies in
mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP.

Sometimes due to races this is not always satisfied even if external
code does everything right (first deletes the FDB entries, then the
VLAN), because DSA commits to hardware FDB entries asynchronously since
commit c9eb3e0f87 ("net: dsa: Add support for learning FDB through
notification").

Therefore, the mv88e6xxx driver must close this race condition by
itself, by asking DSA to flush the switchdev workqueue of any FDB
deletions in progress, prior to exiting a VLAN.

Fixes: c9eb3e0f87 ("net: dsa: Add support for learning FDB through notification")
Reported-by: Rafael Richter <rafael.richter@gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 13:31:12 +00:00
Ignat Korchagin
26394fc118 ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
Some time ago 8965779d2c ("ipv6,mcast: always hold idev->lock before mca_lock")
switched ipv6_get_lladdr() to __ipv6_get_lladdr(), which is rcu-unsafe
version. That was OK, because idev->lock was held for these codepaths.

In 88e2ca3080 ("mld: convert ifmcaddr6 to RCU") these external locks were
removed, so we probably need to restore the original rcu-safe call.

Otherwise, we occasionally get a machine crashed/stalled with the following
in dmesg:

[ 3405.966610][T230589] general protection fault, probably for non-canonical address 0xdead00000000008c: 0000 [#1] SMP NOPTI
[ 3405.982083][T230589] CPU: 44 PID: 230589 Comm: kworker/44:3 Tainted: G           O      5.15.19-cloudflare-2022.2.1 #1
[ 3405.998061][T230589] Hardware name: SUPA-COOL-SERV
[ 3406.009552][T230589] Workqueue: mld mld_ifc_work
[ 3406.017224][T230589] RIP: 0010:__ipv6_get_lladdr+0x34/0x60
[ 3406.025780][T230589] Code: 57 10 48 83 c7 08 48 89 e5 48 39 d7 74 3e 48 8d 82 38 ff ff ff eb 13 48 8b 90 d0 00 00 00 48 8d 82 38 ff ff ff 48 39 d7 74 22 <66> 83 78 32 20 77 1b 75 e4 89 ca 23 50 2c 75 dd 48 8b 50 08 48 8b
[ 3406.055748][T230589] RSP: 0018:ffff94e4b3fc3d10 EFLAGS: 00010202
[ 3406.065617][T230589] RAX: dead00000000005a RBX: ffff94e4b3fc3d30 RCX: 0000000000000040
[ 3406.077477][T230589] RDX: dead000000000122 RSI: ffff94e4b3fc3d30 RDI: ffff8c3a31431008
[ 3406.089389][T230589] RBP: ffff94e4b3fc3d10 R08: 0000000000000000 R09: 0000000000000000
[ 3406.101445][T230589] R10: ffff8c3a31430000 R11: 000000000000000b R12: ffff8c2c37887100
[ 3406.113553][T230589] R13: ffff8c3a39537000 R14: 00000000000005dc R15: ffff8c3a31431000
[ 3406.125730][T230589] FS:  0000000000000000(0000) GS:ffff8c3b9fc80000(0000) knlGS:0000000000000000
[ 3406.138992][T230589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3406.149895][T230589] CR2: 00007f0dfea1db60 CR3: 000000387b5f2000 CR4: 0000000000350ee0
[ 3406.162421][T230589] Call Trace:
[ 3406.170235][T230589]  <TASK>
[ 3406.177736][T230589]  mld_newpack+0xfe/0x1a0
[ 3406.186686][T230589]  add_grhead+0x87/0xa0
[ 3406.195498][T230589]  add_grec+0x485/0x4e0
[ 3406.204310][T230589]  ? newidle_balance+0x126/0x3f0
[ 3406.214024][T230589]  mld_ifc_work+0x15d/0x450
[ 3406.223279][T230589]  process_one_work+0x1e6/0x380
[ 3406.232982][T230589]  worker_thread+0x50/0x3a0
[ 3406.242371][T230589]  ? rescuer_thread+0x360/0x360
[ 3406.252175][T230589]  kthread+0x127/0x150
[ 3406.261197][T230589]  ? set_kthread_struct+0x40/0x40
[ 3406.271287][T230589]  ret_from_fork+0x22/0x30
[ 3406.280812][T230589]  </TASK>
[ 3406.288937][T230589] Modules linked in: ... [last unloaded: kheaders]
[ 3406.476714][T230589] ---[ end trace 3525a7655f2f3b9e ]---

Fixes: 88e2ca3080 ("mld: convert ifmcaddr6 to RCU")
Reported-by: David Pinilla Caparros <dpini@cloudflare.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14 13:30:37 +00:00
Halil Pasic
ddbd89deb7 swiotlb: fix info leak with DMA_FROM_DEVICE
The problem I'm addressing was discovered by the LTP test covering
cve-2018-1000204.

A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
   interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
   and a corresponding dxferp. The peculiar thing about this is that TUR
   is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
   bounces the user-space buffer. As if the device was to transfer into
   it. Since commit a45b599ad8 ("scsi: sg: allocate with __GFP_ZERO in
   sg_build_indirect()") we make sure this first bounce buffer is
   allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
   device won't touch the buffer we prepare as if the we had a
   DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
   and the  buffer allocated by SG is mapped by the function
   virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
   scatter-gather and not scsi generics). This mapping involves bouncing
   via the swiotlb (we need swiotlb to do virtio in protected guest like
   s390 Secure Execution, or AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
   (that is swiotlb) bounce buffer (which most likely contains some
   previous IO data), to the first bounce buffer, which contains all
   zeros.  Then we copy back the content of the first bounce buffer to
   the user-space buffer.
5) The test case detects that the buffer, which it zero-initialized,
  ain't all zeros and fails.

One can argue that this is an swiotlb problem, because without swiotlb
we leak all zeros, and the swiotlb should be transparent in a sense that
it does not affect the outcome (if all other participants are well
behaved).

Copying the content of the original buffer into the swiotlb buffer is
the only way I can think of to make swiotlb transparent in such
scenarios. So let's do just that if in doubt, but allow the driver
to tell us that the whole mapped buffer is going to be overwritten,
in which case we can preserve the old behavior and avoid the performance
impact of the extra bounce.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-14 10:22:28 +01:00
Linus Torvalds
42964a18f8 - Fix a case where objtool would mistakenly warn about instructions being unreachable
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmII/BUACgkQEsHwGGHe
 VUr5VBAAhK4/Hqe5wgbbKE2CwUfvCd7KBbKQb0FbBguO9Sg9NkQhD0ROE5qQCsRE
 AMBJKuC4AahoVIsPuJ2nr73iaM5UY3xx6Dth6qPnA7VEwpWM/h/A2t0D9bgm5PdR
 1FK5QnmIP+o/astYTvP8YpIlhRqHK97fZM0KYZX8SfLG2sbnYiBANj7YiwafNLQs
 7FR/J+LH2PRAU1sBrdrhvd/u4jvSTjcbutPEETGuTTMoPiJj4TGa0XyZNQR4/br+
 WTnbzLFJpRabBMVE2ELzbzSXnEeUZpSKe79G9LW5AFaGa9UpyooXVUn4PcPNXTia
 Q8zkMKusNFUlQc0pbcUxaS/g+UnC7koFn/XvPxp5k9tL7x4exq9hhOn/F+K9Ctuw
 jUVrg63+/VFYtMwbZpYd81k5I/rx+o7t8rkmUrk0Wz/gpE9CDgbjfhGyAFqmXAFU
 mGAGcFHbBaG6J5/XGqOGZR42yajlg9lwxpaj+taCtfbf46/48E6mTtLG2qQRDAqW
 QlGVS8H9t+vmwfO8oAt2tWwLTyZqt+6VmNTbwerKSEblEC+yB/lLO5AcWsnNwHVl
 9ZwSRTPw8ejkj/AdyoTSedMJHzcsAXT8PtIr2rSjuX1b8SubN8GmbsApyQmzV3G0
 n5DLTcKvpCPjtWjtpF44yjP1rfdDLFBIh+RYHF7iNPFo0uhQFOg=
 =+cTD
 -----END PGP SIGNATURE-----

Merge tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fix from Borislav Petkov:
 "Fix a case where objtool would mistakenly warn about instructions
  being unreachable"

* tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
2022-02-13 09:43:34 -08:00
Linus Torvalds
9917ff5f31 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "5 patches.

  Subsystems affected by this patch series: binfmt, procfs, and mm
  (vmscan, memcg, and kfence)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kfence: make test case compatible with run time set sample interval
  mm: memcg: synchronize objcg lists with a dedicated spinlock
  mm: vmscan: remove deadlock due to throttling failing to make progress
  fs/proc: task_mmu.c: don't read mapcount for migration entry
  fs/binfmt_elf: fix PT_LOAD p_align values for loaders
2022-02-12 08:57:37 -08:00
Peng Liu
8913c61001 kfence: make test case compatible with run time set sample interval
The parameter kfence_sample_interval can be set via boot parameter and
late shell command, which is convenient for automated tests and KFENCE
parameter optimization.  However, KFENCE test case just uses
compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test
case not run as users desired.  Export kfence_sample_interval, so that
KFENCE test case can use run-time-set sample interval.

Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Christian Knig <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-11 17:55:00 -08:00
Roman Gushchin
0764db9b49 mm: memcg: synchronize objcg lists with a dedicated spinlock
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:

  LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
          WARNING: possible circular locking dependency detected
          5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
          ------------------------------------------------------
          mmap1/202299 is trying to acquire lock:
          00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
          but task is already holding lock:
          00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (&sighand->siglock){-.-.}-{2:2}:
                 __lock_acquire+0x604/0xbd8
                 lock_acquire.part.0+0xe2/0x238
                 lock_acquire+0xb0/0x200
                 _raw_spin_lock_irqsave+0x6a/0xd8
                 __lock_task_sighand+0x90/0x190
                 cgroup_freeze_task+0x2e/0x90
                 cgroup_migrate_execute+0x11c/0x608
                 cgroup_update_dfl_csses+0x246/0x270
                 cgroup_subtree_control_write+0x238/0x518
                 kernfs_fop_write_iter+0x13e/0x1e0
                 new_sync_write+0x100/0x190
                 vfs_write+0x22c/0x2d8
                 ksys_write+0x6c/0xf8
                 __do_syscall+0x1da/0x208
                 system_call+0x82/0xb0
          -> #0 (css_set_lock){..-.}-{2:2}:
                 check_prev_add+0xe0/0xed8
                 validate_chain+0x736/0xb20
                 __lock_acquire+0x604/0xbd8
                 lock_acquire.part.0+0xe2/0x238
                 lock_acquire+0xb0/0x200
                 _raw_spin_lock_irqsave+0x6a/0xd8
                 obj_cgroup_release+0x4a/0xe0
                 percpu_ref_put_many.constprop.0+0x150/0x168
                 drain_obj_stock+0x94/0xe8
                 refill_obj_stock+0x94/0x278
                 obj_cgroup_charge+0x164/0x1d8
                 kmem_cache_alloc+0xac/0x528
                 __sigqueue_alloc+0x150/0x308
                 __send_signal+0x260/0x550
                 send_signal+0x7e/0x348
                 force_sig_info_to_task+0x104/0x180
                 force_sig_fault+0x48/0x58
                 __do_pgm_check+0x120/0x1f0
                 pgm_check_handler+0x11e/0x180
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(&sighand->siglock);
                                         lock(css_set_lock);
                                         lock(&sighand->siglock);
            lock(css_set_lock);
           *** DEADLOCK ***
          2 locks held by mmap1/202299:
           #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
           #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
          stack backtrace:
          CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
          Hardware name: IBM 3906 M04 704 (LPAR)
          Call Trace:
            dump_stack_lvl+0x76/0x98
            check_noncircular+0x136/0x158
            check_prev_add+0xe0/0xed8
            validate_chain+0x736/0xb20
            __lock_acquire+0x604/0xbd8
            lock_acquire.part.0+0xe2/0x238
            lock_acquire+0xb0/0x200
            _raw_spin_lock_irqsave+0x6a/0xd8
            obj_cgroup_release+0x4a/0xe0
            percpu_ref_put_many.constprop.0+0x150/0x168
            drain_obj_stock+0x94/0xe8
            refill_obj_stock+0x94/0x278
            obj_cgroup_charge+0x164/0x1d8
            kmem_cache_alloc+0xac/0x528
            __sigqueue_alloc+0x150/0x308
            __send_signal+0x260/0x550
            send_signal+0x7e/0x348
            force_sig_info_to_task+0x104/0x180
            force_sig_fault+0x48/0x58
            __do_pgm_check+0x120/0x1f0
            pgm_check_handler+0x11e/0x180
          INFO: lockdep is turned off.

In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg.  Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.

This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).

In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.

To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.

Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954 ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-11 17:55:00 -08:00
Linus Torvalds
83e3966411 ARM: SoC fixes for 5.17
This is a fairly large set of bugfixes, most of which had
 been sent a while ago but only now made it into the soc tree:
 
 Maintainer file updates:
 
  - Claudiu Beznea now co-maintains the at91 soc family,
    replacing Ludovic Desroches.
 
  - Michael Walle maintains the sl28cpld drivers
 
  - Alain Volmat and Raphael Gallais-Pou take over some
    drivers for ST platforms
 
  - Alim Akhtar is an additional reviewer for Samsung platforms
 
 Code fixes:
 
  - Op-tee had a problem with object lifetime that needs
    a slightly complex fix, as well as another bug with
    error handling.
 
  - Several minor issues for the OMAP platform, including
    a regression with the timer
 
  - A Kconfig change to fix a build-time issue on Intel
    SoCFPGA
 
 Device tree fixes:
 
  - The Amlogic Meson platform fixes a boot regression on
    am1-odroid, a spurious interrupt, and a problem with
    reserved memory regions
 
  - In the i.MX platform, several bug fixes are needed to
    make devices work correctly: SD card detection,
    alarmtimer, and sound card on some board. One patch
    for the GPU got in there by accident and gets reverted
    again.
 
  - TI K3 needs a fix for J721S2 serial port numbers
 
  - ux500 needs a fix to mount the SD card as root on
    the Skomer phone.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmIG1ZEACgkQmmx57+YA
 GNl89A/9G+BMgx/uWwjYPrKqT3uYHfG6g1RJMnJ1c9R/K5NaKBUFJ5KPWaStfnII
 17HKcQ18ugGk2gGpFs0ir4upRReDeb3MbmGSpytU7GnNgtGqSJIMcczMVXirsrCd
 52N5FKasDZHBM4GYNyFwnZ/HUXyoSYPzt1pA9L9qKqYz3CS74DcgIkeqXR8J3KMn
 VZhU5uFxJAk82IDHkRMqXVWykXObphB26iqtRRrf9HmwNirOKR8BLBWfeWEBwxpt
 ecqJtQAoLzTZVBQu57Lel7QPUsJh3/xoSZlbUlhY8bUvGKHQvRlOx47x7kJ3AUNS
 5Hi7+PY3qrYG3/B1N9XSnFeiHGCAoVR6jHA8tPGIMsnMdJ8HywwUh8AI2if9I9Hk
 nqbeGRG3eOoGJZ1oiJ0YMSadU2FCUG2TxrbyI7JcHzyjNCAgkZ7mXawCHgqLry1S
 azHDdRlIQHtDCxLu/Fept0ujGVJe1PYLzAfyzezrcQrf7N1dnbgBbuvQt7bQ3PQD
 Xcsx9xwPD3kJeXeVB+gGyDjZTO4zpD59+P7DGwwRNKlVKtzLzbJHQ3/+/ajdKbHP
 JlWSlDAdLxt4MaeFf3fZHRtO4tS4nYRUg0CsLjwPxd6jGmFT4V5Dzj2nHGQIarpw
 THQt15abGikpVHTZNSvzOkr7rvJDDmHUniZS480KhxT6E2Aldw8=
 =bRzz
 -----END PGP SIGNATURE-----

Merge tag 'soc-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "This is a fairly large set of bugfixes, most of which had been sent a
  while ago but only now made it into the soc tree:

  Maintainer file updates:

   - Claudiu Beznea now co-maintains the at91 soc family, replacing
     Ludovic Desroches.

   - Michael Walle maintains the sl28cpld drivers

   - Alain Volmat and Raphael Gallais-Pou take over some drivers for ST
     platforms

   - Alim Akhtar is an additional reviewer for Samsung platforms

  Code fixes:

   - Op-tee had a problem with object lifetime that needs a slightly
     complex fix, as well as another bug with error handling.

   - Several minor issues for the OMAP platform, including a regression
     with the timer

   - A Kconfig change to fix a build-time issue on Intel SoCFPGA

  Device tree fixes:

   - The Amlogic Meson platform fixes a boot regression on am1-odroid, a
     spurious interrupt, and a problem with reserved memory regions

   - In the i.MX platform, several bug fixes are needed to make devices
     work correctly: SD card detection, alarmtimer, and sound card on
     some board. One patch for the GPU got in there by accident and gets
     reverted again.

   - TI K3 needs a fix for J721S2 serial port numbers

   - ux500 needs a fix to mount the SD card as root on the Skomer phone"

* tag 'soc-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (46 commits)
  Revert "arm64: dts: imx8mn-venice-gw7902: disable gpu"
  arm64: Remove ARCH_VULCAN
  MAINTAINERS: add myself as a maintainer for the sl28cpld
  MAINTAINERS: add IRC to ARM sub-architectures and Devicetree
  MAINTAINERS: arm: samsung: add Git tree and IRC
  ARM: dts: Fix boot regression on Skomer
  ARM: dts: spear320: Drop unused and undocumented 'irq-over-gpio' property
  soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
  docs/ABI: testing: aspeed-uart-routing: Escape asterisk
  MAINTAINERS: update drm/stm drm/sti and cec/sti maintainers
  MAINTAINERS: Update Benjamin Gaignard maintainer status
  ARM: socfpga: fix missing RESET_CONTROLLER
  arm64: dts: meson-sm1-odroid: fix boot loop after reboot
  arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
  arm64: dts: meson-g12: add ATF BL32 reserved-memory region
  arm64: dts: meson-gx: add ATF BL32 reserved-memory region
  arm64: dts: meson-sm1-bananapi-m5: fix wrong GPIO domain for GPIOE_2
  arm64: dts: meson-sm1-odroid: use correct enable-gpio pin for tf-io regulator
  arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133'
  optee: use driver internal tee_context for some rpc
  ...
2022-02-11 13:40:03 -08:00
Yonghong Song
5eaed6eedb bpf: Fix a bpf_timer initialization issue
The patch in [1] intends to fix a bpf_timer related issue,
but the fix caused existing 'timer' selftest to fail with
hang or some random errors. After some debug, I found
an issue with check_and_init_map_value() in the hashtab.c.
More specifically, in hashtab.c, we have code
  l_new = bpf_map_kmalloc_node(&htab->map, ...)
  check_and_init_map_value(&htab->map, l_new...)
Note that bpf_map_kmalloc_node() does not do initialization
so l_new contains random value.

The function check_and_init_map_value() intends to zero the
bpf_spin_lock and bpf_timer if they exist in the map.
But I found bpf_spin_lock is zero'ed but bpf_timer is not zero'ed.
With [1], later copy_map_value() skips copying of
bpf_spin_lock and bpf_timer. The non-zero bpf_timer caused
random failures for 'timer' selftest.
Without [1], for both bpf_spin_lock and bpf_timer case,
bpf_timer will be zero'ed, so 'timer' self test is okay.

For check_and_init_map_value(), why bpf_spin_lock is zero'ed
properly while bpf_timer not. In bpf uapi header, we have
  struct bpf_spin_lock {
        __u32   val;
  };
  struct bpf_timer {
        __u64 :64;
        __u64 :64;
  } __attribute__((aligned(8)));

The initialization code:
  *(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
      (struct bpf_spin_lock){};
  *(struct bpf_timer *)(dst + map->timer_off) =
      (struct bpf_timer){};
It appears the compiler has no obligation to initialize anonymous fields.
For example, let us use clang with bpf target as below:
  $ cat t.c
  struct bpf_timer {
        unsigned long long :64;
  };
  struct bpf_timer2 {
        unsigned long long a;
  };

  void test(struct bpf_timer *t) {
    *t = (struct bpf_timer){};
  }
  void test2(struct bpf_timer2 *t) {
    *t = (struct bpf_timer2){};
  }
  $ clang -target bpf -O2 -c -g t.c
  $ llvm-objdump -d t.o
   ...
   0000000000000000 <test>:
       0:       95 00 00 00 00 00 00 00 exit
   0000000000000008 <test2>:
       1:       b7 02 00 00 00 00 00 00 r2 = 0
       2:       7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
       3:       95 00 00 00 00 00 00 00 exit

gcc11.2 does not have the above issue. But from
  INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
  Programming languages — C
  http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
  page 157:
  Except where explicitly stated otherwise, for the purposes of
  this subclause unnamed members of objects of structure and union
  type do not participate in initialization. Unnamed members of
  structure objects have indeterminate value even after initialization.

To fix the problem, let use memset for bpf_timer case in
check_and_init_map_value(). For consistency, memset is also
used for bpf_spin_lock case.

  [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/

Fixes: 68134668c1 ("bpf: Add map side support for bpf timers.")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220211194953.3142152-1-yhs@fb.com
2022-02-11 13:21:47 -08:00
Kumar Kartikeya Dwivedi
a8abb0c3dc bpf: Fix crash due to incorrect copy_map_value
When both bpf_spin_lock and bpf_timer are present in a BPF map value,
copy_map_value needs to skirt both objects when copying a value into and
out of the map. However, the current code does not set both s_off and
t_off in copy_map_value, which leads to a crash when e.g. bpf_spin_lock
is placed in map value with bpf_timer, as bpf_map_update_elem call will
be able to overwrite the other timer object.

When the issue is not fixed, an overwriting can produce the following
splat:

[root@(none) bpf]# ./test_progs -t timer_crash
[   15.930339] bpf_testmod: loading out-of-tree module taints kernel.
[   16.037849] ==================================================================
[   16.038458] BUG: KASAN: user-memory-access in __pv_queued_spin_lock_slowpath+0x32b/0x520
[   16.038944] Write of size 8 at addr 0000000000043ec0 by task test_progs/325
[   16.039399]
[   16.039514] CPU: 0 PID: 325 Comm: test_progs Tainted: G           OE     5.16.0+ #278
[   16.039983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[   16.040485] Call Trace:
[   16.040645]  <TASK>
[   16.040805]  dump_stack_lvl+0x59/0x73
[   16.041069]  ? __pv_queued_spin_lock_slowpath+0x32b/0x520
[   16.041427]  kasan_report.cold+0x116/0x11b
[   16.041673]  ? __pv_queued_spin_lock_slowpath+0x32b/0x520
[   16.042040]  __pv_queued_spin_lock_slowpath+0x32b/0x520
[   16.042328]  ? memcpy+0x39/0x60
[   16.042552]  ? pv_hash+0xd0/0xd0
[   16.042785]  ? lockdep_hardirqs_off+0x95/0xd0
[   16.043079]  __bpf_spin_lock_irqsave+0xdf/0xf0
[   16.043366]  ? bpf_get_current_comm+0x50/0x50
[   16.043608]  ? jhash+0x11a/0x270
[   16.043848]  bpf_timer_cancel+0x34/0xe0
[   16.044119]  bpf_prog_c4ea1c0f7449940d_sys_enter+0x7c/0x81
[   16.044500]  bpf_trampoline_6442477838_0+0x36/0x1000
[   16.044836]  __x64_sys_nanosleep+0x5/0x140
[   16.045119]  do_syscall_64+0x59/0x80
[   16.045377]  ? lock_is_held_type+0xe4/0x140
[   16.045670]  ? irqentry_exit_to_user_mode+0xa/0x40
[   16.046001]  ? mark_held_locks+0x24/0x90
[   16.046287]  ? asm_exc_page_fault+0x1e/0x30
[   16.046569]  ? asm_exc_page_fault+0x8/0x30
[   16.046851]  ? lockdep_hardirqs_on+0x7e/0x100
[   16.047137]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   16.047405] RIP: 0033:0x7f9e4831718d
[   16.047602] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48
[   16.048764] RSP: 002b:00007fff488086b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000023
[   16.049275] RAX: ffffffffffffffda RBX: 00007f9e48683740 RCX: 00007f9e4831718d
[   16.049747] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fff488086d0
[   16.050225] RBP: 00007fff488086f0 R08: 00007fff488085d7 R09: 00007f9e4cb594a0
[   16.050648] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9e484cde30
[   16.051124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   16.051608]  </TASK>
[   16.051762] ==================================================================

Fixes: 68134668c1 ("bpf: Add map side support for bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com
2022-02-11 13:13:04 -08:00
Linus Torvalds
883fd0aba1 ACPI fixes for 5.17-rc4
- Revert a recent change that attempted to avoid issues with
    conflicting address ranges during PCI initialization, because it
    turned out to introduce a regression (Hans de Goede).
 
  - Revert a change that limited EC GPE wakeups from suspend-to-idle
    to systems based on Intel hardware, because it turned out that
    systems based on hardware from other vendors depended on that
    functionality too (Mario Limonciello).
 
  - Fix two issues related to the handling of wakeup interrupts and
    wakeup events signaled through the EC GPE during suspend-to-idle
    on x86 (Rafael Wysocki).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmIGkq4SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxs3MQAKhmnM5mHOqFIM7VM9SoutEoOIXPWLCz
 CNjjqfwA63GvxTp//Dks4KGCLF90mfFOv8Mt7vlzVHyQDvTv2w0YddmBxQAgXTU4
 v1yfHcRssND1w2kgIg7wThWSxSKvP/4XgqwrhTmEKHb1qPFrpq9pt7+4RIco9362
 pacxW1QPX2+5yNDs67JuTsjh2mksKC8CUgiA8BUa+57jrIXvQYWqqZ490PpkbXW0
 Nh1naNwT1xDqte5U98PrzZZRt2qUMuKrG2ro0lE97rb067zSoIQo2XCODlo7T12O
 7vFzypOTvNJZjd57U9SoKyEDCzDnHkNh2O2jiNKaqzMJHqgh0bkMrMHysPbdTwMu
 VH0fw+VElRONyQ5knUcvTG3IRfEuTBl2iqDoO+hLb+cmkL48KL9yJXH7dTRMrNJ8
 zqHSCqpON8rZgLfDzrxoVvMXv9Al5ra5wM41EljiUWUDFEB1HbtleysHIMFMxEKy
 gBvO6zAyEoh5ie2pxYdSAyY+pq+d1JFeAgpbXpF3miQFE2gchC1BjNjUCguekfty
 t0kwOefLOphgShpIm/04F2WBsYNM+QSIoJWRvE5LsEVTdD/j59MweCTJN2KNSIpj
 vapl8ymAr6umMrPhxpjY6K4DD000+LYpV9WJ8Q1Wv2HWV9OkevL5KnEXoQ94czpb
 ybb/Nd4SwHV+
 =aGYp
 -----END PGP SIGNATURE-----

Merge tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These revert two commits that turned out to be problematic and fix two
  issues related to wakeup from suspend-to-idle on x86.

  Specifics:

   - Revert a recent change that attempted to avoid issues with
     conflicting address ranges during PCI initialization, because it
     turned out to introduce a regression (Hans de Goede).

   - Revert a change that limited EC GPE wakeups from suspend-to-idle to
     systems based on Intel hardware, because it turned out that systems
     based on hardware from other vendors depended on that functionality
     too (Mario Limonciello).

   - Fix two issues related to the handling of wakeup interrupts and
     wakeup events signaled through the EC GPE during suspend-to-idle on
     x86 (Rafael Wysocki)"

* tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems"
  PM: s2idle: ACPI: Fix wakeup interrupts handling
  ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
  ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
2022-02-11 11:48:13 -08:00
Linus Torvalds
f1baf68e13 Networking fixes for 5.17-rc4, including fixes from netfilter and can.
Current release - new code bugs:
 
  - sparx5: fix get_stat64 out-of-bound access and crash
 
  - smc: fix netdev ref tracker misuse
 
 Previous releases - regressions:
 
  - eth: ixgbevf: require large buffers for build_skb on 82599VF,
    avoid overflows
 
  - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
    over IP
 
  - bonding: fix rare link activation misses in 802.3ad mode
 
 Previous releases - always broken:
 
  - tcp: fix tcp sock mem accounting in zero-copy corner cases
 
  - remove the cached dst when uncloning an skb dst and its metadata,
    since we only have one ref it'd lead to an UaF
 
  - netfilter:
    - conntrack: don't refresh sctp entries in closed state
    - conntrack: re-init state for retransmitted syn-ack, avoid
      connection establishment getting stuck with strange stacks
    - ctnetlink: disable helper autoassign, avoid it getting lost
    - nft_payload: don't allow transport header access for fragments
 
  - dsa: fix use of devres for mdio throughout drivers
 
  - eth: amd-xgbe: disable interrupts during pci removal
 
  - eth: dpaa2-eth: unregister netdev before disconnecting the PHY
 
  - eth: ice: fix IPIP and SIT TSO offload
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmIFcW0ACgkQMUZtbf5S
 IrsloQ/+LrzlXgYrWf60DEFT4+AfJ20YxeakB+SpT0cZdvylU6NHF/U4rSHdUJRn
 xZiHfhfKzNWV5miJ2wzOuvufUvoz173dtrdgJJmp6G43qfAFyqiqtxrbVuuTQk0G
 C2i5M66zJ2svSj3EO5dYKV8zeydd14eVeoaJqfN5rjMARTmvpT/ssdzw0LTf0aXp
 87CF/WyeH8NyfQxQwPmbGRxDpnV2RqDJYSNdA4kOtDrnQmoKet32rhE6liHeP2jD
 OFQo70QXEMVyIZEh4wT17lMqA4M1zIEQtgrB0NsmVNU3jQnDI4UQ0aA2gHzwK31S
 glORZWGqTGGrhVy9uQBGKUK29RW8vb0D2ojZpi3zd0htWpIqpfFf6wuMAdUwEZag
 mTlZ1Yi6YgzAQEdSALoXLKps1GiHQzQNYPK6fNxoSOxnmoQTMQL9KxU/HB9d8W+L
 hjuYOGDw9vtGyUpNF7lktCQR/sWjFCmevLk97d2mhofFqfHBJYzFHGyd0GgNVzgl
 o3CMWnyiqTVapLRMTFPnEADEXGWq4DYAjRPkfjDCHeITB9Neg9S85Geth+xwfdi1
 cwSu64xFb7k7hiAbnymB3xg5sBQjXHrIQo03GHVymQE5p6XxIhn+5UD9CYHGJ2TF
 7PGyHeJJOZyWGQ/iSZJbo+BpsVGthIgk+9tTKPeTToej0tic27w=
 =jlhJ
 -----END PGP SIGNATURE-----

Merge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and can.

Current release - new code bugs:

   - sparx5: fix get_stat64 out-of-bound access and crash

   - smc: fix netdev ref tracker misuse

  Previous releases - regressions:

   - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
     overflows

   - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
     over IP

   - bonding: fix rare link activation misses in 802.3ad mode

  Previous releases - always broken:

   - tcp: fix tcp sock mem accounting in zero-copy corner cases

   - remove the cached dst when uncloning an skb dst and its metadata,
     since we only have one ref it'd lead to an UaF

   - netfilter:
      - conntrack: don't refresh sctp entries in closed state
      - conntrack: re-init state for retransmitted syn-ack, avoid
        connection establishment getting stuck with strange stacks
      - ctnetlink: disable helper autoassign, avoid it getting lost
      - nft_payload: don't allow transport header access for fragments

   - dsa: fix use of devres for mdio throughout drivers

   - eth: amd-xgbe: disable interrupts during pci removal

   - eth: dpaa2-eth: unregister netdev before disconnecting the PHY

   - eth: ice: fix IPIP and SIT TSO offload"

* tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
  net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
  net: mscc: ocelot: fix mutex lock error during ethtool stats read
  ice: Avoid RTNL lock when re-creating auxiliary device
  ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
  ice: fix IPIP and SIT TSO offload
  ice: fix an error code in ice_cfg_phy_fec()
  net: mpls: Fix GCC 12 warning
  dpaa2-eth: unregister the netdev before disconnecting from the PHY
  skbuff: cleanup double word in comment
  net: macb: Align the dma and coherent dma masks
  mptcp: netlink: process IPv6 addrs in creating listening sockets
  selftests: mptcp: add missing join check
  net: usb: qmi_wwan: Add support for Dell DW5829e
  vlan: move dev_put into vlan_dev_uninit
  vlan: introduce vlan_dev_free_egress_priority
  ax25: fix UAF bugs of net_device caused by rebinding operation
  net: dsa: fix panic when DSA master device unbinds on shutdown
  net: amd-xgbe: disable interrupts during pci removal
  tipc: rate limit warning for received illegal binding update
  net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
  ...
2022-02-10 16:01:22 -08:00
Linus Torvalds
f4bc5bbb5f Notable bug fixes:
Ensure that NFS clients cannot send file size or offset values that
 can cause the NFS server to crash or to return incorrect or
 surprising results. In particular, fix how the NFS server handles
 values larger than OFFSET_MAX.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmIDzr4ACgkQM2qzM29m
 f5eJRBAAikdh0PYOlZbvy9M1eY6wq3k+Y10JsnCZk4T8Uq0NJF/7CJ3R4/4+xGOh
 ZA/2vE1dN4IfqnIOdxw1cXbzzgAO5p/nDLMo9wC6NimrVLkE+S8j38oWvEHOCJXC
 TzUbIKkxqBBcfDw4pO4BT42iHx+cqVUuRFd2qkob1ZRoe+BKI+F4+7QNVc8iEw5z
 j85i2/h6JohsItzekRbMO1q1iXxBc+IZRYafjibtVRWxRuNUWP8C1cv0eXrlSy3O
 L07kZRwzrd52PAi1Q8K07Ip+yTHUMZptyHoB6S863uuz/mOzlpXewvXHMGA1btlr
 POHYG/lBXpDS0e2pjksyXXp2I7HJV/HuaMyyLveWRO0qleBc3G5PsvIJNBW7xl5f
 NPGdgfaa+8ZeOCGolvPruykL9Eh7QAyWTdPKz1J+NuhjkAB4p6ba9QcKVwP7kYTi
 I8zdeUPgbjuFW35hal0ZIlNi2RfcuSGk1FKjotrQ6J3XNIaqPkUWK+1Zz3MzqPUW
 +1ElzoXQugJASPBkEZuf1aXr8/vRjKT16l8EX1kbtJ5wjj2OPbnWWZk03ZncLVfv
 CzbJTZLqiM0JuRqXvYpUGAQdryWcwvTCAuWxcqrt4ALNWW6Z4Y35Vl8H4sTh8wkr
 Q3m6bAVYJx3FmFop7y5ubVH137k1SFJ0NzGJJK0mYoZQSMZoPZI=
 =64n/
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull more nfsd fixes from Chuck Lever:
 "Ensure that NFS clients cannot send file size or offset values that
  can cause the NFS server to crash or to return incorrect or surprising
  results.

  In particular, fix how the NFS server handles values larger than
  OFFSET_MAX"

* tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Deprecate NFS_OFFSET_MAX
  NFSD: Fix offset type in I/O trace points
  NFSD: COMMIT operations must not return NFS?ERR_INVAL
  NFSD: Clamp WRITE offsets
  NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
  NFSD: Fix ia_size underflow
  NFSD: Fix the behavior of READ near OFFSET_MAX
2022-02-09 09:56:57 -08:00
Chuck Lever
c306d73769 NFSD: Deprecate NFS_OFFSET_MAX
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there
was a kernel-wide OFFSET_MAX value. As a clean up, replace the last
few uses of it with its generic equivalent, and get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09 09:24:40 -05:00
Antoine Tenart
9eeabdf17f net: fix a memleak when uncloning an skb dst and its metadata
When uncloning an skb dst and its associated metadata, a new
dst+metadata is allocated and later replaces the old one in the skb.
This is helpful to have a non-shared dst+metadata attached to a specific
skb.

The issue is the uncloned dst+metadata is initialized with a refcount of
1, which is increased to 2 before attaching it to the skb. When
tun_dst_unclone returns, the dst+metadata is only referenced from a
single place (the skb) while its refcount is 2. Its refcount will never
drop to 0 (when the skb is consumed), leading to a memory leak.

Fix this by removing the call to dst_hold in tun_dst_unclone, as the
dst+metadata refcount is already 1.

Fixes: fc4099f172 ("openvswitch: Fix egress tunnel info.")
Cc: Pravin B Shelar <pshelar@ovn.org>
Reported-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 11:41:47 +00:00
Antoine Tenart
cfc56f85e7 net: do not keep the dst cache when uncloning an skb dst and its metadata
When uncloning an skb dst and its associated metadata a new dst+metadata
is allocated and the tunnel information from the old metadata is copied
over there.

The issue is the tunnel metadata has references to cached dst, which are
copied along the way. When a dst+metadata refcount drops to 0 the
metadata is freed including the cached dst entries. As they are also
referenced in the initial dst+metadata, this ends up in UaFs.

In practice the above did not happen because of another issue, the
dst+metadata was never freed because its refcount never dropped to 0
(this will be fixed in a subsequent patch).

Fix this by initializing the dst cache after copying the tunnel
information from the old metadata to also unshare the dst cache.

Fixes: d71785ffc7 ("net: add dst_cache to ovs vxlan lwtunnel")
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Vlad Buslov <vladbu@nvidia.com>
Tested-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09 11:41:47 +00:00