-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmaVN3MACgkQxWXV+ddt
WDtpIRAAl+1NjsEj8e5V/UYn8Jr06ujTOnrkR3PCTICxDHbUaMLkQEw21H0K/ogQ
3fOiEVpSlZOfKdYXtXaMQbC0jd/Af2eA10Uht96nAEjAtxu1uJ4cFZGu2meNdXZP
xUioivJ/CElMPH2aluG6FaQvUTqmhrEr8tSoYbxzQmUd434q9kqqyjtw1tfzYDG1
VDn2f7ykhpB/8P0aoqgWSshWTmaCzG0GkuI28o1o0iZUIF/P9TKdzxlLRW6BVHE7
T2oGLEQjN1GQbCH75L4IeNJDkCBVfcDcbZkUDJ/ae4Pt/jJQTFY53YIP9wXFZQnd
mdfHmK7Atpsk75ATftYSq+ENkbQ5fsuut5CD63u54gAqA4M1FncDXTAWS1Y30F76
P8juSCmsSy0o3gTflDIo/IMdntoh/JmncwwStF6oKzmyUZZzzarsqM8mc1P03ZNt
3ttlnbY7lC1TDAlD5J2wXE0INCT2pN+4C9IToWdRypeuLu6qrI7cQ0oylyp9OVQM
t9umTXm0B6s1cyqEDjJf0xJZS/JTHYwu7S4EmAJwicgiLpOjABVTmO8021rVmDJy
TAUu6yEhSsrTT6Dxm7/2Et1EEOKFF5hhsG1SiGD9oUIZK6B5+0waT+rbkEWl7osR
4/TAv2zX6tuCc7HIW0fQloM/6/Gyd5wcDVaQNDUzFA075uKstwY=
=k5d3
-----END PGP SIGNATURE-----
Merge tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"The highlights are new logic behind background block group reclaim,
automatic removal of qgroup after removing a subvolume and new
'rescue=' mount options.
The rest is optimizations, cleanups and refactoring.
User visible features:
- dynamic block group reclaim:
- tunable framework to avoid situations where eager data
allocations prevent creating new metadata chunks due to lack of
unallocated space
- reuse sysfs knob bg_reclaim_threshold (otherwise used only in
zoned mode) for a fixed value threshold
- new on/off sysfs knob "dynamic_reclaim" calculating the value
based on heuristics, aiming to keep spare working space for
relocating chunks but not to needlessly relocate partially
utilized block groups or reclaim newly allocated ones
- stats are exported in sysfs per block group type, files
"reclaim_*"
- this may increase IO load at unexpected times but the corner
case of no allocatable block groups is known to be worse
- automatically remove qgroup of deleted subvolumes:
- adjust qgroup removal conditions, make sure all related
subvolume data are already removed, or return EBUSY, also take
into account setting of sysfs drop_subtree_threshold
- also works in squota mode
- mount option updates: new modes of 'rescue=' that allow to mount
images (read-only) that could have been partially converted by user
space tools
- ignoremetacsums - invalid metadata checksums are ignored
- ignoresuperflags - super block flags that track conversion in
progress (like UUID or checksums)
Core:
- size of struct btrfs_inode is now below 1024 (on a release config),
improved memory packing and other secondary effects
- switch tracking of open inodes from rb-tree to xarray, minor
performance improvement
- reduce number of empty transaction commits when there are no dirty
data/metadata
- memory allocation optimizations (reduced numbers, reordering out of
critical sections)
- extent map structure optimizations and refactoring, more sanity
checks
- more subpage in zoned mode preparations or fixes
- general snapshot code cleanups, improvements and documentation
- tree-checker updates: more file extent ram_bytes fixes, continued
- raid-stripe-tree update (not backward compatible):
- remove extent encoding field from the structure, can be inferred
from other information
- requires btrfs-progs 6.9.1 or newer
- cleanups and refactoring
- error message updates
- error handling improvements
- return type and parameter cleanups and improvements"
* tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (152 commits)
btrfs: fix extent map use-after-free when adding pages to compressed bio
btrfs: fix bitmap leak when loading free space cache on duplicate entry
btrfs: remove the BUG_ON() inside extent_range_clear_dirty_for_io()
btrfs: move extent_range_clear_dirty_for_io() into inode.c
btrfs: enhance compression error messages
btrfs: fix data race when accessing the last_trans field of a root
btrfs: rename the extra_gfp parameter of btrfs_alloc_page_array()
btrfs: remove the extra_gfp parameter from btrfs_alloc_folio_array()
btrfs: introduce new "rescue=ignoresuperflags" mount option
btrfs: introduce new "rescue=ignoremetacsums" mount option
btrfs: output the unrecognized super block flags as hex
btrfs: remove unused Opt enums
btrfs: tree-checker: add extra ram_bytes and disk_num_bytes check
btrfs: fix the ram_bytes assignment for truncated ordered extents
btrfs: make validate_extent_map() catch ram_bytes mismatch
btrfs: ignore incorrect btrfs_file_extent_item::ram_bytes
btrfs: cleanup the bytenr usage inside btrfs_extent_item_to_extent_map()
btrfs: fix typo in error message in btrfs_validate_super()
btrfs: move the direct IO code into its own file
btrfs: pass a btrfs_inode to btrfs_set_prop()
...
- Revise the glock reference counting model
- Several quota related fixes
- Clean up the glock demote logic
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmaVQ4gUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZToLog//T6Iljxro4CMkNvGJx2B3puo2rtbd
mToxam0ZTkE/xXcxwRJDMFjxLdQ74xtiZLFJF8l/OwpiUpkKjh+hXdH4IABZG9Xm
hNSvYBFiUCt86pcDKc/ia7dH/xSBN3nH1IpNtr6dCFBXHkc1tK1v+QJ+RnFDZ9Re
kgMMYjmGKRfBRuR+r0uxF0V09jQYHmQ5K/o4arF5NX6ifUKX0tnnr8wB8bfXCznp
uXG6Jf8TWSGDcJI+phi7o5tNUN4187RRlODPewHsBmS0bdZla5buu5q9ATBDk1Ca
Btst+Oa6uIc6MHv9e9e59mVIp1NScYNnfDedFLLfxigskcGo2f7kaTlTNmccrMrm
sQNyVhWG5zlUJL7OcdonKP64XAJZbFt5I29RJOiqY1Z4OxCB3OH7Rl7MhbCATq/o
6jjN6+1DOGDKy3vbxIwaIsjC1E9H5hzIsmbzjEya0TpjrHENFPOCAFvWH2PdV2yc
hNhAiIKn6ihZs5QiXDGs9F46Qxb8C4nMDI/UAm2S7qAABlsD7m34PIZcfsVd+ouF
ySZOhf2xfLFovk1+QqAzaOGxtxUHqNkUhpakKPcjocb2NIdwUo1k8QM+plPOvnrm
1arppql5u1DAdzAvKI2LLjeKd64Wl8TG7SJuQQq8PiBMSa/MRSlKB5N3LFiqnbzC
iM/MYm1dPQ088wA=
=BfXp
-----END PGP SIGNATURE-----
Merge tag 'gfs2-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
"Fixes and cleanups:
- Revise the glock reference counting model and LRU list handling to
be more sensible
- Several quota related fixes: clean up the quota code, add some
missing locking, and work around the on-disk corruption that the
reverted patch "gfs2: ignore negated quota changes" causes
- Clean up the glock demote logic in glock_work_func()"
* tag 'gfs2-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits)
gfs2: Clean up glock demote logic
gfs2: Revert "check for no eligible quota changes"
gfs2: Be more careful with the quota sync generation
gfs2: Get rid of some unnecessary quota locking
gfs2: Add some missing quota locking
gfs2: Fold qd_fish into gfs2_quota_sync
gfs2: quota need_sync cleanup
gfs2: Fix and clean up function do_qc
gfs2: Revert "Add quota_change type"
gfs2: Revert "ignore negated quota changes"
gfs2: qd_check_sync cleanups
gfs2: Revert "introduce qd_bh_get_or_undo"
gfs2: Check quota consistency on mount
gfs2: Minor gfs2_quota_init error path cleanup
gfs2: Get rid of demote_ok checks
Revert "GFS2: Don't add all glocks to the lru"
gfs2: Revise glock reference counting model
gfs2: Switch to a per-filesystem glock workqueue
gfs2: Report when glocks cannot be freed for a long time
gfs2: gfs2_glock_get cleanup
...
- New flag DLM_LSFL_SOFTIRQ_SAFE can be set by code using dlm
to indicate callbacks can be run from softirq.
- Change md-cluster to set DLM_LSFL_SOFTIRQ_SAFE.
- Clean up for previous changes, e.g. unused code and parameters.
- Remove custom pre-allocation of rsb structs which is unnecessary
with kmem caches.
- Change idr to xarray for lkb structs in use.
- Change idr to xarray for rsb structs being recovered.
- Change outdated naming related to internal rsb states.
- Fix some incorrect add/remove of rsb on scan list.
- Use rcu to free rsb structs.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEcGkeEvkvjdvlR90nOBtzx/yAaaoFAmaVUlgACgkQOBtzx/yA
aapCfQ//eqs19no6+TUagkzboIGxGbrPEqmJNj4Vu1sCSH3tVC4IrkI2IqqPJL9N
tYHUQvp3BYOdenBZzw6tmbs6cvoA7Fps7YMqqkEKYfBCHcV9KtejqvwBdJfqiN6A
RniImAph0qvvI6GK4Y+6nDyxU2n8enOhgnZMRDUS/KYV8frc70SxreqzPSkPMWLh
ZnDgTIF4zahUBFEkILlXYArbbRk5FKL+SMkSDZyDd78bVnjX24KgtOt7HpDX9X70
/9DrDz3uI+XShXzpIint4Ee4ghZr1lM9g9LXDazuY62SBDknhGTzY0BYVxZ2U3NG
ocUh2KbJoP29sncNxLf9Nev5JPc+Wx3iCTEgLKkOEc4Yf0jAZg+1xbopWDT+qjRV
djsgTCQ1gjpHgQxrlUUo7N5ilo5ocgSXSHGJ8b886tG5eZaxiN1y3TB4T4JtO+FH
Q4IkFJiaYDL44xYR85wpfOcct/5mR7kPvhuxouexKobO+lKXaUONP9Wj7pRgG/M5
qhrWY4EU8jcO/nPunPxvhJdL68T3WoHDN42tWt/7kYQqY2svvfmr6NEImde6GxqX
PB3hW20cvD4wULumLM+h0rQacIWuuMQ5ahIX9og6jM7Yx/ucks1pgnRo0M0R1aUc
OopoTAekSdRtgbRXr5IQPRxpKB6BFUp3Va/Yo+2g0fi5QywcVZc=
=dDCi
-----END PGP SIGNATURE-----
Merge tag 'dlm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
- New flag DLM_LSFL_SOFTIRQ_SAFE can be set by code using dlm to
indicate callbacks can be run from softirq
- Change md-cluster to set DLM_LSFL_SOFTIRQ_SAFE
- Clean up for previous changes, e.g. unused code and parameters
- Remove custom pre-allocation of rsb structs which is unnecessary with
kmem caches
- Change idr to xarray for lkb structs in use
- Change idr to xarray for rsb structs being recovered
- Change outdated naming related to internal rsb states
- Fix some incorrect add/remove of rsb on scan list
- Use rcu to free rsb structs
* tag 'dlm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: add rcu_barrier before destroy kmem cache
dlm: remove DLM_LSFL_SOFTIRQ from exflags
fs: dlm: remove unused struct 'dlm_processed_nodes'
md-cluster: use DLM_LSFL_SOFTIRQ for dlm_new_lockspace()
dlm: implement LSFL_SOFTIRQ_SAFE
dlm: introduce DLM_LSFL_SOFTIRQ_SAFE
dlm: use LSFL_FS to check for kernel lockspace
dlm: use rcu to avoid an extra rsb struct lookup
dlm: fix add_scan and del_scan usage
dlm: change list and timer names
dlm: move recover idr to xarray datastructure
dlm: move lkb idr to xarray datastructure
dlm: drop own rsb pre allocation mechanism
dlm: remove ls_local_handle from struct dlm_ls
dlm: remove unused parameter in dlm_midcomms_addr
dlm: don't kref_init rsbs created for toss list
dlm: remove scand leftovers
- More folio conversions for compressed inodes;
- Stream decompressor (LZMA/DEFLATE/ZSTD) cleanups;
- Minor tracepoint cleanup.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmaUtrsRHHhpYW5nQGtl
cm5lbC5vcmcACgkQUXZn5Zlu5qpu6A//V8ZIE+yCuiqLI+VdTfb0nL1qKVrLUIAO
HffARXPiONyJqZtTFBo2FISkOouuSEYnrdApkDQ9V8suL+FlJ10FcRrP+Qnrvn/v
H6bKDQDvAWd27Z/jBpcma2b6tQG7GcmSaWC/eBdWEvQk+bgP1C/GD8bEDZkAAUIp
Ta5i349BtRlGaHoDRw+xsNUmzPxdC2GsTyBJXrYmhgyyD1QrdEMACkTMh/PjMV8h
LPVn5vo3Gf4tLpkOluqhFIjOkeEMw1LyICtvY23tweN2uzYJzEcmpmP8FjRXLmgQ
X9rTFJQ5OSWfNvstgGLncPvp2C3eE2IarYTNG20JWNJDIRmK3ae6RIRkMO4B4hXM
/cAkBF5eSJWRcTCPdY4T4XuWI2WdfpPMKBD4p/QZf7uyZoHnSRjY5D4fnkb/nLTh
sAgItTKmdPZ1GySVJf9EUzm4SWmxbTiCdgqEKCFn/RFbkbiK7xWbUaNcITY1FrOA
fgeeQe95//H90FiKfPvQK70vPcxxPfkLyxQNSQOXDZqiLzPNOvIDzro2pclk0qbk
/mcIG1JosMlVA/WeYoa0lulKxLzBykLBMM0fAFH1MNoTKFJhi9eMfO5jeUUj08wj
g51OmA2YPB4jAnvxW7Z77KRFo9V0bLqetFOJaxQpdFc9YKlfOph/XjVMq8xomTeZ
b2YHe51/BB0=
=LxxN
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"Updates for folio conversions for compressed inodes: While large folio
support for compressed data could work now, it remains disabled since
the stress test could hang due to page migration in a few hours after
enabling it. I need more time to investigate further before enabling
this feature.
Additionally, clean up stream decompressors and tracepoints for
simplicity.
Summary:
- More folio conversions for compressed inodes
- Stream decompressor (LZMA/DEFLATE/ZSTD) cleanups
- Minor tracepoint cleanup"
* tag 'erofs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: silence uninitialized variable warning in z_erofs_scan_folio()
erofs: avoid refcounting short-lived pages
erofs: get rid of z_erofs_map_blocks_iter_* tracepoints
erofs: tidy up stream decompressors
erofs: refine z_erofs_{init,exit}_subsystem()
erofs: move each decompressor to its own source file
erofs: tidy up `struct z_erofs_bvec`
erofs: teach z_erofs_scan_folios() to handle multi-page folios
erofs: convert z_erofs_read_fragment() to folios
erofs: convert z_erofs_pcluster_readmore() to folios
This is a light release containing optimizations, code clean-ups,
and minor bug fixes. This development cycle focused on work outside
of upstream kernel development:
1. Continuing to build upstream CI for NFSD based on kdevops
2. Continuing to focus on the quality of NFSD in LTS kernels
3. Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features in v6.11 that were not pulled through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one
has been introduced to our kdevops CI harness. Work on improving
the resolution of file attribute time stamps in local filesystems
is also ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers,
and bug reporters who participated during this cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmaVM0cACgkQM2qzM29m
f5fzOQ//c5CXIF3zCLIUofm5eZSP2zIszmHR75rVTEnf0Ehm2BJRF6VZiTvWXRpz
bOuswxfV1Bds+TofbPIP8jqDcMp8NIXemdb6+QMwh4FDY4M8t1v6TRYt35L6Ulrq
bSV81aRS622ofQ35sRzwxpGX6rB6YbB+5L4EKuxdEqRKSB8rCxQcjPy2qypcWlRC
hEAGDe3IiVxTz4VQBpASRqbH9Udw/XEqIhv5c8aLtPvl8i+yWyV5m2G5FMRdBj49
u8rCLoPi/mON8TDs2U4pbhcdgfBWWvGS6woFp6qrqM0wzXIPLalWsPGK3DUtuFUg
onxsClJXMWUvW4k4hbjiqosduLGY/kMeX62Lx1dCj/gktrJpU0GDNR/XbBhHU+hj
UT2CL8AfedC4FQekdyJri/rDgPiTMsf8UE0lgtchHMUXH0ztrjaRxMGiIFMm5vCl
dJBMGJfCkKR/+U1YrGRQI0tPL8CJKYI8klOEhLoOsCr/WC9p4nvvAzSg4W9mNK5P
ni4+KU4f/bj8U0Ap2bUacTpXj6W8VcwJWeuHahVA1Slo+eqXO401hj4W88dQmm9O
ZDR5h+6PI6KoL/KL6I4EyOv+sIEtW3s18cEWbSSu3N/CPuhSGTx8d2J201shJXRN
uDdMkvbwv48x20pgD2oTkPrZbJHOL3BK5/WPBg7pwpfkoRrBAhY=
=Xd5e
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"This is a light release containing optimizations, code clean-ups, and
minor bug fixes.
This development cycle focused on work outside of upstream kernel
development:
- Continuing to build upstream CI for NFSD based on kdevops
- Continuing to focus on the quality of NFSD in LTS kernels
- Participation in IETF nfsv4 WG discussions about NFSv4 ACLs,
directory delegation, and NFSv4.2 COPY offload
Notable features for v6.11 that do not come through the NFSD tree
include NFS server-side support for the new pNFS NVMe layout type
[RFC9561]. Functional testing for pNFS block layouts like this one has
been introduced to our kdevops CI harness. Work on improving the
resolution of file attribute time stamps in local filesystems is also
ongoing tree-wide.
As always I am grateful to NFSD contributors, reviewers, testers, and
bug reporters who participated during this cycle"
* tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument
gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey
MAINTAINERS: Add a bugzilla link for NFSD
nfsd: new netlink ops to get/set server pool_mode
sunrpc: refactor pool_mode setting code
nfsd: allow passing in array of thread counts via netlink
nfsd: make nfsd_svc take an array of thread counts
sunrpc: fix up the special handling of sv_nrpools == 1
SUNRPC: Add a trace point in svc_xprt_deferred_close
NFSD: Support write delegations in LAYOUTGET
lockd: Use *-y instead of *-objs in Makefile
NFSD: Fix nfsdcld warning
svcrdma: Handle ADDR_CHANGE CM event properly
svcrdma: Refactor the creation of listener CMA ID
NFSD: remove unused structs 'nfsd3_voidargs'
NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
nfs_read_folio is a bit hard to follow because it mixes highlevel logic
with the actual data read. Split the latter into a helper and update
the comments to be more accurate.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
nfs_folio_length is unsafe to use without having the folio locked and a
check for a NULL ->f_mapping that protects against truncations and can
lead to kernel crashes. E.g. when running xfstests generic/065 with
all nfs trace points enabled.
Follow the model of the XFS trace points and pass in an explіcit offset
and length. This has the additional benefit that these values can
be more accurate as some of the users touch partial folio ranges.
Fixes: eb5654b3b8 ("NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
kmsg_dump doesn't forward the panic reason string to the kmsg_dumper
callback.
This patch adds a new struct kmsg_dump_detail, that will hold the
reason and description, and pass it to the dump() callback.
To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc()
function and a macro for backward compatibility.
I've written this for drm_panic, but it can be useful for other
kmsg_dumper.
It allows to see the panic reason, like "sysrq triggered crash"
or "VFS: Unable to mount root fs on xxxx" on the drm panic screen.
v2:
* Use a struct kmsg_dump_detail to hold the reason and description
pointer, for more flexibility if we want to add other parameters.
(Kees Cook)
* Fix powerpc/nvram_64 build, as I didn't update the forward
declaration of oops_to_nvram()
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Acked-by: Petr Mladek <pmladek@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Kees Cook <kees@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240702122639.248110-1-jfalempe@redhat.com
Since the original virtio_find_vqs() is no longer present, rename
virtio_find_vqs_info() back to virtio_find_vqs().
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20240708074814.1739223-20-jiri@resnulli.us>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of passing separate names and callbacks arrays
to virtio_find_vqs(), allocate one of virtual_queue_info structs and
pass it to virtio_find_vqs_info().
Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20240708074814.1739223-16-jiri@resnulli.us>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
patchsets (devmem among them) did not make it in time.
Core & protocols
----------------
- Use local_lock in addition to local_bh_disable() to protect per-CPU
resources in networking, a step closer for local_bh_disable() not
to act as a big lock on PREEMPT_RT.
- Use flex array for netdevice priv area, ensure its cache alignment.
- Add a sysctl knob to allow user to specify a default rto_min at socket
init time. Bit of a big hammer but multiple companies were
independently carrying such patch downstream so clearly it's useful.
- Support scheduling transmission of packets based on CLOCK_TAI.
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off
using cpusets.
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address.
- Allow configuration of multipath hash seed, to both allow synchronizing
hashing of two routers, and preventing partial accidental sync.
- Improve TCP compliance with RFC 9293 for simultaneous connect().
- Support sending NAT keepalives in IPsec ESP in UDP states. Userspace
IKE daemon had to do this before, but the kernel can better keep
track of it.
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled.
- Introduce IPPROTO_SMC for selecting SMC when socket is created.
- Allow UDP GSO transmit from devices with no checksum offload.
- openvswitch: add packet sampling via psample, separating the sampled
traffic from "upcall" packets sent to user space for forwarding.
- nf_tables: shrink memory consumption for transaction objects.
Things we sprinkled into general kernel code
--------------------------------------------
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver
for QCA6390).
- Add IRQ information in sysfs for auxiliary bus.
- Introduce guard definition for local_lock.
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures.
BPF
---
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered.
- Add new kfuncs for a generic, open-coded bits iterator.
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head.
- Support resilient split BTF which cuts down on duplication and makes
BTF as compact as possible WRT BTF from modules.
- Add support for dumping kfunc prototypes from BTF which enables both
detecting as well as dumping compilable prototypes for kfuncs.
- riscv64 BPF JIT improvements in particular to add 12-argument support
for BPF trampolines and to utilize bpf_prog_pack for the latter.
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs.
Driver API
----------
- Allow users to configure IRQ tresholds between which automatic IRQ
moderation can choose.
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits.
- Track additional RSS contexts in the core, make sure configuration
changes don't break them.
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP
data paths.
- Support updating firmware on SFP modules.
Tests and tooling
-----------------
- mptcp: use net/lib.sh to manage netns.
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints.
- openvswitch: make test self-contained (don't depend on OvS CLI tools).
Drivers
-------
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead
and skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
theaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps
to catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead of
in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmaWjBwACgkQMUZtbf5S
IrvuSRAAkJuEzTRqgURBCe4eNEQde6mJJig7l2CKHwCbFiHZpRkFHf8qKbcGWbL6
uLW33SWnKtJVDhxVKWHLq635XW7BAa80YhqGw21GDi+mIEhWXZglHj3xbXNxsMfE
4eg/kG4BkfYWFmHaXOwVWV/mr7nXf6j7WmXNeXEi32ufE1j0OL+YlQenKnMj8yP2
j9JmYa2Chwppng1SblHmcjmGkdNVwFhStKeCG+2K7v06wdDH/QYBlbgUv9gw/cxp
NlW//wgiaeX40U4O3kDwt9C+LDoh+0VrDDeVdQ+IsScLtY3PhAzEoKolFYTq2HSr
I1JpoaHNnyNsJq3DZrACQ5WlH4yDn6C2EUB6dxNnFaI9F1ZPsi+7MTl6Sei1AklD
TuQTj/lxOACBwW2Q77NU72uoxiIUauesGPHcnrAFuoCIEhZF0mso7k59BvrXhsOP
QwcLbQdc1YHNkqv/Vc7NBY+ruMsYB+5Ubbhhj2p27dp/CWFIwxI29fze4dn2uhO6
ejHN3mbqwPdSzg12YJtM6Iq61Cnwo2eVSvhTxl+ZVSZtI4nu2arzR+y7QTYmNrXP
6tkgVN9UsWeLl2xJ8wyyqL5mcvNHP2rPXWZ2X56iTaa26m+UlleeQ7YRaYtQAAr0
Ec/vlDMX64SwHhd+qwE99DXGQf2g+KklHKSLsnajJUVrWFTlRI0=
=opz8
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Not much excitement - a handful of large patchsets (devmem among them)
did not make it in time.
Core & protocols:
- Use local_lock in addition to local_bh_disable() to protect per-CPU
resources in networking, a step closer for local_bh_disable() not
to act as a big lock on PREEMPT_RT
- Use flex array for netdevice priv area, ensure its cache alignment
- Add a sysctl knob to allow user to specify a default rto_min at
socket init time. Bit of a big hammer but multiple companies were
independently carrying such patch downstream so clearly it's useful
- Support scheduling transmission of packets based on CLOCK_TAI
- Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned
off using cpusets
- Support multiple L2TPv3 UDP tunnels using the same 5-tuple address
- Allow configuration of multipath hash seed, to both allow
synchronizing hashing of two routers, and preventing partial
accidental sync
- Improve TCP compliance with RFC 9293 for simultaneous connect()
- Support sending NAT keepalives in IPsec ESP in UDP states.
Userspace IKE daemon had to do this before, but the kernel can
better keep track of it
- Support sending supervision HSR frames with MAC addresses stored in
ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled
- Introduce IPPROTO_SMC for selecting SMC when socket is created
- Allow UDP GSO transmit from devices with no checksum offload
- openvswitch: add packet sampling via psample, separating the
sampled traffic from "upcall" packets sent to user space for
forwarding
- nf_tables: shrink memory consumption for transaction objects
Things we sprinkled into general kernel code:
- Power Sequencing subsystem (used by Qualcomm Bluetooth driver for
QCA6390) [ Already merged separately - Linus ]
- Add IRQ information in sysfs for auxiliary bus
- Introduce guard definition for local_lock
- Add aligned flavor of __cacheline_group_{begin, end}() markings for
grouping fields in structures
BPF:
- Notify user space (via epoll) when a struct_ops object is getting
detached/unregistered
- Add new kfuncs for a generic, open-coded bits iterator
- Enable BPF programs to declare arrays of kptr, bpf_rb_root, and
bpf_list_head
- Support resilient split BTF which cuts down on duplication and
makes BTF as compact as possible WRT BTF from modules
- Add support for dumping kfunc prototypes from BTF which enables
both detecting as well as dumping compilable prototypes for kfuncs
- riscv64 BPF JIT improvements in particular to add 12-argument
support for BPF trampolines and to utilize bpf_prog_pack for the
latter
- Add the capability to offload the netfilter flowtable in XDP layer
through kfuncs
Driver API:
- Allow users to configure IRQ tresholds between which automatic IRQ
moderation can choose
- Expand Power Sourcing (PoE) status with power, class and failure
reason. Support setting power limits
- Track additional RSS contexts in the core, make sure configuration
changes don't break them
- Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated
ESP data paths
- Support updating firmware on SFP modules
Tests and tooling:
- mptcp: use net/lib.sh to manage netns
- TCP-AO and TCP-MD5: replace debug prints used by tests with
tracepoints
- openvswitch: make test self-contained (don't depend on OvS CLI
tools)
Drivers:
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- increase the max total outstanding PTP TX packets to 4
- add timestamping statistics support
- implement netdev_queue_mgmt_ops
- support new RSS context API
- Intel (100G, ice, idpf):
- implement FEC statistics and dumping signal quality indicators
- support E825C products (with 56Gbps PHYs)
- nVidia/Mellanox:
- support HW-GRO
- mlx4/mlx5: support per-queue statistics via netlink
- obey the max number of EQs setting in sub-functions
- AMD/Solarflare:
- support new RSS context API
- AMD/Pensando:
- ionic: rework fix for doorbell miss to lower overhead and
skip it on new HW
- Wangxun:
- txgbe: support Flow Director perfect filters
- Ethernet NICs consumer, embedded and virtual:
- Add driver for Tehuti Networks TN40xx chips
- Add driver for Meta's internal NIC chips
- Add driver for Ethernet MAC on Airoha EN7581 SoCs
- Add driver for Renesas Ethernet-TSN devices
- Google cloud vNIC:
- flow steering support
- Microsoft vNIC:
- support page sizes other than 4KB on ARM64
- vmware vNIC:
- support latency measurement (update to version 9)
- VirtIO net:
- support for Byte Queue Limits
- support configuring thresholds for automatic IRQ moderation
- support for AF_XDP Rx zero-copy
- Synopsys (stmmac):
- support for STM32MP13 SoC
- let platforms select the right PCS implementation
- TI:
- icssg-prueth: add multicast filtering support
- icssg-prueth: enable PTP timestamping and PPS
- Renesas:
- ravb: improve Rx performance 30-400% by using page pool,
theaded NAPI and timer-based IRQ coalescing
- ravb: add MII support for R-Car V4M
- Cadence (macb):
- macb: add ARP support to Wake-On-LAN
- Cortina:
- use phylib for RX and TX pause configuration
- Ethernet switches:
- nVidia/Mellanox:
- support configuration of multipath hash seed
- report more accurate max MTU
- use page_pool to improve Rx performance
- MediaTek:
- mt7530: add support for bridge port isolation
- Qualcomm:
- qca8k: add support for bridge port isolation
- Microchip:
- lan9371/2: add 100BaseTX PHY support
- NXP:
- vsc73xx: implement VLAN operations
- Ethernet PHYs:
- aquantia: enable support for aqr115c
- aquantia: add support for PHY LEDs
- realtek: add support for rtl8224 2.5Gbps PHY
- xpcs: add memory-mapped device support
- add BroadR-Reach link mode and support in Broadcom's PHY driver
- CAN:
- add document for ISO 15765-2 protocol support
- mcp251xfd: workaround for erratum DS80000789E, use timestamps to
catch when device returns incorrect FIFO status
- WiFi:
- mac80211/cfg80211:
- parse Transmit Power Envelope (TPE) data in mac80211 instead
of in drivers
- improvements for 6 GHz regulatory flexibility
- multi-link improvements
- support multiple radios per wiphy
- remove DEAUTH_NEED_MGD_TX_PREP flag
- Intel (iwlwifi):
- bump FW API to 91 for BZ/SC devices
- report 64-bit radiotap timestamp
- enable P2P low latency by default
- handle Transmit Power Envelope (TPE) advertised by AP
- remove support for older FW for new devices
- fast resume (keeping the device configured)
- mvm: re-enable Multi-Link Operation (MLO)
- aggregation (A-MSDU) optimizations
- MediaTek (mt76):
- mt7925 Multi-Link Operation (MLO) support
- Qualcomm (ath10k):
- LED support for various chipsets
- Qualcomm (ath12k):
- remove unsupported Tx monitor handling
- support channel 2 in 6 GHz band
- support Spatial Multiplexing Power Save (SMPS) in 6 GHz band
- supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID
Advertisements (EMA)
- support dynamic VLAN
- add panic handler for resetting the firmware state
- DebugFS support for datapath statistics
- WCN7850: support for Wake on WLAN
- Microchip (wilc1000):
- read MAC address during probe to make it visible to user space
- suspend/resume improvements
- TI (wl18xx):
- support newer firmware versions
- RealTek (rtw89):
- preparation for RTL8852BE-VT support
- Wake on WLAN support for WiFi 6 chips
- 36-bit PCI DMA support
- RealTek (rtlwifi):
- RTL8192DU support
- Broadcom (brcmfmac):
- Management Frame Protection support (to enable WPA3)
- Bluetooth:
- qualcomm: use the power sequencer for QCA6390
- btusb: mediatek: add ISO data transmission functions
- hci_bcm4377: add BCM4388 support
- btintel: add support for BlazarU core
- btintel: add support for Whale Peak2
- btnxpuart: add support for AW693 A1 chipset
- btnxpuart: add support for IW615 chipset
- btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591"
* tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits)
eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering"
tcp: Replace strncpy() with strscpy()
wifi: ath12k: fix build vs old compiler
tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child().
eth: fbnic: Write the TCAM tables used for RSS control and Rx to host
eth: fbnic: Add L2 address programming
eth: fbnic: Add basic Rx handling
eth: fbnic: Add basic Tx handling
eth: fbnic: Add link detection
eth: fbnic: Add initial messaging to notify FW of our presence
eth: fbnic: Implement Rx queue alloc/start/stop/free
eth: fbnic: Implement Tx queue alloc/start/stop/free
eth: fbnic: Allocate a netdevice and napi vectors with queues
eth: fbnic: Add FW communication mechanism
eth: fbnic: Add message parsing for FW messages
eth: fbnic: Add register init to set PCIe/Ethernet device config
eth: fbnic: Allocate core device specific structures and devlink interface
eth: fbnic: Add scaffolding for Meta's NIC driver
PCI: Add Meta Platforms vendor ID
net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK
...
Summary
* Remove "->procname == NULL" check when iterating through sysctl table arrays
Removing sentinels in ctl_table arrays reduces the build time size and
runtime memory consumed by ~64 bytes per array. With all ctl_table
sentinels gone, the additional check for ->procname == NULL that worked in
tandem with the ARRAY_SIZE to calculate the size of the ctl_table arrays is
no longer needed and has been removed. The sysctl register functions now
returns an error if a sentinel is used.
* Preparation patches for sysctl constification
Constifying ctl_table structs prevents the modification of proc_handler
function pointers as they would reside in .rodata. The ctl_table arguments
in sysctl utility functions are const qualified in preparation for a future
treewide proc_handler argument constification commit.
* Misc fixes
Increase robustness of set_ownership by providing sane default ownership
values in case the callee doesn't set them. Bound check proc_dou8vec_minmax
to avoid loading buggy modules and give sysctl testing module a name to
avoid compiler complaints.
Testing
* This got push to linux-next in v6.10-rc2, so it has had more than a month
of testing
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmaWdz4ACgkQupfNUreW
QU/WKQwAkSuUz42yCQye77BK+Z8ANcTF1f3aI/wfv2nahq1GaSrNBpqUiXvEe9Tt
KD2lM1PWiQfizVLIDPh96yxa5q69GQrPPOA/V1jwIXmk/HRpjjoONCFNNXVRCTls
VCqDz/RatuXvzO35Yn87MnWnxv6PiX7X/zq/3WikVsUI381kvTgC6OwZxdFM52w4
ESwOa3LeOovtRnqV5dpHr6DCQKyd0N52nPxgXvaerjlsJsv7PlezN7z9YyLOOfmW
xUD7X6LQcJq7HcEukaB6I9o2GQOi4yYXL2YOzed7qu9Thu+lasEoN3Bd7P+ilXkc
JY6EXJ5o+d69PewKRuJ1QvD7wrHIkhNMNbMtvehNay124wAHDy3KtonFzyvlX4wE
qCHBYc6rySJNhSqwVp9MoksOZfDM99pVIOs9YVIjc90Zzu5J7tORgYWRVOHTcAtj
fd8nMdkK3+ZANapygFCyew6GueIzaqlQwveVgLGw4vc5L3ClknmURit3y487Pzdg
B+BEVlsp
=bs2G
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Remove "->procname == NULL" check when iterating through sysctl table
arrays
Removing sentinels in ctl_table arrays reduces the build time size
and runtime memory consumed by ~64 bytes per array. With all
ctl_table sentinels gone, the additional check for ->procname == NULL
that worked in tandem with the ARRAY_SIZE to calculate the size of
the ctl_table arrays is no longer needed and has been removed. The
sysctl register functions now returns an error if a sentinel is used.
- Preparation patches for sysctl constification
Constifying ctl_table structs prevents the modification of
proc_handler function pointers as they would reside in .rodata. The
ctl_table arguments in sysctl utility functions are const qualified
in preparation for a future treewide proc_handler argument
constification commit.
- Misc fixes
Increase robustness of set_ownership by providing sane default
ownership values in case the callee doesn't set them. Bound check
proc_dou8vec_minmax to avoid loading buggy modules and give sysctl
testing module a name to avoid compiler complaints.
* tag 'sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: Warn on an empty procname element
sysctl: Remove ctl_table sentinel code comments
sysctl: Remove "child" sysctl code comments
sysctl: Remove superfluous empty allocations from sysctl internals
sysctl: Replace nr_entries with ctl_table_size in new_links
sysctl: Remove check for sentinel element in ctl_table arrays
mm profiling: Remove superfluous sentinel element from ctl_table
locking: Remove superfluous sentinel element from kern_lockdep_table
sysctl: Add module description to sysctl-testing
sysctl: constify ctl_table arguments of utility function
utsname: constify ctl_table arguments of utility function
sysctl: move the extra1/2 boundary check of u8 to sysctl_check_table_array
sysctl: always initialize i_uid/i_gid
- Use value of kernel.randomize_va_space once per exec (Alexey Dobriyan)
- Honor PT_LOAD alignment for static PIE
- Make bprm->argmin only visible under CONFIG_MMU
- Add KUnit testing of bprm_stack_limits()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmaVTFAACgkQiXL039xt
wCYyZw//ZcPV2hu48WqqOImL8LI9HIUaZqKQpixGQRD5VcTRb5MKg8g3Wi4EBHz+
Kg6QvTEOQdg6NbfE9fH8VIIwcp3dAxdWN6g+3A0HHDSRdb8Ye1ucnzB2kgmEkM1l
huBRn5tnoS0vn2fxafu1O5tj330kKAvTsemsy316cxmbKNs7ckHdfwuVgZHcuyEt
OrOA3ZSTWwjkSiA9tatsi5iAQ34tQYGwDEosf06avlnPkQqsRzn3wNlohAPjQF6V
kjRfX/Mxz2EHa0mjXy2OkhNyPSn6wu0OcmF0ympySHzxm726uRggG+olT5ziUc+2
DW6Gz6TJ1P8Gu+uTEoz6AY+l5Bpo9ZLYSBm+Mp88sxAT6+Xcc68XeZsFZHmefJzs
6g6EdmwhDEP/Xd3sIsNphdkS5q1RMgc7tdAtyK8GCaACsHUlU4CfzRYh2mWxpIg6
hA7oM5KF9FuToLtaIS6K/yYQIVsTKAaA7t+5K/a1RUyKzcJ0O7UpMx1oEge2sPEK
RnETCYhQs0Cxm11iJ/eqEFzWm0Puxjsjz/P/j5H5U8usx9VUoz0HuS91fNEIY3S9
y7bn09wxuUv4QddKYgltkurxCCB//Nv7jPYo96pKIW3T56XkfsrYLvNH2W95cCNz
OMvZImA1J/vQubSODrgeQsfMRsaJodHU3acWyYQ90HmmoWx4JS4=
=bO7x
-----END PGP SIGNATURE-----
Merge tag 'execve-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Use value of kernel.randomize_va_space once per exec (Alexey
Dobriyan)
- Honor PT_LOAD alignment for static PIE
- Make bprm->argmin only visible under CONFIG_MMU
- Add KUnit testing of bprm_stack_limits()
* tag 'execve-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
exec: Avoid pathological argc, envc, and bprm->p values
execve: Keep bprm->argmin behind CONFIG_MMU
ELF: fix kernel.randomize_va_space double read
exec: Add KUnit test for bprm_stack_limits()
binfmt_elf: Honor PT_LOAD alignment for static PIE
binfmt_elf: Calculate total_size earlier
selftests/exec: Build both static and non-static load_address tests
VM Service Module (SVSM).
When running over a SVSM, different services can run at different
protection levels, apart from the guest OS but still within the
secure SNP environment. They can provide services to the guest, like
a vTPM, for example.
This series adds the required facilities to interface with such a SVSM
module.
- The usual fixlets, refactoring and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmaWQuoACgkQEsHwGGHe
VUrmEw/+KqM5DK5cfpue3gn0RfH6OYUoFxOdYhGkG53qUMc3c3ka5zPVqLoHPkzp
WPXha0Z5pVdrcD9mKtVUW9RIuLjInCM/mnoNc3tIUL+09xxemAjyG1+O+4kodiU7
sZ5+HuKUM2ihoC4Rrm+ApRrZfH4+WcgQNvFky77iObWVBo4yIscS7Pet/MYFvuuz
zNaGp2SGGExDeoX/pMQNI3S9FKYD26HR17AUI3DHpS0teUl2npVi4xDjFVYZh0dQ
yAhTKbSX3Q6ekDDkvAQUbxvWTJw9qoIsvLO9dvZdx6SSWmzF9IbuECpQKGQwYcp+
pVtcHb+3MwfB+nh5/fHyssRTOZp1UuI5GcmLHIQhmhQwCqPgzDH6te4Ud1ovkxOu
3GoBre7KydnQIyv12I+56/ZxyPbjHWmn8Fg106nAwGTdGbBJhfcVYfPmPvwpI4ib
nXpjypvM8FkLzLAzDK6GE9QiXqJJlxOn7t66JiH/FkXR4gnY3eI8JLMfnm5blAb+
97LC7oyeqtstWth9/4tpCILgPR2tirrMQGjUXttgt+2VMzqnEamnFozsKvR95xok
4j6ulKglZjdpn0ixHb2vAzAcOJvD7NP147jtCmXH7M6/f9H1Lih3MKdxX98MVhWB
wSp16udXHzu5lF45J0BJG8uejSgBI2y51jc92HLX7kRULOGyaEo=
=u15r
-----END PGP SIGNATURE-----
Merge tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV updates from Borislav Petkov:
- Add support for running the kernel in a SEV-SNP guest, over a Secure
VM Service Module (SVSM).
When running over a SVSM, different services can run at different
protection levels, apart from the guest OS but still within the
secure SNP environment. They can provide services to the guest, like
a vTPM, for example.
This series adds the required facilities to interface with such a
SVSM module.
- The usual fixlets, refactoring and cleanups
[ And as always: "SEV" is AMD's "Secure Encrypted Virtualization".
I can't be the only one who gets all the newer x86 TLA's confused,
can I?
- Linus ]
* tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/ABI/configfs-tsm: Fix an unexpected indentation silly
x86/sev: Do RMP memory coverage check after max_pfn has been set
x86/sev: Move SEV compilation units
virt: sev-guest: Mark driver struct with __refdata to prevent section mismatch
x86/sev: Allow non-VMPL0 execution when an SVSM is present
x86/sev: Extend the config-fs attestation support for an SVSM
x86/sev: Take advantage of configfs visibility support in TSM
fs/configfs: Add a callback to determine attribute visibility
sev-guest: configfs-tsm: Allow the privlevel_floor attribute to be updated
virt: sev-guest: Choose the VMPCK key based on executing VMPL
x86/sev: Provide guest VMPL level to userspace
x86/sev: Provide SVSM discovery support
x86/sev: Use the SVSM to create a vCPU when not in VMPL0
x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0
x86/sev: Use kernel provided SVSM Calling Areas
x86/sev: Check for the presence of an SVSM in the SNP secrets page
x86/irqflags: Provide native versions of the local_irq_save()/restore()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmaOTd8QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppqIEACUr8Vv2FtezvT3OfVSlYWHHLXzkRhwEG5s
vdk0o7Ow6U54sMjfymbHTgLD0ZOJf3uJ6BI95FQuW41jPzDFVbx4Hy8QzqonMkw9
1D/YQ4zrVL2mOKBzATbKpoGJzMOzGeoXEueFZ1AYPAX7RrDtP4xPQNfrcfkdE2zF
LycJN70Vp6lrZZMuI9yb9ts1tf7TFzK0HJANxOAKTgSiPmBmxesjkJlhrdUrgkAU
qDVyjj7u/ssndBJAb9i6Bl95Do8s9t4DeJq5/6wgKqtf5hClMXzPVB8Wy084gr6E
rTRsCEhOug3qEZSqfAgAxnd3XFRNc/p2KMUe5YZ4mAqux4hpSmIQQDM/5X5K9vEv
f4MNqUGlqyqntZx+KPyFpf7kLHFYS1qK4ub0FojWJEY4GrbBPNjjncLJ9+ozR0c8
kNDaFjMNAjalBee1FxNNH8LdVcd28rrCkPxRLEfO/gvBMUmvJf4ZyKmSED0v5DhY
vZqKlBqG+wg0EXvdiWEHMDh9Y+q/2XBIkS6NN/Bhh61HNu+XzC838ts1X7lR+4o2
AM5Vapw+v0q6kFBMRP3IcJI/c0UcIU8EQU7axMyzWtvhog8kx8x01hIj1L4UyYYr
rUdWrkugBVXJbywFuH/QIJxWxS/z4JdSw5VjASJLIrXy+aANmmG9Wonv95eyhpUv
5iv+EdRSNA==
=wVi8
-----END PGP SIGNATURE-----
Merge tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- NVMe updates via Keith:
- Device initialization memory leak fixes (Keith)
- More constants defined (Weiwen)
- Target debugfs support (Hannes)
- PCIe subsystem reset enhancements (Keith)
- Queue-depth multipath policy (Redhat and PureStorage)
- Implement get_unique_id (Christoph)
- Authentication error fixes (Gaosheng)
- MD updates via Song
- sync_action fix and refactoring (Yu Kuai)
- Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu
Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li)
- Fix loop detach/open race (Gulam)
- Fix lower control limit for blk-throttle (Yu)
- Add module descriptions to various drivers (Jeff)
- Add support for atomic writes for block devices, and statx reporting
for same. Includes SCSI and NVMe (John, Prasad, Alan)
- Add IO priority information to block trace points (Dongliang)
- Various zone improvements and tweaks (Damien)
- mq-deadline tag reservation improvements (Bart)
- Ignore direct reclaim swap writes in writeback throttling (Baokun)
- Block integrity improvements and fixes (Anuj)
- Add basic support for rust based block drivers. Has a dummy null_blk
variant for now (Andreas)
- Series converting driver settings to queue limits, and cleanups and
fixes related to that (Christoph)
- Cleanup for poking too deeply into the bvec internals, in preparation
for DMA mapping API changes (Christoph)
- Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas,
Ming, Zhu, Damien, Christophe, Chaitanya)
* tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits)
floppy: add missing MODULE_DESCRIPTION() macro
loop: add missing MODULE_DESCRIPTION() macro
ublk_drv: add missing MODULE_DESCRIPTION() macro
xen/blkback: add missing MODULE_DESCRIPTION() macro
block/rnbd: Constify struct kobj_type
block: take offset into account in blk_bvec_map_sg again
block: fix get_max_segment_size() warning
loop: Don't bother validating blocksize
virtio_blk: Don't bother validating blocksize
null_blk: Don't bother validating blocksize
block: Validate logical block size in blk_validate_limits()
virtio_blk: Fix default logical block size fallback
nvmet-auth: fix nvmet_auth hash error handling
nvme: implement ->get_unique_id
block: pass a phys_addr_t to get_max_segment_size
block: add a bvec_phys helper
blk-lib: check for kill signal in ioctl BLKZEROOUT
block: limit the Write Zeroes to manually writing zeroes fallback
block: refacto blkdev_issue_zeroout
block: move read-only and supported checks into (__)blkdev_issue_zeroout
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEHLQAKCRCRxhvAZXjc
ot3sAP9TBUM+vzUcQ5SVcUnSX+y3dhOGYnquORBbRc/Y6AzLMAEAu3TcsvdoaWfy
6ImUaju6iLqy9cCY3uDlNmJR16E4IgE=
=Bwpy
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull iomap updates from Christian Brauner:
"This contains some minor work for the iomap subsystem:
- Add documentation on the design of iomap and how to port to it
- Optimize iomap_read_folio()
- Bring back the change to iomap_write_end() to no increase i_size.
This is accompanied by a change to xfs to reserve blocks for
truncating large realtime inodes to avoid exposing stale data when
iomap_write_end() stops increasing i_size"
* tag 'vfs-6.11.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: don't increase i_size in iomap_write_end()
xfs: reserve blocks for truncating large realtime inode
Documentation: the design of iomap and how to port
iomap: Optimize iomap_read_folio
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEHIgAKCRCRxhvAZXjc
ovTvAQDvxpq1CIJz4arkf6lkI1VX1PcSfyV1+aIsXkrGF01tfwD+PekJH0xJ7RqU
ysuMo1uG3i1OO2xIdrdwCXJDng4QggE=
=LtRf
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pidfs updates from Christian Brauner:
"This contains work to make it possible to derive namespace file
descriptors from pidfd file descriptors.
Right now it is already possible to use a pidfd with setns() to
atomically change multiple namespaces at the same time. In other
words, it is possible to switch to the namespace context of a process
using a pidfd. There is no need to first open namespace file
descriptors via procfs.
The work included here is an extension of these abilities by allowing
to open namespace file descriptors using a pidfd. This means it is now
possible to interact with namespaces without ever touching procfs.
To this end a new set of ioctls() on pidfds is introduced covering all
supported namespace types"
* tag 'vfs-6.11.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pidfs: allow retrieval of namespace file descriptors
nsfs: add open_namespace()
nsproxy: add helper to go from arbitrary namespace to ns_common
nsproxy: add a cleanup helper for nsproxy
file: add take_fd() cleanup helper
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEHFAAKCRCRxhvAZXjc
olb9AQDsA6PLSHsRIVGO3E+syvL+lXC21QdsbAkSgADbqbSC5wEA+nfG2adiWKXc
8CKGMrqXb3j75UfIRIHnM6D03wm0ywo=
=ybN0
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace-fs updates from Christian Brauner:
"This adds ioctls allowing to translate PIDs between PID namespaces.
The motivating use-case comes from LXCFS which is a tiny fuse
filesystem used to virtualize various aspects of procfs. LXCFS is run
on the host. The files and directories it creates can be bind-mounted
by e.g. a container at startup and mounted over the various procfs
files the container wishes to have virtualized.
When e.g. a read request for uptime is received, LXCFS will receive
the pid of the reader. In order to virtualize the corresponding read,
LXCFS needs to know the pid of the init process of the reader's pid
namespace.
In order to do this, LXCFS first needs to fork() two helper processes.
The first helper process setns() to the readers pid namespace. The
second helper process is needed to create a process that is a proper
member of the pid namespace.
The second helper process then creates a ucred message with ucred.pid
set to 1 and sends it back to LXCFS. The kernel will translate the
ucred.pid field to the corresponding pid number in LXCFS's pid
namespace. This way LXCFS can learn the init pid number of the
reader's pid namespace and can go on to virtualize.
Since these two forks() are costly LXCFS maintains an init pid cache
that caches a given pid for a fixed amount of time. The cache is
pruned during new read requests. However, even with the cache the hit
of the two forks() is singificant when a very large number of
containers are running.
So this adds a simple set of ioctls that let's a caller translate PIDs
from and into a given PID namespace. This significantly improves
performance with a very simple change.
To protect against races pidfds can be used to check whether the
process is still valid"
* tag 'vfs-6.11.nsfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nsfs: add pid translation ioctls
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEHCgAKCRCRxhvAZXjc
om+gAQCC4nJqJqs9bJZIItRtZ71GnxZQO3HVohhIlNM2KKh0VgEA47JhD0O0Srfb
CleII4cQWqB32BfB/vMeff6hpNa7SA4=
=7ltk
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount query updates from Christian Brauner:
"This contains work to extend the abilities of listmount() and
statmount() and various fixes and cleanups.
Features:
- Allow iterating through mounts via listmount() from newest to
oldest. This makes it possible for mount(8) to keep iterating the
mount table in reverse order so it gets newest mounts first.
- Relax permissions on listmount() and statmount().
It's not necessary to have capabilities in the initial namespace:
it is sufficient to have capabilities in the owning namespace of
the mount namespace we're located in to list unreachable mounts in
that namespace.
- Extend both listmount() and statmount() to list and stat mounts in
foreign mount namespaces.
Currently the only way to iterate over mount entries in mount
namespaces that aren't in the caller's mount namespace is by
crawling through /proc in order to find /proc/<pid>/mountinfo for
the relevant mount namespace.
This is both very clumsy and hugely inefficient. So extend struct
mnt_id_req with a new member that allows to specify the mount
namespace id of the mount namespace we want to look at.
Luckily internally we already have most of the infrastructure for
this so we just need to expose it to userspace. Give userspace a
way to retrieve the id of a mount namespace via statmount() and
through a new nsfs ioctl() on mount namespace file descriptor.
This comes with appropriate selftests.
- Expose mount options through statmount().
Currently if userspace wants to get mount options for a mount and
with statmount(), they still have to open /proc/<pid>/mountinfo to
parse mount options. Simply the information through statmount()
directly.
Afterwards it's possible to only rely on statmount() and
listmount() to retrieve all and more information than
/proc/<pid>/mountinfo provides.
This comes with appropriate selftests.
Fixes:
- Avoid copying to userspace under the namespace semaphore in
listmount.
Cleanups:
- Simplify the error handling in listmount by relying on our newly
added cleanup infrastructure.
- Refuse invalid mount ids early for both listmount and statmount"
* tag 'vfs-6.11.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: reject invalid last mount id early
fs: refuse mnt id requests with invalid ids early
fs: find rootfs mount of the mount namespace
fs: only copy to userspace on success in listmount()
sefltests: extend the statmount test for mount options
fs: use guard for namespace_sem in statmount()
fs: export mount options via statmount()
fs: rename show_mnt_opts -> show_vfsmnt_opts
selftests: add a test for the foreign mnt ns extensions
fs: add an ioctl to get the mnt ns id from nsfs
fs: Allow statmount() in foreign mount namespace
fs: Allow listmount() in foreign mount namespace
fs: export the mount ns id via statmount
fs: keep an index of current mount namespaces
fs: relax permissions for statmount()
listmount: allow listing in reverse order
fs: relax permissions for listmount()
fs: simplify error handling
fs: don't copy to userspace under namespace semaphore
path: add cleanup helper
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEG2wAKCRCRxhvAZXjc
ooW/AQDzyY+xNGt4OPMvlyFUHd5RcyiLsMhYrkKc3FaIFjesVgD+PFW5PPW12c0V
Z4VHg9w1HDDuUn4XvELs7OXZpek7RgU=
=eDC8
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode / dentry updates from Christian Brauner:
"This contains smaller performance improvements to inodes and dentries:
inode:
- Add rcu based inode lookup variants.
They avoid one inode hash lock acquire in the common case thereby
significantly reducing contention. We already support RCU-based
operations but didn't take advantage of them during inode
insertion.
Callers of iget_locked() get the improvement without any code
changes. Callers that need a custom callback can switch to
iget5_locked_rcu() as e.g., did btrfs.
With 20 threads each walking a dedicated 1000 dirs * 1000 files
directory tree to stat(2) on a 32 core + 24GB ram vm:
before: 3.54s user 892.30s system 1966% cpu 45.549 total
after: 3.28s user 738.66s system 1955% cpu 37.932 total (-16.7%)
Long-term we should pick up the effort to introduce more
fine-grained locking and possibly improve on the currently used
hash implementation.
- Start zeroing i_state in inode_init_always() instead of doing it in
individual filesystems.
This allows us to remove an unneeded lock acquire in new_inode()
and not burden individual filesystems with this.
dcache:
- Move d_lockref out of the area used by RCU lookup to avoid
cacheline ping poing because the embedded name is sharing a
cacheline with d_lockref.
- Fix dentry size on 32bit with CONFIG_SMP=y so it does actually end
up with 128 bytes in total"
* tag 'vfs-6.11.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: fix dentry size
vfs: move d_lockref out of the area used by RCU lookup
bcachefs: remove now spurious i_state initialization
xfs: remove now spurious i_state initialization in xfs_inode_alloc
vfs: partially sanitize i_state zeroing on inode creation
xfs: preserve i_state around inode_init_always in xfs_reinit_inode
btrfs: use iget5_locked_rcu
vfs: add rcu-based find_inode variants for iget ops
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEGjAAKCRCRxhvAZXjc
okXfAP4tFUYszUsSqYdsgy9UvXw3Dr5zOIzQmN++NdjGkbU5fgEAs2ystqEfJgr3
v7XvGbu65CvL4/slNhBZOU4yekGx5Qc=
=C4QD
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount API updates from Christian Brauner:
- Add a generic helper to parse uid and gid mount options.
Currently we open-code the same logic in various filesystems which is
error prone, especially since the verification of uid and gid mount
options is a sensitive operation in the face of idmappings.
Add a generic helper and convert all filesystems over to it. Make
sure that filesystems that are mountable in unprivileged containers
verify that the specified uid and gid can be represented in the
owning namespace of the filesystem.
- Convert hostfs to the new mount api.
* tag 'vfs-6.11.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fuse: Convert to new uid/gid option parsing helpers
fuse: verify {g,u}id mount options correctly
fat: Convert to new uid/gid option parsing helpers
fat: Convert to new mount api
fat: move debug into fat_mount_options
vboxsf: Convert to new uid/gid option parsing helpers
tracefs: Convert to new uid/gid option parsing helpers
smb: client: Convert to new uid/gid option parsing helpers
tmpfs: Convert to new uid/gid option parsing helpers
ntfs3: Convert to new uid/gid option parsing helpers
isofs: Convert to new uid/gid option parsing helpers
hugetlbfs: Convert to new uid/gid option parsing helpers
ext4: Convert to new uid/gid option parsing helpers
exfat: Convert to new uid/gid option parsing helpers
efivarfs: Convert to new uid/gid option parsing helpers
debugfs: Convert to new uid/gid option parsing helpers
autofs: Convert to new uid/gid option parsing helpers
fs_parse: add uid & gid option option parsing helpers
hostfs: Add const qualifier to host_root in hostfs_fill_super()
hostfs: convert hostfs to use the new mount API
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEGnAAKCRCRxhvAZXjc
om4UAP4xnLigNBMHqoWTvfbBjxfoPcWTq53M9CM/T2t2iz4AsQD+NiUQLmNO/LDJ
8aZ0K5TI2+uWYUHbyLabmKWwAm33UQ0=
=ZeeN
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs casefolding updates from Christian Brauner:
"This contains some work to simplify the handling of casefolded names:
- Simplify the handling of casefolded names in f2fs and ext4 by
keeping the names as a qstr to avoiding unnecessary conversions
- Introduce a new generic_ci_match() libfs case-insensitive lookup
helper and use it in both f2fs and ext4 allowing to remove the
filesystem specific implementations
- Remove a bunch of ifdefs by making the unicode build checks part of
the code flow"
* tag 'vfs-6.11.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
f2fs: Move CONFIG_UNICODE defguards into the code flow
ext4: Move CONFIG_UNICODE defguards into the code flow
f2fs: Reuse generic_ci_match for ci comparisons
ext4: Reuse generic_ci_match for ci comparisons
libfs: Introduce case-insensitive string comparison helper
f2fs: Simplify the handling of cached casefolded names
ext4: Simplify the handling of cached casefolded names
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEGSgAKCRCRxhvAZXjc
opvwAQCBfq5sxn/P34MNheHAVJOkQlozaflLIRM/CRN60HXV3AEAiph0RJBszvDu
VhJ9VZ21zypvpS34enBfPKp1ZmyHPwI=
=hNqR
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.pg_error' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull PG_error removal updates from Christian Brauner:
"This contains work to remove almost all remaining users of PG_error
from filesystems and filesystem helper libraries. An additional patch
will be coming in via the jfs tree which tests the PG_error bit.
Afterwards nothing will be testing it anymore and it's safe to remove
all places which set or clear the PG_error bit.
The goal is to fully remove PG_error by the next merge window"
* tag 'vfs-6.11.pg_error' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
buffer: Remove calls to set and clear the folio error flag
iomap: Remove calls to set and clear folio error flag
vboxsf: Convert vboxsf_read_folio() to use a folio
ufs: Remove call to set the folio error flag
romfs: Convert romfs_read_folio() to use a folio
reiserfs: Remove call to folio_set_error()
orangefs: Remove calls to set/clear the error flag
nfs: Remove calls to folio_set_error
jffs2: Remove calls to set/clear the folio error flag
hostfs: Convert hostfs_read_folio() to use a folio
isofs: Convert rock_ridge_symlink_read_folio to use a folio
hpfs: Convert hpfs_symlink_read_folio to use a folio
efs: Convert efs_symlink_read_folio to use a folio
cramfs: Convert cramfs_read_folio to use a folio
coda: Convert coda_symlink_filler() to use folio_end_read()
befs: Convert befs_symlink_read_folio() to use folio_end_read()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpEF0AAKCRCRxhvAZXjc
oq0TAQDjfTLN75RwKQ34RIFtRun2q+OMfBQtSegtaccqazghyAD/QfmPuZDxB5DL
rsI/5k5O4VupIKrEdIaqvNxmkmDsSAc=
=bf7E
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support passing NULL along AT_EMPTY_PATH for statx().
NULL paths with any flag value other than AT_EMPTY_PATH go the
usual route and end up with -EFAULT to retain compatibility (Rust
is abusing calls of the sort to detect availability of statx)
This avoids path lookup code, lockref management, memory allocation
and in case of NULL path userspace memory access (which can be
quite expensive with SMAP on x86_64)
- Don't block i_writecount during exec. Remove the
deny_write_access() mechanism for executables
- Relax open_by_handle_at() permissions in specific cases where we
can prove that the caller had sufficient privileges to open a file
- Switch timespec64 fields in struct inode to discrete integers
freeing up 4 bytes
Fixes:
- Fix false positive circular locking warning in hfsplus
- Initialize hfs_inode_info after hfs_alloc_inode() in hfs
- Avoid accidental overflows in vfs_fallocate()
- Don't interrupt fallocate with EINTR in tmpfs to avoid constantly
restarting shmem_fallocate()
- Add missing quote in comment in fs/readdir
Cleanups:
- Don't assign and test in an if statement in mqueue. Move the
assignment out of the if statement
- Reflow the logic in may_create_in_sticky()
- Remove the usage of the deprecated ida_simple_xx() API from procfs
- Reject FSCONFIG_CMD_CREATE_EXCL requets that depend on the new
mount api early
- Rename variables in copy_tree() to make it easier to understand
- Replace WARN(down_read_trylock, ...) abuse with proper asserts in
various places in the VFS
- Get rid of user_path_at_empty() and drop the empty argument from
getname_flags()
- Check for error while copying and no path in one branch in
getname_flags()
- Avoid redundant smp_mb() for THP handling in do_dentry_open()
- Rename parent_ino to d_parent_ino and make it use RCU
- Remove unused header include in fs/readdir
- Export in_group_capable() helper and switch f2fs and fuse over to
it instead of open-coding the logic in both places"
* tag 'vfs-6.11.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
ipc: mqueue: remove assignment from IS_ERR argument
vfs: rename parent_ino to d_parent_ino and make it use RCU
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
stat: use vfs_empty_path() helper
fs: new helper vfs_empty_path()
fs: reflow may_create_in_sticky()
vfs: remove redundant smp_mb for thp handling in do_dentry_open
fuse: Use in_group_or_capable() helper
f2fs: Use in_group_or_capable() helper
fs: Export in_group_or_capable()
vfs: reorder checks in may_create_in_sticky
hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode()
proc: Remove usage of the deprecated ida_simple_xx() API
hfsplus: fix to avoid false alarm of circular locking
Improve readability of copy_tree
vfs: shave a branch in getname_flags
vfs: retire user_path_at_empty and drop empty arg from getname_flags
vfs: stop using user_path_at_empty in do_readlinkat
tmpfs: don't interrupt fallocate with EINTR
fs: don't block i_writecount during exec
...
This is the last - for now - of the "look, we generated some
questionable code for basic pathname lookup operations" set of
branches.
This is mainly just re-organizing the name hashing code in
link_path_walk(), mostly by improving the calling conventions to
the inlined helper functions and moving some of the code around
to allow for more straightforward code generation.
The profiles - and the generated code - look much more palatable
to me now.
* link_path_walk:
vfs: link_path_walk: move more of the name hashing into hash_name()
vfs: link_path_walk: improve may_lookup() code generation
vfs: link_path_walk: do '.' and '..' detection while hashing
vfs: link_path_walk: clarify and improve name hashing interface
vfs: link_path_walk: simplify name hash flow
While Documentation/kbuild/kconfig-language.rst provides a brief
explanation, there are recurring confusions regarding the usage of a
prompt followed by 'if <expr>'. This conditional controls _only_ the
prompt.
A typical usage is as follows:
menuconfig BLOCK
bool "Enable the block layer" if EXPERT
default y
When EXPERT=n, the prompt is hidden, but this config entry is still
active, and BLOCK is set to its default value 'y'. This is reasonable
because you are likely want to enable the block device support. When
EXPERT=y, the prompt is shown, allowing you to toggle BLOCK.
Please note that it is different from 'depends on EXPERT', which would
enable and disable the entire config entry.
However, this conditional prompt has never worked in a choice block.
The following two work in the same way: when EXPERT is disabled, the
choice block is entirely disabled.
[Test Code 1]
choice
prompt "choose" if EXPERT
config A
bool "A"
config B
bool "B"
endchoice
[Test Code 2]
choice
prompt "choose"
depends on EXPERT
config A
bool "A"
config B
bool "B"
endchoice
I believe the first case should hide only the prompt, producing the
default:
CONFIG_A=y
# CONFIG_B is not set
The next commit will change (fix) the behavior of the conditional prompt
in choice blocks.
I see several choice blocks wrongly using a conditional prompt, where
'depends on' makes more sense.
To preserve the current behavior, this commit converts such misuses.
I did not touch the following entry in arch/x86/Kconfig:
choice
prompt "Memory split" if EXPERT
default VMSPLIT_3G
This is truly the correct use of the conditional prompt; when EXPERT=n,
this choice block should silently select the reasonable VMSPLIT_3G,
although the resulting PAGE_OFFSET will not be affected anyway.
Presumably, the one in fs/jffs2/Kconfig is also correct, but I converted
it to 'depends on' to avoid any potential behavioral change.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Merge runtime constants infrastructure with implementations for x86 and
arm64.
This is one of four branches that came out of me looking at profiles of
my kernel build filesystem load on my 128-core Altra arm64 system, where
pathname walking and the user copies (particularly strncpy_from_user()
for fetching the pathname from user space) is very hot.
This is a very specialized "instruction alternatives" model where the
dentry hash pointer and hash count will be constants for the lifetime of
the kernel, but the allocation are not static but done early during the
kernel boot. In order to avoid the pointer load and dynamic shift, we
just rewrite the constants in the instructions in place.
We can't use the "generic" alternative instructions infrastructure,
because different architectures do it very differently, and it's
actually simpler to just have very specific helpers, with a fallback to
the generic ("old") model of just using variables for architectures that
do not implement the runtime constant patching infrastructure.
Link: https://lore.kernel.org/all/CAHk-=widPe38fUNjUOmX11ByDckaeEo9tN4Eiyke9u1SAtu9sA@mail.gmail.com/
* runtime-constants:
arm64: add 'runtime constant' support
runtime constants: add x86 architecture support
runtime constants: add default dummy infrastructure
vfs: dcache: move hashlen_hash() from callers into d_hash()
When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array
is allocated in __exfat_get_entry_set. The problem is that the bh-array is
allocated with GFP_KERNEL. It does not make sense. In the following cases,
a deadlock for sbi->s_lock between the two processes may occur.
CPU0 CPU1
---- ----
kswapd
balance_pgdat
lock(fs_reclaim)
exfat_iterate
lock(&sbi->s_lock)
exfat_readdir
exfat_get_uniname_from_ext_entry
exfat_get_dentry_set
__exfat_get_dentry_set
kmalloc_array
...
lock(fs_reclaim)
...
evict
exfat_evict_inode
lock(&sbi->s_lock)
To fix this, let's allocate bh-array with GFP_NOFS.
Fixes: a3ff29a95f ("exfat: support dynamic allocate bh for exfat_entry_set_cache")
Cc: stable@vger.kernel.org # v6.2+
Reported-by: syzbot+412a392a2cd4a65e71db@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/000000000000fef47e0618c0327f@google.com
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Pass the idmapped mount information to the different helper
functions. Adapt the uid/gid checks in exfat_setattr to use the
vfsuid/vfsgid helpers.
Based on the fat implementation in commit 4b78993681
("fat: handle idmapped mounts") by Christian Brauner.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Launch ksmbd-durable-scavenger kernel thread to scan durable fps that
have not been reclaimed by a client within the configured time.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
memcpy's second parameter must not be NULL, even if size is zero.
Signed-off-by: Tavian Barnes <tavianator@tavianator.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The expired durable opens should not be reclaimed by client.
This patch add ->durable_scavenger_timeout to fp and check it in
ksmbd_lookup_durable_fd().
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
'struct ksmbd_transport_ops' is not modified in these drivers.
Constifying this structure moves some data to a read-only section, so
increase overall security.
On a x86_64, with allmodconfig, as an example:
Before:
======
text data bss dec hex filename
52184 2085 256 54525 d4fd fs/smb/server/transport_rdma.o
After:
=====
text data bss dec hex filename
52260 2021 256 54537 d509 fs/smb/server/transport_rdma.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
perusal of /sys/kernel/debug/bcachefs/*/btree_transaction_stats shows
that the read path has been acculumalating unneeded paths on the reflink
btree, which we don't want.
The solution is to call bch2_trans_begin(), which drops paths not used
on previous loop iteration.
bch2_readahead:
Max mem used: 0
Transaction duration:
count: 194235
since mount recent
duration of events
min: 150 ns
max: 9 ms
total: 838 ms
mean: 4 us 6 us
stddev: 34 us 7 us
time between events
min: 10 ns
max: 15 h
mean: 2 s 12 s
stddev: 2 s 3 ms
Maximum allocated btree paths (193):
path: idx 2 ref 0:0 P btree=extents l=0 pos 270943112:392:U32_MAX locks 0
path: idx 3 ref 1:0 S btree=extents l=0 pos 270943112:24578:U32_MAX locks 1
path: idx 4 ref 0:0 P btree=reflink l=0 pos 0:24773509:0 locks 0
path: idx 5 ref 0:0 P S btree=reflink l=0 pos 0:24773631:0 locks 1
path: idx 6 ref 0:0 P S btree=reflink l=0 pos 0:24773759:0 locks 1
path: idx 7 ref 0:0 P S btree=reflink l=0 pos 0:24773887:0 locks 1
path: idx 8 ref 0:0 P S btree=reflink l=0 pos 0:24774015:0 locks 1
path: idx 9 ref 0:0 P S btree=reflink l=0 pos 0:24774143:0 locks 1
path: idx 10 ref 0:0 P S btree=reflink l=0 pos 0:24774271:0 locks 1
<many more reflink paths>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If label is not set, the Label tag in superblock info show '(none)'.
```
[Before]
Device index: 0
Label:
Version: 1.4: member_seq
[After]
Device index: 0
Label: (none)
Version: 1.4: member_seq
```
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
highly damaged filesystems, or filesystems that have been damaged and
repair and damaged again, may have sequence numbers we can't fully trust
- which in itself is something we need to debug.
Add a journal_seq fallback so that repair doesn't get stuck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds lockdep tracking for held btree locks with a single dep_map in
btree_trans, i.e. tracking all held btree locks as one object.
This is more practical and more useful than having lockdep track held
btree locks individually, because
- we can take more locks than lockdep can track (unbounded, now that we
have dynamically resizable btree paths)
- there's no lock ordering between btree locks for lockdep to track (we
do cycle detection)
- and this makes it easy to teach lockdep that btree locks are not safe
to hold while invoking memory reclaim.
The last rule is one that lockdep would never learn, because we only do
trylock() from within shrinkers - but we very much do not want to be
invoking memory reclaim while holding btree node locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a new helper to disable lockdep tracking entirely for a given class.
This is needed for bcachefs, which takes too many btree node locks for
lockdep to track. Instead, we have a single lockdep_map for "btree_trans
has any btree nodes locked", which makes more since given that we have
centralized lock management and a cycle detector.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This repurposes the promote path, which already knows how to call
data_update() after a read: we now automatically rewrite bad data when
we get a read error and then successfully retry from a different
replica.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>