linux

mirror of synced 2025-03-06 20:59:54 +01:00

Linus Torvalds 2622f29041 bcachefs updates for 6.14-rc1
Merge tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs

Pull bcachefs updates from Kent Overstreet:

"Lots of scalability work, another big on-disk format change.

On-disk format version goes from 1.13 to 1.20.

Like 6.11, this is another big and expensive automatic/required on disk format upgrade. This is planned to be the last big on disk format upgrade before the experimental label comes off.

There will be one more minor on disk format update for a few things that couldn't make this release.

Headline improvements:

- Self healing work: Allocator and reflink now run the exact same check/repair code that fsck does at runtime, where applicable. The long term goal here is to remove inconsistent() errors (that cause us to go emergency read-only) by lifting fsck code up to normal runtime paths; we should only go emergency read-only if we detect an inconsistency that was due to a runtime bug - or truly catastrophic damage (corrupted btree roots/interior nodes).

- Reflink repair no longer deletes reflink pointers: Instead we flip an error bit and log the error, and they can still be deleted by file deletion. This means a temporary failure to find an indirect extent (perhaps repaired later by btree node scan) won't result in unnecessary data loss.

- Improvements to rebalance data path option handling: We can now correctly apply changed filesystem-level io path options to pending rebalance work, and soon we'll be able to apply file-level io path option changes to indirect extents.

- Fix mount time regression that some users encountered post the 6.11 disk accounting rewrite. Accounting keys were encoded little endian (typetag in the low bits) - which didn't anticipate adding accounting keys for every inode, which aren't stored in memory and we don't want to scan at mount time.

- fsck time on large filesystems is improved by multiple orders of magnitude. Previously, 100TB was about the practical max filesystem size, where users were reporting fsck times of a day+. With the new changes (which nearly eliminate backpointers fsck overhead), we fsck'd a filesystem with 10PB of data in 1.5 hours.

  The problematic fsck passes were walking every extent and checking for missing backpointers, and walking every backpointer to check for dangling backpointers.

  As we've been adding more and more runtime self healing there was no reason to keep around the backpointers -> extents pass; dangling backpointers are just deleted, and we can do that when using them - thus, backpointers -> extents is now only run in debug mode.

  extents -> backpointers does need to exist, since missing backpointers would mean we can't find data to move it (for e.g. copygc, device evacuate, scrub). But the new on disk format version makes possible a new strategy where we sum up backpointers within a bucket and check it against the bucket sector counts, and then only scan for missing backpointers if the counts are off (and then, only for specific buckets).
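The "sum up backpointers within a bucket" strategy described above can be illustrated with a small Python sketch. This is not the bcachefs implementation: the `Backpointer` and `Bucket` types, their field names, and the `buckets_needing_scan` helper are all hypothetical, and skipping backpointers whose generation does not match the bucket's is an assumption about how the bucket generation field filters stale entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backpointer:
    bucket: int      # bucket the pointed-to data lives in (hypothetical field)
    bucket_gen: int  # bucket generation recorded in the backpointer
    sectors: int     # sectors of data covered by this backpointer


@dataclass(frozen=True)
class Bucket:
    gen: int         # current generation of the bucket
    sectors: int     # sectors the bucket is recorded as holding


def buckets_needing_scan(buckets, backpointers):
    """Return ids of buckets whose summed backpointer sectors disagree
    with the bucket's own sector count."""
    summed = {}
    for bp in backpointers:
        bucket = buckets.get(bp.bucket)
        # Assumption: a backpointer with a non-matching generation is stale
        # (for example, seen through a delayed write) and is not counted.
        if bucket is None or bp.bucket_gen != bucket.gen:
            continue
        summed[bp.bucket] = summed.get(bp.bucket, 0) + bp.sectors
    # Only mismatched buckets need the expensive scan for missing backpointers.
    return sorted(b for b, info in buckets.items()
                  if summed.get(b, 0) != info.sectors)


if __name__ == "__main__":
    buckets = {
        0: Bucket(gen=1, sectors=8),
        1: Bucket(gen=2, sectors=16),
    }
    backpointers = [
        Backpointer(bucket=0, bucket_gen=1, sectors=8),
        Backpointer(bucket=1, bucket_gen=2, sectors=8),
        Backpointer(bucket=1, bucket_gen=1, sectors=8),  # stale generation
    ]
    print(buckets_needing_scan(buckets, backpointers))
```

In this toy data only bucket 1 is reported, so only that bucket would need the extents -> backpointers scan; buckets whose sums match are skipped entirely.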
Full list of on disk format changes:

- 1.14: backpointer_bucket_gen

  Backpointers now have a field for the bucket generation number, replacing the obsolete bucket_offset field. This is needed for the new "sum up backpointers within a bucket" code, since backpointers use the btree write buffer - meaning we will see stale reads, and this runs online, with the filesystem in full rw mode.

- 1.15: disk_accounting_big_endian

  As previously described, fix the endianness of accounting keys so that accounting keys with the same typetag sort together, and accounting read can skip types it's not interested in.

- 1.16: reflink_p_may_update_opts

  This version indicates that a new reflink pointer field is understood and may be used; the field indicates whether the reflink pointer has permission to update IO path options (e.g. compression, replicas) on the indirect extent it points to. This completes the rebalance/reflink data path option handling from the 6.13 pull request.

- 1.17: inode_depth

  Add a new inode field, bi_depth, to accelerate the check_directory_structure fsck path, which checks for loops in the filesystem hierarchy.

  check_inodes and check_dirents check connectivity, so check_directory_structure only has to check for loops - by walking back up to the root from every directory. But a path can't be a loop if it has a counter that increases monotonically from root to leaf - adding a depth counter means that we can check for loops with only local (parent -> child) checks. We might need to occasionally renumber the depth field in fsck if directories have been moved around, but then future fsck runs will be much faster.

- 1.18: persistent_inode_cursors

  Previously, the cursor used for inode allocation was only kept in memory, which meant that users with large filesystems and lots of files were reporting that the first create after mounting would take a while - since it had to scan from the start.

  Inode allocation cursors are now persistent, and also include a generation field (incremented on wraparound, which will only happen if inode allocation is restricted to 32 bit inodes), so that we don't have to leave inode_generation keys around after a delete.

  The option for 32 bit inode numbers may now also be set on individual directories, and non-32 bit inode allocations are disallowed from allocating from the 32 bit part of the inode number space.

- 1.19: autofix_errors

  Runtime self healing is now the default.

- 1.20: directory size (from Hongbo)

  directory i_size is now meaningful, and not 0"

* tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs: (268 commits)
  bcachefs: Fix check_inode_hash_info_matches_root()
  bcachefs: Document issue with bch_stripe layout
  bcachefs: Fix self healing on read error
  bcachefs: Pop all the transactions from the abort one
  bcachefs: Only abort the transactions in the cycle
  bcachefs: Introduce lock_graph_pop_from
  bcachefs: Convert open-coded lock_graph_pop_all to helper
  bcachefs: Do not allow no fail lock request to fail
  bcachefs: Merge the condition to avoid additional invocation
  Revert "bcachefs: Fix bch2_btree_node_upgrade()"
  bcachefs: bcachefs_metadata_version_directory_size
  bcachefs: make directory i_size meaningful
  bcachefs: check_unreachable_inodes is not actually PASS_ONLINE yet
  bcachefs: Don't use BTREE_ITER_cached when walking alloc btree during fsck
  bcachefs: Check for dirents to overwritten inodes
  bcachefs: bch2_btree_iter_peek_slot() handles navigating to nonexistent depth
  bcachefs: Don't set btree_path to updtodate if we don't fill
  bcachefs: __bch2_btree_pos_to_text()
  bcachefs: printbuf_reset() handles tabstops
  bcachefs: Silence read-only errors when deleting snapshots
  ...

2025-01-20 13:55:19 -08:00
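The reasoning behind the 1.17 inode_depth change in the commit message above (a path cannot be a loop if depth strictly increases from parent to child) can be shown with a minimal Python sketch. It is illustrative only and not the kernel code: the `parent` and `depth` dictionaries and both function names are hypothetical, and the sketch assumes every non-root directory already has a valid parent entry, as the connectivity checks are said to guarantee.

```python
def dirs_failing_depth_check(parent, depth):
    """Local check: list directories whose depth is not strictly greater
    than their parent's. `parent` maps each non-root directory to its parent."""
    return sorted(d for d, p in parent.items() if depth[d] <= depth[p])


def has_loop_walking_up(parent, start, root):
    """Slow fallback: walk from `start` towards `root`, reporting a loop
    if any directory is visited twice."""
    seen = set()
    cur = start
    while cur != root:
        if cur in seen:
            return True
        seen.add(cur)
        cur = parent[cur]
    return False


if __name__ == "__main__":
    root = 1
    parent = {2: 1, 3: 2, 4: 3}
    depth = {1: 0, 2: 1, 3: 2, 4: 3}

    # Depth increases along every parent -> child edge, so no loop can exist.
    print(dirs_failing_depth_check(parent, depth))

    # Reparent directory 2 under its own descendant 4, creating a loop.
    parent[2] = 4
    suspects = dirs_failing_depth_check(parent, depth)
    print(suspects, [has_loop_walking_up(parent, d, root) for d in suspects])
```

A directory that fails the local check is not necessarily part of a loop; it may simply carry a stale depth after a move. That is why the sketch falls back to walking up only for the flagged directories, after which the depth could be renumbered.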
..
bcachefs	docs: filesystems: bcachefs: fixed some spelling mistakes in the bcachefs coding style page	2024-12-21 01:36:14 -05:00
caching	doc: correcting the debug path for cachefiles	2024-10-24 13:50:27 +02:00
ext4	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
iomap	vfs-6.13.untorn.writes	2024-11-18 11:30:09 -08:00
nfs	Merge patch series "Fixup NLM and kNFSD file lock callbacks"	2024-10-02 07:52:07 +02:00
smb	ksmbd: fix spelling mistakes in documentation	2024-08-18 17:02:36 -05:00
spufs	Documentation: spufs: correct a duplicate word typo	2022-09-27 13:21:44 -06:00
xfs	docs: describe xfs directory tree online fsck	2024-04-15 14:59:01 -07:00
9p.rst	USB/Thunderbolt update for 6.12-rc1	2024-09-26 09:45:36 -07:00
adfs.rst	docs: filesystems: convert adfs.txt to ReST	2020-03-02 13:58:44 -07:00
affs.rst	affs: fix basic permission bits to actually work	2020-08-31 12:20:31 +02:00
afs.rst	afs: Documentation: correct reference to CONFIG_AFS_FS	2023-07-21 13:46:02 -06:00
api-summary.rst	doc: split buffer.rst out of api-summary.rst	2024-05-05 17:53:40 -07:00
autofs-mount-control.rst	autofs: use flexible array in ioctl structure	2023-05-30 16:42:00 -07:00
autofs.rst	Documentation: filesystems: update filename extensions	2024-11-22 10:31:04 -07:00
automount-support.rst	docs: filesystems: convert automount-support.txt to ReST	2020-05-05 09:22:21 -06:00
befs.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
bfs.rst	docs: filesystems: convert bfs.txt to ReST	2020-03-02 14:01:26 -07:00
btrfs.rst	MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org	2023-09-08 14:21:27 +02:00
buffer.rst	doc: split buffer.rst out of api-summary.rst	2024-05-05 17:53:40 -07:00
ceph.rst	doc: ceph: update userspace command to get CephFS metadata	2024-05-23 10:35:47 +02:00
coda.rst	Documentation: coda: annotate duplicated words	2020-07-13 10:02:32 -06:00
configfs.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
cramfs.rst	docs: filesystems: convert cramfs.txt to ReST	2020-03-02 14:02:07 -07:00
dax.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
debugfs.rst	debugfs: small Documentation cleaning	2022-11-09 13:58:55 -07:00
devpts.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
directory-locking.rst	Docs: typos/spelling	2024-05-02 10:02:29 -06:00
dlmfs.rst	Documentation: filesystems: update filename extensions	2024-11-22 10:31:04 -07:00
dnotify.rst	docs: filesystems: convert dnotify.txt to ReST	2020-05-05 09:22:22 -06:00
ecryptfs.rst	docs: prevent warnings due to autosectionlabel	2020-03-20 17:01:29 -06:00
efivarfs.rst	Documentation: Mark the 'efivars' sysfs interface as removed	2024-04-13 10:33:02 +02:00
erofs.rst	erofs: allow large folios for compressed files	2024-08-19 16:10:04 +08:00
ext2.rst	ext2: remove nobh support	2022-08-02 12:34:04 -04:00
ext3.rst	docs: filesystems: convert ext3.txt to ReST	2020-03-02 14:03:16 -07:00
f2fs.rst	f2fs: introduce device aliasing file	2024-11-01 01:19:00 +00:00
fiemap.rst	fiemap: use kernel-doc includes in fiemap docbook	2024-12-22 11:29:50 +01:00
files.rst	docs: filesystems: fix typo in docs	2024-02-09 10:37:20 +01:00
fscrypt.rst	fscrypt: write CBC-CTS instead of CTS-CBC	2024-02-23 21:38:59 -08:00
fsverity.rst	Documentation: filesystems: update filename extensions	2024-11-22 10:31:04 -07:00
fuse-io.rst	docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP	2023-12-04 10:16:53 +01:00
fuse.rst	fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other	2022-07-21 16:06:19 +02:00
gfs2-glocks.rst	gfs2: Get rid of demote_ok checks	2024-05-29 15:34:55 +02:00
gfs2-uevents.rst	docs: filesystems: convert gfs2-uevents.txt to ReST	2020-03-02 14:03:35 -07:00
gfs2.rst	Documentation: Update filesystems/gfs2.rst	2020-12-01 00:25:20 +01:00
hfs.rst	Replace HTTP links with HTTPS ones: Documentation/filesystems	2020-06-26 11:14:12 -06:00
hfsplus.rst	docs: filesystems: convert hfsplus.txt to ReST	2020-03-02 14:03:47 -07:00
hpfs.rst	Replace HTTP links with HTTPS ones: Documentation/filesystems	2020-06-26 11:14:12 -06:00
idmappings.rst	doc: correcting the idmapping mount example	2024-08-30 08:22:37 +02:00
index.rst	Documentation: add a new file documenting multigrain timestamps	2024-10-10 10:20:52 +02:00
inotify.rst	docs: filesystems: convert inotify.txt to ReST	2020-03-02 14:03:55 -07:00
isofs.rst	docs: filesystems: convert isofs.txt to ReST	2020-03-02 14:04:06 -07:00
journalling.rst	docs:filesystems: fix spelling and grammar mistakes	2024-09-10 15:36:50 -06:00
locking.rst	fs: Convert aops->write_begin to take a folio	2024-08-07 11:33:21 +02:00
locks.rst	docs: fs: locks.rst: update comment about mandatory file locking	2021-10-19 06:48:21 -04:00
mount_api.rst	fs_parser: update mount_api doc to match function signature	2024-11-26 10:32:20 +01:00
multigrain-ts.rst	Documentation: add a new file documenting multigrain timestamps	2024-10-10 10:20:52 +02:00
netfs_library.rst	netfs: fix documentation build error	2024-10-08 10:39:38 +02:00
nilfs2.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
ntfs3.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
ocfs2-online-filecheck.rst	docs: filesystems: convert ocfs2-online-filecheck.txt to ReST	2020-03-02 14:04:06 -07:00
ocfs2.rst	docs: update ocfs2-devel mailing list address	2023-07-08 09:29:29 -07:00
omfs.rst	Replace HTTP links with HTTPS ones: OMFS	2020-07-13 11:24:43 -06:00
orangefs.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
overlayfs.rst	Documentation,ovl: document new file descriptor based layers	2024-10-14 16:31:16 +02:00
path-lookup.rst	Documentation: filesystems: update filename extensions	2024-11-22 10:31:04 -07:00
path-lookup.txt	Documentation: filesystems: update filename extensions	2024-11-22 10:31:04 -07:00
porting.rst	reiserfs: The last commit	2024-10-21 16:29:38 +02:00
proc.rst	mm: Define VM_SHADOW_STACK for arm64 when we support GCS	2024-10-04 12:04:36 +01:00
qnx6.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
quota.rst	quota: Fixup http links in quota doc	2020-07-09 08:14:01 +02:00
ramfs-rootfs-initramfs.rst	Documentation: filesystems: update filename extensions	2024-11-22 10:31:04 -07:00
relay.rst	docs: filesystems: convert relay.txt to ReST	2020-03-02 14:04:41 -07:00
romfs.rst	docs: filesystems: convert romfs.txt to ReST	2020-03-02 14:04:41 -07:00
seq_file.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
sharedsubtree.rst	Documentation/filesystems: sharedsubtree: add section headings	2023-05-16 12:50:05 -06:00
splice.rst	docs: Bring some order to filesystem documentation	2019-03-06 09:46:10 -07:00
squashfs.rst	docs: filesystems: document the squashfs specific mount options	2023-12-10 17:21:31 -08:00
sysfs.rst	driver core: bus: mark the struct bus_type for sysfs callbacks as constant	2023-03-23 13:20:40 +01:00
sysv-fs.rst	docs: filesystems: convert sysv-fs.txt to ReST	2020-03-02 14:04:41 -07:00
tmpfs.rst	docs: tmpfs: Add casefold options	2024-10-28 13:36:55 +01:00
ubifs-authentication.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
ubifs.rst	Documentation: ubifs: Fix compression idiom	2022-10-10 13:01:10 -06:00
udf.rst	udf: Replace HTTP links with HTTPS ones	2020-07-14 14:37:39 +02:00
vfat.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00
vfs.rst	ALong with the usual shower of singleton patches, notable patch series in	2024-09-21 07:29:05 -07:00
virtiofs.rst	virtiofs: Add mount option and atime behavior to the doc	2020-04-20 17:01:34 +02:00
zonefs.rst	Documentation: Fix typos	2023-08-18 11:29:03 -06:00