mirror of synced 2025-03-06 20:59:54 +01:00
linux/Documentation/filesystems
Linus Torvalds 2622f29041 bcachefs updates for 6.14-rc1
Lots of scalability work, another big on disk format change. On disk
 format version goes from 1.13 to 1.20.
 
 Like 6.11, this is another big and expensive automatic/required on disk
 format upgrade. This is planned to be the last big on disk format
 upgrade before the experimental label comes off. There will be one more
 minor on disk format update for a few things that couldn't make this
 release.
 
 Headline improvements:
 - Fix mount time regression that some users encountered post the 6.11
   disk accounting rewrite.
 
   Accounting keys were encoded little endian (typetag in the low bits) -
   which didn't anticipate adding accounting keys for every inode, which
   aren't stored in memory and we don't want to scan at mount time.
 
 - fsck time on large filesystems is improved by multiple orders of
   magnitude. Previously, 100TB was about the practical max filesystem
   size, where users were reporting fsck times of a day+. With the new
   changes (which nearly eliminate backpointers fsck overhead), we fsck'd
   a filesystem with 10PB of data in 1.5 hours.
 
   The problematic fsck passes were walking every extent and checking for
   missing backpointers, and walking every backpointer to check for
   dangling backpointers. As we've been adding more and more runtime self
   healing there was no reason to keep around the backpointers -> extents
   pass; dangling backpointers are just deleted, and we can do that when
   using them - thus, backpointers -> extents is now only run in debug
   mode.
 
   extents -> backpointers does need to exist, since missing backpointers
   would mean we can't find data to move it (for e.g. copygc, device
   evacuate, scrub). But the new on disk format version makes possible a
   new strategy where we sum up backpointers within a bucket and check it
   against the bucket sector counts, and then only scan for missing
   backpointers if the counts are off (and then, only for specific
   buckets).
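
The bucket-level check described in the last item can be sketched roughly as follows. This is only an illustration of the idea (sum backpointer sectors per bucket, compare with the bucket's own sector count, and scan only the buckets that disagree); the function name and data shapes are hypothetical, and the bucket generation handling the real code needs is left out.

```python
def buckets_needing_scan(bucket_sectors, backpointers):
    """Return buckets whose backpointer total disagrees with the bucket's count.

    bucket_sectors: dict of bucket id -> sectors the bucket claims to hold
    backpointers:   iterable of (bucket id, sectors), one per backpointer
    (Both shapes are assumptions made for this sketch.)
    """
    summed = {}
    for bucket, sectors in backpointers:
        summed[bucket] = summed.get(bucket, 0) + sectors
    # Only mismatched buckets need the expensive extents -> backpointers scan.
    return sorted(b for b, count in bucket_sectors.items()
                  if summed.get(b, 0) != count)


bucket_sectors = {0: 128, 1: 64, 2: 0}
backpointers = [(0, 64), (0, 64), (1, 32)]
suspect = buckets_needing_scan(bucket_sectors, backpointers)
print(suspect)  # [1]
```

Bucket 1 claims 64 sectors but only 32 are covered by backpointers, so it is the only bucket a follow-up scan would have to visit.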
 
 Full list of on disk format changes:
 - 1.14: backpointer_bucket_gen
   Backpointers now have a field for the bucket generation number,
   replacing the obsolete bucket_offset field. This is needed for the
   new "sum up backpointers within a bucket" code, since backpointers use
   the btree write buffer - meaning we will see stale reads, and this
   runs online, with the filesystem in full rw mode.
 
 - 1.15: disk_accounting_big_endian
   As previously described, fix the endianness of accounting keys so that
   accounting keys with the same typetag sort together, and accounting
   read can skip types it's not interested in.
 
 - 1.16: reflink_p_may_update_opts
   This version indicates that a new reflink pointer field is understood
   and may be used; the field indicates whether the reflink pointer has
   permission to update IO path options (e.g. compression, replicas) on
   the indirect extent it points to.
 
   This completes the rebalance/reflink data path option handling from
   the 6.13 pull request.
 
 - 1.17: inode_depth
   Add a new inode field, bi_depth, to accelerate the
   check_directory_structure fsck path, which checks for loops in the
   filesystem hierarchy.
 
   check_inodes and check_dirents check connectivity, so
   check_directory_structure only has to check for loops - by walking
   back up to the root from every directory.
 
   But a path can't be a loop if it has a counter that increases
   monotonically from root to leaf - adding a depth counter means that we
   can check for loops with only local (parent -> child) checks. We might
   need to occasionally renumber the depth field in fsck if directories
   have been moved around, but then future fsck runs will be much faster.
 
 - 1.18: persistent_inode_cursors
 
   Previously, the cursor used for inode allocation was only kept in
   memory, which meant that users with large filesystems and lots of
   files were reporting that the first create after mounting would take
   awhile - since it had to scan from the start.
 
   Inode allocation cursors are now persistent, and also include a
   generation field (incremented on wraparound, which will only happen if
   inode allocation is restricted to 32 bit inodes), so that we don't
   have to leave inode_generation keys around after a delete.
 
   The option for 32 bit inode numbers may now also be set on individual
   directories, and non-32 bit inode allocations are disallowed from
   allocating from the 32 bit part of the inode number space.
 
 - 1.19: autofix_errors
 
   Runtime self healing is now the default.
 
 - 1.20: directory size (from Hongbo)
 
   directory i_size is now meaningful, and not 0.
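
To make the 1.18 item more concrete, here is a minimal sketch of an allocation cursor that resumes where it left off and bumps a generation counter on wraparound. The class, field names, and the small number range are hypothetical; this is not the bcachefs on-disk layout, and persistence is reduced to keeping the object around.

```python
from dataclasses import dataclass


@dataclass
class InodeCursor:
    """Hypothetical allocation cursor; stands in for a persistent key."""
    next_inum: int
    generation: int = 0


def allocate(cursor, in_use, lo, hi):
    """Return (inum, generation) for the next free inode number in [lo, hi).

    The scan resumes at the cursor rather than at lo, and the generation
    is incremented each time the cursor wraps back around to lo.
    """
    for _ in range(hi - lo):
        if cursor.next_inum >= hi:
            cursor.next_inum = lo
            cursor.generation += 1
        inum = cursor.next_inum
        cursor.next_inum += 1
        if inum not in in_use:
            in_use.add(inum)
            return inum, cursor.generation
    raise RuntimeError("no free inode numbers in range")


in_use = set()
cursor = InodeCursor(next_inum=1)
first = [allocate(cursor, in_use, 1, 4) for _ in range(3)]
in_use.remove(2)                       # delete inode 2
reused = allocate(cursor, in_use, 1, 4)
print(first)   # [(1, 0), (2, 0), (3, 0)]
print(reused)  # (2, 1)
```

The reused number 2 comes back with generation 1, so it can be told apart from the deleted inode without keeping a per-inode generation record after the delete.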
 
 Release notes from the previous 6.13 pull request:
 
 - Self healing work:
   Allocator and reflink now run the exact same check/repair code that
   fsck does at runtime, where applicable.
 
   The long term goal here is to remove inconsistent() errors (that cause
   us to go emergency read only) by lifting fsck code up to normal
   runtime paths; we should only go emergency read-only if we detect an
   inconsistency that was due to a runtime bug - or truly catastrophic
   damage (corrupted btree roots/interior nodes).
 
 - Reflink repair no longer deletes reflink pointers: instead we flip an
   error bit and log the error, and they can still be deleted by file
   deletion. This means a temporary failure to find an indirect extent
   (perhaps repaired later by btree node scan) won't result in
   unnecessary data loss.
 
 - Improvements to rebalance data path option handling: we can now
   correctly apply changed filesystem-level io path options to pending
   rebalance work, and soon we'll be able to apply file-level io path
   option changes to indirect extents.

Merge tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs

Pull bcachefs updates from Kent Overstreet:
 "Lots of scalability work, another big on-disk format change. On-disk
  format version goes from 1.13 to 1.20.

  Like 6.11, this is another big and expensive automatic/required on
  disk format upgrade. This is planned to be the last big on disk format
  upgrade before the experimental label comes off. There will be one
  more minor on disk format update for a few things that couldn't make
  this release.

  Headline improvements:

   - Self healing work:

     Allocator and reflink now run the exact same check/repair code that
     fsck does at runtime, where applicable.

     The long term goal here is to remove inconsistent() errors (that
     cause us to go emergency read only) by lifting fsck code up to
     normal runtime paths; we should only go emergency read-only if we
     detect an inconsistency that was due to a runtime bug - or truly
     catastrophic damage (corrupted btree roots/interior nodes).

   - Reflink repair no longer deletes reflink pointers:

     Instead we flip an error bit and log the error, and they can still
     be deleted by file deletion. This means a temporary failure to find
     an indirect extent (perhaps repaired later by btree node scan)
     won't result in unnecessary data loss.

   - Improvements to rebalance data path option handling:

     We can now correctly apply changed filesystem-level io path options
     to pending rebalance work, and soon we'll be able to apply
     file-level io path option changes to indirect extents.

   - Fix mount time regression that some users encountered post the 6.11
     disk accounting rewrite.

     Accounting keys were encoded little endian (typetag in the low
     bits) - which didn't anticipate adding accounting keys for every
     inode, which aren't stored in memory and we don't want to scan at
     mount time.

   - fsck time on large filesystems is improved by multiple orders of
     magnitude. Previously, 100TB was about the practical max filesystem
     size, where users were reporting fsck times of a day+. With the new
     changes (which nearly eliminate backpointers fsck overhead), we
     fsck'd a filesystem with 10PB of data in 1.5 hours.

     The problematic fsck passes were walking every extent and checking
     for missing backpointers, and walking every backpointer to check
     for dangling backpointers. As we've been adding more and more
     runtime self healing there was no reason to keep around the
     backpointers -> extents pass; dangling backpointers are just
     deleted, and we can do that when using them - thus, backpointers ->
     extents is now only run in debug mode.

     extents -> backpointers does need to exist, since missing
     backpointers would mean we can't find data to move it (for e.g.
     copygc, device evacuate, scrub). But the new on disk format version
     makes possible a new strategy where we sum up backpointers within a
     bucket and check it against the bucket sector counts, and then only
     scan for missing backpointers if the counts are off (and then, only
     for specific buckets).
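
The accounting-key ordering problem behind the mount time fix above can be illustrated with plain integer packing. This is a simplified model, not the actual bcachefs key format: the bit widths and helper names are assumptions, and "type tag in the low bits" versus "in the high bits" stands in for the two encodings.

```python
TYPE_BITS = 8    # hypothetical width of the type tag
IDENT_BITS = 56  # hypothetical width of the rest of the key


def key_tag_low(type_tag, ident):
    """Type tag in the low bits: sorting interleaves keys of different types."""
    return (ident << TYPE_BITS) | type_tag


def key_tag_high(type_tag, ident):
    """Type tag in the high bits: keys of one type form one contiguous run."""
    return (type_tag << IDENT_BITS) | ident


records = [(1, 10), (2, 10), (1, 11), (2, 11)]  # (type_tag, ident)

low_sorted = sorted(key_tag_low(t, i) for t, i in records)
high_sorted = sorted(key_tag_high(t, i) for t, i in records)

low_types = [k & ((1 << TYPE_BITS) - 1) for k in low_sorted]
high_types = [k >> IDENT_BITS for k in high_sorted]
print(low_types)   # [1, 2, 1, 2]
print(high_types)  # [1, 1, 2, 2]
```

With the tag in the high bits, a reader that does not care about type 2 can stop at the end of the type 1 run instead of walking every key.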

  Full list of on disk format changes:

   - 1.14: backpointer_bucket_gen

     Backpointers now have a field for the bucket generation number,
     replacing the obsolete bucket_offset field. This is needed for the
     new "sum up backpointers within a bucket" code, since backpointers
     use the btree write buffer - meaning we will see stale reads, and
     this runs online, with the filesystem in full rw mode.

   - 1.15: disk_accounting_big_endian

     As previously described, fix the endianness of accounting keys so
     that accounting keys with the same typetag sort together, and
     accounting read can skip types it's not interested in.

   - 1.16: reflink_p_may_update_opts

     This version indicates that a new reflink pointer field is
     understood and may be used; the field indicates whether the reflink
     pointer has permission to update IO path options (e.g.
     compression, replicas) on the indirect extent it points to.

     This completes the rebalance/reflink data path option handling from
     the 6.13 pull request.

   - 1.17: inode_depth

     Add a new inode field, bi_depth, to accelerate the
     check_directory_structure fsck path, which checks for loops in the
     filesystem hierarchy.

     check_inodes and check_dirents check connectivity, so
     check_directory_structure only has to check for loops - by walking
     back up to the root from every directory.

     But a path can't be a loop if it has a counter that increases
     monotonically from root to leaf - adding a depth counter means that
     we can check for loops with only local (parent -> child) checks. We
     might need to occasionally renumber the depth field in fsck if
     directories have been moved around, but then future fsck runs will
     be much faster.

   - 1.18: persistent_inode_cursors

     Previously, the cursor used for inode allocation was only kept in
     memory, which meant that users with large filesystems and lots of
     files were reporting that the first create after mounting would
     take awhile - since it had to scan from the start.

     Inode allocation cursors are now persistent, and also include a
     generation field (incremented on wraparound, which will only happen
     if inode allocation is restricted to 32 bit inodes), so that we
     don't have to leave inode_generation keys around after a delete.

     The option for 32 bit inode numbers may now also be set on
     individual directories, and non-32 bit inode allocations are
     disallowed from allocating from the 32 bit part of the inode number
     space.

   - 1.19: autofix_errors

     Runtime self healing is now the default.

   - 1.20: directory size (from Hongbo)

     directory i_size is now meaningful, and not 0"
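
The argument in the 1.17 item (a strictly increasing depth counter rules out loops, so only parent -> child checks are needed) can be sketched as below. The dictionaries and the function name are hypothetical stand-ins for inode fields; this is not the check_directory_structure implementation.

```python
def depth_violations(parent_of, depth_of, root):
    """Return directories whose depth is not strictly greater than their parent's.

    If every non-root directory passes this local check, depth strictly
    decreases on each step toward the root, so no cycle can exist.
    """
    return sorted(
        d for d, p in parent_of.items()
        if d != root and depth_of[d] <= depth_of[p]
    )


# A well-formed tree: root -> a -> b
tree_ok = depth_violations({"a": "root", "b": "a"},
                           {"root": 0, "a": 1, "b": 2}, "root")

# A loop a -> b -> a: no depth assignment can satisfy both edges.
tree_loop = depth_violations({"a": "b", "b": "a"},
                             {"root": 0, "a": 1, "b": 2}, "root")
print(tree_ok)    # []
print(tree_loop)  # ['a']
```

A violation does not prove a loop on its own; as the text notes, depths can simply be stale after directories are moved, in which case the slower walk and a renumbering would still be needed for the flagged directories.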

* tag 'bcachefs-2025-01-20.2' of git://evilpiepirate.org/bcachefs: (268 commits)
  bcachefs: Fix check_inode_hash_info_matches_root()
  bcachefs: Document issue with bch_stripe layout
  bcachefs: Fix self healing on read error
  bcachefs: Pop all the transactions from the abort one
  bcachefs: Only abort the transactions in the cycle
  bcachefs: Introduce lock_graph_pop_from
  bcachefs: Convert open-coded lock_graph_pop_all to helper
  bcachefs: Do not allow no fail lock request to fail
  bcachefs: Merge the condition to avoid additional invocation
  Revert "bcachefs: Fix bch2_btree_node_upgrade()"
  bcachefs: bcachefs_metadata_version_directory_size
  bcachefs: make directory i_size meaningful
  bcachefs: check_unreachable_inodes is not actually PASS_ONLINE yet
  bcachefs: Don't use BTREE_ITER_cached when walking alloc btree during fsck
  bcachefs: Check for dirents to overwritten inodes
  bcachefs: bch2_btree_iter_peek_slot() handles navigating to nonexistent depth
  bcachefs: Don't set btree_path to updtodate if we don't fill
  bcachefs: __bch2_btree_pos_to_text()
  bcachefs: printbuf_reset() handles tabstops
  bcachefs: Silence read-only errors when deleting snapshots
  ...
2025-01-20 13:55:19 -08:00
bcachefs docs: filesystems: bcachefs: fixed some spelling mistakes in the bcachefs coding style page 2024-12-21 01:36:14 -05:00
caching doc: correcting the debug path for cachefiles 2024-10-24 13:50:27 +02:00
ext4 Documentation: Fix typos 2023-08-18 11:29:03 -06:00
iomap vfs-6.13.untorn.writes 2024-11-18 11:30:09 -08:00
nfs Merge patch series "Fixup NLM and kNFSD file lock callbacks" 2024-10-02 07:52:07 +02:00
smb ksmbd: fix spelling mistakes in documentation 2024-08-18 17:02:36 -05:00
spufs Documentation: spufs: correct a duplicate word typo 2022-09-27 13:21:44 -06:00
xfs docs: describe xfs directory tree online fsck 2024-04-15 14:59:01 -07:00
9p.rst USB/Thunderbolt update for 6.12-rc1 2024-09-26 09:45:36 -07:00
adfs.rst docs: filesystems: convert adfs.txt to ReST 2020-03-02 13:58:44 -07:00
affs.rst affs: fix basic permission bits to actually work 2020-08-31 12:20:31 +02:00
afs.rst afs: Documentation: correct reference to CONFIG_AFS_FS 2023-07-21 13:46:02 -06:00
api-summary.rst doc: split buffer.rst out of api-summary.rst 2024-05-05 17:53:40 -07:00
autofs-mount-control.rst autofs: use flexible array in ioctl structure 2023-05-30 16:42:00 -07:00
autofs.rst Documentation: filesystems: update filename extensions 2024-11-22 10:31:04 -07:00
automount-support.rst docs: filesystems: convert automount-support.txt to ReST 2020-05-05 09:22:21 -06:00
befs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
bfs.rst docs: filesystems: convert bfs.txt to ReST 2020-03-02 14:01:26 -07:00
btrfs.rst MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.org 2023-09-08 14:21:27 +02:00
buffer.rst doc: split buffer.rst out of api-summary.rst 2024-05-05 17:53:40 -07:00
ceph.rst doc: ceph: update userspace command to get CephFS metadata 2024-05-23 10:35:47 +02:00
coda.rst Documentation: coda: annotate duplicated words 2020-07-13 10:02:32 -06:00
configfs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
cramfs.rst docs: filesystems: convert cramfs.txt to ReST 2020-03-02 14:02:07 -07:00
dax.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
debugfs.rst debugfs: small Documentation cleaning 2022-11-09 13:58:55 -07:00
devpts.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
directory-locking.rst Docs: typos/spelling 2024-05-02 10:02:29 -06:00
dlmfs.rst Documentation: filesystems: update filename extensions 2024-11-22 10:31:04 -07:00
dnotify.rst docs: filesystems: convert dnotify.txt to ReST 2020-05-05 09:22:22 -06:00
ecryptfs.rst docs: prevent warnings due to autosectionlabel 2020-03-20 17:01:29 -06:00
efivarfs.rst Documentation: Mark the 'efivars' sysfs interface as removed 2024-04-13 10:33:02 +02:00
erofs.rst erofs: allow large folios for compressed files 2024-08-19 16:10:04 +08:00
ext2.rst ext2: remove nobh support 2022-08-02 12:34:04 -04:00
ext3.rst docs: filesystems: convert ext3.txt to ReST 2020-03-02 14:03:16 -07:00
f2fs.rst f2fs: introduce device aliasing file 2024-11-01 01:19:00 +00:00
fiemap.rst fiemap: use kernel-doc includes in fiemap docbook 2024-12-22 11:29:50 +01:00
files.rst docs: filesystems: fix typo in docs 2024-02-09 10:37:20 +01:00
fscrypt.rst fscrypt: write CBC-CTS instead of CTS-CBC 2024-02-23 21:38:59 -08:00
fsverity.rst Documentation: filesystems: update filename extensions 2024-11-22 10:31:04 -07:00
fuse-io.rst docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP 2023-12-04 10:16:53 +01:00
fuse.rst fuse: Add module param for CAP_SYS_ADMIN access bypassing allow_other 2022-07-21 16:06:19 +02:00
gfs2-glocks.rst gfs2: Get rid of demote_ok checks 2024-05-29 15:34:55 +02:00
gfs2-uevents.rst docs: filesystems: convert gfs2-uevents.txt to ReST 2020-03-02 14:03:35 -07:00
gfs2.rst Documentation: Update filesystems/gfs2.rst 2020-12-01 00:25:20 +01:00
hfs.rst Replace HTTP links with HTTPS ones: Documentation/filesystems 2020-06-26 11:14:12 -06:00
hfsplus.rst docs: filesystems: convert hfsplus.txt to ReST 2020-03-02 14:03:47 -07:00
hpfs.rst Replace HTTP links with HTTPS ones: Documentation/filesystems 2020-06-26 11:14:12 -06:00
idmappings.rst doc: correcting the idmapping mount example 2024-08-30 08:22:37 +02:00
index.rst Documentation: add a new file documenting multigrain timestamps 2024-10-10 10:20:52 +02:00
inotify.rst docs: filesystems: convert inotify.txt to ReST 2020-03-02 14:03:55 -07:00
isofs.rst docs: filesystems: convert isofs.txt to ReST 2020-03-02 14:04:06 -07:00
journalling.rst docs:filesystems: fix spelling and grammar mistakes 2024-09-10 15:36:50 -06:00
locking.rst fs: Convert aops->write_begin to take a folio 2024-08-07 11:33:21 +02:00
locks.rst docs: fs: locks.rst: update comment about mandatory file locking 2021-10-19 06:48:21 -04:00
mount_api.rst fs_parser: update mount_api doc to match function signature 2024-11-26 10:32:20 +01:00
multigrain-ts.rst Documentation: add a new file documenting multigrain timestamps 2024-10-10 10:20:52 +02:00
netfs_library.rst netfs: fix documentation build error 2024-10-08 10:39:38 +02:00
nilfs2.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ntfs3.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ocfs2-online-filecheck.rst docs: filesystems: convert ocfs2-online-filecheck.txt to ReST 2020-03-02 14:04:06 -07:00
ocfs2.rst docs: update ocfs2-devel mailing list address 2023-07-08 09:29:29 -07:00
omfs.rst Replace HTTP links with HTTPS ones: OMFS 2020-07-13 11:24:43 -06:00
orangefs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
overlayfs.rst Documentation,ovl: document new file descriptor based layers 2024-10-14 16:31:16 +02:00
path-lookup.rst Documentation: filesystems: update filename extensions 2024-11-22 10:31:04 -07:00
path-lookup.txt Documentation: filesystems: update filename extensions 2024-11-22 10:31:04 -07:00
porting.rst reiserfs: The last commit 2024-10-21 16:29:38 +02:00
proc.rst mm: Define VM_SHADOW_STACK for arm64 when we support GCS 2024-10-04 12:04:36 +01:00
qnx6.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
quota.rst quota: Fixup http links in quota doc 2020-07-09 08:14:01 +02:00
ramfs-rootfs-initramfs.rst Documentation: filesystems: update filename extensions 2024-11-22 10:31:04 -07:00
relay.rst docs: filesystems: convert relay.txt to ReST 2020-03-02 14:04:41 -07:00
romfs.rst docs: filesystems: convert romfs.txt to ReST 2020-03-02 14:04:41 -07:00
seq_file.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
sharedsubtree.rst Documentation/filesystems: sharedsubtree: add section headings 2023-05-16 12:50:05 -06:00
splice.rst docs: Bring some order to filesystem documentation 2019-03-06 09:46:10 -07:00
squashfs.rst docs: filesystems: document the squashfs specific mount options 2023-12-10 17:21:31 -08:00
sysfs.rst driver core: bus: mark the struct bus_type for sysfs callbacks as constant 2023-03-23 13:20:40 +01:00
sysv-fs.rst docs: filesystems: convert sysv-fs.txt to ReST 2020-03-02 14:04:41 -07:00
tmpfs.rst docs: tmpfs: Add casefold options 2024-10-28 13:36:55 +01:00
ubifs-authentication.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
ubifs.rst Documentation: ubifs: Fix compression idiom 2022-10-10 13:01:10 -06:00
udf.rst udf: Replace HTTP links with HTTPS ones 2020-07-14 14:37:39 +02:00
vfat.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00
vfs.rst ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
virtiofs.rst virtiofs: Add mount option and atime behavior to the doc 2020-04-20 17:01:34 +02:00
zonefs.rst Documentation: Fix typos 2023-08-18 11:29:03 -06:00