linux

mirror of synced 2025-03-06 20:59:54 +01:00

Author	SHA1	Message	Date
Kent Overstreet	fa8e94faee	bcachefs: Heap allocate printbufs This patch changes printbufs dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:25 -04:00
Kent Overstreet	b66b2bc0f6	bcachefs: Revert "Ensure journal doesn't get stuck in nochanges mode" This patch was originally to work around the journal geting stuck in nochanges mode - but that was just a hack, we needed to fix the actual bug. It should be fixed now, so revert it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:25 -04:00
Kent Overstreet	bf7e49a4ae	bcachefs: Change bch2_dev_lookup() to not use lookup_bdev() bch2_dev_lookup() is used from the extended attribute set methods, for setting the target options, where we're already holding an inode lock - it turns out pathname lookups also take inode locks, so that was susceptible to deadlocks. Fortunately we already stash the device name in ca->name. This does change user-visible behaviour though: instead of specifying e.g. /dev/sda1, user must now specify sda1. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:24 -04:00
Kent Overstreet	c45c866761	bcachefs: bch2_gc_gens() no longer uses bucket array Like the previous patches, this converts bch2_gc_gens() to use the alloc btree directly, and private arrays of generation numbers for its own recalculation of oldest_gen. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:23 -04:00
Kent Overstreet	7c8f6f980d	bcachefs: btree_id_cached() Add a new helper that returns true if the given btree ID uses the btree key cache. This enables some new cleanups, since the helper can check the options for whether caching is enabled on a given btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:23 -04:00
Kent Overstreet	21aec962df	bcachefs: New data structure for buckets waiting on journal commit Implement a hash table, using cuckoo hashing, for empty buckets that are waiting on a journal commit before they can be reused. This replaces the journal_seq field of bucket_mark, and is part of eventually getting rid of the in memory bucket array. We may need to make bch2_bucket_needs_journal_commit() lockless, pending profiling and testing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:22 -04:00
Kent Overstreet	9b6e2f1e70	Revert "bcachefs: Delete some obsolete journal_seq_blacklist code" This reverts commit f95b61228efd04c9c158123da5827c96e9773b29. It turns out, we're seeing filesystems in the wild end up with blacklisted btree node bsets - this should not be happening, and until we understand why and fix it we need to keep this code around. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:21 -04:00
Kent Overstreet	03ea3962ab	bcachefs: Log & error message improvements - Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work in userspace - We don't need to print the bcachefs: or the filesystem name prefix in userspace - Improve a few error messages Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:21 -04:00
Kent Overstreet	efe68e1d65	bcachefs: Improved superblock-related error messages This patch converts bch2_sb_validate() and the .validate methods for the various superblock sections to take printbuf, to which they can print detailed error messages, including printing the entire section that was invalid. This is a great improvement over the previous situation, where we could only return static strings that didn't have precise information about what was wrong. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:21 -04:00
Kent Overstreet	eacb2574f0	bcachefs: bch_dev->dev Add a field to bch_dev for the dev_t of the underlying block device - this fixes a null ptr deref in tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:21 -04:00
Kent Overstreet	d248ee5637	bcachefs: Add iter_flags arg to bch2_btree_delete_range() Will be used by the new snapshot tests, to pass in BTREE_ITER_ALL_SNAPSHOTS. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	e853692588	bcachefs: Improve error messages in device add path This converts the error messages in the device add to a better style, and adds some missing ones. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	04f0f77df2	bcachefs: Delete some obsolete journal_seq_blacklist code Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written, we haven't needed the journal seq blacklist mechanism for ignoring blacklisted btree node writes - we now only need it for ignoring journal entries that were written after the newest flush journal entry, and then we only need to keep those blacklist entries around until journal replay is finished. That means we can delete the code for scanning btree nodes to GC journal_seq_blacklist entries. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	77170d0dd7	bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to decide what buckets are eligible to allocate, we can clean up the filesystem initialization and device add paths. Previously, we had to use ancient code to mark superblock/journal buckets in the in memory bucket marks as we allocated them, and then zero that out and re-do that marking using the newer transational bucket mark paths. Now, we can simply delete the in-memory bucket marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	09943313d7	bcachefs: Rewrite bch2_bucket_alloc_new_fs() This changes bch2_bucket_alloc_new_fs() to a simple bump allocator that doesn't need to use the in memory bucket array, part of a larger patch series to entirely get rid of the in memory bucket array, except for gc/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	8244f3209b	bcachefs: Option improvements This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	991ba02112	bcachefs: Add more time_stats This adds more latency/event measurements and breaks some apart into more events. Journal writes are broken apart into flush writes and noflush writes, btree compactions are broken out from btree splits, btree mergers are added, as well as btree_interior_updates - foreground and total. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	2430e72f42	bcachefs: Convert journal sysfs params to regular options This converts journal_write_delay, journal_flush_disabled, and journal_reclaim_delay to normal filesystems options, and also adds them to the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	e2b605601a	bcachefs: Clean up error reporting in the startup path It used to be that error reporting in the startup path was done by returning strings describing the error, but that turned out to be a rather silly idea - if there's something we can describe about the error, just print it right away. This converts a good chunk of code to returning error codes, as is more typical style. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:16 -04:00
Chris Webb	7be9ab637f	bcachefs: Return -ENOKEY/EINVAL when mount decryption fails bch2_fs_encryption_init() correctly passes back -ENOKEY from request_key() when no unlock key is found, or -EINVAL if superblock decryption fails because of an invalid key. However, these get absorbed into a generic NULL return from bch2_fs_alloc() and later returned to user space as -ENOMEM, leading to a misleading error from mount(1): mount(2) system call failed: Out of memory. Return explicit error pointers out of bch2_fs_alloc() and handle them in both callers, so the user instead sees mount(2) system call failed: Required key not available. when attempting to mount a filesystem which is still locked. Signed-off-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:16 -04:00
Kent Overstreet	fae1157d18	bcachefs: Ensure journal doesn't get stuck in nochanges mode This tweaks the journal code to always act as if there's space available in nochanges mode, when we're not going to be doing any writes. This helps in recovering filesystems that won't mount because they need journal replay and the journal has gotten stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	f124345e2b	bcachefs: Drop bch2_journal_meta() call when going RW Back when we relied on the journal sequence number blacklist machinery for consistency between btree and the journal, we needed to ensure a new journal entry was written before any btree writes were done. But, this had the side effect of consuming some space in the journal prior to doing journal replay - which could lead to a very wedged filesystem, since we don't yet have a way to grow the journal prior to going RW. Fortunately, the journal sequence number blacklist machinery isn't needed anymore, as btree node pointers now record the numer of sectors currently written to that node - that code should all be ripped out. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	114eea75c7	bcachefs: Fix dev accounting after device add This is a hacky but effective fix to device usage stats for superblock and journal being wrong on a newly added device (following the comment that already told us how it needed to be done!) Reported-by: Chris Webb <chris@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:14 -04:00
Kent Overstreet	a9cb0a6706	bcachefs: Fix bch2_dev_remove_alloc() It was missing a lockrestart_do(), to call bch2_trans_begin() and also handle transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:13 -04:00
Kent Overstreet	14b393ee76	bcachefs: Subvolumes, snapshots This patch adds subvolume.c - support for the subvolumes and snapshots btrees and related data types and on disk data structures. The next patches will start hooking up this new code to existing code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:12 -04:00
Kent Overstreet	67e0dd8f0d	bcachefs: btree_path This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:11 -04:00
Brett Holman	8dd6ed9451	bcachefs: add progress stats to sysfs This adds progress stats to sysfs for copygc, rebalance, recovery, and the cmd_job ioctls. Signed-off-by: Brett Holman <bholman.devel@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:10 -04:00
Kent Overstreet	9f1833cadd	bcachefs: Update btree ptrs after every write This closes a significant hole (and last known hole) in our ability to verify metadata. Previously, since btree nodes are log structured, we couldn't detect lost btree writes that weren't the first write to a given node. Additionally, this seems to have lead to some significant metadata corruption on multi device filesystems with metadata replication: since a write may have made it to one device and not another, if we read that btree node back from the replica that did have that write and started appending after that point, the other replica would have a gap in the bset entries and reading from that replica wouldn't find the rest of the bsets. But, since updates to interior btree nodes are now journalled, we can close this hole by updating pointers to btree nodes after every write with the currently written number of sectors, without negatively affecting performance. This means we will always detect lost or corrupt metadata - it also means that our btree is now a curious hybrid of COW and non COW btrees, with all the benefits of both (excluding complexity). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:08 -04:00
Kent Overstreet	224ec3e677	bcachefs: Don't mark superblocks past end of usable space bcachefs-tools recently started putting a backup superblock at the end of the device. This causes a problem if the bucket size doesn't divide the device size - but we can fix it by just skipping marking that part. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	c0ebe3e48c	bcachefs: Assorted endianness fixes Found by sparse Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:05 -04:00
Kent Overstreet	9f2772c454	bcachefs: Split out btree_error_wq We can't use btree_update_wq becuase btree updates may be waiting on btree writes to complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:04 -04:00
Kent Overstreet	731bdd2eff	bcachefs: Add a workqueue for btree io completions Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	ef1b20924b	bcachefs: Ratelimiting for writeback IOs Writeback throttling is a kernel config option and not always enabled. When it's not enabled we need a fallback, to avoid unbounded memory pinning and work item backlogs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Dan Robertson	ec4ab9d2fc	bcachefs: Fix possible null deref on mount Ensure that the block device pointer in a superblock handle is not null before dereferencing it in bch2_dev_to_fs. The block device pointer may be null when mounting a new bcachefs filesystem given another mounted bcachefs filesystem exists that has at least one device that is offline. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	3a402c8dab	bcachefs: Fix some refcounting bugs We really need debug mode assertions that ca->ref and ca->io_ref are used correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	aae15aafcd	bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	4932e07ea0	bcachefs: Fix key cache assertion Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	d62ab355d7	bcachefs: Fix bch2_trans_mark_dev_sb() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	9d8022db1c	bcachefs: Eliminate more PAGE_SIZE uses In userspace, we don't really have a well defined PAGE_SIZE and shouln't be relying on it. This is some more incremental work to remove references to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	2c944fa12d	bcachefs: Add a print statement for when we go read-write Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:56 -04:00
Kent Overstreet	2436cb9fad	bcachefs: Use x-macros for more enums This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	41f8b09edc	bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	9620c3ec2f	bcachefs: Add a mempool for the replicas delta list Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	9ae28f824e	bcachefs: Start journal reclaim thread earlier Especially in userspace, we sometime run into resource exhaustion issues with starting up threads after mark and sweep/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	2ee47eec44	bcachefs: Fix for copygc getting stuck waiting for reserve to be filled This fixes a regression from the patch bcachefs: Fix copygc dying on startup In general only the allocator thread itself should be updating ca->allocator_state, the thread waking up the allocator setting it is an ugly hack only needed to avoid racing with the copygc threads when we're first starting up. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	51c66fedc0	bcachefs: Rip out copygc pd controller We have a separate mechanism for ratelimiting copygc now - the pd controller has only been causing problems. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	220d206232	bcachefs: Fix an allocator startup race Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	59a7405161	bcachefs: Create allocator threads when allocating filesystem We're seeing failures to mount because of a failure to start the allocator threads, which currently happens fairly late in the mount process, after walking all metadata, and kthread_create() fails if something has tried to kill the mount process, which is probably not what we want. This patch avoids this issue by creating, but not starting, the allocator threads when we preallocate all of our other in memory data structures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	fcb3431be8	bcachefs: Redo checks for sufficient devices When the replicas mechanism was added, for tracking data by which drives it's replicated on, the check for whether we have sufficient devices was never updated to make use of it. This patch finally does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	4b8f89afd4	bcachefs: Fixes/improvements for journal entry reservations This fixes some arithmetic bugs in "bcachefs: Journal updates to dev usage" - additionally, it cleans things up by switching everything that goes in every journal entry to the journal_entry_res mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00

... 2 3 4 5 6

292 commits