1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

97 commits

Author SHA1 Message Date
Kent Overstreet
8ef9831399 bcachefs: Improve bucket_alloc_fail tracepoint
We should be printing the number of free buckets, not just the number of
available buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:36 -04:00
Kent Overstreet
3858536744 bcachefs: Bucket invalidate path improvements
- invalidate_one_bucket() now returns 1 when we don't have any buckets
   on this device to invalidate, ensuring we don't spin
 - the tracepoint invocation is moved to after the transaction commit,
   and we now include the number of cached sectors in the tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
1f93726e63 bcachefs: Tracepoint improvements
Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
7c7e071d90 bcachefs: Don't normalize to pages in btree cache shrinker
This behavior dates from the early, early days of bcache, and upon
further delving appears to not make any sense. The shrinker only works
in terms of 'objects' of unknown size; normalizing to pages only had the
effect of changing the batch size, which we could do directly - if we
wanted; we probably don't. Normalizing to pages meant our batch size was
very small, which seems to have been keeping us from doing as much
shrinking as we should be under heavy memory pressure; this patch
appears to alleviate some OOMs we've been seeing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
4254f5bf6e bcachefs: Add a tracepoint for superblock writes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:30 -04:00
Kent Overstreet
59cc38b8d4 bcachefs: New discard implementation
In the old allocator code, buckets would be discarded just prior to
being used - this made sense in bcache where we were discarding buckets
just after invalidating the cached data they contain, but in a
filesystem where we typically have more free space we want to be
discarding buckets when they become empty.

This patch implements the new behaviour - it checks the need_discard
btree for buckets awaiting discards, and then clears the appropriate
bit in the alloc btree, which moves the buckets to the freespace btree.

Additionally, discards are now enabled by default.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
f25d8215f4 bcachefs: Kill allocator threads & freelists
Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
5f41739403 bcachefs: bch2_btree_update_start() refactoring
This simplifies the logic in bch2_btree_update_start() a bit, handling
the unlock/block logic more locally.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
3e1547116f bcachefs: x-macroize alloc_reserve enum
This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
f13fd87a39 bcachefs: Run overwrite triggers before insert
For backpointers, we'll need to delete old backpointers before adding
new backpointers - otherwise we'll run into spurious duplicate
backpointer errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:29 -04:00
Kent Overstreet
3a306f3c2c bcachefs: Fix large key cache keys
Previously, we'd go into an infinite loop when attempting to cache a
bkey in the key cache larger than 128 u64s - since we were only using a
u8 for the size field, it'd get rounded up to 256 then truncated to 0.
Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:28 -04:00
Kent Overstreet
ddf11d8c60 bcachefs: Fix a use after free
This fixes a regression from "bcachefs: Stash a copy of key being
overwritten in btree_insert_entry". In btree_key_can_insert_cached(), we
may reallocate the key cache key, invalidating pointers previously
returned by peek() - fix it by issuing a transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:26 -04:00
Kent Overstreet
8f9ad91a02 bcachefs: Fix failure to allocate btree node in cache
The error code when we fail to allocate a node in the btree node cache
doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to
stash a flag in btree_trans so we know we have to take the cannibalize
lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:24 -04:00
Kent Overstreet
4b59a319ad bcachefs: Fix slow tracepoints
Some of our tracepoints were calling snprintf("pS") - which does symbol
table lookups - in TP_fast_assign(), which turns out to be a really bad
idea.

This was done because perf trace wasn't correctly printing tracepoints
that use %pS anymore - but it turns out trace-cmd does handle it
correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:24 -04:00
Kent Overstreet
12ce5b7df1 bcachefs: Btree key cache coherency
- Updates to non key cache iterators will now be transparently
   redirected to the key cache for cached btrees.

 - Except when creating new keys: then the update goes to underlying
   btree

For for iterating over a cached btree to work, we need to ensure that if
a key exists in the key cache, it also exists in the btree - otherwise
the iterator code will skip past it and not check the key cache.

Otherwise, for consistency, all updates should go to the same place -
the key cache.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:23 -04:00
Kent Overstreet
bc82d08bae bcachefs: Tracepoint improvements
This improves the transaction restart tracepoints - adding distinct
tracepoints for all the locations and reasons a transaction might have
been restarted, and ensures that there's a tracepoint for every
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:22 -04:00
Kent Overstreet
669f87a5da bcachefs: Switch to __func__for recording where btree_trans was initialized
Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:21 -04:00
Kent Overstreet
eacb2574f0 bcachefs: bch_dev->dev
Add a field to bch_dev for the dev_t of the underlying block device -
this fixes a null ptr deref in tracepoints.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:21 -04:00
Kent Overstreet
c7ce813fe4 bcachefs: Add a tracepoint for the btree cache shrinker
This is to help with diagnosing why the btree node can doesn't seem to
be shrinking - we've had issues in the past with granularity/batch size,
since btree nodes are so big.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
caaa66aa54 bcachefs: Better approach to write vs. read lock deadlocks
Instead of unconditionally upgrading read locks to intent locks in
do_bch2_trans_commit(), this patch changes the path that takes write
locks to first trylock, and then if trylock fails check if we have a
conflicting read lock, and restart the transaction if necessary.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:12 -04:00
Kent Overstreet
f48361b00c bcachefs: Drop some fast path tracepoints
These haven't turned out to be useful

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:11 -04:00
Kent Overstreet
67e0dd8f0d bcachefs: btree_path
This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:11 -04:00
Kent Overstreet
2b4e4b8cfa bcachefs: Minor tracepoint improvements
Btree iterator tracepoints should print whether they're for the key
cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
531a0095c9 bcachefs: Improve btree iterator tracepoints
This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
ddc7dd62f0 bcachefs: Don't use uuid in tracepoints
%pU for printing out pointers to uuids doesn't work in perf trace

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
19d2819d2d bcachefs: Add a tracepoint for copygc waiting
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
4f6dad46cb bcachefs: Add a tracepoint for when we block on journal reclaim
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
3dea728ce6 bcachefs: New tracepoint for bch2_trans_get_iter()
Trying to debug an issue where after traverse_all() we shouldn't have to
traverse any iterators... yet we are

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
89baec780f bcachefs: Allocator refactoring
This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:01 -04:00
Kent Overstreet
73a117d2d8 bcachefs: Improve trans_restart_mem_realloced tracepoint
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
2527dd9158 bcachefs: Improve bch2_btree_iter_traverse_all()
By changing it to upgrade iterators to intent locks to avoid lock
restarts we can simplify __bch2_btree_node_lock() quite a bit - this
fixes a probable bug where it could potentially drop a lock on an
unrelated error but still succeed instead of causing a transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
d44a6e350e bcachefs: Drop old style btree node coalescing
We have foreground btree node merging now, and any future btree node
merging improvements are going to be based off of that code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
241e26369e bcachefs: Don't flush btree writes more aggressively because of btree key cache
We need to flush the btree key cache when it's too dirty, because
otherwise the shrinker won't be able to reclaim memory - this is done by
journal reclaim. But journal reclaim also kicks btree node writes: this
meant that btree node writes were getting kicked much too often just
because we needed to flush btree key cache keys.

This patch splits journal pins into two different lists, and teaches
journal reclaim to not flush btree node writes when it only needs to
flush key cache keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00
Kent Overstreet
d5425a3b22 bcachefs: Throttle updates when btree key cache is too dirty
This is needed to ensure we don't deadlock because journal reclaim and
thus memory reclaim isn't making forward progress.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
b3c2a06b7d bcachefs: Simplify transaction commit error path
The transaction restart path traverses all iterators, we don't need to
do it here.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
8a92e54559 bcachefs: Ensure journal reclaim runs when btree key cache is too dirty
Ensuring the key cache isn't too dirty is critical for ensuring that the
shrinker can reclaim memory.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
3dc5fcfcf5 bcachefs: Convert tracepoints to use %ps, not %pf
Symbol decoding was changed from %pf to %ps

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:47 -04:00
Kent Overstreet
a301dc38ef bcachefs: Improve tracing for transaction restarts
We have a bug where we can get stuck with a process spinning in
transaction restarts - need more information.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
e6d1161530 bcachefs: Make copygc thread global
Per device copygc threads don't move data to different devices and they
make fragmentation works - they don't make much sense anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
297604c923 bcachefs: Add a few tracepoints
Transaction restart tracing should probably be overhaulled at some
point.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:38 -04:00
Kent Overstreet
20bceecb31 bcachefs: More work to avoid transaction restarts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
ed8413fdab bcachefs: improved btree locking tracepoints
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
c43a6ef9a0 bcachefs: btree_bkey_cached_common
This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet
ba5c65576b bcachefs: Add actual tracepoints for transaction restarts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet
26609b619f bcachefs: Make bkey types globally unique
this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
b2be7c8b73 bcachefs: kill bucket mark sector count saturation
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:08 -04:00
Kent Overstreet
1c6fdbd8f2 bcachefs: Initial commit
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:07 -04:00