Add a new helper that returns true if the given btree ID uses the btree
key cache. This enables some new cleanups, since the helper can check
the options for whether caching is enabled on a given btree.
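A minimal sketch of what such a helper could look like, for illustration only. The enum, the options struct, the bitmask, and the name sketch_btree_id_is_cached() are all hypothetical and are not taken from the bcachefs source:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical btree IDs, for illustration only. */
enum sketch_btree_id {
	SKETCH_BTREE_extents,
	SKETCH_BTREE_inodes,
	SKETCH_BTREE_alloc,
	SKETCH_BTREE_NR,
};

/* Hypothetical filesystem options: a single switch for the key cache. */
struct sketch_opts {
	bool key_cache_enabled;
};

/* Assumption: the set of cached btrees is a compile-time bitmask. */
#define SKETCH_CACHED_BTREES \
	((1U << SKETCH_BTREE_inodes) | (1U << SKETCH_BTREE_alloc))

/* Returns true if lookups in btree @id should go through the key cache. */
static bool sketch_btree_id_is_cached(const struct sketch_opts *opts,
				      enum sketch_btree_id id)
{
	assert(id < SKETCH_BTREE_NR);

	return opts->key_cache_enabled &&
		(SKETCH_CACHED_BTREES & (1U << id));
}
```

Callers then ask one question ("is this btree cached?") instead of open coding both the ID check and the option check.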
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
More prep work for getting rid of the in-memory bucket array: now that
we have BTREE_ITER_WITH_JOURNAL, the allocator code can do btree lookups
before journal replay is finished, and there's no longer any need for it
to get allocation information from the in-memory bucket array.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With BTREE_ITER_FILTER_SNAPSHOTS, we have to distinguish between the
path where the key was found, and the path for inserting into the
current snapshot. This adds a new field to struct btree_iter for saving
a path for the current snapshot, and plumbs it through
bch2_trans_update().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This splits bch2_btree_iter() up into two functions: an inner function
that handles BTREE_ITER_WITH_JOURNAL, BTREE_ITER_WITH_UPDATES, and
iterating across leaf nodes, and an outer one that implements
BTREE_ITER_FILTER_SNAPSHOTS.
This is prep work for remembering a btree_path at our update position in
BTREE_ITER_FILTER_SNAPSHOTS mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This improves the transaction restart tracepoints - adding distinct
tracepoints for all the locations and reasons a transaction might have
been restarted, and ensures that there's a tracepoint for every
transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is
automatically enabled when initializing a btree iterator before journal
replay has completed - it overlays the contents of the journal with the
btree.
This lets us delete bch2_btree_and_journal_walk() and just use the
normal btree iterator interface instead - which also lets us delete a
significant amount of duplicated code.
Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch -
we're redoing the binary search over keys in the journal every time we
call bch2_btree_iter_peek().
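The overlay can be sketched roughly as follows. This is a simplified model, not the bcachefs implementation: keys are plain integers, both key sets are sorted arrays, and every name prefixed with sketch_ is hypothetical. As in the unoptimized version described above, the binary search over journal keys is redone on every peek:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: a position, a value, and a deletion marker. */
struct sketch_key {
	uint64_t	pos;
	int		val;
	bool		deleted;
};

/* Binary search: index of the first key with pos >= @pos. */
static size_t sketch_lower_bound(const struct sketch_key *keys, size_t nr,
				 uint64_t pos)
{
	size_t l = 0, r = nr;

	while (l < r) {
		size_t m = l + (r - l) / 2;

		if (keys[m].pos < pos)
			l = m + 1;
		else
			r = m;
	}
	return l;
}

/*
 * Return the first live key at or after @pos, with journal keys overlaid
 * on btree keys: at equal positions the journal key wins, and a deleted
 * journal key hides the btree key underneath it.
 */
static const struct sketch_key *
sketch_peek(const struct sketch_key *btree, size_t nr_btree,
	    const struct sketch_key *journal, size_t nr_journal,
	    uint64_t pos)
{
	assert(btree && journal);

	for (;;) {
		size_t b = sketch_lower_bound(btree, nr_btree, pos);
		size_t j = sketch_lower_bound(journal, nr_journal, pos);
		const struct sketch_key *bk = b < nr_btree ? &btree[b] : NULL;
		const struct sketch_key *jk = j < nr_journal ? &journal[j] : NULL;
		const struct sketch_key *k;

		if (!bk && !jk)
			return NULL;

		k = jk && (!bk || jk->pos <= bk->pos) ? jk : bk;
		if (!k->deleted)
			return k;
		if (k->pos == UINT64_MAX)
			return NULL;
		pos = k->pos + 1;	/* skip the deleted position */
	}
}
```

Because the caller only sees sketch_peek(), code written against the normal iterator interface does not need to know whether journal replay has finished.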
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a flag to not mark the initial btree_path as preserve, for
paths that we expect to be cheap to reconstitute if necessary - this
solves a btree_path overflow caused by need_whiteout_for_snapshot().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This fixes some bugs when we hit an error very early in the filesystem
startup path, before most things have been initialized.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This puts the btree_transactions sysfs/debugfs file behind a separate
config option - it's highly useful, but not cheap enough to enable
permanently.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new assertion to be used by bch2_inode_update_after_write(),
which updates the VFS inode based on the update to the btree inode we
just did - we require that the btree inode still be locked when we do
that update.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- We should only be clearing should_be_locked in btree_path_set_pos() -
it's the responsibility of the btree_path code, not the btree_iter
code.
- bch2_path_put() needs to pay attention to path->should_be_locked, to
ensure we don't drop locks we're supposed to be keeping.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The bch2_btree_path_upgrade() call was failing and tripping an assert -
path->level + 1 is not necessarily what we want in this case; fix it by
upgrading exactly the locks we want.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
But we don't need to call it from outside the btree iterator code
anymore, since it's called by bch2_trans_begin() and
bch2_btree_path_traverse().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Code that uses for_each_btree_key often wants transaction restarts to be
handled locally and not returned. Originally, we wouldn't return
transaction restarts if there was a single iterator in the transaction -
the reasoning being if there weren't other iterators being invalidated,
and the current iterator was being advanced/retraversed, there weren't
any locks or iterators we were required to preserve.
But with the btree_path conversion that approach doesn't work anymore -
even when we're using for_each_btree_key() with a single iterator there
will still be two paths in the transaction, since we now always preserve
the path at the pos the iterator was initialized at - the reason being
that on restart we often restart from the same place.
And it turns out there's now a lot of for_each_btree_key() uses that _do
not_ want transaction restarts handled locally, and should be returning
them.
This patch splits out for_each_btree_key_norestart() and
for_each_btree_key_continue_norestart(), and converts existing users as
appropriate. for_each_btree_key(), for_each_btree_key_continue(), and
for_each_btree_node() now handle transaction restarts themselves by
calling bch2_trans_begin() when necessary - and the old hack to not
return transaction restarts when there's a single path in the
transaction has been deleted.
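The difference between the two flavours can be sketched with a toy model. Everything here is hypothetical: the struct, the function names, and the use of -EINTR as the restart code are assumptions for illustration, and the real macros operate on btree iterators rather than integers:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transaction: counts restarts, injects failures for the demo. */
struct sketch_trans {
	unsigned restarts;
	unsigned fail_budget;
};

/* Stand-in for beginning (or restarting) a transaction. */
static void sketch_trans_begin(struct sketch_trans *trans)
{
	trans->restarts++;
}

/* Stand-in for a peek that may ask for a transaction restart. */
static int sketch_peek(struct sketch_trans *trans, unsigned pos, int *val)
{
	if (trans->fail_budget) {
		trans->fail_budget--;
		return -EINTR;	/* assumption: restart is signalled this way */
	}
	*val = (int) pos * 10;
	return 0;
}

/* "norestart" flavour: a restart is returned to the caller. */
static int sketch_sum_norestart(struct sketch_trans *trans, unsigned nr, int *sum)
{
	assert(sum);
	*sum = 0;
	for (unsigned pos = 0; pos < nr; pos++) {
		int val, ret = sketch_peek(trans, pos, &val);

		if (ret)
			return ret;
		*sum += val;
	}
	return 0;
}

/* Restart-handling flavour: begin again and retry the same position. */
static int sketch_sum(struct sketch_trans *trans, unsigned nr, int *sum)
{
	assert(sum);
	*sum = 0;
	for (unsigned pos = 0; pos < nr; ) {
		int val, ret = sketch_peek(trans, pos, &val);

		if (ret == -EINTR) {
			sketch_trans_begin(trans);
			continue;
		}
		if (ret)
			return ret;
		*sum += val;
		pos++;
	}
	return 0;
}
```

The norestart flavour suits loops that run inside a larger transaction, where an outer retry loop has to see the restart; the restart-handling flavour suits standalone walks.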
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This changes for_each_btree_node() to work like for_each_btree_key(),
and to that end bch2_btree_iter_peek_node() and next_node() also return
error ptrs.
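For readers unfamiliar with the error pointer idiom, here is a userspace sketch of how a node walk might look with it. The sketch_ helpers mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and the iterator type and the fail_at field are hypothetical, used only to inject an error:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO	4095

/* Encode a negative errno in the top page of the address space. */
static void *sketch_err_ptr(long err)
{
	return (void *) (intptr_t) err;
}

static long sketch_ptr_err(const void *p)
{
	return (long) (intptr_t) p;
}

static bool sketch_is_err(const void *p)
{
	return (uintptr_t) p >= (uintptr_t) -SKETCH_MAX_ERRNO;
}

struct sketch_node { int id; };

/* Hypothetical node iterator; fail_at injects an error at that index. */
struct sketch_node_iter {
	struct sketch_node	*nodes;
	size_t			nr;
	size_t			idx;
	size_t			fail_at;
};

/* Returns a node, NULL at the end of the btree, or an error pointer. */
static struct sketch_node *sketch_peek_node(struct sketch_node_iter *iter)
{
	if (iter->idx == iter->fail_at)
		return sketch_err_ptr(-EIO);
	if (iter->idx >= iter->nr)
		return NULL;
	return &iter->nodes[iter->idx];
}

/* Walk all nodes; an error from peek ends the walk and is returned. */
static int sketch_count_nodes(struct sketch_node_iter *iter, size_t *count)
{
	struct sketch_node *b;

	*count = 0;
	while ((b = sketch_peek_node(iter))) {
		if (sketch_is_err(b))
			return (int) sketch_ptr_err(b);
		(*count)++;
		iter->idx++;
	}
	return 0;
}
```

With this shape the error is seen at the point where the node is fetched, so the caller does not have to check a separate return value after the loop.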
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- check for getting to the end of the btree in bch2_path_verify_locks
and __btree_path_traverse_all(), this fixes an infinite loop in
__btree_path_traverse_all().
- relax requirement in bch2_btree_node_upgrade() that we must want an
intent lock, this fixes bugs with paths that point to interior nodes
(nonzero level).
- bch2_btree_node_update_key(): fix it to upgrade the path to an intent
lock, if necessary
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Btree node iterators want the interior btree_path to point to the same
pos as the returned btree node - this fixes a regression from the
introduction of btree_path, where rewriting/updating keys of btree nodes
(e.g. in bch2_dev_metadata_drop()) via btree node iterators was broken.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that all the existing code has been converted for snapshots, this
patch changes the code for initializing a btree iterator to require a
snapshot to be specified, and also changes bkey_invalid() to allow for
non-U32_MAX snapshot IDs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
For snapshots, we need to implement btree lookups that return the first
key that's an ancestor of the snapshot ID the lookup is being done in -
and filter out keys in unrelated snapshots. This patch adds the btree
iterator flag BTREE_ITER_FILTER_SNAPSHOTS which does that filtering.
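A simplified model of that filtering, for illustration only. The parent table, the key layout, and all sketch_ names are hypothetical. The sketch also assumes that keys are sorted by (pos, snapshot) and that a child's snapshot ID is smaller than its parent's, so the nearest ancestor at a given position sorts first:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_SNAPSHOTS	8

/* Hypothetical key: position, the snapshot it was written in, a value. */
struct sketch_key {
	uint64_t	pos;
	uint32_t	snapshot;
	int		val;
};

/*
 * @parent[id] is the parent of snapshot @id, 0 for the root.
 * A snapshot counts as its own ancestor.
 */
static bool sketch_is_ancestor(const uint32_t *parent, uint32_t id,
			       uint32_t ancestor)
{
	while (id && id != ancestor) {
		assert(id < SKETCH_NR_SNAPSHOTS);
		id = parent[id];
	}
	return id && id == ancestor;
}

/*
 * Return the first key at or after @pos that is visible in @snapshot,
 * skipping keys that belong to unrelated snapshots.
 */
static const struct sketch_key *
sketch_peek_filtered(const struct sketch_key *keys, size_t nr,
		     const uint32_t *parent,
		     uint64_t pos, uint32_t snapshot)
{
	for (size_t i = 0; i < nr; i++) {
		if (keys[i].pos < pos)
			continue;
		if (sketch_is_ancestor(parent, snapshot, keys[i].snapshot))
			return &keys[i];
	}
	return NULL;
}
```

A linear scan keeps the sketch short; the point is only the visibility rule, not how the starting key is located.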
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Figured out the bug we were chasing, and it had nothing to do with
locking btree iterators/paths out of order.
This reverts commit ff08733dd298c969aec7c7828095458f73fd5374.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We need to take all needed intent locks when relocking an iterator:
bch2_btree_path_traverse() had a special cased, faster version of this,
but it really should be in up_until_good_node() so that set_pos() can
use it too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch significantly reduces the number of btree lookups required in
the extent update path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
New rule is: if a btree path holds any locks it should be holding
precisely the locks wanted (according to path->level and
path->locks_want).
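The rule can be expressed as a check along these lines. This is a sketch under stated assumptions: the mapping from level to wanted lock type below (nothing below path->level, intent locks below locks_want, a read lock at path->level otherwise) is an assumption, and all sketch_ names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_lock { SKETCH_UNLOCKED, SKETCH_READ, SKETCH_INTENT };

#define SKETCH_MAX_DEPTH	4

/* Hypothetical path: the locks it holds, indexed by btree level. */
struct sketch_path {
	unsigned		level;
	unsigned		locks_want;
	enum sketch_lock	held[SKETCH_MAX_DEPTH];
};

/* Assumed mapping from (level, locks_want) to the lock wanted at level @l. */
static enum sketch_lock sketch_lock_want(const struct sketch_path *path,
					 unsigned l)
{
	if (l < path->level)
		return SKETCH_UNLOCKED;
	if (l < path->locks_want)
		return SKETCH_INTENT;
	if (l == path->level)
		return SKETCH_READ;
	return SKETCH_UNLOCKED;
}

/* True if the path holds no locks at all, or exactly the wanted ones. */
static bool sketch_path_locks_ok(const struct sketch_path *path)
{
	bool any = false;

	for (unsigned l = 0; l < SKETCH_MAX_DEPTH; l++)
		if (path->held[l] != SKETCH_UNLOCKED)
			any = true;
	if (!any)
		return true;

	for (unsigned l = 0; l < SKETCH_MAX_DEPTH; l++)
		if (path->held[l] != sketch_lock_want(path, l))
			return false;
	return true;
}
```

A check of this kind makes "partially locked" paths an assertion failure rather than a state that later code has to tolerate.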
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since iter->real_pos was introduced, we no longer have to deal with
extent btree iterators that have skipped past deleted keys - this is a
real performance improvement on btree updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree_path_traverse_all() traverses btree iterators in sorted order, and
thus shouldn't see transaction restarts due to potential deadlocks - but
sometimes we do. This patch adds some more assertions and tracks some
more state to help track this down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.
This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.
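The sharing scheme is essentially copy on write, which can be sketched as follows. This is a toy model: a path is reduced to a reference count and a position, the fixed-size table and all sketch_ names are hypothetical, and locking is left out entirely:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PATHS	8

/* Hypothetical path: reference counted, possibly shared by iterators. */
struct sketch_path {
	unsigned	ref;
	uint64_t	pos;
};

/* The externally visible iterator just points at a path. */
struct sketch_iter {
	struct sketch_path *path;
};

struct sketch_trans {
	struct sketch_path paths[SKETCH_MAX_PATHS];
};

/* Grab an unused slot in the transaction, or NULL if it is full. */
static struct sketch_path *sketch_path_alloc(struct sketch_trans *trans,
					     uint64_t pos)
{
	for (size_t i = 0; i < SKETCH_MAX_PATHS; i++)
		if (!trans->paths[i].ref) {
			trans->paths[i].ref = 1;
			trans->paths[i].pos = pos;
			return &trans->paths[i];
		}
	return NULL;
}

/* Copying an iterator shares the path instead of cloning it up front. */
static void sketch_iter_copy(struct sketch_iter *dst,
			     const struct sketch_iter *src)
{
	assert(src->path && src->path->ref);
	dst->path = src->path;
	dst->path->ref++;
}

/* Moving an iterator clones the path only if someone else shares it. */
static int sketch_iter_set_pos(struct sketch_trans *trans,
			       struct sketch_iter *iter, uint64_t pos)
{
	struct sketch_path *path = iter->path;

	if (path->pos == pos)
		return 0;

	if (path->ref > 1) {
		struct sketch_path *clone = sketch_path_alloc(trans, pos);

		if (!clone)
			return -ENOMEM;
		path->ref--;
		iter->path = clone;
		return 0;
	}

	path->pos = pos;	/* sole owner: mutate in place */
	return 0;
}
```

The cost of a clone is paid only when a shared path is actually about to be mutated, which is the case the up-front cloning used to cover unconditionally.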
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We really only need to distinguish between btree iterators and btree key
cache iterators - this is more prep work for btree_path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This was used for an optimization that hasn't existed in quite a while
- iter->uptodate will probably be going away as well.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
These utility functions are for managing btree node state within a
btree_trans - rename them for consistency, and drop some unneeded
arguments.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is prep work for splitting btree_path out from btree_iter -
btree_path will not have a pointer to btree_trans.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
BTREE_ITER_SET_POS_AFTER_COMMIT is used internally to automagically
advance extent btree iterators on successful commit.
But with the upcoming btree_path patch it's getting more awkward to
support, and it adds overhead to core data structures that's only used
in a few places, and can be easily done by the caller instead.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This factors out bch2_dump_trans_iters_updates() from the iter alloc
overflow path, and makes some small improvements to what it prints.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
iter->real_pos needs to match the key returned or bad things will happen
when we go to update the key at that position. When we returned a
pending update from btree_trans_peek_updates(), this wasn't necessarily
the case.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This makes the flow control in bch2_btree_iter_peek() and
bch2_btree_iter_peek_prev() a bit cleaner.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>