1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

230 commits

Author SHA1 Message Date
Kent Overstreet
7fec8266af bcachefs: Error message improvement
- Centralize format strings in bcachefs.h
 - Add bch2_fmt_inum_offset() and related helpers
 - Switch error messages for inodes to also print out the offset, in
   bytes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:46 -04:00
Kent Overstreet
8eb71e9e1a bcachefs: Improve a few warnings
Warnings ought to always have a format string/log message - makes them
considerably more useful.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:46 -04:00
Kent Overstreet
6b1b186a5a bcachefs: Minor dio write path improvements
This switches where we take quota reservations to be per bch_wirte_op
instead of per dio_write, so we can drop the quota reservation in the
same place as we call i_sectors_acct(), and only take/release
ei_quota_lock once.

In the future we'd like ei_quota_lock to not be a mutex, so that we can
avoid punting to process context before deliving write completions in
nocow mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:46 -04:00
Kent Overstreet
a7ecd30c83 bcachefs: Factor out two_state_shared_lock
We have a unique lock used for controlling adding to the pagecache: the
lock has two states, where both states are shared - the lock may be held
multiple times for either state - but not both states at the same time.

This is exactly what we need for nocow mode locking, so this patch pulls
it out of fs.c into its own file.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
a1ee777bfc bcachefs: Kill BCH_WRITE_FLUSH
BCH_WRITE_FLUSH is a write flag that causes a journal flush.  It's only
used in the direct IO path, and this will allow for some consolidation
with the regular fsync path, which will help with the upcoming nocow
mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
182c7bbfbf bcachefs: DIO write path optimization
- With BCH_WRITE_SYNC, we no longer need the completion in struct
   dio_write
 - Pull out bch2_dio_write_copy_iov() into a separate non-inline
   function, it's code that doesn't run in the common case
 - Copy mapping and inode pointers into dio_write, avoiding pointer
   chasing at the start of bch2_dio_write_loop()
 - kthread_use_mm() is not needed in the common case; move it into
   bch2_dio_write_loop_async()
 - factor out various helpers from bch2_dio_write_loop() and rework
   control flow for better icache utilization

Other small optimizations:

 - bch2_keylist_free() is only used in one place, at the end of the
   bch2_write() path - drop the reinit
 - in bch2_disk_reservation_put(), check if res->sectors is nonzero
   before touching c->online_reserved, since that will likely be a cache
   miss

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: More DIO write path optimization

Better code prefetching (?)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
1df3e19996 bcachefs: BCH_WRITE_SYNC
This adds a new flag for the write path, BCH_WRITE_SYNC, and switches
the O_DIRECT write path to use it when we're not running asynchronously.

It runs the btree update after the write in the original thread's
context instead of a kworker, cutting context switches in half.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
80fe580c8d bcachefs: Fix a spurious warning
Fixes fstests generic/648

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
353448f3ea bcachefs: Fix buffered write path for generic/275
Per fstests generic/275, on -ENOSPC we're supposed write until the
filesystem is full - i.e. do a partial write instead of failing the full
write.

This is a partial fix for the buffered write path: we'll still fail on a
page boundary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
3e3e02e6bc bcachefs: Assorted checkpatch fixes
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

 ASSIGN_IN_IF
 UNSPECIFIED_INT	- bcachefs coding style prefers single token type names
 NEW_TYPEDEFS		- typedefs are occasionally good
 FUNCTION_ARGUMENTS	- we prefer to look at functions in .c files
			  (hopefully with docbook documentation), not .h
			  file prototypes
 MULTISTATEMENT_MACRO_USE_DO_WHILE
			- we have _many_ x-macros and other macros where
			  we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
bd954215ca bcachefs: Quota fixes
- We now correctly allow soft limits to be exceeded, instead of always
   returning -EDQUOT
 - Disk quota grate times/warnings can now be set, not just the
   systemwide defaults

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
07bfcc0b4c bcachefs: Fix for not dropping privs in fallocate
When modifying a file, we may be required to drop the suid/sgid bits -
we were missing a file_modified() call to do this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet
3a4d3656e5 bcachefs: Fix bch2_write_begin()
An error case was jumping to the wrong label, creating an infinite loop
- oops.

This fixes fstests generic/648.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet
e8540e5681 bcachefs: Reflink now respects quotas
This adds a new helper, quota_reserve_range(), which takes a quota
reservation for unallocated blocks in a given file range, and uses it in
bch2_remap_file_range().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet
2d848dacb2 bcachefs: Kill io_in_flight semaphore
This used to be needed more for buffered IO, but now the block layer has
writeback throttling - we can delete this now.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
098ef98d5b bcachefs: Add private error codes for ENOSPC
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
5c1ef830f6 bcachefs: Errcodes can now subtype standard error codes
The next patch is going to be adding private error codes for all the
places we return -ENOSPC.

Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
549d173c1b bcachefs: EINTR -> BCH_ERR_transaction_restart
Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:37 -04:00
Kent Overstreet
a3d7afa5c1 bcachefs: Always use percpu_ref_tryget_live() on c->writes
If we're trying to get a ref and the refcount has been killed, it means
we're doing an emergency shutdown - we always want tryget_live().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:34 -04:00
Kent Overstreet
facc81479c bcachefs: Delete bch_writepage
Per Dave Chinner and the xfs folks, .writepage is no longer needed, and
it's better not to define it if .writepages is the intended path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
b33bf1bc0d bcachefs: Go emergency RO when i_blocks underflows
This improves some of our warnings and assertions - they imply possible
filesystem inconsistencies, so they should be calling
bch2_fs_inconsistent().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
7c4ca54ae6 bcachefs: Don't skip triggers in fcollapse()
With backpointers this doesn't work anymore - backpointers always need
to be updated to point to the new extent position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
f8494d2535 bcachefs: Convert some WARN_ONs to WARN_ON_ONCE
These warnings are symptomatic of something else going wrong, we don't
want them spamming up the logs as that'll make it harder to find the
real issue.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:28 -04:00
Kent Overstreet
9552e19f6f bcachefs: Fix dio write path with loopback dio mode
When the iov_iter is a bvec iter, it's possible the IO was submitted
from a kthread that didn't have an mm to switch to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
4d126dc8b3 bcachefs: Use bio_iov_vecs_to_alloc()
This fixes a bug in the DIO read path where, when using a loopback
device in DIO mode, we'd allocate a biovec that would get overwritten
and leaked in bio_iov_iter_get_pages() -> bio_iov_bvec_set().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
eb331fe5a4 bcachefs: Check for stale dirty pointer before reads
Since we retry reads when we discover we read from a pointer that went
stale, if a dirty pointer is erroniously stale it would cause us to loop
retrying that read forever - unless we check before issuing the read,
while the btree is still locked, when we know that a dirty pointer
should never be stale.

This patch adds that check, along with printing some helpful debug info.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:24 -04:00
Kent Overstreet
57cfdd8b54 bcachefs: BTREE_ITER_FILTER_SNAPSHOTS is selected automatically
It doesn't have to be specified - this patch deletes the two instances
where it was.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:21 -04:00
Kent Overstreet
51c4e406aa bcachefs: Fix an assertion in bch2_truncate()
We recently added an assertion that when we truncate a file to 0,
i_blocks should also go to 0 - but that's not necessarily true if we're
doing an emergency shutdown, lots of invariants no longer hold true in
that case.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
f54788cc8c bcachefs: Convert a BUG_ON() to a warning
A user reported hitting this assertion, and we can't reproduce it yet,
but it shouldn't be fatal - so convert it to a warning.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
dcfc593f7b bcachefs: Fix page state after fallocate
This tweaks the fallocate code to also update the page cache to reflect
the new on disk reservations, giving us better i_sectors consistency.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
e6ec361f95 bcachefs: Fix page state when reading into !PageUptodate pages
This patch adds code to read page state before writing to pages that
aren't uptodate, which corrects i_sectors being tempororarily too large
and means we may not need to get a disk reservation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

# Conflicts:
#	fs/bcachefs/fs-io.c
2023-10-22 17:09:17 -04:00
Kent Overstreet
7279c1a24c bcachefs: Kill PAGE_SECTOR_SHIFT
Replace it with the new, standard PAGE_SECTORS_SHIFT

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
084d42bbd6 bcachefs: Apply workaround for too many btree iters to read path
Reading from cached data, which calls bch2_bucket_io_time_reset(), is
leading to transaction iterator overflows - this standardizes the
workaround.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
b44a66a641 bcachefs: SECTOR_DIRTY_RESERVED
This fixes another i_sectors accounting bug - we need to differentiate
between dirty writes that overwrite a reservation and dirty writes to
unallocated space - dirty writes to unallocated space increase
i_sectors, dirty writes over a reservation do not.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
b19d307dc1 bcachefs: Fix i_sectors_leak in bch2_truncate_page
When bch2_truncate_page() discards dirty sectors in the page cache, we
need to account for that - we don't need to account for allocated
sectors because that'll be done by the bch2_fpunch() call when it
updates the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
8810386f6b bcachefs: Fix an i_sectors accounting bug
We weren't checking for errors before calling i_sectors_acct()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
f74a5051b0 bcachefs: Don't check for -ENOSPC in page writeback
If at all possible we'd prefer to not fail page writeback unless the
filesystem has been shutdown; allowing errors in page writeback means
things we'd like to assert about i_size consistency between the VFS and
the btree go out the window.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
74163da7c8 bcachefs: Fallocate fixes
- fpunch wasn't always correctly updating i_size - when we drop buffered
  writes that were extending a file, we become responsible for writing
  i_size.

- fzero was sometimes zeroing out more data that it should have -
  block_start and block_end were being rounded in the wrong directions

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
68a2054d88 bcachefs: Switch fsync to use bi_journal_seq
Now that we're recording in each inode the journal sequence number of
the most recent update, fsync becomes a lot simpler and we can delete
all the plumbing for ei_journal_seq.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
e5fa91d7ac bcachefs: Fix restart handling in for_each_btree_key()
Code that uses for_each_btree_key often wants transaction restarts to be
handled locally and not returned. Originally, we wouldn't return
transaction restarts if there was a single iterator in the transaction -
the reasoning being if there weren't other iterators being invalidated,
and the current iterator was being advanced/retraversed, there weren't
any locks or iterators we were required to preserve.

But with the btree_path conversion that approach doesn't work anymore -
even when we're using for_each_btree_key() with a single iterator there
will still be two paths in the transaction, since we now always preserve
the path at the pos the iterator was initialized at - the reason being
that on restart we often restart from the same place.

And it turns out there's now a lot of for_each_btree_key() uses that _do
not_ want transaction restarts handled locally, and should be returning
them.

This patch splits out for_each_btree_key_norestart() and
for_each_btree_key_continue_norestart(), and converts existing users as
appropriate. for_each_btree_key(), for_each_btree_key_continue(), and
for_each_btree_node() now handle transaction restarts themselves by
calling bch2_trans_begin() when necessary - and the old hack to not
return transaction restarts when there's a single path in the
transaction has been deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:14 -04:00
Kent Overstreet
9a796fdb06 bcachefs: bch2_trans_exit() no longer returns errors
Now that peek_node()/next_node() are converted to return errors
directly, we don't need bch2_trans_exit() to return errors - it's
cleaner this way and wasn't used much anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:14 -04:00
Kent Overstreet
8c6d298ab2 bcachefs: Convert io paths for snapshots
This plumbs around the subvolume ID as was done previously for other
filesystem code, but now for the IO paths - the control flow in the IO
paths is trickier so the changes in this patch are more involved.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:12 -04:00
Kent Overstreet
6fed42bb77 bcachefs: Plumb through subvolume id
To implement snapshots, we need every filesystem btree operation (every
btree operation without a subvolume) to start by looking up the
subvolume and getting the current snapshot ID, with
bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing
btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode.

This patch adds those bch2_subvolume_get_snapshot() calls, and also
switches to passing around a subvol_inum instead of just an inode
number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:12 -04:00
Kent Overstreet
67e0dd8f0d bcachefs: btree_path
This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:11 -04:00
Kent Overstreet
9f6bd30703 bcachefs: Reduce iter->trans usage
Disfavoured, and should go away.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:10 -04:00
Kent Overstreet
3737e0ddfb bcachefs: Fix an unhandled transaction restart
__bch2_read() -> __bch2_read_extent() -> bch2_bucket_io_time_reset() may
cause a transaction restart, which we don't return an error for because
it doesn't prevent us from making forward progress on the read we're
submitting.

Instead, change __bch2_read() and bchfs_read() to check for transaction
restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:10 -04:00
Kent Overstreet
700c25b32a bcachefs: Use bch2_trans_begin() more consistently
Upcoming patch will require that a transaction restart is always
immediately followed by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
8b3e9bd65f bcachefs: Always check for transaction restarts
On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
b97bbd4ec3 bcachefs: Use bch2_inode_find_by_inum() in truncate
This is needed for snapshots because we need to start handling lock
restarts even when just calling bch2_inode_peek().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
5468f1195d bcachefs: Fix a memory leak in the dio write path
There were some error paths where we were leaking page refs - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:08 -04:00