1
0
Fork 0
mirror of synced 2025-03-06 20:59:54 +01:00
Commit graph

2632 commits

Author SHA1 Message Date
Qu Wenruo
4317ff0056 btrfs: introduce btrfs_bio::fs_info member
Currently we're doing a lot of work for btrfs_bio:

- Checksum verification for data read bios
- Bio splits if it crosses stripe boundary
- Read repair for data read bios

However for the incoming scrub patches, we don't want this extra
functionality at all, just plain logical + mirror -> physical mapping
ability.

Thus here we do the following changes:

- Introduce btrfs_bio::fs_info
  This is for the new scrub specific btrfs_bio, which would not populate
  btrfs_bio::inode.
  Thus we need such new member to grab a fs_info

  This new member will always be populated.

- Replace @inode argument with @fs_info for btrfs_bio_init() and its
  caller
  Since @inode is no longer a mandatory member, replace it with
  @fs_info, and let involved users populate @inode.

- Skip checksum verification and generation if @bbio->inode is NULL

- Add extra ASSERT()s
  To make sure:

  * bbio->inode is properly set for involved read repair path
  * if @file_offset is set, bbio->inode is also populated

- Grab @fs_info from @bbio directly
  We can no longer go @bbio->inode->root->fs_info, as bbio->inode can be
  NULL. This involves:

  * btrfs_simple_end_io()
  * should_async_write()
  * btrfs_wq_submit_bio()
  * btrfs_use_zone_append()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:23 +02:00
Christoph Hellwig
3480373ebd btrfs, block: move REQ_CGROUP_PUNT to btrfs
REQ_CGROUP_PUNT is a bit annoying as it is hard to follow and adds
a branch to the bio submission hot path.  To fix this, export
blkcg_punt_bio_submit and let btrfs call it directly.  Add a new
REQ_FS_PRIVATE flag for btrfs to indicate to it's own low-level
bio submission code that a punt to the cgroup submission helper
is required.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:22 +02:00
Christoph Hellwig
896d7c1a90 btrfs: also use kthread_associate_blkcg for uncompressible ranges
submit_one_async_extent needs to use submit_one_async_extent no matter
if the range it handles ends up beeing compressed or not as the deadlock
risk due to cgroup thottling is the same.  Call kthread_associate_blkcg
earlier to cover submit_uncompressed_range case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:22 +02:00
Christoph Hellwig
e43a6210b7 btrfs: don't free the async_extent in submit_uncompressed_range
Let submit_one_async_extent, which is the only caller of
submit_uncompressed_range handle freeing of the async_extent in one
central place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:22 +02:00
Christoph Hellwig
05d06a5c9d btrfs: move kthread_associate_blkcg out of btrfs_submit_compressed_write
btrfs_submit_compressed_write should not have to care if it is called
from a helper thread or not.  Move the kthread_associate_blkcg handling
into submit_one_async_extent, as that is the one caller that needs it.
Also move the assignment of REQ_CGROUP_PUNT into cow_file_range_async,
as that is the routine that sets up the helper thread offload.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:22 +02:00
Boris Burkov
b73a6fd1b1 btrfs: split partial dio bios before submit
If an application is doing direct io to a btrfs file and experiences a
page fault reading from the write buffer, iomap will issue a partial
bio, and allow the fs to keep going. However, there was a subtle bug in
this code path in the btrfs dio iomap implementation that led to the
partial write ending up as a gap in the file's extents and to be read
back as zeros.

The sequence of events in a partial write, lightly summarized and
trimmed down for brevity is as follows:

==== WRITING TASK ====
 btrfs_direct_write
 __iomap_dio_write
 iomap_iter
 btrfs_dio_iomap_begin # create full ordered extent
 iomap_dio_bio_iter
 bio_iov_iter_get_pages # page fault; partial read
 submit_bio # partial bio
 iomap_iter
 btrfs_dio_iomap_end
 btrfs_mark_ordered_io_finished # sets BTRFS_ORDERED_IOERR;
				# submit to finish_ordered_fn wq
 fault_in_iov_iter_readable # btrfs_direct_write detects partial write
 __iomap_dio_write
 iomap_iter
 btrfs_dio_iomap_begin # create second partial ordered extent
 iomap_dio_bio_iter
 bio_iov_iter_get_pages # read all of remainder
 submit_bio # partial bio with all of remainder
 iomap_iter
 btrfs_dio_iomap_end # nothing exciting to do with ordered io

==== DIO ENDIO ====
== FIRST PARTIAL BIO ==
 btrfs_dio_end_io
 btrfs_mark_ordered_io_finished # bytes_left > 0
			        # don't submit to finish_ordered_fn wq
== SECOND PARTIAL BIO ==
 btrfs_dio_end_io
 btrfs_mark_ordered_io_finished # bytes_left == 0
			        # submit to finish_ordered_fn wq

==== BTRFS FINISH ORDERED WQ ====
== FIRST PARTIAL BIO ==
 btrfs_finish_ordered_io # called by dio_iomap_end_io, sees
		         # BTRFS_ORDERED_IOERR, just drops the
		         # ordered_extent
==SECOND PARTIAL BIO==
 btrfs_finish_ordered_io # called by btrfs_dio_end_io, writes out file
		         # extents, csums, etc...

The essence of the problem is that while btrfs_direct_write and iomap
properly interact to submit all the correct bios, there is insufficient
logic in the btrfs dio functions (btrfs_dio_iomap_begin,
btrfs_dio_submit_io, btrfs_dio_end_io, and btrfs_dio_iomap_end) to
ensure that every bio is at least a part of a completed ordered_extent.
And it is completing an ordered_extent that results in crucial
functionality like writing out a file extent for the range.

More specifically, btrfs_dio_end_io treats the ordered extent as
unfinished but btrfs_dio_iomap_end sets BTRFS_ORDERED_IOERR on it.
Thus, the finish io work doesn't result in file extents, csums, etc.
In the aftermath, such a file behaves as though it has a hole in it,
instead of the purportedly written data.

We considered a few options for fixing the bug:

  1. treat the partial bio as if we had truncated the file, which would
     result in properly finishing it.
  2. split the ordered extent when submitting a partial bio.
  3. cache the ordered extent across calls to __iomap_dio_rw in
     iter->private, so that we could reuse it and correctly apply
     several bios to it.

I had trouble with 1, and it felt the most like a hack, so I tried 2
and 3. Since 3 has the benefit of also not creating an extra file
extent, and avoids an ordered extent lookup during bio submission, it
felt like the best option. However, that turned out to re-introduce a
deadlock which this code discarding the ordered_extent between faults
was meant to fix in the first place. (Link to an explanation of the
deadlock below.)

Therefore, go with fix 2, which requires a bit more setup work but fixes
the corruption without introducing the deadlock, which is fundamentally
caused by the ordered extent existing when we attempt to fault in a
range that overlaps with it.

Put succinctly, what this patch does is: when we submit a dio bio, check
if it is partial against the ordered extent stored in dio_data, and if it
is, extract the ordered_extent that matches the bio exactly out of the
larger ordered_extent. Keep the remaining ordered_extent around in dio_data
for cancellation in iomap_end.

Thanks to Josef, Christoph, and Filipe with their help figuring out the
bug and the fix.

Fixes: 51bd9563b6 ("btrfs: fix deadlock due to page faults during direct IO reads and writes")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2169947
Link: https://lore.kernel.org/linux-btrfs/aa1fb69e-b613-47aa-a99e-a0a2c9ed273f@app.fastmail.com/
Link: https://pastebin.com/3SDaH8C6
Link: https://lore.kernel.org/linux-btrfs/20230315195231.GW10580@twin.jikos.cz/T/#t
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Boris Burkov <boris@bur.io>
[ hch: refactored the ordered_extent extraction ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Boris Burkov
f0f5329a00 btrfs: don't split NOCOW extent_maps in btrfs_extract_ordered_extent
NOCOW writes just overwrite an existing extent map, which thus should
not be split in btrfs_extract_ordered_extent.  The NOCOW case can't
currently happen as btrfs_extract_ordered_extent is only used on zoned
devices that do not support NOCOW writes, but this will change soon.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Boris Burkov <boris@bur.io>
[ hch: split from a larger patch, wrote a commit log ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Christoph Hellwig
7edd339c8a btrfs: pass an ordered_extent to btrfs_extract_ordered_extent
To prepare for a new caller that already has the ordered_extent
available, change btrfs_extract_ordered_extent to take an argument
for it.  Add a wrapper for the bio case that still has to do the
lookup (for now).

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Christoph Hellwig
2e38a84bc6 btrfs: simplify extent map splitting and rename split_zoned_em
split_zoned_em is only ever asked to split out the beginning of an extent
map.  Change it to only take a len to split out instead of a pre and post
region.

Also rename the function to split_extent_map as there is nothing zoned
device specific about it.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Christoph Hellwig
8f4af4b8e1 btrfs: sink parameter len to btrfs_split_ordered_extent
btrfs_split_ordered_extent is only ever asked to split out the beginning
of an ordered_extent (i.e. post == 0).  Change it to only take a len to
split out, and switch it to allocate the new extent for the beginning,
as that helps with callers that want to keep a pointer to the
ordered_extent that it is stealing from.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Christoph Hellwig
11d33ab6c1 btrfs: simplify splitting logic in btrfs_extract_ordered_extent
btrfs_extract_ordered_extent is always used to split an ordered_extent
and extent_map into two parts, so it doesn't need to deal with a three
way split.

Simplify it by only allowing for a single split point, and always split
out the beginning of the extent, as that is what we'll later need to
be able to hold on to a reference to the original ordered_extent that
the first part is split off for submission.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Christoph Hellwig
e44ca71cfe btrfs: move ordered_extent internal sanity checks into btrfs_split_ordered_extent
Move the three checks that are about ordered extent internal sanity
checking into btrfs_split_ordered_extent instead of doing them in the
higher level btrfs_extract_ordered_extent routine.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Boris Burkov
53f2c20687 btrfs: stash ordered extent in dio_data during iomap dio
While it is not feasible for an ordered extent to survive across the
calls btrfs_direct_write makes into __iomap_dio_rw, it is still helpful
to stash it on the dio_data in between creating it in iomap_begin and
finishing it in either end_io or iomap_end.

The specific use I have in mind is that we can check if a particular bio
is partial in submit_io without unconditionally looking up the ordered
extent. This is a preparatory patch for a later patch which does just
that.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:21 +02:00
Filipe Manana
b7b1167c36 btrfs: simplify exit paths of btrfs_evict_inode()
Instead of using two labels at btrfs_evict_inode() for exiting depending
on whether we need to delete the inode items and orphan or some error
happened, we can use a single exit label if we initialize the block
reserve to NULL, since btrfs_free_block_rsv() ignores a NULL block reserve
pointer. So just do that. It will also make an upcoming change simpler by
avoiding one extra error label.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:20 +02:00
Filipe Manana
5630e2bcfe btrfs: use a constant for the number of metadata units needed for an unlink
Instead of hard coding the number of metadata units for an unlink operation
in a couple places, define a macro and use it instead. This eliminates the
problem of one place getting out of sync with the other, such as recently
fixed by the previous patch in the series ("btrfs: fix calculation of the
global block reserve's size").

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:20 +02:00
Filipe Manana
b13d57db90 btrfs: calculate correct amount of space for delayed reference when evicting
When evicting an inode, we are incorrectly calculating the amount of space
required for a single delayed reference in case the free space tree is
enabled. We have to multiply by 2 the result of
btrfs_calc_insert_metadata_size(). We should be calculating according to
the size update and space release of the delayed block reserve logic at
btrfs_update_delayed_refs_rsv() and btrfs_delayed_refs_rsv_release().

Fix this by using the btrfs_calc_delayed_ref_bytes() helper at
evict_refill_and_join() instead of btrfs_calc_insert_metadata_size().

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:20 +02:00
Filipe Manana
afa4b0afee btrfs: don't throttle on delayed items when evicting deleted inode
During inode eviction, if we are truncating a deleted inode, we don't add
delayed items for our inode, so there's no need to throttle on delayed
items on each iteration of the loop that truncates inode items from its
subvolume tree. But we dirty extent buffers from its subvolume tree, so
we only need to throttle on btree inode dirty pages.

So use btrfs_btree_balance_dirty_nodelay() in the loop that truncates
inode items.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:19 +02:00
Filipe Manana
4e0527deb3 btrfs: pass a bool to btrfs_block_rsv_migrate() at evict_refill_and_join()
The last argument of btrfs_block_rsv_migrate() is a boolean, but we are
passing an integer, with a value of 1, to it at evict_refill_and_join().
While this is not a bug, due to type conversion, it's a lot more clear to
simply pass the boolean true value instead. So just do that.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:18 +02:00
Christoph Hellwig
b41bbd293e btrfs: return a btrfs_bio from btrfs_bio_alloc
Return the containing struct btrfs_bio instead of the less type safe
struct bio from btrfs_bio_alloc.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:17 +02:00
Christoph Hellwig
ae42a154ca btrfs: pass a btrfs_bio to btrfs_submit_bio
btrfs_submit_bio expects the bio passed to it to be embedded into a
btrfs_bio structure.  Pass the btrfs_bio directly to increase type
safety and make the code self-documenting.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:17 +02:00
Christoph Hellwig
34f888ce3a btrfs: cleanup main loop in btrfs_encoded_read_regular_fill_pages
btrfs_encoded_read_regular_fill_pages has a pretty odd control flow.
Unwind it so that there is a single loop over the pages array.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:17 +02:00
Christoph Hellwig
b665affe93 btrfs: remove unused members from struct btrfs_encoded_read_private
The inode and file_offset members in struct btrfs_encoded_read_private
are unused, so remove them.

Last used in commit 7959bd4411 ("btrfs: remove the start argument to
check_data_csum and export") and commit 7609afac67 ("btrfs: handle
checksum validation and repair at the storage layer").

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:17 +02:00
Christoph Hellwig
544fe4a903 btrfs: embed a btrfs_bio into struct compressed_bio
Embed a btrfs_bio into struct compressed_bio.  This avoids potential
(so far theoretical) deadlocks due to nesting of btrfs_bioset allocations
for the original read bio and the compressed bio, and avoids an extra
memory allocation in the I/O path.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:14 +02:00
Johannes Thumshirn
fa13661c48 btrfs: open code btrfs_csum_ptr
Remove btrfs_csum_ptr() and fold it into it's only caller.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:13 +02:00
Josef Bacik
a13bb2c038 btrfs: add missing iputs on orphan cleanup failure
We missed a couple of iput()s in the orphan cleanup failure paths, add
them so we don't get refcount errors. The iput needs to be done in the
check and not under a common label due to the way the code is
structured.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17 18:01:13 +02:00
Linus Torvalds
285063049a for-6.3-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmQc0bUACgkQxWXV+ddt
 WDspCQ//TZRZxwvtgHuJO04vk/CyGrB/2FPytweM3QIjUkq7WaWxoDbgkXfJVuej
 qvdlNlugtXuuTZ87j7dTC2tP2agi0BWhJSO9C0S5z8GTYF2uewKknUD01uOZnKz0
 j++9ki5HfcAYbH80xpM2S4GqOz4FBsfRx/10WIdKOfHrB5jhbfMvN6rBE+UGged0
 Of9TZ9u4i5FMlY36G5+Rek/mhQrK2eFIn45IDwzQptUKnK+0OZ1qqk8ZUmAeT+hn
 6EY3ZXXJIhx6fMxqoeo2TelUWwknARgBQvPSY8YbwZc6T+ObZF0jxZx6n9ESVB8R
 AXOXoovn6+pnm3qi/8j8d0z88LYBrGOXPNp4vtXkKToW+6VWbrvM4zHnUSKCXMDy
 1eaxVcv3MDZ07+Y98XbUMJDKjQ4yHXKBMv/wPCTnvRl0ZZ9r4zFKpcFUSFyEM0rR
 rtwsWY8M2UDiF4ypouc9ep+xmxFxun9XQVmxGYprP/OduGwslex6xbrhrFJhlGja
 acbtA/1P5bZCcseeWcZRHqqwtfEH+ZOdG9+nBzxn7yKGcY0DDCQvbiH4HwlAts1R
 GhEQOtqP1szWKENSELluWwbuUdpaYrF3dcsUxtnJOLHsg0dwABm7buM0kiUPEUqK
 nZhAP4wXks6dGFB9V4BUybGtl0Vcr+5nhWCo8Wc/dLN5GMVzPvM=
 =XuDt
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes, the zoned accounting fix is spread across a few
  patches, preparatory and the actual fixes:

   - zoned mode:
      - fix accounting of unusable zone space
      - fix zone activation condition for DUP profile
      - preparatory patches

   - improved error handling of missing chunks

   - fix compiler warning"

* tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: drop space_info->active_total_bytes
  btrfs: zoned: count fresh BG region as zone unusable
  btrfs: use temporary variable for space_info in btrfs_update_block_group
  btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
  btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
  btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
  btrfs: handle missing chunk mapping more gracefully
2023-03-24 08:32:10 -07:00
Sweet Tea Dorminy
10a8857a1b btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
Commit 1ec49744ba ("btrfs: turn on -Wmaybe-uninitialized") exposed
that on SPARC and PA-RISC, gcc is unaware that fscrypt_setup_filename()
only returns negative error values or 0. This ultimately results in a
maybe-uninitialized warning in btrfs_lookup_dentry().

Change to only return negative error values or 0 from
fscrypt_setup_filename() at the relevant call site, and assert that no
positive error codes are returned (which would have wider implications
involving other users).

Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Linus Torvalds
885ce48739 for-6.3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmPzxWcACgkQxWXV+ddt
 WDt+fRAAg5pz7gWNMtIK30gp/uojjAkCWXymxRtK2tZU3naI+6IYSAKxuKq8Iz1Y
 drdlpSvTX/Gv3XlGB9QuoH6digTjQzeVzjAm0eP6w8t8354KGSRUYdtoFp8I8E5Z
 q0JUuZ6w/KvpZfOIsmcgpOScgcl+8+UlOxs2iuSrOvAqP8Dg1VCt5vBm7htIb0tm
 5ClbgmIacxWrOII55XGuY0mWuZSlS4hdyWdYMelvtM8aPPG+e8eEzKjscVOOueLz
 Smi1kN5QU3o+m4oKjN1OJlKfeURdbcZUwva9zOsegSbPHUzNwIao44cQ5cQhMR0r
 kI3nCpJwGKdUd6IblEdcqBN5F4V64edLSruOLuGYzxySnEWhFE2YU2xW/v5b1eQW
 GHurI52FGrPqcX9FgQNzfTjQzk341iQ0QIs5exycJH7xeohEZnlaK2yNUngKSo1C
 naqczEMMMcxNjQaooUuxRkL/zz36D/Dkyo2YOCODtWyu61XY9LqvaxMvClFI20lL
 40dzzYnnMQwkXJrQ/MVQhz1BBaPVqizt8+ErL7GQp2CWr9miD6mcA5b2pyZm5Q3r
 hHadzeTXXS7P9g9UnuDxpZqkhvadGC2Sy4l/D6jURyKFzr8mtplaRRwUS2gSuP3z
 zxavvP4UukwNWXxDz755NAhiGbA+xpSMATKCrZ/Sdogvxe8IhRg=
 =NCpw
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "The usual mix of performance improvements and new features.

  The core change is reworking how checksums are processed, with
  followup cleanups and simplifications. There are two minor changes in
  block layer and iomap code.

  Features:

   - block group allocation class heuristics:
      - pack files by size (up to 128k, up to 8M, more) to avoid
        fragmentation in block groups, assuming that file size and life
        time is correlated, in particular this may help during balance
      - with tracepoints and extensible in the future

  Performance:

   - send: cache directory utimes and only emit the command when
     necessary
      - speedup up to 10x
      - smaller final stream produced (no redundant utimes commands
        issued)
      - compatibility not affected

   - fiemap: skip backref checks for shared leaves
      - speedup 3x on sample filesystem with all leaves shared (e.g. on
        snapshots)

   - micro optimized b-tree key lookup, speedup in metadata operations
     (sample benchmark: fs_mark +10% of files/sec)

  Core changes:

   - change where checksumming is done in the io path:
      - checksum and read repair does verification at lower layer
      - cascaded cleanups and simplifications

   - raid56 refactoring and cleanups

  Fixes:

   - sysfs: make sure that a run-time change of a feature is correctly
     tracked by the feature files

   - scrub: better reporting of tree block errors

  Other:

   - locally enable -Wmaybe-uninitialized after fixing all warnings

   - misc cleanups, spelling fixes

  Other code:

   - block: export bio_split_rw

   - iomap: remove IOMAP_F_ZONE_APPEND"

* tag 'for-6.3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (109 commits)
  btrfs: make kobj_type structures constant
  btrfs: remove the bdev argument to btrfs_rmap_block
  btrfs: don't rely on unchanging ->bi_bdev for zone append remaps
  btrfs: never return true for reads in btrfs_use_zone_append
  btrfs: pass a btrfs_bio to btrfs_use_append
  btrfs: set bbio->file_offset in alloc_new_bio
  btrfs: use file_offset to limit bios size in calc_bio_boundaries
  btrfs: do unsigned integer division in the extent buffer binary search loop
  btrfs: eliminate extra call when doing binary search on extent buffer
  btrfs: raid56: handle endio in scrub_rbio
  btrfs: raid56: handle endio in recover_rbio
  btrfs: raid56: handle endio in rmw_rbio
  btrfs: raid56: submit the read bios from scrub_assemble_read_bios
  btrfs: raid56: fold rmw_read_wait_recover into rmw_read_bios
  btrfs: raid56: fold recover_assemble_read_bios into recover_rbio
  btrfs: raid56: add a bio_list_put helper
  btrfs: raid56: wait for I/O completion in submit_read_bios
  btrfs: raid56: simplify code flow in rmw_rbio
  btrfs: raid56: simplify error handling and code flow in raid56_parity_write
  btrfs: replace btrfs_wait_tree_block_writeback by wait_on_extent_buffer_writeback
  ...
2023-02-20 12:54:27 -08:00
Linus Torvalds
05e6295f7b fs.idmapped.v6.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY+5NlQAKCRCRxhvAZXjc
 orOaAP9i2h3OJy95nO2Fpde0Bt2UT+oulKCCcGlvXJ8/+TQpyQD/ZQq47gFQ0EAz
 Br5NxeyGeecAb0lHpFz+CpLGsxMrMwQ=
 =+BG5
 -----END PGP SIGNATURE-----

Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs idmapping updates from Christian Brauner:

 - Last cycle we introduced the dedicated struct mnt_idmap type for
   mount idmapping and the required infrastucture in 256c8aed2b ("fs:
   introduce dedicated idmap type for mounts"). As promised in last
   cycle's pull request message this converts everything to rely on
   struct mnt_idmap.

   Currently we still pass around the plain namespace that was attached
   to a mount. This is in general pretty convenient but it makes it easy
   to conflate namespaces that are relevant on the filesystem with
   namespaces that are relevant on the mount level. Especially for
   non-vfs developers without detailed knowledge in this area this was a
   potential source for bugs.

   This finishes the conversion. Instead of passing the plain namespace
   around this updates all places that currently take a pointer to a
   mnt_userns with a pointer to struct mnt_idmap.

   Now that the conversion is done all helpers down to the really
   low-level helpers only accept a struct mnt_idmap argument instead of
   two namespace arguments.

   Conflating mount and other idmappings will now cause the compiler to
   complain loudly thus eliminating the possibility of any bugs. This
   makes it impossible for filesystem developers to mix up mount and
   filesystem idmappings as they are two distinct types and require
   distinct helpers that cannot be used interchangeably.

   Everything associated with struct mnt_idmap is moved into a single
   separate file. With that change no code can poke around in struct
   mnt_idmap. It can only be interacted with through dedicated helpers.
   That means all filesystems are and all of the vfs is completely
   oblivious to the actual implementation of idmappings.

   We are now also able to extend struct mnt_idmap as we see fit. For
   example, we can decouple it completely from namespaces for users that
   don't require or don't want to use them at all. We can also extend
   the concept of idmappings so we can cover filesystem specific
   requirements.

   In combination with the vfs{g,u}id_t work we finished in v6.2 this
   makes this feature substantially more robust and thus difficult to
   implement wrong by a given filesystem and also protects the vfs.

 - Enable idmapped mounts for tmpfs and fulfill a longstanding request.

   A long-standing request from users had been to make it possible to
   create idmapped mounts for tmpfs. For example, to share the host's
   tmpfs mount between multiple sandboxes. This is a prerequisite for
   some advanced Kubernetes cases. Systemd also has a range of use-cases
   to increase service isolation. And there are more users of this.

   However, with all of the other work going on this was way down on the
   priority list but luckily someone other than ourselves picked this
   up.

   As usual the patch is tiny as all the infrastructure work had been
   done multiple kernel releases ago. In addition to all the tests that
   we already have I requested that Rodrigo add a dedicated tmpfs
   testsuite for idmapped mounts to xfstests. It is to be included into
   xfstests during the v6.3 development cycle. This should add a slew of
   additional tests.

* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
  shmem: support idmapped mounts for tmpfs
  fs: move mnt_idmap
  fs: port vfs{g,u}id helpers to mnt_idmap
  fs: port fs{g,u}id helpers to mnt_idmap
  fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
  fs: port i_{g,u}id_{needs_}update() to mnt_idmap
  quota: port to mnt_idmap
  fs: port privilege checking helpers to mnt_idmap
  fs: port inode_owner_or_capable() to mnt_idmap
  fs: port inode_init_owner() to mnt_idmap
  fs: port acl to mnt_idmap
  fs: port xattr to mnt_idmap
  fs: port ->permission() to pass mnt_idmap
  fs: port ->fileattr_set() to pass mnt_idmap
  fs: port ->set_acl() to pass mnt_idmap
  fs: port ->get_acl() to pass mnt_idmap
  fs: port ->tmpfile() to pass mnt_idmap
  fs: port ->rename() to pass mnt_idmap
  fs: port ->mknod() to pass mnt_idmap
  fs: port ->mkdir() to pass mnt_idmap
  ...
2023-02-20 11:53:11 -08:00
Christoph Hellwig
04f0847c45 btrfs: don't rely on unchanging ->bi_bdev for zone append remaps
btrfs_record_physical_zoned relies on a bio->bi_bdev samples in the
bio_end_io handler to find the reverse map for remapping the zone append
write, but stacked block device drivers can and usually do change bi_bdev
when sending on the bio to a lower device.  This can happen e.g. with the
nvme-multipath driver when a NVMe SSD sets the shared namespace bit.

But there is no real need for the bdev in btrfs_record_physical_zoned,
as it is only passed to btrfs_rmap_block, which uses it to pick the
mapping to report if there are multiple reverse mappings.  As zone
writes can only do simple non-mirror writes right now, and anything
more complex will use the stripe tree there is no chance of the multiple
mappings case actually happening.

Instead open code the subset of btrfs_rmap_block in
btrfs_record_physical_zoned, which also removes a memory allocation and
remove the bdev field in the ordered extent.

Fixes: d8e3fb106f ("btrfs: zoned: use ZONE_APPEND write for zoned mode")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:55 +01:00
Christoph Hellwig
d5e4377d50 btrfs: split zone append bios in btrfs_submit_bio
The current btrfs zoned device support is a little cumbersome in the data
I/O path as it requires the callers to not issue I/O larger than the
supported ZONE_APPEND size of the underlying device.  This leads to a lot
of extra accounting.  Instead change btrfs_submit_bio so that it can take
write bios of arbitrary size and form from the upper layers, and just
split them internally to the ZONE_APPEND queue limits.  Then remove all
the upper layer warts catering to limited write sized on zoned devices,
including the extra refcount in the compressed_bio.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:53 +01:00
Christoph Hellwig
35a8d7da3c btrfs: remove now spurious bio submission helpers
Call btrfs_submit_bio and btrfs_submit_compressed_read directly from
submit_one_bio now that all additional functionality has moved into
btrfs_submit_bio.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:53 +01:00
Christoph Hellwig
285599b6fe btrfs: remove the fs_info argument to btrfs_submit_bio
btrfs_submit_bio can derive it trivially from bbio->inode, so stop
bothering in the callers.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:53 +01:00
Christoph Hellwig
48253076c3 btrfs: open code submit_encoded_read_bio
Open code the functionality in the only caller and remove the now
superfluous error handling there.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:53 +01:00
Qu Wenruo
a34e4c3f88 btrfs: remove stripe boundary calculation for encoded I/O
Stop looking at the stripe boundary in
btrfs_encoded_read_regular_fill_pages() now that btrfs_submit_bio can
split bios.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:52 +01:00
Christoph Hellwig
67d6698250 btrfs: pass the iomap bio to btrfs_submit_bio
Now that btrfs_submit_bio splits the bio when crossing stripe boundaries,
there is no need for the higher level code to do that manually.

For direct I/O this is really helpful, as btrfs_submit_io can now simply
take the bio allocated by iomap and send it on to btrfs_submit_bio
instead of allocating clones.

For that to work, the bio embedded into struct btrfs_dio_private needs to
become a full btrfs_bio as expected by btrfs_submit_bio.

With this change there is a single work item to offload the entire iomap
bio so the heuristics to skip async processing for bios that were split
isn't needed anymore either.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:52 +01:00
Christoph Hellwig
69ccf3f424 btrfs: handle recording of zoned writes in the storage layer
Move the code that splits the ordered extents and records the physical
location for them to the storage layer so that the higher level consumers
don't have to care about physical block numbers at all.  This will also
allow to eventually remove accounting for the zone append write sizes in
the upper layer with a little bit more block layer work.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:52 +01:00
Christoph Hellwig
f8a53bb58e btrfs: handle checksum generation in the storage layer
Instead of letting the callers of btrfs_submit_bio deal with checksumming
the (meta)data in the bio and making decisions on when to offload the
checksumming to the bio, leave that to btrfs_submit_bio.  Do do so the
existing btrfs_submit_bio function is split into an upper and a lower
half, so that the lower half can be offloaded to a workqueue.

Note that this changes the behavior for direct writes to raid56 volumes so
that async checksum offloading is not skipped when more I/O is expected.
This runs counter to the argument explaining why it was done, although I
can't measure any affects of the change.  Commits later in this series
will make sure the entire direct writes is offloaded to the workqueue
at once and thus make sure it is sent to the raid56 code from a single
thread.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:52 +01:00
Christoph Hellwig
f8c44673e5 btrfs: simplify the btrfs_csum_one_bio calling convention
To prepare for further bio submission changes btrfs_csum_one_bio
should be able to take all it's arguments from the btrfs_bio structure.
It can always use the bbio->inode already, and once the compression code
is updated to set ->file_offset that one can be used unconditionally
as well instead of looking at the page mapping now that btrfs doesn't
allow ordered extents to span discontiguous data ranges.

The only slightly tricky bit is the one_ordered flag set by the
compressed writes.  Replace that one with the driver private bio
flag, which gets cleared before the bio is handed off to the block layer
so that we don't get in the way of driver use.

Note: this leaves an argument and a flag to btrfs_wq_submit_bio unused.
But that whole mechanism will be removed in its current form in the
next patch.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:52 +01:00
Christoph Hellwig
deb6216fa0 btrfs: open code the submit_bio_start helpers
The submit helpers are now trivial and can be called directly.  Note
that btree_csum_one_bio has to be moved up in the file a bit to avoid a
forward declaration.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:52 +01:00
Christoph Hellwig
0571b6357c btrfs: remove the io_failure_record infrastructure
struct io_failure_record and the io_failure_tree tree are unused now,
so remove them. This in turn makes struct btrfs_inode smaller by 16
bytes.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christoph Hellwig
3d49d0d312 btrfs: remove now unused checksumming helpers
Remove the unused btrfs_verify_data_csum helper, and fold
btrfs_check_data_csum into its only caller.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christoph Hellwig
7609afac67 btrfs: handle checksum validation and repair at the storage layer
Currently btrfs handles checksum validation and repair in the end I/O
handler for the btrfs_bio.  This leads to a lot of duplicate code
plus issues with varying semantics or bugs, e.g.

- the until recently broken repair for compressed extents
- the fact that encoded reads validate the checksums but do not kick
  of read repair
- the inconsistent checking of the BTRFS_FS_STATE_NO_CSUMS flag

This commit revamps the checksum validation and repair code to instead
work below the btrfs_submit_bio interfaces.

In case of a checksum failure (or a plain old I/O error), the repair
is now kicked off before the upper level ->end_io handler is invoked.

Progress of an in-progress repair is tracked by a small structure
that is allocated using a mempool for each original bio with failed
sectors, which holds a reference to the original bio.   This new
structure is allocated using a mempool to guarantee forward progress
even under memory pressure.  The mempool will be replenished when
the repair completes, just as the mempools backing the bios.

There is one significant behavior change here:  If repair fails or
is impossible to start with, the whole bio will be failed to the
upper layer.  This is the behavior that all I/O submitters except
for buffered I/O already emulated in their end_io handler.  For
buffered I/O this now means that a large readahead request can
fail due to a single bad sector, but as readahead errors are ignored
the following readpage if the sector is actually accessed will
still be able to read.  This also matches the I/O failure handling
in other file systems.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christoph Hellwig
e52190441b btrfs: add a btrfs_data_csum_ok helper
Add a new checksumming helper that wraps btrfs_check_data_csum and
does all the checks to if we're dealing with some form of nodatacsum
I/O.  This helper will be used by the new storage layer checksum
validation and repair code.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christoph Hellwig
1c2b3ee3b0 btrfs: pre-load data checksum for reads in btrfs_submit_bio
Instead of calling btrfs_lookup_bio_sums in every caller of
btrfs_submit_bio that reads data, do the call once in btrfs_submit_bio.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christoph Hellwig
7276aa7d38 btrfs: save the bio iter for checksum validation in common code
All callers of btrfs_submit_bio that want to validate checksums
currently have to store a copy of the iter in the btrfs_bio.  Move
the assignment into common code.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:51 +01:00
Christoph Hellwig
4ae2edf12d btrfs: simplify parameters of btrfs_lookup_bio_sums
The csums argument is always NULL now, so remove it and always allocate
the csums array in the btrfs_bio.  Also pass the btrfs_bio instead of
inode + bio to document that this function requires a btrfs_bio and
not just any bio.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:50 +01:00
Christoph Hellwig
5fa356531e btrfs: remove the direct I/O read checksum lookup optimization
To prepare for pending changes drop the optimization to only look up
csums once per bio that is submitted from the iomap layer.  In the
short run this does cause additional lookups for fragmented direct
reads, but later in the series, the bio based lookup will be used on
the entire bio submitted from iomap, restoring the old behavior
in common code.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:50 +01:00
Christoph Hellwig
d0e5cb2be7 btrfs: add a btrfs_inode pointer to struct btrfs_bio
All btrfs_bio I/Os are associated with an inode.  Add a pointer to that
inode, which will allow to simplify a lot of calling conventions, and
which will be needed in the I/O completion path in the future.

This grow the btrfs_bio structure by a pointer, but that grows will
be offset by the removal of the device pointer soon.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15 19:38:50 +01:00
Christoph Hellwig
36d4556745 btrfs: remove the wait argument to btrfs_start_ordered_extent
Given that wait is always set to 1, so remove the argument.
Last use of wait with 0 was in 0c304304fe ("Btrfs: remove
csum_bytes_left").

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-13 17:50:34 +01:00