
Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block

Pull core block layer changes from Jens Axboe:
 "This is the core block IO pull request for 3.18.  Apart from the new
  and improved flush machinery for blk-mq, this is all mostly bug fixes
  and cleanups.

   - blk-mq timeout updates and fixes from Christoph.

   - Removal of REQ_END, also from Christoph.  We pass it through the
     ->queue_rq() hook for blk-mq instead, freeing up one of the request
     bits.  The space was overly tight on 32-bit, so Martin also killed
     REQ_KERNEL since it's no longer used.

   - blk integrity updates and fixes from Martin and Gu Zheng.

   - Update to the flush machinery for blk-mq from Ming Lei.  Now we
     have a per hardware context flush request, which both cleans up the
     code and should scale better for flush intensive workloads on blk-mq.

   - Improve the error printing, from Rob Elliott.

   - Backing device improvements and cleanups from Tejun.

   - Fixup of a misplaced rq_complete() tracepoint from Hannes.

   - Make blk_get_request() return error pointers, fixing up issues
     where we NULL deref when a device goes bad or missing.  From Joe
     Lawrence.

   - Prep work for drastically reducing the memory consumption of dm
     devices from Junichi Nomura.  This allows creating clone bio sets
     without preallocating a lot of memory.

   - Fix a blk-mq hang on certain combinations of queue depths and
     hardware queues from me.

   - Limit memory consumption for blk-mq devices for crash dump
     scenarios and drivers that use crazy high depths (certain SCSI
     shared tag setups).  We now just use a single queue and limited
     depth for that"
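
As a rough illustration of the blk_get_request() change described above,
callers now check for ERR_PTR() values rather than NULL. The sketch below is
not taken from the series; the function and its surroundings are hypothetical,
and only the calling convention reflects what the patches introduce:

	/* Minimal sketch, assuming a 3.18-era driver context. */
	#include <linux/blkdev.h>
	#include <linux/err.h>

	static int example_issue(struct request_queue *q)
	{
		struct request *rq;

		rq = blk_get_request(q, WRITE, GFP_KERNEL);
		if (IS_ERR(rq))			/* was: if (!rq) */
			return PTR_ERR(rq);	/* e.g. -ENODEV if @q is dying */

		blk_rq_set_block_pc(rq);
		/* ... fill in and issue the request, then release it ... */
		blk_put_request(rq);
		return 0;
	}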

* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
  block: Remove REQ_KERNEL
  blk-mq: allocate cpumask on the home node
  bio-integrity: remove the needless fail handle of bip_slab creating
  block: include func name in __get_request prints
  block: make blk_update_request print prefix match ratelimited prefix
  blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
  block: fix alignment_offset math that assumes io_min is a power-of-2
  blk-mq: Make bt_clear_tag() easier to read
  blk-mq: fix potential hang if rolling wakeup depth is too high
  block: add bioset_create_nobvec()
  block: use bio_clone_fast() in blk_rq_prep_clone()
  block: misplaced rq_complete tracepoint
  sd: Honor block layer integrity handling flags
  block: Replace strnicmp with strncasecmp
  block: Add T10 Protection Information functions
  block: Don't merge requests if integrity flags differ
  block: Integrity checksum flag
  block: Relocate bio integrity flags
  block: Add a disk flag to block integrity profile
  block: Add prefix to block integrity profile flags
  ...
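
The "add bioset_create_nobvec()" and "use bio_clone_fast() in
blk_rq_prep_clone()" entries above pair a bio_set without a bvec mempool with
fast clones that reuse the source bio's bvec table. A minimal sketch of that
pairing, with hypothetical names (this is not code from the series; only the
two APIs appear in the diffs further down):

	#include <linux/bio.h>
	#include <linux/errno.h>

	static struct bio_set *clone_bs;

	static int example_init(void)
	{
		/* No bvec mempool is created: clones reuse the source
		 * bio's bvec table instead of copying it. */
		clone_bs = bioset_create_nobvec(64, 0);
		return clone_bs ? 0 : -ENOMEM;
	}

	static struct bio *example_clone(struct bio *src, gfp_t gfp)
	{
		/* bio_clone_fast() shares src->bi_io_vec, so the bio_set
		 * does not need a bvec pool. */
		return bio_clone_fast(src, gfp, clone_bs);
	}
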
Linus Torvalds 2014-10-18 11:53:51 -07:00
commit d3dc366bba
65 changed files with 1210 additions and 1172 deletions


@ -53,6 +53,14 @@ Description:
512 bytes of data. 512 bytes of data.
What: /sys/block/<disk>/integrity/device_is_integrity_capable
Date: July 2014
Contact: Martin K. Petersen <martin.petersen@oracle.com>
Description:
Indicates whether a storage device is capable of storing
integrity metadata. Set if the device is T10 PI-capable.
What: /sys/block/<disk>/integrity/write_generate What: /sys/block/<disk>/integrity/write_generate
Date: June 2008 Date: June 2008
Contact: Martin K. Petersen <martin.petersen@oracle.com> Contact: Martin K. Petersen <martin.petersen@oracle.com>


@ -129,11 +129,11 @@ interface for this is being worked on.
4.1 BIO 4.1 BIO
The data integrity patches add a new field to struct bio when The data integrity patches add a new field to struct bio when
CONFIG_BLK_DEV_INTEGRITY is enabled. bio->bi_integrity is a pointer CONFIG_BLK_DEV_INTEGRITY is enabled. bio_integrity(bio) returns a
to a struct bip which contains the bio integrity payload. Essentially pointer to a struct bip which contains the bio integrity payload.
a bip is a trimmed down struct bio which holds a bio_vec containing Essentially a bip is a trimmed down struct bio which holds a bio_vec
the integrity metadata and the required housekeeping information (bvec containing the integrity metadata and the required housekeeping
pool, vector count, etc.) information (bvec pool, vector count, etc.)
A kernel subsystem can enable data integrity protection on a bio by A kernel subsystem can enable data integrity protection on a bio by
calling bio_integrity_alloc(bio). This will allocate and attach the calling bio_integrity_alloc(bio). This will allocate and attach the
@ -192,16 +192,6 @@ will require extra work due to the application tag.
supported by the block device. supported by the block device.
int bdev_integrity_enabled(block_device, int rw);
bdev_integrity_enabled() will return 1 if the block device
supports integrity metadata transfer for the data direction
specified in 'rw'.
bdev_integrity_enabled() honors the write_generate and
read_verify flags in sysfs and will respond accordingly.
int bio_integrity_prep(bio); int bio_integrity_prep(bio);
To generate IMD for WRITE and to set up buffers for READ, the To generate IMD for WRITE and to set up buffers for READ, the
@ -216,36 +206,6 @@ will require extra work due to the application tag.
bio_integrity_enabled() returned 1. bio_integrity_enabled() returned 1.
int bio_integrity_tag_size(bio);
If the filesystem wants to use the application tag space it will
first have to find out how much storage space is available.
Because tag space is generally limited (usually 2 bytes per
sector regardless of sector size), the integrity framework
supports interleaving the information between the sectors in an
I/O.
Filesystems can call bio_integrity_tag_size(bio) to find out how
many bytes of storage are available for that particular bio.
Another option is bdev_get_tag_size(block_device) which will
return the number of available bytes per hardware sector.
int bio_integrity_set_tag(bio, void *tag_buf, len);
After a successful return from bio_integrity_prep(),
bio_integrity_set_tag() can be used to attach an opaque tag
buffer to a bio. Obviously this only makes sense if the I/O is
a WRITE.
int bio_integrity_get_tag(bio, void *tag_buf, len);
Similarly, at READ I/O completion time the filesystem can
retrieve the tag buffer using bio_integrity_get_tag().
5.3 PASSING EXISTING INTEGRITY METADATA 5.3 PASSING EXISTING INTEGRITY METADATA
Filesystems that either generate their own integrity metadata or Filesystems that either generate their own integrity metadata or
@ -298,8 +258,6 @@ will require extra work due to the application tag.
.name = "STANDARDSBODY-TYPE-VARIANT-CSUM", .name = "STANDARDSBODY-TYPE-VARIANT-CSUM",
.generate_fn = my_generate_fn, .generate_fn = my_generate_fn,
.verify_fn = my_verify_fn, .verify_fn = my_verify_fn,
.get_tag_fn = my_get_tag_fn,
.set_tag_fn = my_set_tag_fn,
.tuple_size = sizeof(struct my_tuple_size), .tuple_size = sizeof(struct my_tuple_size),
.tag_size = <tag bytes per hw sector>, .tag_size = <tag bytes per hw sector>,
}; };
@ -321,7 +279,5 @@ will require extra work due to the application tag.
are available per hardware sector. For DIF this is either 2 or are available per hardware sector. For DIF this is either 2 or
0 depending on the value of the Control Mode Page ATO bit. 0 depending on the value of the Control Mode Page ATO bit.
See 6.2 for a description of get_tag_fn and set_tag_fn.
---------------------------------------------------------------------- ----------------------------------------------------------------------
2007-12-24 Martin K. Petersen <martin.petersen@oracle.com> 2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>


@ -77,6 +77,7 @@ config BLK_DEV_BSGLIB
config BLK_DEV_INTEGRITY config BLK_DEV_INTEGRITY
bool "Block layer data integrity support" bool "Block layer data integrity support"
select CRC_T10DIF if BLK_DEV_INTEGRITY
---help--- ---help---
Some storage devices allow extra information to be Some storage devices allow extra information to be
stored/retrieved to help protect the data. The block layer stored/retrieved to help protect the data. The block layer


@ -20,6 +20,6 @@ obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
obj-$(CONFIG_BLK_DEV_INTEGRITY) += blk-integrity.o
obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o t10-pi.o


@ -79,6 +79,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
bip->bip_slab = idx; bip->bip_slab = idx;
bip->bip_bio = bio; bip->bip_bio = bio;
bio->bi_integrity = bip; bio->bi_integrity = bip;
bio->bi_rw |= REQ_INTEGRITY;
return bip; return bip;
err: err:
@ -96,11 +97,12 @@ EXPORT_SYMBOL(bio_integrity_alloc);
*/ */
void bio_integrity_free(struct bio *bio) void bio_integrity_free(struct bio *bio)
{ {
struct bio_integrity_payload *bip = bio->bi_integrity; struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_set *bs = bio->bi_pool; struct bio_set *bs = bio->bi_pool;
if (bip->bip_owns_buf) if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
kfree(bip->bip_buf); kfree(page_address(bip->bip_vec->bv_page) +
bip->bip_vec->bv_offset);
if (bs) { if (bs) {
if (bip->bip_slab != BIO_POOL_NONE) if (bip->bip_slab != BIO_POOL_NONE)
@ -128,7 +130,7 @@ EXPORT_SYMBOL(bio_integrity_free);
int bio_integrity_add_page(struct bio *bio, struct page *page, int bio_integrity_add_page(struct bio *bio, struct page *page,
unsigned int len, unsigned int offset) unsigned int len, unsigned int offset)
{ {
struct bio_integrity_payload *bip = bio->bi_integrity; struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_vec *iv; struct bio_vec *iv;
if (bip->bip_vcnt >= bip->bip_max_vcnt) { if (bip->bip_vcnt >= bip->bip_max_vcnt) {
@ -147,24 +149,6 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
} }
EXPORT_SYMBOL(bio_integrity_add_page); EXPORT_SYMBOL(bio_integrity_add_page);
static int bdev_integrity_enabled(struct block_device *bdev, int rw)
{
struct blk_integrity *bi = bdev_get_integrity(bdev);
if (bi == NULL)
return 0;
if (rw == READ && bi->verify_fn != NULL &&
(bi->flags & INTEGRITY_FLAG_READ))
return 1;
if (rw == WRITE && bi->generate_fn != NULL &&
(bi->flags & INTEGRITY_FLAG_WRITE))
return 1;
return 0;
}
/** /**
* bio_integrity_enabled - Check whether integrity can be passed * bio_integrity_enabled - Check whether integrity can be passed
* @bio: bio to check * @bio: bio to check
@ -174,199 +158,92 @@ static int bdev_integrity_enabled(struct block_device *bdev, int rw)
* set prior to calling. The functions honors the write_generate and * set prior to calling. The functions honors the write_generate and
* read_verify flags in sysfs. * read_verify flags in sysfs.
*/ */
int bio_integrity_enabled(struct bio *bio) bool bio_integrity_enabled(struct bio *bio)
{ {
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
if (!bio_is_rw(bio)) if (!bio_is_rw(bio))
return 0; return false;
/* Already protected? */ /* Already protected? */
if (bio_integrity(bio)) if (bio_integrity(bio))
return 0; return false;
return bdev_integrity_enabled(bio->bi_bdev, bio_data_dir(bio)); if (bi == NULL)
return false;
if (bio_data_dir(bio) == READ && bi->verify_fn != NULL &&
(bi->flags & BLK_INTEGRITY_VERIFY))
return true;
if (bio_data_dir(bio) == WRITE && bi->generate_fn != NULL &&
(bi->flags & BLK_INTEGRITY_GENERATE))
return true;
return false;
} }
EXPORT_SYMBOL(bio_integrity_enabled); EXPORT_SYMBOL(bio_integrity_enabled);
/** /**
* bio_integrity_hw_sectors - Convert 512b sectors to hardware ditto * bio_integrity_intervals - Return number of integrity intervals for a bio
* @bi: blk_integrity profile for device * @bi: blk_integrity profile for device
* @sectors: Number of 512 sectors to convert * @sectors: Size of the bio in 512-byte sectors
* *
* Description: The block layer calculates everything in 512 byte * Description: The block layer calculates everything in 512 byte
* sectors but integrity metadata is done in terms of the hardware * sectors but integrity metadata is done in terms of the data integrity
* sector size of the storage device. Convert the block layer sectors * interval size of the storage device. Convert the block layer sectors
* to physical sectors. * to the appropriate number of integrity intervals.
*/ */
static inline unsigned int bio_integrity_hw_sectors(struct blk_integrity *bi, static inline unsigned int bio_integrity_intervals(struct blk_integrity *bi,
unsigned int sectors) unsigned int sectors)
{ {
/* At this point there are only 512b or 4096b DIF/EPP devices */ return sectors >> (ilog2(bi->interval) - 9);
if (bi->sector_size == 4096)
return sectors >>= 3;
return sectors;
} }
static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi, static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
unsigned int sectors) unsigned int sectors)
{ {
return bio_integrity_hw_sectors(bi, sectors) * bi->tuple_size; return bio_integrity_intervals(bi, sectors) * bi->tuple_size;
} }
/** /**
* bio_integrity_tag_size - Retrieve integrity tag space * bio_integrity_process - Process integrity metadata for a bio
* @bio: bio to inspect
*
* Description: Returns the maximum number of tag bytes that can be
* attached to this bio. Filesystems can use this to determine how
* much metadata to attach to an I/O.
*/
unsigned int bio_integrity_tag_size(struct bio *bio)
{
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
BUG_ON(bio->bi_iter.bi_size == 0);
return bi->tag_size * (bio->bi_iter.bi_size / bi->sector_size);
}
EXPORT_SYMBOL(bio_integrity_tag_size);
static int bio_integrity_tag(struct bio *bio, void *tag_buf, unsigned int len,
int set)
{
struct bio_integrity_payload *bip = bio->bi_integrity;
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
unsigned int nr_sectors;
BUG_ON(bip->bip_buf == NULL);
if (bi->tag_size == 0)
return -1;
nr_sectors = bio_integrity_hw_sectors(bi,
DIV_ROUND_UP(len, bi->tag_size));
if (nr_sectors * bi->tuple_size > bip->bip_iter.bi_size) {
printk(KERN_ERR "%s: tag too big for bio: %u > %u\n", __func__,
nr_sectors * bi->tuple_size, bip->bip_iter.bi_size);
return -1;
}
if (set)
bi->set_tag_fn(bip->bip_buf, tag_buf, nr_sectors);
else
bi->get_tag_fn(bip->bip_buf, tag_buf, nr_sectors);
return 0;
}
/**
* bio_integrity_set_tag - Attach a tag buffer to a bio
* @bio: bio to attach buffer to
* @tag_buf: Pointer to a buffer containing tag data
* @len: Length of the included buffer
*
* Description: Use this function to tag a bio by leveraging the extra
* space provided by devices formatted with integrity protection. The
* size of the integrity buffer must be <= to the size reported by
* bio_integrity_tag_size().
*/
int bio_integrity_set_tag(struct bio *bio, void *tag_buf, unsigned int len)
{
BUG_ON(bio_data_dir(bio) != WRITE);
return bio_integrity_tag(bio, tag_buf, len, 1);
}
EXPORT_SYMBOL(bio_integrity_set_tag);
/**
* bio_integrity_get_tag - Retrieve a tag buffer from a bio
* @bio: bio to retrieve buffer from
* @tag_buf: Pointer to a buffer for the tag data
* @len: Length of the target buffer
*
* Description: Use this function to retrieve the tag buffer from a
* completed I/O. The size of the integrity buffer must be <= to the
* size reported by bio_integrity_tag_size().
*/
int bio_integrity_get_tag(struct bio *bio, void *tag_buf, unsigned int len)
{
BUG_ON(bio_data_dir(bio) != READ);
return bio_integrity_tag(bio, tag_buf, len, 0);
}
EXPORT_SYMBOL(bio_integrity_get_tag);
/**
* bio_integrity_generate_verify - Generate/verify integrity metadata for a bio
* @bio: bio to generate/verify integrity metadata for * @bio: bio to generate/verify integrity metadata for
* @operate: operate number, 1 for generate, 0 for verify * @proc_fn: Pointer to the relevant processing function
*/ */
static int bio_integrity_generate_verify(struct bio *bio, int operate) static int bio_integrity_process(struct bio *bio,
integrity_processing_fn *proc_fn)
{ {
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev); struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
struct blk_integrity_exchg bix; struct blk_integrity_iter iter;
struct bio_vec *bv; struct bio_vec *bv;
sector_t sector; struct bio_integrity_payload *bip = bio_integrity(bio);
unsigned int sectors, ret = 0, i; unsigned int i, ret = 0;
void *prot_buf = bio->bi_integrity->bip_buf; void *prot_buf = page_address(bip->bip_vec->bv_page) +
bip->bip_vec->bv_offset;
if (operate) iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
sector = bio->bi_iter.bi_sector; iter.interval = bi->interval;
else iter.seed = bip_get_seed(bip);
sector = bio->bi_integrity->bip_iter.bi_sector; iter.prot_buf = prot_buf;
bix.disk_name = bio->bi_bdev->bd_disk->disk_name;
bix.sector_size = bi->sector_size;
bio_for_each_segment_all(bv, bio, i) { bio_for_each_segment_all(bv, bio, i) {
void *kaddr = kmap_atomic(bv->bv_page); void *kaddr = kmap_atomic(bv->bv_page);
bix.data_buf = kaddr + bv->bv_offset;
bix.data_size = bv->bv_len;
bix.prot_buf = prot_buf;
bix.sector = sector;
if (operate) iter.data_buf = kaddr + bv->bv_offset;
bi->generate_fn(&bix); iter.data_size = bv->bv_len;
else {
ret = bi->verify_fn(&bix); ret = proc_fn(&iter);
if (ret) { if (ret) {
kunmap_atomic(kaddr); kunmap_atomic(kaddr);
return ret; return ret;
}
} }
sectors = bv->bv_len / bi->sector_size;
sector += sectors;
prot_buf += sectors * bi->tuple_size;
kunmap_atomic(kaddr); kunmap_atomic(kaddr);
} }
return ret; return ret;
} }
/**
* bio_integrity_generate - Generate integrity metadata for a bio
* @bio: bio to generate integrity metadata for
*
* Description: Generates integrity metadata for a bio by calling the
* block device's generation callback function. The bio must have a
* bip attached with enough room to accommodate the generated
* integrity metadata.
*/
static void bio_integrity_generate(struct bio *bio)
{
bio_integrity_generate_verify(bio, 1);
}
static inline unsigned short blk_integrity_tuple_size(struct blk_integrity *bi)
{
if (bi)
return bi->tuple_size;
return 0;
}
/** /**
* bio_integrity_prep - Prepare bio for integrity I/O * bio_integrity_prep - Prepare bio for integrity I/O
* @bio: bio to prepare * @bio: bio to prepare
@ -387,17 +264,17 @@ int bio_integrity_prep(struct bio *bio)
unsigned long start, end; unsigned long start, end;
unsigned int len, nr_pages; unsigned int len, nr_pages;
unsigned int bytes, offset, i; unsigned int bytes, offset, i;
unsigned int sectors; unsigned int intervals;
bi = bdev_get_integrity(bio->bi_bdev); bi = bdev_get_integrity(bio->bi_bdev);
q = bdev_get_queue(bio->bi_bdev); q = bdev_get_queue(bio->bi_bdev);
BUG_ON(bi == NULL); BUG_ON(bi == NULL);
BUG_ON(bio_integrity(bio)); BUG_ON(bio_integrity(bio));
sectors = bio_integrity_hw_sectors(bi, bio_sectors(bio)); intervals = bio_integrity_intervals(bi, bio_sectors(bio));
/* Allocate kernel buffer for protection data */ /* Allocate kernel buffer for protection data */
len = sectors * blk_integrity_tuple_size(bi); len = intervals * bi->tuple_size;
buf = kmalloc(len, GFP_NOIO | q->bounce_gfp); buf = kmalloc(len, GFP_NOIO | q->bounce_gfp);
if (unlikely(buf == NULL)) { if (unlikely(buf == NULL)) {
printk(KERN_ERR "could not allocate integrity buffer\n"); printk(KERN_ERR "could not allocate integrity buffer\n");
@ -416,10 +293,12 @@ int bio_integrity_prep(struct bio *bio)
return -EIO; return -EIO;
} }
bip->bip_owns_buf = 1; bip->bip_flags |= BIP_BLOCK_INTEGRITY;
bip->bip_buf = buf;
bip->bip_iter.bi_size = len; bip->bip_iter.bi_size = len;
bip->bip_iter.bi_sector = bio->bi_iter.bi_sector; bip_set_seed(bip, bio->bi_iter.bi_sector);
if (bi->flags & BLK_INTEGRITY_IP_CHECKSUM)
bip->bip_flags |= BIP_IP_CHECKSUM;
/* Map it */ /* Map it */
offset = offset_in_page(buf); offset = offset_in_page(buf);
@ -455,25 +334,12 @@ int bio_integrity_prep(struct bio *bio)
/* Auto-generate integrity metadata if this is a write */ /* Auto-generate integrity metadata if this is a write */
if (bio_data_dir(bio) == WRITE) if (bio_data_dir(bio) == WRITE)
bio_integrity_generate(bio); bio_integrity_process(bio, bi->generate_fn);
return 0; return 0;
} }
EXPORT_SYMBOL(bio_integrity_prep); EXPORT_SYMBOL(bio_integrity_prep);
/**
* bio_integrity_verify - Verify integrity metadata for a bio
* @bio: bio to verify
*
* Description: This function is called to verify the integrity of a
* bio. The data in the bio io_vec is compared to the integrity
* metadata returned by the HBA.
*/
static int bio_integrity_verify(struct bio *bio)
{
return bio_integrity_generate_verify(bio, 0);
}
/** /**
* bio_integrity_verify_fn - Integrity I/O completion worker * bio_integrity_verify_fn - Integrity I/O completion worker
* @work: Work struct stored in bio to be verified * @work: Work struct stored in bio to be verified
@ -487,9 +353,10 @@ static void bio_integrity_verify_fn(struct work_struct *work)
struct bio_integrity_payload *bip = struct bio_integrity_payload *bip =
container_of(work, struct bio_integrity_payload, bip_work); container_of(work, struct bio_integrity_payload, bip_work);
struct bio *bio = bip->bip_bio; struct bio *bio = bip->bip_bio;
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
int error; int error;
error = bio_integrity_verify(bio); error = bio_integrity_process(bio, bi->verify_fn);
/* Restore original bio completion handler */ /* Restore original bio completion handler */
bio->bi_end_io = bip->bip_end_io; bio->bi_end_io = bip->bip_end_io;
@ -510,7 +377,7 @@ static void bio_integrity_verify_fn(struct work_struct *work)
*/ */
void bio_integrity_endio(struct bio *bio, int error) void bio_integrity_endio(struct bio *bio, int error)
{ {
struct bio_integrity_payload *bip = bio->bi_integrity; struct bio_integrity_payload *bip = bio_integrity(bio);
BUG_ON(bip->bip_bio != bio); BUG_ON(bip->bip_bio != bio);
@ -541,7 +408,7 @@ EXPORT_SYMBOL(bio_integrity_endio);
*/ */
void bio_integrity_advance(struct bio *bio, unsigned int bytes_done) void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{ {
struct bio_integrity_payload *bip = bio->bi_integrity; struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev); struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9); unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
@ -563,7 +430,7 @@ EXPORT_SYMBOL(bio_integrity_advance);
void bio_integrity_trim(struct bio *bio, unsigned int offset, void bio_integrity_trim(struct bio *bio, unsigned int offset,
unsigned int sectors) unsigned int sectors)
{ {
struct bio_integrity_payload *bip = bio->bi_integrity; struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev); struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
bio_integrity_advance(bio, offset << 9); bio_integrity_advance(bio, offset << 9);
@ -582,7 +449,7 @@ EXPORT_SYMBOL(bio_integrity_trim);
int bio_integrity_clone(struct bio *bio, struct bio *bio_src, int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
gfp_t gfp_mask) gfp_t gfp_mask)
{ {
struct bio_integrity_payload *bip_src = bio_src->bi_integrity; struct bio_integrity_payload *bip_src = bio_integrity(bio_src);
struct bio_integrity_payload *bip; struct bio_integrity_payload *bip;
BUG_ON(bip_src == NULL); BUG_ON(bip_src == NULL);
@ -646,6 +513,4 @@ void __init bio_integrity_init(void)
sizeof(struct bio_integrity_payload) + sizeof(struct bio_integrity_payload) +
sizeof(struct bio_vec) * BIP_INLINE_VECS, sizeof(struct bio_vec) * BIP_INLINE_VECS,
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL); 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
if (!bip_slab)
panic("Failed to create slab\n");
} }


@ -428,6 +428,9 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
front_pad = 0; front_pad = 0;
inline_vecs = nr_iovecs; inline_vecs = nr_iovecs;
} else { } else {
/* should not use nobvec bioset for nr_iovecs > 0 */
if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0))
return NULL;
/* /*
* generic_make_request() converts recursion to iteration; this * generic_make_request() converts recursion to iteration; this
* means if we're running beneath it, any bios we allocate and * means if we're running beneath it, any bios we allocate and
@ -1900,20 +1903,9 @@ void bioset_free(struct bio_set *bs)
} }
EXPORT_SYMBOL(bioset_free); EXPORT_SYMBOL(bioset_free);
/** static struct bio_set *__bioset_create(unsigned int pool_size,
* bioset_create - Create a bio_set unsigned int front_pad,
* @pool_size: Number of bio and bio_vecs to cache in the mempool bool create_bvec_pool)
* @front_pad: Number of bytes to allocate in front of the returned bio
*
* Description:
* Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
* to ask for a number of bytes to be allocated in front of the bio.
* Front pad allocation is useful for embedding the bio inside
* another structure, to avoid allocating extra data to go with the bio.
* Note that the bio must be embedded at the END of that structure always,
* or things will break badly.
*/
struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
{ {
unsigned int back_pad = BIO_INLINE_VECS * sizeof(struct bio_vec); unsigned int back_pad = BIO_INLINE_VECS * sizeof(struct bio_vec);
struct bio_set *bs; struct bio_set *bs;
@ -1938,9 +1930,11 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
if (!bs->bio_pool) if (!bs->bio_pool)
goto bad; goto bad;
bs->bvec_pool = biovec_create_pool(pool_size); if (create_bvec_pool) {
if (!bs->bvec_pool) bs->bvec_pool = biovec_create_pool(pool_size);
goto bad; if (!bs->bvec_pool)
goto bad;
}
bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0); bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
if (!bs->rescue_workqueue) if (!bs->rescue_workqueue)
@ -1951,8 +1945,41 @@ bad:
bioset_free(bs); bioset_free(bs);
return NULL; return NULL;
} }
/**
* bioset_create - Create a bio_set
* @pool_size: Number of bio and bio_vecs to cache in the mempool
* @front_pad: Number of bytes to allocate in front of the returned bio
*
* Description:
* Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
* to ask for a number of bytes to be allocated in front of the bio.
* Front pad allocation is useful for embedding the bio inside
* another structure, to avoid allocating extra data to go with the bio.
* Note that the bio must be embedded at the END of that structure always,
* or things will break badly.
*/
struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
{
return __bioset_create(pool_size, front_pad, true);
}
EXPORT_SYMBOL(bioset_create); EXPORT_SYMBOL(bioset_create);
/**
* bioset_create_nobvec - Create a bio_set without bio_vec mempool
* @pool_size: Number of bio to cache in the mempool
* @front_pad: Number of bytes to allocate in front of the returned bio
*
* Description:
* Same functionality as bioset_create() except that mempool is not
* created for bio_vecs. Saving some memory for bio_clone_fast() users.
*/
struct bio_set *bioset_create_nobvec(unsigned int pool_size, unsigned int front_pad)
{
return __bioset_create(pool_size, front_pad, false);
}
EXPORT_SYMBOL(bioset_create_nobvec);
#ifdef CONFIG_BLK_CGROUP #ifdef CONFIG_BLK_CGROUP
/** /**
* bio_associate_current - associate a bio with %current * bio_associate_current - associate a bio with %current


@ -822,7 +822,6 @@ static void blkcg_css_free(struct cgroup_subsys_state *css)
static struct cgroup_subsys_state * static struct cgroup_subsys_state *
blkcg_css_alloc(struct cgroup_subsys_state *parent_css) blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
{ {
static atomic64_t id_seq = ATOMIC64_INIT(0);
struct blkcg *blkcg; struct blkcg *blkcg;
if (!parent_css) { if (!parent_css) {
@ -836,7 +835,6 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
blkcg->cfq_weight = CFQ_WEIGHT_DEFAULT; blkcg->cfq_weight = CFQ_WEIGHT_DEFAULT;
blkcg->cfq_leaf_weight = CFQ_WEIGHT_DEFAULT; blkcg->cfq_leaf_weight = CFQ_WEIGHT_DEFAULT;
blkcg->id = atomic64_inc_return(&id_seq); /* root is 0, start from 1 */
done: done:
spin_lock_init(&blkcg->lock); spin_lock_init(&blkcg->lock);
INIT_RADIX_TREE(&blkcg->blkg_tree, GFP_ATOMIC); INIT_RADIX_TREE(&blkcg->blkg_tree, GFP_ATOMIC);


@ -50,9 +50,6 @@ struct blkcg {
struct blkcg_gq *blkg_hint; struct blkcg_gq *blkg_hint;
struct hlist_head blkg_list; struct hlist_head blkg_list;
/* for policies to test whether associated blkcg has changed */
uint64_t id;
/* TODO: per-policy storage in blkcg */ /* TODO: per-policy storage in blkcg */
unsigned int cfq_weight; /* belongs to cfq */ unsigned int cfq_weight; /* belongs to cfq */
unsigned int cfq_leaf_weight; unsigned int cfq_leaf_weight;


@ -83,18 +83,14 @@ void blk_queue_congestion_threshold(struct request_queue *q)
* @bdev: device * @bdev: device
* *
* Locates the passed device's request queue and returns the address of its * Locates the passed device's request queue and returns the address of its
* backing_dev_info * backing_dev_info. This function can only be called if @bdev is opened
* * and the return value is never NULL.
* Will return NULL if the request queue cannot be located.
*/ */
struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev) struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev)
{ {
struct backing_dev_info *ret = NULL;
struct request_queue *q = bdev_get_queue(bdev); struct request_queue *q = bdev_get_queue(bdev);
if (q) return &q->backing_dev_info;
ret = &q->backing_dev_info;
return ret;
} }
EXPORT_SYMBOL(blk_get_backing_dev_info); EXPORT_SYMBOL(blk_get_backing_dev_info);
@ -394,11 +390,13 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
* be drained. Check all the queues and counters. * be drained. Check all the queues and counters.
*/ */
if (drain_all) { if (drain_all) {
struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
drain |= !list_empty(&q->queue_head); drain |= !list_empty(&q->queue_head);
for (i = 0; i < 2; i++) { for (i = 0; i < 2; i++) {
drain |= q->nr_rqs[i]; drain |= q->nr_rqs[i];
drain |= q->in_flight[i]; drain |= q->in_flight[i];
drain |= !list_empty(&q->flush_queue[i]); if (fq)
drain |= !list_empty(&fq->flush_queue[i]);
} }
} }
@ -604,9 +602,6 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
#ifdef CONFIG_BLK_CGROUP #ifdef CONFIG_BLK_CGROUP
INIT_LIST_HEAD(&q->blkg_list); INIT_LIST_HEAD(&q->blkg_list);
#endif #endif
INIT_LIST_HEAD(&q->flush_queue[0]);
INIT_LIST_HEAD(&q->flush_queue[1]);
INIT_LIST_HEAD(&q->flush_data_in_flight);
INIT_DELAYED_WORK(&q->delay_work, blk_delay_work); INIT_DELAYED_WORK(&q->delay_work, blk_delay_work);
kobject_init(&q->kobj, &blk_queue_ktype); kobject_init(&q->kobj, &blk_queue_ktype);
@ -709,8 +704,8 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
if (!q) if (!q)
return NULL; return NULL;
q->flush_rq = kzalloc(sizeof(struct request), GFP_KERNEL); q->fq = blk_alloc_flush_queue(q, NUMA_NO_NODE, 0);
if (!q->flush_rq) if (!q->fq)
return NULL; return NULL;
if (blk_init_rl(&q->root_rl, q, GFP_KERNEL)) if (blk_init_rl(&q->root_rl, q, GFP_KERNEL))
@ -746,7 +741,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
return q; return q;
fail: fail:
kfree(q->flush_rq); blk_free_flush_queue(q->fq);
return NULL; return NULL;
} }
EXPORT_SYMBOL(blk_init_allocated_queue); EXPORT_SYMBOL(blk_init_allocated_queue);
@ -934,8 +929,8 @@ static struct io_context *rq_ioc(struct bio *bio)
* pressure or if @q is dead. * pressure or if @q is dead.
* *
* Must be called with @q->queue_lock held and, * Must be called with @q->queue_lock held and,
* Returns %NULL on failure, with @q->queue_lock held. * Returns ERR_PTR on failure, with @q->queue_lock held.
* Returns !%NULL on success, with @q->queue_lock *not held*. * Returns request pointer on success, with @q->queue_lock *not held*.
*/ */
static struct request *__get_request(struct request_list *rl, int rw_flags, static struct request *__get_request(struct request_list *rl, int rw_flags,
struct bio *bio, gfp_t gfp_mask) struct bio *bio, gfp_t gfp_mask)
@ -949,7 +944,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
int may_queue; int may_queue;
if (unlikely(blk_queue_dying(q))) if (unlikely(blk_queue_dying(q)))
return NULL; return ERR_PTR(-ENODEV);
may_queue = elv_may_queue(q, rw_flags); may_queue = elv_may_queue(q, rw_flags);
if (may_queue == ELV_MQUEUE_NO) if (may_queue == ELV_MQUEUE_NO)
@ -974,7 +969,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
* process is not a "batcher", and not * process is not a "batcher", and not
* exempted by the IO scheduler * exempted by the IO scheduler
*/ */
return NULL; return ERR_PTR(-ENOMEM);
} }
} }
} }
@ -992,7 +987,7 @@ static struct request *__get_request(struct request_list *rl, int rw_flags,
* allocated with any setting of ->nr_requests * allocated with any setting of ->nr_requests
*/ */
if (rl->count[is_sync] >= (3 * q->nr_requests / 2)) if (rl->count[is_sync] >= (3 * q->nr_requests / 2))
return NULL; return ERR_PTR(-ENOMEM);
q->nr_rqs[is_sync]++; q->nr_rqs[is_sync]++;
rl->count[is_sync]++; rl->count[is_sync]++;
@ -1065,8 +1060,8 @@ fail_elvpriv:
* shouldn't stall IO. Treat this request as !elvpriv. This will * shouldn't stall IO. Treat this request as !elvpriv. This will
* disturb iosched and blkcg but weird is bettern than dead. * disturb iosched and blkcg but weird is bettern than dead.
*/ */
printk_ratelimited(KERN_WARNING "%s: request aux data allocation failed, iosched may be disturbed\n", printk_ratelimited(KERN_WARNING "%s: dev %s: request aux data allocation failed, iosched may be disturbed\n",
dev_name(q->backing_dev_info.dev)); __func__, dev_name(q->backing_dev_info.dev));
rq->cmd_flags &= ~REQ_ELVPRIV; rq->cmd_flags &= ~REQ_ELVPRIV;
rq->elv.icq = NULL; rq->elv.icq = NULL;
@ -1097,7 +1092,7 @@ fail_alloc:
rq_starved: rq_starved:
if (unlikely(rl->count[is_sync] == 0)) if (unlikely(rl->count[is_sync] == 0))
rl->starved[is_sync] = 1; rl->starved[is_sync] = 1;
return NULL; return ERR_PTR(-ENOMEM);
} }
/** /**
@ -1111,8 +1106,8 @@ rq_starved:
* function keeps retrying under memory pressure and fails iff @q is dead. * function keeps retrying under memory pressure and fails iff @q is dead.
* *
* Must be called with @q->queue_lock held and, * Must be called with @q->queue_lock held and,
* Returns %NULL on failure, with @q->queue_lock held. * Returns ERR_PTR on failure, with @q->queue_lock held.
* Returns !%NULL on success, with @q->queue_lock *not held*. * Returns request pointer on success, with @q->queue_lock *not held*.
*/ */
static struct request *get_request(struct request_queue *q, int rw_flags, static struct request *get_request(struct request_queue *q, int rw_flags,
struct bio *bio, gfp_t gfp_mask) struct bio *bio, gfp_t gfp_mask)
@ -1125,12 +1120,12 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
rl = blk_get_rl(q, bio); /* transferred to @rq on success */ rl = blk_get_rl(q, bio); /* transferred to @rq on success */
retry: retry:
rq = __get_request(rl, rw_flags, bio, gfp_mask); rq = __get_request(rl, rw_flags, bio, gfp_mask);
if (rq) if (!IS_ERR(rq))
return rq; return rq;
if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dying(q))) { if (!(gfp_mask & __GFP_WAIT) || unlikely(blk_queue_dying(q))) {
blk_put_rl(rl); blk_put_rl(rl);
return NULL; return rq;
} }
/* wait on @rl and retry */ /* wait on @rl and retry */
@ -1167,7 +1162,7 @@ static struct request *blk_old_get_request(struct request_queue *q, int rw,
spin_lock_irq(q->queue_lock); spin_lock_irq(q->queue_lock);
rq = get_request(q, rw, NULL, gfp_mask); rq = get_request(q, rw, NULL, gfp_mask);
if (!rq) if (IS_ERR(rq))
spin_unlock_irq(q->queue_lock); spin_unlock_irq(q->queue_lock);
/* q->queue_lock is unlocked at this point */ /* q->queue_lock is unlocked at this point */
@ -1219,8 +1214,8 @@ struct request *blk_make_request(struct request_queue *q, struct bio *bio,
{ {
struct request *rq = blk_get_request(q, bio_data_dir(bio), gfp_mask); struct request *rq = blk_get_request(q, bio_data_dir(bio), gfp_mask);
if (unlikely(!rq)) if (IS_ERR(rq))
return ERR_PTR(-ENOMEM); return rq;
blk_rq_set_block_pc(rq); blk_rq_set_block_pc(rq);
@ -1614,8 +1609,8 @@ get_rq:
* Returns with the queue unlocked. * Returns with the queue unlocked.
*/ */
req = get_request(q, rw_flags, bio, GFP_NOIO); req = get_request(q, rw_flags, bio, GFP_NOIO);
if (unlikely(!req)) { if (IS_ERR(req)) {
bio_endio(bio, -ENODEV); /* @q is dead */ bio_endio(bio, PTR_ERR(req)); /* @q is dead */
goto out_unlock; goto out_unlock;
} }
@ -2405,11 +2400,11 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
{ {
int total_bytes; int total_bytes;
trace_block_rq_complete(req->q, req, nr_bytes);
if (!req->bio) if (!req->bio)
return false; return false;
trace_block_rq_complete(req->q, req, nr_bytes);
/* /*
* For fs requests, rq is just carrier of independent bio's * For fs requests, rq is just carrier of independent bio's
* and each partial completion should be handled separately. * and each partial completion should be handled separately.
@ -2449,8 +2444,8 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
error_type = "I/O"; error_type = "I/O";
break; break;
} }
printk_ratelimited(KERN_ERR "end_request: %s error, dev %s, sector %llu\n", printk_ratelimited(KERN_ERR "%s: %s error, dev %s, sector %llu\n",
error_type, req->rq_disk ? __func__, error_type, req->rq_disk ?
req->rq_disk->disk_name : "?", req->rq_disk->disk_name : "?",
(unsigned long long)blk_rq_pos(req)); (unsigned long long)blk_rq_pos(req));
@ -2931,7 +2926,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
blk_rq_init(NULL, rq); blk_rq_init(NULL, rq);
__rq_for_each_bio(bio_src, rq_src) { __rq_for_each_bio(bio_src, rq_src) {
bio = bio_clone_bioset(bio_src, gfp_mask, bs); bio = bio_clone_fast(bio_src, gfp_mask, bs);
if (!bio) if (!bio)
goto free_and_out; goto free_and_out;


@ -28,7 +28,7 @@
* *
* The actual execution of flush is double buffered. Whenever a request * The actual execution of flush is double buffered. Whenever a request
* needs to execute PRE or POSTFLUSH, it queues at * needs to execute PRE or POSTFLUSH, it queues at
* q->flush_queue[q->flush_pending_idx]. Once certain criteria are met, a * fq->flush_queue[fq->flush_pending_idx]. Once certain criteria are met, a
* flush is issued and the pending_idx is toggled. When the flush * flush is issued and the pending_idx is toggled. When the flush
* completes, all the requests which were pending are proceeded to the next * completes, all the requests which were pending are proceeded to the next
* step. This allows arbitrary merging of different types of FLUSH/FUA * step. This allows arbitrary merging of different types of FLUSH/FUA
@ -91,7 +91,8 @@ enum {
FLUSH_PENDING_TIMEOUT = 5 * HZ, FLUSH_PENDING_TIMEOUT = 5 * HZ,
}; };
static bool blk_kick_flush(struct request_queue *q); static bool blk_kick_flush(struct request_queue *q,
struct blk_flush_queue *fq);
static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq) static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
{ {
@ -126,8 +127,6 @@ static void blk_flush_restore_request(struct request *rq)
/* make @rq a normal request */ /* make @rq a normal request */
rq->cmd_flags &= ~REQ_FLUSH_SEQ; rq->cmd_flags &= ~REQ_FLUSH_SEQ;
rq->end_io = rq->flush.saved_end_io; rq->end_io = rq->flush.saved_end_io;
blk_clear_rq_complete(rq);
} }
static bool blk_flush_queue_rq(struct request *rq, bool add_front) static bool blk_flush_queue_rq(struct request *rq, bool add_front)
@ -150,6 +149,7 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
/** /**
* blk_flush_complete_seq - complete flush sequence * blk_flush_complete_seq - complete flush sequence
* @rq: FLUSH/FUA request being sequenced * @rq: FLUSH/FUA request being sequenced
* @fq: flush queue
* @seq: sequences to complete (mask of %REQ_FSEQ_*, can be zero) * @seq: sequences to complete (mask of %REQ_FSEQ_*, can be zero)
* @error: whether an error occurred * @error: whether an error occurred
* *
@ -157,16 +157,17 @@ static bool blk_flush_queue_rq(struct request *rq, bool add_front)
* completion and trigger the next step. * completion and trigger the next step.
* *
* CONTEXT: * CONTEXT:
* spin_lock_irq(q->queue_lock or q->mq_flush_lock) * spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
* *
* RETURNS: * RETURNS:
* %true if requests were added to the dispatch queue, %false otherwise. * %true if requests were added to the dispatch queue, %false otherwise.
*/ */
static bool blk_flush_complete_seq(struct request *rq, unsigned int seq, static bool blk_flush_complete_seq(struct request *rq,
int error) struct blk_flush_queue *fq,
unsigned int seq, int error)
{ {
struct request_queue *q = rq->q; struct request_queue *q = rq->q;
struct list_head *pending = &q->flush_queue[q->flush_pending_idx]; struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
bool queued = false, kicked; bool queued = false, kicked;
BUG_ON(rq->flush.seq & seq); BUG_ON(rq->flush.seq & seq);
@ -182,12 +183,12 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
case REQ_FSEQ_POSTFLUSH: case REQ_FSEQ_POSTFLUSH:
/* queue for flush */ /* queue for flush */
if (list_empty(pending)) if (list_empty(pending))
q->flush_pending_since = jiffies; fq->flush_pending_since = jiffies;
list_move_tail(&rq->flush.list, pending); list_move_tail(&rq->flush.list, pending);
break; break;
case REQ_FSEQ_DATA: case REQ_FSEQ_DATA:
list_move_tail(&rq->flush.list, &q->flush_data_in_flight); list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
queued = blk_flush_queue_rq(rq, true); queued = blk_flush_queue_rq(rq, true);
break; break;
@ -202,7 +203,7 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
list_del_init(&rq->flush.list); list_del_init(&rq->flush.list);
blk_flush_restore_request(rq); blk_flush_restore_request(rq);
if (q->mq_ops) if (q->mq_ops)
blk_mq_end_io(rq, error); blk_mq_end_request(rq, error);
else else
__blk_end_request_all(rq, error); __blk_end_request_all(rq, error);
break; break;
@ -211,7 +212,7 @@ static bool blk_flush_complete_seq(struct request *rq, unsigned int seq,
BUG(); BUG();
} }
kicked = blk_kick_flush(q); kicked = blk_kick_flush(q, fq);
return kicked | queued; return kicked | queued;
} }
@ -222,17 +223,18 @@ static void flush_end_io(struct request *flush_rq, int error)
bool queued = false; bool queued = false;
struct request *rq, *n; struct request *rq, *n;
unsigned long flags = 0; unsigned long flags = 0;
struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
if (q->mq_ops) { if (q->mq_ops) {
spin_lock_irqsave(&q->mq_flush_lock, flags); spin_lock_irqsave(&fq->mq_flush_lock, flags);
q->flush_rq->tag = -1; flush_rq->tag = -1;
} }
running = &q->flush_queue[q->flush_running_idx]; running = &fq->flush_queue[fq->flush_running_idx];
BUG_ON(q->flush_pending_idx == q->flush_running_idx); BUG_ON(fq->flush_pending_idx == fq->flush_running_idx);
/* account completion of the flush request */ /* account completion of the flush request */
q->flush_running_idx ^= 1; fq->flush_running_idx ^= 1;
if (!q->mq_ops) if (!q->mq_ops)
elv_completed_request(q, flush_rq); elv_completed_request(q, flush_rq);
@ -242,7 +244,7 @@ static void flush_end_io(struct request *flush_rq, int error)
unsigned int seq = blk_flush_cur_seq(rq); unsigned int seq = blk_flush_cur_seq(rq);
BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH); BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH);
queued |= blk_flush_complete_seq(rq, seq, error); queued |= blk_flush_complete_seq(rq, fq, seq, error);
} }
/* /*
@ -256,71 +258,81 @@ static void flush_end_io(struct request *flush_rq, int error)
* directly into request_fn may confuse the driver. Always use * directly into request_fn may confuse the driver. Always use
* kblockd. * kblockd.
*/ */
if (queued || q->flush_queue_delayed) { if (queued || fq->flush_queue_delayed) {
WARN_ON(q->mq_ops); WARN_ON(q->mq_ops);
blk_run_queue_async(q); blk_run_queue_async(q);
} }
q->flush_queue_delayed = 0; fq->flush_queue_delayed = 0;
if (q->mq_ops) if (q->mq_ops)
spin_unlock_irqrestore(&q->mq_flush_lock, flags); spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
} }
/** /**
* blk_kick_flush - consider issuing flush request * blk_kick_flush - consider issuing flush request
* @q: request_queue being kicked * @q: request_queue being kicked
* @fq: flush queue
* *
* Flush related states of @q have changed, consider issuing flush request. * Flush related states of @q have changed, consider issuing flush request.
* Please read the comment at the top of this file for more info. * Please read the comment at the top of this file for more info.
* *
* CONTEXT: * CONTEXT:
* spin_lock_irq(q->queue_lock or q->mq_flush_lock) * spin_lock_irq(q->queue_lock or fq->mq_flush_lock)
* *
* RETURNS: * RETURNS:
* %true if flush was issued, %false otherwise. * %true if flush was issued, %false otherwise.
*/ */
static bool blk_kick_flush(struct request_queue *q) static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
{ {
struct list_head *pending = &q->flush_queue[q->flush_pending_idx]; struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
struct request *first_rq = struct request *first_rq =
list_first_entry(pending, struct request, flush.list); list_first_entry(pending, struct request, flush.list);
struct request *flush_rq = fq->flush_rq;
/* C1 described at the top of this file */ /* C1 described at the top of this file */
if (q->flush_pending_idx != q->flush_running_idx || list_empty(pending)) if (fq->flush_pending_idx != fq->flush_running_idx || list_empty(pending))
return false; return false;
/* C2 and C3 */ /* C2 and C3 */
if (!list_empty(&q->flush_data_in_flight) && if (!list_empty(&fq->flush_data_in_flight) &&
time_before(jiffies, time_before(jiffies,
q->flush_pending_since + FLUSH_PENDING_TIMEOUT)) fq->flush_pending_since + FLUSH_PENDING_TIMEOUT))
return false; return false;
/* /*
* Issue flush and toggle pending_idx. This makes pending_idx * Issue flush and toggle pending_idx. This makes pending_idx
* different from running_idx, which means flush is in flight. * different from running_idx, which means flush is in flight.
*/ */
q->flush_pending_idx ^= 1; fq->flush_pending_idx ^= 1;
blk_rq_init(q, q->flush_rq); blk_rq_init(q, flush_rq);
if (q->mq_ops)
blk_mq_clone_flush_request(q->flush_rq, first_rq);
q->flush_rq->cmd_type = REQ_TYPE_FS; /*
q->flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ; * Borrow tag from the first request since they can't
q->flush_rq->rq_disk = first_rq->rq_disk; * be in flight at the same time.
q->flush_rq->end_io = flush_end_io; */
if (q->mq_ops) {
flush_rq->mq_ctx = first_rq->mq_ctx;
flush_rq->tag = first_rq->tag;
}
return blk_flush_queue_rq(q->flush_rq, false); flush_rq->cmd_type = REQ_TYPE_FS;
flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
flush_rq->rq_disk = first_rq->rq_disk;
flush_rq->end_io = flush_end_io;
return blk_flush_queue_rq(flush_rq, false);
} }
static void flush_data_end_io(struct request *rq, int error) static void flush_data_end_io(struct request *rq, int error)
{ {
struct request_queue *q = rq->q; struct request_queue *q = rq->q;
struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
/* /*
* After populating an empty queue, kick it to avoid stall. Read * After populating an empty queue, kick it to avoid stall. Read
* the comment in flush_end_io(). * the comment in flush_end_io().
*/ */
if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error)) if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
blk_run_queue_async(q); blk_run_queue_async(q);
} }
@ -328,20 +340,20 @@ static void mq_flush_data_end_io(struct request *rq, int error)
{ {
struct request_queue *q = rq->q; struct request_queue *q = rq->q;
struct blk_mq_hw_ctx *hctx; struct blk_mq_hw_ctx *hctx;
struct blk_mq_ctx *ctx; struct blk_mq_ctx *ctx = rq->mq_ctx;
unsigned long flags; unsigned long flags;
struct blk_flush_queue *fq = blk_get_flush_queue(q, ctx);
ctx = rq->mq_ctx;
hctx = q->mq_ops->map_queue(q, ctx->cpu); hctx = q->mq_ops->map_queue(q, ctx->cpu);
/* /*
* After populating an empty queue, kick it to avoid stall. Read * After populating an empty queue, kick it to avoid stall. Read
* the comment in flush_end_io(). * the comment in flush_end_io().
*/ */
spin_lock_irqsave(&q->mq_flush_lock, flags); spin_lock_irqsave(&fq->mq_flush_lock, flags);
if (blk_flush_complete_seq(rq, REQ_FSEQ_DATA, error)) if (blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error))
blk_mq_run_hw_queue(hctx, true); blk_mq_run_hw_queue(hctx, true);
spin_unlock_irqrestore(&q->mq_flush_lock, flags); spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
} }
/** /**
@ -361,6 +373,7 @@ void blk_insert_flush(struct request *rq)
struct request_queue *q = rq->q; struct request_queue *q = rq->q;
unsigned int fflags = q->flush_flags; /* may change, cache */ unsigned int fflags = q->flush_flags; /* may change, cache */
unsigned int policy = blk_flush_policy(fflags, rq); unsigned int policy = blk_flush_policy(fflags, rq);
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
/* /*
* @policy now records what operations need to be done. Adjust * @policy now records what operations need to be done. Adjust
@ -378,7 +391,7 @@ void blk_insert_flush(struct request *rq)
*/ */
if (!policy) { if (!policy) {
if (q->mq_ops) if (q->mq_ops)
blk_mq_end_io(rq, 0); blk_mq_end_request(rq, 0);
else else
__blk_end_bidi_request(rq, 0, 0, 0); __blk_end_bidi_request(rq, 0, 0, 0);
return; return;
@ -411,14 +424,14 @@ void blk_insert_flush(struct request *rq)
if (q->mq_ops) { if (q->mq_ops) {
rq->end_io = mq_flush_data_end_io; rq->end_io = mq_flush_data_end_io;
spin_lock_irq(&q->mq_flush_lock); spin_lock_irq(&fq->mq_flush_lock);
blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0); blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
spin_unlock_irq(&q->mq_flush_lock); spin_unlock_irq(&fq->mq_flush_lock);
return; return;
} }
rq->end_io = flush_data_end_io; rq->end_io = flush_data_end_io;
blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0); blk_flush_complete_seq(rq, fq, REQ_FSEQ_ACTIONS & ~policy, 0);
} }
/** /**
@ -474,7 +487,43 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
} }
EXPORT_SYMBOL(blkdev_issue_flush); EXPORT_SYMBOL(blkdev_issue_flush);
void blk_mq_init_flush(struct request_queue *q) struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
int node, int cmd_size)
{ {
spin_lock_init(&q->mq_flush_lock); struct blk_flush_queue *fq;
int rq_sz = sizeof(struct request);
fq = kzalloc_node(sizeof(*fq), GFP_KERNEL, node);
if (!fq)
goto fail;
if (q->mq_ops) {
spin_lock_init(&fq->mq_flush_lock);
rq_sz = round_up(rq_sz + cmd_size, cache_line_size());
}
fq->flush_rq = kzalloc_node(rq_sz, GFP_KERNEL, node);
if (!fq->flush_rq)
goto fail_rq;
INIT_LIST_HEAD(&fq->flush_queue[0]);
INIT_LIST_HEAD(&fq->flush_queue[1]);
INIT_LIST_HEAD(&fq->flush_data_in_flight);
return fq;
fail_rq:
kfree(fq);
fail:
return NULL;
}
void blk_free_flush_queue(struct blk_flush_queue *fq)
{
/* bio based request queue hasn't flush queue */
if (!fq)
return;
kfree(fq->flush_rq);
kfree(fq);
} }


@ -154,10 +154,10 @@ int blk_integrity_compare(struct gendisk *gd1, struct gendisk *gd2)
if (!b1 || !b2) if (!b1 || !b2)
return -1; return -1;
if (b1->sector_size != b2->sector_size) { if (b1->interval != b2->interval) {
printk(KERN_ERR "%s: %s/%s sector sz %u != %u\n", __func__, pr_err("%s: %s/%s protection interval %u != %u\n",
gd1->disk_name, gd2->disk_name, __func__, gd1->disk_name, gd2->disk_name,
b1->sector_size, b2->sector_size); b1->interval, b2->interval);
return -1; return -1;
} }
@ -186,37 +186,53 @@ int blk_integrity_compare(struct gendisk *gd1, struct gendisk *gd2)
} }
EXPORT_SYMBOL(blk_integrity_compare); EXPORT_SYMBOL(blk_integrity_compare);
int blk_integrity_merge_rq(struct request_queue *q, struct request *req, bool blk_integrity_merge_rq(struct request_queue *q, struct request *req,
struct request *next) struct request *next)
{ {
if (blk_integrity_rq(req) != blk_integrity_rq(next)) if (blk_integrity_rq(req) == 0 && blk_integrity_rq(next) == 0)
return -1; return true;
if (blk_integrity_rq(req) == 0 || blk_integrity_rq(next) == 0)
return false;
if (bio_integrity(req->bio)->bip_flags !=
bio_integrity(next->bio)->bip_flags)
return false;
if (req->nr_integrity_segments + next->nr_integrity_segments > if (req->nr_integrity_segments + next->nr_integrity_segments >
q->limits.max_integrity_segments) q->limits.max_integrity_segments)
return -1; return false;
return 0; return true;
} }
EXPORT_SYMBOL(blk_integrity_merge_rq); EXPORT_SYMBOL(blk_integrity_merge_rq);
int blk_integrity_merge_bio(struct request_queue *q, struct request *req, bool blk_integrity_merge_bio(struct request_queue *q, struct request *req,
struct bio *bio) struct bio *bio)
{ {
int nr_integrity_segs; int nr_integrity_segs;
struct bio *next = bio->bi_next; struct bio *next = bio->bi_next;
if (blk_integrity_rq(req) == 0 && bio_integrity(bio) == NULL)
return true;
if (blk_integrity_rq(req) == 0 || bio_integrity(bio) == NULL)
return false;
if (bio_integrity(req->bio)->bip_flags != bio_integrity(bio)->bip_flags)
return false;
bio->bi_next = NULL; bio->bi_next = NULL;
nr_integrity_segs = blk_rq_count_integrity_sg(q, bio); nr_integrity_segs = blk_rq_count_integrity_sg(q, bio);
bio->bi_next = next; bio->bi_next = next;
if (req->nr_integrity_segments + nr_integrity_segs > if (req->nr_integrity_segments + nr_integrity_segs >
q->limits.max_integrity_segments) q->limits.max_integrity_segments)
return -1; return false;
req->nr_integrity_segments += nr_integrity_segs; req->nr_integrity_segments += nr_integrity_segs;
return 0; return true;
} }
EXPORT_SYMBOL(blk_integrity_merge_bio); EXPORT_SYMBOL(blk_integrity_merge_bio);
@ -269,42 +285,48 @@ static ssize_t integrity_tag_size_show(struct blk_integrity *bi, char *page)
return sprintf(page, "0\n"); return sprintf(page, "0\n");
} }
static ssize_t integrity_read_store(struct blk_integrity *bi, static ssize_t integrity_verify_store(struct blk_integrity *bi,
const char *page, size_t count) const char *page, size_t count)
{ {
char *p = (char *) page; char *p = (char *) page;
unsigned long val = simple_strtoul(p, &p, 10); unsigned long val = simple_strtoul(p, &p, 10);
if (val) if (val)
bi->flags |= INTEGRITY_FLAG_READ; bi->flags |= BLK_INTEGRITY_VERIFY;
else else
bi->flags &= ~INTEGRITY_FLAG_READ; bi->flags &= ~BLK_INTEGRITY_VERIFY;
return count; return count;
} }
static ssize_t integrity_read_show(struct blk_integrity *bi, char *page) static ssize_t integrity_verify_show(struct blk_integrity *bi, char *page)
{ {
return sprintf(page, "%d\n", (bi->flags & INTEGRITY_FLAG_READ) != 0); return sprintf(page, "%d\n", (bi->flags & BLK_INTEGRITY_VERIFY) != 0);
} }
static ssize_t integrity_write_store(struct blk_integrity *bi, static ssize_t integrity_generate_store(struct blk_integrity *bi,
const char *page, size_t count) const char *page, size_t count)
{ {
char *p = (char *) page; char *p = (char *) page;
unsigned long val = simple_strtoul(p, &p, 10); unsigned long val = simple_strtoul(p, &p, 10);
if (val) if (val)
bi->flags |= INTEGRITY_FLAG_WRITE; bi->flags |= BLK_INTEGRITY_GENERATE;
else else
bi->flags &= ~INTEGRITY_FLAG_WRITE; bi->flags &= ~BLK_INTEGRITY_GENERATE;
return count; return count;
} }
static ssize_t integrity_write_show(struct blk_integrity *bi, char *page) static ssize_t integrity_generate_show(struct blk_integrity *bi, char *page)
{ {
return sprintf(page, "%d\n", (bi->flags & INTEGRITY_FLAG_WRITE) != 0); return sprintf(page, "%d\n", (bi->flags & BLK_INTEGRITY_GENERATE) != 0);
}
static ssize_t integrity_device_show(struct blk_integrity *bi, char *page)
{
return sprintf(page, "%u\n",
(bi->flags & BLK_INTEGRITY_DEVICE_CAPABLE) != 0);
} }
static struct integrity_sysfs_entry integrity_format_entry = { static struct integrity_sysfs_entry integrity_format_entry = {
@ -317,23 +339,29 @@ static struct integrity_sysfs_entry integrity_tag_size_entry = {
.show = integrity_tag_size_show, .show = integrity_tag_size_show,
}; };
static struct integrity_sysfs_entry integrity_read_entry = { static struct integrity_sysfs_entry integrity_verify_entry = {
.attr = { .name = "read_verify", .mode = S_IRUGO | S_IWUSR }, .attr = { .name = "read_verify", .mode = S_IRUGO | S_IWUSR },
.show = integrity_read_show, .show = integrity_verify_show,
.store = integrity_read_store, .store = integrity_verify_store,
}; };
static struct integrity_sysfs_entry integrity_write_entry = { static struct integrity_sysfs_entry integrity_generate_entry = {
.attr = { .name = "write_generate", .mode = S_IRUGO | S_IWUSR }, .attr = { .name = "write_generate", .mode = S_IRUGO | S_IWUSR },
.show = integrity_write_show, .show = integrity_generate_show,
.store = integrity_write_store, .store = integrity_generate_store,
};
static struct integrity_sysfs_entry integrity_device_entry = {
.attr = { .name = "device_is_integrity_capable", .mode = S_IRUGO },
.show = integrity_device_show,
}; };
static struct attribute *integrity_attrs[] = { static struct attribute *integrity_attrs[] = {
&integrity_format_entry.attr, &integrity_format_entry.attr,
&integrity_tag_size_entry.attr, &integrity_tag_size_entry.attr,
&integrity_read_entry.attr, &integrity_verify_entry.attr,
&integrity_write_entry.attr, &integrity_generate_entry.attr,
&integrity_device_entry.attr,
NULL, NULL,
}; };
@ -406,8 +434,8 @@ int blk_integrity_register(struct gendisk *disk, struct blk_integrity *template)
kobject_uevent(&bi->kobj, KOBJ_ADD); kobject_uevent(&bi->kobj, KOBJ_ADD);
bi->flags |= INTEGRITY_FLAG_READ | INTEGRITY_FLAG_WRITE; bi->flags |= BLK_INTEGRITY_VERIFY | BLK_INTEGRITY_GENERATE;
bi->sector_size = queue_logical_block_size(disk->queue); bi->interval = queue_logical_block_size(disk->queue);
disk->integrity = bi; disk->integrity = bi;
} else } else
bi = disk->integrity; bi = disk->integrity;
@ -418,9 +446,8 @@ int blk_integrity_register(struct gendisk *disk, struct blk_integrity *template)
bi->generate_fn = template->generate_fn; bi->generate_fn = template->generate_fn;
bi->verify_fn = template->verify_fn; bi->verify_fn = template->verify_fn;
bi->tuple_size = template->tuple_size; bi->tuple_size = template->tuple_size;
bi->set_tag_fn = template->set_tag_fn;
bi->get_tag_fn = template->get_tag_fn;
bi->tag_size = template->tag_size; bi->tag_size = template->tag_size;
bi->flags |= template->flags;
} else } else
bi->name = bi_unsupported_name; bi->name = bi_unsupported_name;
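With the template's flags now copied into the disk profile and the interval taken from the logical block size, a driver that wants T10 PI only has to hand in one of the shared templates added later in this series. A hedged sketch of such a call site (the function and its error handling are illustrative, not taken from this commit):

	#include <linux/blkdev.h>
	#include <linux/t10-pi.h>

	static void mydrv_enable_pi(struct gendisk *disk)
	{
		/* Type 1 with CRC guard tags; the register path fills in
		 * ->interval from the queue's logical block size. */
		blk_integrity_register(disk, &t10_pi_type1_crc);
	}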


@ -97,14 +97,18 @@ void blk_recalc_rq_segments(struct request *rq)
void blk_recount_segments(struct request_queue *q, struct bio *bio) void blk_recount_segments(struct request_queue *q, struct bio *bio)
{ {
if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) && bool no_sg_merge = !!test_bit(QUEUE_FLAG_NO_SG_MERGE,
&q->queue_flags);
if (no_sg_merge && !bio_flagged(bio, BIO_CLONED) &&
bio->bi_vcnt < queue_max_segments(q)) bio->bi_vcnt < queue_max_segments(q))
bio->bi_phys_segments = bio->bi_vcnt; bio->bi_phys_segments = bio->bi_vcnt;
else { else {
struct bio *nxt = bio->bi_next; struct bio *nxt = bio->bi_next;
bio->bi_next = NULL; bio->bi_next = NULL;
bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio, false); bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio,
no_sg_merge);
bio->bi_next = nxt; bio->bi_next = nxt;
} }
@ -313,7 +317,7 @@ static inline int ll_new_hw_segment(struct request_queue *q,
if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q)) if (req->nr_phys_segments + nr_phys_segs > queue_max_segments(q))
goto no_merge; goto no_merge;
if (bio_integrity(bio) && blk_integrity_merge_bio(q, req, bio)) if (blk_integrity_merge_bio(q, req, bio) == false)
goto no_merge; goto no_merge;
/* /*
@ -410,7 +414,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
if (total_phys_segments > queue_max_segments(q)) if (total_phys_segments > queue_max_segments(q))
return 0; return 0;
if (blk_integrity_rq(req) && blk_integrity_merge_rq(q, req, next)) if (blk_integrity_merge_rq(q, req, next) == false)
return 0; return 0;
/* Merge is OK... */ /* Merge is OK... */
@ -590,7 +594,7 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
return false; return false;
/* only merge integrity protected bio into ditto rq */ /* only merge integrity protected bio into ditto rq */
if (bio_integrity(bio) != blk_integrity_rq(rq)) if (blk_integrity_merge_bio(rq->q, rq, bio) == false)
return false; return false;
/* must be using the same buffer */ /* must be using the same buffer */


@ -351,15 +351,12 @@ static void bt_clear_tag(struct blk_mq_bitmap_tags *bt, unsigned int tag)
return; return;
wait_cnt = atomic_dec_return(&bs->wait_cnt); wait_cnt = atomic_dec_return(&bs->wait_cnt);
if (unlikely(wait_cnt < 0))
wait_cnt = atomic_inc_return(&bs->wait_cnt);
if (wait_cnt == 0) { if (wait_cnt == 0) {
wake:
atomic_add(bt->wake_cnt, &bs->wait_cnt); atomic_add(bt->wake_cnt, &bs->wait_cnt);
bt_index_atomic_inc(&bt->wake_index); bt_index_atomic_inc(&bt->wake_index);
wake_up(&bs->wait); wake_up(&bs->wait);
} else if (wait_cnt < 0) {
wait_cnt = atomic_inc_return(&bs->wait_cnt);
if (!wait_cnt)
goto wake;
} }
} }
@ -392,45 +389,37 @@ void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, unsigned int tag,
__blk_mq_put_reserved_tag(tags, tag); __blk_mq_put_reserved_tag(tags, tag);
} }
static void bt_for_each_free(struct blk_mq_bitmap_tags *bt, static void bt_for_each(struct blk_mq_hw_ctx *hctx,
unsigned long *free_map, unsigned int off) struct blk_mq_bitmap_tags *bt, unsigned int off,
busy_iter_fn *fn, void *data, bool reserved)
{ {
int i; struct request *rq;
int bit, i;
for (i = 0; i < bt->map_nr; i++) { for (i = 0; i < bt->map_nr; i++) {
struct blk_align_bitmap *bm = &bt->map[i]; struct blk_align_bitmap *bm = &bt->map[i];
int bit = 0;
do { for (bit = find_first_bit(&bm->word, bm->depth);
bit = find_next_zero_bit(&bm->word, bm->depth, bit); bit < bm->depth;
if (bit >= bm->depth) bit = find_next_bit(&bm->word, bm->depth, bit + 1)) {
break; rq = blk_mq_tag_to_rq(hctx->tags, off + bit);
if (rq->q == hctx->queue)
__set_bit(bit + off, free_map); fn(hctx, rq, data, reserved);
bit++; }
} while (1);
off += (1 << bt->bits_per_word); off += (1 << bt->bits_per_word);
} }
} }
void blk_mq_tag_busy_iter(struct blk_mq_tags *tags, void blk_mq_tag_busy_iter(struct blk_mq_hw_ctx *hctx, busy_iter_fn *fn,
void (*fn)(void *, unsigned long *), void *data) void *priv)
{ {
unsigned long *tag_map; struct blk_mq_tags *tags = hctx->tags;
size_t map_size;
map_size = ALIGN(tags->nr_tags, BITS_PER_LONG) / BITS_PER_LONG;
tag_map = kzalloc(map_size * sizeof(unsigned long), GFP_ATOMIC);
if (!tag_map)
return;
bt_for_each_free(&tags->bitmap_tags, tag_map, tags->nr_reserved_tags);
if (tags->nr_reserved_tags) if (tags->nr_reserved_tags)
bt_for_each_free(&tags->breserved_tags, tag_map, 0); bt_for_each(hctx, &tags->breserved_tags, 0, fn, priv, true);
bt_for_each(hctx, &tags->bitmap_tags, tags->nr_reserved_tags, fn, priv,
fn(data, tag_map); false);
kfree(tag_map);
} }
EXPORT_SYMBOL(blk_mq_tag_busy_iter); EXPORT_SYMBOL(blk_mq_tag_busy_iter);
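blk_mq_tag_busy_iter() now walks started requests directly and invokes a busy_iter_fn per request, instead of handing the caller a copied tag bitmap. A hedged sketch of a callback a caller might supply; the counting logic is illustrative:

	#include <linux/blk-mq.h>

	static void mydrv_count_inflight(struct blk_mq_hw_ctx *hctx,
					 struct request *rq, void *priv,
					 bool reserved)
	{
		unsigned int *inflight = priv;

		if (!reserved)
			(*inflight)++;
	}

	static unsigned int mydrv_inflight(struct blk_mq_hw_ctx *hctx)
	{
		unsigned int inflight = 0;

		blk_mq_tag_busy_iter(hctx, mydrv_count_inflight, &inflight);
		return inflight;
	}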
@ -463,8 +452,8 @@ static void bt_update_count(struct blk_mq_bitmap_tags *bt,
} }
bt->wake_cnt = BT_WAIT_BATCH; bt->wake_cnt = BT_WAIT_BATCH;
if (bt->wake_cnt > depth / 4) if (bt->wake_cnt > depth / BT_WAIT_QUEUES)
bt->wake_cnt = max(1U, depth / 4); bt->wake_cnt = max(1U, depth / BT_WAIT_QUEUES);
bt->depth = depth; bt->depth = depth;
} }


@ -20,6 +20,7 @@
#include <linux/cache.h> #include <linux/cache.h>
#include <linux/sched/sysctl.h> #include <linux/sched/sysctl.h>
#include <linux/delay.h> #include <linux/delay.h>
#include <linux/crash_dump.h>
#include <trace/events/block.h> #include <trace/events/block.h>
@ -223,9 +224,11 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw, gfp_t gfp,
struct blk_mq_hw_ctx *hctx; struct blk_mq_hw_ctx *hctx;
struct request *rq; struct request *rq;
struct blk_mq_alloc_data alloc_data; struct blk_mq_alloc_data alloc_data;
int ret;
if (blk_mq_queue_enter(q)) ret = blk_mq_queue_enter(q);
return NULL; if (ret)
return ERR_PTR(ret);
ctx = blk_mq_get_ctx(q); ctx = blk_mq_get_ctx(q);
hctx = q->mq_ops->map_queue(q, ctx->cpu); hctx = q->mq_ops->map_queue(q, ctx->cpu);
@ -245,6 +248,8 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw, gfp_t gfp,
ctx = alloc_data.ctx; ctx = alloc_data.ctx;
} }
blk_mq_put_ctx(ctx); blk_mq_put_ctx(ctx);
if (!rq)
return ERR_PTR(-EWOULDBLOCK);
return rq; return rq;
} }
EXPORT_SYMBOL(blk_mq_alloc_request); EXPORT_SYMBOL(blk_mq_alloc_request);
@ -276,27 +281,7 @@ void blk_mq_free_request(struct request *rq)
__blk_mq_free_request(hctx, ctx, rq); __blk_mq_free_request(hctx, ctx, rq);
} }
/* inline void __blk_mq_end_request(struct request *rq, int error)
* Clone all relevant state from a request that has been put on hold in
* the flush state machine into the preallocated flush request that hangs
* off the request queue.
*
* For a driver the flush request should be invisible, that's why we are
* impersonating the original request here.
*/
void blk_mq_clone_flush_request(struct request *flush_rq,
struct request *orig_rq)
{
struct blk_mq_hw_ctx *hctx =
orig_rq->q->mq_ops->map_queue(orig_rq->q, orig_rq->mq_ctx->cpu);
flush_rq->mq_ctx = orig_rq->mq_ctx;
flush_rq->tag = orig_rq->tag;
memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq),
hctx->cmd_size);
}
inline void __blk_mq_end_io(struct request *rq, int error)
{ {
blk_account_io_done(rq); blk_account_io_done(rq);
@ -308,15 +293,15 @@ inline void __blk_mq_end_io(struct request *rq, int error)
blk_mq_free_request(rq); blk_mq_free_request(rq);
} }
} }
EXPORT_SYMBOL(__blk_mq_end_io); EXPORT_SYMBOL(__blk_mq_end_request);
void blk_mq_end_io(struct request *rq, int error) void blk_mq_end_request(struct request *rq, int error)
{ {
if (blk_update_request(rq, error, blk_rq_bytes(rq))) if (blk_update_request(rq, error, blk_rq_bytes(rq)))
BUG(); BUG();
__blk_mq_end_io(rq, error); __blk_mq_end_request(rq, error);
} }
EXPORT_SYMBOL(blk_mq_end_io); EXPORT_SYMBOL(blk_mq_end_request);
static void __blk_mq_complete_request_remote(void *data) static void __blk_mq_complete_request_remote(void *data)
{ {
@ -356,7 +341,7 @@ void __blk_mq_complete_request(struct request *rq)
struct request_queue *q = rq->q; struct request_queue *q = rq->q;
if (!q->softirq_done_fn) if (!q->softirq_done_fn)
blk_mq_end_io(rq, rq->errors); blk_mq_end_request(rq, rq->errors);
else else
blk_mq_ipi_complete_request(rq); blk_mq_ipi_complete_request(rq);
} }
@ -380,7 +365,7 @@ void blk_mq_complete_request(struct request *rq)
} }
EXPORT_SYMBOL(blk_mq_complete_request); EXPORT_SYMBOL(blk_mq_complete_request);
static void blk_mq_start_request(struct request *rq, bool last) void blk_mq_start_request(struct request *rq)
{ {
struct request_queue *q = rq->q; struct request_queue *q = rq->q;
@ -417,35 +402,24 @@ static void blk_mq_start_request(struct request *rq, bool last)
*/ */
rq->nr_phys_segments++; rq->nr_phys_segments++;
} }
/*
* Flag the last request in the series so that drivers know when IO
* should be kicked off, if they don't do it on a per-request basis.
*
* Note: the flag isn't the only condition drivers should do kick off.
* If drive is busy, the last request might not have the bit set.
*/
if (last)
rq->cmd_flags |= REQ_END;
} }
EXPORT_SYMBOL(blk_mq_start_request);
static void __blk_mq_requeue_request(struct request *rq) static void __blk_mq_requeue_request(struct request *rq)
{ {
struct request_queue *q = rq->q; struct request_queue *q = rq->q;
trace_block_rq_requeue(q, rq); trace_block_rq_requeue(q, rq);
clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
rq->cmd_flags &= ~REQ_END; if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
if (q->dma_drain_size && blk_rq_bytes(rq))
if (q->dma_drain_size && blk_rq_bytes(rq)) rq->nr_phys_segments--;
rq->nr_phys_segments--; }
} }
void blk_mq_requeue_request(struct request *rq) void blk_mq_requeue_request(struct request *rq)
{ {
__blk_mq_requeue_request(rq); __blk_mq_requeue_request(rq);
blk_clear_rq_complete(rq);
BUG_ON(blk_queued_rq(rq)); BUG_ON(blk_queued_rq(rq));
blk_mq_add_to_requeue_list(rq, true); blk_mq_add_to_requeue_list(rq, true);
@ -514,78 +488,35 @@ void blk_mq_kick_requeue_list(struct request_queue *q)
} }
EXPORT_SYMBOL(blk_mq_kick_requeue_list); EXPORT_SYMBOL(blk_mq_kick_requeue_list);
static inline bool is_flush_request(struct request *rq, unsigned int tag) static inline bool is_flush_request(struct request *rq,
struct blk_flush_queue *fq, unsigned int tag)
{ {
return ((rq->cmd_flags & REQ_FLUSH_SEQ) && return ((rq->cmd_flags & REQ_FLUSH_SEQ) &&
rq->q->flush_rq->tag == tag); fq->flush_rq->tag == tag);
} }
struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag) struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
{ {
struct request *rq = tags->rqs[tag]; struct request *rq = tags->rqs[tag];
/* mq_ctx of flush rq is always cloned from the corresponding req */
struct blk_flush_queue *fq = blk_get_flush_queue(rq->q, rq->mq_ctx);
if (!is_flush_request(rq, tag)) if (!is_flush_request(rq, fq, tag))
return rq; return rq;
return rq->q->flush_rq; return fq->flush_rq;
} }
EXPORT_SYMBOL(blk_mq_tag_to_rq); EXPORT_SYMBOL(blk_mq_tag_to_rq);
struct blk_mq_timeout_data { struct blk_mq_timeout_data {
struct blk_mq_hw_ctx *hctx; unsigned long next;
unsigned long *next; unsigned int next_set;
unsigned int *next_set;
}; };
static void blk_mq_timeout_check(void *__data, unsigned long *free_tags) void blk_mq_rq_timed_out(struct request *req, bool reserved)
{ {
struct blk_mq_timeout_data *data = __data; struct blk_mq_ops *ops = req->q->mq_ops;
struct blk_mq_hw_ctx *hctx = data->hctx; enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
unsigned int tag;
/* It may not be in flight yet (this is where
* the REQ_ATOMIC_STARTED flag comes in). The requests are
* statically allocated, so we know it's always safe to access the
* memory associated with a bit offset into ->rqs[].
*/
tag = 0;
do {
struct request *rq;
tag = find_next_zero_bit(free_tags, hctx->tags->nr_tags, tag);
if (tag >= hctx->tags->nr_tags)
break;
rq = blk_mq_tag_to_rq(hctx->tags, tag++);
if (rq->q != hctx->queue)
continue;
if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
continue;
blk_rq_check_expired(rq, data->next, data->next_set);
} while (1);
}
static void blk_mq_hw_ctx_check_timeout(struct blk_mq_hw_ctx *hctx,
unsigned long *next,
unsigned int *next_set)
{
struct blk_mq_timeout_data data = {
.hctx = hctx,
.next = next,
.next_set = next_set,
};
/*
* Ask the tagging code to iterate busy requests, so we can
* check them for timeout.
*/
blk_mq_tag_busy_iter(hctx->tags, blk_mq_timeout_check, &data);
}
static enum blk_eh_timer_return blk_mq_rq_timed_out(struct request *rq)
{
struct request_queue *q = rq->q;
/* /*
* We know that complete is set at this point. If STARTED isn't set * We know that complete is set at this point. If STARTED isn't set
@ -596,21 +527,54 @@ static enum blk_eh_timer_return blk_mq_rq_timed_out(struct request *rq)
* we both flags will get cleared. So check here again, and ignore * we both flags will get cleared. So check here again, and ignore
* a timeout event with a request that isn't active. * a timeout event with a request that isn't active.
*/ */
if (!test_bit(REQ_ATOM_STARTED, &req->atomic_flags))
return;
if (ops->timeout)
ret = ops->timeout(req, reserved);
switch (ret) {
case BLK_EH_HANDLED:
__blk_mq_complete_request(req);
break;
case BLK_EH_RESET_TIMER:
blk_add_timer(req);
blk_clear_rq_complete(req);
break;
case BLK_EH_NOT_HANDLED:
break;
default:
printk(KERN_ERR "block: bad eh return: %d\n", ret);
break;
}
}
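With the dispatch above, a blk-mq driver's ->timeout hook now receives the request plus a reserved flag and reports back via blk_eh_timer_return. A hedged sketch of what such a handler might look like; the policy shown and mydrv_abort() are hypothetical:

	static enum blk_eh_timer_return mydrv_timeout(struct request *rq,
						      bool reserved)
	{
		if (reserved)
			return BLK_EH_RESET_TIMER;	/* internal command: keep waiting */

		mydrv_abort(rq);			/* hypothetical hardware abort */
		return BLK_EH_HANDLED;			/* core completes rq */
	}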
static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
struct request *rq, void *priv, bool reserved)
{
struct blk_mq_timeout_data *data = priv;
if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
return BLK_EH_NOT_HANDLED; return;
if (!q->mq_ops->timeout) if (time_after_eq(jiffies, rq->deadline)) {
return BLK_EH_RESET_TIMER; if (!blk_mark_rq_complete(rq))
blk_mq_rq_timed_out(rq, reserved);
return q->mq_ops->timeout(rq); } else if (!data->next_set || time_after(data->next, rq->deadline)) {
data->next = rq->deadline;
data->next_set = 1;
}
} }
static void blk_mq_rq_timer(unsigned long data) static void blk_mq_rq_timer(unsigned long priv)
{ {
struct request_queue *q = (struct request_queue *) data; struct request_queue *q = (struct request_queue *)priv;
struct blk_mq_timeout_data data = {
.next = 0,
.next_set = 0,
};
struct blk_mq_hw_ctx *hctx; struct blk_mq_hw_ctx *hctx;
unsigned long next = 0; int i;
int i, next_set = 0;
queue_for_each_hw_ctx(q, hctx, i) { queue_for_each_hw_ctx(q, hctx, i) {
/* /*
@ -620,12 +584,12 @@ static void blk_mq_rq_timer(unsigned long data)
if (!hctx->nr_ctx || !hctx->tags) if (!hctx->nr_ctx || !hctx->tags)
continue; continue;
blk_mq_hw_ctx_check_timeout(hctx, &next, &next_set); blk_mq_tag_busy_iter(hctx, blk_mq_check_expired, &data);
} }
if (next_set) { if (data.next_set) {
next = blk_rq_timeout(round_jiffies_up(next)); data.next = blk_rq_timeout(round_jiffies_up(data.next));
mod_timer(&q->timeout, next); mod_timer(&q->timeout, data.next);
} else { } else {
queue_for_each_hw_ctx(q, hctx, i) queue_for_each_hw_ctx(q, hctx, i)
blk_mq_tag_idle(hctx); blk_mq_tag_idle(hctx);
@ -751,9 +715,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
rq = list_first_entry(&rq_list, struct request, queuelist); rq = list_first_entry(&rq_list, struct request, queuelist);
list_del_init(&rq->queuelist); list_del_init(&rq->queuelist);
blk_mq_start_request(rq, list_empty(&rq_list)); ret = q->mq_ops->queue_rq(hctx, rq, list_empty(&rq_list));
ret = q->mq_ops->queue_rq(hctx, rq);
switch (ret) { switch (ret) {
case BLK_MQ_RQ_QUEUE_OK: case BLK_MQ_RQ_QUEUE_OK:
queued++; queued++;
@ -766,7 +728,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
pr_err("blk-mq: bad return on queue: %d\n", ret); pr_err("blk-mq: bad return on queue: %d\n", ret);
case BLK_MQ_RQ_QUEUE_ERROR: case BLK_MQ_RQ_QUEUE_ERROR:
rq->errors = -EIO; rq->errors = -EIO;
blk_mq_end_io(rq, rq->errors); blk_mq_end_request(rq, rq->errors);
break; break;
} }
@ -1194,14 +1156,13 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
int ret; int ret;
blk_mq_bio_to_request(rq, bio); blk_mq_bio_to_request(rq, bio);
blk_mq_start_request(rq, true);
/* /*
* For OK queue, we are done. For error, kill it. Any other * For OK queue, we are done. For error, kill it. Any other
* error (busy), just add it to our list as we previously * error (busy), just add it to our list as we previously
* would have done * would have done
*/ */
ret = q->mq_ops->queue_rq(data.hctx, rq); ret = q->mq_ops->queue_rq(data.hctx, rq, true);
if (ret == BLK_MQ_RQ_QUEUE_OK) if (ret == BLK_MQ_RQ_QUEUE_OK)
goto done; goto done;
else { else {
@ -1209,7 +1170,7 @@ static void blk_mq_make_request(struct request_queue *q, struct bio *bio)
if (ret == BLK_MQ_RQ_QUEUE_ERROR) { if (ret == BLK_MQ_RQ_QUEUE_ERROR) {
rq->errors = -EIO; rq->errors = -EIO;
blk_mq_end_io(rq, rq->errors); blk_mq_end_request(rq, rq->errors);
goto done; goto done;
} }
} }
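Both issue paths now leave blk_mq_start_request() to the driver and pass the old REQ_END information as the new "last" argument to ->queue_rq(). A hedged sketch of a converted driver hook; the mydrv_* helpers and the doorbell batching are illustrative assumptions:

	static int mydrv_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq,
				  bool last)
	{
		struct mydrv_queue *q = hctx->driver_data;	/* hypothetical */

		blk_mq_start_request(rq);		/* driver starts the request now */

		if (!mydrv_hw_submit(q, rq))		/* hypothetical submit helper */
			return BLK_MQ_RQ_QUEUE_BUSY;	/* core re-dispatches later */

		if (last)
			mydrv_hw_kick(q);		/* ring doorbell once per batch */

		return BLK_MQ_RQ_QUEUE_OK;
	}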
@ -1531,6 +1492,28 @@ static int blk_mq_hctx_notify(void *data, unsigned long action,
return NOTIFY_OK; return NOTIFY_OK;
} }
static void blk_mq_exit_hctx(struct request_queue *q,
struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
{
unsigned flush_start_tag = set->queue_depth;
blk_mq_tag_idle(hctx);
if (set->ops->exit_request)
set->ops->exit_request(set->driver_data,
hctx->fq->flush_rq, hctx_idx,
flush_start_tag + hctx_idx);
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
blk_free_flush_queue(hctx->fq);
kfree(hctx->ctxs);
blk_mq_free_bitmap(&hctx->ctx_map);
}
static void blk_mq_exit_hw_queues(struct request_queue *q, static void blk_mq_exit_hw_queues(struct request_queue *q,
struct blk_mq_tag_set *set, int nr_queue) struct blk_mq_tag_set *set, int nr_queue)
{ {
@ -1540,17 +1523,8 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
queue_for_each_hw_ctx(q, hctx, i) { queue_for_each_hw_ctx(q, hctx, i) {
if (i == nr_queue) if (i == nr_queue)
break; break;
blk_mq_exit_hctx(q, set, hctx, i);
blk_mq_tag_idle(hctx);
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, i);
blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
kfree(hctx->ctxs);
blk_mq_free_bitmap(&hctx->ctx_map);
} }
} }
static void blk_mq_free_hw_queues(struct request_queue *q, static void blk_mq_free_hw_queues(struct request_queue *q,
@ -1565,6 +1539,77 @@ static void blk_mq_free_hw_queues(struct request_queue *q,
} }
} }
static int blk_mq_init_hctx(struct request_queue *q,
struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
{
int node;
unsigned flush_start_tag = set->queue_depth;
node = hctx->numa_node;
if (node == NUMA_NO_NODE)
node = hctx->numa_node = set->numa_node;
INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn);
INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
spin_lock_init(&hctx->lock);
INIT_LIST_HEAD(&hctx->dispatch);
hctx->queue = q;
hctx->queue_num = hctx_idx;
hctx->flags = set->flags;
hctx->cmd_size = set->cmd_size;
blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
blk_mq_hctx_notify, hctx);
blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
hctx->tags = set->tags[hctx_idx];
/*
* Allocate space for all possible cpus to avoid allocation at
* runtime
*/
hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *),
GFP_KERNEL, node);
if (!hctx->ctxs)
goto unregister_cpu_notifier;
if (blk_mq_alloc_bitmap(&hctx->ctx_map, node))
goto free_ctxs;
hctx->nr_ctx = 0;
if (set->ops->init_hctx &&
set->ops->init_hctx(hctx, set->driver_data, hctx_idx))
goto free_bitmap;
hctx->fq = blk_alloc_flush_queue(q, hctx->numa_node, set->cmd_size);
if (!hctx->fq)
goto exit_hctx;
if (set->ops->init_request &&
set->ops->init_request(set->driver_data,
hctx->fq->flush_rq, hctx_idx,
flush_start_tag + hctx_idx, node))
goto free_fq;
return 0;
free_fq:
kfree(hctx->fq);
exit_hctx:
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
free_bitmap:
blk_mq_free_bitmap(&hctx->ctx_map);
free_ctxs:
kfree(hctx->ctxs);
unregister_cpu_notifier:
blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
return -1;
}
static int blk_mq_init_hw_queues(struct request_queue *q, static int blk_mq_init_hw_queues(struct request_queue *q,
struct blk_mq_tag_set *set) struct blk_mq_tag_set *set)
{ {
@ -1575,43 +1620,7 @@ static int blk_mq_init_hw_queues(struct request_queue *q,
* Initialize hardware queues * Initialize hardware queues
*/ */
queue_for_each_hw_ctx(q, hctx, i) { queue_for_each_hw_ctx(q, hctx, i) {
int node; if (blk_mq_init_hctx(q, set, hctx, i))
node = hctx->numa_node;
if (node == NUMA_NO_NODE)
node = hctx->numa_node = set->numa_node;
INIT_DELAYED_WORK(&hctx->run_work, blk_mq_run_work_fn);
INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
spin_lock_init(&hctx->lock);
INIT_LIST_HEAD(&hctx->dispatch);
hctx->queue = q;
hctx->queue_num = i;
hctx->flags = set->flags;
hctx->cmd_size = set->cmd_size;
blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
blk_mq_hctx_notify, hctx);
blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
hctx->tags = set->tags[i];
/*
* Allocate space for all possible cpus to avoid allocation at
* runtime
*/
hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *),
GFP_KERNEL, node);
if (!hctx->ctxs)
break;
if (blk_mq_alloc_bitmap(&hctx->ctx_map, node))
break;
hctx->nr_ctx = 0;
if (set->ops->init_hctx &&
set->ops->init_hctx(hctx, set->driver_data, i))
break; break;
} }
@ -1765,6 +1774,16 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
if (!ctx) if (!ctx)
return ERR_PTR(-ENOMEM); return ERR_PTR(-ENOMEM);
/*
* If a crashdump is active, then we are potentially in a very
* memory constrained environment. Limit us to 1 queue and
* 64 tags to prevent using too much memory.
*/
if (is_kdump_kernel()) {
set->nr_hw_queues = 1;
set->queue_depth = min(64U, set->queue_depth);
}
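As a rough back-of-the-envelope illustration (the footprint R is whatever struct request plus the driver's cmd_size works out to, not a figure from this commit): a tag set with N hardware queues of depth D preallocates on the order of N x D x R for requests alone, while the kdump clamp reduces that to min(D, 64) x R for the single remaining queue.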
hctxs = kmalloc_node(set->nr_hw_queues * sizeof(*hctxs), GFP_KERNEL, hctxs = kmalloc_node(set->nr_hw_queues * sizeof(*hctxs), GFP_KERNEL,
set->numa_node); set->numa_node);
@ -1783,7 +1802,8 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
if (!hctxs[i]) if (!hctxs[i])
goto err_hctxs; goto err_hctxs;
if (!zalloc_cpumask_var(&hctxs[i]->cpumask, GFP_KERNEL)) if (!zalloc_cpumask_var_node(&hctxs[i]->cpumask, GFP_KERNEL,
node))
goto err_hctxs; goto err_hctxs;
atomic_set(&hctxs[i]->nr_active, 0); atomic_set(&hctxs[i]->nr_active, 0);
@ -1830,7 +1850,6 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
else else
blk_queue_make_request(q, blk_sq_make_request); blk_queue_make_request(q, blk_sq_make_request);
blk_queue_rq_timed_out(q, blk_mq_rq_timed_out);
if (set->timeout) if (set->timeout)
blk_queue_rq_timeout(q, set->timeout); blk_queue_rq_timeout(q, set->timeout);
@ -1842,17 +1861,10 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
if (set->ops->complete) if (set->ops->complete)
blk_queue_softirq_done(q, set->ops->complete); blk_queue_softirq_done(q, set->ops->complete);
blk_mq_init_flush(q);
blk_mq_init_cpu_queues(q, set->nr_hw_queues); blk_mq_init_cpu_queues(q, set->nr_hw_queues);
q->flush_rq = kzalloc(round_up(sizeof(struct request) +
set->cmd_size, cache_line_size()),
GFP_KERNEL);
if (!q->flush_rq)
goto err_hw;
if (blk_mq_init_hw_queues(q, set)) if (blk_mq_init_hw_queues(q, set))
goto err_flush_rq; goto err_hw;
mutex_lock(&all_q_mutex); mutex_lock(&all_q_mutex);
list_add_tail(&q->all_q_node, &all_q_list); list_add_tail(&q->all_q_node, &all_q_list);
@ -1864,8 +1876,6 @@ struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
return q; return q;
err_flush_rq:
kfree(q->flush_rq);
err_hw: err_hw:
blk_cleanup_queue(q); blk_cleanup_queue(q);
err_hctxs: err_hctxs:


@ -27,7 +27,6 @@ struct blk_mq_ctx {
void __blk_mq_complete_request(struct request *rq); void __blk_mq_complete_request(struct request *rq);
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async); void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
void blk_mq_init_flush(struct request_queue *q);
void blk_mq_freeze_queue(struct request_queue *q); void blk_mq_freeze_queue(struct request_queue *q);
void blk_mq_free_queue(struct request_queue *q); void blk_mq_free_queue(struct request_queue *q);
void blk_mq_clone_flush_request(struct request *flush_rq, void blk_mq_clone_flush_request(struct request *flush_rq,
@ -60,6 +59,8 @@ extern int blk_mq_hw_queue_to_node(unsigned int *map, unsigned int);
extern int blk_mq_sysfs_register(struct request_queue *q); extern int blk_mq_sysfs_register(struct request_queue *q);
extern void blk_mq_sysfs_unregister(struct request_queue *q); extern void blk_mq_sysfs_unregister(struct request_queue *q);
extern void blk_mq_rq_timed_out(struct request *req, bool reserved);
/* /*
* Basic implementation of sparser bitmap, allowing the user to spread * Basic implementation of sparser bitmap, allowing the user to spread
* the bits over more cachelines. * the bits over more cachelines.


@ -574,7 +574,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
bottom = max(b->physical_block_size, b->io_min) + alignment; bottom = max(b->physical_block_size, b->io_min) + alignment;
/* Verify that top and bottom intervals line up */ /* Verify that top and bottom intervals line up */
if (max(top, bottom) & (min(top, bottom) - 1)) { if (max(top, bottom) % min(top, bottom)) {
t->misaligned = 1; t->misaligned = 1;
ret = -1; ret = -1;
} }
@ -619,7 +619,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
/* Find lowest common alignment_offset */ /* Find lowest common alignment_offset */
t->alignment_offset = lcm(t->alignment_offset, alignment) t->alignment_offset = lcm(t->alignment_offset, alignment)
& (max(t->physical_block_size, t->io_min) - 1); % max(t->physical_block_size, t->io_min);
/* Verify that new alignment_offset is on a logical block boundary */ /* Verify that new alignment_offset is on a logical block boundary */
if (t->alignment_offset & (t->logical_block_size - 1)) { if (t->alignment_offset & (t->logical_block_size - 1)) {
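The switch from mask arithmetic to a true modulo matters whenever io_min is not a power of two. With illustrative numbers, say an alignment of 4096 bytes against an io_min of 3072 bytes:

	4096 & (3072 - 1) = 0		/* old mask math: looks aligned  */
	4096 % 3072       = 1024	/* actual remainder: misaligned  */

The masked form only agrees with the remainder when the divisor is a power of two, which is exactly the assumption the old code baked in.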


@ -519,8 +519,8 @@ static void blk_release_queue(struct kobject *kobj)
if (q->mq_ops) if (q->mq_ops)
blk_mq_free_queue(q); blk_mq_free_queue(q);
else
kfree(q->flush_rq); blk_free_flush_queue(q->fq);
blk_trace_shutdown(q); blk_trace_shutdown(q);


@ -90,10 +90,7 @@ static void blk_rq_timed_out(struct request *req)
switch (ret) { switch (ret) {
case BLK_EH_HANDLED: case BLK_EH_HANDLED:
/* Can we use req->errors here? */ /* Can we use req->errors here? */
if (q->mq_ops) __blk_complete_request(req);
__blk_mq_complete_request(req);
else
__blk_complete_request(req);
break; break;
case BLK_EH_RESET_TIMER: case BLK_EH_RESET_TIMER:
blk_add_timer(req); blk_add_timer(req);
@ -113,7 +110,7 @@ static void blk_rq_timed_out(struct request *req)
} }
} }
void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout, static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
unsigned int *next_set) unsigned int *next_set)
{ {
if (time_after_eq(jiffies, rq->deadline)) { if (time_after_eq(jiffies, rq->deadline)) {
@ -162,7 +159,10 @@ void blk_abort_request(struct request *req)
if (blk_mark_rq_complete(req)) if (blk_mark_rq_complete(req))
return; return;
blk_delete_timer(req); blk_delete_timer(req);
blk_rq_timed_out(req); if (req->q->mq_ops)
blk_mq_rq_timed_out(req, false);
else
blk_rq_timed_out(req);
} }
EXPORT_SYMBOL_GPL(blk_abort_request); EXPORT_SYMBOL_GPL(blk_abort_request);
@ -190,7 +190,8 @@ void blk_add_timer(struct request *req)
struct request_queue *q = req->q; struct request_queue *q = req->q;
unsigned long expiry; unsigned long expiry;
if (!q->rq_timed_out_fn) /* blk-mq has its own handler, so we don't need ->rq_timed_out_fn */
if (!q->mq_ops && !q->rq_timed_out_fn)
return; return;
BUG_ON(!list_empty(&req->timeout_list)); BUG_ON(!list_empty(&req->timeout_list));


@ -2,6 +2,8 @@
#define BLK_INTERNAL_H #define BLK_INTERNAL_H
#include <linux/idr.h> #include <linux/idr.h>
#include <linux/blk-mq.h>
#include "blk-mq.h"
/* Amount of time in which a process may batch requests */ /* Amount of time in which a process may batch requests */
#define BLK_BATCH_TIME (HZ/50UL) #define BLK_BATCH_TIME (HZ/50UL)
@ -12,16 +14,44 @@
/* Max future timer expiry for timeouts */ /* Max future timer expiry for timeouts */
#define BLK_MAX_TIMEOUT (5 * HZ) #define BLK_MAX_TIMEOUT (5 * HZ)
struct blk_flush_queue {
unsigned int flush_queue_delayed:1;
unsigned int flush_pending_idx:1;
unsigned int flush_running_idx:1;
unsigned long flush_pending_since;
struct list_head flush_queue[2];
struct list_head flush_data_in_flight;
struct request *flush_rq;
spinlock_t mq_flush_lock;
};
extern struct kmem_cache *blk_requestq_cachep; extern struct kmem_cache *blk_requestq_cachep;
extern struct kmem_cache *request_cachep; extern struct kmem_cache *request_cachep;
extern struct kobj_type blk_queue_ktype; extern struct kobj_type blk_queue_ktype;
extern struct ida blk_queue_ida; extern struct ida blk_queue_ida;
static inline struct blk_flush_queue *blk_get_flush_queue(
struct request_queue *q, struct blk_mq_ctx *ctx)
{
struct blk_mq_hw_ctx *hctx;
if (!q->mq_ops)
return q->fq;
hctx = q->mq_ops->map_queue(q, ctx->cpu);
return hctx->fq;
}
static inline void __blk_get_queue(struct request_queue *q) static inline void __blk_get_queue(struct request_queue *q)
{ {
kobject_get(&q->kobj); kobject_get(&q->kobj);
} }
struct blk_flush_queue *blk_alloc_flush_queue(struct request_queue *q,
int node, int cmd_size);
void blk_free_flush_queue(struct blk_flush_queue *q);
int blk_init_rl(struct request_list *rl, struct request_queue *q, int blk_init_rl(struct request_list *rl, struct request_queue *q,
gfp_t gfp_mask); gfp_t gfp_mask);
void blk_exit_rl(struct request_list *rl); void blk_exit_rl(struct request_list *rl);
@ -38,8 +68,6 @@ bool __blk_end_bidi_request(struct request *rq, int error,
unsigned int nr_bytes, unsigned int bidi_bytes); unsigned int nr_bytes, unsigned int bidi_bytes);
void blk_rq_timed_out_timer(unsigned long data); void blk_rq_timed_out_timer(unsigned long data);
void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
unsigned int *next_set);
unsigned long blk_rq_timeout(unsigned long timeout); unsigned long blk_rq_timeout(unsigned long timeout);
void blk_add_timer(struct request *req); void blk_add_timer(struct request *req);
void blk_delete_timer(struct request *); void blk_delete_timer(struct request *);
@ -88,6 +116,7 @@ void blk_insert_flush(struct request *rq);
static inline struct request *__elv_next_request(struct request_queue *q) static inline struct request *__elv_next_request(struct request_queue *q)
{ {
struct request *rq; struct request *rq;
struct blk_flush_queue *fq = blk_get_flush_queue(q, NULL);
while (1) { while (1) {
if (!list_empty(&q->queue_head)) { if (!list_empty(&q->queue_head)) {
@ -110,9 +139,9 @@ static inline struct request *__elv_next_request(struct request_queue *q)
* should be restarted later. Please see flush_end_io() for * should be restarted later. Please see flush_end_io() for
* details. * details.
*/ */
if (q->flush_pending_idx != q->flush_running_idx && if (fq->flush_pending_idx != fq->flush_running_idx &&
!queue_flush_queueable(q)) { !queue_flush_queueable(q)) {
q->flush_queue_delayed = 1; fq->flush_queue_delayed = 1;
return NULL; return NULL;
} }
if (unlikely(blk_queue_bypass(q)) || if (unlikely(blk_queue_bypass(q)) ||


@ -270,8 +270,8 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm,
* map scatter-gather elements separately and string them to request * map scatter-gather elements separately and string them to request
*/ */
rq = blk_get_request(q, rw, GFP_KERNEL); rq = blk_get_request(q, rw, GFP_KERNEL);
if (!rq) if (IS_ERR(rq))
return ERR_PTR(-ENOMEM); return rq;
blk_rq_set_block_pc(rq); blk_rq_set_block_pc(rq);
ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, has_write_perm); ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, has_write_perm);
@ -285,8 +285,9 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t has_write_perm,
} }
next_rq = blk_get_request(q, READ, GFP_KERNEL); next_rq = blk_get_request(q, READ, GFP_KERNEL);
if (!next_rq) { if (IS_ERR(next_rq)) {
ret = -ENOMEM; ret = PTR_ERR(next_rq);
next_rq = NULL;
goto out; goto out;
} }
rq->next_rq = next_rq; rq->next_rq = next_rq;
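The conversions throughout this series follow the same caller idiom now that blk_get_request() returns ERR_PTR() instead of NULL. A hedged sketch of the pattern; the surrounding function is illustrative:

	static int mydrv_send_pc(struct request_queue *q)
	{
		struct request *rq;

		rq = blk_get_request(q, WRITE, GFP_KERNEL);
		if (IS_ERR(rq))
			return PTR_ERR(rq);	/* real error, e.g. -ENODEV, not a guessed -ENOMEM */

		blk_rq_set_block_pc(rq);
		/* ... fill in rq->cmd, execute ... */
		blk_put_request(rq);
		return 0;
	}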


@ -299,7 +299,7 @@ struct cfq_io_cq {
struct cfq_ttime ttime; struct cfq_ttime ttime;
int ioprio; /* the current ioprio */ int ioprio; /* the current ioprio */
#ifdef CONFIG_CFQ_GROUP_IOSCHED #ifdef CONFIG_CFQ_GROUP_IOSCHED
uint64_t blkcg_id; /* the current blkcg ID */ uint64_t blkcg_serial_nr; /* the current blkcg serial */
#endif #endif
}; };
@ -3547,17 +3547,17 @@ static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
{ {
struct cfq_data *cfqd = cic_to_cfqd(cic); struct cfq_data *cfqd = cic_to_cfqd(cic);
struct cfq_queue *sync_cfqq; struct cfq_queue *sync_cfqq;
uint64_t id; uint64_t serial_nr;
rcu_read_lock(); rcu_read_lock();
id = bio_blkcg(bio)->id; serial_nr = bio_blkcg(bio)->css.serial_nr;
rcu_read_unlock(); rcu_read_unlock();
/* /*
* Check whether blkcg has changed. The condition may trigger * Check whether blkcg has changed. The condition may trigger
* spuriously on a newly created cic but there's no harm. * spuriously on a newly created cic but there's no harm.
*/ */
if (unlikely(!cfqd) || likely(cic->blkcg_id == id)) if (unlikely(!cfqd) || likely(cic->blkcg_serial_nr == serial_nr))
return; return;
sync_cfqq = cic_to_cfqq(cic, 1); sync_cfqq = cic_to_cfqq(cic, 1);
@ -3571,7 +3571,7 @@ static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio)
cfq_put_queue(sync_cfqq); cfq_put_queue(sync_cfqq);
} }
cic->blkcg_id = id; cic->blkcg_serial_nr = serial_nr;
} }
#else #else
static inline void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) { } static inline void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) { }


@ -709,8 +709,6 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
if (!arg) if (!arg)
return -EINVAL; return -EINVAL;
bdi = blk_get_backing_dev_info(bdev); bdi = blk_get_backing_dev_info(bdev);
if (bdi == NULL)
return -ENOTTY;
return compat_put_long(arg, return compat_put_long(arg,
(bdi->ra_pages * PAGE_CACHE_SIZE) / 512); (bdi->ra_pages * PAGE_CACHE_SIZE) / 512);
case BLKROGET: /* compatible */ case BLKROGET: /* compatible */
@ -731,8 +729,6 @@ long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg)
if (!capable(CAP_SYS_ADMIN)) if (!capable(CAP_SYS_ADMIN))
return -EACCES; return -EACCES;
bdi = blk_get_backing_dev_info(bdev); bdi = blk_get_backing_dev_info(bdev);
if (bdi == NULL)
return -ENOTTY;
bdi->ra_pages = (arg * 512) / PAGE_CACHE_SIZE; bdi->ra_pages = (arg * 512) / PAGE_CACHE_SIZE;
return 0; return 0;
case BLKGETSIZE: case BLKGETSIZE:


@ -356,8 +356,6 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
if (!arg) if (!arg)
return -EINVAL; return -EINVAL;
bdi = blk_get_backing_dev_info(bdev); bdi = blk_get_backing_dev_info(bdev);
if (bdi == NULL)
return -ENOTTY;
return put_long(arg, (bdi->ra_pages * PAGE_CACHE_SIZE) / 512); return put_long(arg, (bdi->ra_pages * PAGE_CACHE_SIZE) / 512);
case BLKROGET: case BLKROGET:
return put_int(arg, bdev_read_only(bdev) != 0); return put_int(arg, bdev_read_only(bdev) != 0);
@ -386,8 +384,6 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
if(!capable(CAP_SYS_ADMIN)) if(!capable(CAP_SYS_ADMIN))
return -EACCES; return -EACCES;
bdi = blk_get_backing_dev_info(bdev); bdi = blk_get_backing_dev_info(bdev);
if (bdi == NULL)
return -ENOTTY;
bdi->ra_pages = (arg * 512) / PAGE_CACHE_SIZE; bdi->ra_pages = (arg * 512) / PAGE_CACHE_SIZE;
return 0; return 0;
case BLKBSZSET: case BLKBSZSET:


@ -81,7 +81,7 @@ int mac_partition(struct parsed_partitions *state)
be32_to_cpu(part->start_block) * (secsize/512), be32_to_cpu(part->start_block) * (secsize/512),
be32_to_cpu(part->block_count) * (secsize/512)); be32_to_cpu(part->block_count) * (secsize/512));
if (!strnicmp(part->type, "Linux_RAID", 10)) if (!strncasecmp(part->type, "Linux_RAID", 10))
state->parts[slot].flags = ADDPART_FLAG_RAID; state->parts[slot].flags = ADDPART_FLAG_RAID;
#ifdef CONFIG_PPC_PMAC #ifdef CONFIG_PPC_PMAC
/* /*
@ -100,7 +100,7 @@ int mac_partition(struct parsed_partitions *state)
goodness++; goodness++;
if (strcasecmp(part->type, "Apple_UNIX_SVR2") == 0 if (strcasecmp(part->type, "Apple_UNIX_SVR2") == 0
|| (strnicmp(part->type, "Linux", 5) == 0 || (strncasecmp(part->type, "Linux", 5) == 0
&& strcasecmp(part->type, "Linux_swap") != 0)) { && strcasecmp(part->type, "Linux_swap") != 0)) {
int i, l; int i, l;
@ -109,13 +109,13 @@ int mac_partition(struct parsed_partitions *state)
if (strcmp(part->name, "/") == 0) if (strcmp(part->name, "/") == 0)
goodness++; goodness++;
for (i = 0; i <= l - 4; ++i) { for (i = 0; i <= l - 4; ++i) {
if (strnicmp(part->name + i, "root", if (strncasecmp(part->name + i, "root",
4) == 0) { 4) == 0) {
goodness += 2; goodness += 2;
break; break;
} }
} }
if (strnicmp(part->name, "swap", 4) == 0) if (strncasecmp(part->name, "swap", 4) == 0)
goodness--; goodness--;
} }


@ -316,8 +316,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
ret = -ENOMEM; ret = -ENOMEM;
rq = blk_get_request(q, writing ? WRITE : READ, GFP_KERNEL); rq = blk_get_request(q, writing ? WRITE : READ, GFP_KERNEL);
if (!rq) if (IS_ERR(rq))
goto out; return PTR_ERR(rq);
blk_rq_set_block_pc(rq); blk_rq_set_block_pc(rq);
if (hdr->cmd_len > BLK_MAX_CDB) { if (hdr->cmd_len > BLK_MAX_CDB) {
@ -387,7 +387,6 @@ out_free_cdb:
kfree(rq->cmd); kfree(rq->cmd);
out_put_request: out_put_request:
blk_put_request(rq); blk_put_request(rq);
out:
return ret; return ret;
} }
@ -457,8 +456,8 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,
} }
rq = blk_get_request(q, in_len ? WRITE : READ, __GFP_WAIT); rq = blk_get_request(q, in_len ? WRITE : READ, __GFP_WAIT);
if (!rq) { if (IS_ERR(rq)) {
err = -ENOMEM; err = PTR_ERR(rq);
goto error; goto error;
} }
blk_rq_set_block_pc(rq); blk_rq_set_block_pc(rq);
@ -548,6 +547,8 @@ static int __blk_send_generic(struct request_queue *q, struct gendisk *bd_disk,
int err; int err;
rq = blk_get_request(q, WRITE, __GFP_WAIT); rq = blk_get_request(q, WRITE, __GFP_WAIT);
if (IS_ERR(rq))
return PTR_ERR(rq);
blk_rq_set_block_pc(rq); blk_rq_set_block_pc(rq);
rq->timeout = BLK_DEFAULT_SG_TIMEOUT; rq->timeout = BLK_DEFAULT_SG_TIMEOUT;
rq->cmd[0] = cmd; rq->cmd[0] = cmd;

block/t10-pi.c (new file, 197 lines)

@ -0,0 +1,197 @@
/*
* t10_pi.c - Functions for generating and verifying T10 Protection
* Information.
*
* Copyright (C) 2007, 2008, 2014 Oracle Corporation
* Written by: Martin K. Petersen <martin.petersen@oracle.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License version
* 2 as published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; see the file COPYING. If not, write to
* the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139,
* USA.
*
*/
#include <linux/t10-pi.h>
#include <linux/blkdev.h>
#include <linux/crc-t10dif.h>
#include <net/checksum.h>
typedef __be16 (csum_fn) (void *, unsigned int);
static const __be16 APP_ESCAPE = (__force __be16) 0xffff;
static const __be32 REF_ESCAPE = (__force __be32) 0xffffffff;
static __be16 t10_pi_crc_fn(void *data, unsigned int len)
{
return cpu_to_be16(crc_t10dif(data, len));
}
static __be16 t10_pi_ip_fn(void *data, unsigned int len)
{
return (__force __be16)ip_compute_csum(data, len);
}
/*
* Type 1 and Type 2 protection use the same format: 16 bit guard tag,
* 16 bit app tag, 32 bit reference tag. Type 3 does not define the ref
* tag.
*/
static int t10_pi_generate(struct blk_integrity_iter *iter, csum_fn *fn,
unsigned int type)
{
unsigned int i;
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
struct t10_pi_tuple *pi = iter->prot_buf;
pi->guard_tag = fn(iter->data_buf, iter->interval);
pi->app_tag = 0;
if (type == 1)
pi->ref_tag = cpu_to_be32(lower_32_bits(iter->seed));
else
pi->ref_tag = 0;
iter->data_buf += iter->interval;
iter->prot_buf += sizeof(struct t10_pi_tuple);
iter->seed++;
}
return 0;
}
static int t10_pi_verify(struct blk_integrity_iter *iter, csum_fn *fn,
unsigned int type)
{
unsigned int i;
for (i = 0 ; i < iter->data_size ; i += iter->interval) {
struct t10_pi_tuple *pi = iter->prot_buf;
__be16 csum;
switch (type) {
case 1:
case 2:
if (pi->app_tag == APP_ESCAPE)
goto next;
if (be32_to_cpu(pi->ref_tag) !=
lower_32_bits(iter->seed)) {
pr_err("%s: ref tag error at location %llu " \
"(rcvd %u)\n", iter->disk_name,
(unsigned long long)
iter->seed, be32_to_cpu(pi->ref_tag));
return -EILSEQ;
}
break;
case 3:
if (pi->app_tag == APP_ESCAPE &&
pi->ref_tag == REF_ESCAPE)
goto next;
break;
}
csum = fn(iter->data_buf, iter->interval);
if (pi->guard_tag != csum) {
pr_err("%s: guard tag error at sector %llu " \
"(rcvd %04x, want %04x)\n", iter->disk_name,
(unsigned long long)iter->seed,
be16_to_cpu(pi->guard_tag), be16_to_cpu(csum));
return -EILSEQ;
}
next:
iter->data_buf += iter->interval;
iter->prot_buf += sizeof(struct t10_pi_tuple);
iter->seed++;
}
return 0;
}
static int t10_pi_type1_generate_crc(struct blk_integrity_iter *iter)
{
return t10_pi_generate(iter, t10_pi_crc_fn, 1);
}
static int t10_pi_type1_generate_ip(struct blk_integrity_iter *iter)
{
return t10_pi_generate(iter, t10_pi_ip_fn, 1);
}
static int t10_pi_type1_verify_crc(struct blk_integrity_iter *iter)
{
return t10_pi_verify(iter, t10_pi_crc_fn, 1);
}
static int t10_pi_type1_verify_ip(struct blk_integrity_iter *iter)
{
return t10_pi_verify(iter, t10_pi_ip_fn, 1);
}
static int t10_pi_type3_generate_crc(struct blk_integrity_iter *iter)
{
return t10_pi_generate(iter, t10_pi_crc_fn, 3);
}
static int t10_pi_type3_generate_ip(struct blk_integrity_iter *iter)
{
return t10_pi_generate(iter, t10_pi_ip_fn, 3);
}
static int t10_pi_type3_verify_crc(struct blk_integrity_iter *iter)
{
return t10_pi_verify(iter, t10_pi_crc_fn, 3);
}
static int t10_pi_type3_verify_ip(struct blk_integrity_iter *iter)
{
return t10_pi_verify(iter, t10_pi_ip_fn, 3);
}
struct blk_integrity t10_pi_type1_crc = {
.name = "T10-DIF-TYPE1-CRC",
.generate_fn = t10_pi_type1_generate_crc,
.verify_fn = t10_pi_type1_verify_crc,
.tuple_size = sizeof(struct t10_pi_tuple),
.tag_size = 0,
};
EXPORT_SYMBOL(t10_pi_type1_crc);
struct blk_integrity t10_pi_type1_ip = {
.name = "T10-DIF-TYPE1-IP",
.generate_fn = t10_pi_type1_generate_ip,
.verify_fn = t10_pi_type1_verify_ip,
.tuple_size = sizeof(struct t10_pi_tuple),
.tag_size = 0,
};
EXPORT_SYMBOL(t10_pi_type1_ip);
struct blk_integrity t10_pi_type3_crc = {
.name = "T10-DIF-TYPE3-CRC",
.generate_fn = t10_pi_type3_generate_crc,
.verify_fn = t10_pi_type3_verify_crc,
.tuple_size = sizeof(struct t10_pi_tuple),
.tag_size = 0,
};
EXPORT_SYMBOL(t10_pi_type3_crc);
struct blk_integrity t10_pi_type3_ip = {
.name = "T10-DIF-TYPE3-IP",
.generate_fn = t10_pi_type3_generate_ip,
.verify_fn = t10_pi_type3_verify_ip,
.tuple_size = sizeof(struct t10_pi_tuple),
.tag_size = 0,
};
EXPORT_SYMBOL(t10_pi_type3_ip);
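The generate/verify helpers above walk one 8-byte tuple of protection information per data interval. A hedged sketch of the layout they fill in, with the struct name invented here and the fields assumed to mirror the comment near the top of the file (16-bit guard, 16-bit application tag, 32-bit reference tag):

	struct pi_tuple_sketch {	/* illustrative name, not the header's */
		__be16 guard_tag;	/* CRC-T10DIF or IP checksum of the interval */
		__be16 app_tag;		/* left as 0 by the generators; 0xffff escapes checking */
		__be32 ref_tag;		/* low 32 bits of the seed (LBA) for Type 1/2 */
	};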


@ -247,7 +247,7 @@ static void mtip_async_complete(struct mtip_port *port,
if (unlikely(cmd->unaligned)) if (unlikely(cmd->unaligned))
up(&port->cmd_slot_unal); up(&port->cmd_slot_unal);
blk_mq_end_io(rq, status ? -EIO : 0); blk_mq_end_request(rq, status ? -EIO : 0);
} }
/* /*
@ -3739,7 +3739,7 @@ static int mtip_submit_request(struct blk_mq_hw_ctx *hctx, struct request *rq)
int err; int err;
err = mtip_send_trim(dd, blk_rq_pos(rq), blk_rq_sectors(rq)); err = mtip_send_trim(dd, blk_rq_pos(rq), blk_rq_sectors(rq));
blk_mq_end_io(rq, err); blk_mq_end_request(rq, err);
return 0; return 0;
} }
@ -3775,13 +3775,16 @@ static bool mtip_check_unal_depth(struct blk_mq_hw_ctx *hctx,
return false; return false;
} }
static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq) static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq,
bool last)
{ {
int ret; int ret;
if (unlikely(mtip_check_unal_depth(hctx, rq))) if (unlikely(mtip_check_unal_depth(hctx, rq)))
return BLK_MQ_RQ_QUEUE_BUSY; return BLK_MQ_RQ_QUEUE_BUSY;
blk_mq_start_request(rq);
ret = mtip_submit_request(hctx, rq); ret = mtip_submit_request(hctx, rq);
if (likely(!ret)) if (likely(!ret))
return BLK_MQ_RQ_QUEUE_OK; return BLK_MQ_RQ_QUEUE_OK;


@ -177,7 +177,7 @@ static void end_cmd(struct nullb_cmd *cmd)
{ {
switch (queue_mode) { switch (queue_mode) {
case NULL_Q_MQ: case NULL_Q_MQ:
blk_mq_end_io(cmd->rq, 0); blk_mq_end_request(cmd->rq, 0);
return; return;
case NULL_Q_RQ: case NULL_Q_RQ:
INIT_LIST_HEAD(&cmd->rq->queuelist); INIT_LIST_HEAD(&cmd->rq->queuelist);
@ -313,13 +313,16 @@ static void null_request_fn(struct request_queue *q)
} }
} }
static int null_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq) static int null_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq,
bool last)
{ {
struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq); struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
cmd->rq = rq; cmd->rq = rq;
cmd->nq = hctx->driver_data; cmd->nq = hctx->driver_data;
blk_mq_start_request(rq);
null_handle_cmd(cmd); null_handle_cmd(cmd);
return BLK_MQ_RQ_QUEUE_OK; return BLK_MQ_RQ_QUEUE_OK;
} }


@ -722,6 +722,8 @@ static int pd_special_command(struct pd_unit *disk,
int err = 0; int err = 0;
rq = blk_get_request(disk->gd->queue, READ, __GFP_WAIT); rq = blk_get_request(disk->gd->queue, READ, __GFP_WAIT);
if (IS_ERR(rq))
return PTR_ERR(rq);
rq->cmd_type = REQ_TYPE_SPECIAL; rq->cmd_type = REQ_TYPE_SPECIAL;
rq->special = func; rq->special = func;


@ -704,6 +704,8 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *
rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ? rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ?
WRITE : READ, __GFP_WAIT); WRITE : READ, __GFP_WAIT);
if (IS_ERR(rq))
return PTR_ERR(rq);
blk_rq_set_block_pc(rq); blk_rq_set_block_pc(rq);
if (cgc->buflen) { if (cgc->buflen) {


@ -568,7 +568,7 @@ static struct carm_request *carm_get_special(struct carm_host *host)
return NULL; return NULL;
rq = blk_get_request(host->oob_q, WRITE /* bogus */, GFP_KERNEL); rq = blk_get_request(host->oob_q, WRITE /* bogus */, GFP_KERNEL);
if (!rq) { if (IS_ERR(rq)) {
spin_lock_irqsave(&host->lock, flags); spin_lock_irqsave(&host->lock, flags);
carm_put_request(host, crq); carm_put_request(host, crq);
spin_unlock_irqrestore(&host->lock, flags); spin_unlock_irqrestore(&host->lock, flags);


@ -129,7 +129,7 @@ static inline void virtblk_request_done(struct request *req)
req->errors = (error != 0); req->errors = (error != 0);
} }
blk_mq_end_io(req, error); blk_mq_end_request(req, error);
} }
static void virtblk_done(struct virtqueue *vq) static void virtblk_done(struct virtqueue *vq)
@ -158,14 +158,14 @@ static void virtblk_done(struct virtqueue *vq)
spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags); spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
} }
static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req) static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req,
bool last)
{ {
struct virtio_blk *vblk = hctx->queue->queuedata; struct virtio_blk *vblk = hctx->queue->queuedata;
struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
unsigned long flags; unsigned long flags;
unsigned int num; unsigned int num;
int qid = hctx->queue_num; int qid = hctx->queue_num;
const bool last = (req->cmd_flags & REQ_END) != 0;
int err; int err;
bool notify = false; bool notify = false;
@ -199,6 +199,8 @@ static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
} }
} }
blk_mq_start_request(req);
num = blk_rq_map_sg(hctx->queue, vbr->req, vbr->sg); num = blk_rq_map_sg(hctx->queue, vbr->req, vbr->sg);
if (num) { if (num) {
if (rq_data_dir(vbr->req) == WRITE) if (rq_data_dir(vbr->req) == WRITE)

View file

@ -2180,8 +2180,8 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf,
len = nr * CD_FRAMESIZE_RAW; len = nr * CD_FRAMESIZE_RAW;
rq = blk_get_request(q, READ, GFP_KERNEL); rq = blk_get_request(q, READ, GFP_KERNEL);
if (!rq) { if (IS_ERR(rq)) {
ret = -ENOMEM; ret = PTR_ERR(rq);
break; break;
} }
blk_rq_set_block_pc(rq); blk_rq_set_block_pc(rq);


@ -46,7 +46,7 @@ static void issue_park_cmd(ide_drive_t *drive, unsigned long timeout)
* timeout has expired, so power management will be reenabled. * timeout has expired, so power management will be reenabled.
*/ */
rq = blk_get_request(q, READ, GFP_NOWAIT); rq = blk_get_request(q, READ, GFP_NOWAIT);
if (unlikely(!rq)) if (IS_ERR(rq))
goto out; goto out;
rq->cmd[0] = REQ_UNPARK_HEADS; rq->cmd[0] = REQ_UNPARK_HEADS;


@ -73,7 +73,6 @@ comment "SCSI support type (disk, tape, CD-ROM)"
config BLK_DEV_SD config BLK_DEV_SD
tristate "SCSI disk support" tristate "SCSI disk support"
depends on SCSI depends on SCSI
select CRC_T10DIF if BLK_DEV_INTEGRITY
---help--- ---help---
If you want to use SCSI hard disks, Fibre Channel disks, If you want to use SCSI hard disks, Fibre Channel disks,
Serial ATA (SATA) or Parallel ATA (PATA) hard disks, Serial ATA (SATA) or Parallel ATA (PATA) hard disks,


@ -115,7 +115,7 @@ static struct request *get_alua_req(struct scsi_device *sdev,
rq = blk_get_request(q, rw, GFP_NOIO); rq = blk_get_request(q, rw, GFP_NOIO);
if (!rq) { if (IS_ERR(rq)) {
sdev_printk(KERN_INFO, sdev, sdev_printk(KERN_INFO, sdev,
"%s: blk_get_request failed\n", __func__); "%s: blk_get_request failed\n", __func__);
return NULL; return NULL;


@ -275,7 +275,7 @@ static struct request *get_req(struct scsi_device *sdev, int cmd,
rq = blk_get_request(sdev->request_queue, rq = blk_get_request(sdev->request_queue,
(cmd != INQUIRY) ? WRITE : READ, GFP_NOIO); (cmd != INQUIRY) ? WRITE : READ, GFP_NOIO);
if (!rq) { if (IS_ERR(rq)) {
sdev_printk(KERN_INFO, sdev, "get_req: blk_get_request failed"); sdev_printk(KERN_INFO, sdev, "get_req: blk_get_request failed");
return NULL; return NULL;
} }


@ -117,7 +117,7 @@ static int hp_sw_tur(struct scsi_device *sdev, struct hp_sw_dh_data *h)
retry: retry:
req = blk_get_request(sdev->request_queue, WRITE, GFP_NOIO); req = blk_get_request(sdev->request_queue, WRITE, GFP_NOIO);
if (!req) if (IS_ERR(req))
return SCSI_DH_RES_TEMP_UNAVAIL; return SCSI_DH_RES_TEMP_UNAVAIL;
blk_rq_set_block_pc(req); blk_rq_set_block_pc(req);
@ -247,7 +247,7 @@ static int hp_sw_start_stop(struct hp_sw_dh_data *h)
struct request *req; struct request *req;
req = blk_get_request(h->sdev->request_queue, WRITE, GFP_ATOMIC); req = blk_get_request(h->sdev->request_queue, WRITE, GFP_ATOMIC);
if (!req) if (IS_ERR(req))
return SCSI_DH_RES_TEMP_UNAVAIL; return SCSI_DH_RES_TEMP_UNAVAIL;
blk_rq_set_block_pc(req); blk_rq_set_block_pc(req);


@ -274,7 +274,7 @@ static struct request *get_rdac_req(struct scsi_device *sdev,
rq = blk_get_request(q, rw, GFP_NOIO); rq = blk_get_request(q, rw, GFP_NOIO);
if (!rq) { if (IS_ERR(rq)) {
sdev_printk(KERN_INFO, sdev, sdev_printk(KERN_INFO, sdev,
"get_rdac_req: blk_get_request failed.\n"); "get_rdac_req: blk_get_request failed.\n");
return NULL; return NULL;


@ -1567,8 +1567,8 @@ static struct request *_make_request(struct request_queue *q, bool has_write,
struct request *req; struct request *req;
req = blk_get_request(q, has_write ? WRITE : READ, flags); req = blk_get_request(q, has_write ? WRITE : READ, flags);
if (unlikely(!req)) if (IS_ERR(req))
return ERR_PTR(-ENOMEM); return req;
blk_rq_set_block_pc(req); blk_rq_set_block_pc(req);
return req; return req;


@ -362,7 +362,7 @@ static int osst_execute(struct osst_request *SRpnt, const unsigned char *cmd,
int write = (data_direction == DMA_TO_DEVICE); int write = (data_direction == DMA_TO_DEVICE);
req = blk_get_request(SRpnt->stp->device->request_queue, write, GFP_KERNEL); req = blk_get_request(SRpnt->stp->device->request_queue, write, GFP_KERNEL);
if (!req) if (IS_ERR(req))
return DRIVER_ERROR << 24; return DRIVER_ERROR << 24;
blk_rq_set_block_pc(req); blk_rq_set_block_pc(req);


@ -1961,6 +1961,8 @@ static void scsi_eh_lock_door(struct scsi_device *sdev)
* request becomes available * request becomes available
*/ */
req = blk_get_request(sdev->request_queue, READ, GFP_KERNEL); req = blk_get_request(sdev->request_queue, READ, GFP_KERNEL);
if (IS_ERR(req))
return;
blk_rq_set_block_pc(req); blk_rq_set_block_pc(req);


@ -221,7 +221,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
int ret = DRIVER_ERROR << 24; int ret = DRIVER_ERROR << 24;
req = blk_get_request(sdev->request_queue, write, __GFP_WAIT); req = blk_get_request(sdev->request_queue, write, __GFP_WAIT);
if (!req) if (IS_ERR(req))
return ret; return ret;
blk_rq_set_block_pc(req); blk_rq_set_block_pc(req);
@ -715,7 +715,7 @@ static bool scsi_end_request(struct request *req, int error,
if (req->mq_ctx) { if (req->mq_ctx) {
/* /*
* In the MQ case the command gets freed by __blk_mq_end_io, * In the MQ case the command gets freed by __blk_mq_end_request,
* so we have to do all cleanup that depends on it earlier. * so we have to do all cleanup that depends on it earlier.
* *
* We also can't kick the queues from irq context, so we * We also can't kick the queues from irq context, so we
@ -723,7 +723,7 @@ static bool scsi_end_request(struct request *req, int error,
*/ */
scsi_mq_uninit_cmd(cmd); scsi_mq_uninit_cmd(cmd);
__blk_mq_end_io(req, error); __blk_mq_end_request(req, error);
if (scsi_target(sdev)->single_lun || if (scsi_target(sdev)->single_lun ||
!list_empty(&sdev->host->starved_list)) !list_empty(&sdev->host->starved_list))
@ -1847,6 +1847,8 @@ static int scsi_mq_prep_fn(struct request *req)
next_rq->special = bidi_sdb; next_rq->special = bidi_sdb;
} }
blk_mq_start_request(req);
return scsi_setup_cmnd(sdev, req); return scsi_setup_cmnd(sdev, req);
} }
@ -1856,7 +1858,8 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
blk_mq_complete_request(cmd->request); blk_mq_complete_request(cmd->request);
} }
static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req) static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req,
bool last)
{ {
struct request_queue *q = req->q; struct request_queue *q = req->q;
struct scsi_device *sdev = q->queuedata; struct scsi_device *sdev = q->queuedata;
@ -1880,11 +1883,14 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
if (!scsi_host_queue_ready(q, shost, sdev)) if (!scsi_host_queue_ready(q, shost, sdev))
goto out_dec_target_busy; goto out_dec_target_busy;
if (!(req->cmd_flags & REQ_DONTPREP)) { if (!(req->cmd_flags & REQ_DONTPREP)) {
ret = prep_to_mq(scsi_mq_prep_fn(req)); ret = prep_to_mq(scsi_mq_prep_fn(req));
if (ret) if (ret)
goto out_dec_host_busy; goto out_dec_host_busy;
req->cmd_flags |= REQ_DONTPREP; req->cmd_flags |= REQ_DONTPREP;
} else {
blk_mq_start_request(req);
} }
scsi_init_cmd_errh(cmd); scsi_init_cmd_errh(cmd);
@ -1931,6 +1937,14 @@ out:
return ret; return ret;
} }
static enum blk_eh_timer_return scsi_timeout(struct request *req,
bool reserved)
{
if (reserved)
return BLK_EH_RESET_TIMER;
return scsi_times_out(req);
}
static int scsi_init_request(void *data, struct request *rq, static int scsi_init_request(void *data, struct request *rq,
unsigned int hctx_idx, unsigned int request_idx, unsigned int hctx_idx, unsigned int request_idx,
unsigned int numa_node) unsigned int numa_node)
@ -2042,7 +2056,7 @@ static struct blk_mq_ops scsi_mq_ops = {
.map_queue = blk_mq_map_queue, .map_queue = blk_mq_map_queue,
.queue_rq = scsi_queue_rq, .queue_rq = scsi_queue_rq,
.complete = scsi_softirq_done, .complete = scsi_softirq_done,
.timeout = scsi_times_out, .timeout = scsi_timeout,
.init_request = scsi_init_request, .init_request = scsi_init_request,
.exit_request = scsi_exit_request, .exit_request = scsi_exit_request,
}; };


@ -610,29 +610,44 @@ static void scsi_disk_put(struct scsi_disk *sdkp)
mutex_unlock(&sd_ref_mutex); mutex_unlock(&sd_ref_mutex);
} }
static void sd_prot_op(struct scsi_cmnd *scmd, unsigned int dif)
{
unsigned int prot_op = SCSI_PROT_NORMAL;
unsigned int dix = scsi_prot_sg_count(scmd);
if (scmd->sc_data_direction == DMA_FROM_DEVICE) {
if (dif && dix) static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd,
prot_op = SCSI_PROT_READ_PASS; unsigned int dix, unsigned int dif)
else if (dif && !dix) {
prot_op = SCSI_PROT_READ_STRIP; struct bio *bio = scmd->request->bio;
else if (!dif && dix) unsigned int prot_op = sd_prot_op(rq_data_dir(scmd->request), dix, dif);
prot_op = SCSI_PROT_READ_INSERT; unsigned int protect = 0;
} else {
if (dif && dix) if (dix) { /* DIX Type 0, 1, 2, 3 */
prot_op = SCSI_PROT_WRITE_PASS; if (bio_integrity_flagged(bio, BIP_IP_CHECKSUM))
else if (dif && !dix) scmd->prot_flags |= SCSI_PROT_IP_CHECKSUM;
prot_op = SCSI_PROT_WRITE_INSERT;
else if (!dif && dix) if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false)
prot_op = SCSI_PROT_WRITE_STRIP; scmd->prot_flags |= SCSI_PROT_GUARD_CHECK;
}
if (dif != SD_DIF_TYPE3_PROTECTION) { /* DIX/DIF Type 0, 1, 2 */
scmd->prot_flags |= SCSI_PROT_REF_INCREMENT;
if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false)
scmd->prot_flags |= SCSI_PROT_REF_CHECK;
}
if (dif) { /* DIX/DIF Type 1, 2, 3 */
scmd->prot_flags |= SCSI_PROT_TRANSFER_PI;
if (bio_integrity_flagged(bio, BIP_DISK_NOCHECK))
protect = 3 << 5; /* Disable target PI checking */
else
protect = 1 << 5; /* Enable target PI checking */
} }
scsi_set_prot_op(scmd, prot_op); scsi_set_prot_op(scmd, prot_op);
scsi_set_prot_type(scmd, dif); scsi_set_prot_type(scmd, dif);
scmd->prot_flags &= sd_prot_flag_mask(prot_op);
return protect;
} }
static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
@ -893,7 +908,8 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
sector_t block = blk_rq_pos(rq); sector_t block = blk_rq_pos(rq);
sector_t threshold; sector_t threshold;
unsigned int this_count = blk_rq_sectors(rq); unsigned int this_count = blk_rq_sectors(rq);
int ret, host_dif; unsigned int dif, dix;
int ret;
unsigned char protect; unsigned char protect;
ret = scsi_init_io(SCpnt, GFP_ATOMIC); ret = scsi_init_io(SCpnt, GFP_ATOMIC);
@ -995,7 +1011,7 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
SCpnt->cmnd[0] = WRITE_6; SCpnt->cmnd[0] = WRITE_6;
if (blk_integrity_rq(rq)) if (blk_integrity_rq(rq))
sd_dif_prepare(rq, block, sdp->sector_size); sd_dif_prepare(SCpnt);
} else if (rq_data_dir(rq) == READ) { } else if (rq_data_dir(rq) == READ) {
SCpnt->cmnd[0] = READ_6; SCpnt->cmnd[0] = READ_6;
@ -1010,14 +1026,15 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
"writing" : "reading", this_count, "writing" : "reading", this_count,
blk_rq_sectors(rq))); blk_rq_sectors(rq)));
/* Set RDPROTECT/WRPROTECT if disk is formatted with DIF */ dix = scsi_prot_sg_count(SCpnt);
host_dif = scsi_host_dif_capable(sdp->host, sdkp->protection_type); dif = scsi_host_dif_capable(SCpnt->device->host, sdkp->protection_type);
if (host_dif)
protect = 1 << 5; if (dif || dix)
protect = sd_setup_protect_cmnd(SCpnt, dix, dif);
else else
protect = 0; protect = 0;
if (host_dif == SD_DIF_TYPE2_PROTECTION) { if (protect && sdkp->protection_type == SD_DIF_TYPE2_PROTECTION) {
SCpnt->cmnd = mempool_alloc(sd_cdb_pool, GFP_ATOMIC); SCpnt->cmnd = mempool_alloc(sd_cdb_pool, GFP_ATOMIC);
if (unlikely(SCpnt->cmnd == NULL)) { if (unlikely(SCpnt->cmnd == NULL)) {
@ -1102,10 +1119,6 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
} }
SCpnt->sdb.length = this_count * sdp->sector_size; SCpnt->sdb.length = this_count * sdp->sector_size;
/* If DIF or DIX is enabled, tell HBA how to handle request */
if (host_dif || scsi_prot_sg_count(SCpnt))
sd_prot_op(SCpnt, host_dif);
/* /*
* We shouldn't disconnect in the middle of a sector, so with a dumb * We shouldn't disconnect in the middle of a sector, so with a dumb
* host adapter, it's safe to assume that we can at least transfer * host adapter, it's safe to assume that we can at least transfer


@ -166,6 +166,68 @@ enum sd_dif_target_protection_types {
SD_DIF_TYPE3_PROTECTION = 0x3, SD_DIF_TYPE3_PROTECTION = 0x3,
}; };
/*
* Look up the DIX operation based on whether the command is read or
* write and whether dix and dif are enabled.
*/
static inline unsigned int sd_prot_op(bool write, bool dix, bool dif)
{
/* Lookup table: bit 2 (write), bit 1 (dix), bit 0 (dif) */
const unsigned int ops[] = { /* wrt dix dif */
SCSI_PROT_NORMAL, /* 0 0 0 */
SCSI_PROT_READ_STRIP, /* 0 0 1 */
SCSI_PROT_READ_INSERT, /* 0 1 0 */
SCSI_PROT_READ_PASS, /* 0 1 1 */
SCSI_PROT_NORMAL, /* 1 0 0 */
SCSI_PROT_WRITE_INSERT, /* 1 0 1 */
SCSI_PROT_WRITE_STRIP, /* 1 1 0 */
SCSI_PROT_WRITE_PASS, /* 1 1 1 */
};
return ops[write << 2 | dix << 1 | dif];
}
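
As a quick reference, here is a stand-alone user-space sketch of the same write/dix/dif lookup; the enum values are illustrative stand-ins for the kernel's SCSI_PROT_* constants, and only the indexing scheme is meant to carry over.

/* User-space illustration of the write/dix/dif lookup above. */
#include <stdio.h>

enum prot_op {
	PROT_NORMAL,		/* no protection handling */
	PROT_READ_STRIP,	/* HBA strips PI coming from a PI-formatted disk */
	PROT_READ_INSERT,	/* HBA inserts PI for the host; disk has none */
	PROT_READ_PASS,		/* PI passed through end to end */
	PROT_WRITE_INSERT,	/* HBA generates PI before writing to a PI disk */
	PROT_WRITE_STRIP,	/* HBA strips host-supplied PI; disk stores none */
	PROT_WRITE_PASS,	/* PI passed through end to end */
};

static const enum prot_op ops[8] = {	/* index: write << 2 | dix << 1 | dif */
	PROT_NORMAL,  PROT_READ_STRIP,   PROT_READ_INSERT, PROT_READ_PASS,
	PROT_NORMAL,  PROT_WRITE_INSERT, PROT_WRITE_STRIP, PROT_WRITE_PASS,
};

int main(void)
{
	static const char *names[] = {
		"NORMAL", "READ_STRIP", "READ_INSERT", "READ_PASS",
		"WRITE_INSERT", "WRITE_STRIP", "WRITE_PASS",
	};

	/* READ from a non-PI disk through a DIX-capable HBA */
	printf("read  dix=1 dif=0 -> %s\n", names[ops[0 << 2 | 1 << 1 | 0]]);
	/* WRITE to a PI-formatted disk without DIX on the host side */
	printf("write dix=0 dif=1 -> %s\n", names[ops[1 << 2 | 0 << 1 | 1]]);
	return 0;
}
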
/*
* Returns a mask of the protection flags that are valid for a given DIX
* operation.
*/
static inline unsigned int sd_prot_flag_mask(unsigned int prot_op)
{
const unsigned int flag_mask[] = {
[SCSI_PROT_NORMAL] = 0,
[SCSI_PROT_READ_STRIP] = SCSI_PROT_TRANSFER_PI |
SCSI_PROT_GUARD_CHECK |
SCSI_PROT_REF_CHECK |
SCSI_PROT_REF_INCREMENT,
[SCSI_PROT_READ_INSERT] = SCSI_PROT_REF_INCREMENT |
SCSI_PROT_IP_CHECKSUM,
[SCSI_PROT_READ_PASS] = SCSI_PROT_TRANSFER_PI |
SCSI_PROT_GUARD_CHECK |
SCSI_PROT_REF_CHECK |
SCSI_PROT_REF_INCREMENT |
SCSI_PROT_IP_CHECKSUM,
[SCSI_PROT_WRITE_INSERT] = SCSI_PROT_TRANSFER_PI |
SCSI_PROT_REF_INCREMENT,
[SCSI_PROT_WRITE_STRIP] = SCSI_PROT_GUARD_CHECK |
SCSI_PROT_REF_CHECK |
SCSI_PROT_REF_INCREMENT |
SCSI_PROT_IP_CHECKSUM,
[SCSI_PROT_WRITE_PASS] = SCSI_PROT_TRANSFER_PI |
SCSI_PROT_GUARD_CHECK |
SCSI_PROT_REF_CHECK |
SCSI_PROT_REF_INCREMENT |
SCSI_PROT_IP_CHECKSUM,
};
return flag_mask[prot_op];
}
/* /*
* Data Integrity Field tuple. * Data Integrity Field tuple.
*/ */
@ -178,7 +240,7 @@ struct sd_dif_tuple {
#ifdef CONFIG_BLK_DEV_INTEGRITY #ifdef CONFIG_BLK_DEV_INTEGRITY
extern void sd_dif_config_host(struct scsi_disk *); extern void sd_dif_config_host(struct scsi_disk *);
extern void sd_dif_prepare(struct request *rq, sector_t, unsigned int); extern void sd_dif_prepare(struct scsi_cmnd *scmd);
extern void sd_dif_complete(struct scsi_cmnd *, unsigned int); extern void sd_dif_complete(struct scsi_cmnd *, unsigned int);
#else /* CONFIG_BLK_DEV_INTEGRITY */ #else /* CONFIG_BLK_DEV_INTEGRITY */
@ -186,7 +248,7 @@ extern void sd_dif_complete(struct scsi_cmnd *, unsigned int);
static inline void sd_dif_config_host(struct scsi_disk *disk) static inline void sd_dif_config_host(struct scsi_disk *disk)
{ {
} }
static inline int sd_dif_prepare(struct request *rq, sector_t s, unsigned int a) static inline int sd_dif_prepare(struct scsi_cmnd *scmd)
{ {
return 0; return 0;
} }


@ -21,7 +21,7 @@
*/ */
#include <linux/blkdev.h> #include <linux/blkdev.h>
#include <linux/crc-t10dif.h> #include <linux/t10-pi.h>
#include <scsi/scsi.h> #include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h> #include <scsi/scsi_cmnd.h>
@ -33,268 +33,8 @@
#include <scsi/scsi_ioctl.h> #include <scsi/scsi_ioctl.h>
#include <scsi/scsicam.h> #include <scsi/scsicam.h>
#include <net/checksum.h>
#include "sd.h" #include "sd.h"
typedef __u16 (csum_fn) (void *, unsigned int);
static __u16 sd_dif_crc_fn(void *data, unsigned int len)
{
return cpu_to_be16(crc_t10dif(data, len));
}
static __u16 sd_dif_ip_fn(void *data, unsigned int len)
{
return ip_compute_csum(data, len);
}
/*
* Type 1 and Type 2 protection use the same format: 16 bit guard tag,
* 16 bit app tag, 32 bit reference tag.
*/
static void sd_dif_type1_generate(struct blk_integrity_exchg *bix, csum_fn *fn)
{
void *buf = bix->data_buf;
struct sd_dif_tuple *sdt = bix->prot_buf;
sector_t sector = bix->sector;
unsigned int i;
for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
sdt->guard_tag = fn(buf, bix->sector_size);
sdt->ref_tag = cpu_to_be32(sector & 0xffffffff);
sdt->app_tag = 0;
buf += bix->sector_size;
sector++;
}
}
static void sd_dif_type1_generate_crc(struct blk_integrity_exchg *bix)
{
sd_dif_type1_generate(bix, sd_dif_crc_fn);
}
static void sd_dif_type1_generate_ip(struct blk_integrity_exchg *bix)
{
sd_dif_type1_generate(bix, sd_dif_ip_fn);
}
static int sd_dif_type1_verify(struct blk_integrity_exchg *bix, csum_fn *fn)
{
void *buf = bix->data_buf;
struct sd_dif_tuple *sdt = bix->prot_buf;
sector_t sector = bix->sector;
unsigned int i;
__u16 csum;
for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
/* Unwritten sectors */
if (sdt->app_tag == 0xffff)
return 0;
if (be32_to_cpu(sdt->ref_tag) != (sector & 0xffffffff)) {
printk(KERN_ERR
"%s: ref tag error on sector %lu (rcvd %u)\n",
bix->disk_name, (unsigned long)sector,
be32_to_cpu(sdt->ref_tag));
return -EIO;
}
csum = fn(buf, bix->sector_size);
if (sdt->guard_tag != csum) {
printk(KERN_ERR "%s: guard tag error on sector %lu " \
"(rcvd %04x, data %04x)\n", bix->disk_name,
(unsigned long)sector,
be16_to_cpu(sdt->guard_tag), be16_to_cpu(csum));
return -EIO;
}
buf += bix->sector_size;
sector++;
}
return 0;
}
static int sd_dif_type1_verify_crc(struct blk_integrity_exchg *bix)
{
return sd_dif_type1_verify(bix, sd_dif_crc_fn);
}
static int sd_dif_type1_verify_ip(struct blk_integrity_exchg *bix)
{
return sd_dif_type1_verify(bix, sd_dif_ip_fn);
}
/*
* Functions for interleaving and deinterleaving application tags
*/
static void sd_dif_type1_set_tag(void *prot, void *tag_buf, unsigned int sectors)
{
struct sd_dif_tuple *sdt = prot;
u8 *tag = tag_buf;
unsigned int i, j;
for (i = 0, j = 0 ; i < sectors ; i++, j += 2, sdt++) {
sdt->app_tag = tag[j] << 8 | tag[j+1];
BUG_ON(sdt->app_tag == 0xffff);
}
}
static void sd_dif_type1_get_tag(void *prot, void *tag_buf, unsigned int sectors)
{
struct sd_dif_tuple *sdt = prot;
u8 *tag = tag_buf;
unsigned int i, j;
for (i = 0, j = 0 ; i < sectors ; i++, j += 2, sdt++) {
tag[j] = (sdt->app_tag & 0xff00) >> 8;
tag[j+1] = sdt->app_tag & 0xff;
}
}
static struct blk_integrity dif_type1_integrity_crc = {
.name = "T10-DIF-TYPE1-CRC",
.generate_fn = sd_dif_type1_generate_crc,
.verify_fn = sd_dif_type1_verify_crc,
.get_tag_fn = sd_dif_type1_get_tag,
.set_tag_fn = sd_dif_type1_set_tag,
.tuple_size = sizeof(struct sd_dif_tuple),
.tag_size = 0,
};
static struct blk_integrity dif_type1_integrity_ip = {
.name = "T10-DIF-TYPE1-IP",
.generate_fn = sd_dif_type1_generate_ip,
.verify_fn = sd_dif_type1_verify_ip,
.get_tag_fn = sd_dif_type1_get_tag,
.set_tag_fn = sd_dif_type1_set_tag,
.tuple_size = sizeof(struct sd_dif_tuple),
.tag_size = 0,
};
/*
* Type 3 protection has a 16-bit guard tag and 16 + 32 bits of opaque
* tag space.
*/
static void sd_dif_type3_generate(struct blk_integrity_exchg *bix, csum_fn *fn)
{
void *buf = bix->data_buf;
struct sd_dif_tuple *sdt = bix->prot_buf;
unsigned int i;
for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
sdt->guard_tag = fn(buf, bix->sector_size);
sdt->ref_tag = 0;
sdt->app_tag = 0;
buf += bix->sector_size;
}
}
static void sd_dif_type3_generate_crc(struct blk_integrity_exchg *bix)
{
sd_dif_type3_generate(bix, sd_dif_crc_fn);
}
static void sd_dif_type3_generate_ip(struct blk_integrity_exchg *bix)
{
sd_dif_type3_generate(bix, sd_dif_ip_fn);
}
static int sd_dif_type3_verify(struct blk_integrity_exchg *bix, csum_fn *fn)
{
void *buf = bix->data_buf;
struct sd_dif_tuple *sdt = bix->prot_buf;
sector_t sector = bix->sector;
unsigned int i;
__u16 csum;
for (i = 0 ; i < bix->data_size ; i += bix->sector_size, sdt++) {
/* Unwritten sectors */
if (sdt->app_tag == 0xffff && sdt->ref_tag == 0xffffffff)
return 0;
csum = fn(buf, bix->sector_size);
if (sdt->guard_tag != csum) {
printk(KERN_ERR "%s: guard tag error on sector %lu " \
"(rcvd %04x, data %04x)\n", bix->disk_name,
(unsigned long)sector,
be16_to_cpu(sdt->guard_tag), be16_to_cpu(csum));
return -EIO;
}
buf += bix->sector_size;
sector++;
}
return 0;
}
static int sd_dif_type3_verify_crc(struct blk_integrity_exchg *bix)
{
return sd_dif_type3_verify(bix, sd_dif_crc_fn);
}
static int sd_dif_type3_verify_ip(struct blk_integrity_exchg *bix)
{
return sd_dif_type3_verify(bix, sd_dif_ip_fn);
}
static void sd_dif_type3_set_tag(void *prot, void *tag_buf, unsigned int sectors)
{
struct sd_dif_tuple *sdt = prot;
u8 *tag = tag_buf;
unsigned int i, j;
for (i = 0, j = 0 ; i < sectors ; i++, j += 6, sdt++) {
sdt->app_tag = tag[j] << 8 | tag[j+1];
sdt->ref_tag = tag[j+2] << 24 | tag[j+3] << 16 |
tag[j+4] << 8 | tag[j+5];
}
}
static void sd_dif_type3_get_tag(void *prot, void *tag_buf, unsigned int sectors)
{
struct sd_dif_tuple *sdt = prot;
u8 *tag = tag_buf;
unsigned int i, j;
for (i = 0, j = 0 ; i < sectors ; i++, j += 2, sdt++) {
tag[j] = (sdt->app_tag & 0xff00) >> 8;
tag[j+1] = sdt->app_tag & 0xff;
tag[j+2] = (sdt->ref_tag & 0xff000000) >> 24;
tag[j+3] = (sdt->ref_tag & 0xff0000) >> 16;
tag[j+4] = (sdt->ref_tag & 0xff00) >> 8;
tag[j+5] = sdt->ref_tag & 0xff;
BUG_ON(sdt->app_tag == 0xffff || sdt->ref_tag == 0xffffffff);
}
}
static struct blk_integrity dif_type3_integrity_crc = {
.name = "T10-DIF-TYPE3-CRC",
.generate_fn = sd_dif_type3_generate_crc,
.verify_fn = sd_dif_type3_verify_crc,
.get_tag_fn = sd_dif_type3_get_tag,
.set_tag_fn = sd_dif_type3_set_tag,
.tuple_size = sizeof(struct sd_dif_tuple),
.tag_size = 0,
};
static struct blk_integrity dif_type3_integrity_ip = {
.name = "T10-DIF-TYPE3-IP",
.generate_fn = sd_dif_type3_generate_ip,
.verify_fn = sd_dif_type3_verify_ip,
.get_tag_fn = sd_dif_type3_get_tag,
.set_tag_fn = sd_dif_type3_set_tag,
.tuple_size = sizeof(struct sd_dif_tuple),
.tag_size = 0,
};
/* /*
* Configure exchange of protection information between OS and HBA. * Configure exchange of protection information between OS and HBA.
*/ */
@ -316,22 +56,30 @@ void sd_dif_config_host(struct scsi_disk *sdkp)
return; return;
/* Enable DMA of protection information */ /* Enable DMA of protection information */
if (scsi_host_get_guard(sdkp->device->host) & SHOST_DIX_GUARD_IP) if (scsi_host_get_guard(sdkp->device->host) & SHOST_DIX_GUARD_IP) {
if (type == SD_DIF_TYPE3_PROTECTION) if (type == SD_DIF_TYPE3_PROTECTION)
blk_integrity_register(disk, &dif_type3_integrity_ip); blk_integrity_register(disk, &t10_pi_type3_ip);
else else
blk_integrity_register(disk, &dif_type1_integrity_ip); blk_integrity_register(disk, &t10_pi_type1_ip);
else
disk->integrity->flags |= BLK_INTEGRITY_IP_CHECKSUM;
} else
if (type == SD_DIF_TYPE3_PROTECTION) if (type == SD_DIF_TYPE3_PROTECTION)
blk_integrity_register(disk, &dif_type3_integrity_crc); blk_integrity_register(disk, &t10_pi_type3_crc);
else else
blk_integrity_register(disk, &dif_type1_integrity_crc); blk_integrity_register(disk, &t10_pi_type1_crc);
sd_printk(KERN_NOTICE, sdkp, sd_printk(KERN_NOTICE, sdkp,
"Enabling DIX %s protection\n", disk->integrity->name); "Enabling DIX %s protection\n", disk->integrity->name);
/* Signal to block layer that we support sector tagging */ /* Signal to block layer that we support sector tagging */
if (dif && type && sdkp->ATO) { if (dif && type) {
disk->integrity->flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
if (!sdkp->ATO)
return;
if (type == SD_DIF_TYPE3_PROTECTION) if (type == SD_DIF_TYPE3_PROTECTION)
disk->integrity->tag_size = sizeof(u16) + sizeof(u32); disk->integrity->tag_size = sizeof(u16) + sizeof(u32);
else else
@ -358,50 +106,49 @@ void sd_dif_config_host(struct scsi_disk *sdkp)
* *
* Type 3 does not have a reference tag so no remapping is required. * Type 3 does not have a reference tag so no remapping is required.
*/ */
void sd_dif_prepare(struct request *rq, sector_t hw_sector, void sd_dif_prepare(struct scsi_cmnd *scmd)
unsigned int sector_sz)
{ {
const int tuple_sz = sizeof(struct sd_dif_tuple); const int tuple_sz = sizeof(struct t10_pi_tuple);
struct bio *bio; struct bio *bio;
struct scsi_disk *sdkp; struct scsi_disk *sdkp;
struct sd_dif_tuple *sdt; struct t10_pi_tuple *pi;
u32 phys, virt; u32 phys, virt;
sdkp = rq->bio->bi_bdev->bd_disk->private_data; sdkp = scsi_disk(scmd->request->rq_disk);
if (sdkp->protection_type == SD_DIF_TYPE3_PROTECTION) if (sdkp->protection_type == SD_DIF_TYPE3_PROTECTION)
return; return;
phys = hw_sector & 0xffffffff; phys = scsi_prot_ref_tag(scmd);
__rq_for_each_bio(bio, rq) { __rq_for_each_bio(bio, scmd->request) {
struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_vec iv; struct bio_vec iv;
struct bvec_iter iter; struct bvec_iter iter;
unsigned int j; unsigned int j;
/* Already remapped? */ /* Already remapped? */
if (bio_flagged(bio, BIO_MAPPED_INTEGRITY)) if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
break; break;
virt = bio->bi_integrity->bip_iter.bi_sector & 0xffffffff; virt = bip_get_seed(bip) & 0xffffffff;
bip_for_each_vec(iv, bio->bi_integrity, iter) { bip_for_each_vec(iv, bip, iter) {
sdt = kmap_atomic(iv.bv_page) pi = kmap_atomic(iv.bv_page) + iv.bv_offset;
+ iv.bv_offset;
for (j = 0; j < iv.bv_len; j += tuple_sz, sdt++) { for (j = 0; j < iv.bv_len; j += tuple_sz, pi++) {
if (be32_to_cpu(sdt->ref_tag) == virt) if (be32_to_cpu(pi->ref_tag) == virt)
sdt->ref_tag = cpu_to_be32(phys); pi->ref_tag = cpu_to_be32(phys);
virt++; virt++;
phys++; phys++;
} }
kunmap_atomic(sdt); kunmap_atomic(pi);
} }
bio->bi_flags |= (1 << BIO_MAPPED_INTEGRITY); bip->bip_flags |= BIP_MAPPED_INTEGRITY;
} }
} }
@ -411,11 +158,11 @@ void sd_dif_prepare(struct request *rq, sector_t hw_sector,
*/ */
void sd_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes) void sd_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes)
{ {
const int tuple_sz = sizeof(struct sd_dif_tuple); const int tuple_sz = sizeof(struct t10_pi_tuple);
struct scsi_disk *sdkp; struct scsi_disk *sdkp;
struct bio *bio; struct bio *bio;
struct sd_dif_tuple *sdt; struct t10_pi_tuple *pi;
unsigned int j, sectors, sector_sz; unsigned int j, intervals;
u32 phys, virt; u32 phys, virt;
sdkp = scsi_disk(scmd->request->rq_disk); sdkp = scsi_disk(scmd->request->rq_disk);
@ -423,39 +170,35 @@ void sd_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes)
if (sdkp->protection_type == SD_DIF_TYPE3_PROTECTION || good_bytes == 0) if (sdkp->protection_type == SD_DIF_TYPE3_PROTECTION || good_bytes == 0)
return; return;
sector_sz = scmd->device->sector_size; intervals = good_bytes / scsi_prot_interval(scmd);
sectors = good_bytes / sector_sz; phys = scsi_prot_ref_tag(scmd);
phys = blk_rq_pos(scmd->request) & 0xffffffff;
if (sector_sz == 4096)
phys >>= 3;
__rq_for_each_bio(bio, scmd->request) { __rq_for_each_bio(bio, scmd->request) {
struct bio_integrity_payload *bip = bio_integrity(bio);
struct bio_vec iv; struct bio_vec iv;
struct bvec_iter iter; struct bvec_iter iter;
virt = bio->bi_integrity->bip_iter.bi_sector & 0xffffffff; virt = bip_get_seed(bip) & 0xffffffff;
bip_for_each_vec(iv, bio->bi_integrity, iter) { bip_for_each_vec(iv, bip, iter) {
sdt = kmap_atomic(iv.bv_page) pi = kmap_atomic(iv.bv_page) + iv.bv_offset;
+ iv.bv_offset;
for (j = 0; j < iv.bv_len; j += tuple_sz, sdt++) { for (j = 0; j < iv.bv_len; j += tuple_sz, pi++) {
if (sectors == 0) { if (intervals == 0) {
kunmap_atomic(sdt); kunmap_atomic(pi);
return; return;
} }
if (be32_to_cpu(sdt->ref_tag) == phys) if (be32_to_cpu(pi->ref_tag) == phys)
sdt->ref_tag = cpu_to_be32(virt); pi->ref_tag = cpu_to_be32(virt);
virt++; virt++;
phys++; phys++;
sectors--; intervals--;
} }
kunmap_atomic(sdt); kunmap_atomic(pi);
} }
} }
} }


@ -1711,9 +1711,9 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
} }
rq = blk_get_request(q, rw, GFP_ATOMIC); rq = blk_get_request(q, rw, GFP_ATOMIC);
if (!rq) { if (IS_ERR(rq)) {
kfree(long_cmdp); kfree(long_cmdp);
return -ENOMEM; return PTR_ERR(rq);
} }
blk_rq_set_block_pc(rq); blk_rq_set_block_pc(rq);


@ -490,7 +490,7 @@ static int st_scsi_execute(struct st_request *SRpnt, const unsigned char *cmd,
req = blk_get_request(SRpnt->stp->device->request_queue, write, req = blk_get_request(SRpnt->stp->device->request_queue, write,
GFP_KERNEL); GFP_KERNEL);
if (!req) if (IS_ERR(req))
return DRIVER_ERROR << 24; return DRIVER_ERROR << 24;
blk_rq_set_block_pc(req); blk_rq_set_block_pc(req);


@ -1050,7 +1050,7 @@ pscsi_execute_cmd(struct se_cmd *cmd)
req = blk_get_request(pdv->pdv_sd->request_queue, req = blk_get_request(pdv->pdv_sd->request_queue,
(data_direction == DMA_TO_DEVICE), (data_direction == DMA_TO_DEVICE),
GFP_KERNEL); GFP_KERNEL);
if (!req) { if (IS_ERR(req)) {
pr_err("PSCSI: blk_get_request() failed\n"); pr_err("PSCSI: blk_get_request() failed\n");
ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; ret = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
goto fail; goto fail;


@ -50,32 +50,22 @@ inline struct block_device *I_BDEV(struct inode *inode)
EXPORT_SYMBOL(I_BDEV); EXPORT_SYMBOL(I_BDEV);
/* /*
* Move the inode from its current bdi to a new bdi. If the inode is dirty we * Move the inode from its current bdi to a new bdi. Make sure the inode
* need to move it onto the dirty list of @dst so that the inode is always on * is clean before moving so that it doesn't linger on the old bdi.
* the right list.
*/ */
static void bdev_inode_switch_bdi(struct inode *inode, static void bdev_inode_switch_bdi(struct inode *inode,
struct backing_dev_info *dst) struct backing_dev_info *dst)
{ {
struct backing_dev_info *old = inode->i_data.backing_dev_info; while (true) {
bool wakeup_bdi = false; spin_lock(&inode->i_lock);
if (!(inode->i_state & I_DIRTY)) {
if (unlikely(dst == old)) /* deadlock avoidance */ inode->i_data.backing_dev_info = dst;
return; spin_unlock(&inode->i_lock);
bdi_lock_two(&old->wb, &dst->wb); return;
spin_lock(&inode->i_lock); }
inode->i_data.backing_dev_info = dst; spin_unlock(&inode->i_lock);
if (inode->i_state & I_DIRTY) { WARN_ON_ONCE(write_inode_now(inode, true));
if (bdi_cap_writeback_dirty(dst) && !wb_has_dirty_io(&dst->wb))
wakeup_bdi = true;
list_move(&inode->i_wb_list, &dst->wb.b_dirty);
} }
spin_unlock(&inode->i_lock);
spin_unlock(&old->wb.list_lock);
spin_unlock(&dst->wb.list_lock);
if (wakeup_bdi)
bdi_wakeup_thread_delayed(dst);
} }
/* Kill _all_ buffers and pagecache , dirty or not.. */ /* Kill _all_ buffers and pagecache , dirty or not.. */
@ -1179,8 +1169,6 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
if (!ret) { if (!ret) {
bd_set_size(bdev,(loff_t)get_capacity(disk)<<9); bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
bdi = blk_get_backing_dev_info(bdev); bdi = blk_get_backing_dev_info(bdev);
if (bdi == NULL)
bdi = &default_backing_dev_info;
bdev_inode_switch_bdi(bdev->bd_inode, bdi); bdev_inode_switch_bdi(bdev->bd_inode, bdi);
} }


@ -1702,7 +1702,7 @@ static int btrfs_congested_fn(void *congested_data, int bdi_bits)
if (!device->bdev) if (!device->bdev)
continue; continue;
bdi = blk_get_backing_dev_info(device->bdev); bdi = blk_get_backing_dev_info(device->bdev);
if (bdi && bdi_congested(bdi, bdi_bits)) { if (bdi_congested(bdi, bdi_bits)) {
ret = 1; ret = 1;
break; break;
} }


@ -220,11 +220,9 @@ ssize_t nfs_direct_IO(int rw, struct kiocb *iocb, struct iov_iter *iter, loff_t
#else #else
VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE); VM_BUG_ON(iocb->ki_nbytes != PAGE_SIZE);
if (rw == READ || rw == KERNEL_READ) if (rw == READ)
return nfs_file_direct_read(iocb, iter, pos, return nfs_file_direct_read(iocb, iter, pos);
rw == READ ? true : false); return nfs_file_direct_write(iocb, iter, pos);
return nfs_file_direct_write(iocb, iter, pos,
rw == WRITE ? true : false);
#endif /* CONFIG_NFS_SWAP */ #endif /* CONFIG_NFS_SWAP */
} }
@ -510,7 +508,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
* cache. * cache.
*/ */
ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
loff_t pos, bool uio) loff_t pos)
{ {
struct file *file = iocb->ki_filp; struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping; struct address_space *mapping = file->f_mapping;
@ -879,7 +877,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
* is no atomic O_APPEND write facility in the NFS protocol. * is no atomic O_APPEND write facility in the NFS protocol.
*/ */
ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
loff_t pos, bool uio) loff_t pos)
{ {
ssize_t result = -EINVAL; ssize_t result = -EINVAL;
struct file *file = iocb->ki_filp; struct file *file = iocb->ki_filp;


@ -172,7 +172,7 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
ssize_t result; ssize_t result;
if (iocb->ki_filp->f_flags & O_DIRECT) if (iocb->ki_filp->f_flags & O_DIRECT)
return nfs_file_direct_read(iocb, to, iocb->ki_pos, true); return nfs_file_direct_read(iocb, to, iocb->ki_pos);
dprintk("NFS: read(%pD2, %zu@%lu)\n", dprintk("NFS: read(%pD2, %zu@%lu)\n",
iocb->ki_filp, iocb->ki_filp,
@ -676,7 +676,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
return result; return result;
if (file->f_flags & O_DIRECT) if (file->f_flags & O_DIRECT)
return nfs_file_direct_write(iocb, from, pos, true); return nfs_file_direct_write(iocb, from, pos);
dprintk("NFS: write(%pD2, %zu@%Ld)\n", dprintk("NFS: write(%pD2, %zu@%Ld)\n",
file, count, (long long) pos); file, count, (long long) pos);


@ -1670,8 +1670,6 @@ xfs_alloc_buftarg(
btp->bt_dev = bdev->bd_dev; btp->bt_dev = bdev->bd_dev;
btp->bt_bdev = bdev; btp->bt_bdev = bdev;
btp->bt_bdi = blk_get_backing_dev_info(bdev); btp->bt_bdi = blk_get_backing_dev_info(bdev);
if (!btp->bt_bdi)
goto error;
if (xfs_setsize_buftarg_early(btp, bdev)) if (xfs_setsize_buftarg_early(btp, bdev))
goto error; goto error;


@ -28,12 +28,10 @@ struct dentry;
* Bits in backing_dev_info.state * Bits in backing_dev_info.state
*/ */
enum bdi_state { enum bdi_state {
BDI_wb_alloc, /* Default embedded wb allocated */
BDI_async_congested, /* The async (write) queue is getting full */ BDI_async_congested, /* The async (write) queue is getting full */
BDI_sync_congested, /* The sync queue is getting full */ BDI_sync_congested, /* The sync queue is getting full */
BDI_registered, /* bdi_register() was done */ BDI_registered, /* bdi_register() was done */
BDI_writeback_running, /* Writeback is in progress */ BDI_writeback_running, /* Writeback is in progress */
BDI_unused, /* Available bits start here */
}; };
typedef int (congested_fn)(void *, int); typedef int (congested_fn)(void *, int);
@ -50,7 +48,6 @@ enum bdi_stat_item {
struct bdi_writeback { struct bdi_writeback {
struct backing_dev_info *bdi; /* our parent bdi */ struct backing_dev_info *bdi; /* our parent bdi */
unsigned int nr;
unsigned long last_old_flush; /* last old data flush */ unsigned long last_old_flush; /* last old data flush */
@ -124,7 +121,6 @@ void bdi_start_background_writeback(struct backing_dev_info *bdi);
void bdi_writeback_workfn(struct work_struct *work); void bdi_writeback_workfn(struct work_struct *work);
int bdi_has_dirty_io(struct backing_dev_info *bdi); int bdi_has_dirty_io(struct backing_dev_info *bdi);
void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi); void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi);
void bdi_lock_two(struct bdi_writeback *wb1, struct bdi_writeback *wb2);
extern spinlock_t bdi_lock; extern spinlock_t bdi_lock;
extern struct list_head bdi_list; extern struct list_head bdi_list;


@ -292,7 +292,24 @@ static inline unsigned bio_segments(struct bio *bio)
*/ */
#define bio_get(bio) atomic_inc(&(bio)->bi_cnt) #define bio_get(bio) atomic_inc(&(bio)->bi_cnt)
enum bip_flags {
BIP_BLOCK_INTEGRITY = 1 << 0, /* block layer owns integrity data */
BIP_MAPPED_INTEGRITY = 1 << 1, /* ref tag has been remapped */
BIP_CTRL_NOCHECK = 1 << 2, /* disable HBA integrity checking */
BIP_DISK_NOCHECK = 1 << 3, /* disable disk integrity checking */
BIP_IP_CHECKSUM = 1 << 4, /* IP checksum */
};
#if defined(CONFIG_BLK_DEV_INTEGRITY) #if defined(CONFIG_BLK_DEV_INTEGRITY)
static inline struct bio_integrity_payload *bio_integrity(struct bio *bio)
{
if (bio->bi_rw & REQ_INTEGRITY)
return bio->bi_integrity;
return NULL;
}
/* /*
* bio integrity payload * bio integrity payload
*/ */
@ -301,21 +318,40 @@ struct bio_integrity_payload {
struct bvec_iter bip_iter; struct bvec_iter bip_iter;
/* kill - should just use bip_vec */
void *bip_buf; /* generated integrity data */
bio_end_io_t *bip_end_io; /* saved I/O completion fn */ bio_end_io_t *bip_end_io; /* saved I/O completion fn */
unsigned short bip_slab; /* slab the bip came from */ unsigned short bip_slab; /* slab the bip came from */
unsigned short bip_vcnt; /* # of integrity bio_vecs */ unsigned short bip_vcnt; /* # of integrity bio_vecs */
unsigned short bip_max_vcnt; /* integrity bio_vec slots */ unsigned short bip_max_vcnt; /* integrity bio_vec slots */
unsigned bip_owns_buf:1; /* should free bip_buf */ unsigned short bip_flags; /* control flags */
struct work_struct bip_work; /* I/O completion */ struct work_struct bip_work; /* I/O completion */
struct bio_vec *bip_vec; struct bio_vec *bip_vec;
struct bio_vec bip_inline_vecs[0];/* embedded bvec array */ struct bio_vec bip_inline_vecs[0];/* embedded bvec array */
}; };
static inline bool bio_integrity_flagged(struct bio *bio, enum bip_flags flag)
{
struct bio_integrity_payload *bip = bio_integrity(bio);
if (bip)
return bip->bip_flags & flag;
return false;
}
static inline sector_t bip_get_seed(struct bio_integrity_payload *bip)
{
return bip->bip_iter.bi_sector;
}
static inline void bip_set_seed(struct bio_integrity_payload *bip,
sector_t seed)
{
bip->bip_iter.bi_sector = seed;
}
#endif /* CONFIG_BLK_DEV_INTEGRITY */ #endif /* CONFIG_BLK_DEV_INTEGRITY */
extern void bio_trim(struct bio *bio, int offset, int size); extern void bio_trim(struct bio *bio, int offset, int size);
@ -342,6 +378,7 @@ static inline struct bio *bio_next_split(struct bio *bio, int sectors,
} }
extern struct bio_set *bioset_create(unsigned int, unsigned int); extern struct bio_set *bioset_create(unsigned int, unsigned int);
extern struct bio_set *bioset_create_nobvec(unsigned int, unsigned int);
extern void bioset_free(struct bio_set *); extern void bioset_free(struct bio_set *);
extern mempool_t *biovec_create_pool(int pool_entries); extern mempool_t *biovec_create_pool(int pool_entries);
@ -353,7 +390,6 @@ extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *);
extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs); extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
extern struct bio_set *fs_bio_set; extern struct bio_set *fs_bio_set;
unsigned int bio_integrity_tag_size(struct bio *bio);
static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs) static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
{ {
@ -661,14 +697,10 @@ struct biovec_slab {
for_each_bio(_bio) \ for_each_bio(_bio) \
bip_for_each_vec(_bvl, _bio->bi_integrity, _iter) bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
#define bio_integrity(bio) (bio->bi_integrity != NULL)
extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int); extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
extern void bio_integrity_free(struct bio *); extern void bio_integrity_free(struct bio *);
extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int); extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
extern int bio_integrity_enabled(struct bio *bio); extern bool bio_integrity_enabled(struct bio *bio);
extern int bio_integrity_set_tag(struct bio *, void *, unsigned int);
extern int bio_integrity_get_tag(struct bio *, void *, unsigned int);
extern int bio_integrity_prep(struct bio *); extern int bio_integrity_prep(struct bio *);
extern void bio_integrity_endio(struct bio *, int); extern void bio_integrity_endio(struct bio *, int);
extern void bio_integrity_advance(struct bio *, unsigned int); extern void bio_integrity_advance(struct bio *, unsigned int);
@ -680,14 +712,14 @@ extern void bio_integrity_init(void);
#else /* CONFIG_BLK_DEV_INTEGRITY */ #else /* CONFIG_BLK_DEV_INTEGRITY */
static inline int bio_integrity(struct bio *bio) static inline void *bio_integrity(struct bio *bio)
{ {
return 0; return NULL;
} }
static inline int bio_integrity_enabled(struct bio *bio) static inline bool bio_integrity_enabled(struct bio *bio)
{ {
return 0; return false;
} }
static inline int bioset_integrity_create(struct bio_set *bs, int pool_size) static inline int bioset_integrity_create(struct bio_set *bs, int pool_size)
@ -733,6 +765,11 @@ static inline void bio_integrity_init(void)
return; return;
} }
static inline bool bio_integrity_flagged(struct bio *bio, enum bip_flags flag)
{
return false;
}
#endif /* CONFIG_BLK_DEV_INTEGRITY */ #endif /* CONFIG_BLK_DEV_INTEGRITY */
#endif /* CONFIG_BLOCK */ #endif /* CONFIG_BLOCK */


@ -4,6 +4,7 @@
#include <linux/blkdev.h> #include <linux/blkdev.h>
struct blk_mq_tags; struct blk_mq_tags;
struct blk_flush_queue;
struct blk_mq_cpu_notifier { struct blk_mq_cpu_notifier {
struct list_head list; struct list_head list;
@ -34,6 +35,7 @@ struct blk_mq_hw_ctx {
struct request_queue *queue; struct request_queue *queue;
unsigned int queue_num; unsigned int queue_num;
struct blk_flush_queue *fq;
void *driver_data; void *driver_data;
@ -77,8 +79,9 @@ struct blk_mq_tag_set {
struct list_head tag_list; struct list_head tag_list;
}; };
typedef int (queue_rq_fn)(struct blk_mq_hw_ctx *, struct request *); typedef int (queue_rq_fn)(struct blk_mq_hw_ctx *, struct request *, bool);
typedef struct blk_mq_hw_ctx *(map_queue_fn)(struct request_queue *, const int); typedef struct blk_mq_hw_ctx *(map_queue_fn)(struct request_queue *, const int);
typedef enum blk_eh_timer_return (timeout_fn)(struct request *, bool);
typedef int (init_hctx_fn)(struct blk_mq_hw_ctx *, void *, unsigned int); typedef int (init_hctx_fn)(struct blk_mq_hw_ctx *, void *, unsigned int);
typedef void (exit_hctx_fn)(struct blk_mq_hw_ctx *, unsigned int); typedef void (exit_hctx_fn)(struct blk_mq_hw_ctx *, unsigned int);
typedef int (init_request_fn)(void *, struct request *, unsigned int, typedef int (init_request_fn)(void *, struct request *, unsigned int,
@ -86,6 +89,9 @@ typedef int (init_request_fn)(void *, struct request *, unsigned int,
typedef void (exit_request_fn)(void *, struct request *, unsigned int, typedef void (exit_request_fn)(void *, struct request *, unsigned int,
unsigned int); unsigned int);
typedef void (busy_iter_fn)(struct blk_mq_hw_ctx *, struct request *, void *,
bool);
struct blk_mq_ops { struct blk_mq_ops {
/* /*
* Queue request * Queue request
@ -100,7 +106,7 @@ struct blk_mq_ops {
/* /*
* Called on request timeout * Called on request timeout
*/ */
rq_timed_out_fn *timeout; timeout_fn *timeout;
softirq_done_fn *complete; softirq_done_fn *complete;
@ -115,6 +121,10 @@ struct blk_mq_ops {
/* /*
* Called for every command allocated by the block layer to allow * Called for every command allocated by the block layer to allow
* the driver to set up driver specific data. * the driver to set up driver specific data.
*
* Tag greater than or equal to queue_depth is for setting up
* flush request.
*
* Ditto for exit/teardown. * Ditto for exit/teardown.
*/ */
init_request_fn *init_request; init_request_fn *init_request;
@ -160,8 +170,9 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag);
struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *, const int ctx_index); struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *, const int ctx_index);
struct blk_mq_hw_ctx *blk_mq_alloc_single_hw_queue(struct blk_mq_tag_set *, unsigned int, int); struct blk_mq_hw_ctx *blk_mq_alloc_single_hw_queue(struct blk_mq_tag_set *, unsigned int, int);
void blk_mq_end_io(struct request *rq, int error); void blk_mq_start_request(struct request *rq);
void __blk_mq_end_io(struct request *rq, int error); void blk_mq_end_request(struct request *rq, int error);
void __blk_mq_end_request(struct request *rq, int error);
void blk_mq_requeue_request(struct request *rq); void blk_mq_requeue_request(struct request *rq);
void blk_mq_add_to_requeue_list(struct request *rq, bool at_head); void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
@ -174,7 +185,8 @@ void blk_mq_stop_hw_queues(struct request_queue *q);
void blk_mq_start_hw_queues(struct request_queue *q); void blk_mq_start_hw_queues(struct request_queue *q);
void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async); void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs); void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
void blk_mq_tag_busy_iter(struct blk_mq_tags *tags, void (*fn)(void *data, unsigned long *), void *data); void blk_mq_tag_busy_iter(struct blk_mq_hw_ctx *hctx, busy_iter_fn *fn,
void *priv);
/* /*
* Driver command data is immediately after the request. So subtract request * Driver command data is immediately after the request. So subtract request


@ -78,9 +78,11 @@ struct bio {
struct io_context *bi_ioc; struct io_context *bi_ioc;
struct cgroup_subsys_state *bi_css; struct cgroup_subsys_state *bi_css;
#endif #endif
union {
#if defined(CONFIG_BLK_DEV_INTEGRITY) #if defined(CONFIG_BLK_DEV_INTEGRITY)
struct bio_integrity_payload *bi_integrity; /* data integrity */ struct bio_integrity_payload *bi_integrity; /* data integrity */
#endif #endif
};
unsigned short bi_vcnt; /* how many bio_vec's */ unsigned short bi_vcnt; /* how many bio_vec's */
@ -118,10 +120,8 @@ struct bio {
#define BIO_USER_MAPPED 6 /* contains user pages */ #define BIO_USER_MAPPED 6 /* contains user pages */
#define BIO_EOPNOTSUPP 7 /* not supported */ #define BIO_EOPNOTSUPP 7 /* not supported */
#define BIO_NULL_MAPPED 8 /* contains invalid user pages */ #define BIO_NULL_MAPPED 8 /* contains invalid user pages */
#define BIO_FS_INTEGRITY 9 /* fs owns integrity data, not block layer */ #define BIO_QUIET 9 /* Make BIO Quiet */
#define BIO_QUIET 10 /* Make BIO Quiet */ #define BIO_SNAP_STABLE 10 /* bio data must be snapshotted during write */
#define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
#define BIO_SNAP_STABLE 12 /* bio data must be snapshotted during write */
/* /*
* Flags starting here get preserved by bio_reset() - this includes * Flags starting here get preserved by bio_reset() - this includes
@ -162,6 +162,7 @@ enum rq_flag_bits {
__REQ_WRITE_SAME, /* write same block many times */ __REQ_WRITE_SAME, /* write same block many times */
__REQ_NOIDLE, /* don't anticipate more IO after this one */ __REQ_NOIDLE, /* don't anticipate more IO after this one */
__REQ_INTEGRITY, /* I/O includes block integrity payload */
__REQ_FUA, /* forced unit access */ __REQ_FUA, /* forced unit access */
__REQ_FLUSH, /* request for cache flush */ __REQ_FLUSH, /* request for cache flush */
@ -186,9 +187,7 @@ enum rq_flag_bits {
__REQ_FLUSH_SEQ, /* request for flush sequence */ __REQ_FLUSH_SEQ, /* request for flush sequence */
__REQ_IO_STAT, /* account I/O stat */ __REQ_IO_STAT, /* account I/O stat */
__REQ_MIXED_MERGE, /* merge of different types, fail separately */ __REQ_MIXED_MERGE, /* merge of different types, fail separately */
__REQ_KERNEL, /* direct IO to kernel pages */
__REQ_PM, /* runtime pm request */ __REQ_PM, /* runtime pm request */
__REQ_END, /* last of chain of requests */
__REQ_HASHED, /* on IO scheduler merge hash */ __REQ_HASHED, /* on IO scheduler merge hash */
__REQ_MQ_INFLIGHT, /* track inflight for MQ */ __REQ_MQ_INFLIGHT, /* track inflight for MQ */
__REQ_NR_BITS, /* stops here */ __REQ_NR_BITS, /* stops here */
@ -204,13 +203,14 @@ enum rq_flag_bits {
#define REQ_DISCARD (1ULL << __REQ_DISCARD) #define REQ_DISCARD (1ULL << __REQ_DISCARD)
#define REQ_WRITE_SAME (1ULL << __REQ_WRITE_SAME) #define REQ_WRITE_SAME (1ULL << __REQ_WRITE_SAME)
#define REQ_NOIDLE (1ULL << __REQ_NOIDLE) #define REQ_NOIDLE (1ULL << __REQ_NOIDLE)
#define REQ_INTEGRITY (1ULL << __REQ_INTEGRITY)
#define REQ_FAILFAST_MASK \ #define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER) (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
#define REQ_COMMON_MASK \ #define REQ_COMMON_MASK \
(REQ_WRITE | REQ_FAILFAST_MASK | REQ_SYNC | REQ_META | REQ_PRIO | \ (REQ_WRITE | REQ_FAILFAST_MASK | REQ_SYNC | REQ_META | REQ_PRIO | \
REQ_DISCARD | REQ_WRITE_SAME | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | \ REQ_DISCARD | REQ_WRITE_SAME | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | \
REQ_SECURE) REQ_SECURE | REQ_INTEGRITY)
#define REQ_CLONE_MASK REQ_COMMON_MASK #define REQ_CLONE_MASK REQ_COMMON_MASK
#define BIO_NO_ADVANCE_ITER_MASK (REQ_DISCARD|REQ_WRITE_SAME) #define BIO_NO_ADVANCE_ITER_MASK (REQ_DISCARD|REQ_WRITE_SAME)
@ -240,9 +240,7 @@ enum rq_flag_bits {
#define REQ_IO_STAT (1ULL << __REQ_IO_STAT) #define REQ_IO_STAT (1ULL << __REQ_IO_STAT)
#define REQ_MIXED_MERGE (1ULL << __REQ_MIXED_MERGE) #define REQ_MIXED_MERGE (1ULL << __REQ_MIXED_MERGE)
#define REQ_SECURE (1ULL << __REQ_SECURE) #define REQ_SECURE (1ULL << __REQ_SECURE)
#define REQ_KERNEL (1ULL << __REQ_KERNEL)
#define REQ_PM (1ULL << __REQ_PM) #define REQ_PM (1ULL << __REQ_PM)
#define REQ_END (1ULL << __REQ_END)
#define REQ_HASHED (1ULL << __REQ_HASHED) #define REQ_HASHED (1ULL << __REQ_HASHED)
#define REQ_MQ_INFLIGHT (1ULL << __REQ_MQ_INFLIGHT) #define REQ_MQ_INFLIGHT (1ULL << __REQ_MQ_INFLIGHT)


@ -36,6 +36,7 @@ struct request;
struct sg_io_hdr; struct sg_io_hdr;
struct bsg_job; struct bsg_job;
struct blkcg_gq; struct blkcg_gq;
struct blk_flush_queue;
#define BLKDEV_MIN_RQ 4 #define BLKDEV_MIN_RQ 4
#define BLKDEV_MAX_RQ 128 /* Default maximum */ #define BLKDEV_MAX_RQ 128 /* Default maximum */
@ -455,14 +456,7 @@ struct request_queue {
*/ */
unsigned int flush_flags; unsigned int flush_flags;
unsigned int flush_not_queueable:1; unsigned int flush_not_queueable:1;
unsigned int flush_queue_delayed:1; struct blk_flush_queue *fq;
unsigned int flush_pending_idx:1;
unsigned int flush_running_idx:1;
unsigned long flush_pending_since;
struct list_head flush_queue[2];
struct list_head flush_data_in_flight;
struct request *flush_rq;
spinlock_t mq_flush_lock;
struct list_head requeue_list; struct list_head requeue_list;
spinlock_t requeue_lock; spinlock_t requeue_lock;
@ -865,7 +859,7 @@ extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
static inline struct request_queue *bdev_get_queue(struct block_device *bdev) static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
{ {
return bdev->bd_disk->queue; return bdev->bd_disk->queue; /* this is never NULL */
} }
/* /*
@ -1285,10 +1279,9 @@ static inline int queue_alignment_offset(struct request_queue *q)
static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector) static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector)
{ {
unsigned int granularity = max(lim->physical_block_size, lim->io_min); unsigned int granularity = max(lim->physical_block_size, lim->io_min);
unsigned int alignment = (sector << 9) & (granularity - 1); unsigned int alignment = sector_div(sector, granularity >> 9) << 9;
return (granularity + lim->alignment_offset - alignment) return (granularity + lim->alignment_offset - alignment) % granularity;
& (granularity - 1);
} }
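
The masking in the old version only holds when io_min is a power of two. A stand-alone numeric check of the two formulas, with a granularity and sector chosen purely for illustration:

/* Compare the old mask-based math and the new modulo-based math for a
 * non-power-of-two granularity (io_min). Values are illustrative only. */
#include <stdio.h>

static unsigned int old_offset(unsigned long long sector,
			       unsigned int granularity, unsigned int align_off)
{
	unsigned int alignment = (sector << 9) & (granularity - 1);	/* only valid for pow2 */

	return (granularity + align_off - alignment) & (granularity - 1);
}

static unsigned int new_offset(unsigned long long sector,
			       unsigned int granularity, unsigned int align_off)
{
	/* sector_div() equivalent: remainder of the division in sectors */
	unsigned int alignment = (sector % (granularity >> 9)) << 9;

	return (granularity + align_off - alignment) % granularity;
}

int main(void)
{
	unsigned int granularity = 1536;	/* 3 x 512 bytes: not a power of two */
	unsigned long long sector = 5;		/* byte offset 2560 */

	printf("old: %u\n", old_offset(sector, granularity, 0));	/* 1024 (wrong) */
	printf("new: %u\n", new_offset(sector, granularity, 0));	/* 512: next boundary at 3072 */
	return 0;
}
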
static inline int bdev_alignment_offset(struct block_device *bdev) static inline int bdev_alignment_offset(struct block_device *bdev)
@ -1464,32 +1457,31 @@ static inline uint64_t rq_io_start_time_ns(struct request *req)
#if defined(CONFIG_BLK_DEV_INTEGRITY) #if defined(CONFIG_BLK_DEV_INTEGRITY)
#define INTEGRITY_FLAG_READ 2 /* verify data integrity on read */ enum blk_integrity_flags {
#define INTEGRITY_FLAG_WRITE 4 /* generate data integrity on write */ BLK_INTEGRITY_VERIFY = 1 << 0,
BLK_INTEGRITY_GENERATE = 1 << 1,
BLK_INTEGRITY_DEVICE_CAPABLE = 1 << 2,
BLK_INTEGRITY_IP_CHECKSUM = 1 << 3,
};
struct blk_integrity_exchg { struct blk_integrity_iter {
void *prot_buf; void *prot_buf;
void *data_buf; void *data_buf;
sector_t sector; sector_t seed;
unsigned int data_size; unsigned int data_size;
unsigned short sector_size; unsigned short interval;
const char *disk_name; const char *disk_name;
}; };
typedef void (integrity_gen_fn) (struct blk_integrity_exchg *); typedef int (integrity_processing_fn) (struct blk_integrity_iter *);
typedef int (integrity_vrfy_fn) (struct blk_integrity_exchg *);
typedef void (integrity_set_tag_fn) (void *, void *, unsigned int);
typedef void (integrity_get_tag_fn) (void *, void *, unsigned int);
struct blk_integrity { struct blk_integrity {
integrity_gen_fn *generate_fn; integrity_processing_fn *generate_fn;
integrity_vrfy_fn *verify_fn; integrity_processing_fn *verify_fn;
integrity_set_tag_fn *set_tag_fn;
integrity_get_tag_fn *get_tag_fn;
unsigned short flags; unsigned short flags;
unsigned short tuple_size; unsigned short tuple_size;
unsigned short sector_size; unsigned short interval;
unsigned short tag_size; unsigned short tag_size;
const char *name; const char *name;
@ -1504,10 +1496,10 @@ extern int blk_integrity_compare(struct gendisk *, struct gendisk *);
extern int blk_rq_map_integrity_sg(struct request_queue *, struct bio *, extern int blk_rq_map_integrity_sg(struct request_queue *, struct bio *,
struct scatterlist *); struct scatterlist *);
extern int blk_rq_count_integrity_sg(struct request_queue *, struct bio *); extern int blk_rq_count_integrity_sg(struct request_queue *, struct bio *);
extern int blk_integrity_merge_rq(struct request_queue *, struct request *, extern bool blk_integrity_merge_rq(struct request_queue *, struct request *,
struct request *); struct request *);
extern int blk_integrity_merge_bio(struct request_queue *, struct request *, extern bool blk_integrity_merge_bio(struct request_queue *, struct request *,
struct bio *); struct bio *);
static inline static inline
struct blk_integrity *bdev_get_integrity(struct block_device *bdev) struct blk_integrity *bdev_get_integrity(struct block_device *bdev)
@ -1520,12 +1512,9 @@ static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk)
return disk->integrity; return disk->integrity;
} }
static inline int blk_integrity_rq(struct request *rq) static inline bool blk_integrity_rq(struct request *rq)
{ {
if (rq->bio == NULL) return rq->cmd_flags & REQ_INTEGRITY;
return 0;
return bio_integrity(rq->bio);
} }
static inline void blk_queue_max_integrity_segments(struct request_queue *q, static inline void blk_queue_max_integrity_segments(struct request_queue *q,
@ -1590,15 +1579,15 @@ static inline unsigned short queue_max_integrity_segments(struct request_queue *
{ {
return 0; return 0;
} }
static inline int blk_integrity_merge_rq(struct request_queue *rq, static inline bool blk_integrity_merge_rq(struct request_queue *rq,
struct request *r1, struct request *r1,
struct request *r2) struct request *r2)
{ {
return 0; return 0;
} }
static inline int blk_integrity_merge_bio(struct request_queue *rq, static inline bool blk_integrity_merge_bio(struct request_queue *rq,
struct request *r, struct request *r,
struct bio *b) struct bio *b)
{ {
return 0; return 0;
} }


@ -6,7 +6,8 @@
#define CRC_T10DIF_DIGEST_SIZE 2 #define CRC_T10DIF_DIGEST_SIZE 2
#define CRC_T10DIF_BLOCK_SIZE 1 #define CRC_T10DIF_BLOCK_SIZE 1
__u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer, size_t len); extern __u16 crc_t10dif_generic(__u16 crc, const unsigned char *buffer,
__u16 crc_t10dif(unsigned char const *, size_t); size_t len);
extern __u16 crc_t10dif(unsigned char const *, size_t);
#endif #endif


@ -192,8 +192,6 @@ typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
#define READ 0 #define READ 0
#define WRITE RW_MASK #define WRITE RW_MASK
#define READA RWA_MASK #define READA RWA_MASK
#define KERNEL_READ (READ|REQ_KERNEL)
#define KERNEL_WRITE (WRITE|REQ_KERNEL)
#define READ_SYNC (READ | REQ_SYNC) #define READ_SYNC (READ | REQ_SYNC)
#define WRITE_SYNC (WRITE | REQ_SYNC | REQ_NOIDLE) #define WRITE_SYNC (WRITE | REQ_SYNC | REQ_NOIDLE)


@ -448,10 +448,10 @@ static inline struct rpc_cred *nfs_file_cred(struct file *file)
extern ssize_t nfs_direct_IO(int, struct kiocb *, struct iov_iter *, loff_t); extern ssize_t nfs_direct_IO(int, struct kiocb *, struct iov_iter *, loff_t);
extern ssize_t nfs_file_direct_read(struct kiocb *iocb, extern ssize_t nfs_file_direct_read(struct kiocb *iocb,
struct iov_iter *iter, struct iov_iter *iter,
loff_t pos, bool uio); loff_t pos);
extern ssize_t nfs_file_direct_write(struct kiocb *iocb, extern ssize_t nfs_file_direct_write(struct kiocb *iocb,
struct iov_iter *iter, struct iov_iter *iter,
loff_t pos, bool uio); loff_t pos);
/* /*
* linux/fs/nfs/dir.c * linux/fs/nfs/dir.c

include/linux/t10-pi.h (new file, 22 lines)

@ -0,0 +1,22 @@
#ifndef _LINUX_T10_PI_H
#define _LINUX_T10_PI_H
#include <linux/types.h>
#include <linux/blkdev.h>
/*
* T10 Protection Information tuple.
*/
struct t10_pi_tuple {
__be16 guard_tag; /* Checksum */
__be16 app_tag; /* Opaque storage */
__be32 ref_tag; /* Target LBA or indirect LBA */
};
extern struct blk_integrity t10_pi_type1_crc;
extern struct blk_integrity t10_pi_type1_ip;
extern struct blk_integrity t10_pi_type3_crc;
extern struct blk_integrity t10_pi_type3_ip;
#endif
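
Each protection interval carries one eight-byte tuple, with every field stored big-endian. A small user-space sketch of filling a Type 1 tuple; the guard value is hard-coded here, whereas the kernel computes it with crc_t10dif() or an IP checksum:

/* Sanity sketch of the on-the-wire tuple layout: 8 bytes per protection
 * interval, all fields big-endian. Mirrors struct t10_pi_tuple above. */
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>

struct pi_tuple {		/* stand-in for struct t10_pi_tuple */
	uint16_t guard_tag;	/* CRC16 (T10-DIF) or IP checksum of the interval */
	uint16_t app_tag;	/* opaque application tag */
	uint32_t ref_tag;	/* low 32 bits of the LBA for Type 1/2 */
};

int main(void)
{
	struct pi_tuple pi;

	pi.guard_tag = htons(0x1234);		/* checksum computed elsewhere */
	pi.app_tag   = 0;
	pi.ref_tag   = htonl(0x00abcdefu);	/* target LBA & 0xffffffff */

	printf("tuple size = %zu bytes\n", sizeof(pi));	/* 8 on typical ABIs */
	return 0;
}
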


@ -10,9 +10,10 @@
#include <scsi/scsi_device.h> #include <scsi/scsi_device.h>
struct Scsi_Host; struct Scsi_Host;
struct scsi_device;
struct scsi_driver; struct scsi_driver;
#include <scsi/scsi_device.h>
/* /*
* MAX_COMMAND_SIZE is: * MAX_COMMAND_SIZE is:
* The longest fixed-length SCSI CDB as per the SCSI standard. * The longest fixed-length SCSI CDB as per the SCSI standard.
@ -81,6 +82,7 @@ struct scsi_cmnd {
unsigned char prot_op; unsigned char prot_op;
unsigned char prot_type; unsigned char prot_type;
unsigned char prot_flags;
unsigned short cmd_len; unsigned short cmd_len;
enum dma_data_direction sc_data_direction; enum dma_data_direction sc_data_direction;
@ -252,6 +254,14 @@ static inline unsigned char scsi_get_prot_op(struct scsi_cmnd *scmd)
return scmd->prot_op; return scmd->prot_op;
} }
enum scsi_prot_flags {
SCSI_PROT_TRANSFER_PI = 1 << 0,
SCSI_PROT_GUARD_CHECK = 1 << 1,
SCSI_PROT_REF_CHECK = 1 << 2,
SCSI_PROT_REF_INCREMENT = 1 << 3,
SCSI_PROT_IP_CHECKSUM = 1 << 4,
};
/* /*
* The controller usually does not know anything about the target it * The controller usually does not know anything about the target it
* is communicating with. However, when DIX is enabled the controller * is communicating with. However, when DIX is enabled the controller
@ -280,6 +290,17 @@ static inline sector_t scsi_get_lba(struct scsi_cmnd *scmd)
return blk_rq_pos(scmd->request); return blk_rq_pos(scmd->request);
} }
static inline unsigned int scsi_prot_interval(struct scsi_cmnd *scmd)
{
return scmd->device->sector_size;
}
static inline u32 scsi_prot_ref_tag(struct scsi_cmnd *scmd)
{
return blk_rq_pos(scmd->request) >>
(ilog2(scsi_prot_interval(scmd)) - 9) & 0xffffffff;
}
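
blk_rq_pos() counts 512-byte sectors while the reference tag counts protection intervals (one per logical block), hence the shift. A stand-alone numeric check, with hard-coded values standing in for the scmd accessors:

/* Stand-alone check of the ref-tag math above; the values replace
 * blk_rq_pos() and scsi_prot_interval() for illustration. */
#include <stdint.h>
#include <stdio.h>

static unsigned int ilog2_u(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

int main(void)
{
	uint64_t pos_512 = 80;		/* blk_rq_pos(): 512-byte sector units */
	unsigned int interval = 4096;	/* scsi_prot_interval(): logical block size */

	uint32_t ref_tag = (uint32_t)(pos_512 >> (ilog2_u(interval) - 9));
	printf("ref tag = %u\n", ref_tag);	/* 80 >> 3 = 10 */
	return 0;
}
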
static inline unsigned scsi_prot_sg_count(struct scsi_cmnd *cmd) static inline unsigned scsi_prot_sg_count(struct scsi_cmnd *cmd)
{ {
return cmd->prot_sdb ? cmd->prot_sdb->table.nents : 0; return cmd->prot_sdb ? cmd->prot_sdb->table.nents : 0;
@ -316,17 +337,12 @@ static inline void set_driver_byte(struct scsi_cmnd *cmd, char status)
static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd) static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
{ {
unsigned int xfer_len = scsi_out(scmd)->length; unsigned int xfer_len = scsi_out(scmd)->length;
unsigned int prot_op = scsi_get_prot_op(scmd); unsigned int prot_interval = scsi_prot_interval(scmd);
unsigned int sector_size = scmd->device->sector_size;
switch (prot_op) { if (scmd->prot_flags & SCSI_PROT_TRANSFER_PI)
case SCSI_PROT_NORMAL: xfer_len += (xfer_len >> ilog2(prot_interval)) * 8;
case SCSI_PROT_WRITE_STRIP:
case SCSI_PROT_READ_INSERT:
return xfer_len;
}
return xfer_len + (xfer_len >> ilog2(sector_size)) * 8; return xfer_len;
} }
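
With SCSI_PROT_TRANSFER_PI set, each protection interval adds one eight-byte tuple to the wire length. A stand-alone check of that arithmetic for a 64 KiB transfer over 512-byte intervals:

/* Stand-alone check of the transfer-length adjustment above. */
#include <stdio.h>

int main(void)
{
	unsigned int xfer_len = 65536;		/* 64 KiB of data */
	unsigned int ilog2_interval = 9;	/* ilog2(512-byte protection interval) */

	/* one 8-byte PI tuple per interval when SCSI_PROT_TRANSFER_PI is set */
	xfer_len += (xfer_len >> ilog2_interval) * 8;
	printf("wire length = %u\n", xfer_len);	/* 65536 + 128 * 8 = 66560 */
	return 0;
}
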
#endif /* _SCSI_SCSI_CMND_H */ #endif /* _SCSI_SCSI_CMND_H */


@ -40,7 +40,7 @@ LIST_HEAD(bdi_list);
/* bdi_wq serves all asynchronous writeback tasks */ /* bdi_wq serves all asynchronous writeback tasks */
struct workqueue_struct *bdi_wq; struct workqueue_struct *bdi_wq;
void bdi_lock_two(struct bdi_writeback *wb1, struct bdi_writeback *wb2) static void bdi_lock_two(struct bdi_writeback *wb1, struct bdi_writeback *wb2)
{ {
if (wb1 < wb2) { if (wb1 < wb2) {
spin_lock(&wb1->list_lock); spin_lock(&wb1->list_lock);
@ -376,13 +376,7 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0); mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
flush_delayed_work(&bdi->wb.dwork); flush_delayed_work(&bdi->wb.dwork);
WARN_ON(!list_empty(&bdi->work_list)); WARN_ON(!list_empty(&bdi->work_list));
WARN_ON(delayed_work_pending(&bdi->wb.dwork));
/*
* This shouldn't be necessary unless @bdi for some reason has
* unflushed dirty IO after work_list is drained. Do it anyway
* just in case.
*/
cancel_delayed_work_sync(&bdi->wb.dwork);
} }
/* /*
@ -402,21 +396,15 @@ static void bdi_prune_sb(struct backing_dev_info *bdi)
void bdi_unregister(struct backing_dev_info *bdi) void bdi_unregister(struct backing_dev_info *bdi)
{ {
struct device *dev = bdi->dev; if (bdi->dev) {
if (dev) {
bdi_set_min_ratio(bdi, 0); bdi_set_min_ratio(bdi, 0);
trace_writeback_bdi_unregister(bdi); trace_writeback_bdi_unregister(bdi);
bdi_prune_sb(bdi); bdi_prune_sb(bdi);
bdi_wb_shutdown(bdi); bdi_wb_shutdown(bdi);
bdi_debug_unregister(bdi); bdi_debug_unregister(bdi);
device_unregister(bdi->dev);
spin_lock_bh(&bdi->wb_lock);
bdi->dev = NULL; bdi->dev = NULL;
spin_unlock_bh(&bdi->wb_lock);
device_unregister(dev);
} }
} }
EXPORT_SYMBOL(bdi_unregister); EXPORT_SYMBOL(bdi_unregister);
@ -487,8 +475,17 @@ void bdi_destroy(struct backing_dev_info *bdi)
int i; int i;
/* /*
* Splice our entries to the default_backing_dev_info, if this * Splice our entries to the default_backing_dev_info. This
* bdi disappears * condition shouldn't happen. @wb must be empty at this point and
* dirty inodes on it might cause other issues. This workaround is
* added by ce5f8e779519 ("writeback: splice dirty inode entries to
* default bdi on bdi_destroy()") without root-causing the issue.
*
* http://lkml.kernel.org/g/1253038617-30204-11-git-send-email-jens.axboe@oracle.com
* http://thread.gmane.org/gmane.linux.file-systems/35341/focus=35350
*
* We should probably add WARN_ON() to find out whether it still
* happens and track it down if so.
*/ */
if (bdi_has_dirty_io(bdi)) { if (bdi_has_dirty_io(bdi)) {
struct bdi_writeback *dst = &default_backing_dev_info.wb; struct bdi_writeback *dst = &default_backing_dev_info.wb;
@ -503,12 +500,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
bdi_unregister(bdi); bdi_unregister(bdi);
/* WARN_ON(delayed_work_pending(&bdi->wb.dwork));
* If bdi_unregister() had already been called earlier, the dwork
* could still be pending because bdi_prune_sb() can race with the
* bdi_wakeup_thread_delayed() calls from __mark_inode_dirty().
*/
cancel_delayed_work_sync(&bdi->wb.dwork);
for (i = 0; i < NR_BDI_STAT_ITEMS; i++) for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
percpu_counter_destroy(&bdi->bdi_stat[i]); percpu_counter_destroy(&bdi->bdi_stat[i]);