mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-08-05 16:54:27 +00:00

Currently, disks primarily implement the write zeroes command (aka REQ_OP_WRITE_ZEROES) through two mechanisms: the first involves physically writing zeros to the disk media (e.g., HDDs), while the second performs an unmap operation on the logical blocks, effectively putting them into a deallocated state (e.g., SSDs). The first method is generally slow, while the second method is typically very fast. For example, on certain NVMe SSDs that support NVME_NS_DEAC, submitting REQ_OP_WRITE_ZEROES requests with the NVME_WZ_DEAC bit can accelerate the write zeros operation by placing disk blocks into a deallocated state, which opportunistically avoids writing zeroes to media while still guaranteeing that subsequent reads from the specified block range will return zeroed data. This is a best-effort optimization, not a mandatory requirement, some devices may partially fall back to writing physical zeroes due to factors such as misalignment or being asked to clear a block range smaller than the device's internal allocation unit. Therefore, the speed of this operation is not guaranteed. It is difficult to determine whether the storage device supports unmap write zeroes operation. We cannot determine this by only querying bdev_limits(bdev)->max_write_zeroes_sectors. Therefore, first, add a new hardware queue limit parameters, max_hw_wzeroes_unmap_sectors, to indicate whether a device supports this unmap write zeroes operation. Then, add two new counterpart software queue limits, max_wzeroes_unmap_sectors and max_user_wzeroes_unmap_sectors, which allow users to disable this operation if the speed is very slow on some sepcial devices. Finally, for the stacked devices cases, initialize these two parameters to UINT_MAX. This operation should be enabled by both the stacking driver and all underlying devices. Thanks to Martin K. Petersen for optimizing the documentation of the write_zeroes_unmap sysfs interface. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/20250619111806.3546162-2-yi.zhang@huaweicloud.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
887 lines
32 KiB
Text
887 lines
32 KiB
Text
What: /sys/block/<disk>/alignment_offset
|
||
Date: April 2009
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Storage devices may report a physical block size that is
|
||
bigger than the logical block size (for instance a drive
|
||
with 4KB physical sectors exposing 512-byte logical
|
||
blocks to the operating system). This parameter
|
||
indicates how many bytes the beginning of the device is
|
||
offset from the disk's natural alignment.
|
||
|
||
|
||
What: /sys/block/<disk>/discard_alignment
|
||
Date: May 2011
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Devices that support discard functionality may
|
||
internally allocate space in units that are bigger than
|
||
the exported logical block size. The discard_alignment
|
||
parameter indicates how many bytes the beginning of the
|
||
device is offset from the internal allocation unit's
|
||
natural alignment.
|
||
|
||
What: /sys/block/<disk>/atomic_write_max_bytes
|
||
Date: February 2024
|
||
Contact: Himanshu Madhani <himanshu.madhani@oracle.com>
|
||
Description:
|
||
[RO] This parameter specifies the maximum atomic write
|
||
size reported by the device. This parameter is relevant
|
||
for merging of writes, where a merged atomic write
|
||
operation must not exceed this number of bytes.
|
||
This parameter may be greater than the value in
|
||
atomic_write_unit_max_bytes as
|
||
atomic_write_unit_max_bytes will be rounded down to a
|
||
power-of-two and atomic_write_unit_max_bytes may also be
|
||
limited by some other queue limits, such as max_segments.
|
||
This parameter - along with atomic_write_unit_min_bytes
|
||
and atomic_write_unit_max_bytes - will not be larger than
|
||
max_hw_sectors_kb, but may be larger than max_sectors_kb.
|
||
|
||
|
||
What: /sys/block/<disk>/atomic_write_unit_min_bytes
|
||
Date: February 2024
|
||
Contact: Himanshu Madhani <himanshu.madhani@oracle.com>
|
||
Description:
|
||
[RO] This parameter specifies the smallest block which can
|
||
be written atomically with an atomic write operation. All
|
||
atomic write operations must begin at a
|
||
atomic_write_unit_min boundary and must be multiples of
|
||
atomic_write_unit_min. This value must be a power-of-two.
|
||
|
||
|
||
What: /sys/block/<disk>/atomic_write_unit_max_bytes
|
||
Date: February 2024
|
||
Contact: Himanshu Madhani <himanshu.madhani@oracle.com>
|
||
Description:
|
||
[RO] This parameter defines the largest block which can be
|
||
written atomically with an atomic write operation. This
|
||
value must be a multiple of atomic_write_unit_min and must
|
||
be a power-of-two. This value will not be larger than
|
||
atomic_write_max_bytes.
|
||
|
||
|
||
What: /sys/block/<disk>/atomic_write_boundary_bytes
|
||
Date: February 2024
|
||
Contact: Himanshu Madhani <himanshu.madhani@oracle.com>
|
||
Description:
|
||
[RO] A device may need to internally split an atomic write I/O
|
||
which straddles a given logical block address boundary. This
|
||
parameter specifies the size in bytes of the atomic boundary if
|
||
one is reported by the device. This value must be a
|
||
power-of-two and at least the size as in
|
||
atomic_write_unit_max_bytes.
|
||
Any attempt to merge atomic write I/Os must not result in a
|
||
merged I/O which crosses this boundary (if any).
|
||
|
||
|
||
What: /sys/block/<disk>/diskseq
|
||
Date: February 2021
|
||
Contact: Matteo Croce <teknoraver@meta.com>
|
||
Description:
|
||
The /sys/block/<disk>/diskseq files reports the disk
|
||
sequence number, which is a monotonically increasing
|
||
number assigned to every drive.
|
||
Some devices, like the loop device, refresh such number
|
||
every time the backing file is changed.
|
||
The value type is 64 bit unsigned.
|
||
|
||
|
||
What: /sys/block/<disk>/inflight
|
||
Date: October 2009
|
||
Contact: Jens Axboe <axboe@kernel.dk>, Nikanth Karthikesan <knikanth@suse.de>
|
||
Description:
|
||
Reports the number of I/O requests currently in progress
|
||
(pending / in flight) in a device driver. This can be less
|
||
than the number of requests queued in the block device queue.
|
||
The report contains 2 fields: one for read requests
|
||
and one for write requests.
|
||
The value type is unsigned int.
|
||
Cf. Documentation/block/stat.rst which contains a single value for
|
||
requests in flight.
|
||
This is related to /sys/block/<disk>/queue/nr_requests
|
||
and for SCSI device also its queue_depth.
|
||
|
||
|
||
What: /sys/block/<disk>/integrity/device_is_integrity_capable
|
||
Date: July 2014
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Indicates whether a storage device is capable of storing
|
||
integrity metadata. Set if the device is T10 PI-capable.
|
||
This flag is set to 1 if the storage media is formatted
|
||
with T10 Protection Information. If the storage media is
|
||
not formatted with T10 Protection Information, this flag
|
||
is set to 0.
|
||
|
||
|
||
What: /sys/block/<disk>/integrity/format
|
||
Date: June 2008
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Metadata format for integrity capable block device.
|
||
E.g. T10-DIF-TYPE1-CRC.
|
||
This field describes the type of T10 Protection Information
|
||
that the block device can send and receive.
|
||
If the device can store application integrity metadata but
|
||
no T10 Protection Information profile is used, this field
|
||
contains "nop".
|
||
If the device does not support integrity metadata, this
|
||
field contains "none".
|
||
|
||
|
||
What: /sys/block/<disk>/integrity/protection_interval_bytes
|
||
Date: July 2015
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Describes the number of data bytes which are protected
|
||
by one integrity tuple. Typically the device's logical
|
||
block size.
|
||
|
||
|
||
What: /sys/block/<disk>/integrity/read_verify
|
||
Date: June 2008
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Indicates whether the block layer should verify the
|
||
integrity of read requests serviced by devices that
|
||
support sending integrity metadata.
|
||
|
||
|
||
What: /sys/block/<disk>/integrity/tag_size
|
||
Date: June 2008
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Number of bytes of integrity tag space available per
|
||
protection_interval_bytes, which is typically
|
||
the device's logical block size.
|
||
This field describes the size of the application tag
|
||
if the storage device is formatted with T10 Protection
|
||
Information and permits use of the application tag.
|
||
The tag_size is reported in bytes and indicates the
|
||
space available for adding an opaque tag to each block
|
||
(protection_interval_bytes).
|
||
If the device does not support T10 Protection Information
|
||
(even if the device provides application integrity
|
||
metadata space), this field is set to 0.
|
||
|
||
|
||
What: /sys/block/<disk>/integrity/write_generate
|
||
Date: June 2008
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Indicates whether the block layer should automatically
|
||
generate checksums for write requests bound for
|
||
devices that support receiving integrity metadata.
|
||
|
||
|
||
What: /sys/block/<disk>/partscan
|
||
Date: May 2024
|
||
Contact: Christoph Hellwig <hch@lst.de>
|
||
Description:
|
||
The /sys/block/<disk>/partscan files reports if partition
|
||
scanning is enabled for the disk. It returns "1" if partition
|
||
scanning is enabled, or "0" if not. The value type is a 32-bit
|
||
unsigned integer, but only "0" and "1" are valid values.
|
||
|
||
|
||
What: /sys/block/<disk>/<partition>/alignment_offset
|
||
Date: April 2009
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Storage devices may report a physical block size that is
|
||
bigger than the logical block size (for instance a drive
|
||
with 4KB physical sectors exposing 512-byte logical
|
||
blocks to the operating system). This parameter
|
||
indicates how many bytes the beginning of the partition
|
||
is offset from the disk's natural alignment.
|
||
|
||
|
||
What: /sys/block/<disk>/<partition>/discard_alignment
|
||
Date: May 2011
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
Devices that support discard functionality may
|
||
internally allocate space in units that are bigger than
|
||
the exported logical block size. The discard_alignment
|
||
parameter indicates how many bytes the beginning of the
|
||
partition is offset from the internal allocation unit's
|
||
natural alignment.
|
||
|
||
|
||
What: /sys/block/<disk>/<partition>/stat
|
||
Date: February 2008
|
||
Contact: Jerome Marchand <jmarchan@redhat.com>
|
||
Description:
|
||
The /sys/block/<disk>/<partition>/stat files display the
|
||
I/O statistics of partition <partition>. The format is the
|
||
same as the format of /sys/block/<disk>/stat.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/add_random
|
||
Date: June 2010
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] This file allows to turn off the disk entropy contribution.
|
||
Default value of this file is '1'(on).
|
||
|
||
|
||
What: /sys/block/<disk>/queue/chunk_sectors
|
||
Date: September 2016
|
||
Contact: Hannes Reinecke <hare@suse.com>
|
||
Description:
|
||
[RO] chunk_sectors has different meaning depending on the type
|
||
of the disk. For a RAID device (dm-raid), chunk_sectors
|
||
indicates the size in 512B sectors of the RAID volume stripe
|
||
segment. For a zoned block device, either host-aware or
|
||
host-managed, chunk_sectors indicates the size in 512B sectors
|
||
of the zones of the device, with the eventual exception of the
|
||
last zone of the device which may be smaller.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/crypto/
|
||
Date: February 2022
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
The presence of this subdirectory of /sys/block/<disk>/queue/
|
||
indicates that the device supports inline encryption. This
|
||
subdirectory contains files which describe the inline encryption
|
||
capabilities of the device. For more information about inline
|
||
encryption, refer to Documentation/block/inline-encryption.rst.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/crypto/hw_wrapped_keys
|
||
Date: February 2025
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] The presence of this file indicates that the device
|
||
supports hardware-wrapped inline encryption keys, i.e. key blobs
|
||
that can only be unwrapped and used by dedicated hardware. For
|
||
more information about hardware-wrapped inline encryption keys,
|
||
see Documentation/block/inline-encryption.rst.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/crypto/max_dun_bits
|
||
Date: February 2022
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] This file shows the maximum length, in bits, of data unit
|
||
numbers accepted by the device in inline encryption requests.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/crypto/modes/<mode>
|
||
Date: February 2022
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] For each crypto mode (i.e., encryption/decryption
|
||
algorithm) the device supports with inline encryption, a file
|
||
will exist at this location. It will contain a hexadecimal
|
||
number that is a bitmask of the supported data unit sizes, in
|
||
bytes, for that crypto mode.
|
||
|
||
Currently, the crypto modes that may be supported are:
|
||
|
||
* AES-256-XTS
|
||
* AES-128-CBC-ESSIV
|
||
* Adiantum
|
||
|
||
For example, if a device supports AES-256-XTS inline encryption
|
||
with data unit sizes of 512 and 4096 bytes, the file
|
||
/sys/block/<disk>/queue/crypto/modes/AES-256-XTS will exist and
|
||
will contain "0x1200".
|
||
|
||
|
||
What: /sys/block/<disk>/queue/crypto/num_keyslots
|
||
Date: February 2022
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] This file shows the number of keyslots the device has for
|
||
use with inline encryption.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/crypto/raw_keys
|
||
Date: February 2025
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] The presence of this file indicates that the device
|
||
supports raw inline encryption keys, i.e. keys that are managed
|
||
in raw, plaintext form in software.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/dax
|
||
Date: June 2016
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] This file indicates whether the device supports Direct
|
||
Access (DAX), used by CPU-addressable storage to bypass the
|
||
pagecache. It shows '1' if true, '0' if not.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/discard_granularity
|
||
Date: May 2011
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
[RO] Devices that support discard functionality may internally
|
||
allocate space using units that are bigger than the logical
|
||
block size. The discard_granularity parameter indicates the size
|
||
of the internal allocation unit in bytes if reported by the
|
||
device. Otherwise the discard_granularity will be set to match
|
||
the device's physical block size. A discard_granularity of 0
|
||
means that the device does not support discard functionality.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/discard_max_bytes
|
||
Date: May 2011
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
[RW] While discard_max_hw_bytes is the hardware limit for the
|
||
device, this setting is the software limit. Some devices exhibit
|
||
large latencies when large discards are issued, setting this
|
||
value lower will make Linux issue smaller discards and
|
||
potentially help reduce latencies induced by large discard
|
||
operations.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/discard_max_hw_bytes
|
||
Date: July 2015
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] Devices that support discard functionality may have
|
||
internal limits on the number of bytes that can be trimmed or
|
||
unmapped in a single operation. The `discard_max_hw_bytes`
|
||
parameter is set by the device driver to the maximum number of
|
||
bytes that can be discarded in a single operation. Discard
|
||
requests issued to the device must not exceed this limit. A
|
||
`discard_max_hw_bytes` value of 0 means that the device does not
|
||
support discard functionality.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/discard_zeroes_data
|
||
Date: May 2011
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
[RO] Will always return 0. Don't rely on any specific behavior
|
||
for discards, and don't read this file.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/dma_alignment
|
||
Date: May 2022
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
Reports the alignment that user space addresses must have to be
|
||
used for raw block device access with O_DIRECT and other driver
|
||
specific passthrough mechanisms.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/fua
|
||
Date: May 2018
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] Whether or not the block driver supports the FUA flag for
|
||
write requests. FUA stands for Force Unit Access. If the FUA
|
||
flag is set that means that write requests must bypass the
|
||
volatile cache of the storage device.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/hw_sector_size
|
||
Date: January 2008
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] This is the hardware sector size of the device, in bytes.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/independent_access_ranges/
|
||
Date: October 2021
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] The presence of this sub-directory of the
|
||
/sys/block/xxx/queue/ directory indicates that the device is
|
||
capable of executing requests targeting different sector ranges
|
||
in parallel. For instance, single LUN multi-actuator hard-disks
|
||
will have an independent_access_ranges directory if the device
|
||
correctly advertises the sector ranges of its actuators.
|
||
|
||
The independent_access_ranges directory contains one directory
|
||
per access range, with each range described using the sector
|
||
(RO) attribute file to indicate the first sector of the range
|
||
and the nr_sectors (RO) attribute file to indicate the total
|
||
number of sectors in the range starting from the first sector of
|
||
the range. For example, a dual-actuator hard-disk will have the
|
||
following independent_access_ranges entries.::
|
||
|
||
$ tree /sys/block/<disk>/queue/independent_access_ranges/
|
||
/sys/block/<disk>/queue/independent_access_ranges/
|
||
|-- 0
|
||
| |-- nr_sectors
|
||
| `-- sector
|
||
`-- 1
|
||
|-- nr_sectors
|
||
`-- sector
|
||
|
||
The sector and nr_sectors attributes use 512B sector unit,
|
||
regardless of the actual block size of the device. Independent
|
||
access ranges do not overlap and include all sectors within the
|
||
device capacity. The access ranges are numbered in increasing
|
||
order of the range start sector, that is, the sector attribute
|
||
of range 0 always has the value 0.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/io_poll
|
||
Date: November 2015
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] When read, this file shows whether polling is enabled (1)
|
||
or disabled (0). Writing '0' to this file will disable polling
|
||
for this device. Writing any non-zero value will enable this
|
||
feature.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/io_poll_delay
|
||
Date: November 2016
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] This was used to control what kind of polling will be
|
||
performed. It is now fixed to -1, which is classic polling.
|
||
In this mode, the CPU will repeatedly ask for completions
|
||
without giving up any time.
|
||
<deprecated>
|
||
|
||
|
||
What: /sys/block/<disk>/queue/io_timeout
|
||
Date: November 2018
|
||
Contact: Weiping Zhang <zhangweiping@didiglobal.com>
|
||
Description:
|
||
[RW] io_timeout is the request timeout in milliseconds. If a
|
||
request does not complete in this time then the block driver
|
||
timeout handler is invoked. That timeout handler can decide to
|
||
retry the request, to fail it or to start a device recovery
|
||
strategy.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/iostats
|
||
Date: January 2009
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] This file is used to control (on/off) the iostats
|
||
accounting of the disk.
|
||
|
||
What: /sys/block/<disk>/queue/iostats_passthrough
|
||
Date: October 2024
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] This file is used to control (on/off) the iostats
|
||
accounting of the disk for passthrough commands.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/logical_block_size
|
||
Date: May 2009
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
[RO] This is the smallest unit the storage device can address.
|
||
It is typically 512 bytes.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/max_active_zones
|
||
Date: July 2020
|
||
Contact: Niklas Cassel <niklas.cassel@wdc.com>
|
||
Description:
|
||
[RO] For zoned block devices (zoned attribute indicating
|
||
"host-managed" or "host-aware"), the sum of zones belonging to
|
||
any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
|
||
is limited by this value. If this value is 0, there is no limit.
|
||
|
||
If the host attempts to exceed this limit, the driver should
|
||
report this error with BLK_STS_ZONE_ACTIVE_RESOURCE, which user
|
||
space may see as the EOVERFLOW errno.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/max_discard_segments
|
||
Date: February 2017
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] The maximum number of DMA scatter/gather entries in a
|
||
discard request.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/max_hw_sectors_kb
|
||
Date: September 2004
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] This is the maximum number of kilobytes supported in a
|
||
single data transfer.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/max_integrity_segments
|
||
Date: September 2010
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] Maximum number of elements in a DMA scatter/gather list
|
||
with integrity data that will be submitted by the block layer
|
||
core to the associated block driver.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/max_open_zones
|
||
Date: July 2020
|
||
Contact: Niklas Cassel <niklas.cassel@wdc.com>
|
||
Description:
|
||
[RO] For zoned block devices (zoned attribute indicating
|
||
"host-managed" or "host-aware"), the sum of zones belonging to
|
||
any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN, is
|
||
limited by this value. If this value is 0, there is no limit.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/max_sectors_kb
|
||
Date: September 2004
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] This is the maximum number of kilobytes that the block
|
||
layer will allow for a filesystem request. Must be smaller than
|
||
or equal to the maximum size allowed by the hardware. Write 0
|
||
to use default kernel settings.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/max_segment_size
|
||
Date: March 2010
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] Maximum size in bytes of a single element in a DMA
|
||
scatter/gather list.
|
||
|
||
What: /sys/block/<disk>/queue/max_write_streams
|
||
Date: November 2024
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] Maximum number of write streams supported, 0 if not
|
||
supported. If supported, valid values are 1 through
|
||
max_write_streams, inclusive.
|
||
|
||
What: /sys/block/<disk>/queue/write_stream_granularity
|
||
Date: November 2024
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] Granularity of a write stream in bytes. The granularity
|
||
of a write stream is the size that should be discarded or
|
||
overwritten together to avoid write amplification in the device.
|
||
|
||
What: /sys/block/<disk>/queue/max_segments
|
||
Date: March 2010
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] Maximum number of elements in a DMA scatter/gather list
|
||
that is submitted to the associated block driver.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/minimum_io_size
|
||
Date: April 2009
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
[RO] Storage devices may report a granularity or preferred
|
||
minimum I/O size which is the smallest request the device can
|
||
perform without incurring a performance penalty. For disk
|
||
drives this is often the physical block size. For RAID arrays
|
||
it is often the stripe chunk size. A properly aligned multiple
|
||
of minimum_io_size is the preferred request size for workloads
|
||
where a high number of I/O operations is desired.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/nomerges
|
||
Date: January 2010
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] Standard I/O elevator operations include attempts to merge
|
||
contiguous I/Os. For known random I/O loads these attempts will
|
||
always fail and result in extra cycles being spent in the
|
||
kernel. This allows one to turn off this behavior on one of two
|
||
ways: When set to 1, complex merge checks are disabled, but the
|
||
simple one-shot merges with the previous I/O request are
|
||
enabled. When set to 2, all merge tries are disabled. The
|
||
default value is 0 - which enables all types of merge tries.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/nr_requests
|
||
Date: July 2003
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] This controls how many requests may be allocated in the
|
||
block layer for read or write requests. Note that the total
|
||
allocated number may be twice this amount, since it applies only
|
||
to reads or writes (not the accumulated sum).
|
||
|
||
To avoid priority inversion through request starvation, a
|
||
request queue maintains a separate request pool per each cgroup
|
||
when CONFIG_BLK_CGROUP is enabled, and this parameter applies to
|
||
each such per-block-cgroup request pool. IOW, if there are N
|
||
block cgroups, each request queue may have up to N request
|
||
pools, each independently regulated by nr_requests.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/nr_zones
|
||
Date: November 2018
|
||
Contact: Damien Le Moal <damien.lemoal@wdc.com>
|
||
Description:
|
||
[RO] nr_zones indicates the total number of zones of a zoned
|
||
block device ("host-aware" or "host-managed" zone model). For
|
||
regular block devices, the value is always 0.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/optimal_io_size
|
||
Date: April 2009
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
[RO] Storage devices may report an optimal I/O size, which is
|
||
the device's preferred unit for sustained I/O. This is rarely
|
||
reported for disk drives. For RAID arrays it is usually the
|
||
stripe width or the internal track size. A properly aligned
|
||
multiple of optimal_io_size is the preferred request size for
|
||
workloads where sustained throughput is desired. If no optimal
|
||
I/O size is reported this file contains 0.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/physical_block_size
|
||
Date: May 2009
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
[RO] This is the smallest unit a physical storage device can
|
||
write atomically. It is usually the same as the logical block
|
||
size but may be bigger. One example is SATA drives with 4KB
|
||
sectors that expose a 512-byte logical block size to the
|
||
operating system. For stacked block devices the
|
||
physical_block_size variable contains the maximum
|
||
physical_block_size of the component devices.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/read_ahead_kb
|
||
Date: May 2004
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] Maximum number of kilobytes to read-ahead for filesystems
|
||
on this block device.
|
||
|
||
For MADV_HUGEPAGE, the readahead size may exceed this setting
|
||
since its granularity is based on the hugepage size.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/rotational
|
||
Date: January 2009
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] This file is used to stat if the device is of rotational
|
||
type or non-rotational type.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/rq_affinity
|
||
Date: September 2008
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] If this option is '1', the block layer will migrate request
|
||
completions to the cpu "group" that originally submitted the
|
||
request. For some workloads this provides a significant
|
||
reduction in CPU cycles due to caching effects.
|
||
|
||
For storage configurations that need to maximize distribution of
|
||
completion processing setting this option to '2' forces the
|
||
completion to run on the requesting cpu (bypassing the "group"
|
||
aggregation logic).
|
||
|
||
|
||
What: /sys/block/<disk>/queue/scheduler
|
||
Date: October 2004
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] When read, this file will display the current and available
|
||
IO schedulers for this block device. The currently active IO
|
||
scheduler will be enclosed in [] brackets. Writing an IO
|
||
scheduler name to this file will switch control of this block
|
||
device to that new IO scheduler. Note that writing an IO
|
||
scheduler name to this file will attempt to load that IO
|
||
scheduler module, if it isn't already present in the system.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/stable_writes
|
||
Date: September 2020
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] This file will contain '1' if memory must not be modified
|
||
while it is being used in a write request to this device. When
|
||
this is the case and the kernel is performing writeback of a
|
||
page, the kernel will wait for writeback to complete before
|
||
allowing the page to be modified again, rather than allowing
|
||
immediate modification as is normally the case. This
|
||
restriction arises when the device accesses the memory multiple
|
||
times where the same data must be seen every time -- for
|
||
example, once to calculate a checksum and once to actually write
|
||
the data. If no such restriction exists, this file will contain
|
||
'0'. This file is writable for testing purposes.
|
||
|
||
What: /sys/block/<disk>/queue/virt_boundary_mask
|
||
Date: April 2021
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] This file shows the I/O segment memory alignment mask for
|
||
the block device. I/O requests to this device will be split
|
||
between segments wherever either the memory address of the end
|
||
of the previous segment or the memory address of the beginning
|
||
of the current segment is not aligned to virt_boundary_mask + 1
|
||
bytes.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/wbt_lat_usec
|
||
Date: November 2016
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] If the device is registered for writeback throttling, then
|
||
this file shows the target minimum read latency. If this latency
|
||
is exceeded in a given window of time (see wb_window_usec), then
|
||
the writeback throttling will start scaling back writes. Writing
|
||
a value of '0' to this file disables the feature. Writing a
|
||
value of '-1' to this file resets the value to the default
|
||
setting.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/write_cache
|
||
Date: April 2016
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RW] When read, this file will display whether the device has
|
||
write back caching enabled or not. It will return "write back"
|
||
for the former case, and "write through" for the latter. Writing
|
||
to this file can change the kernels view of the device, but it
|
||
doesn't alter the device state. This means that it might not be
|
||
safe to toggle the setting from "write back" to "write through",
|
||
since that will also eliminate cache flushes issued by the
|
||
kernel.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/write_same_max_bytes
|
||
Date: January 2012
|
||
Contact: Martin K. Petersen <martin.petersen@oracle.com>
|
||
Description:
|
||
[RO] Some devices support a write same operation in which a
|
||
single data block can be written to a range of several
|
||
contiguous blocks on storage. This can be used to wipe areas on
|
||
disk or to initialize drives in a RAID configuration.
|
||
write_same_max_bytes indicates how many bytes can be written in
|
||
a single write same command. If write_same_max_bytes is 0, write
|
||
same is not supported by the device.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/write_zeroes_max_bytes
|
||
Date: November 2016
|
||
Contact: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|
||
Description:
|
||
[RO] Devices that support write zeroes operation in which a
|
||
single request can be issued to zero out the range of contiguous
|
||
blocks on storage without having any payload in the request.
|
||
This can be used to optimize writing zeroes to the devices.
|
||
write_zeroes_max_bytes indicates how many bytes can be written
|
||
in a single write zeroes command. If write_zeroes_max_bytes is
|
||
0, write zeroes is not supported by the device.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/write_zeroes_unmap_max_hw_bytes
|
||
Date: January 2025
|
||
Contact: Zhang Yi <yi.zhang@huawei.com>
|
||
Description:
|
||
[RO] This file indicates whether a device supports zeroing data
|
||
in a specified block range without incurring the cost of
|
||
physically writing zeroes to the media for each individual
|
||
block. If this parameter is set to write_zeroes_max_bytes, the
|
||
device implements a zeroing operation which opportunistically
|
||
avoids writing zeroes to media while still guaranteeing that
|
||
subsequent reads from the specified block range will return
|
||
zeroed data. This operation is a best-effort optimization, a
|
||
device may fall back to physically writing zeroes to the media
|
||
due to other factors such as misalignment or being asked to
|
||
clear a block range smaller than the device's internal
|
||
allocation unit. If this parameter is set to 0, the device may
|
||
have to write each logical block media during a zeroing
|
||
operation.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/write_zeroes_unmap_max_bytes
|
||
Date: January 2025
|
||
Contact: Zhang Yi <yi.zhang@huawei.com>
|
||
Description:
|
||
[RW] While write_zeroes_unmap_max_hw_bytes is the hardware limit
|
||
for the device, this setting is the software limit. Since the
|
||
unmap write zeroes operation is a best-effort optimization, some
|
||
devices may still physically writing zeroes to media. So the
|
||
speed of this operation is not guaranteed. Writing a value of
|
||
'0' to this file disables this operation. Otherwise, this
|
||
parameter should be equal to write_zeroes_unmap_max_hw_bytes.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/zone_append_max_bytes
|
||
Date: May 2020
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] This is the maximum number of bytes that can be written to
|
||
a sequential zone of a zoned block device using a zone append
|
||
write operation (REQ_OP_ZONE_APPEND). This value is always 0 for
|
||
regular block devices.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/zone_write_granularity
|
||
Date: January 2021
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] This indicates the alignment constraint, in bytes, for
|
||
write operations in sequential zones of zoned block devices
|
||
(devices with a zoned attributed that reports "host-managed" or
|
||
"host-aware"). This value is always 0 for regular block devices.
|
||
|
||
|
||
What: /sys/block/<disk>/queue/zoned
|
||
Date: September 2016
|
||
Contact: Damien Le Moal <damien.lemoal@wdc.com>
|
||
Description:
|
||
[RO] zoned indicates if the device is a zoned block device and
|
||
the zone model of the device if it is indeed zoned. The
|
||
possible values indicated by zoned are "none" for regular block
|
||
devices and "host-aware" or "host-managed" for zoned block
|
||
devices. The characteristics of host-aware and host-managed
|
||
zoned block devices are described in the ZBC (Zoned Block
|
||
Commands) and ZAC (Zoned Device ATA Command Set) standards.
|
||
These standards also define the "drive-managed" zone model.
|
||
However, since drive-managed zoned block devices do not support
|
||
zone commands, they will be treated as regular block devices and
|
||
zoned will report "none".
|
||
|
||
|
||
What: /sys/block/<disk>/hidden
|
||
Date: March 2023
|
||
Contact: linux-block@vger.kernel.org
|
||
Description:
|
||
[RO] the block device is hidden. it doesn’t produce events, and
|
||
can’t be opened from userspace or using blkdev_get*.
|
||
Used for the underlying components of multipath devices.
|
||
|
||
|
||
What: /sys/block/<disk>/stat
|
||
Date: February 2008
|
||
Contact: Jerome Marchand <jmarchan@redhat.com>
|
||
Description:
|
||
The /sys/block/<disk>/stat files displays the I/O
|
||
statistics of disk <disk>. They contain 11 fields:
|
||
|
||
== ==============================================
|
||
1 reads completed successfully
|
||
2 reads merged
|
||
3 sectors read
|
||
4 time spent reading (ms)
|
||
5 writes completed
|
||
6 writes merged
|
||
7 sectors written
|
||
8 time spent writing (ms)
|
||
9 I/Os currently in progress
|
||
10 time spent doing I/Os (ms)
|
||
11 weighted time spent doing I/Os (ms)
|
||
12 discards completed
|
||
13 discards merged
|
||
14 sectors discarded
|
||
15 time spent discarding (ms)
|
||
16 flush requests completed
|
||
17 time spent flushing (ms)
|
||
== ==============================================
|
||
|
||
For more details refer Documentation/admin-guide/iostats.rst
|