The controller may enter and exit suspend for every I/O that it has to
handle. This is inefficient and causes excessive suspend and resume
activity for the controller. To avoid this, use a default 5s autosuspend
delay so that the controller stops suspending and resuming so frequently.
This value may still be modified via sysfs interfaces.
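As a rough user-space illustration of why an autosuspend delay damps this
churn, the sketch below models a controller that may only suspend once it
has been idle for the full delay. All struct and function names here are
hypothetical and are not taken from the driver; the real mechanism is the
kernel runtime PM autosuspend support.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not hisi_sas code. Times are in milliseconds. */
struct ctrl_model {
	long last_busy_ms;		/* time of the most recent I/O */
	long autosuspend_delay_ms;	/* 5000 models the 5s default */
};

/* Record activity, as each I/O would. */
static void ctrl_mark_busy(struct ctrl_model *c, long now_ms)
{
	c->last_busy_ms = now_ms;
}

/* Suspend is allowed only after a full delay without activity. */
static bool ctrl_may_suspend(const struct ctrl_model *c, long now_ms)
{
	return now_ms - c->last_busy_ms >= c->autosuspend_delay_ms;
}

/* Count suspends when n I/Os arrive, one every gap_ms. */
static int ctrl_count_suspends(long delay_ms, long gap_ms, int n)
{
	struct ctrl_model c = { .last_busy_ms = 0,
				.autosuspend_delay_ms = delay_ms };
	int suspends = 0;

	assert(gap_ms > 0);
	for (int i = 1; i <= n; i++) {
		long now = i * gap_ms;

		/* Idle since the previous I/O: did we get to suspend? */
		if (ctrl_may_suspend(&c, now))
			suspends++;
		ctrl_mark_busy(&c, now);
	}
	return suspends;
}
```

In this model, a delay of 0 suspends between every pair of I/Os, while a
5000 ms delay never suspends as long as I/Os arrive less than 5s apart.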
Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is possible that the controller may become suspended between processing
a phyup interrupt and the event being processed by libsas. As such, we
can't ensure the controller is active when processing the phyup event -
this may cause the phyup event to be lost or other issues. To avoid any
possible issues, add pm_runtime_get_noresume() in the phyup interrupt
handler and pm_runtime_put_sync() in the work handler exit to ensure that
we always stay active. Since we only want to call pm_runtime_get_noresume()
for v3 hw, signal this with a new event, HISI_PHYE_PHY_UP_PM.
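The effect of holding a usage reference across the interrupt and the work
handler can be sketched with a small user-space model. This is only an
approximation: the names are hypothetical, the counter merely stands in
for the runtime PM usage count, and the real pm_runtime_put_sync() does
more than decrement a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a runtime PM usage counter; not kernel code. */
struct pm_model {
	int usage_count;	/* > 0 means "must stay active" */
	bool suspended;
};

/* Models pm_runtime_get_noresume(): take a reference, no resume. */
static void model_get_noresume(struct pm_model *pm)
{
	pm->usage_count++;
}

/* Models the put at work handler exit: drop the reference. */
static void model_put(struct pm_model *pm)
{
	assert(pm->usage_count > 0);
	pm->usage_count--;
}

/* A suspend attempt succeeds only when nobody holds a reference. */
static bool model_try_suspend(struct pm_model *pm)
{
	if (pm->usage_count > 0)
		return false;
	pm->suspended = true;
	return true;
}

/*
 * Replay: phyup interrupt, racing suspend attempt, then the phyup work.
 * Returns true if the work found the controller suspended.
 */
static bool model_phyup_sequence(bool take_ref)
{
	struct pm_model pm = { 0, false };
	bool found_suspended;

	if (take_ref)
		model_get_noresume(&pm);	/* in the interrupt handler */
	model_try_suspend(&pm);			/* suspend races in here */
	found_suspended = pm.suspended;		/* phyup work runs */
	if (take_ref)
		model_put(&pm);			/* at work handler exit */
	return found_suspended;
}
```

Without the reference the work handler finds the controller suspended;
with it, the suspend attempt is refused until the work has finished.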
Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For the hisi_sas driver, if a directly attached disk is removed during
suspend, a hang will occur in the resume process:
The background is that in commit 16fd4a7c59 ("scsi: hisi_sas: Add device
link between SCSI devices and hisi_hba"), it is ensured that the HBA device
cannot be runtime suspended when any SCSI device associated is active.
Other drivers which use libsas don't worry about this as none support
runtime suspend.
The mentioned hang occurs when a disk is removed during suspend. In the
removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
scsi_remove_device(), which is being processed in the HA event workqueue.
Here we wait for all suppliers of the SCSI device to resume, which includes
the HBA device (from the above commit). However the HBA device cannot
resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from
calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.
There does not appear to be any need for the sas_drain_work() to be called
at all in sas_resume_ha() as it is not syncing against anything, so allow
LLDDs to avoid this by providing a variant of sas_resume_ha() which does
not "sync", i.e. doesn't drain the event workqueue.
Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The OOB interrupt and phyup interrupt handlers may run out-of-order in high
CPU usage scenarios. Since the hisi_sas_phy.timer is added in
hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order
execution will cause the hisi_sas_phy.timer timeout to trigger.
To solve this, protect hisi_sas_phy.timer and .attached with a lock, and ensure
that the timer won't be added after phyup handler completes.
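A minimal user-space sketch of the locking rule follows, assuming a simple
spinlock built on a C11 atomic_flag in place of the kernel lock and a
boolean in place of the real timer. The struct and function names are
hypothetical and only mirror the roles of hisi_sas_phy_oob_ready() and
phy_up_v3_hw().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-phy state; names are illustrative. */
struct phy_model {
	atomic_flag lock;	/* stands in for the new per-phy lock */
	bool attached;		/* set once phyup has been handled */
	bool timer_armed;	/* stands in for hisi_sas_phy.timer */
};

static void phy_model_init(struct phy_model *phy)
{
	atomic_flag_clear(&phy->lock);
	phy->attached = false;
	phy->timer_armed = false;
}

static void phy_lock(struct phy_model *phy)
{
	while (atomic_flag_test_and_set(&phy->lock))
		;	/* spin until the lock is free */
}

static void phy_unlock(struct phy_model *phy)
{
	atomic_flag_clear(&phy->lock);
}

/* OOB ready: arm the timer only if phyup has not already completed. */
static void model_oob_ready(struct phy_model *phy)
{
	phy_lock(phy);
	if (!phy->attached)
		phy->timer_armed = true;
	phy_unlock(phy);
}

/* phyup: mark the phy attached and disarm the timer, same lock. */
static void model_phy_up(struct phy_model *phy)
{
	phy_lock(phy);
	phy->attached = true;
	phy->timer_armed = false;
	phy_unlock(phy);
}

/* Replay the out-of-order case: phyup is handled before OOB ready. */
static void phy_model_selfcheck(void)
{
	struct phy_model phy;

	phy_model_init(&phy);
	model_phy_up(&phy);
	model_oob_ready(&phy);
	assert(!phy.timer_armed);	/* no stale timer left behind */
}
```

Because both paths check and update .attached and the timer under one
lock, a late OOB ready cannot arm a timer that nobody will disarm.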
Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we issue a controller reset command while executing an FLR, a hung task
may be found:
Call trace:
__switch_to+0x158/0x1cc
__schedule+0x2e8/0x85c
schedule+0x7c/0x110
schedule_timeout+0x190/0x1cc
__down+0x7c/0xd4
down+0x5c/0x7c
hisi_sas_task_exec+0x510/0x680 [hisi_sas_main]
hisi_sas_queue_command+0x24/0x30 [hisi_sas_main]
smp_execute_task_sg+0xf4/0x23c [libsas]
sas_smp_phy_control+0x110/0x1e0 [libsas]
transport_sas_phy_reset+0xc8/0x190 [libsas]
phy_reset_work+0x2c/0x40 [libsas]
process_one_work+0x1dc/0x48c
worker_thread+0x15c/0x464
kthread+0x160/0x170
ret_from_fork+0x10/0x18
This is a race condition which occurs when the FLR completes first.
Here the host HISI_SAS_RESETTING_BIT flag gets out of sync as
HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so
now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem.
Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (ufs, smartpqi, lpfc,
target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug
fixes. Notable core changes are the removal of scsi->tag which caused
some churn in obsolete drivers and a sweep through all drivers to call
scsi_done() directly instead of scsi->done() which removes a pointer
indirection from the hot path and a move to register core sysfs files
earlier, which means they're available to KOBJ_ADD processing, which
necessitates switching all drivers to using attribute groups.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This consists of the usual driver updates (ufs, smartpqi, lpfc,
target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug
fixes.
Notable core changes are the removal of scsi->tag which caused some
churn in obsolete drivers and a sweep through all drivers to call
scsi_done() directly instead of scsi->done() which removes a pointer
indirection from the hot path and a move to register core sysfs files
earlier, which means they're available to KOBJ_ADD processing, which
necessitates switching all drivers to using attribute groups"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: lpfc: Update lpfc version to 14.0.0.3
scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
scsi: lpfc: Fix link down processing to address NULL pointer dereference
scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted
scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine
scsi: lpfc: Correct sysfs reporting of loop support after SFP status change
scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset
scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup()
scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer
scsi: ufs: mediatek: Avoid sched_clock() misuse
scsi: mpt3sas: Make mpt3sas_dev_attrs static
scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions
scsi: target: core: Stop using bdevname()
scsi: aha1542: Use memcpy_{from,to}_bvec()
scsi: sr: Add error handling support for add_disk()
scsi: sd: Add error handling support for add_disk()
scsi: target: Perform ALUA group changes in one step
scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path
scsi: target: Fix alua_tg_pt_gps_count tracking
scsi: target: Fix ordered tag handling
...
Drop various includes not actually used in blkdev.h itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
struct device supports attribute groups directly but does not support
struct device_attribute directly. Hence switch to attribute groups.
Link: https://lore.kernel.org/r/20211012233558.4066756-21-bvanassche@acm.org
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When issuing a hardreset/linkreset/phy_set_linkrate from sysfs, the phy
will be disabled and re-enabled for the directly attached scenario.
It takes some time for the phy to come back up after re-enabling the phy.
If the controller becomes suspended while waiting for the phy to come back,
the phy up may be lost (along with the disk).
To solve this problem, wait for the phy up to occur with a timeout. Indeed
this is already done in hisi_sas_debug_I_T_nexus_reset() for local phys, so
just relocate the functionality to hisi_sas_control_phy().
Since the HA workqueue is drained when suspending the controller, and the
phy control function is called from the same workqueue, we can guarantee
that the controller will not be suspended during this period.
Link: https://lore.kernel.org/r/1634041588-74824-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Perform driver-specific SCSI device initialization in the designated SCSI
midlayer callback instead of relying on the libsas "device found" callback.
The SCSI midlayer .slave_alloc interface is called prior to sending any I/O
to the device.
Link: https://lore.kernel.org/r/1634041588-74824-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The hisi_hba debugfs_dump_index member should be increased after a dump
insertion has completed, and not before it has started, so fix the code to
do so.
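The ordering can be sketched in user space as follows. The struct, the
slot limit and the function names are hypothetical and are not taken from
the driver; only the "fill first, increment afterwards" ordering reflects
the change described above.

```c
#include <assert.h>

#define MODEL_MAX_DUMPS 4	/* hypothetical limit for this sketch */

struct dump_model {
	int dump_index;			/* number of completed dumps */
	int regs[MODEL_MAX_DUMPS];	/* one saved value per dump slot */
};

/* Save one snapshot; bump the index only after the slot is written. */
static int model_snapshot(struct dump_model *d, int reg_value)
{
	if (d->dump_index >= MODEL_MAX_DUMPS)
		return -1;	/* no free slot */

	d->regs[d->dump_index] = reg_value;	/* fill the slot first */
	d->dump_index++;			/* then account for it */
	return 0;
}

static void dump_model_selfcheck(void)
{
	struct dump_model d = { 0 };

	assert(model_snapshot(&d, 0xab) == 0);
	/* The index counts only dumps that are fully stored. */
	assert(d.dump_index == 1 && d.regs[0] == 0xab);
}
```

With this ordering, dump_index never refers to a slot that has not been
fully written.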
Link: https://lore.kernel.org/r/1629799260-120116-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some usage of del_timer() in the driver is potentially unsafe.
When running the sas_task->slow_task timer in
hisi_sas_exec_internal_tmf_task(), execution may be blocked in function
hisi_sas_task_exec(); so it is possible that the timer is running when the
callback to disable the timer is running. This could be dangerous, as we
immediately release resources which the timer callback uses after disabling
the timer. The same situation may be found at other sites, such as
_hisi_sas_internal_task_abort().
Change calls to del_timer() to del_timer_sync() as necessary, to ensure any
timer has finished when disabling.
Also remove calls to timer_pending() prior to del_timer() as it is not
necessary.
Link: https://lore.kernel.org/r/1629799260-120116-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
HISI_SAS_RESET_BIT means that the controller is being reset, and so the
name is a bit vague. Rename it to HISI_SAS_RESETTING_BIT.
Link: https://lore.kernel.org/r/1629799260-120116-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use managed PCI functions such as pcim_enable_device() and
pcim_iomap_regions() to simplify exception handling code.
Link: https://lore.kernel.org/r/1629799260-120116-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prepare for removal of the request pointer by using scsi_cmd_to_rq()
instead. This patch does not change any functionality.
Link: https://lore.kernel.org/r/20210809230355.8186-22-bvanassche@acm.org
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is a set of minor fixes and clean ups in the core and various
drivers. The only core change in behaviour is the I/O retry for
spinup notify, but that shouldn't impact anything other than the
failing case.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is a set of minor fixes and clean ups in the core and various
drivers.
The only core change in behaviour is the I/O retry for spinup notify,
but that shouldn't impact anything other than the failing case"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
scsi: virtio_scsi: Add validation for residual bytes from response
scsi: ipr: System crashes when seeing type 20 error
scsi: core: Retry I/O for Notify (Enable Spinup) Required error
scsi: mpi3mr: Fix warnings reported by smatch
scsi: qedf: Add check to synchronize abort and flush
scsi: MAINTAINERS: Add mpi3mr driver maintainers
scsi: libfc: Fix array index out of bound exception
scsi: mvsas: Use DEVICE_ATTR_RO()/RW() macro
scsi: megaraid_mbox: Use DEVICE_ATTR_ADMIN_RO() macro
scsi: qedf: Use DEVICE_ATTR_RO() macro
scsi: qedi: Use DEVICE_ATTR_RO() macro
scsi: message: mptfc: Switch from pci_ to dma_ API
scsi: be2iscsi: Fix some missing space in some messages
scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
scsi: ufs: Fix build warning without CONFIG_PM
scsi: bnx2fc: Remove meaningless bnx2fc_abts_cleanup() return value assignment
scsi: qla2xxx: Add heartbeat check
scsi: virtio_scsi: Do not overwrite SCSI status
scsi: libsas: Add LUN number check in .slave_alloc callback
scsi: core: Inline scsi_mq_alloc_queue()
...
This series consists of the usual driver updates (ufs, ibmvfc,
megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
elx and mpi3mr being new drivers. The major core change is a rework
to drop the status byte handling macros and the old bit shifted
definitions and the rest of the updates are minor fixes.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, ibmvfc,
megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
elx and mpi3mr being new drivers.
The major core change is a rework to drop the status byte handling
macros and the old bit shifted definitions and the rest of the updates
are minor fixes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits)
scsi: aha1740: Avoid over-read of sense buffer
scsi: arcmsr: Avoid over-read of sense buffer
scsi: ips: Avoid over-read of sense buffer
scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe()
scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame()
scsi: elx: libefc: Fix less than zero comparison of a unsigned int
scsi: elx: efct: Fix pointer error checking in debugfs init
scsi: elx: efct: Fix is_originator return code type
scsi: elx: efct: Fix link error for _bad_cmpxchg
scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel()
scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session()
scsi: elx: efct: Fix error handling in efct_hw_init()
scsi: elx: efct: Remove redundant initialization of variable lun
scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected"
scsi: lpfc: Fix build error in lpfc_scsi.c
scsi: target: iscsi: Remove redundant continue statement
scsi: qla4xxx: Remove redundant continue statement
scsi: ppa: Switch to use module_parport_driver()
scsi: imm: Switch to use module_parport_driver()
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
...
Offlining a SATA device connected to a hisi SAS controller and then
scanning the host will result in detecting 255 non-existent devices:
# lsscsi
[2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda
[2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb
[2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc
# echo "offline" > /sys/block/sdb/device/state
# echo "- - -" > /sys/class/scsi_host/host2/scan
# lsscsi
[2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda
[2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb
[2:0:1:1] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdh
...
[2:0:1:255] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdjb
After a REPORT LUN command issued to the offline device fails, the SCSI
midlayer tries to do a sequential scan of all devices whose LUN number is
not 0. However, SATA does not support LUN numbers at all.
Introduce a generic sas_slave_alloc() handler which will return -ENXIO for
SATA devices if the requested LUN number is larger than 0 and make libsas
drivers use this function as their .slave_alloc callback.
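The check itself is small. A user-space sketch of the decision, with a
hypothetical function name and a plain boolean standing in for the libsas
"is this a SATA device" test, could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the .slave_alloc decision: SATA devices have no
 * LUNs, so anything other than LUN 0 is rejected with -ENXIO.
 */
static int model_slave_alloc(bool is_sata, unsigned long long lun)
{
	if (is_sata && lun > 0)
		return -ENXIO;
	return 0;
}

static void slave_alloc_selfcheck(void)
{
	assert(model_slave_alloc(true, 0) == 0);	/* the disk itself */
	assert(model_slave_alloc(true, 1) == -ENXIO);	/* phantom LUN */
	assert(model_slave_alloc(false, 1) == 0);	/* non-SATA: allowed */
}
```

Rejecting the allocation this early means the sequential LUN scan never
creates the non-existent devices shown in the lsscsi output above.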
Link: https://lore.kernel.org/r/20210622034037.1467088-1-yuyufen@huawei.com
Reported-by: Wu Bo <wubo40@huawei.com>
Suggested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch prepares for converting SAM status codes into an enum. Without
this patch converting SAM status codes into an enumeration type would
trigger complaints about enum type mismatches for the SAS code.
Link: https://lore.kernel.org/r/20210524025457.11299-2-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Cc: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
irqs allocated with devm_request_irq() should not be freed using
free_irq(). Doing so causes a dangling pointer and a subsequent double
free.
Link: https://lore.kernel.org/r/20210519130519.2661938-1-yangyingliang@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a channel interrupt occurs without any status bit set, the handler will
return directly. However, if such redundant interrupts are received, it's
better to check what happened, so add logs for this.
Link: https://lore.kernel.org/r/1617709711-195853-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The debugfs snapshot should be executed before the reset occurs to ensure
that the register contents are saved properly.
As such, it is incorrect to queue the debugfs dump when running a reset as
the reset will occur before the snapshot work item is handled.
Therefore, directly snapshot registers in the reset work handler.
Link: https://lore.kernel.org/r/1617709711-195853-5-git-send-email-john.garry@huawei.com
Signed-off-by: Jianqin Xie <xiejianqin@hisilicon.com>
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Function sas_unregister_ha() needs to be called to roll back if
hisi_hba->hw->hw_init() fails in function hisi_sas_probe() or
hisi_sas_v3_probe(). Make that change.
Link: https://lore.kernel.org/r/1617709711-195853-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To help debugging efforts, print the device SAS address in the v3 hw
erroneous completion log.
Here is an example print:
hisi_sas_v3_hw 0000:b4:02.0: erroneous completion iptt=2193 task=000000002b0c13f8 dev id=17 addr=570fd45f9d17b001
Link: https://lore.kernel.org/r/1617709711-195853-3-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The controller provides a trace FIFO DFX tool to assist link fault debugging
and link optimization. This tool can be helpful when debugging link faults
without SAS analyzers. Each PHY has an independent trace FIFO interface.
The user can configure the trace FIFO tool of one PHY by using the
following six interfaces:
signal_sel: selects the signal group that applies to different scenarios.
0x0: linkrate negotiation
0x1: Host 12G TX train
0x2: Disk 12G TX train
0x3: SAS PHY CTRL DFX 0
0x4: SAS PHY CTRL DFX 1
0x5: SAS PCS DFX
other: linkrate negotiation
dump_mask: The masked hardware status bit will not be updated.
dump_mode: determines how to dump data after trigger signal is generated.
0x0: dump forever
0x1: dump 32 data after trigger signal is generated
0x2: no more dump after trigger signal is generated
trigger_mode: determines the trigger mode, level or edge.
0x0: dump when trigger signal changed
0x1: dump when trigger signal's level equal to trigger_level
0x2: dump when trigger signal's level different from trigger_level
trigger_level: determines the trigger level.
trigger_msk: mask trigger signal
The user can get 32-byte values from hardware by reading the rd_data.
These values constitute the status record of the hardware at different time
points.
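For reference, the signal_sel encoding listed above can be expressed as a
small lookup. This is only an illustrative user-space helper with a
hypothetical name; it is not part of the driver or of its debugfs
interface.

```c
#include <assert.h>
#include <string.h>

/* Map a signal_sel value to the signal group named in the list above. */
static const char *model_signal_sel_name(unsigned int signal_sel)
{
	switch (signal_sel) {
	case 0x1:
		return "Host 12G TX train";
	case 0x2:
		return "Disk 12G TX train";
	case 0x3:
		return "SAS PHY CTRL DFX 0";
	case 0x4:
		return "SAS PHY CTRL DFX 1";
	case 0x5:
		return "SAS PCS DFX";
	default:
		/* 0x0 and every other value select linkrate negotiation. */
		return "linkrate negotiation";
	}
}

static void signal_sel_selfcheck(void)
{
	assert(strcmp(model_signal_sel_name(0x0),
		      "linkrate negotiation") == 0);
	assert(strcmp(model_signal_sel_name(0x5), "SAS PCS DFX") == 0);
}
```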
Link: https://lore.kernel.org/r/1611659068-131975-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the controller reset occurs at the same time as driver removal, it may
be possible that the interrupts have been released prior to the host
softreset, and calling pci_irq_vector() there causes a WARN:
WARNING: CPU: 37 PID: 1542 /pci/msi.c:1275 pci_irq_vector+0xc0/0xd0
Call trace:
pci_irq_vector+0xc0/0xd0
disable_host_v3_hw+0x58/0x5b0 [hisi_sas_v3_hw]
soft_reset_v3_hw+0x40/0xc0 [hisi_sas_v3_hw]
hisi_sas_controller_reset+0x150/0x260 [hisi_sas_main]
hisi_sas_rst_work_handler+0x3c/0x58 [hisi_sas_main]
To fix, flush the driver workqueue prior to releasing the interrupts to
ensure any resets have been completed.
Link: https://lore.kernel.org/r/1611659068-131975-5-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
libsas event notifiers required an extension where gfp_t flags must be
explicitly passed. For bisectability, temporary _gfp() variants of such
functions were added. All call sites were then converted to use the _gfp()
variants and explicitly pass GFP context. Having no callers left, the
original libsas notifiers were then modified to accept gfp_t flags by
default.
Switch back to the original libsas API, while still passing GFP context.
The libsas _gfp() variants will be removed afterwards.
Link: https://lore.kernel.org/r/20210118100955.1761652-14-a.darwish@linutronix.de
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use the new libsas event notifiers API, which requires callers to
explicitly pass the gfp_t memory allocation flags.
Below is the context analysis for the modified functions:
=> hisi_sas_bytes_dmaed():
Since it is invoked from both process and atomic contexts, let its callers
pass the gfp_t flags:
* hisi_sas_main.c:
------------------
hisi_sas_phyup_work(): workqueue context
-> hisi_sas_bytes_dmaed(..., GFP_KERNEL)
hisi_sas_controller_reset_done(): has an msleep()
-> hisi_sas_rescan_topology()
-> hisi_sas_phy_down()
-> hisi_sas_bytes_dmaed(..., GFP_KERNEL)
hisi_sas_debug_I_T_nexus_reset(): calls wait_for_completion_timeout()
-> hisi_sas_phy_down()
-> hisi_sas_bytes_dmaed(..., GFP_KERNEL)
* hisi_sas_v1_hw.c:
-------------------
int_abnormal_v1_hw(): irq handler
-> hisi_sas_phy_down()
-> hisi_sas_bytes_dmaed(..., GFP_ATOMIC)
* hisi_sas_v[23]_hw.c:
----------------------
int_phy_updown_v[23]_hw(): irq handler
-> phy_down_v[23]_hw()
-> hisi_sas_phy_down()
-> hisi_sas_bytes_dmaed(..., GFP_ATOMIC)
=> int_bcast_v1_hw() and phy_bcast_v3_hw():
Both are invoked exclusively from irq handlers. Pass GFP_ATOMIC.
Link: https://lore.kernel.org/r/20210118100955.1761652-12-a.darwish@linutronix.de
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
LLDDs report events to libsas with .notify_port_event and .notify_phy_event
callbacks.
These callbacks are fixed and so there is no reason why the functions
cannot be called directly, so do that.
This neatens the code slightly, makes it more obvious, and reduces function
pointer usage, which is generally a good thing. Downside is that there are
2x more symbol exports.
[a.darwish@linutronix.de: Remove the now unused "sas_ha" local variables]
Link: https://lore.kernel.org/r/20210118100955.1761652-3-a.darwish@linutronix.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (ufs, qla2xxx,
smartpqi, target, zfcp, fnic, mpt3sas, ibmvfc) plus a load of
cleanups, a major power management rework and a load of assorted minor
updates. There are a few core updates (formatting fixes being the big
one) but nothing major this cycle.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This consists of the usual driver updates (ufs, qla2xxx, smartpqi,
target, zfcp, fnic, mpt3sas, ibmvfc) plus a load of cleanups, a major
power management rework and a load of assorted minor updates.
There are a few core updates (formatting fixes being the big one) but
nothing major this cycle"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: mpt3sas: Update driver version to 36.100.00.00
scsi: mpt3sas: Handle trigger page after firmware update
scsi: mpt3sas: Add persistent MPI trigger page
scsi: mpt3sas: Add persistent SCSI sense trigger page
scsi: mpt3sas: Add persistent Event trigger page
scsi: mpt3sas: Add persistent Master trigger page
scsi: mpt3sas: Add persistent trigger pages support
scsi: mpt3sas: Sync time periodically between driver and firmware
scsi: qla2xxx: Update version to 10.02.00.104-k
scsi: qla2xxx: Fix device loss on 4G and older HBAs
scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry
scsi: qla2xxx: Fix the call trace for flush workqueue
scsi: qla2xxx: Fix flash update in 28XX adapters on big endian machines
scsi: qla2xxx: Handle aborts correctly for port undergoing deletion
scsi: qla2xxx: Fix N2N and NVMe connect retry failure
scsi: qla2xxx: Fix FW initialization error on big endian machines
scsi: qla2xxx: Fix crash during driver load on big endian machines
scsi: qla2xxx: Fix compilation issue in PPC systems
scsi: qla2xxx: Don't check for fw_started while posting NVMe command
scsi: qla2xxx: Tear down session if FW say it is down
...
When managed interrupts are used (and shost->nr_hw_queues is set), a
fixed queue - set per-device - is still used for internal I/Os.
If all the CPUs mapped to that queue are offlined, then the completions for
that queue are not serviced and any internal I/Os will time out.
Fix by selecting a queue for internal I/Os from the queue mapped from the
current CPU in this scenario.
This is still not ideal as it does not deal with CPU hotplug for inflight
internal I/Os, and needs proper support from [0].
[0] https://lore.kernel.org/linux-scsi/20200703130122.111448-1-hare@suse.de/T/#m7d77d049b18f33a24ef206af69ebb66d07440556
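The selection rule can be sketched in user space as below. The function
name, its parameters and the CPU-to-queue table are hypothetical; in the
kernel the mapping comes from the block layer queue map rather than a
plain array.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: pick the queue for an internal I/O. With managed
 * interrupts, use the queue mapped to the CPU issuing the I/O, which is
 * online by definition; otherwise keep the fixed per-device queue.
 */
static unsigned int model_pick_internal_queue(bool managed_irqs,
					      const unsigned int *cpu_to_queue,
					      unsigned int cur_cpu,
					      unsigned int fixed_queue)
{
	if (managed_irqs)
		return cpu_to_queue[cur_cpu];
	return fixed_queue;
}

static void model_queue_selfcheck(void)
{
	/* CPUs 0-1 map to queue 0, CPUs 2-3 map to queue 1. */
	static const unsigned int cpu_to_queue[] = { 0, 0, 1, 1 };

	/* Issued from CPU 3: use queue 1, not the fixed queue 0. */
	assert(model_pick_internal_queue(true, cpu_to_queue, 3, 0) == 1);
	/* Without managed interrupts the fixed queue is kept. */
	assert(model_pick_internal_queue(false, cpu_to_queue, 3, 0) == 0);
}
```

As noted above, this does not help an internal I/O that is already in
flight when its CPUs go offline.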
Link: https://lore.kernel.org/r/1607347855-59091-1-git-send-email-john.garry@huawei.com
Fixes: 8d98416a55 ("scsi: hisi_sas: Switch v3 hw to MQ")
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Relocate all the debugfs code for DFX to v3 hw since no other versions
support it.
Link: https://lore.kernel.org/r/1606207594-196362-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix some rollbacks in function hisi_sas_v3_probe() and
interrupt_init_v3_hw().
Link: https://lore.kernel.org/r/1606207594-196362-3-git-send-email-john.garry@huawei.com
Fixes: 8d98416a55 ("scsi: hisi_sas: Switch v3 hw to MQ")
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sometimes local functions are called indirectly from the hw driver, which
only makes the code harder to follow. Remove these.
Method .hw_init is only called from platform driver probe, which is not
relevant, so don't set this either.
Link: https://lore.kernel.org/r/1606207594-196362-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Both runtime_suspend_v3_hw() and runtime_resume_v3_hw() do nothing else but
invoke suspend_v3_hw() and resume_v3_hw() respectively. This is a case of
unnecessary function calls. To use those functions for runtime pm as well,
simply use UNIVERSAL_DEV_PM_OPS.
make -j$(nproc) W=1, with CONFIG_PM disabled, throws '-Wunused-function'
warning for runtime_suspend_v3_hw() and runtime_resume_v3_hw(). After
dropping those function definitions, the warning was thrown for
suspend_v3_hw() and resume_v3_hw(). Hence, mark them as '__maybe_unused'.
Link: https://lore.kernel.org/r/20201102164730.324035-15-vaibhavgupta40@gmail.com
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Drivers using new-framework/generic-framework should not handle standard
power management operations. These operations were performed by legacy
framework through PCI helper functions like pci_save/restore_state(),
pci_set_power_state(), etc.
Drivers should not use them now.
Link: https://lore.kernel.org/r/20201102164730.324035-14-vaibhavgupta40@gmail.com
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver calls pci_enable_wake(...., false) in hisi_sas_v3_resume(), and
there is no corresponding pci_enable_wake(...., true) in
hisi_sas_v3_suspend(). The driver should either enable wake for the device
in .suspend() or not invoke pci_enable_wake() at all.
Since this driver does not appear to support enable-wake, and the PCI core
already calls pci_enable_wake(pci_dev, PCI_D0, false) during resume, drop
the call from hisi_sas_v3_resume().
Link: https://lore.kernel.org/r/20201102164730.324035-13-vaibhavgupta40@gmail.com
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (ufs, qla2xxx, tcmu,
ibmvfc, lpfc, smartpqi, hisi_sas, qedi, qedf, mpt3sas) and minor bug
fixes. There are only three core changes: adding sense codes,
cleaning up noretry and adding an option for limitless retries.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX4YulyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishaZDAQCT7rwG
UEZYHgYkU9EX9ERVBQM0SW4mLrxf3g3P5ioJsAEAtkclCM4QsIOP+MIPjIa0EyUY
khu0kcrmeFR2YwA8zhw=
=4w4S
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"The usual driver updates (ufs, qla2xxx, tcmu, ibmvfc, lpfc, smartpqi,
hisi_sas, qedi, qedf, mpt3sas) and minor bug fixes.
There are only three core changes: adding sense codes, cleaning up
noretry and adding an option for limitless retries"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (226 commits)
scsi: hisi_sas: Recover PHY state according to the status before reset
scsi: hisi_sas: Filter out new PHY up events during suspend
scsi: hisi_sas: Add device link between SCSI devices and hisi_hba
scsi: hisi_sas: Add check for methods _PS0 and _PR0
scsi: hisi_sas: Add controller runtime PM support for v3 hw
scsi: hisi_sas: Switch to new framework to support suspend and resume
scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling synchronize_irq()
scsi: qedf: Remove redundant assignment to variable 'rc'
scsi: lpfc: Remove unneeded variable 'status' in lpfc_fcp_cpu_map_store()
scsi: snic: Convert to use DEFINE_SEQ_ATTRIBUTE macro
scsi: qla4xxx: Delete unneeded variable 'status' in qla4xxx_process_ddb_changed
scsi: sun_esp: Use module_platform_driver to simplify the code
scsi: sun3x_esp: Use module_platform_driver to simplify the code
scsi: sni_53c710: Use module_platform_driver to simplify the code
scsi: qlogicpti: Use module_platform_driver to simplify the code
scsi: mac_esp: Use module_platform_driver to simplify the code
scsi: jazz_esp: Use module_platform_driver to simplify the code
scsi: mvumi: Fix error return in mvumi_io_attach()
scsi: lpfc: Drop nodelist reference on error in lpfc_gen_req()
scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
...
Runtime PM of SCSI devices is already supported in the SCSI layer, so we
can suspend/resume every SCSI device separately. But if there is no link
between hisi_hba and the SCSI devices or SCSI targets, issues arise if
the controller is suspended while SCSI devices are still resuming. The
controller can be suspended only when all the SCSI devices under it are
suspended. Add a device link between the SCSI devices and the controller
to enforce this.
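The ordering rule described here can be modelled in a few lines of
user-space C. This is a sketch of the rule only, not of the kernel's device
link machinery: struct model_dev, struct model_hba and
model_hba_can_suspend() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI device linked to the controller. */
struct model_dev {
	bool active;	/* true while the device is resumed or resuming */
};

/* Hypothetical stand-in for the controller and its linked devices. */
struct model_hba {
	const struct model_dev *consumers;
	size_t nr_consumers;
};

/* The controller may suspend only if every linked device is suspended. */
static bool model_hba_can_suspend(const struct model_hba *hba)
{
	size_t i;

	assert(hba != NULL);
	for (i = 0; i < hba->nr_consumers; i++)
		if (hba->consumers[i].active)
			return false;
	return true;
}
```

In this model a single active device is enough to keep the controller from
suspending, which is the guarantee the link is meant to provide.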
Link: https://lore.kernel.org/r/1601649038-25534-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To support system suspend/resume or runtime suspend/resume, we need to use
the function pci_set_power_state() to change the power state, which
requires at least method _PS0 or _PR0 to be provided by the platform for
v3 hw. So check whether the method is supported and, if not, print a
warning.
A Kconfig dependency is added as there is no stub for
acpi_device_power_manageable().
Link: https://lore.kernel.org/r/1601649038-25534-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For v3 hw we will add support for runtime PM, which is only supported in
the new PM framework. Legacy PM support and the new framework are not
allowed to be used together, so switch to the new framework to support
suspend and resume.
Link: https://lore.kernel.org/r/1601649038-25534-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A call trace is observed when running function level reset with fewer than
16 online CPUs and MSI auto-affinity enabled.
[16538.348038] Call trace:
[16538.348422] pci_irq_vector+0x98/0xc0
[16538.348947] disable_host_v3_hw+0x8c/0x288 [hisi_sas_v3_hw]
[16538.349706] hisi_sas_reset_prepare_v3_hw+0x60/0x88 [hisi_sas_v3_hw]
[16538.350631] pci_dev_save_and_disable+0x38/0x68
[16538.351290] pci_reset_function+0x44/0x88
[16538.351846] reset_store+0x6c/0xb8
[16538.352429] dev_attr_store+0x44/0x60
[16538.353035] sysfs_kf_write+0x58/0x80
[16538.353558] kernfs_fop_write+0x140/0x230
[16538.354175] __vfs_write+0x48/0x80
[16538.354675] vfs_write+0xb8/0x1d8
[16538.355145] ksys_write+0x74/0xf8
[16538.355615] __arm64_sys_write+0x24/0x30
[16538.356240] el0_svc_common.constprop.4+0x80/0x1f0
[16538.356905] do_el0_svc+0x2c/0x38
[16538.357408] el0_svc+0x14/0x40
[16538.357848] el0_sync_handler+0xbc/0x2ec
[16538.358388] el0_sync+0x140/0x180
The reason is that if we use pci_alloc_irq_vectors_affinity() to allocate
IRQs, the number of CQ IRQs can be no more than the number of online
CPUs, but we use hisi_hba->queue_count (always 16) to iterate during
interrupt_disable_v3_hw().
Use hisi_hba->cq_nvecs instead of hisi_hba->queue_count to avoid
synchronizing an IRQ for a CPU which does not exist.
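The loop-bound mismatch can be illustrated with a small user-space model.
It assumes, as the text above states, that the number of allocated CQ
vectors is capped by the number of online CPUs and that the queue count is
always 16. MODEL_QUEUE_COUNT, model_cq_nvecs() and model_invalid_lookups()
are hypothetical names, not driver code.

```c
#include <assert.h>

#define MODEL_QUEUE_COUNT 16u	/* fixed queue count, per the text above */

/* CQ vectors actually allocated: capped by the number of online CPUs. */
static unsigned int model_cq_nvecs(unsigned int online_cpus)
{
	return online_cpus < MODEL_QUEUE_COUNT ? online_cpus
					       : MODEL_QUEUE_COUNT;
}

/*
 * Count the iterations of a loop running to 'bound' that would look up
 * a vector index that was never allocated.
 */
static unsigned int model_invalid_lookups(unsigned int bound,
					  unsigned int allocated)
{
	unsigned int i, invalid = 0;

	assert(allocated <= MODEL_QUEUE_COUNT);
	for (i = 0; i < bound; i++)
		if (i >= allocated)
			invalid++;
	return invalid;
}
```

With 4 online CPUs, iterating to the queue count touches 12 indices that
have no vector behind them, while iterating to the allocated count touches
none.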
Link: https://lore.kernel.org/r/1601649038-25534-2-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Now that the block layer provides a shared tag, we can switch the driver
to expose all HW queues.
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>