scsi: mpt3sas: Remove usage of device_busy counter
Remove usage of the device_busy counter from the driver. Instead, the driver
now uses the 'nr_active' counter of the request_queue to get the number of
in-flight requests for a LUN.
Link: https://<email address hidden>
Signed-off-by: Sreekanth Reddy <email address hidden>
Signed-off-by: Martin K. Petersen <email address hidden>
(cherry picked from commit c50ed99cd56ee725d9e14dffec8e8f1641b8ca30)
Signed-off-by: Michael Reed <email address hidden>
scsi: mpt3sas: Print function name in which cmd timed out
Print the name of the function in which an MPT command timed out. This
makes it possible to tell, from the first failure instance in the log
itself, in which path the corresponding MPT command timed out.
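As a rough illustration of the idea, the user-space sketch below tags a
timeout message with the name of the calling function using the standard C
__func__ identifier. The helper, the macro, the caller name and the message
text are all hypothetical; this is not the driver's actual logging code.

```c
#include <stdio.h>

/* Hypothetical helper: build a timeout message that names the caller. */
static int format_timeout_msg(char *buf, size_t len, const char *func)
{
    return snprintf(buf, len, "%s: command timeout", func);
}

/* The macro expands at the call site, so __func__ is the caller's name. */
#define REPORT_TIMEOUT(buf, len) format_timeout_msg((buf), (len), __func__)

/* Hypothetical code path that issues a command and sees it time out. */
static const char *demo_config_request(char *buf, size_t len)
{
    REPORT_TIMEOUT(buf, len);
    return buf;
}
```

Because the macro is expanded where it is used, every code path reports its
own name without passing it around by hand.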
Link: https://<email address hidden>
Signed-off-by: Sreekanth Reddy <email address hidden>
Signed-off-by: Martin K. Petersen <email address hidden>
(cherry picked from commit c6bdb6a10892d1130638a5e28d1523a813e45d5e)
Signed-off-by: Michael Reed <email address hidden>
This improves the mpt3sas driver's default debug information collection and
allows a higher percentage of issues to be resolved with a first-time data
capture. However, this improvement has to balance the amount of debug data
captured against the performance of the driver.
The following print messages are enabled without affecting IO performance:
1. When a task abort TM is received, print the IO command's timeout value
and how long the command has been outstanding.
2. Whenever a hard reset occurs, print from where the hard reset was
issued.
3. Display failure messages for failure scenarios regardless of the
logging level.
4. Print a message after the driver successfully registers or unregisters a
target drive with the SML. This print is useful for debugging issues where
drive addition or deletion hangs at the SML.
5. At driver load time, print information about the request, reply, sense
and config page pools, such as their address, length and size. Also print
the sg_tablesize information.
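Item 1 can be sketched in user space as follows. The structure, the
millisecond units and the message wording are assumptions made for
illustration only; the driver's real message format is not shown in this
commit message.

```c
#include <stdio.h>

/* Hypothetical per-command timing record (all times in milliseconds). */
struct cmd_timing {
    unsigned long timeout_ms;   /* timeout value of the IO command */
    unsigned long issued_at_ms; /* timestamp at which it was issued */
};

/* How long the command has been outstanding at time now_ms. */
static unsigned long outstanding_ms(const struct cmd_timing *c,
                                    unsigned long now_ms)
{
    return now_ms - c->issued_at_ms;
}

/* Build the (hypothetical) message printed when a task abort arrives. */
static int format_abort_msg(char *buf, size_t len,
                            const struct cmd_timing *c, unsigned long now_ms)
{
    return snprintf(buf, len, "task abort: timeout %lu ms, outstanding %lu ms",
                    c->timeout_ms, outstanding_ms(c, now_ms));
}
```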
Link: https://<email address hidden>
Signed-off-by: Sreekanth Reddy <email address hidden>
Signed-off-by: Martin K. Petersen <email address hidden>
(cherry picked from commit 5b061980e362820894d7d884370b37005bed23ec)
Signed-off-by: Michael Reed <email address hidden>
scsi: mpt3sas: Handle CoreDump state from watchdog thread
The watchdog thread polls the IOC state every second. If it detects that
the IOC is in CoreDump state, it immediately stops the IOs and clears the
outstanding commands issued to the HBA firmware. It then polls for the IOC
to leave CoreDump state. Once it detects that the IOC state has changed
from CoreDump to Fault, or that CoreDumpTOSec seconds have elapsed, it
issues a host reset operation, which moves the IOC to Operational state,
and resumes the IOs.
Whenever a TM is received from the SML and the driver detects that the IOC
is in CoreDump state, it waits for the CoreDump state to be cleared and
then issues a host reset operation.
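The watchdog behaviour described above can be modelled as a small state
machine that is called once per one-second poll. This is a user-space sketch
under stated assumptions: the type names, the action names and the
watchdog_tick() function are hypothetical and do not correspond to
identifiers in the driver.

```c
/* Hypothetical model of the IOC states relevant to the watchdog. */
enum ioc_state { IOC_OPERATIONAL, IOC_COREDUMP, IOC_FAULT };

/* Action the watchdog takes on a given poll. */
enum wd_action { WD_NONE, WD_QUIESCE, WD_HOST_RESET };

struct watchdog {
    int waiting;          /* nonzero while waiting for CoreDump to finish */
    unsigned int elapsed; /* polls (seconds) spent waiting so far */
    unsigned int timeout; /* CoreDumpTOSec, in seconds */
};

/* One poll of the IOC state; returns what the watchdog should do. */
static enum wd_action watchdog_tick(struct watchdog *wd, enum ioc_state seen)
{
    if (seen == IOC_FAULT) {
        /* CoreDump finished (or a plain fault): reset the host. */
        wd->waiting = 0;
        return WD_HOST_RESET;
    }
    if (seen != IOC_COREDUMP) {
        wd->waiting = 0;
        return WD_NONE;
    }
    if (!wd->waiting) {
        /* First sighting: stop IOs and clear outstanding commands. */
        wd->waiting = 1;
        wd->elapsed = 0;
        return WD_QUIESCE;
    }
    if (++wd->elapsed >= wd->timeout) {
        /* Firmware took longer than CoreDumpTOSec: reset anyway. */
        wd->waiting = 0;
        return WD_HOST_RESET;
    }
    return WD_NONE;
}
```

In this model a host reset is requested either when the Fault state is seen
or when the timeout expires, whichever comes first.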
Link: https://<email address hidden>
Signed-off-by: Sreekanth Reddy <email address hidden>
Signed-off-by: Martin K. Petersen <email address hidden>
(cherry picked from commit fce0aa08792b3ae725395fa25d44507dee0b603b)
Signed-off-by: Michael Reed <email address hidden>
scsi: mpt3sas: Add support IOCs new state named COREDUMP
A new feature has been added to the HBA firmware whereby it copies the
collected firmware logs to a flash region named 'CoreDump' whenever an HBA
firmware fault occurs. The firmware needs some time to copy the logs to the
CoreDump flash region, and hence it has introduced a new IOC state named
'CoreDump'.
When the driver detects the CoreDump state, it means that a firmware fault
has occurred and the firmware is copying the logs to the CoreDump flash
region. During this time the driver should not perform any operation with
the HBA; it should wait for the HBA firmware to move the IOC state from
'CoreDump' to 'Fault' once it has finished copying the logs to the CoreDump
region. Once the driver detects the Fault state, it issues a diag
reset/host reset operation to move the IOC state from Fault to Operational.
The valid IOC state transitions with respect to the CoreDump state feature
are:
Operational -> Fault:
The IOC transitions to the Fault state when an operational error occurs AND
CoreDump is not supported (or is disabled) by the firmware (FW).
Operational -> CoreDump:
The IOC transitions to the CoreDump state when an operational error occurs
AND CoreDump is supported & enabled by the FW.
CoreDump -> Fault:
A transition from CoreDump state to Fault state happens when the FW
completes the CoreDump collection.
CoreDump -> Reset:
A transition out of the CoreDump state happens when the host sets the Reset
Adapter bit in the System Diagnostic Register (Hard Reset). This reset
action indicates that CoreDump took longer than the host time out.
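The four transitions listed above can be captured in a small predicate. This
is only a sketch: the enum values and the function name are hypothetical, and
the function covers just the transitions listed here, not every transition
the IOC can make (for example, the reset from Fault back to Operational).

```c
#include <stdbool.h>

/* Hypothetical state names; the driver's own constants may differ. */
enum ioc_state { ST_OPERATIONAL, ST_COREDUMP, ST_FAULT, ST_RESET };

/*
 * Returns true if from -> to is one of the four listed transitions, given
 * whether CoreDump is supported and enabled by the firmware.
 */
static bool transition_is_listed(enum ioc_state from, enum ioc_state to,
                                 bool coredump_enabled)
{
    if (from == ST_OPERATIONAL && to == ST_FAULT)
        return !coredump_enabled;  /* error without CoreDump support */
    if (from == ST_OPERATIONAL && to == ST_COREDUMP)
        return coredump_enabled;   /* error with CoreDump enabled */
    if (from == ST_COREDUMP && (to == ST_FAULT || to == ST_RESET))
        return true;               /* dump completed, or host hard reset */
    return false;
}
```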
Firmware informs the driver of the maximum time the driver has to wait
for the firmware to transition the IOC state from 'CoreDump' to 'Fault'
through the 'CoreDumpTOSec' field of ManufacturingPage11. If the
'CoreDumpTOSec' field value is zero, the driver waits for a maximum of 15
seconds.
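The timeout selection amounts to a zero check with a fallback. A minimal
sketch, in which the macro and function names are hypothetical:

```c
/* Fallback wait, in seconds, when CoreDumpTOSec is reported as zero. */
#define DEFAULT_COREDUMP_TIMEOUT_SECS 15

/* Maximum number of seconds to wait for the CoreDump -> Fault transition. */
static unsigned int coredump_wait_secs(unsigned int core_dump_to_sec)
{
    return core_dump_to_sec ? core_dump_to_sec
                            : DEFAULT_COREDUMP_TIMEOUT_SECS;
}
```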
The driver informs the HBA firmware that it supports the new 'CoreDump' IOC
state by enabling the COREDUMP_ENABLE flag in the ConfigurationFlags field
of the IOC init request message.
The current patch handles the CoreDump state only during the HBA
initialization and release scenarios, where the watchdog thread (which
polls the IOC state every second) is disabled. A subsequent patch handles
the CoreDump state when the watchdog thread is enabled.
During HBA initialization or release, if the driver detects the CoreDump
state, it waits for a maximum of CoreDumpTOSec seconds for the FW to copy
the logs. After that it issues a diag reset operation to move the IOC state
to Operational.
Link: https://<email address hidden>
Signed-off-by: Sreekanth Reddy <email address hidden>
Signed-off-by: Martin K. Petersen <email address hidden>
(cherry picked from commit e8c2307e6a690db9aaff84153b2857c5c4dfd969)
Signed-off-by: Michael Reed <email address hidden>
scsi: mpt3sas: renamed _base_after_reset_handler function
Rename the _base_after_reset_handler function to
_base_clear_outstanding_commands so that it can be used in multiple
scenarios under a name that matches the operation it performs.
Also rename its child functions. No functional changes.
Link: https://<email address hidden>
Signed-off-by: Sreekanth Reddy <email address hidden>
Signed-off-by: Martin K. Petersen <email address hidden>
(cherry picked from commit 36c6c7f75b0998f5a4b5c79cbb94ee1ab4ee35c0)
Signed-off-by: Michael Reed <email address hidden>
Introduce function _scsih_nvme_shutdown() to issue IO Unit Control message
to IOC firmware with operation code 'shutdown'. This causes IOC firmware to
issue NVMe shutdown commands to all NVMe drives attached to it.
NVMe Shutdown:
NVMe devices need to have a specific shutdown sequence performed before
power is removed. For this, the IOC firmware needs to be notified when the
system is being shut down. So at system shutdown time, the driver issues
an IO Unit Control request with operation code MPI26_CTRL_OP_SHUTDOWN to
inform the firmware that a shutdown has been initiated.
This shutdown command is issued only if NVMe devices are attached to the
controller.
During each NVMe device addition, driver reads pcie device page2 to get
shutdown latency (e.g. drive's RTD3 Entry Latency) and updates the max
latency value among the added NVMe drives in ioc->max_shutdown_latency.
This is used as the timeout value for IO Unit Control command at the time
of shutdown.
When an NVMe drive is removed and its shutdown latency matches
ioc->max_shutdown_latency, ioc->max_shutdown_latency is updated to the next
max value (by iterating over the list of available devices). If the
shutdown latency is 0, the default timeout is set to six seconds.
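The bookkeeping for the maximum shutdown latency can be sketched as below.
This is a user-space model, not the driver code: the fixed-size slot array,
the structure and function names, and the assumption that latencies are held
in seconds are all hypothetical. Only the max_shutdown_latency field name and
the six-second default come from the description above.

```c
#include <stddef.h>

#define MAX_DEVS 8
#define DEFAULT_SHUTDOWN_TIMEOUT_SECS 6

struct nvme_dev {
    int present;                   /* nonzero if the slot holds a drive */
    unsigned int shutdown_latency; /* assumed to be in seconds */
};

struct ioc_model {
    struct nvme_dev devs[MAX_DEVS];
    unsigned int max_shutdown_latency;
};

/* Record a newly added drive and raise the maximum if needed. */
static void model_add_dev(struct ioc_model *ioc, size_t slot,
                          unsigned int latency)
{
    ioc->devs[slot].present = 1;
    ioc->devs[slot].shutdown_latency = latency;
    if (latency > ioc->max_shutdown_latency)
        ioc->max_shutdown_latency = latency;
}

/* Remove a drive; rescan only if it was the one defining the maximum. */
static void model_remove_dev(struct ioc_model *ioc, size_t slot)
{
    unsigned int removed = ioc->devs[slot].shutdown_latency;
    size_t i;

    ioc->devs[slot].present = 0;
    if (removed != ioc->max_shutdown_latency)
        return;
    ioc->max_shutdown_latency = 0;
    for (i = 0; i < MAX_DEVS; i++)
        if (ioc->devs[i].present &&
            ioc->devs[i].shutdown_latency > ioc->max_shutdown_latency)
            ioc->max_shutdown_latency = ioc->devs[i].shutdown_latency;
}

/* Timeout for the shutdown request; falls back to the default on zero. */
static unsigned int model_shutdown_timeout(const struct ioc_model *ioc)
{
    return ioc->max_shutdown_latency ? ioc->max_shutdown_latency
                                     : DEFAULT_SHUTDOWN_TIMEOUT_SECS;
}
```

Rescanning only when the removed drive held the maximum keeps the common
removal path cheap, at the cost of one pass over the device list in the rare
case.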
Link: https://<email address hidden>
Signed-off-by: Sreekanth Reddy <email address hidden>
Signed-off-by: Martin K. Petersen <email address hidden>
(cherry picked from commit d3f623ae8e0323ca434ee9029100312a8be37773)
Signed-off-by: Michael Reed <email address hidden>