iSCSI Kernel Messages
Got an alert that Prometheus had a problem compacting blocks (it uses an iSCSI PVC), even though the volume still has about 8 GiB free. I checked the kernel logs on the node hosting Prometheus and found the entries below. I'm not sure what they mean, but figured I'd document them here in case they're useful; if not, we can just close the issue. TrueNAS itself was up and running, with no outages or other problems.
|Jul 11 09:00:11 k3s01 kernel: [343031.095444] connection2:0: detected conn error (1020) │
│Jul 11 09:00:11 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) │
│Jul 11 09:00:13 k3s01 kernel: [343033.112804] sd 2:0:0:0: Power-on or device reset occurred │
│Jul 11 09:00:13 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts) │
│Jul 11 09:00:18 k3s01 kernel: [343038.136438] connection2:0: detected conn error (1020) │
│Jul 11 09:00:18 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) │
│Jul 11 09:00:20 k3s01 kernel: [343040.152722] sd 2:0:0:0: Power-on or device reset occurred │
│Jul 11 09:00:20 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts) │
│Jul 11 09:00:25 k3s01 kernel: [343045.139682] connection2:0: detected conn error (1020) │
│Jul 11 09:00:25 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) │
│Jul 11 09:00:28 k3s01 kernel: [343048.156800] sd 2:0:0:0: Power-on or device reset occurred │
│Jul 11 09:00:28 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts) │
│Jul 11 09:00:33 k3s01 kernel: [343053.141647] connection2:0: detected conn error (1020) │
│Jul 11 09:00:33 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) │
│Jul 11 09:00:37 k3s01 kernel: [343057.161031] sd 2:0:0:0: Power-on or device reset occurred │
│Jul 11 09:00:37 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts) │
│Jul 11 09:00:42 k3s01 kernel: [343062.231074] connection2:0: detected conn error (1020) │
│Jul 11 09:00:43 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) │
│Jul 11 09:00:44 k3s01 kernel: [343064.252751] sd 2:0:0:0: Power-on or device reset occurred │
│Jul 11 09:00:45 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts) │
│Jul 11 09:00:47 k3s01 kernel: [343067.536233] connection1:0: detected conn error (1020) │
│Jul 11 09:00:48 k3s01 iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) │
│Jul 11 09:00:49 k3s01 kernel: [343069.352159] connection2:0: detected conn error (1020) │
│Jul 11 09:00:49 k3s01 kernel: [343069.352284] scsi_io_completion_action: 13 callbacks suppressed │
│Jul 11 09:00:49 k3s01 kernel: [343069.352291] sd 2:0:0:0: [sdc] tag#29 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352297] sd 2:0:0:0: [sdc] tag#29 CDB: Write(10) 2a 00 00 07 9e 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352298] print_req_error: 13 callbacks suppressed │
│Jul 11 09:00:49 k3s01 kernel: [343069.352302] blk_update_request: I/O error, dev sdc, sector 3994320 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352389] sd 2:0:0:0: [sdc] tag#30 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352392] sd 2:0:0:0: [sdc] tag#30 CDB: Write(10) 2a 00 00 07 99 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352394] blk_update_request: I/O error, dev sdc, sector 3984080 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352460] sd 2:0:0:0: [sdc] tag#31 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352462] sd 2:0:0:0: [sdc] tag#31 CDB: Write(10) 2a 00 00 07 97 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352464] blk_update_request: I/O error, dev sdc, sector 3979984 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352529] sd 2:0:0:0: [sdc] tag#35 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352531] sd 2:0:0:0: [sdc] tag#35 CDB: Write(10) 2a 00 00 07 96 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352533] blk_update_request: I/O error, dev sdc, sector 3977936 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352599] sd 2:0:0:0: [sdc] tag#26 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352601] sd 2:0:0:0: [sdc] tag#26 CDB: Write(10) 2a 00 00 07 a8 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352603] blk_update_request: I/O error, dev sdc, sector 4014800 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352667] sd 2:0:0:0: [sdc] tag#36 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352669] sd 2:0:0:0: [sdc] tag#36 CDB: Write(10) 2a 00 00 07 9b 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352671] blk_update_request: I/O error, dev sdc, sector 3988176 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352736] sd 2:0:0:0: [sdc] tag#37 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352738] sd 2:0:0:0: [sdc] tag#37 CDB: Write(10) 2a 00 00 07 9c 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352740] blk_update_request: I/O error, dev sdc, sector 3990224 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352804] sd 2:0:0:0: [sdc] tag#38 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352806] sd 2:0:0:0: [sdc] tag#38 CDB: Write(10) 2a 00 00 07 a0 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352807] blk_update_request: I/O error, dev sdc, sector 3998416 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352871] sd 2:0:0:0: [sdc] tag#39 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352873] sd 2:0:0:0: [sdc] tag#39 CDB: Write(10) 2a 00 00 07 a1 5a 00 01 00 00 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352875] blk_update_request: I/O error, dev sdc, sector 4000464 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:49 k3s01 kernel: [343069.352939] sd 2:0:0:0: [sdc] tag#28 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK │
│Jul 11 09:00:49 k3s01 kernel: [343069.352940] sd 2:0:0:0: [sdc] tag#28 CDB: Write(10) 2a 00 00 07 a9 5a 00 01 00 00
|Jul 11 09:00:49 k3s01 kernel: [343069.352942] blk_update_request: I/O error, dev sdc, sector 4016848 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0 │
│Jul 11 09:00:50 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) │
│Jul 11 09:00:51 k3s01 iscsid: connection1:0 is operational after recovery (1 attempts) │
│Jul 11 09:00:51 k3s01 kernel: [343071.365756] sd 2:0:0:0: Power-on or device reset occurred │
│Jul 11 09:00:52 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts) │
│Jul 11 09:00:56 k3s01 kernel: [343076.423472] connection2:0: detected conn error (1020) │
│Jul 11 09:00:57 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3) │
│Jul 11 09:00:58 k3s01 kernel: [343078.444692] sd 2:0:0:0: Power-on or device reset occurred │
│Jul 11 09:00:58 k3s01 kernel: [343078.555498] XFS (sdc): writeback error on sector 4055304 │
│Jul 11 09:00:59 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts)
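To get a quick picture of how often each connection drops and recovers in an excerpt like the one above, the log lines can be tallied per connection. This is only a minimal sketch: the function name `summarize` and the two regular expressions are assumptions written to match the message shapes shown in this log, not part of any iSCSI tooling.

```python
import re
from collections import Counter

# Patterns written against the two message shapes in the excerpt above.
# They use search(), so leading/trailing pane-border characters are ignored.
ERROR_RE = re.compile(
    r"kernel: \[\s*[\d.]+\] (connection\d+:\d+): detected conn error \((\d+)\)"
)
RECOVERY_RE = re.compile(
    r"iscsid: (connection\d+:\d+) is operational after recovery"
)


def summarize(lines):
    """Count kernel-reported errors and iscsid recoveries per connection."""
    errors, recoveries = Counter(), Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            errors[match.group(1)] += 1
            continue
        match = RECOVERY_RE.search(line)
        if match:
            recoveries[match.group(1)] += 1
    return errors, recoveries


if __name__ == "__main__":
    import sys

    # Reads log lines from stdin, e.g. piped from the node's syslog.
    errs, recs = summarize(sys.stdin)
    for conn in sorted(set(errs) | set(recs)):
        print(f"{conn}: errors={errs[conn]} recoveries={recs[conn]}")
```

A connection whose error count keeps climbing while each recovery succeeds on the first attempt, as here, suggests the target is reachable but something keeps closing the TCP session.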
Looking at the Prometheus container logs, it looks like this caused a container restart, and it's been fine since. The previous container log ended with:
ts=2022-07-10T03:00:24.393Z caller=head.go:1009 level=info component=tsdb msg="WAL checkpoint complete" first=1751 last=1756 duration=5.742387925s
panic: sync /prometheus/chunks_head/000548: input/output error
goroutine 13676 [running]:
github.com/prometheus/prometheus/tsdb.handleChunkWriteError({0x38910c0?, 0xc0b18f5ef0?})
/app/tsdb/head_append.go:598 +0x76
github.com/prometheus/prometheus/tsdb/chunks.(*ChunkDiskMapper).WriteChunk(0xc000be2000, 0x40d9a7?, 0x28?, 0x2cedbc0?, {0x38b3380, 0xc09fa94380}, 0x3489870)
/app/tsdb/chunks/head_chunks.go:392 +0x151
github.com/prometheus/prometheus/tsdb.(*memSeries).mmapCurrentHeadChunk(0xc016f242a0, 0x60bce7d4?)
/app/tsdb/head_append.go:587 +0x53
github.com/prometheus/prometheus/tsdb.(*memSeries).cutNewHeadChunk(0xc016f242a0, 0x181e67a9b43, 0x4070400000000000?)
/app/tsdb/head_append.go:561 +0x2a
github.com/prometheus/prometheus/tsdb.(*memSeries).append(0xc016f242a0, 0x181e67a9b43, 0x4070400000000000, 0x2074f8, 0x2385565?)
/app/tsdb/head_append.go:528 +0x1a5
github.com/prometheus/prometheus/tsdb.(*headAppender).Commit(0xc04e9ef540)
/app/tsdb/head_append.go:459 +0x612
github.com/prometheus/prometheus/tsdb.dbAppender.Commit({{0x38acd48?, 0xc04e9ef540?}, 0xc000b18820?})
/app/tsdb/db.go:870 +0x35
github.com/prometheus/prometheus/storage.(*fanoutAppender).Commit(0xc050290300)
/app/storage/fanout.go:176 +0x3f
github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1()
/app/scrape/scrape.go:1271 +0x45
github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport(0xc09420ed20, {0xeda5c53d3?, 0x4faeaa0?, 0x4faeaa0?}, {0xeda5c53d3?, 0x4faeaa0?, 0x4faeaa0?}, 0x0)
/app/scrape/scrape.go:1342 +0xf70
github.com/prometheus/prometheus/scrape.(*scrapeLoop).run(0xc09420ed20, 0xc09418f830?)
/app/scrape/scrape.go:1224 +0x345
created by github.com/prometheus/prometheus/scrape.(*scrapePool).sync
/app/scrape/scrape.go:588 +0xa1e
Not a single RST message has been logged since I disabled that hourly snapshot. I'll keep monitoring it. I don't really need hourly snapshots, so I'll leave it disabled for now and keep the nightly one. If I do re-enable it, I'll schedule it so it doesn't run exactly on the hour.
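One way to check whether the connection drops line up with a job that runs on the hour is to bucket the error lines by minute of the hour; a cluster at minute 0 would be consistent with the hourly snapshot being the trigger. This is a rough sketch, assuming syslog-style lines with an `HH:MM:SS` timestamp like the ones in the kernel log above; the function name `errors_by_minute` is made up for illustration.

```python
import re
from collections import Counter

# First HH:MM:SS token on the line is taken as the syslog timestamp.
TS_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")


def errors_by_minute(lines, marker="ISCSI_ERR_TCP_CONN_CLOSE"):
    """Count lines containing `marker`, keyed by minute of the hour (0-59)."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        match = TS_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Reads log lines from stdin and prints the busiest minutes first.
    for minute, count in errors_by_minute(sys.stdin).most_common():
        print(f"minute {minute:02d}: {count} connection-close errors")
```

Run over several days of logs, this would show whether the errors stay pinned to the top of the hour or are spread out, which helps separate a scheduled job from a general network problem.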
Thanks for the comments, it’s appreciated.
OK, makes sense… I'll close the issue. I did see some notes on the TrueNAS side that agree with there being connectivity issues. Weird, since they are both on the same switch.