iSCSI Kernel Messages

Got an alert that Prometheus had a problem compacting blocks (it uses an iSCSI PVC, which still has about 8 GiB of free space). I checked the kernel logs on the node hosting Prometheus and found the entries below. I’m not sure what they mean, but figured I’d document them here in case they’re useful; if not, we can just close the issue. There wasn’t an issue with TrueNAS: it was up and running, with no outages.

Jul 11 09:00:11 k3s01 kernel: [343031.095444]  connection2:0: detected conn error (1020)
Jul 11 09:00:11 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 11 09:00:13 k3s01 kernel: [343033.112804] sd 2:0:0:0: Power-on or device reset occurred
Jul 11 09:00:13 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts)
Jul 11 09:00:18 k3s01 kernel: [343038.136438]  connection2:0: detected conn error (1020)
Jul 11 09:00:18 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 11 09:00:20 k3s01 kernel: [343040.152722] sd 2:0:0:0: Power-on or device reset occurred
Jul 11 09:00:20 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts)
Jul 11 09:00:25 k3s01 kernel: [343045.139682]  connection2:0: detected conn error (1020)
Jul 11 09:00:25 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 11 09:00:28 k3s01 kernel: [343048.156800] sd 2:0:0:0: Power-on or device reset occurred
Jul 11 09:00:28 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts)
Jul 11 09:00:33 k3s01 kernel: [343053.141647]  connection2:0: detected conn error (1020)
Jul 11 09:00:33 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 11 09:00:37 k3s01 kernel: [343057.161031] sd 2:0:0:0: Power-on or device reset occurred
Jul 11 09:00:37 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts)
Jul 11 09:00:42 k3s01 kernel: [343062.231074]  connection2:0: detected conn error (1020)
Jul 11 09:00:43 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 11 09:00:44 k3s01 kernel: [343064.252751] sd 2:0:0:0: Power-on or device reset occurred
Jul 11 09:00:45 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts)
Jul 11 09:00:47 k3s01 kernel: [343067.536233]  connection1:0: detected conn error (1020)
Jul 11 09:00:48 k3s01 iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 11 09:00:49 k3s01 kernel: [343069.352159]  connection2:0: detected conn error (1020)
Jul 11 09:00:49 k3s01 kernel: [343069.352284] scsi_io_completion_action: 13 callbacks suppressed
Jul 11 09:00:49 k3s01 kernel: [343069.352291] sd 2:0:0:0: [sdc] tag#29 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352297] sd 2:0:0:0: [sdc] tag#29 CDB: Write(10) 2a 00 00 07 9e 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352298] print_req_error: 13 callbacks suppressed
Jul 11 09:00:49 k3s01 kernel: [343069.352302] blk_update_request: I/O error, dev sdc, sector 3994320 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352389] sd 2:0:0:0: [sdc] tag#30 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352392] sd 2:0:0:0: [sdc] tag#30 CDB: Write(10) 2a 00 00 07 99 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352394] blk_update_request: I/O error, dev sdc, sector 3984080 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352460] sd 2:0:0:0: [sdc] tag#31 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352462] sd 2:0:0:0: [sdc] tag#31 CDB: Write(10) 2a 00 00 07 97 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352464] blk_update_request: I/O error, dev sdc, sector 3979984 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352529] sd 2:0:0:0: [sdc] tag#35 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352531] sd 2:0:0:0: [sdc] tag#35 CDB: Write(10) 2a 00 00 07 96 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352533] blk_update_request: I/O error, dev sdc, sector 3977936 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352599] sd 2:0:0:0: [sdc] tag#26 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352601] sd 2:0:0:0: [sdc] tag#26 CDB: Write(10) 2a 00 00 07 a8 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352603] blk_update_request: I/O error, dev sdc, sector 4014800 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352667] sd 2:0:0:0: [sdc] tag#36 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352669] sd 2:0:0:0: [sdc] tag#36 CDB: Write(10) 2a 00 00 07 9b 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352671] blk_update_request: I/O error, dev sdc, sector 3988176 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352736] sd 2:0:0:0: [sdc] tag#37 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352738] sd 2:0:0:0: [sdc] tag#37 CDB: Write(10) 2a 00 00 07 9c 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352740] blk_update_request: I/O error, dev sdc, sector 3990224 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352804] sd 2:0:0:0: [sdc] tag#38 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352806] sd 2:0:0:0: [sdc] tag#38 CDB: Write(10) 2a 00 00 07 a0 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352807] blk_update_request: I/O error, dev sdc, sector 3998416 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352871] sd 2:0:0:0: [sdc] tag#39 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352873] sd 2:0:0:0: [sdc] tag#39 CDB: Write(10) 2a 00 00 07 a1 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352875] blk_update_request: I/O error, dev sdc, sector 4000464 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:49 k3s01 kernel: [343069.352939] sd 2:0:0:0: [sdc] tag#28 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
Jul 11 09:00:49 k3s01 kernel: [343069.352940] sd 2:0:0:0: [sdc] tag#28 CDB: Write(10) 2a 00 00 07 a9 5a 00 01 00 00
Jul 11 09:00:49 k3s01 kernel: [343069.352942] blk_update_request: I/O error, dev sdc, sector 4016848 op 0x1:(WRITE) flags 0x4800 phys_seg 256 prio class 0
Jul 11 09:00:50 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 11 09:00:51 k3s01 iscsid: connection1:0 is operational after recovery (1 attempts)
Jul 11 09:00:51 k3s01 kernel: [343071.365756] sd 2:0:0:0: Power-on or device reset occurred
Jul 11 09:00:52 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts)
Jul 11 09:00:56 k3s01 kernel: [343076.423472]  connection2:0: detected conn error (1020)
Jul 11 09:00:57 k3s01 iscsid: Kernel reported iSCSI connection 2:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 11 09:00:58 k3s01 kernel: [343078.444692] sd 2:0:0:0: Power-on or device reset occurred
Jul 11 09:00:58 k3s01 kernel: [343078.555498] XFS (sdc): writeback error on sector 4055304
Jul 11 09:00:59 k3s01 iscsid: connection2:0 is operational after recovery (1 attempts)
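
For anyone who wants to check how regularly the connection drops, here is a minimal Python sketch that pulls the kernel timestamps out of the "detected conn error" lines and prints the gap between consecutive drops per connection. The function name `drop_intervals` and the `SAMPLE` list are illustrative only (not part of any tool); the sample lines are copied from the log above.

```python
import re

# Captures the kernel's uptime timestamp, the connection id, and the error
# code. search() is used so stray leading characters do not matter.
CONN_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+connection(?P<conn>\d+:\d+): "
    r"detected conn error \((?P<code>\d+)\)"
)


def drop_intervals(lines):
    """Return {connection id: [seconds between consecutive drops]}."""
    last_seen = {}
    gaps = {}
    for line in lines:
        match = CONN_ERROR.search(line)
        if match is None:
            continue
        conn = match.group("conn")
        ts = float(match.group("ts"))
        if conn in last_seen:
            gaps.setdefault(conn, []).append(round(ts - last_seen[conn], 2))
        last_seen[conn] = ts
    return gaps


# Four of the kernel lines from the log above.
SAMPLE = [
    "Jul 11 09:00:11 k3s01 kernel: [343031.095444]  connection2:0: detected conn error (1020)",
    "Jul 11 09:00:18 k3s01 kernel: [343038.136438]  connection2:0: detected conn error (1020)",
    "Jul 11 09:00:25 k3s01 kernel: [343045.139682]  connection2:0: detected conn error (1020)",
    "Jul 11 09:00:33 k3s01 kernel: [343053.141647]  connection2:0: detected conn error (1020)",
]

if __name__ == "__main__":
    for conn, conn_gaps in drop_intervals(SAMPLE).items():
        print(f"connection{conn}: seconds between drops = {conn_gaps}")
```

On these sample lines the gaps come out at roughly 7 to 8 seconds, i.e. the session recovers and then drops again almost immediately rather than failing once.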

Looking at the Prometheus container logs, it looks like this caused a container restart, and it’s been fine since. The previous container log ended with:

ts=2022-07-10T03:00:24.393Z caller=head.go:1009 level=info component=tsdb msg="WAL checkpoint complete" first=1751 last=1756 duration=5.742387925s
panic: sync /prometheus/chunks_head/000548: input/output error

goroutine 13676 [running]:
github.com/prometheus/prometheus/tsdb.handleChunkWriteError({0x38910c0?, 0xc0b18f5ef0?})
	/app/tsdb/head_append.go:598 +0x76
github.com/prometheus/prometheus/tsdb/chunks.(*ChunkDiskMapper).WriteChunk(0xc000be2000, 0x40d9a7?, 0x28?, 0x2cedbc0?, {0x38b3380, 0xc09fa94380}, 0x3489870)
	/app/tsdb/chunks/head_chunks.go:392 +0x151
github.com/prometheus/prometheus/tsdb.(*memSeries).mmapCurrentHeadChunk(0xc016f242a0, 0x60bce7d4?)
	/app/tsdb/head_append.go:587 +0x53
github.com/prometheus/prometheus/tsdb.(*memSeries).cutNewHeadChunk(0xc016f242a0, 0x181e67a9b43, 0x4070400000000000?)
	/app/tsdb/head_append.go:561 +0x2a
github.com/prometheus/prometheus/tsdb.(*memSeries).append(0xc016f242a0, 0x181e67a9b43, 0x4070400000000000, 0x2074f8, 0x2385565?)
	/app/tsdb/head_append.go:528 +0x1a5
github.com/prometheus/prometheus/tsdb.(*headAppender).Commit(0xc04e9ef540)
	/app/tsdb/head_append.go:459 +0x612
github.com/prometheus/prometheus/tsdb.dbAppender.Commit({{0x38acd48?, 0xc04e9ef540?}, 0xc000b18820?})
	/app/tsdb/db.go:870 +0x35
github.com/prometheus/prometheus/storage.(*fanoutAppender).Commit(0xc050290300)
	/app/storage/fanout.go:176 +0x3f
github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport.func1()
	/app/scrape/scrape.go:1271 +0x45
github.com/prometheus/prometheus/scrape.(*scrapeLoop).scrapeAndReport(0xc09420ed20, {0xeda5c53d3?, 0x4faeaa0?, 0x4faeaa0?}, {0xeda5c53d3?, 0x4faeaa0?, 0x4faeaa0?}, 0x0)
	/app/scrape/scrape.go:1342 +0xf70
github.com/prometheus/prometheus/scrape.(*scrapeLoop).run(0xc09420ed20, 0xc09418f830?)
	/app/scrape/scrape.go:1224 +0x345
created by github.com/prometheus/prometheus/scrape.(*scrapePool).sync
	/app/scrape/scrape.go:588 +0xa1e

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

reefland commented, Jul 11, 2022

Not a single RST message has been logged since I disabled that hourly snapshot. I’ll keep monitoring it. I don’t really need hourly snapshots, so I’ll leave it disabled for now and keep the nightly one. If I do re-enable it, I’ll schedule it so it doesn’t run exactly on the hour.

Thanks for the comments, it’s appreciated.

reefland commented, Jul 11, 2022

OK, makes sense… I’ll close the issue. I did see some log entries on the TrueNAS side that agree with there being connectivity issues. Weird, since they are both on the same switch.

Jul 11 09:00:11 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jul 11 09:00:11 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): waiting for CTL to terminate 15 tasks
Jul 11 09:00:11 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): tasks terminated
Jul 11 09:00:18 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jul 11 09:00:18 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): waiting for CTL to terminate 17 tasks
Jul 11 09:00:18 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): tasks terminated
Jul 11 09:00:19 truenas kernel: Limiting open port RST response from 314 to 200 packets/sec
Jul 11 09:00:25 truenas kernel[2136]: Last message 'Limiting open port R' repeated 1 times, suppressed by syslog-ng on truenas
Jul 11 09:00:25 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jul 11 09:00:25 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): waiting for CTL to terminate 19 tasks
Jul 11 09:00:25 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): tasks terminated
Jul 11 09:00:27 truenas kernel: Limiting open port RST response from 333 to 200 packets/sec
Jul 11 09:00:33 truenas kernel[2136]: Last message 'Limiting open port R' repeated 1 times, suppressed by syslog-ng on truenas
Jul 11 09:00:33 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jul 11 09:00:33 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): waiting for CTL to terminate 21 tasks
Jul 11 09:00:33 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): tasks terminated
Jul 11 09:00:36 truenas kernel: Limiting open port RST response from 240 to 200 packets/sec
Jul 11 09:00:42 truenas kernel[2136]: Last message 'Limiting open port R' repeated 1 times, suppressed by syslog-ng on truenas
Jul 11 09:00:42 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): no ping reply (NOP-Out) after 5 seconds; dropping connection
Jul 11 09:00:42 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): waiting for CTL to terminate 22 tasks
Jul 11 09:00:42 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): tasks terminated
Jul 11 09:00:44 truenas kernel: Limiting open port RST response from 279 to 200 packets/sec
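
To line the TrueNAS side up against the node side, a small Python sketch like the following can pair each target-side "dropping connection" entry with the nearest initiator-side "detected conn error" entry. This is only a sketch under stated assumptions: the helper names `event_times` and `match_events` are made up for illustration, the year is assumed to be 2022 (syslog lines omit it), and both hosts are assumed to have reasonably synchronized clocks.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Jul 11 09:00:11".
SYSLOG_TS = re.compile(r"([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})")


def event_times(lines, marker, year=2022):
    """Collect timestamps of lines containing `marker` (year is assumed)."""
    times = []
    for line in lines:
        if marker not in line:
            continue
        match = SYSLOG_TS.search(line)
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def match_events(target_times, initiator_times, tolerance_s=2):
    """Pair each target-side drop with the nearest initiator-side error."""
    pairs = []
    for t in target_times:
        nearest = min(
            initiator_times,
            key=lambda i: abs((i - t).total_seconds()),
            default=None,
        )
        if nearest is not None and abs((nearest - t).total_seconds()) <= tolerance_s:
            pairs.append((t, nearest))
    return pairs


# Lines copied from the TrueNAS log above and the k3s01 kernel log.
TARGET_LOG = [
    "Jul 11 09:00:11 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): no ping reply (NOP-Out) after 5 seconds; dropping connection",
    "Jul 11 09:00:18 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): no ping reply (NOP-Out) after 5 seconds; dropping connection",
    "Jul 11 09:00:25 truenas WARNING: 192.168.10.215 (iqn.1993-08.org.debian:01:bddd1e9c267): no ping reply (NOP-Out) after 5 seconds; dropping connection",
]
INITIATOR_LOG = [
    "Jul 11 09:00:11 k3s01 kernel: [343031.095444]  connection2:0: detected conn error (1020)",
    "Jul 11 09:00:18 k3s01 kernel: [343038.136438]  connection2:0: detected conn error (1020)",
    "Jul 11 09:00:25 k3s01 kernel: [343045.139682]  connection2:0: detected conn error (1020)",
]

if __name__ == "__main__":
    drops = event_times(TARGET_LOG, "dropping connection")
    errors = event_times(INITIATOR_LOG, "detected conn error")
    for target_ts, initiator_ts in match_events(drops, errors):
        print(f"target drop {target_ts:%H:%M:%S} <-> initiator error {initiator_ts:%H:%M:%S}")
```

In the logs quoted here the two sides line up to the second, which fits the target closing the TCP connection after its 5-second NOP-Out wait and the initiator then reporting `ISCSI_ERR_TCP_CONN_CLOSE`.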
