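Defect #13005
Pods restart repeatedly
Added by 이 헌제 about 2 months ago. Updated about 2 months ago.
Found in version:
Fixed in version:
Difficulty: Easy
Importance: Low
Discovery type:
Assistant:
Company:
Contact:
Score: 1.25
Description
Overview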
The Probe call fails, which causes the pod to restart repeatedly. This needs to be fixed.
[root@localhost gluesys-csi-v1.0.0]# kubectl get pods -n storage-1
NAME                                      READY   STATUS    RESTARTS         AGE
gluesys-csi-controller-57c7d4bb95-lwjrk   1/1     Running   0                16h
gluesys-csi-dhm44                         7/7     Running   22 (4h15m ago)   16h
Probe simply dials vip:port with a 3-second timeout, and the dial fails because it times out.
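A minimal sketch of the kind of check described above, assuming the probe does nothing more than open a TCP connection and close it. The names `checkTCP` and `probeTimeout` are hypothetical and are not taken from the driver source; the address is the VIP that appears in the driver log.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probeTimeout is the 3-second limit mentioned in the report (assumed value).
const probeTimeout = 3 * time.Second

// checkTCP reports whether a TCP connection to addr can be opened within
// timeout. It only tests reachability; no payload is sent.
func checkTCP(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		// fmt.Errorf supports %w, so the original dial error stays inspectable.
		return fmt.Errorf("storage backend unreachable at %s: %w", addr, err)
	}
	return conn.Close()
}

func main() {
	// If the VIP is stopped on the storage side, this dial times out.
	if err := checkTCP("192.168.39.170:80", probeTimeout); err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("probe ok")
}
```

With a check like this, the probe result depends entirely on whether the VIP is up, so any interval in which the VIP is stopped shows up as a Probe failure.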
Another error that appears is a delete failure:
{"time":"2026-02-24T01:16:23.971821416Z","level":"ERROR","msg":"error during unary call","node":"localhost.localdomain","method":"/csi.v1.Controller/DeleteVolume","error":"rpc error: code = Aborted desc = Deleting LogicalVolume CRD: pvc-b312f317-93a4-4ea7-a187-78f9d3329e85"}
Could this be the cause?
Updated by 이 헌제 about 2 months ago
- Score changed (0.00 => 2.75)
- driver log
{"time":"2026-02-24T08:39:37.375048679Z","level":"INFO","msg":"Start to check TCP health","node":"localhost.localdomain"}
{"time":"2026-02-24T08:39:38.376635436Z","level":"ERROR","msg":"Storage TCP health check failed","node":"localhost.localdomain","address":"192.168.39.170:80","err":"dial tcp 192.168.39.170:80: i/o timeout"}
{"time":"2026-02-24T08:39:38.376709518Z","level":"ERROR","msg":"error during unary call","node":"localhost.localdomain","method":"/csi.v1.Identity/Probe","error":"rpc error: code = Unavailable desc = Storage backend unreachable: %!w(*net.OpError=&{dial tcp <nil> 0xc000027980 0x29a1ec0})"}
{"time":"2026-02-24T08:39:39.380506506Z","level":"INFO","msg":"Start to check TCP health","node":"localhost.localdomain"}
{"time":"2026-02-24T08:39:40.375807027Z","level":"ERROR","msg":"Storage TCP health check failed","node":"localhost.localdomain","address":"192.168.39.170:80","err":"dial tcp 192.168.39.170:80: i/o timeout"}
{"time":"2026-02-24T08:39:40.375973438Z","level":"ERROR","msg":"error during unary call","node":"localhost.localdomain","method":"/csi.v1.Identity/Probe","error":"rpc error: code = Unavailable desc = Storage backend unreachable: %!w(*net.OpError=&{dial tcp <nil> 0xc00071ad50 0x29a1ec0})"}
{"time":"2026-02-24T08:39:41.377937503Z","level":"INFO","msg":"Start to check TCP health","node":"localhost.localdomain"}
{"time":"2026-02-24T08:39:42.381448707Z","level":"ERROR","msg":"Storage TCP health check failed","node":"localhost.localdomain","address":"192.168.39.170:80","err":"dial tcp 192.168.39.170:80: i/o timeout"}
{"time":"2026-02-24T08:39:42.381552952Z","level":"ERROR","msg":"error during unary call","node":"localhost.localdomain","method":"/csi.v1.Identity/Probe","error":"rpc error: code = Unavailable desc = Storage backend unreachable: %!w(*net.OpError=&{dial tcp <nil> 0xc000710210 0x29a1ec0})"}
{"time":"2026-02-24T08:39:43.377079913Z","level":"INFO","msg":"Start to check TCP health","node":"localhost.localdomain"}
{"time":"2026-02-24T08:39:43.889092154Z","level":"WARN","msg":"Volume is already deleted","node":"localhost.localdomain","volume":"pvc-809c8ae2-06d8-44d6-97ba-db384a3dba78"}
{"time":"2026-02-24T08:39:43.892596107Z","level":"INFO","msg":"Request GetCapacity","node":"localhost.localdomain","type":"thin"}
{"time":"2026-02-24T08:39:43.938825709Z","level":"WARN","msg":"Volume is already deleted","node":"localhost.localdomain","volume":"pvc-6629f7fe-5123-4168-9e41-cd5551b84774"}
{"time":"2026-02-24T08:39:44.144849916Z","level":"WARN","msg":"Volume is already deleted","node":"localhost.localdomain","volume":"pvc-f79bdd98-3ce8-4ade-b11f-cc4a1ce4e5cb"}
{"time":"2026-02-24T08:39:44.379040199Z","level":"ERROR","msg":"Storage TCP health check failed","node":"localhost.localdomain","address":"192.168.39.170:80","err":"dial tcp 192.168.39.170:80: i/o timeout"}
{"time":"2026-02-24T08:39:44.379118543Z","level":"ERROR","msg":"error during unary call","node":"localhost.localdomain","method":"/csi.v1.Identity/Probe","error":"rpc error: code = Unavailable desc = Storage backend unreachable: %!w(*net.OpError=&{dial tcp <nil> 0xc000710630 0x29a1ec0})"}
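The `%!w(*net.OpError=...)` fragment in the Probe errors above is what Go's `fmt` package emits when the `%w` verb is given to a function other than `fmt.Errorf`. The sketch below reproduces the effect with the standard library only. It assumes the driver builds its gRPC status message through a Sprintf-style formatter; the driver source is not shown here, so that is a guess. The helper names (`badMessage`, `goodMessage`, `hasBadVerb`, `sampleDialError`) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// wrapFormat mirrors the message in the driver log. It is a variable rather
// than a constant so that go vet's printf check does not reject the
// deliberately wrong Sprintf call below.
var wrapFormat = "Storage backend unreachable: %w"

// sampleDialError builds an error of the same type as the one in the log.
func sampleDialError() error {
	return &net.OpError{Op: "dial", Net: "tcp", Err: errors.New("i/o timeout")}
}

// badMessage formats with Sprintf, which does not support %w.
func badMessage(err error) string { return fmt.Sprintf(wrapFormat, err) }

// goodMessage uses %v, which prints err.Error() in any fmt function.
func goodMessage(err error) string {
	return fmt.Sprintf("Storage backend unreachable: %v", err)
}

// hasBadVerb reports whether s contains fmt's bad-verb marker for %w.
func hasBadVerb(s string) bool { return strings.Contains(s, "%!w(") }

func main() {
	err := sampleDialError()
	fmt.Println(badMessage(err))  // contains the "%!w(" marker
	fmt.Println(goodMessage(err)) // readable message

	// fmt.Errorf is the fmt function that accepts %w; it keeps the error chain.
	wrapped := fmt.Errorf(wrapFormat, err)
	fmt.Println(errors.Is(wrapped, err))
}
```

This is a logging defect only and does not explain the timeout itself, but switching the verb to `%v` (or wrapping with `fmt.Errorf` before building the status) would make the Probe error readable.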
- controller log
{"time":"2026-02-24T08:36:43.281957759Z","level":"ERROR","msg":"Already contains finalizer","component":"LogicalVolumeReconciler","volumeID":""}
{"time":"2026-02-24T08:36:43.282004253Z","level":"INFO","msg":"Start to delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-4e44b540-fa5f-4073-9bcb-355f95e8b7bf"}
{"time":"2026-02-24T08:36:50.05246034Z","level":"INFO","msg":"Deleted Share","component":"LogicalVolumeReconciler","name":"pvc-4e44b540-fa5f-4073-9bcb-355f95e8b7bf"}
{"time":"2026-02-24T08:37:17.090216591Z","level":"INFO","msg":"Deleted LVM resource by template","component":"LogicalVolumeReconciler","name":"pvc-4e44b540-fa5f-4073-9bcb-355f95e8b7bf"}
{"time":"2026-02-24T08:37:28.881837717Z","level":"INFO","msg":"Deleted LV","component":"LogicalVolumeReconciler","name":"pvc-4e44b540-fa5f-4073-9bcb-355f95e8b7bf"}
{"time":"2026-02-24T08:37:28.88196448Z","level":"INFO","msg":"Successfully delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-4e44b540-fa5f-4073-9bcb-355f95e8b7bf"}
{"time":"2026-02-24T08:37:28.897622425Z","level":"ERROR","msg":"Already contains finalizer","component":"LogicalVolumeReconciler","volumeID":""}
{"time":"2026-02-24T08:37:28.897657004Z","level":"INFO","msg":"Start to delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-809c8ae2-06d8-44d6-97ba-db384a3dba78"}
{"time":"2026-02-24T08:37:35.145565892Z","level":"INFO","msg":"Deleted Share","component":"LogicalVolumeReconciler","name":"pvc-809c8ae2-06d8-44d6-97ba-db384a3dba78"}
{"time":"2026-02-24T08:37:58.461266426Z","level":"INFO","msg":"Deleted LVM resource by template","component":"LogicalVolumeReconciler","name":"pvc-809c8ae2-06d8-44d6-97ba-db384a3dba78"}
{"time":"2026-02-24T08:38:09.120999008Z","level":"INFO","msg":"Deleted LV","component":"LogicalVolumeReconciler","name":"pvc-809c8ae2-06d8-44d6-97ba-db384a3dba78"}
{"time":"2026-02-24T08:38:09.121044674Z","level":"INFO","msg":"Successfully delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-809c8ae2-06d8-44d6-97ba-db384a3dba78"}
{"time":"2026-02-24T08:38:09.133255852Z","level":"ERROR","msg":"Already contains finalizer","component":"LogicalVolumeReconciler","volumeID":""}
{"time":"2026-02-24T08:38:09.133433853Z","level":"INFO","msg":"Start to delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-6629f7fe-5123-4168-9e41-cd5551b84774"}
{"time":"2026-02-24T08:38:18.446242448Z","level":"INFO","msg":"Deleted Share","component":"LogicalVolumeReconciler","name":"pvc-6629f7fe-5123-4168-9e41-cd5551b84774"}
{"time":"2026-02-24T08:38:39.489277015Z","level":"INFO","msg":"Deleted LVM resource by template","component":"LogicalVolumeReconciler","name":"pvc-6629f7fe-5123-4168-9e41-cd5551b84774"}
{"time":"2026-02-24T08:38:48.886926308Z","level":"INFO","msg":"Deleted LV","component":"LogicalVolumeReconciler","name":"pvc-6629f7fe-5123-4168-9e41-cd5551b84774"}
{"time":"2026-02-24T08:38:48.886999147Z","level":"INFO","msg":"Successfully delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-6629f7fe-5123-4168-9e41-cd5551b84774"}
{"time":"2026-02-24T08:38:48.902190697Z","level":"ERROR","msg":"Already contains finalizer","component":"LogicalVolumeReconciler","volumeID":""}
{"time":"2026-02-24T08:38:48.90222863Z","level":"INFO","msg":"Start to delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-f79bdd98-3ce8-4ade-b11f-cc4a1ce4e5cb"}
{"time":"2026-02-24T08:38:54.095111147Z","level":"INFO","msg":"Deleted Share","component":"LogicalVolumeReconciler","name":"pvc-f79bdd98-3ce8-4ade-b11f-cc4a1ce4e5cb"}
{"time":"2026-02-24T08:39:14.572487413Z","level":"INFO","msg":"Deleted LVM resource by template","component":"LogicalVolumeReconciler","name":"pvc-f79bdd98-3ce8-4ade-b11f-cc4a1ce4e5cb"}
{"time":"2026-02-24T08:39:25.10810683Z","level":"INFO","msg":"Deleted LV","component":"LogicalVolumeReconciler","name":"pvc-f79bdd98-3ce8-4ade-b11f-cc4a1ce4e5cb"}
{"time":"2026-02-24T08:39:25.108147166Z","level":"INFO","msg":"Successfully delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-f79bdd98-3ce8-4ade-b11f-cc4a1ce4e5cb"}
{"time":"2026-02-24T08:39:25.121220614Z","level":"ERROR","msg":"Already contains finalizer","component":"LogicalVolumeReconciler","volumeID":""}
{"time":"2026-02-24T08:39:25.121259074Z","level":"INFO","msg":"Start to delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-1f0616e6-cbc3-4cb0-a414-53a46c58835c"}
{"time":"2026-02-24T08:39:30.152526687Z","level":"INFO","msg":"Deleted Share","component":"LogicalVolumeReconciler","name":"pvc-1f0616e6-cbc3-4cb0-a414-53a46c58835c"}
{"time":"2026-02-24T08:40:07.398247797Z","level":"INFO","msg":"Deleted LVM resource by template","component":"LogicalVolumeReconciler","name":"pvc-1f0616e6-cbc3-4cb0-a414-53a46c58835c"}
{"time":"2026-02-24T08:40:16.223360332Z","level":"INFO","msg":"Deleted LV","component":"LogicalVolumeReconciler","name":"pvc-1f0616e6-cbc3-4cb0-a414-53a46c58835c"}
{"time":"2026-02-24T08:40:16.223401845Z","level":"INFO","msg":"Successfully delete volume","component":"LogicalVolumeReconciler","volumeID":"","LV":"pvc-1f0616e6-cbc3-4cb0-a414-53a46c58835c"}
- The restart behavior was reproduced.
Updated by 이 헌제 about 2 months ago
Resource Group: VIP-group-1
vip_192.168.39.170 (ocf::heartbeat:IPaddr2): Stopped
rsc_VG1 (ocf::heartbeat:LVM): Started ASE333-1
rsc_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a (ocf::heartbeat:Filesystem): Started ASE333-1
share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a (ocf::anystor-e:ShareCtl): Stopped (disabled)
When the last remaining ShareCtl resource is disabled, the vip is sometimes stopped as well.
Updated by 이 헌제 about 2 months ago
Related logs (chronological order)
Feb 25 13:24:49 ASE333-1 pacemaker-schedulerd[2753] (common_print) info: share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a (ocf::anystor-e:ShareCtl): Started ASE333-1
Feb 25 13:24:58 ASE333-1 pacemaker-based [2746] (cib_perform_op) info: ++ /cib/configuration/resources/primitive[@id='share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a']/meta_attributes[@id='share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a-meta_attributes']: <nvpair id="share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a-meta_attributes-target-role" name="target-role" value="Stopped"/>
Feb 25 13:24:58 ASE333-1 pacemaker-controld [2754] (abort_transition_graph) info: Transition 4561 aborted by share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a-meta_attributes-target-role doing create target-role=Stopped: Configuration change | cib=0.2864.0
Feb 25 13:24:58 ASE333-1 pacemaker-schedulerd[2753] (common_print) info: share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a (ocf::anystor-e:ShareCtl): Started ASE333-1 (disabled)
Feb 25 13:24:58 ASE333-1 pacemaker-schedulerd[2753] (native_color) info: Resource share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a cannot run anywhere
Feb 25 13:24:58 ASE333-1 pacemaker-schedulerd[2753] (LogAction) notice: * Stop vip_192.168.39.170 ( ASE333-1 ) due to unrunnable one-or-more:order_set_vip_192.168.39.170_set
Feb 25 13:24:58 ASE333-1 pacemaker-controld [2754] (te_rsc_command) notice: Initiating stop operation vip_192.168.39.170_stop_0 locally on ASE333-1 | action 45
Feb 25 13:24:58 ASE333-1 pacemaker-execd [2751] (log_execute) info: executing - rsc:vip_192.168.39.170 action:stop call_id:9826
Feb 25 13:24:58 ASE333-1 pacemaker-execd [2751] (log_finished) info: finished - rsc:vip_192.168.39.170 action:stop call_id=9826 pid:26821 exit-code:0 exec-time:51ms queue-time:0ms
Feb 25 13:24:58 ASE333-1 pacemaker-controld [2754] (process_lrm_event) notice: Result of stop operation for vip_192.168.39.170 on ASE333-1: 0 (ok) | call=9826 key=vip_192.168.39.170_stop_0 confirmed=true cib-update=19271
Cause summary
- The target-role of the share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a resource is set to Stopped.
- That share resource can no longer run ("cannot run anywhere").
- Because of the order_set_vip_192.168.39.170_set constraint, the vip is stopped along with it.
- The vip stop operation succeeds (exit-code:0).
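The chain above can be illustrated with a toy model of the "one-or-more" ordering named in the scheduler log. This is a simplified sketch inferred from that log line, not Pacemaker's actual scheduling logic: it assumes the vip is allowed to run only while at least one share in the ordered set is running. The function name `vipRunnable` is hypothetical.

```go
package main

import "fmt"

// vipRunnable models a "one-or-more" ordered set: the dependent resource
// (the vip) may run only while at least one member of the set is running.
// The map value is true when the share is running.
func vipRunnable(shares map[string]bool) bool {
	for _, running := range shares {
		if running {
			return true
		}
	}
	return false
}

func main() {
	const share = "share_VG1_pvc-a42b00c1-fe5c-42e3-93aa-3690f976512a"

	shares := map[string]bool{share: true}
	fmt.Println("vip runnable with one share started:", vipRunnable(shares))

	// Disabling the last remaining share (target-role=Stopped) leaves the
	// set with no running member, so the vip becomes unrunnable.
	shares[share] = false
	fmt.Println("vip runnable after last share disabled:", vipRunnable(shares))
}
```

Under this model the vip stays up as long as any other share is still started, which matches the observation that the stop happens only when the last ShareCtl is disabled.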