[Bug] Resource cleanup on deleting automated rule is broken


Current Behavior:

Deleting an automated rule with cleanup enabled does not actually prune the resources that the rule created (i.e. the active recording is not deleted).

Expected Behavior:

The active recording that the automated rule created on the target should be deleted when the rule is deleted.

Screenshots

[Screenshot: bug-on-resource-deletion]

Steps To Reproduce:

  1. Start smoketest on main.
  2. Create an automated rule that matches a target (for example, es.andrewazor.demo.Main 9093):
      {
        "name": "testing_9093",
        "description": "",
        "matchExpression": "target.annotations.cryostat[PORT] == 9093",
        "eventSpecifier": "template=Profiling,type=TARGET",
        "archivalPeriodSeconds": 0,
        "initialDelaySeconds": 0,
        "preservedArchives": 0,
        "maxAgeSeconds": 0,
        "maxSizeBytes": 0,
        "enabled": true
      }
    
  3. Delete the rule with Clean checked.
  4. Go to the Recording tab.
  5. Notice that the active recording created by this rule is still there, even after reloading.

Anything else:

Seeing this in logs when deleting a rule:

INFO: 10.0.2.100 - - [Thu, 15 Dec 2022 01:34:39 GMT] 1ms "DELETE /api/v2/rules/testing_9093?clean=true HTTP/1.1" 200 73 bytes "http://localhost:9000/" "Mozilla/5.0 (X11; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0"
Hibernate: 
    /* select
        generatedAlias0 
    from
        PluginInfo as generatedAlias0 */ select
            plugininfo0_.id as id1_0_,
            plugininfo0_.callback as callback2_0_,
            plugininfo0_.realm as realm3_0_,
            plugininfo0_.subtree as subtree4_0_ 
        from
            PluginInfo plugininfo0_
Hibernate: 
    /* select
        generatedAlias0 
    from
        PluginInfo as generatedAlias0 */ select
            plugininfo0_.id as id1_0_,
            plugininfo0_.callback as callback2_0_,
            plugininfo0_.realm as realm3_0_,
            plugininfo0_.subtree as subtree4_0_ 
        from
            PluginInfo plugininfo0_
Dec 15, 2022 1:34:39 AM io.cryostat.core.log.Logger error
SEVERE: Exception thrown
java.lang.NullPointerException
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1693)
	at io.cryostat.net.TargetConnectionManager.executeConnectedTask(TargetConnectionManager.java:161)
	at io.cryostat.recordings.RecordingTargetHelper.stopRecording(RecordingTargetHelper.java:295)
	at io.cryostat.recordings.RecordingTargetHelper.stopRecording(RecordingTargetHelper.java:288)
	at io.cryostat.net.web.http.api.v2.RuleDeleteHandler.lambda$1(RuleDeleteHandler.java:170)
	at io.vertx.core.impl.ContextBase.lambda$null$0(ContextBase.java:137)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:264)
	at io.vertx.core.impl.ContextBase.lambda$executeBlocking$1(ContextBase.java:135)
	at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:833)
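
The top frame of this trace is consistent with a null key reaching ConcurrentHashMap.computeIfAbsent, which throws a NullPointerException for a null key (or a null mapping function). The trace alone does not show which argument was null, so the following is only a minimal sketch of that failure mode. The class name NullKeyDemo, the map contents, and the example key are illustrative and not taken from Cryostat:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NullKeyDemo {

    // Returns true if computeIfAbsent rejects the given key with an NPE.
    static boolean throwsNpe(String key) {
        Map<String, String> connections = new ConcurrentHashMap<>();
        try {
            // ConcurrentHashMap does not permit null keys, unlike HashMap.
            connections.computeIfAbsent(key, k -> "connection-for-" + k);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe("localhost:9093")); // prints "false"
        System.out.println(throwsNpe(null));             // prints "true"
    }
}
```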

Issue Analytics

  • State: closed
  • Created 9 months ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
andrewazores commented, Dec 15, 2022

Re. the NPE in the logs: it stems from this code in the RuleDeleteHandler:

    private void cleanup(RequestParameters params, Rule rule) {
        for (ServiceRef ref : storage.listDiscoverableServices()) {
            vertx.executeBlocking(
                    promise -> {
                        try {
                            if (ruleRegistry.applies(rule, ref)) {
                                ConnectionDescriptor cd = getConnectionDescriptorFromParams(params);
                                recordings.stopRecording(cd, rule.getRecordingName());
                            }
                            promise.complete();
                        } catch (Exception e) {
                            logger.error(e);
                            promise.fail(e);
                        }
                    });
        }
    }

getConnectionDescriptorFromParams assumes that there is a :targetId path parameter, but there is not for this handler. Should be easy enough to resolve since there is a ServiceRef there that also provides the targetId, and the handler already has a handle on the CredentialsManager.
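
The suggestion above can be sketched roughly as follows: derive the connection descriptor from each matching service reference rather than from request path parameters. This is not the actual Cryostat fix. TargetRef, Descriptor, the credentialsLookup function, and the recording name are hypothetical stand-ins for ServiceRef, ConnectionDescriptor, the CredentialsManager, and rule.getRecordingName(), used so that the example is self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

class RuleCleanupSketch {

    /** Hypothetical stand-in for ServiceRef: only the connect URL matters here. */
    static final class TargetRef {
        final String serviceUri;

        TargetRef(String serviceUri) {
            this.serviceUri = serviceUri;
        }
    }

    /** Hypothetical stand-in for ConnectionDescriptor: target ID plus optional credentials. */
    static final class Descriptor {
        final String targetId;
        final Optional<String> credentials;

        Descriptor(String targetId, Optional<String> credentials) {
            this.targetId = targetId;
            this.credentials = credentials;
        }
    }

    // Builds the descriptor from the reference itself, so no :targetId path parameter is needed.
    static Descriptor descriptorFor(
            TargetRef ref, Function<TargetRef, Optional<String>> credentialsLookup) {
        return new Descriptor(ref.serviceUri, credentialsLookup.apply(ref));
    }

    // Returns a description of each recording that would be stopped.
    static List<String> cleanup(
            List<TargetRef> targets,
            Predicate<TargetRef> ruleApplies,
            Function<TargetRef, Optional<String>> credentialsLookup,
            String recordingName) {
        List<String> stopped = new ArrayList<>();
        for (TargetRef ref : targets) {
            if (!ruleApplies.test(ref)) {
                continue;
            }
            Descriptor cd = descriptorFor(ref, credentialsLookup);
            // The real handler would call recordings.stopRecording(cd, ...) at this point.
            stopped.add(recordingName + " on " + cd.targetId);
        }
        return stopped;
    }

    public static void main(String[] args) {
        List<TargetRef> targets = List.of(
                new TargetRef("service:jmx:rmi:///jndi/rmi://localhost:9093/jmxrmi"),
                new TargetRef("service:jmx:rmi:///jndi/rmi://localhost:9094/jmxrmi"));
        List<String> stopped = cleanup(
                targets,
                ref -> ref.serviceUri.contains(":9093/"), // stand-in for ruleRegistry.applies
                ref -> Optional.empty(),                  // stand-in for a credentials lookup
                "auto_testing_9093");                     // hypothetical recording name
        System.out.println(stopped);
    }
}
```

Because the target ID comes from the reference being iterated over, each matching target gets its own non-null descriptor, which avoids the null lookup described above.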

> @andrewazores This should be the expected behaviour right?

The ?clean=true query param is intended only to stop recordings that were started by the rule, not to delete them. This is because leaving the recording behind in the stopped state should have very little performance or cost impact on the target, but if the user has been interested in collecting data from that target then it’s likely better to err on the side of caution and preserve the recording data in case they want to collect it later.

1 reaction
andrewazores commented, Dec 15, 2022

The root cause of the bug that I described, getConnectionDescriptorFromParams, can be fixed by partially reverting it to the implementation from that original PR.

