Discovery has not completed after a few weeks
If your issue relates to the Discovery Process, please first follow the steps described in the implementation guide: Debugging the Discovery Component.
Describe the bug
The discovery process has not completed after a few weeks of being deployed.
We deployed this into our production account and enabled AWS Config at the same time (we had previously disabled it for cost reasons). We are attempting to discover the resources in a single region (not us-east-1) in the same account where Perspective is deployed.
AWS Config is enabled only in the region we are trying to discover. We have not created an aggregator because we are only discovering a single account for now. The account shows 14,286 resources, but Perspective shows only 991 (roughly 7%). The count is increasing slowly, by a few resources per day, but the vast majority of our systems have not been mapped.
I have walked through the steps for debugging the discovery process. The CloudWatch monitoring of the GremlinFunction shows no errors. I tried searching the logs for “400” and “500”, but our account ID contains both of those strings, so every log line matched. Searching for “Exception” turned up a few errors:
{
"detailedMessage": "The traversal has tried to use a null or non-existent value in the step: [GraphStep(vertex,[46338a5d8e02c6828364c145c3ce003c])]",
"code": "IllegalArgumentException",
"requestId": "eb848188-2216-4709-8792-fdc82feaa988"
}
{
"detailedMessage": "Vertex with id already exists: ",
"code": "ConstraintViolationException",
"requestId": "aa027916-a3c2-45f8-9bf7-5773f3a1fc48"
}
In the past week we have seen 6 of the first error and 5 of the second, with no exceptions in the last 24 hours.
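Because a 12-digit account ID can contain “400” or “500” as substrings, a plain substring search matches nearly every line. A minimal sketch of a workaround, assuming the log lines have been exported locally and that each error is written as a single-line JSON object (they appear pretty-printed above): match the status codes only as whole words, and tally the `code` field of the JSON entries. The helper names `hasStatusCode` and `tallyExceptionCodes` are hypothetical and not part of Perspective.

```javascript
// Hypothetical helpers for scanning exported log lines.

// Match 400 or 500 only as whole words. \b rejects matches inside a longer
// run of word characters, such as a 12-digit account ID or a hex request ID.
const STATUS_RE = /\b(400|500)\b/;

function hasStatusCode(line) {
  return STATUS_RE.test(line);
}

// Count log entries by their "code" field (e.g. "IllegalArgumentException").
// Assumes one JSON object per line; anything else is skipped.
function tallyExceptionCodes(lines) {
  const counts = new Map();
  for (const line of lines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (err) {
      continue; // not a JSON log line
    }
    if (entry && typeof entry.code === 'string') {
      counts.set(entry.code, (counts.get(entry.code) || 0) + 1);
    }
  }
  return counts;
}
```

For example, `hasStatusCode('account 123400567890')` is false, while `hasStatusCode('returned 500')` is true.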
I also looked at the ECS container task logs and can see a few errors from the past week as well:
{
"message": "Error Message: You have specified a resource that is either unknown or has not been discovered.",
"level": "error"
}
{
"message": "CanImportRun Error:",
"level": "error"
}
There are also out-of-memory (OOM) messages from V8:
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0x13162b9]
Security context: 0x0367c4840911 <JSObject>
1: copy [0x1294d9f4fe21] [/code/node_modules/ramda/src/internal/_clone.js:~21] [pc=0x1ea30af92bb1](this=0x31c303c84d49 <JSGlobal Object>,0x1294d9f4fe61 <JSArray[0]>)
2: copy [0x20ee9326fba1] [/code/node_modules/ramda/src/internal/_clone.js:~21] [pc=0x1ea30af91da7](this=0x31c303c84d49 <JSGlobal Object>,0x20ee9326fbe1 <JSArray[34]>)
3: copy [0x20ee9326f...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
Failed to open Node.js report file: report.20210722.083713.1.0.001.json (errno: 13)
1: 0x9aedf0 node::Abort() [node]
2: 0x9aff86 node::OnFatalError(char const*, char const*) [node]
3: 0xb078ce v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xb07c49 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xce4ae5 [node]
6: 0xcf032b v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
7: 0xcf1047 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
8: 0xcf3b78 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
9: 0xcbd487 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType) [node]
10: 0xf94048 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x13162b9 [node]
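The trace shows the container's Node.js process exhausting its heap while ramda's internal `_clone` recursively copies arrays. As a diagnostic sketch (not a fix), Node's built-in `v8.getHeapStatistics()` reports the current heap usage and the configured limit; logging it periodically would show how close the discovery task gets to that limit. The limit is controlled by V8's `--max-old-space-size` flag (settable through `NODE_OPTIONS`), but whether raising it is appropriate depends on the memory allocated to the ECS task, which this issue does not state. The function name `heapUsageReport` is hypothetical.

```javascript
// Diagnostic sketch: how close is this Node.js process to its V8 heap limit?
const v8 = require('v8');

function heapUsageReport() {
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
  const toMiB = (bytes) => bytes / (1024 * 1024);
  return {
    usedMiB: toMiB(used_heap_size),
    limitMiB: toMiB(heap_size_limit),
    fractionUsed: used_heap_size / heap_size_limit,
  };
}

const report = heapUsageReport();
console.log(
  `heap used ${report.usedMiB.toFixed(1)} MiB of ${report.limitMiB.toFixed(1)} MiB ` +
    `(${(report.fractionUsed * 100).toFixed(1)}%)`
);
```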
One thing that seemed strange was that ~50 tasks were running at a time. I’m not sure whether this is expected or whether it’s because the tasks are not completing in time.
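Whether ~50 concurrent tasks is expected can be reasoned about with Little's law: in a steady state, the average number of tasks running equals the launch rate multiplied by the average run time. A minimal sketch, assuming tasks are launched on a fixed schedule; the 15-minute interval below is purely hypothetical, since the actual schedule is not given in this issue, and the function names are illustrative.

```javascript
// Little's law (steady state): L = λ × W
//   λ = launches per minute = 1 / launchIntervalMinutes
//   W = average run time per task, in minutes
// so L = avgRuntimeMinutes / launchIntervalMinutes.
function expectedConcurrentTasks(launchIntervalMinutes, avgRuntimeMinutes) {
  return avgRuntimeMinutes / launchIntervalMinutes;
}

// Inverse: the average run time implied by an observed number of concurrent tasks.
function impliedAvgRuntimeMinutes(observedConcurrent, launchIntervalMinutes) {
  return observedConcurrent * launchIntervalMinutes;
}

// Hypothetical schedule: one task launched every 15 minutes.
const INTERVAL_MINUTES = 15;
console.log(expectedConcurrentTasks(INTERVAL_MINUTES, 15)); // 1
console.log(impliedAvgRuntimeMinutes(50, INTERVAL_MINUTES)); // 750
```

Under that hypothetical schedule, sustaining ~50 concurrent tasks would require each task to run for about 750 minutes (12.5 hours) on average. Tasks that never finish would keep accumulating, which would point toward tasks not completing in time rather than normal behavior.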
Issue Analytics
- Created 2 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Great, thanks! I look forward to the next release.
Here are some logs from the task.