storcli.py shows battery_backup_healthy when it needs attention

I have some megaraid controllers which are returning the following:

megaraid_healthy 0   <== there's a problem
megaraid_failed 0
megaraid_degraded 0
megaraid_battery_backup_healthy 1

This is odd: the controller says it needs attention, but it’s not obvious why.

On closer inspection: storcli.py returns battery_backup_healthy 1 if the controller's "BBU Status" value is 0 or 32. I’m getting 32, and the battery state is also reported as “Degraded”:

# /opt/MegaRAID/storcli/storcli64 /cALL show all J | less
...
                "Status" : {
                  ==>   "Controller Status" : "Needs Attention",
                        "Memory Correctable Errors" : 0,
                        "Memory Uncorrectable Errors" : 0,
                        "ECC Bucket Count" : 0,
                        "Any Offline VD Cache Preserved" : "No",
                  ==>   "BBU Status" : 32,
                        "PD Firmware Download in progress" : "No",
                        "Support PD Firmware Download" : "No",
                        "Lock Key Assigned" : "No",
                        "Failed to get lock key on bootup" : "No",
                        "Lock key has not been backed up" : "No",
                        "Bios was not detected during boot" : "No",
                        "Controller must be rebooted to complete security operation" : "No",
                        "A rollback operation is in progress" : "No",
                        "At least one PFK exists in NVRAM" : "No",
                        "SSC Policy is WB" : "No",
                        "Controller has booted into safe mode" : "No",
                        "Controller shutdown required" : "No"
                },
...
                "BBU_Info" : [
                        {
                                "Model" : "iBBU",
                         ==>    "State" : "Dgd (Needs Attention)",
                                "RetentionTime" : "48 hours +",
                                "Temp" : "29C",
                                "Mode" : "-",
                                "MfgDate" : "2014/02/10",
                                "Next Learn" : "2019/06/27  01:33:42"
                        }
                ]

My best guess is that the controller “Needs Attention” because of the battery status, but I can’t find documentation for what status=32 means. Can you point to some info which says that 32 is healthy?
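
The check described above can be sketched roughly as follows. This is not the actual storcli.py source: the function name and the set of accepted values are hypothetical, reconstructed only from the behaviour reported in this issue.

```python
# Hypothetical reconstruction of the reported behaviour, not the real storcli.py code.
HEALTHY_BBU_STATUSES = {0, 32}  # values this issue reports as being treated as healthy


def battery_backup_healthy(status_block):
    """Return 1 if "BBU Status" is one of the accepted values, else 0."""
    return int(status_block.get("BBU Status") in HEALTHY_BBU_STATUSES)


status = {"Controller Status": "Needs Attention", "BBU Status": 32}
print(battery_backup_healthy(status))  # 1, even though the controller needs attention
```

With this logic, a status of "NA" falls outside the accepted set and yields 0, which matches the no-BBU case described further down.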

For comparison, here’s what MegaCLI says on the same controller:

# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL

BBU status for Adapter: 0

BatteryType: iBBU
Voltage: 4014 mV
Current: 0 mA
Temperature: 29 C
Battery State: Degraded(Need Attention)
		A manual learn is required.
BBU Firmware Status:

  Charging Status              : None
  Voltage                                 : OK
  Temperature                             : OK
  Learn Cycle Requested	                  : Yes
  Learn Cycle Active                      : No
  Learn Cycle Status                      : OK
  Learn Cycle Timeout                     : No
  I2c Errors Detected                     : No
  Battery Pack Missing                    : No
  Battery Replacement required            : No
  Remaining Capacity Low                  : No
  Periodic Learn Required                 : No
  Transparent Learn                       : No
  No space to cache offload               : No
  Pack is about to fail & should be replaced : No
  Cache Offload premium feature required  : No
  Module microcode update required        : No


GasGuageStatus:
  Fully Discharged        : No
  Fully Charged           : No
  Discharging             : Yes
  Initialized             : Yes
  Remaining Time Alarm    : No
  Discharge Terminated    : No
  Over Temperature        : No
  Charging Terminated     : No
  Over Charged            : No
  Relative State of Charge: 75 %
  Charger System State: 49169
  Charger System Ctrl: 0
  Charging current: 512 mA
  Absolute state of charge: 77 %
  Max Error: 9 %

Exit Code: 0x00

Perhaps 32 means “manual learn is required”? But in that case, I’d say it’s not “healthy”, in the sense that some attention is required.
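
If "BBU Status" is a bitmask (an assumption; I have not found documentation confirming it), then 32 is a single flag at bit 5. A small helper like the one below, whose name is made up for this sketch, lists which bits are set so they can be compared against the MegaCli flags above. It does not say what any bit means.

```python
def set_bits(value):
    """Return the positions of the bits that are set in a non-negative integer."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


print(set_bits(32))  # [5]
print(set_bits(0))   # []
print(set_bits(33))  # [0, 5]
```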

On another controller, which is healthy, the BBU Status is 0. This one has Cachevault_Info rather than BBU_Info:

                "Cachevault_Info" : [
                        {
                                "Model" : "CVPM02",
                                "State" : "Optimal",
                                "Temp" : "30C",
                                "Mode" : "-",
                                "MfgDate" : "2014/05/30"
                        }
                ]

(Aside 1: storcli.py provides a metric megaraid_cv_temperature for the temperature from Cachevault_Info, but not the temperature from BBU_Info)
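
A minimal sketch of what exporting the BBU temperature could look like, assuming the "Temp" field always has the form shown above ("29C"). The helper name and the metric name megaraid_bbu_temperature are illustrative; neither exists in storcli.py today.

```python
import re


def parse_temperature_celsius(text):
    """Extract the integer from strings such as "29C"; return None if absent."""
    match = re.match(r"\s*(-?\d+)\s*C", text or "")
    return int(match.group(1)) if match else None


bbu_info = {"Model": "iBBU", "Temp": "29C"}
temp = parse_temperature_celsius(bbu_info.get("Temp"))
if temp is not None:
    # Hypothetical metric name, used here only for illustration.
    print(f"megaraid_bbu_temperature {temp}")
```

Returning None for a missing or unparseable value lets the caller skip the metric instead of exporting a misleading 0.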

On a different controller, which doesn’t have a BBU at all, I get megaraid_battery_backup_healthy 0. In other words: it flags the battery as “bad” even though the controller is healthy and no action is required. The JSON contains:

                        "BBU Status" : "NA",

(Aside 2: I would be inclined in this state to drop the megaraid_battery_backup_healthy metric entirely. Otherwise we get a false alarm about a bad battery, especially since there’s no other metric saying whether the BBU is present or not. On the other hand, I can suppress this alarm if megaraid_healthy is 1, which it is.)
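
One way the "NA" case could be handled, sketched under assumptions: the function name and the megaraid_bbu_present metric are hypothetical, and the default set of healthy values simply mirrors the current behaviour described earlier, which is itself in question.

```python
def battery_metrics(bbu_status, healthy_statuses=(0, 32)):
    """Build metric lines for one controller, skipping health when no BBU exists."""
    present = bbu_status != "NA"
    # megaraid_bbu_present is a hypothetical metric name used only in this sketch.
    lines = [f"megaraid_bbu_present {int(present)}"]
    if present:
        healthy = int(bbu_status in healthy_statuses)
        lines.append(f"megaraid_battery_backup_healthy {healthy}")
    return lines


for line in battery_metrics("NA"):
    print(line)
```

An alert could then fire only when a battery is present and unhealthy, instead of relying on megaraid_healthy to mask the false alarm.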

So in summary:

  • Can anyone confirm what BBU status 32 means?
  • Is it correct for storcli.py to report the battery as “healthy” in this condition, even though the overall controller health is “needs attention”?
  • Should we return BBU_Info temperature as a different metric, e.g. megaraid_bbu_temperature?
  • Should we suppress the megaraid_battery_backup_healthy metric if the BBU is not present (status=“NA”)? Or have a different metric for BBU present/absent?

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 20 (16 by maintainers)

Top GitHub Comments

candlerb commented, Mar 23, 2021

Can you install the old MegaCli64 and show the full output of /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL? Perhaps also the "BBU_Info" or "Cachevault_Info" sections from storcli? Maybe by comparing those with the previously-posted examples you’ll be able to identify a flag which 8192 matches.

candlerb commented, Dec 22, 2019

> The reason why you would eventually mind is because I’ll be ripping off part of your changes in PR #20 (the ones that move to the dedicated parser loop)

Not a problem. It would be good if we could get other PRs merged first (#22, #31) as there may be merge conflicts and rebasing required anyway; that then leaves #20 / #32 for BBU
