Discussion:
[Apcupsd-users] apcupsd shutting off too soon
Shawn McNeece
2006-04-14 11:11:23 UTC
Permalink
I had a power failure yesterday and found out apcupsd shut my system
down after about 14 minutes when it should run for ~200 minutes. I
tested it again later with the same results. I'm running Fedora core 4.

I've removed the host/ups names.

APC : 001,035,0885
DATE : Fri Apr 14 05:07:47 MDT 2006
HOSTNAME : XXXXXXXXXXXXXXXXXXXXXXXX
RELEASE : 3.12.2
VERSION : 3.12.2 (18 January 2006) redhat
UPSNAME : XXXXXXXXXXXXXXXXXXXXXXXX
CABLE : USB Cable
MODEL : Back-UPS RS 1500
UPSMODE : Stand Alone
STARTTIME: Fri Apr 14 05:04:44 MDT 2006
STATUS : ONLINE
LINEV : 122.0 Volts
LOADPCT : 16.0 Percent Load Capacity
BCHARGE : 100.0 Percent
TIMELEFT : 217.0 Minutes
MBATTCHG : 5 Percent
MINTIMEL : 3 Minutes
MAXTIME : 0 Seconds
LOTRANS : 104.0 Volts
HITRANS : 132.0 Volts
ALARMDEL : No alarm
BATTV : 26.8 Volts
LASTXFER : Low line voltage
NUMXFERS : 0
TONBATT : 0 seconds
CUMONBATT: 0 seconds
XOFFBATT : N/A
SELFTEST : NO
STATFLAG : 0x07000008 Status Flag
MANDATE : 2005-10-01
SERIALNO : JB0540029726
BATTDATE : 2001-09-25
NOMBATTV : 24.0
FIRMWARE : .g9 .D USB FW:g9
APCMODEL : Back-UPS RS 1500
END APC : Fri Apr 14 05:08:19 MDT 2006

And the log:

Thu Apr 13 21:55:08 MDT 2006 apcupsd 3.12.2 (18 January 2006) redhat
startup succeeded
Thu Apr 13 21:53:26 MDT 2006 apcupsd shutdown succeeded
Thu Apr 13 21:53:26 MDT 2006 apcupsd exiting, signal 15
Thu Apr 13 21:53:20 MDT 2006 User logins prohibited
Thu Apr 13 21:53:20 MDT 2006 Initiating system shutdown!
Thu Apr 13 21:53:20 MDT 2006 Reached remaining time percentage limit on
batteries.
Thu Apr 13 21:35:39 MDT 2006 Running on UPS batteries.
Thu Apr 13 21:35:33 MDT 2006 Power failure.
Thu Apr 13 20:22:02 MDT 2006 Power is back. UPS running on mains.
Thu Apr 13 20:22:02 MDT 2006 Mains returned. No longer on UPS batteries.
Thu Apr 13 20:21:19 MDT 2006 Running on UPS batteries.
Thu Apr 13 20:21:13 MDT 2006 Power failure.
Thu Apr 13 17:10:31 MDT 2006 apcupsd 3.12.2 (18 January 2006) redhat
startup succeeded
Thu Apr 13 17:06:39 MDT 2006 apcupsd shutdown succeeded
Thu Apr 13 17:06:39 MDT 2006 apcupsd exiting, signal 15
Thu Apr 13 17:05:00 MDT 2006 apcupsd 3.12.2 (18 January 2006) redhat
startup succeeded
Thu Apr 13 16:00:30 MDT 2006 apcupsd shutdown succeeded
Thu Apr 13 16:00:30 MDT 2006 apcupsd exiting, signal 15
Thu Apr 13 16:00:21 MDT 2006 User logins prohibited
Thu Apr 13 16:00:21 MDT 2006 Initiating system shutdown!
Thu Apr 13 16:00:21 MDT 2006 Battery power exhausted.
Thu Apr 13 15:46:05 MDT 2006 Running on UPS batteries.
Thu Apr 13 15:45:59 MDT 2006 Power failure.

Anyone have any ideas?
Adam Kropelin
2006-04-14 12:35:16 UTC
Permalink
Post by Shawn McNeece
I had a power failure yesterday and found out apcupsd shut my system
down after about 14 minutes when it should run for ~200 minutes. I
tested it again later with the same results. I'm running Fedora core 4.
I've removed the host/ups names.
apcupsd.conf, please? (You can eliminate the comments to make the
posting smaller)

--Adam
Adam Kropelin
2006-04-14 13:01:54 UTC
Permalink
Post by Adam Kropelin
Post by Shawn McNeece
I had a power failure yesterday and found out apcupsd shut my system
down after about 14 minutes when it should run for ~200 minutes. I
tested it again later with the same results. I'm running Fedora core 4.
I've removed the host/ups names.
apcupsd.conf, please? (You can eliminate the comments to make the
posting smaller)
Thu Apr 13 15:45:59 MDT 2006 Power failure.
Thu Apr 13 15:46:05 MDT 2006 Running on UPS batteries.
Thu Apr 13 16:00:21 MDT 2006 Battery power exhausted.
Thu Apr 13 16:00:21 MDT 2006 Initiating system shutdown!
"Battery power exhausted" means the UPS set the Low-Battery flag.
Apcupsd takes that as an immediate indication that it needs to shutdown
because power loss is imminent.

I suspect your UPS has bad batteries (2001-09-25 is pretty old for
lead-acid batts).

An interesting test to run would be to watch the voltage and time
profile as the UPS drains to empty. Connect a similar percent-load worth
of light bulbs to the UPS, but plug your computer directly into the
wall. Neuter apcupsd so it won't try to shut down your computer (edit
apccontrol and set SHUTDOWN=/bin/true instead of
SHUTDOWN=/sbin/shutdown). Then use apcaccess to start recording data
every second with a command like this:

while [ 1 ] ; do \
apcaccess status | tee -a ups.log ; \
sleep 1 ; \
done

Disconnect the UPS from mains power and let it drain all the way until
it shuts off. Then stop the recording. You can now grep various
interesting information from ups.log. Something like...

egrep "(STATUS)|(BCHARGE)|(TIMELEFT)|(BATTV )|(END APC)" < ups.log

...would show the profile of the most interesting parameters.

--Adam
Shawn McNeece
2006-04-14 14:28:11 UTC
Permalink
Adam, thanks for the help, and sorry to send the reply to you and not to
the list, I've had my pot of coffee and should be better now ;)

Anyway more information... The UPS is less than 3 months old, the battery
date I think is the default as I never set it in the conf.

I sat with the web browser and monitored the battery level and it never
dropped below 95%. I doubt the thing dropped 90% in the 5seconds intervals
I had firefox set for on refresh. The voltage was also fine.

I suppose my UPS could be defective, but to me it looks more like instead
of shutting off at 5% battery, its shutting down after 5% has been
drained?


my .conf:
------------------------
UPSCABLE usb


UPSTYPE usb
DEVICE

LOCKFILE /var/lock

ONBATTERYDELAY 6

BATTERYLEVEL 5

MINUTES 3


TIMEOUT 0

ANNOY 300

ANNOYDELAY 60


NOLOGON disable


KILLDELAY 0

NETSERVER on

NISIP 0.0.0.0


NISPORT 3551


EVENTSFILE /var/log/apcupsd.events

EVENTSFILEMAX 10
UPSCLASS standalone

UPSMODE disable
STATTIME 0




STATFILE /var/log/apcupsd.status
LOGSTATS off

DATATIME 0

SENSITIVITY H

BEEPSTATE N


SELFTEST 168
Post by Adam Kropelin
Post by Adam Kropelin
Post by Shawn McNeece
I had a power failure yesterday and found out apcupsd shut my system
down after about 14 minutes when it should run for ~200 minutes. I
tested it again later with the same results. I'm running Fedora core 4.
I've removed the host/ups names.
apcupsd.conf, please? (You can eliminate the comments to make the
posting smaller)
Thu Apr 13 15:45:59 MDT 2006 Power failure.
Thu Apr 13 15:46:05 MDT 2006 Running on UPS batteries.
Thu Apr 13 16:00:21 MDT 2006 Battery power exhausted.
Thu Apr 13 16:00:21 MDT 2006 Initiating system shutdown!
"Battery power exhausted" means the UPS set the Low-Battery flag.
Apcupsd takes that as an immediate indication that it needs to shutdown
because power loss is imminent.
I suspect your UPS has bad batteries (2001-09-25 is pretty old for
lead-acid batts).
An interesting test to run would be to watch the voltage and time
profile as the UPS drains to empty. Connect a similar percent-load worth
of light bulbs to the UPS, but plug your computer directly into the
wall. Neuter apcupsd so it won't try to shut down your computer (edit
apccontrol and set SHUTDOWN=/bin/true instead of
SHUTDOWN=/sbin/shutdown). Then use apcaccess to start recording data
while [ 1 ] ; do \
apcaccess status | tee -a ups.log ; \
sleep 1 ; \
done
Disconnect the UPS from mains power and let it drain all the way until
it shuts off. Then stop the recording. You can now grep various
interesting information from ups.log. Something like...
egrep "(STATUS)|(BCHARGE)|(TIMELEFT)|(BATTV )|(END APC)" < ups.log
...would show the profile of the most interesting parameters.
--Adam
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting
language
that extends applications into web and mobile media. Attend the live
webcast
and join the prime developer group breaking into this new coding
territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Apcupsd-users mailing list
https://lists.sourceforge.net/lists/listinfo/apcupsd-users
Adam Kropelin
2006-04-14 15:36:32 UTC
Permalink
(Please don't top-post. It makes the archives difficult to read.
Thanks.)
Post by Shawn McNeece
Adam, thanks for the help, and sorry to send the reply to you and not
to the list, I've had my pot of coffee and should be better now ;)
The list defaults to reply-to-sender instead of reply-to-list. It
catches a lot of folks off guard. I'm tempted to change it, but there
are some good arguments for keeping the default the way it is. If
everyone does reply-to-all it'll work out fine...
Post by Shawn McNeece
Anyway more information... The UPS is less than 3 months old, the
battery date I think is the default as I never set it in the conf.
Theoretically APC would set that value at the factory, but perhaps they
don't bother any more.
Post by Shawn McNeece
I sat with the web browser and monitored the battery level and it
never dropped below 95%. I doubt the thing dropped 90% in the
5seconds intervals I had firefox set for on refresh. The voltage was
also fine.
An apcupsd debug trace would be useful here. If you launch apcupsd with
'-d1000 -T' flags it will generate a very detailed (and very long) debug
log in apcupsd.trace. It would be interesting to see that debug trace as
it is possible that the UPS is glitching the LowBatt signal.
Post by Shawn McNeece
I suppose my UPS could be defective, but to me it looks more like
instead of shutting off at 5% battery, its shutting down after 5% has
been drained?
That's coincidence, I believe. Apcupsd has no logic to calculate
difference versus the value at start-of-power-fail and as I said before
the log message indicates very clearly that the UPS LowBatt signal was
the cause for shutdown. The LowBatt glitch theory is the most likely,
and the debug log will tell us for sure.

(Send me the trace privately as it'll be rather large. Thanks.)

--Adam
Shawn McNeece
2006-04-14 16:13:37 UTC
Permalink
Post by Adam Kropelin
The list defaults to reply-to-sender instead of reply-to-list. It
catches a lot of folks off guard. I'm tempted to change it, but there
are some good arguments for keeping the default the way it is. If
everyone does reply-to-all it'll work out fine...
Opinion... reply to all very bad. Default should be to list only if
possible. again just my opinion. :)
Post by Adam Kropelin
Theoretically APC would set that value at the factory, but perhaps they
don't bother any more.
I suppose it could have sat around for years? :) But I doubt it. Tonight
as well I'm going to hook the UPS to my Windows box, and try their
Parachute software which is currently installed with the same model UPS.
Post by Adam Kropelin
Post by Shawn McNeece
I sat with the web browser and monitored the battery level and it
never dropped below 95%. I doubt the thing dropped 90% in the
5seconds intervals I had firefox set for on refresh. The voltage was
also fine.
An apcupsd debug trace would be useful here. If you launch apcupsd with
'-d1000 -T' flags it will generate a very detailed (and very long) debug
log in apcupsd.trace. It would be interesting to see that debug trace as
it is possible that the UPS is glitching the LowBatt signal.
I'll work on this logfile tonight when I'm home from work. I really hope
to figure this out as 14-15 minutes run time is pretty bad for a $300.00
UPS. Even if the UPS sends low batt, does the program just proceed to shut
down the system?
Shawn McNeece
2006-04-14 16:36:58 UTC
Permalink
Even if the UPS sends low batt, does the program just proceed to shut
Post by Shawn McNeece
down the system?
I should clarify, does the program keep polling battery level after being
sent low battery signal? All the way pretty much until the service is
shut down?
Adam Kropelin
2006-04-14 17:48:46 UTC
Permalink
Post by Shawn McNeece
Even if the UPS sends low batt, does the program just proceed to shut
Post by Shawn McNeece
down the system?
I should clarify, does the program keep polling battery level after
being sent low battery signal? All the way pretty much until the
service is shut down?
It will keep polling until the shutdown procedes far enough to shut down
the daemon itself. I'm interested to see from your debug trace if the
low batt signal is temporary or if it persists.

--Adam
Adam Kropelin
2006-04-19 22:14:09 UTC
Permalink
Hi Adam, here's the log/trace file, hope it helps. This afternoons
test the UPS only stayed up a few minutes before apcupsd reported
reached
time limit on batteries, which is a different message than before
and I don't believe I changed anything since yesterdays failure except
I
tried to set its sensitivity to high. But the .conf file I posted
earlier
is still what's running. I added the d1000 -T to the
/etc/init.d/apcupsd
not sure if that makes a difference how apcupsd is launched.
The glitch theory is right, though this time it struck your TimeLeft
reading instead of LowBatt:

[...]
202.362 apcupsd: usb.c:405 TimeLeft = 222
207.871 apcupsd: usb.c:405 TimeLeft = 222
212.383 apcupsd: usb.c:405 TimeLeft = 222
213.886 apcupsd: usb.c:405 TimeLeft = 0
214.517 apcupsd: usb.c:405 TimeLeft = 222
215.750 apcupsd: usb.c:405 TimeLeft = 214
217.163 apcupsd: usb.c:405 TimeLeft = 214
[...]

That zero reading at 213.886 is what caused apcupsd to initiate
shutdown. Strictly speaking, apcupsd did the right thing; it's relying
on the UPS and the rest of the system to deliver trustworthy data.
However, it could stand to be a bit smarter about temporary glitches
like this. The false reading was corrected less than a second later, but
by that time apcupsd had already comitted to performing a shutdown.

I'll look into adding some hysteresis so shutdown will only be initiated
when the conditions persist for a few seconds.

--Adam
Adam Kropelin
2006-04-20 02:40:26 UTC
Permalink
Post by Adam Kropelin
I'll look into adding some hysteresis so shutdown will only be
initiated when the conditions persist for a few seconds.
Ok, I've implemented a mechanism whereby apcupsd will not initiate a
shutdown until it sees the trigger condition (battlow, timeleft, or
battpct) persist for 5 seconds. This should be a suffient workaround for
the glitches in the readings you are experiencing.

Check out the 3.13.x tree from CVS and build it as usual. Instructions
for checkout are at www.apcupsd.com It would be a good idea to run with
'-d1000 -T' and then send me the file so I can double-check that the
hysteresis is kicking in as expected.

--Adam
Adam Kropelin
2006-04-20 02:55:53 UTC
Permalink
Post by Adam Kropelin
Post by Adam Kropelin
I'll look into adding some hysteresis so shutdown will only be
initiated when the conditions persist for a few seconds.
Ok, I've implemented a mechanism whereby apcupsd will not initiate a
shutdown until it sees the trigger condition (battlow, timeleft, or
battpct) persist for 5 seconds. This should be a suffient workaround
for the glitches in the readings you are experiencing.
Check out the 3.13.x tree from CVS and build it as usual. Instructions
for checkout are at www.apcupsd.com
Looks like public CVS is still down at sf.net. I've made a development
release tarball available here:

<http://www.kroptech.com/~adk0212/apcupsd/apcupsd-3.13.4.tar.gz>

--Adam
Shawn McNeece
2006-04-20 03:10:31 UTC
Permalink
Post by Adam Kropelin
Post by Adam Kropelin
Ok, I've implemented a mechanism whereby apcupsd will not initiate a
shutdown until it sees the trigger condition (battlow, timeleft, or
battpct) persist for 5 seconds. This should be a suffient workaround
for the glitches in the readings you are experiencing.
Check out the 3.13.x tree from CVS and build it as usual. Instructions
for checkout are at www.apcupsd.com
Looks like public CVS is still down at sf.net. I've made a development
<http://www.kroptech.com/~adk0212/apcupsd/apcupsd-3.13.4.tar.gz>
--Adam
Thanks Adam!! Very cool, won't have to deal with sending my UPS Back :)
Will try to install that tomorrow from work, and do some testing when I
can get home and pull the plug.
Kern Sibbald
2006-04-20 07:44:29 UTC
Permalink
Post by Adam Kropelin
Post by Adam Kropelin
I'll look into adding some hysteresis so shutdown will only be
initiated when the conditions persist for a few seconds.
Ok, I've implemented a mechanism whereby apcupsd will not initiate a
shutdown until it sees the trigger condition (battlow, timeleft, or
battpct) persist for 5 seconds. This should be a suffient workaround for
the glitches in the readings you are experiencing.
Check out the 3.13.x tree from CVS and build it as usual. Instructions
for checkout are at www.apcupsd.com It would be a good idea to run with
'-d1000 -T' and then send me the file so I can double-check that the
hysteresis is kicking in as expected.
Hello Adam,

It seems to me that if timeleft suddenly goes to zero from some much larger
time, that there is no problem in waiting for 5 seconds, but in normal cases,
5 extra seconds of wait time could potentially cause problems. Of course, if
users know about it, they can always set the trigger 5 seconds higher, or we
could always add 5 seconds to the configured values ...

If you keep the 5 second delay code especially in the case when a value is
reporting 0, I would recommend you consider firing off a "sync" so that if
the power does really suddenly shut off, there will be a minimum of problems
with the disks.
--
Best regards,

Kern

(">
/\
V_V
Shawn McNeece
2006-04-20 13:29:03 UTC
Permalink
Post by Kern Sibbald
It seems to me that if timeleft suddenly goes to zero from some much larger
time, that there is no problem in waiting for 5 seconds, but in normal cases,
In "normal" cases if your UPS can't handle an extra 5 second poll, that's
not "normal", and you have bigger issues. If you're cutting your time
remaining before shutdown so close that 5 seconds will make a big deal,
you should perhaps add one more minute or 1% more that the UPS stays up.
If losing one more minute is that big of deal, then a larger UPS is your
answer. Of course the UPS sending bad data is not normal. But we all know
good programming takes "glithces" into account, human caused, or otherwise
influenced.
Post by Kern Sibbald
5 extra seconds of wait time could potentially cause problems. Of course, if
users know about it, they can always set the trigger 5 seconds higher, or we
could always add 5 seconds to the configured values ...
If you keep the 5 second delay code especially in the case when a value is
reporting 0, I would recommend you consider firing off a "sync" so that if
the power does really suddenly shut off, there will be a minimum of problems
with the disks.
Again, if that 5 seconds is really a worry, then a sync is not going to
help. The only thing going to help is re-thinking your UPS config.

Of course the simple answer is just make a variable in the config file.
Some people can set to 0, some people who want a more reliable UPS can
double check before killing the system.
Kern Sibbald
2006-04-20 14:05:30 UTC
Permalink
Post by Shawn McNeece
Post by Kern Sibbald
It seems to me that if timeleft suddenly goes to zero from some much larger
time, that there is no problem in waiting for 5 seconds, but in normal cases,
In "normal" cases if your UPS can't handle an extra 5 second poll, that's
not "normal", and you have bigger issues.
I guess I didn't state the potential problem very clearly, because what you
wrote above is rather obvious. My point is that if a value suddenly jumps to
zero there could be something wrong with the battery (exhausted, inverter
goes out, ...). In that case, waiting 5 seconds longer could potentially be
catastrophic. In order to keep the 5 second delay logic, yet possibly avoid
a catastrophe I proposed to sync the disks as a sort of compromise or
insurance ...
Post by Shawn McNeece
If you're cutting your time
remaining before shutdown so close that 5 seconds will make a big deal,
you should perhaps add one more minute or 1% more that the UPS stays up.
If losing one more minute is that big of deal, then a larger UPS is your
answer. Of course the UPS sending bad data is not normal. But we all know
good programming takes "glithces" into account, human caused, or otherwise
influenced.
Post by Kern Sibbald
5 extra seconds of wait time could potentially cause problems. Of course, if
users know about it, they can always set the trigger 5 seconds higher, or we
could always add 5 seconds to the configured values ...
If you keep the 5 second delay code especially in the case when a value
is reporting 0, I would recommend you consider firing off a "sync" so
that if the power does really suddenly shut off, there will be a minimum
of problems
with the disks.
Again, if that 5 seconds is really a worry, then a sync is not going to
help. The only thing going to help is re-thinking your UPS config.
I shouldn't have mentioned adding 5 seconds to the config values as that is
not really appropriate, my point was about a value suddenly reporting 0,
which on any good UPS would indicate a *very* unusual situation having
nothing to do with one's UPS config. In a such a potential catastophic
situation 5 seconds is a long time, and a sync can be done quite quickly.
Post by Shawn McNeece
Of course the simple answer is just make a variable in the config file.
Some people can set to 0, some people who want a more reliable UPS can
double check before killing the system.
I don't see any need to add an additional config variable for this case
regardless of whether or not Adam likes my sync idea.
--
Best regards,

Kern

(">
/\
V_V
Adam Kropelin
2006-04-20 19:18:44 UTC
Permalink
Post by Kern Sibbald
Post by Adam Kropelin
Post by Adam Kropelin
I'll look into adding some hysteresis so shutdown will only be
initiated when the conditions persist for a few seconds.
Ok, I've implemented a mechanism whereby apcupsd will not initiate a
shutdown until it sees the trigger condition (battlow, timeleft, or
battpct) persist for 5 seconds. This should be a suffient workaround for
the glitches in the readings you are experiencing.
Check out the 3.13.x tree from CVS and build it as usual. Instructions
for checkout are at www.apcupsd.com It would be a good idea to run with
'-d1000 -T' and then send me the file so I can double-check that the
hysteresis is kicking in as expected.
Hello Adam,
It seems to me that if timeleft suddenly goes to zero from some much larger
time, that there is no problem in waiting for 5 seconds, but in normal cases,
5 extra seconds of wait time could potentially cause problems. Of course, if
users know about it, they can always set the trigger 5 seconds higher, or we
could always add 5 seconds to the configured values ...
I have trouble thinking of a scenario where the 5 extra seconds matters.
In my experience so far, the runtime-remaining and battery-percent
calculations reported by APC products are not terribly reliable overall.
Accuracy to within a minute is often questionable, and it's definitely
not accurate to within 5 seconds.
Post by Kern Sibbald
If you keep the 5 second delay code especially in the case when a value is
reporting 0, I would recommend you consider firing off a "sync" so that if
the power does really suddenly shut off, there will be a minimum of problems
with the disks.
I'd worry a bit about disrupting the server's workload by issuing a sync
which could take tens of seconds for a busy box. It would be interesting
to issue a sync from apccontrol immediately at the beginning of any
shutdown cycle (after the 5 second hysteresis elapses). That might be a
good compromise.

--Adam

Loading...