8.1.3
Opossum is a Node.js circuit breaker that executes asynchronous functions
and monitors their execution status. When things start failing, opossum
plays dead and fails fast. If you want, you can provide a fallback function
to be executed when in the failure state.
For more about the circuit breaker pattern, there are lots of resources on the web - search it! Fowler's blog post is one place to start reading.
Project Info | |
---|---|
License: | Apache-2.0 |
Documentation: | https://nodeshift.dev/opossum/ |
Typings: | https://github.com/DefinitelyTyped/DefinitelyTyped/tree/master/types/opossum |
Issue tracker: | https://github.com/nodeshift/opossum/issues |
Engines: | Node.js >= 16 |
Let's say you've got an API that depends on something that might fail, such as a network operation or a disk read. Wrap those functions up in a CircuitBreaker and you have control over your destiny.
You can provide an AbortController (https://developer.mozilla.org/en-US/docs/Web/API/AbortController, https://nodejs.org/docs/latest/api/globals.html#globals_class_abortcontroller) to abort ongoing requests when the Opossum timeout is reached.
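For the abort signal to have any effect, the action itself has to listen for it. The following is a minimal sketch of such an action, using only Node.js built-ins. The `slowRequest` function is hypothetical, and the abort is triggered by hand here, standing in for what Opossum would do on timeout when the controller is passed as options.abortController.

```javascript
// Hypothetical action: resolves after `ms` unless the signal aborts first.
function slowRequest(ms, signal) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve('done'), ms);
    signal.addEventListener('abort', () => {
      clearTimeout(timer); // stop the pending work instead of leaving it running
      reject(new Error('request aborted'));
    }, { once: true });
  });
}

const abortController = new AbortController();
const pending = slowRequest(60000, abortController.signal);
const outcome = pending.catch(err => err.message);

// With opossum, the controller would be passed as options.abortController and
// signalled when options.timeout elapses. Here the abort is simulated manually.
abortController.abort();
outcome.then(message => console.log(message));
```

Because the timer is cleared inside the abort listener, nothing keeps running in the background after the abort.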
You can also provide a fallback function that will be executed in the event of failure. To take some action when the fallback is performed, listen for the fallback event.
Once the circuit has opened, a timeout is set based on options.resetTimeout. When the resetTimeout expires, opossum will enter the halfOpen state. Once in the halfOpen state, the next time the circuit is fired, the circuit's action will be executed again. If successful, the circuit will close and emit the close event. If the action fails or times out, it immediately re-enters the open state.
When a fallback function is triggered, it's considered a failure, and the fallback function will continue to be executed until the breaker is closed.
The fallback function accepts the same parameters as the fire function.
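To illustrate what "the same parameters" means, here is a small sketch that does not use opossum at all. `makeToyBreaker` is a hypothetical stand-in that only forwards arguments from fire() to the fallback; it has no state, timeout, or statistics, and is not how opossum is implemented.

```javascript
// Hypothetical stand-in for a breaker: runs the action and, if it rejects,
// calls the fallback with the same arguments that were passed to fire().
function makeToyBreaker(action, fallback) {
  return {
    fire: async (...args) => {
      try {
        return await action(...args);
      } catch {
        return fallback(...args);
      }
    }
  };
}

// Action that always fails, standing in for an unreliable service.
const lookup = async (userId, region) => {
  throw new Error('service unavailable');
};

// The fallback sees the same (userId, region) that fire() received.
const fallback = (userId, region) =>
  `Out of service. Your parameters were: ${userId}, ${region}`;

const toy = makeToyBreaker(lookup, fallback);
const result = toy.fire(42, 'eu-west');
result.then(console.log);
```

With a real breaker the equivalent calls would be breaker.fallback(fn) followed by breaker.fire(42, 'eu-west').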
There may be times when you need to initialize the state of a circuit breaker. The primary use cases are serverless environments such as Knative or AWS Lambda, or any container-based platform where the deployed container is ephemeral.
The toJSON method is a helper function to get the current state and status of a breaker:
const breakerState = breaker.toJSON();
This will return an object that might look similar to this:
{
  state: {
    enabled: true,
    name: 'functionName',
    closed: true,
    open: false,
    halfOpen: false,
    warmUp: false,
    shutdown: false
  },
  status: {
    ...
  }
}
A new circuit breaker instance can be created with this state by passing the state object in as an option:
// action is the function the breaker wraps
const breaker = new CircuitBreaker(action, {state: breakerState.state});
There may also be times when you need to pre-populate the stats of the circuit breaker Status object. The primary use cases are again serverless environments such as Knative or AWS Lambda, or any container-based platform where the deployed container is ephemeral.
Getting the existing cumulative stats for a breaker can be done like this:
const stats = breaker.stats;
stats will be an object that might look similar to this:
{
  failures: 11,
  fallbacks: 0,
  successes: 5,
  rejects: 0,
  fires: 16,
  timeouts: 0,
  cacheHits: 0,
  cacheMisses: 0,
  semaphoreRejections: 0,
  percentiles: {
    '0': 0,
    '1': 0,
    '0.25': 0,
    '0.5': 0,
    '0.75': 0,
    '0.9': 0,
    '0.95': 0,
    '0.99': 0,
    '0.995': 0
  },
  latencyTimes: [ 0 ],
  latencyMean: 0
}
To then re-import those stats, first create a new Status object with the previous stats and then pass that as an option to the CircuitBreaker constructor:
const statusOptions = {
  stats: {....}
};
const newStatus = CircuitBreaker.newStatus(statusOptions);
// action is the function the breaker wraps
const breaker = new CircuitBreaker(action, {status: newStatus});
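In an ephemeral environment the snapshot has to survive between invocations, which in practice means serializing it somewhere. The sketch below shows one possible round trip through JSON. The `store` Map, `saveSnapshot` and `loadSnapshot` are hypothetical stand-ins for whatever external storage is available, the snapshot values are made up, and it assumes the status field of the toJSON output holds the cumulative stats.

```javascript
// Hypothetical stand-in for an external store (a cache, a database row, ...).
const store = new Map();

function saveSnapshot(name, snapshot) {
  store.set(name, JSON.stringify(snapshot));
}

function loadSnapshot(name) {
  const raw = store.get(name);
  return raw === undefined ? undefined : JSON.parse(raw);
}

// Made-up snapshot shaped like the toJSON() output described above.
const snapshot = {
  state: {
    enabled: true, name: 'functionName', closed: false, open: true,
    halfOpen: false, warmUp: false, shutdown: false
  },
  status: { failures: 11, fallbacks: 0, successes: 5, rejects: 0, fires: 16 }
};
saveSnapshot('functionName', snapshot);

// On the next cold start, rebuild the constructor options from what was saved.
const restored = loadSnapshot('functionName');
const options = restored ? { state: restored.state } : {};
console.log(options.state.open, restored.status.fires);

// options would then be passed as: new CircuitBreaker(action, options)
// restored.status could seed: CircuitBreaker.newStatus({ stats: restored.status })
```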
Opossum really shines in a browser. You can use it to guard against network failures in your AJAX calls.
We recommend using webpack to bundle your applications, since it does not have the effect of polluting the window object with a global. However, if you need it, you can access a circuitBreaker function in the global namespace by doing something similar to what is shown in the example below.
Here is an example using hapi.js. See the opossum-examples repository for more detail.
Include opossum.js in your HTML file.
In your application, set a route to the file, pointing to node_modules/opossum/dist/opossum-min.js.
In the browser's global scope will be a CircuitBreaker constructor. Use it to create circuit breakers, guarding against network failures in your REST API calls.
A CircuitBreaker will emit events for important things that occur. Here are the events you can listen for.
- fire: emitted when the breaker is fired.
- reject: emitted when the breaker is open (or halfOpen).
- timeout: emitted when the breaker action times out.
- success: emitted when the breaker action completes successfully.
- failure: emitted when the breaker action fails, called with the error.
- open: emitted when the breaker state changes to open.
- close: emitted when the breaker state changes to closed.
- halfOpen: emitted when the breaker state changes to halfOpen.
- fallback: emitted when the breaker has a fallback function and executes it.
- semaphoreLocked: emitted when the breaker is at capacity and cannot execute the request.
- healthCheckFailed: emitted when a user-supplied health check function returns a rejected promise.
- shutdown: emitted when the breaker shuts down.
Handling events gives a greater level of control over your application behavior.
The opossum API returns a Promise from CircuitBreaker.fire(). But your circuit action, the async function that might fail, doesn't have to return a promise. You can easily turn Node.js style callback functions into something opossum understands by using the built-in Node core utility function util.promisify().
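As a sketch of the conversion described above, the snippet below promisifies a callback-style function. `legacyLookup` is a hypothetical API invented for the example; only util.promisify is a real Node.js call.

```javascript
const util = require('node:util');

// Hypothetical callback-style API following the Node.js (err, result) convention.
function legacyLookup(id, callback) {
  setImmediate(() => {
    if (id > 0) {
      callback(null, { id });
    } else {
      callback(new Error('invalid id'));
    }
  });
}

// util.promisify returns a function that reports through a Promise instead.
const lookup = util.promisify(legacyLookup);

// The promisified function could then serve as the circuit action, for example
// new CircuitBreaker(lookup, options).fire(7)
const found = lookup(7);
const failed = lookup(-1).catch(err => err.message);

found.then(record => console.log(record.id));
failed.then(message => console.log(message));
```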
And just for fun, your circuit doesn't even really have to be a function. Not sure when you'd use this - but you could if you wanted to.
The errorThresholdPercentage value is compared to the error rate. That rate is determined by dividing the number of failures by the number of times the circuit has been fired.
The numbers for fires and failures come from the stats that are governed by rollingCountTimeout and rollingCountBuckets. The timeout value is the total duration, in milliseconds, for which the stats are maintained, and the buckets value is the number of slots in the window. The defaults are 10000 milliseconds (10 seconds) and 10 buckets. So, the statistics that are being compared against errorThresholdPercentage are based on 10 samples, one per second over the last 10 seconds.
Example: a circuit is fired 24 times over 10 seconds with a somewhat bursty pattern, failing three times.
| fires: 2 | fires: 1 | fires: 3 | fires: 0 | fires: 9 | fires: 3 | fires: 2 | fires: 0 | fires: 4 | fires: 0 |
| fails: 0 | fails: 0 | fails: 0 | fails: 0 | fails: 0 | fails: 3 | fails: 0 | fails: 0 | fails: 0 | fails: 0 |
The failure rate here is 3/24 or 1/8 or 12.5%. The default error threshold is 50%, so in this case, the circuit would not open. However, if you modified the rollingCountTimeout to 3 seconds, and the rollingCountBuckets to 3 (not recommended), then the stats array might look like these three seconds from above.
| fires: 3 | fires: 2 | fires: 0 |
| fails: 3 | fails: 0 | fails: 0 |
Now, without changing errorThresholdPercentage our circuit will open because our error rate is now 3/5 or 60%. It's tricky to test this stuff because the array of statistics is a rolling count. Every second the oldest bucket is removed and a new one is added, so the totals change constantly in a way that may not be intuitive.
For example, if the first example is shifted right, dropping the first bucket and adding another with fires: 3, the total number of fires now in the stats is not 27 (24+3) but 25 (24-2+3).
The code that is summing the stats samples is here:
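A minimal sketch of that summing and of the threshold comparison, written from the description above rather than copied from opossum's source. The helper names (`sumWindow`, `errorRate`, `shouldOpen`, `rotate`) are hypothetical, and treating an empty window as a 0% error rate is an assumption.

```javascript
// Buckets from the 10-second example above, oldest first.
const fires = [2, 1, 3, 0, 9, 3, 2, 0, 4, 0];
const fails = [0, 0, 0, 0, 0, 3, 0, 0, 0, 0];
const tenSecondWindow = fires.map((f, i) => ({ fires: f, failures: fails[i] }));

// Sum every bucket in the window into one cumulative stats object.
function sumWindow(window) {
  return window.reduce(
    (acc, bucket) => ({
      fires: acc.fires + bucket.fires,
      failures: acc.failures + bucket.failures
    }),
    { fires: 0, failures: 0 }
  );
}

// Error rate as a percentage; an empty window counts as 0% (an assumption).
function errorRate(stats) {
  return stats.fires === 0 ? 0 : (stats.failures * 100) / stats.fires;
}

// The circuit opens once the rate is greater than errorThresholdPercentage.
function shouldOpen(window, errorThresholdPercentage = 50) {
  return errorRate(sumWindow(window)) > errorThresholdPercentage;
}

// Drop the oldest bucket and append a fresh one.
function rotate(window, newBucket) {
  return [...window.slice(1), newBucket];
}

// The three buckets used in the 3-second example.
const threeSecondWindow = tenSecondWindow.slice(5, 8);
const shifted = rotate(tenSecondWindow, { fires: 3, failures: 0 });

console.log(errorRate(sumWindow(tenSecondWindow)));   // 12.5
console.log(shouldOpen(tenSecondWindow));             // false
console.log(errorRate(sumWindow(threeSecondWindow))); // 60
console.log(shouldOpen(threeSecondWindow));           // true
console.log(sumWindow(shifted).fires);                // 25
```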
Typings are available at https://github.com/DefinitelyTyped/DefinitelyTyped/tree/master/types/opossum. If you'd like to add them, run npm install @types/opossum in your project.
The opossum-prometheus module can be used to produce metrics that are consumable by Prometheus. These metrics include information about the circuit itself, for example how many times it has opened, as well as general Node.js statistics, for example event loop lag.
The opossum-hystrix module can be used to produce metrics that are consumable by the Hystrix Dashboard.
You may run into issues related to too many listeners on an EventEmitter like this.
In some cases, seeing this error might indicate a bug in client code, where many CircuitBreakers are inadvertently being created. But there are legitimate scenarios where this may not be the case. For example, it could just be that you need more than 10 CircuitBreakers in your app. That's ok.
To get around the error, you can set the number of listeners on the stream.
Or it could be that you have a large test suite which exercises some code that creates CircuitBreakers and does so repeatedly. If the CircuitBreaker being created is only needed for the duration of the test, use breaker.shutdown() when the circuit is no longer in use to clean up all listeners.
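Raising the listener limit uses the standard Node.js EventEmitter API. The sketch below uses a plain EventEmitter as a stand-in for whichever emitter is producing the warning; the limit of 100 and the 'data' event name are arbitrary choices for illustration.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for any emitter that many breakers subscribe to.
const emitter = new EventEmitter();

// Raise the limit before attaching more listeners than the default allows.
emitter.setMaxListeners(100);

for (let i = 0; i < 50; i++) {
  emitter.on('data', () => {});
}
const attached = emitter.listenerCount('data');
console.log(attached, emitter.getMaxListeners());

// When the listeners are no longer needed, remove them, much as
// breaker.shutdown() is described as cleaning up a breaker's listeners.
emitter.removeAllListeners('data');
```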
Constructs a CircuitBreaker.
Extends EventEmitter
Name | Description |
---|---|
options.status Status | A Status object that might have pre-primed stats |
options.timeout Number | The time in milliseconds that action should be allowed to execute before timing out. Timeout can be disabled by setting this to false. Default 10000 (10 seconds) |
options.maxFailures Number | (Deprecated) The number of times the circuit can fail before opening. Default 10. |
options.resetTimeout Number | The time in milliseconds to wait before setting the breaker to halfOpen state, and trying the action again. Default: 30000 (30 seconds) |
options.rollingCountTimeout Number | Sets the duration of the statistical rolling window, in milliseconds. This is how long Opossum keeps metrics for the circuit breaker to use and for publishing. Default: 10000 |
options.rollingCountBuckets Number | Sets the number of buckets the rolling statistical window is divided into. So, if options.rollingCountTimeout is 10000, and options.rollingCountBuckets is 10, then the statistical window will consist of ten 1000 millisecond (1 second) snapshots. Default: 10 |
options.name String | the circuit name to use when reporting stats. Default: the name of the function this circuit controls. |
options.rollingPercentilesEnabled boolean | This property indicates whether execution latencies should be tracked and calculated as percentiles. If they are disabled, all summary statistics (mean, percentiles) are returned as -1. Default: true |
options.capacity Number | the number of concurrent requests allowed. If the number of currently executing function calls is equal to options.capacity, further calls to fire() are rejected until at least one of the current requests completes. Default: Number.MAX_SAFE_INTEGER. |
options.errorThresholdPercentage Number | the error percentage at which to open the circuit and start short-circuiting requests to fallback. Default: 50 |
options.enabled boolean | whether this circuit is enabled upon construction. Default: true |
options.allowWarmUp boolean | determines whether to allow failures without opening the circuit during a brief warmup period (this is the rollingCountTimeout property). Default: false. This can help in situations where no matter what your errorThresholdPercentage is, if the first execution times out or fails, the circuit immediately opens. |
options.volumeThreshold Number | the minimum number of requests within the rolling statistical window that must exist before the circuit breaker can open. This is similar to options.allowWarmUp in that no matter how many failures there are, if the number of requests within the statistical window does not exceed this threshold, the circuit will remain closed. Default: 0 |
options.errorFilter Function | an optional function that will be called when the circuit's function fails (returns a rejected Promise). If this function returns truthy, the circuit's failure statistics will not be incremented. This is useful, for example, when you don't want HTTP 404 to trip the circuit, but still want to handle it as a failure case. |
options.cache boolean | whether the return value of the first successful execution of the circuit's function will be cached. Once a value has been cached that value will be returned for every subsequent execution: the cache can be cleared using clearCache. (The metrics cacheHit and cacheMiss reflect cache activity.) Default: false |
options.cacheTTL Number | the time to live for the cache in milliseconds. Set 0 for an infinite cache. Default: 0 (no TTL) |
options.cacheGetKey Function | function that returns the key to use when caching the result of the circuit's fire. It is better to supply a custom one, because JSON.stringify performs poorly. Default: (...args) => JSON.stringify(args) |
options.cacheTransport CacheTransport | custom cache transport; should implement get, set and flush methods. |
options.abortController AbortController | this allows Opossum to signal upon timeout and properly abort your ongoing requests instead of leaving them running in the background |
options.enableSnapshots boolean | whether to enable the rolling stats snapshots that opossum emits at the bucketInterval. Disable this as an optimization if you don't listen to the 'snapshot' event to reduce the number of timers opossum initiates. |
options.rotateBucketController EventEmitter | if you have multiple breakers in your app, the number of timers across breakers can get costly. This option allows you to provide an EventEmitter that rotates the buckets so you can have one global timer in your app. Make sure that you are emitting a 'rotate' event from this EventEmitter |
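Several of the options above take functions or objects rather than plain values. The sketch below shows what three of them might look like. The `statusCode` property on the error, the `item:` key format, and the choice to emit 'rotate' once per bucket interval are all assumptions made for illustration, not behavior confirmed by the table.

```javascript
const { EventEmitter } = require('node:events');

// errorFilter: return truthy for errors that should not count as failures.
// `statusCode` is a hypothetical property on the rejected error.
const errorFilter = err => err.statusCode === 404;

// cacheGetKey: build a cheap cache key instead of JSON.stringify(args).
// Assumes the first argument passed to fire() is a unique id.
const cacheGetKey = id => `item:${id}`;

// rotateBucketController: one emitter and one timer shared by every breaker.
const rotateBucketController = new EventEmitter();
const rollingCountTimeout = 10000;
const rollingCountBuckets = 10;
const rotateTimer = setInterval(
  () => rotateBucketController.emit('rotate'),
  rollingCountTimeout / rollingCountBuckets // assumed: one rotation per bucket
);
rotateTimer.unref(); // do not keep the process alive for stats rotation alone

const options = {
  errorFilter,
  cache: true,
  cacheGetKey,
  rollingCountTimeout,
  rollingCountBuckets,
  rotateBucketController
};
// options would be passed as the second argument:
// new CircuitBreaker(action, options)
```

Sharing one controller across breakers means a single interval drives all of their windows, which is the saving the rotateBucketController option describes.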
Closes the breaker, allowing the action to execute again.
Returns void.
Opens the breaker. Each time the breaker is fired while the circuit is opened, a failed Promise is returned, or if any fallback function has been provided, it is invoked.
If the breaker is already open this call does nothing.
Returns void.
Shuts down this circuit breaker. All subsequent calls to the circuit will fail, returning a rejected promise.
Returns void.
The current Status of this CircuitBreaker
Type: Status
Provide a fallback function for this CircuitBreaker. This function will be executed when the circuit is fired and fails. It will always be preceded by a failure event, and breaker.fire returns a rejected Promise.
(Function | CircuitBreaker) the fallback function to execute when the breaker has opened or when a timeout or error occurs.
Returns CircuitBreaker: this
Execute the action for this circuit. If the action fails or times out, the returned promise will be rejected. If the action succeeds, the promise will resolve with the resolved value from action. If a fallback function was provided, it will be invoked in the event of any failure or timeout.
Any parameters passed to this function will be proxied to the circuit function.
(...any)
Returns Promise<any>: promise resolves with the circuit function's return value on success or is rejected on failure of the action. Use isOurError() to determine if a rejection was a result of the circuit breaker or the action.
Execute the action for this circuit using context as this. If the action fails or times out, the returned promise will be rejected. If the action succeeds, the promise will resolve with the resolved value from action. If a fallback function was provided, it will be invoked in the event of any failure or timeout.
Any parameters in addition to context will be passed to the circuit function.
(any) the this context used for function execution
(any) the arguments passed to the action
Returns Promise<any>: promise resolves with the circuit function's return value on success or is rejected on failure of the action.
Clears the cache of this CircuitBreaker.
Returns void.
Provide a health check function to be called periodically. The function should return a Promise. If the promise is rejected the circuit will open. This is in addition to the existing circuit behavior as defined by options.errorThresholdPercentage in the constructor. For example, if the health check function provided here always returns a resolved promise, the circuit can still trip and open if there are failures exceeding the configured threshold. The health check function is executed within the circuit breaker's execution context, so this within the function is the circuit breaker itself.
(Function) a health check function which returns a promise.
(Number?) the amount of time between calls to the health check function. Default: 5000 (5 seconds)
Returns void.
Throws an error if interval is supplied but is not a number.
Enables this circuit. If the circuit is in the disabled state, it will be re-enabled. If not, this is essentially a noop.
Returns void.
Disables this circuit, causing all calls to the circuit's function to be executed without circuit or fallback protection.
Returns void.
Emitted after options.resetTimeout has elapsed, allowing for a single attempt to call the service again. If that attempt is successful, the circuit will be closed. Otherwise it remains open.
Type: Number
Emitted when the breaker is reset allowing the action to execute again
Emitted when the breaker opens because the action has a failure percentage greater than options.errorThresholdPercentage.
Emitted when the circuit breaker has been shut down.
Emitted when the circuit breaker action is executed
Type: any
Emitted when the circuit breaker is using the cache and finds a value.
Emitted when the circuit breaker does not find a value in the cache, but the cache option is enabled.
Emitted when the circuit breaker is open and failing fast
Type: Error
Emitted when the circuit breaker action takes longer than options.timeout
Type: Error
Emitted when the circuit breaker action succeeds
Type: any
Emitted when the rate limit has been reached and there are no more locks to be obtained.
Type: Error
Emitted when the user-supplied health check function returns a rejected promise.
Type: Error
Emitted when the circuit breaker executes a fallback function
Type: any
Emitted when the circuit breaker action fails
Type: Error
Tracks execution status for a given CircuitBreaker. A Status instance is created for every CircuitBreaker and does not typically need to be created by a user.
A Status instance will listen for all events on the CircuitBreaker and track them in a rolling statistical window. The window duration is determined by the rollingCountTimeout option provided to the CircuitBreaker. The window consists of an array of Objects, each representing the counts for a CircuitBreaker's events.
The array's length is determined by the CircuitBreaker's rollingCountBuckets option. The duration of each slice of the window is determined by dividing the rollingCountTimeout by rollingCountBuckets.
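The relationship between the two options can be sketched in a few lines. `describeWindow` is a hypothetical helper written from the description above, and the counter names in each bucket are only a sample of the events a real window tracks.

```javascript
// Hypothetical helper: build an empty rolling window and work out how long
// each slice lasts, given the two options described above.
function describeWindow({ rollingCountTimeout = 10000, rollingCountBuckets = 10 } = {}) {
  const sliceDuration = rollingCountTimeout / rollingCountBuckets;
  // One counter object per bucket; only a few event counts are shown.
  const window = Array.from({ length: rollingCountBuckets }, () => ({
    fires: 0,
    failures: 0,
    successes: 0,
    timeouts: 0
  }));
  return { sliceDuration, window };
}

const defaults = describeWindow();
const short = describeWindow({ rollingCountTimeout: 1000, rollingCountBuckets: 10 });

console.log(defaults.sliceDuration, defaults.window.length); // 1000 10
console.log(short.sliceDuration, short.window.length);       // 100 10
```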
Extends EventEmitter
(Object) for the status window
Name | Description |
---|---|
options.rollingCountBuckets Number | number of buckets in the window |
options.rollingCountTimeout Number | the duration of the window |
options.rollingPercentilesEnabled Boolean | whether to calculate percentiles |
options.stats Object | object of previous stats |
const fs = require('fs');
const CircuitBreaker = require('opossum');
// Creates a 1 second window consisting of ten time slices,
// each 100ms long.
const circuit = new CircuitBreaker(fs.readFile,
  { rollingCountBuckets: 10, rollingCountTimeout: 1000});
// get the cumulative statistics for the last second
circuit.status.stats;
// get the array of ten 100ms time slices for the last second
circuit.status.window;
Emitted at each time-slice. Listeners for this event will receive a cumulative snapshot of the current status window.
Type: Object
Simple in-memory cache implementation
(Map): Cache map