dns.lookup: pending promises grow unboundedly under sustained EAI_AGAIN load #62503
Description
Version
v24.14.1
Platform
Linux 6.6.87.2-microsoft-standard-WSL2 x86_64 (Debian forky/sid)
Subsystem
dns
What steps will reproduce the bug?
When dns.promises.lookup() is called repeatedly for a hostname that triggers EAI_AGAIN (e.g. a nonsense TLD like 'asd'), getaddrinfo blocks its libuv thread for ~10 seconds before rejecting. Under sustained load, new lookups queue behind the blocked threads and the number of unresolved promises grows without bound.
In our production system this manifests as a memory leak: each pending promise holds references to the calling closure (in our case, per-conversation config objects), and none of them can be GC'd until getaddrinfo eventually returns.
Minimal reproduction — save as `repro.mjs`:

```js
import { lookup } from 'node:dns/promises';

const HOSTNAME = 'asd'; // triggers EAI_AGAIN
const INTERVAL_MS = 2000;
const TOTAL = 50;

let pending = 0;
let started = 0;
let settled = 0;

function fire() {
  const id = ++started;
  ++pending;
  const t0 = Date.now();
  console.log(`[#${id}] started (pending: ${pending})`);
  lookup(HOSTNAME).then(
    (result) => {
      --pending;
      ++settled;
      console.log(`[#${id}] resolved ${result.address} after ${Date.now() - t0}ms (pending: ${pending})`);
    },
    (err) => {
      --pending;
      ++settled;
      console.log(`[#${id}] rejected ${err.code} after ${Date.now() - t0}ms (pending: ${pending})`);
    }
  );
}

const interval = setInterval(() => {
  if (started >= TOTAL) {
    clearInterval(interval);
    return;
  }
  fire();
}, INTERVAL_MS);

setInterval(() => {
  console.log(`\n--- status: started=${started} settled=${settled} pending=${pending} ---\n`);
}, 5000).unref();

setTimeout(() => {
  console.log(`\n=== FINAL: started=${started} settled=${settled} stuck=${pending} ===`);
  if (pending > 0)
    console.log(`LEAK CONFIRMED: ${pending} lookup(s) never settled.`);
  else
    console.log('No leak detected.');
  process.exit(pending > 0 ? 1 : 0);
}, TOTAL * INTERVAL_MS + 60_000);
```

Run:

```sh
node repro.mjs
```
How often does it reproduce? Is there a required condition?
Always
What is the expected behavior? Why is that the expected behavior?
Each lookup() call should resolve or reject in bounded time, regardless of the hostname. Under sustained load, the number of pending (unresolved) promises should stay bounded — ideally proportional to the libuv thread pool size (UV_THREADPOOL_SIZE, default 4), not to the total number of requests issued.
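To illustrate the bound being asked for: the expected behavior can be approximated in userland today by capping in-flight lookups with an app-level queue. This is a hypothetical caller-side sketch (`makeLimiter` is not a Node API), shown only to make the "pending proportional to a fixed limit" expectation concrete.

```js
// Hypothetical helper: at most `limit` tasks run concurrently; the rest
// wait in an application-level queue instead of piling up as promises
// stuck behind saturated libuv threads.
function makeLimiter(limit) {
  let active = 0;
  const waiting = [];
  const runNext = () => {
    if (active >= limit || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => { active--; runNext(); });
  };
  return (task) => new Promise((resolve, reject) => {
    waiting.push({ task, resolve, reject });
    runNext();
  });
}

// Usage sketch: const limited = makeLimiter(4);
// await limited(() => lookup(hostname));
```

This keeps the number of unsettled `lookup()` promises bounded by `limit`, at the cost of maintaining the queue (and its retained closures) in the application instead.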
What do you see instead?
- Each `getaddrinfo` call for `'asd'` blocks a libuv thread for ~10 seconds before rejecting with `EAI_AGAIN`.
- Only ~2 rejections return per 10-second cycle (thread pool saturation).
- New lookups queue behind the blocked threads, so the pending count grows monotonically: 5 → 8 → 11 → 14 → 17 → 20 → …
- Promises are never settled until their turn comes, which under continuous load may be never.
Sample output (trimmed):

```
[#1] started (pending: 1)
[#2] started (pending: 2)
[#3] started (pending: 3)
[#4] started (pending: 4)
[#5] started (pending: 5)
[#6] started (pending: 6)
[#1] rejected EAI_AGAIN after 10031ms (pending: 5)
[#7] started (pending: 6)
[#2] rejected EAI_AGAIN after 10014ms (pending: 5)

--- status: started=7 settled=2 pending=5 ---

[#8] started (pending: 6)
...
[#12] started (pending: 9)
[#4] rejected EAI_AGAIN after 16024ms (pending: 8)

--- status: started=12 settled=4 pending=8 ---

...
[#32] started (pending: 21)
[#12] rejected EAI_AGAIN after 40068ms (pending: 20)

--- status: started=32 settled=12 pending=20 ---
```
The pending count never converges to zero. In a long-running server, this is effectively a memory leak because every queued promise retains references to its enclosing closure.
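For comparison, the resolver-based API does not exhibit this queuing, because it bypasses `getaddrinfo` and the thread pool entirely: `dns.promises.Resolver` issues queries via c-ares on the event loop and accepts per-query `timeout`/`tries` options. A sketch (not a drop-in replacement: `resolve4()` consults DNS only, ignoring `/etc/hosts` and NSS):

```js
import { Resolver } from 'node:dns/promises';

// c-ares-based resolution with explicit per-query limits. Unlike lookup(),
// resolve4() never occupies a libuv thread-pool slot, so a slow or
// unresponsive resolver cannot starve unrelated fs/crypto/zlib work.
const resolver = new Resolver({ timeout: 1000, tries: 1 }); // ms per try, 1 try

resolver.resolve4('asd').then(
  (addresses) => console.log('resolved:', addresses),
  (err) => console.log('failed fast:', err.code) // e.g. ENOTFOUND or ETIMEOUT
);
```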
Additional information
- `UV_THREADPOOL_SIZE` is unset (default 4).
- The callback-based `dns.lookup()` shows the same behavior — this is a libuv/getaddrinfo issue, not specific to the promises API.
- Hostnames that fail fast (e.g. `ENOTFOUND`) do not exhibit this problem.
- The `EAI_AGAIN` timeout (~10s per attempt) appears to come from the system resolver's retry/timeout settings, but the queuing behind the fixed-size thread pool is what makes it unbounded.
- A possible mitigation from Node's side: support `AbortSignal` in `dns.lookup()`/`dns.promises.lookup()` so callers can cancel stale lookups, or expose a configurable per-lookup timeout.
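Until something like that lands, the only caller-side option appears to be a timeout race. A workaround sketch (`lookupWithTimeout` and its error code are hypothetical names): the race cannot cancel the underlying `getaddrinfo` call, which still occupies its libuv thread until the OS resolver gives up, but the caller stops waiting, so the caller's own closures become collectable as soon as the race settles.

```js
import { lookup } from 'node:dns/promises';

// Workaround sketch: bound how long the *caller* waits. The thread-pool
// slot stays busy until getaddrinfo returns; only the caller's retention
// of per-request state is bounded.
function lookupWithTimeout(hostname, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`lookup of ${hostname} timed out after ${ms}ms`);
      err.code = 'ELOOKUPTIMEOUT'; // hypothetical app-level code
      reject(err);
    }, ms);
  });
  // Promise.race subscribes to both promises, so the eventual
  // getaddrinfo rejection does not become an unhandled rejection.
  return Promise.race([lookup(hostname), timeout]).finally(() => clearTimeout(timer));
}
```

This bounds caller-visible latency and memory retention, but not thread-pool occupancy — only real cancellation support in Node/libuv could do that.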