Practical Guides

How to find Node.js performance bottlenecks without rewriting everything

Before migrating to Go or Rust, find out where your Node.js application actually stalls. A production profiling playbook that finds the real cause, not the guess.

Johnny Carreiro · January 13, 2026 · 3 min read

Every time a Node.js API gets slow, the same hallway suggestion appears: "Node doesn't scale, migrate to Go." In practice, in the vast majority of cases, the problem is not the language — it is what your code does with it. Before considering a months-long rewrite, it is worth spending a few hours finding out where the time actually goes.

Profiling in production, not in dev

The first mistake is measuring in the wrong environment. In dev, with no real load, no production data volume, and no concurrency, the numbers lie. An endpoint that looks instant on your machine may be blocking the event loop under 200 concurrent requests.

Start with observability: response times per route (P50, P95, P99), CPU and memory usage, and query count per request. Tools like clinic.js, 0x (for flamegraphs), and Node's built-in inspector (node --inspect) with heap snapshots give you the picture you need, ideally over real traffic or a replay of it.
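As a rough illustration of per-route percentiles, here is a minimal sketch that records request durations in memory and summarizes them with the nearest-rank method. The names recordDuration, summarize, and timed are hypothetical helpers made up for this example. A real service would more likely export these figures to its metrics system than keep raw samples in a Map.

```javascript
const { performance } = require('node:perf_hooks');

// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical in-memory store: route name -> array of durations in ms.
const durationsByRoute = new Map();

function recordDuration(route, ms) {
  if (!durationsByRoute.has(route)) durationsByRoute.set(route, []);
  durationsByRoute.get(route).push(ms);
}

// Runs a handler and records how long it took, even if it throws.
async function timed(route, handler) {
  const start = performance.now();
  try {
    return await handler();
  } finally {
    recordDuration(route, performance.now() - start);
  }
}

// Summary for one route: sample count plus P50, P95, and P99.
function summarize(route) {
  const samples = durationsByRoute.get(route) ?? [];
  return {
    count: samples.length,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}
```

Wrapping a route handler in timed('GET /orders', ...) and logging summarize('GET /orders') periodically is usually enough to see whether the tail (P99) diverges from the median.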

The four usual suspects

In practice, Node.js bottlenecks almost always fall into one of these four patterns.

Blocked event loop. Node runs your JavaScript on a single thread. A heavy synchronous operation (a JSON.parse of a giant payload, synchronous crypto, a tight loop over a very large collection) freezes every other request while it runs. The symptom is P99 spiking while the average looks fine, because only the requests that arrive during the blocking call are delayed. The flamegraph points to the guilty function.

N+1 at the database. The ORM hides queries. Listing 100 orders and, for each one, fetching the customer generates 101 queries. Under load, the database becomes the bottleneck and Node just waits. Logging the query count per request reveals it instantly.
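To make the 101-query pattern concrete, here is a minimal sketch against a hypothetical in-memory "database" that counts queries. The function names (findOrders, findCustomerById, findCustomersByIds) are invented for illustration and do not belong to any particular ORM. Most ORMs offer an equivalent of the batched lookup through eager loading or an IN query.

```javascript
// Hypothetical in-memory stand-in for a database client that counts queries.
function createFakeDb() {
  const customers = new Map([
    [1, { id: 1, name: 'Ana' }],
    [2, { id: 2, name: 'Bruno' }],
  ]);
  const orders = [
    { id: 10, customerId: 1 },
    { id: 11, customerId: 2 },
    { id: 12, customerId: 1 },
  ];
  const db = {
    queryCount: 0,
    async findOrders() { db.queryCount++; return orders; },
    async findCustomerById(id) { db.queryCount++; return customers.get(id); },
    async findCustomersByIds(ids) { db.queryCount++; return ids.map((id) => customers.get(id)); },
  };
  return db;
}

// N+1: one query for the list, then one more query per order.
async function listOrdersNPlusOne(db) {
  const orders = await db.findOrders();
  const result = [];
  for (const order of orders) {
    const customer = await db.findCustomerById(order.customerId);
    result.push({ ...order, customer });
  }
  return result;
}

// Batched: one query for the list, one query for all distinct customers.
async function listOrdersBatched(db) {
  const orders = await db.findOrders();
  const ids = [...new Set(orders.map((o) => o.customerId))];
  const customers = await db.findCustomersByIds(ids);
  const byId = new Map(customers.map((c) => [c.id, c]));
  return orders.map((o) => ({ ...o, customer: byId.get(o.customerId) }));
}
```

With three orders, the first version issues four queries and the second issues two. The gap grows linearly with the list size, which is why logging queryCount per request exposes the pattern quickly.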

Memory leak. A cache that never expires, an ever-growing array of listeners, a closure holding a reference to a large object. Memory climbs step by step and garbage collection pauses get longer and more frequent. Heap snapshots taken at different moments and compared show what is not being released.
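A minimal sketch of how snapshots could be captured from inside the process, assuming you are able to write to the temporary directory. It uses process.memoryUsage() for cheap periodic sampling and v8.writeHeapSnapshot() for the full dump. The helper names and the file naming scheme are assumptions made for this example.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Cheap to call: current memory figures in megabytes, for periodic logging.
function sampleMemory() {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return { heapUsedMb: toMb(heapUsed), heapTotalMb: toMb(heapTotal), rssMb: toMb(rss) };
}

// Expensive: writes a .heapsnapshot file and returns its path.
// The call is synchronous and blocks the event loop while it runs,
// so trigger it deliberately rather than on a timer.
function takeHeapSnapshot(label) {
  const file = path.join(
    os.tmpdir(),
    `${label}-${process.pid}-${Date.now()}.heapsnapshot`,
  );
  return v8.writeHeapSnapshot(file);
}
```

One possible workflow: log sampleMemory() every minute, call takeHeapSnapshot('before') and takeHeapSnapshot('after') some time apart from an admin-only trigger, then load both files in the Chrome DevTools Memory tab and use its comparison view to see which objects grew.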

Connection pool leak. Database connections that never return to the pool exhaust the limit, and requests start waiting for a free connection. The symptom is rising latency without high CPU — the process is just waiting.
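The most common source of this leak is a release call that is skipped when a query throws. The sketch below shows the leaky shape and the try/finally fix against a hypothetical minimal pool. The acquire/query/release names are assumptions, real drivers use their own method names, and the 'FAIL' check is only a stand-in for a query that errors.

```javascript
// Hypothetical minimal pool that tracks how many connections are checked out.
function createFakePool(size) {
  let inUse = 0;
  return {
    get inUse() { return inUse; },
    async acquire() {
      if (inUse >= size) throw new Error('pool exhausted');
      inUse++;
      let released = false;
      return {
        // Stand-in for a real query: fails when the SQL text contains 'FAIL'.
        async query(sql) {
          if (sql.includes('FAIL')) throw new Error('query failed');
          return [];
        },
        release() {
          if (!released) { released = true; inUse--; }
        },
      };
    },
  };
}

// Leaky: if query() throws, release() is never reached.
async function leakyQuery(pool, sql) {
  const conn = await pool.acquire();
  const rows = await conn.query(sql);
  conn.release();
  return rows;
}

// Safe: finally runs on success and on failure.
async function safeQuery(pool, sql) {
  const conn = await pool.acquire();
  try {
    return await conn.query(sql);
  } finally {
    conn.release();
  }
}
```

Exporting the pool's in-use and waiting counts as metrics makes this kind of leak show up as a line that only goes up, well before requests start timing out.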

The 80/20 rule

Resist the urge to optimize everything. In almost every system, three to five causes explain 80% of the problem. The flamegraph and query logs show which ones. Fix those, measure again, and only then decide whether it is worth going further.

A concrete example: we have seen an API with an 8-second P99 drop to under 400ms by fixing a single N+1 and adding an index — no language change, no rewrite. The migration to Go the team was considering would have cost months and probably recreated the same N+1s in another syntax.

When a rewrite actually makes sense

There are legitimate cases: pure CPU-bound workloads (image processing, heavy cryptography, parsing large volumes) where the single thread is a real limit. But those are a minority, and even there the answer is usually a worker thread or a dedicated service for the hot part — not rewriting the whole application.
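As an illustration of moving only the hot part off the main thread, here is a minimal sketch using Node's worker_threads module. The prime-counting function is a placeholder for whatever CPU-bound work your flamegraph points to, and the worker source is kept inline (with the eval option) only so the example fits in one file. In a real project it would more likely live in its own module, probably behind a worker pool instead of one worker per call.

```javascript
const { Worker } = require('node:worker_threads');

// Inline worker source; placeholder CPU-bound task: count primes below a limit.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  function countPrimes(limit) {
    let count = 0;
    for (let n = 2; n < limit; n++) {
      let isPrime = true;
      for (let d = 2; d * d <= n; d++) {
        if (n % d === 0) { isPrime = false; break; }
      }
      if (isPrime) count++;
    }
    return count;
  }
  parentPort.postMessage(countPrimes(workerData.limit));
`;

// Runs the computation on a separate thread so the event loop stays free.
function countPrimesInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { limit } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

A request handler can then await countPrimesInWorker(limit) while other requests continue to be served. Starting a worker has a cost of its own, so for frequent small jobs a long-lived pool of workers is usually the better fit.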

The checklist before deciding

Before approving any rewrite for performance, go through: a flamegraph under real load, query count per request, compared heap snapshots, connection pool metrics, and P99 per route. If you do not have those five measurements, you do not yet know where the problem is; you only have an expensive guess.

Performance is diagnosis before surgery. A rewrite is the most expensive and riskiest option; almost always, there is a surgical fix that returns most of the gain for a fraction of the cost.