- Published on
Node.js CPU intensive operations
- Authors
- Name
- Nazarii Mural
- @NazariiMural
Node.js model
Node.js programming model is a single-threaded event loop that has access to asynchronous operations with the help of libuv. And libuv allows Node.js to perform asynchronous operations that use native OS interfaces, which are most of the time asynchronous by default. Node.js programming model is not a product of JavaScript the language itself. It's a product of how JavaScript
is deployed in popular environments like Node.js and browsers etc.
What are WorkerThreads?
The worker_threads
module enables the use of threads that execute JavaScript in parallel. WorkerThreads are more heavyweight than an OS-level thread. When you create a new Worker using worker_threads
module Node.js fires up a new JavaScript VM, initializes a new global context, sets up a new heap, starts up a new garbage collector, allocates memory, etc... In Node.js, memory updated in one thread will not be visible to the others (variables set on one side aren't visible to the other). To share memory between main thread and workers you can use a SharedArrayBuffer
to share memory between the threads.
When WorkerThreads are useful?
worker_threads
is useful when you want to perform CPU-intensive tasks in your Node.js application. The idea behind WorkerThreads is that they are useful for getting CPU-intensive code out of your main Event Loop (particularly useful for servers) so you can fire up one or more WorkerThreads to handle CPU-intensive work and keep the main thread Event Loop free and responsive to incoming events/networking and so on.
What is the difference between cluster and WorkerThreads in Node.js?
cluster
module allows you to run multiple instances of Node.js in separate processes, while the worker_thread
module allows you to run multiple instances of Node.js in the same process.
Useful features of WorkerThreads?
SharedArrayBuffer
- lets you share memory between threads (limited to binary data).
Atomics
- an atomic operation is a single, indivisible action that is performed on data.
MessageChannel
- represents an asynchronous, two-way communications channel used for communicating between threads.
WorkerData
- used to pass startup data to WorkerThread.
CPU intensive task
This example shows us how to divide big jobs (in this case count to the 1e10
number) into smaller parallel chunks of work. The most simple and straightforward example is to count to some big number. We are going to use Node.js WorkerThreads
module and Go goroutine functions. Goroutines provide the simplest way to achieve concurrency and incredible performance baked in Go language. It isn't Node.js vs Go comparison, it's a known fact that Node.js is not good at CPU intensive tasks.
Node.js WorkerThreads example
const { Worker, threadId, parentPort, isMainThread } = require("worker_threads");
/*
What data type to use?
It is always nice to think about memory consumption,
since WorkerThreads are fatty ones we wanna use as little memory as possible,
but not in our case, since we are going to manipulate big numbers
Int8Array - from -127 to 127 - to small
Uint8Array - from 0 to 255 - to small
Uint16Array - from 0 to 65535 - to small
Uint32Array - from 0 to 4294967295 - the most suitable
*/
// STATIC VARIABLES
const MESSAGE_TYPES = { INIT_WORKER: 1, WORKER_JOB_DONE: 2 };
const WORKER_AMOUNT = 4;
const EXPECTED_RESULT = 1e10;
const WORKER_FILE = "./worker.js";
// DYNAMIC VARIABLES
// Allocate memory for ${WORKER_AMOUNT} integers
const sab = new SharedArrayBuffer(Uint32Array.BYTES_PER_ELEMENT * WORKER_AMOUNT);
const arr = new Uint32Array(sab);
let t0;
for (let i = 0; i < WORKER_AMOUNT; i++) {
const worker = new Worker(WORKER_FILE, {
workerData: { MESSAGE_TYPES, WORKER_AMOUNT, EXPECTED_RESULT },
});
worker.on("online", () => {
if (!t0) t0 = performance.now();
});
worker.on("message", (msg) => {
if (msg.type === MESSAGE_TYPES.WORKER_JOB_DONE && arr.every(Boolean)) {
handleResult();
}
});
// Share data between workers
worker.postMessage({ type: MESSAGE_TYPES.INIT_WORKER, data: arr });
}
const handleResult = () => {
const t1 = performance.now() - t0;
const result = arr.reduce((acc, v) => (acc += v), 0);
console.log(
`To count to the ${EXPECTED_RESULT} took: ${t1 / 1000}s`
);
process.exit(0);
};
worker.js
file
We are using threadId
here to avoid race conditions. Each thread has its own slot inside shared memory interface.
const { parentPort, workerData, threadId } = require("worker_threads");
const { MESSAGE_TYPES, EXPECTED_RESULT, WORKER_AMOUNT } = workerData;
const workerRunJob = () => {
const maxValue = EXPECTED_RESULT / WORKER_AMOUNT;
let cnt = 0;
for (let i = 0; i < maxValue; i++) {
cnt++;
}
return cnt;
};
parentPort.on("message", (msg) => {
if (msg.type === MESSAGE_TYPES.INIT_WORKER) {
// Set data to the SharedArrayBuffer
// `threadId` to avoid race conditions.
msg.data[threadId - 1] = workerRunJob();
// Post message about SharedArrayBuffer was updated.
parentPort.postMessage({ type: MESSAGE_TYPES.WORKER_JOB_DONE });
}
});
Go and goroutines
A goroutine is a lightweight thread managed by the Go runtime they are functions and methods that run concurrently with other functions and methods. Goroutines run in the same address space, so access to shared memory must be synchronized. Equivalent example in Go is much more simple and easy to follow.
package example
import (
"fmt"
"sync"
"time"
)
// STATIC VARIABLES
const GOROUTINE_AMOUNT int = 4
const EXPECTED_RESULT int = 1e10
// DYNAMIC VARIABLES
var data_store [GOROUTINE_AMOUNT]int
var wg sync.WaitGroup
func routine(i int) {
defer wg.Done()
var count int = 0
var maxValue int = EXPECTED_RESULT / GOROUTINE_AMOUNT
for i := 0; i < maxValue; i++ {
count++
}
data_store[i] = count
}
func ExampleGoroutine() {
t0 := time.Now()
for i := 0; i < GOROUTINE_AMOUNT; i++ {
wg.Add(1)
go routine(i)
}
wg.Wait()
t1 := time.Since(t0)
result := getSum(&data_store)
fmt.Printf("\nTo count to the %v took: %v",
EXPECTED_RESULT, t1)
}
// helper func to sum up numbers in array
func getSum(arr *[GOROUTINE_AMOUNT]int) int {
result := 0
for i := 0; i < len(arr); i++ {
result = result + arr[i]
}
return result
}
And that is it, just import example
in your main.go
and run using go run main.go
command.
package main
import (
"goroutine/example"
)
func main() {
example.ExampleGoroutine()
}
Tests
In order to test we must have some AWS EC2
instance with at least 2 physical core and 4 logical ones. And at least 8 GB of RAM.
But since the author of this article is poor (not cheap) who can only afford Free Tire instance. But due to Free Tire generosity we can't use this mega machine 😭.
So we are going to use local PC 🥳, I know that those results can't be precise and gonna be dependent on staff that currently running on local pc with and so on... But it's the only choice for those examples.
Spec: i5-1135G7 CPU @ 4.20GHz and 16GB of RAM.
Node.js results
#1 To count to the 10000000000 took: 6.796850600000471s
#2 To count to the 10000000000 took: 6.79258940000087s
#3 To count to the 10000000000 took: 6.88838369999826s
#4 To count to the 10000000000 took: 6.807960499998182s
Go results
#1 To count to the 10000000000 took: 2.1970919s
#2 To count to the 10000000000 took: 2.2326607s
#3 To count to the 10000000000 took: 2.3754543s
#4 To count to the 10000000000 took: 2.2233289s
As may be expected for CPU bound tasks - Go
may be considered a better solution Node.js
.
Accessing variable in global scope
While working on Node.js example, I've noticed that frequent accessing of a variable outside of its local scope could have huge impact on performance while using V8.
Accessing variables in local scope.
function run() {
const RESULT = 1e9;
let cnt = 0;
for (let i = 0; i < RESULT; i++) {
cnt++;
}
}
const t0 = performance.now();
run();
console.log(performance.now() - t0); // 723.72
Accessing variables outside of local scope.
const RESULT = 1e9;
let cnt = 0;
function run() {
for (let i = 0; i < RESULT; i++) {
cnt++;
}
}
const t0 = performance.now();
run();
console.log(performance.now() - t0); // 3631.19
Local variables are much much more efficient. As may be seen using local variable job took: 723.72ms
, but with the global ones the same job took: 3631.19ms
, which is 5x slower.
But why?
The smaller the scope, the better. The explanation is that variables should live in the smallest scope possible. The variable in the global namespace will have much poorer performance. A variable in the global namespace could potentially be changed by something external, forcing the interpreter to reload the value each time through the loop. Using a local variable, the JIT engine can optimize the loop down to a few machine cycles per iteration.