Parsing JSON in a worker thread is not actually beneficial. This article only demonstrates how you can use worker_threads in Node.js.
In a recent project, I had a task where users could import data in JSON format, in files as large as 200MB. So, the challenge was to parse a large JSON file. Importing a file is a simple and typical operation, but in Node.js we have to be very careful about it. Uploading a large file is not the issue; parsing it is. JSON parsing is a CPU-intensive task, which means that while the JSON is being parsed, the process cannot do anything else (e.g. respond to other requests).
Why is that? Because JSON.parse() is synchronous and blocks the main thread. In other words, the event loop will not process anything else until JSON.parse() returns. This is a very important concept to understand. Here is an official article about why you should not block the event loop. You will find some recommendations about parsing JSON in that official article as well.
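To make the blocking concrete, here is a minimal sketch that schedules a zero-delay timer and then parses a large JSON string synchronously. The helper names (`buildLargeJson`, `measureTimerDelay`) and the item count are illustrative assumptions, not part of the project described here.

```javascript
// Build a JSON string of arbitrary size (hypothetical test data).
function buildLargeJson(itemCount) {
  const items = [];
  for (let i = 0; i < itemCount; i++) {
    items.push({ id: i, name: `item-${i}`, tags: ['a', 'b', 'c'] });
  }
  return JSON.stringify(items);
}

// Resolves with the number of milliseconds a 0 ms timer was actually delayed
// because JSON.parse() kept the main thread busy.
function measureTimerDelay(jsonText) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    JSON.parse(jsonText); // synchronous: the timer cannot fire until this returns
  });
}

measureTimerDelay(buildLargeJson(200000)).then((delayMs) => {
  console.log(`0 ms timer fired after ${delayMs} ms`);
});
```

The larger the input, the longer the timer (and every other pending callback, such as an incoming request) has to wait.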
How to parse JSON without blocking the event loop in Node.js?
So, the main goal was to parse large JSON files without affecting the main thread, aka the event loop. I considered two options:
- Asynchronous JSON parsing using Big-friendly JSON (BFJ) (recommended)
- Worker threads (for the sake of this article)
I already had a solution in mind: stream-parsing the JSON with a very popular module named BFJ, aka Big-friendly JSON. BFJ is not as fast as JSON.parse() because it parses some data and then lets the event loop handle other tasks. I implemented it anyway and let the client test it. In my development environment (MacBook Pro 2015 with 16GB of memory), BFJ took 7-8 seconds to parse 20MB of data. So I thought I might use worker_threads to offload the task to another thread. The Node.js worker_threads module is already stable.
This is what we need, right? The main program runs normally and the JSON parsing is offloaded to another thread. So, let's do this. First things first: import the necessary modules and create a new thread.
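A minimal sketch of this step, assuming a helper named `startWorker` (hypothetical) that receives the path to the worker script and a message callback. It is not the original listing, so line numbers mentioned in the text will not match it.

```javascript
const { Worker, isMainThread } = require('worker_threads');

// Creates a worker from `workerFile` (an assumed path to a worker script)
// and wires up the message, error and exit events.
function startWorker(workerFile, onMessage) {
  if (!isMainThread) {
    throw new Error('startWorker() is meant to be called from the main thread');
  }
  const worker = new Worker(workerFile); // this line decides what the thread runs
  worker.on('message', onMessage); // data sent back by the worker
  worker.on('error', (err) => console.error('Worker error:', err));
  worker.on('exit', (code) => console.log(`Worker exited with code ${code}`));
  return worker;
}
```

In an application this could be called as `startWorker(path.join(__dirname, 'worker.js'), handleParsedData)`, where `handleParsedData` is whatever callback should receive the result.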
In the code above, we check whether we are currently in the main thread. If we are, we create another thread and listen to a few events (message, exit, error). The exit event callback is executed when the worker thread exits, either because of an error or because it was exited manually. The message event callback is executed when the worker thread sends some data/message to the main thread. And finally, the error callback is executed when the worker thread encounters an error.
The above code runs in the main thread.
Notice line 6. It is one of the most important parts. We have created a new thread, but what will the thread do? Line 6 defines which task to perform: it points to a file named worker.js. Let's take a look at that file.
Unlike in the main thread, we now check whether we are in a worker thread. If so, we listen to the message event. Notice the usage of parentPort: it is the bridge between the main thread and the worker thread. The message event callback is executed when the worker thread receives a message/data from the main thread.
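The worker side described above could be sketched as follows. The `parseMessage` helper and the `{ ok, data, error }` reply shape are assumptions made for illustration, not the original worker.js.

```javascript
const { isMainThread, parentPort } = require('worker_threads');

// Parses the text and reports failures as data instead of throwing,
// so the main thread always receives a reply.
function parseMessage(jsonText) {
  try {
    return { ok: true, data: JSON.parse(jsonText) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

if (!isMainThread) {
  // parentPort only exists inside a worker thread.
  parentPort.on('message', (jsonText) => {
    parentPort.postMessage(parseMessage(jsonText));
  });
}
```

Keeping the parsing in a plain function makes it testable without spawning a thread.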
If you look at the main.js and worker.js files above carefully, you will see that we are not sending any data to the worker thread yet. To send a message or some data from the main thread to the worker thread, we use the postMessage function, like this: worker.postMessage(payload). So finally, the main.js file would look like this.
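As a rough, single-file sketch of the round trip (not the original main.js, so line numbers in the text will not match): the worker source is passed inline with the `eval: true` option only to keep the example self-contained, and `parseInWorker` is a hypothetical helper name.

```javascript
const { Worker } = require('worker_threads');

// Inline worker source (eval: true) so the sketch fits in one file;
// the article keeps this code in a separate worker.js.
const WORKER_SOURCE = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (jsonText) => {
    parentPort.postMessage(JSON.parse(jsonText));
  });
`;

// Sends a JSON string to a fresh worker and resolves with the parsed value.
function parseInWorker(jsonText) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true });
    worker.once('message', (parsed) => {
      resolve(parsed);
      worker.terminate(); // one-shot worker: stop it once the result arrives
    });
    worker.once('error', reject); // e.g. invalid JSON throws inside the worker
    worker.postMessage(jsonText); // hand the raw text to the worker thread
  });
}

parseInWorker('{"user":"alice","items":[1,2,3]}').then((data) => {
  console.log(data.items.length); // 3
});
```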
Full data flow
Let's assume our app has a REST endpoint that receives a file. The file contains text in valid JSON format that we will parse and process later. In our final main.js file, we first read the file and then send the data to our worker thread (main.js, line 30). The worker thread receives this data (worker.js, line 4) and its message event callback is executed. The worker thread runs JSON.parse() and sends the data back to the main thread (worker.js, line 9). When the main thread receives the parsed JSON data, the message event is emitted and its callback is executed (main.js, line 7). There you have your parsed JSON data. Do the necessary processing now.
From Node's official documentation:
In actual practice, use a pool of Workers instead for these kinds of tasks. Otherwise, the overhead of creating Workers would likely exceed their benefit.
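One way such a pool might look is sketched below. The `JsonParsePool` class is a hypothetical, deliberately small illustration: it has a fixed size and no handling of worker crashes, and the worker source is inline only to keep the example in one file.

```javascript
const { Worker } = require('worker_threads');

const WORKER_SOURCE = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (jsonText) => {
    try {
      parentPort.postMessage({ ok: true, data: JSON.parse(jsonText) });
    } catch (err) {
      parentPort.postMessage({ ok: false, error: err.message });
    }
  });
`;

class JsonParsePool {
  constructor(size) {
    this.workers = [];
    this.idle = [];
    this.queue = [];
    for (let i = 0; i < size; i++) {
      const worker = new Worker(WORKER_SOURCE, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queues a parse job; resolves with the parsed value.
  parse(jsonText) {
    return new Promise((resolve, reject) => {
      this.queue.push({ jsonText, resolve, reject });
      this.dispatch();
    });
  }

  // Hands queued jobs to idle workers, one job per worker at a time.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      worker.once('message', (result) => {
        this.idle.push(worker); // the worker is reused, not recreated
        if (result.ok) task.resolve(result.data);
        else task.reject(new Error(result.error));
        this.dispatch();
      });
      worker.postMessage(task.jsonText);
    }
  }

  close() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

const pool = new JsonParsePool(2);
Promise.all(['{"a":1}', '[1,2,3]', '"text"'].map((text) => pool.parse(text)))
  .then((results) => console.log(results))
  .finally(() => pool.close());
```

The workers are created once and reused for every job, which is what avoids the per-task startup overhead the documentation warns about.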
When you pass data from a worker thread to the main thread, the data gets copied. To avoid the copy, you have to either transfer an ArrayBuffer (through the transferList argument of postMessage) or use a SharedArrayBuffer. So in our case, when the parsed JSON is sent from the worker thread to the main thread, it gets copied: Node.js serializes the object in the worker with the structured clone algorithm and deserializes it again on the main thread. Which means we do not get any benefit!
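The difference between copying, transferring and sharing can be observed without even starting a worker. The sketch below posts values through a MessageChannel, which relies on the same message-passing mechanism as worker.postMessage(); the `roundTrip` helper is a hypothetical name.

```javascript
const { MessageChannel, receiveMessageOnPort } = require('worker_threads');

// Posts a value through a MessageChannel and reads it back synchronously.
function roundTrip(value, transferList = []) {
  const { port1, port2 } = new MessageChannel();
  port1.postMessage(value, transferList);
  const received = receiveMessageOnPort(port2); // { message } or undefined
  port1.close();
  port2.close();
  return received.message;
}

// 1. Plain objects are copied (structured clone).
const original = { items: [1, 2, 3] };
const copy = roundTrip(original);
console.log(copy === original); // false

// 2. An ArrayBuffer in the transfer list is moved, not copied.
const buffer = new Uint8Array([1, 2, 3]).buffer;
const moved = roundTrip(buffer, [buffer]);
console.log(buffer.byteLength, moved.byteLength); // 0 3

// 3. A SharedArrayBuffer refers to the same memory on both sides.
const shared = new SharedArrayBuffer(4);
const sameMemory = roundTrip(shared);
new Int32Array(sameMemory)[0] = 42;
console.log(new Int32Array(shared)[0]); // 42
```

Note that only raw binary data can be transferred or shared this way; a parsed JSON object graph is always cloned.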