buffer: use simdutf for two-byte utf8 byteLength #63639

Open
mertcanaltin wants to merge 1 commit into nodejs:main from mertcanaltin:mert/buffer-bytelength-utf8-simdutf
@mertcanaltin (Member) commented May 29, 2026

I changed the two-byte (UTF-16) string path: the UTF-8 byte length is now computed with simdutf instead of V8's Utf8LengthV2.

Note: simdutf is only used for two-byte strings containing 512 code units or more.
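
For reference, here is a minimal scalar sketch of the quantity being computed and of the 512-code-unit cutoff. It is not the code in this patch: the names `kSimdThreshold`, `Utf8LengthFromUtf16Scalar` and `UseSimdPath` are hypothetical, and counting an unpaired surrogate as 3 bytes (as if it were replaced by U+FFFD) is an assumption of the sketch. It does not call simdutf, so it compiles on its own.

```cpp
#include <cstddef>

// Hypothetical constant; the value 512 is taken from the note above.
constexpr std::size_t kSimdThreshold = 512;

// Scalar reference: number of UTF-8 bytes needed to encode `len` UTF-16
// code units. Assumption: an unpaired surrogate counts as 3 bytes, as if
// it were replaced by U+FFFD.
std::size_t Utf8LengthFromUtf16Scalar(const char16_t* data, std::size_t len) {
  std::size_t bytes = 0;
  for (std::size_t i = 0; i < len; ++i) {
    const char16_t c = data[i];
    if (c < 0x80) {
      bytes += 1;  // ASCII
    } else if (c < 0x800) {
      bytes += 2;  // U+0080..U+07FF
    } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
               data[i + 1] >= 0xDC00 && data[i + 1] <= 0xDFFF) {
      bytes += 4;  // valid surrogate pair, one code point above U+FFFF
      ++i;         // skip the low surrogate
    } else {
      bytes += 3;  // rest of the BMP, or an unpaired surrogate
    }
  }
  return bytes;
}

// Hypothetical dispatch helper: only strings at or above the threshold
// would take the SIMD path; shorter ones stay on the existing path.
bool UseSimdPath(std::size_t code_units) {
  return code_units >= kSimdThreshold;
}
```

The cutoff is consistent with the benchmark numbers below, where the gains appear only at the higher `repeat` values.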

benchmark results:

➜  node git:(mert/buffer-bytelength-utf8-simdutf) ✗ node-benchmark-compare /tmp/nvb/result2.csv
                                                                                            confidence improvement accuracy (*)   (**)  (***)
buffers/buffer-bytelength-string.js n=4000000 repeat=1 encoding='utf8' type='four_bytes'                    0.56 %       ±1.66% ±2.21% ±2.87%
buffers/buffer-bytelength-string.js n=4000000 repeat=1 encoding='utf8' type='latin1'                        0.45 %       ±1.50% ±2.00% ±2.61%
buffers/buffer-bytelength-string.js n=4000000 repeat=1 encoding='utf8' type='one_byte'                      1.49 %       ±1.85% ±2.46% ±3.21%
buffers/buffer-bytelength-string.js n=4000000 repeat=1 encoding='utf8' type='three_bytes'                   0.02 %       ±1.40% ±1.87% ±2.45%
buffers/buffer-bytelength-string.js n=4000000 repeat=1 encoding='utf8' type='two_bytes'                     0.96 %       ±1.37% ±1.82% ±2.37%
buffers/buffer-bytelength-string.js n=4000000 repeat=16 encoding='utf8' type='four_bytes'          ***     28.95 %       ±1.10% ±1.47% ±1.92%
buffers/buffer-bytelength-string.js n=4000000 repeat=16 encoding='utf8' type='latin1'                *     -0.74 %       ±0.59% ±0.79% ±1.03%
buffers/buffer-bytelength-string.js n=4000000 repeat=16 encoding='utf8' type='one_byte'                     0.14 %       ±1.41% ±1.88% ±2.45%
buffers/buffer-bytelength-string.js n=4000000 repeat=16 encoding='utf8' type='three_bytes'          **      2.25 %       ±1.66% ±2.21% ±2.88%
buffers/buffer-bytelength-string.js n=4000000 repeat=16 encoding='utf8' type='two_bytes'                    1.38 %       ±1.73% ±2.30% ±2.99%
buffers/buffer-bytelength-string.js n=4000000 repeat=2 encoding='utf8' type='four_bytes'             *     -1.44 %       ±1.35% ±1.80% ±2.34%
buffers/buffer-bytelength-string.js n=4000000 repeat=2 encoding='utf8' type='latin1'                        0.09 %       ±1.29% ±1.72% ±2.24%
buffers/buffer-bytelength-string.js n=4000000 repeat=2 encoding='utf8' type='one_byte'                     -0.77 %       ±1.77% ±2.36% ±3.07%
buffers/buffer-bytelength-string.js n=4000000 repeat=2 encoding='utf8' type='three_bytes'                   0.86 %       ±1.65% ±2.20% ±2.86%
buffers/buffer-bytelength-string.js n=4000000 repeat=2 encoding='utf8' type='two_bytes'              *     -1.69 %       ±1.61% ±2.14% ±2.78%
buffers/buffer-bytelength-string.js n=4000000 repeat=256 encoding='utf8' type='four_bytes'         ***     36.25 %       ±1.23% ±1.65% ±2.19%
buffers/buffer-bytelength-string.js n=4000000 repeat=256 encoding='utf8' type='latin1'                     -0.20 %       ±0.24% ±0.32% ±0.41%
buffers/buffer-bytelength-string.js n=4000000 repeat=256 encoding='utf8' type='one_byte'             *     -0.72 %       ±0.63% ±0.84% ±1.10%
buffers/buffer-bytelength-string.js n=4000000 repeat=256 encoding='utf8' type='three_bytes'        ***     67.59 %       ±1.42% ±1.91% ±2.52%
buffers/buffer-bytelength-string.js n=4000000 repeat=256 encoding='utf8' type='two_bytes'          ***     67.62 %       ±1.23% ±1.65% ±2.18%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 20 comparisons, you can thus expect the following amount of false-positive results:
  1.00 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.20 false positives, when considering a   1% risk acceptance (**, ***),
  0.02 false positives, when considering a 0.1% risk acceptance (***)
➜  node git:(mert/buffer-bytelength-utf8-simdutf) ✗
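
The false-positive counts in the footer above are just the number of comparisons multiplied by the accepted risk level. A small sketch of that arithmetic, assuming every comparison is a true null result (the function name `ExpectedFalsePositives` is hypothetical):

```cpp
#include <cstddef>

// Expected number of false positives among `comparisons` tests, each run at
// significance level `alpha` (0.05 for 5%), assuming no real difference
// exists in any of them. Follows from linearity of expectation.
double ExpectedFalsePositives(std::size_t comparisons, double alpha) {
  return static_cast<double>(comparisons) * alpha;
}
```

With 20 comparisons this gives 20 × 0.05 = 1.00, 20 × 0.01 = 0.20 and 20 × 0.001 = 0.02, matching the three lines printed by the tool. The single-star results in the table are therefore the ones most likely to be noise.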

@nodejs-github-bot added the buffer, c++, and needs-ci labels May 29, 2026
Signed-off-by: Mert Can Altin <mertgold60@gmail.com>
@mertcanaltin force-pushed the mert/buffer-bytelength-utf8-simdutf branch from c5e15ea to f382a03 May 29, 2026 10:29
codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 52.38095% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.30%. Comparing base (4d21e86) to head (f382a03).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/node_buffer.cc | 52.38% | 8 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #63639      +/-   ##
==========================================
+ Coverage   90.29%   90.30%   +0.01%     
==========================================
  Files         730      730              
  Lines      234773   234821      +48     
  Branches    43953    43959       +6     
==========================================
+ Hits       211996   212066      +70     
+ Misses      14495    14484      -11     
+ Partials     8282     8271      -11     

| Files with missing lines | Coverage Δ |
|---|---|
| src/node_buffer.cc | 68.10% <52.38%> (-0.28%) ⬇️ |

... and 39 files with indirect coverage changes

@mcollina (Member) left a comment

lgtm

5 participants