<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Double Byte's Blog]]></title><description><![CDATA[Double Byte's Blog]]></description><link>https://blog.doublebyte.dev</link><generator>RSS for Node</generator><lastBuildDate>Sun, 17 May 2026 15:50:58 GMT</lastBuildDate><atom:link href="https://blog.doublebyte.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[✂️ Advanced Javascript Strings: Trim]]></title><description><![CDATA[Problem
JavaScript provides the built-in method trim, and the newer trimStart / trimEnd methods [1]. These native methods will do a quick job of trimming whitespace and line terminators from the start and end of a string. How does this functionality w...]]></description><link>https://blog.doublebyte.dev/advanced-javascript-strings-trim</link><guid isPermaLink="true">https://blog.doublebyte.dev/advanced-javascript-strings-trim</guid><category><![CDATA[JavaScript]]></category><category><![CDATA[V8]]></category><category><![CDATA[Strings]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[trim]]></category><dc:creator><![CDATA[Double Byte]]></dc:creator><pubDate>Wed, 17 Aug 2022 21:36:37 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/unsplash/pu20JkUx--A/upload/v1660772003736/kzrC6KNFJ.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-problem">Problem</h2>
<p>JavaScript provides the built-in method <strong>trim</strong> and the newer <strong>trimStart</strong> / <strong>trimEnd</strong> methods <a class="post-section-overview" href="#sources">[1]</a>. These native methods quickly trim whitespace and line terminators from the start and end of a string. How does this functionality work, and how would we extend it to trim anything we want?</p>
<h2 id="heading-exploration">Exploration</h2>
<p>Let's first take a look at how JavaScript does trimming. The ECMAScript standard (2015) <a class="post-section-overview" href="#sources">[2]</a> describes the trim method as a function that takes a String input and returns a copy of the input "with both leading and trailing white space removed. The definition of white space is the union of WhiteSpace and LineTerminator." So... space bar and enter key, done, right? Not quite. Strings in JavaScript are interpreted as "UTF-16 encoded code points", so we must account for every code point the standard classifies as white space or as a line terminator. Here are the definitions of whitespace and line terminators:</p>
<p>Table 32 — White Space Code Points <a class="post-section-overview" href="#sources">[3]</a></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Code Point</td><td>Name</td><td>Abbreviation</td></tr>
</thead>
<tbody>
<tr>
<td>U+0009</td><td>CHARACTER TABULATION</td><td>TAB</td></tr>
<tr>
<td>U+000B</td><td>LINE TABULATION</td><td>VT</td></tr>
<tr>
<td>U+000C</td><td>FORM FEED (FF)</td><td>FF</td></tr>
<tr>
<td>U+0020</td><td>SPACE</td><td>SP</td></tr>
<tr>
<td>U+00A0</td><td>NO-BREAK SPACE</td><td>NBSP</td></tr>
<tr>
<td>U+FEFF</td><td>ZERO WIDTH NO-BREAK SPACE</td><td>ZWNBSP</td></tr>
<tr>
<td>Other category “Zs”</td><td>Any other Unicode “Separator, space” code point</td><td>USP</td></tr>
</tbody>
</table>
</div><p>Table 33 — Line Terminator Code Points <a class="post-section-overview" href="#sources">[4]</a></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Code Point</td><td>Unicode Name</td><td>Abbreviation</td></tr>
</thead>
<tbody>
<tr>
<td>U+000A</td><td>LINE FEED (LF)</td><td>LF</td></tr>
<tr>
<td>U+000D</td><td>CARRIAGE RETURN (CR)</td><td>CR</td></tr>
<tr>
<td>U+2028</td><td>LINE SEPARATOR</td><td>LS</td></tr>
<tr>
<td>U+2029</td><td>PARAGRAPH SEPARATOR</td><td>PS</td></tr>
</tbody>
</table>
</div><p>There are several implementations of the ES standard in use, but for the purposes of this post we will use V8, the engine behind Node.js <a class="post-section-overview" href="#sources">[5]</a>.</p>
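<p>As a quick sanity check of the two tables above, the sketch below pads a word with every explicitly listed code point and confirms that the native methods strip them. The variable names are made up for illustration; the last two checks assume a reasonably recent engine, where U+3000 is in category Zs and U+200B is not.</p>

```javascript
// Code points copied from Table 32 (white space) and Table 33 (line terminators).
const whiteSpace = ['\u0009', '\u000B', '\u000C', '\u0020', '\u00A0', '\uFEFF'];
const lineTerminators = ['\u000A', '\u000D', '\u2028', '\u2029'];

// Pad a word with every listed code point on both sides.
const padding = [...whiteSpace, ...lineTerminators].join('');
const padded = padding + 'text' + padding;

console.log(padded.length);             // 10 + 4 + 10 code units
console.log(padded.trim() === 'text');  // both runs of padding are removed
console.log(padded.trimStart().length); // only the leading run is removed

// U+3000 IDEOGRAPHIC SPACE is in category Zs, so it falls under the
// "Other category Zs" row of Table 32 and is trimmed.
const ideographic = '\u3000text\u3000'.trim();

// U+200B ZERO WIDTH SPACE is in category Cf, not Zs, so trim leaves it alone.
const zeroWidth = '\u200Btext'.trim();
console.log(ideographic.length, zeroWidth.length);
```

<p>The zero width space case is a useful reminder that "looks like whitespace" and "is whitespace according to the standard" are not the same thing.</p>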
<p>Here's the code that runs when you call .trim() on a string in earlier versions of V8:</p>
<pre><code class="lang-c"><span class="hljs-function">Handle&lt;String&gt; <span class="hljs-title">String::Trim</span><span class="hljs-params">(Handle&lt;String&gt; <span class="hljs-built_in">string</span>, TrimMode mode)</span> </span>{
  Isolate* <span class="hljs-keyword">const</span> isolate = <span class="hljs-built_in">string</span>-&gt;GetIsolate();
  <span class="hljs-built_in">string</span> = String::Flatten(<span class="hljs-built_in">string</span>);
  <span class="hljs-keyword">int</span> <span class="hljs-keyword">const</span> length = <span class="hljs-built_in">string</span>-&gt;length();
  <span class="hljs-comment">// Perform left trimming if requested.</span>
  <span class="hljs-keyword">int</span> left = <span class="hljs-number">0</span>;
  UnicodeCache* unicode_cache = isolate-&gt;unicode_cache();
  <span class="hljs-keyword">if</span> (mode == kTrim || mode == kTrimStart) {
    <span class="hljs-keyword">while</span> (left &lt; length &amp;&amp;
           unicode_cache-&gt;IsWhiteSpaceOrLineTerminator(<span class="hljs-built_in">string</span>-&gt;Get(left))) {
      left++;
    }
  }
  <span class="hljs-comment">// Perform right trimming if requested.</span>
  <span class="hljs-keyword">int</span> right = length;
  <span class="hljs-keyword">if</span> (mode == kTrim || mode == kTrimEnd) {
    <span class="hljs-keyword">while</span> (
        right &gt; left &amp;&amp;
        unicode_cache-&gt;IsWhiteSpaceOrLineTerminator(<span class="hljs-built_in">string</span>-&gt;Get(right - <span class="hljs-number">1</span>))) {
      right--;
    }
  }
  <span class="hljs-keyword">return</span> isolate-&gt;factory()-&gt;NewSubString(<span class="hljs-built_in">string</span>, left, right);
}
</code></pre>
<p>This roughly translates to JavaScript as:</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">faux_v8_trim</span>(<span class="hljs-params">str, mode</span>) </span>{
  <span class="hljs-keyword">const</span> length = str.length;
  <span class="hljs-keyword">let</span> left = <span class="hljs-number">0</span>;
  <span class="hljs-keyword">if</span> (mode === TRIM || mode === TRIMSTART) {
    <span class="hljs-keyword">while</span> (left &lt; length &amp;&amp;
      isWhiteSpaceOrLineTerminator(str.charCodeAt(left))) {
      left++;
    }
  }

  <span class="hljs-keyword">let</span> right = length;
  <span class="hljs-keyword">if</span> (mode === TRIM || mode === TRIMEND) {
    <span class="hljs-keyword">while</span> (right &gt; left &amp;&amp;
      isWhiteSpaceOrLineTerminator(str.charCodeAt(right - <span class="hljs-number">1</span>))) {
      right--;
    }
  }
  <span class="hljs-keyword">return</span> str.substring(left, right);
}
</code></pre>
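<p>The translation above leans on three mode constants and an <code>isWhiteSpaceOrLineTerminator</code> helper that are not shown. Here is a minimal sketch of what they could look like, built from the two tables earlier in the post. The constant values are arbitrary, and the list of "Zs" code points is an assumption based on current Unicode data rather than anything taken from V8's source.</p>

```javascript
// Hypothetical mode constants; any three distinct values would work.
const TRIM = 0;
const TRIMSTART = 1;
const TRIMEND = 2;

// Sketch of the predicate used by faux_v8_trim. It takes a UTF-16 code unit
// (the result of charCodeAt) and reports whether trim should skip over it.
// Every code point involved is below U+10000, so code units are sufficient.
function isWhiteSpaceOrLineTerminator(code) {
  // U+2000..U+200A (EN QUAD through HAIR SPACE) are all in category Zs.
  if (code >= 0x2000 && code <= 0x200a) return true;
  switch (code) {
    // Table 32: explicitly listed white space
    case 0x0009: // TAB
    case 0x000b: // VT
    case 0x000c: // FF
    case 0x0020: // SP
    case 0x00a0: // NBSP
    case 0xfeff: // ZWNBSP
    // Table 32: remaining category Zs code points (assumed, see lead-in)
    case 0x1680: // OGHAM SPACE MARK
    case 0x202f: // NARROW NO-BREAK SPACE
    case 0x205f: // MEDIUM MATHEMATICAL SPACE
    case 0x3000: // IDEOGRAPHIC SPACE
    // Table 33: line terminators
    case 0x000a: // LF
    case 0x000d: // CR
    case 0x2028: // LS
    case 0x2029: // PS
      return true;
    default:
      return false;
  }
}

console.log(isWhiteSpaceOrLineTerminator(' '.charCodeAt(0)));
console.log(isWhiteSpaceOrLineTerminator('a'.charCodeAt(0)));
```

<p>With these definitions in scope, a call such as <code>faux_v8_trim('  hi  ', TRIM)</code> has everything it needs to run.</p>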
<p>The more recent version (Node 16 LTS) <a class="post-section-overview" href="#sources">[6]</a> can be seen <a target="_blank" href="https://chromium.googlesource.com/v8/v8/+/refs/heads/9.4.146/src/builtins/string-trim.tq">here</a>. It still uses while loops, but adds the use of pointers :)</p>
<p>Indexes and pointers are powerful here because we know the string's full representation. Iterating only over necessary characters as opposed to searching the full string saves time and resources.</p>
<p>Now we know how trim works under the hood. If given the task of implementing trim, some programmers may think to use regular expressions. They are a viable option, although depending on the pattern and the engine, they tend to be slower and can be vulnerable to exploits such as excessive backtracking. Let's try a few implementations and see the data.</p>
<h2 id="heading-solutions">Solutions</h2>
<h3 id="heading-regular-expressions">Regular Expressions</h3>
<p>To many, regex seems like the obvious choice, especially since the metacharacter <code>\s</code> will be very helpful.</p>
<pre><code class="lang-js">basic_re = <span class="hljs-regexp">/^[\s]+|[\s]+$/g</span>
</code></pre>
<p>This will get flagged by some code linters due to regex operator precedence: "In cases where it is intended that the anchors only apply to one alternative each, adding (non-capturing) groups around the anchors and the parts that they apply to will make it explicit which parts are anchored and avoid readers misunderstanding the precedence or changing it because they mistakenly assume the precedence was not intended." <a class="post-section-overview" href="#sources">[7]</a></p>
<p>So we can adjust it to:</p>
<pre><code class="lang-js">noncap_group = <span class="hljs-regexp">/(?:^[\s]+)|(?:[\s]+$)/g</span>
</code></pre>
<p>This regex is the most concise, but not always the most efficient since it will match twice if there is whitespace at both ends of the string.</p>
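<p>To make the "matches twice" point concrete, here is a small sketch that counts the matches the single-regex pattern produces. The sample strings are made up for illustration.</p>

```javascript
// Same pattern as basic_re above.
const basic_re = /^[\s]+|[\s]+$/g;

// Whitespace at both ends: the global regex finds one match per end.
const sample = '  hello world \n';
const matches = sample.match(basic_re);
console.log(matches.length);

// Replacing every match with '' gives the same result as native trim here.
const trimmed = sample.replace(basic_re, '');
console.log(trimmed === sample.trim());

// Whitespace at one end only: a single match.
const leadingOnly = '  hello'.match(basic_re);
console.log(leadingOnly.length);
```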
<p>We can also break this up into two operations:</p>
<pre><code class="lang-js">double_regex = str.replace(<span class="hljs-regexp">/^[\s]+/</span>, <span class="hljs-string">''</span>).replace(<span class="hljs-regexp">/[\s]+$/</span>, <span class="hljs-string">''</span>)
</code></pre>
<p>This should perform better on longer strings.</p>
<p>There are other regex solutions that involve backtracking, but they are slow and can be prone to security holes.</p>
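<p>For illustration only, here is one pattern of that kind. The name <code>backtracking_trim</code> is hypothetical and this function is not one of the benchmarked methods. The greedy <code>[\s\S]*</code> first runs to the end of the string, then the engine has to back up one character at a time until it finds the last non-whitespace character.</p>

```javascript
// Single-pass trim that captures everything up to the last non-whitespace
// character. The optional group does not participate for empty or
// all-whitespace input, in which case $1 is replaced with an empty string.
function backtracking_trim(str) {
  return str.replace(/^\s*([\s\S]*\S)?\s*$/, '$1');
}

console.log(JSON.stringify(backtracking_trim('  a b  ')));
console.log(JSON.stringify(backtracking_trim('   ')));
```

<p>The result matches native trim on these inputs, but the amount of backing up grows with the length of the trailing whitespace.</p>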
<h3 id="heading-regex-loop">Regex + Loop</h3>
<p>A solution proposed in "High Performance JavaScript" <a class="post-section-overview" href="#sources">[8]</a> is a hybrid solution to combine the strengths of regular expressions on the beginning of the string, and an indexed loop on the end of the string for a best of both worlds approach:</p>
<pre><code class="lang-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">non_re_trim</span>(<span class="hljs-params">str</span>) </span>{
  <span class="hljs-keyword">var</span> start = <span class="hljs-number">0</span>,
    end = str.length - <span class="hljs-number">1</span>,
    ws =
      <span class="hljs-string">' \n\r\t\f\x0b\xa0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u202f\u205f\u3000\ufeff'</span>
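  <span class="hljs-comment">// Bound the scan so an empty or all-whitespace string cannot loop forever.</span>
  <span class="hljs-keyword">while</span> (start &lt;= end &amp;&amp; ws.indexOf(str.charAt(start)) &gt; <span class="hljs-number">-1</span>) {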
    start++
  }
  <span class="hljs-keyword">while</span> (end &gt; start &amp;&amp; ws.indexOf(str.charAt(end)) &gt; <span class="hljs-number">-1</span>) {
    end--
  }
  <span class="hljs-keyword">return</span> str.slice(start, end + <span class="hljs-number">1</span>)
}
</code></pre>
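<p>Note the use of <code>indexOf</code> to search for whitespace and <code>slice</code> to produce the final result. The listing above, <code>non_re_trim</code>, is the loop-only variant: it scans both ends by index and uses no regex at all.</p>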
<p>The main weakness of this version is long whitespace at the end of the string.</p>
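<p>For comparison with the loop-only listing above, the regex-plus-loop hybrid described at the start of this section could be sketched as follows. The name <code>hybrid_trim</code> mirrors the label in the results table, but this is a reconstruction of the general approach and not necessarily the exact code that was benchmarked.</p>

```javascript
function hybrid_trim(str) {
  // The regex removes leading whitespace in a single anchored pass.
  const head = str.replace(/^\s+/, '');
  // No g flag here, so test() carries no lastIndex state between calls.
  const ws = /\s/;
  let end = head.length - 1;
  // The indexed loop walks backwards over trailing whitespace only.
  while (end >= 0 && ws.test(head.charAt(end))) {
    end--;
  }
  return head.slice(0, end + 1);
}

console.log(JSON.stringify(hybrid_trim('  hi  ')));
console.log(JSON.stringify(hybrid_trim('   ')));
```

<p>Because the trailing loop tests one character per iteration, a long run of whitespace at the end costs one regex test per character.</p>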
<h2 id="heading-performance">Performance</h2>
<p><strong>All tests were run on Node 16 LTS; browser results will vary.</strong></p>
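<p>If you want to try comparable measurements yourself, here is a minimal timing sketch. It is not the harness behind the numbers in this post: the input strings, the <code>opsPerSecond</code> helper and the 50 ms time budget are all assumptions, and it reports no margin of error, so treat its output as a rough guide only.</p>

```javascript
// Hypothetical inputs, one per test name used in the results.
const inputs = {
  'space at both ends': '   some text   ',
  'space at beginning only': '   some text',
  'end space only': 'some text   ',
};

// Two of the methods from this post; add the others in the same way.
const candidates = {
  native_trim: (s) => s.trim(),
  double_regex: (s) => s.replace(/^[\s]+/, '').replace(/[\s]+$/, ''),
};

// Accumulating result lengths keeps the engine from discarding the calls.
let sink = 0;

// Run fn(input) in batches until the time budget is used up,
// then convert the call count into operations per second.
function opsPerSecond(fn, input, durationMs = 50) {
  const budget = BigInt(durationMs) * 1000000n; // nanoseconds
  const begin = process.hrtime.bigint();
  let ops = 0;
  let elapsed = 0n;
  do {
    for (let i = 0; i < 1000; i++) sink += fn(input).length;
    ops += 1000;
    elapsed = process.hrtime.bigint() - begin;
  } while (elapsed < budget);
  return Math.round((ops * 1e9) / Number(elapsed));
}

const results = [];
for (const [test, input] of Object.entries(inputs)) {
  for (const [method, fn] of Object.entries(candidates)) {
    results.push({ test, method, opsPerSec: opsPerSecond(fn, input) });
  }
}
results.sort((a, b) => b.opsPerSec - a.opsPerSec);
console.table(results);
```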
<p>Native trim and its JS counterpart are by far the fastest, followed by the non-regex solution, with the regex-based solutions coming in last.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>test</td><td>method</td><td>ops/sec</td><td>pct error</td></tr>
</thead>
<tbody>
<tr>
<td>space at both ends</td><td>v8_trim</td><td>37,384,532</td><td>±2.40%</td></tr>
<tr>
<td>end space only</td><td>v8_trim</td><td>30,705,964</td><td>±1.60%</td></tr>
<tr>
<td>space at beginning only</td><td>v8_trim</td><td>29,038,258</td><td>±4.34%</td></tr>
<tr>
<td>end space only</td><td>faux_v8</td><td>16,094,768</td><td>±1.27%</td></tr>
<tr>
<td>space at beginning only</td><td>faux_v8</td><td>15,618,819</td><td>±1.69%</td></tr>
<tr>
<td>space at both ends</td><td>faux_v8</td><td>14,026,258</td><td>±3.31%</td></tr>
<tr>
<td>space at beginning only</td><td>non_re_trim</td><td>7,448,794</td><td>±5.81%</td></tr>
<tr>
<td>end space only</td><td>non_re_trim</td><td>7,273,079</td><td>±1.23%</td></tr>
<tr>
<td>space at both ends</td><td>non_re_trim</td><td>7,073,627</td><td>±1.56%</td></tr>
<tr>
<td>space at beginning only</td><td>hybrid_trim</td><td>6,742,686</td><td>±0.94%</td></tr>
<tr>
<td>space at beginning only</td><td>noncap_group</td><td>5,207,593</td><td>±1.09%</td></tr>
<tr>
<td>space at both ends</td><td>hybrid_trim</td><td>5,144,763</td><td>±1.35%</td></tr>
<tr>
<td>end space only</td><td>basic_re</td><td>5,068,264</td><td>±1.62%</td></tr>
<tr>
<td>end space only</td><td>noncap_group</td><td>5,038,307</td><td>±1.16%</td></tr>
<tr>
<td>space at beginning only</td><td>basic_re</td><td>4,947,144</td><td>±2.76%</td></tr>
<tr>
<td>end space only</td><td>hybrid_trim</td><td>4,811,567</td><td>±3.62%</td></tr>
<tr>
<td>space at both ends</td><td>noncap_group</td><td>4,693,936</td><td>±2.23%</td></tr>
<tr>
<td>space at both ends</td><td>basic_re</td><td>4,658,159</td><td>±1.57%</td></tr>
<tr>
<td>space at beginning only</td><td>double_regex</td><td>4,636,759</td><td>±1.42%</td></tr>
<tr>
<td>space at both ends</td><td>double_regex</td><td>4,247,326</td><td>±1.63%</td></tr>
<tr>
<td>end space only</td><td>double_regex</td><td>4,050,811</td><td>±2.53%</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>Sometimes the need arises for us to extend built-in functionality. Deep diving into the source is typically a good starting point. There may be faster ways of doing trimming. If you know of one, leave a comment below!</p>
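<p>Coming back to the question from the start of the post, the same two-index loop can be pointed at any set of characters. The sketch below is one possible way to do that. The name <code>trimChars</code> and its signature are made up for illustration, and it assumes the characters to trim are all in the Basic Multilingual Plane, since it compares one UTF-16 code unit at a time.</p>

```javascript
// Trim any of the characters in `chars` from both ends of `str`.
function trimChars(str, chars) {
  const unwanted = new Set(chars);
  let left = 0;
  let right = str.length;
  // Advance past unwanted characters at the start.
  while (left < right && unwanted.has(str[left])) {
    left++;
  }
  // Walk back over unwanted characters at the end.
  while (right > left && unwanted.has(str[right - 1])) {
    right--;
  }
  return str.substring(left, right);
}

console.log(trimChars('--__hello-_', '-_'));
console.log(trimChars('/usr/local/', '/'));
```

<p>Characters in the middle of the string are never inspected, which keeps the cost proportional to the amount of trimming rather than to the length of the string.</p>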
<hr />
<h2 id="heading-sources">Sources</h2>
<p>[1] String.prototype.trimStart / String.prototype.trimEnd https://github.com/tc39/proposal-string-left-right-trim</p>
<p>[2] Standard ECMA-262 6th Edition / June 2015 - String.prototype.trim https://262.ecma-international.org/6.0/#sec-string.prototype.trim</p>
<p>[3] Standard ECMA-262 6th Edition / June 2015 - whitespace https://262.ecma-international.org/6.0/#sec-white-space</p>
<p>[4] Standard ECMA-262 6th Edition / June 2015 - Line Terminator Code Points https://262.ecma-international.org/6.0/#sec-line-terminators</p>
<p>[5] Chromium v8 https://chromium.googlesource.com/v8/v8/</p>
<p>[6] <a target="_blank" href="https://github.com/nodejs/node/blob/main/deps/v8/src/builtins/string-trim.tq">v8 github</a></p>
<p>[7] sonarsource regex security hotspot rule - https://rules.sonarsource.com/java/tag/regex/RSPEC-5850</p>
<p>[8] High Performance JavaScript [Book] - O'Reilly - https://www.oreilly.com/library/view/high-performance-javascript/9781449382308/</p>
<p><em>side note: Am I the only person who thinks "start" should only go with "finish" and "begin" with "end"? "start...end" seems a little off...</em></p>
]]></content:encoded></item></channel></rss>