When to use which RegExp function in JavaScript

Although the MDN pages do a good job of explaining exactly what the different RegExp functions do and how they differ, they can be a little confusing if you know what you want to do, but not which function to call.

So here is a breakdown, grouped by what you want to do.
As is so often the case, I made this list mostly for myself, but I think other people may benefit from it too.

You simply want to know if a string contains a certain pattern

RegExp.test(String) returns true if the pattern can be found, false otherwise.

let str = 'The quick brown fox jumps over a lazy dog';
let result = /\w+o\w+/.test(str);
// result will be true

You want to know where in the string a pattern occurs

String.search(RegExp) returns the index, or -1 if not found.

let str = 'The quick brown fox jumps over a lazy dog';
let result = str.search(/\w+o\w+/);
// result will be 10

If there are multiple matches, it will return the index of the first one.

Retrieve the substring matched by the pattern

String.match(RegExp) and RegExp.exec(String) each return an array, the first element of which is the first match.
They return null if not found.

let str = 'The quick brown fox jumps over a lazy dog';
let result = str.match(/\w+o\w+/);
if (result) result = result[0];
// result will be 'brown'

Count how many times the pattern occurs in the string

String.match(RegExp) on a regex with the g flag returns an array of matches (or null if not found).

So just take the length (or use 0 if the result is null). In this particular example, the matches are ['brown', 'fox', 'dog'], so the count is 3.

let str = 'The quick brown fox jumps over a lazy dog';
let result = str.match(/\w+o\w+/g);
result = result ? result.length : 0;
// result will be 3

Retrieve a list of all substrings matched by the pattern

String.match(RegExp) on a regex with the g flag returns an array of matches (or null if not found).

If there are multiple matches in the string for the pattern, the returned array will contain all of them. In this particular case, the result will be ['brown', 'fox', 'dog'].

let str = 'The quick brown fox jumps over a lazy dog';
let result = str.match(/\w+o\w+/g);
// result will be ['brown', 'fox', 'dog']

Retrieve the match and its capturing groups

String.match(RegExp) without the g flag and RegExp.exec(String) each return an array, the first element of which is the first match and the following elements are the matches for the capturing groups (or the result is null if not found).

let str = 'The quick brown fox jumps over a lazy dog';
let result = str.match(/(\w+)o(\w+)/);
// result will be ['brown', 'br', 'wn'];

Retrieve all matches, the indexes at which they are found in the string and all their capturing groups

RegExp.exec(String) with the g flag returns an array with the info you want for the first match (or null if not found).
To get to the rest of the matches, you have to call the exec function repeatedly with the same RegExp variable, until it returns null. So this is a tad more work, but not a lot.

let str = 'The quick brown fox jumps over a lazy dog';
let rex = /(\w+)o(\w+)/g;
let allRes = [];
let result; // declared here, so the loop doesn't create an implicit global
while ((result = rex.exec(str)) !== null)
  allRes.push('n=' + result.shift() + ' i=' + result.index + ' g=' + result.join('/'));
// allRes will be ['n=brown i=10 g=br/wn', 'n=fox i=16 g=f/x', 'n=dog i=38 g=d/g']

Note that this requires a RegExp variable, because the regex has to remember where it found its last result (it keeps that in its lastIndex property), and that is where it starts off from on the next go through the loop. A regex literal inside the loop condition, like result = /(\w+)o(\w+)/g.exec(str), won’t do; this would reinitialise the regex each time, so it would always return the first match and the loop would never end.
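To make that remembering visible, here is a small sketch that writes down the regex's lastIndex after every exec call. The variable names are mine and only there for illustration.

```javascript
const str = 'The quick brown fox jumps over a lazy dog';
const rex = /(\w+)o(\w+)/g;

// Record lastIndex after every exec call, including the last one that returns null.
const positions = [];
let match;
do {
  match = rex.exec(str);
  positions.push(rex.lastIndex);
} while (match !== null);
// positions is [15, 19, 41, 0]: just past 'brown', 'fox' and 'dog',
// then back to 0 once exec has run out of matches

// A fresh literal starts at lastIndex 0 every time, so it keeps finding 'brown'.
const first = /(\w+)o(\w+)/g.exec(str)[0];
const again = /(\w+)o(\w+)/g.exec(str)[0];
// first and again are both 'brown'
```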

Or, alternatively…

If you don’t want to remember all these different function calls, know that there is one function which has all these features built in: exec! That’s all you need to remember. Make sure to use the g flag.

let str = 'The quick brown fox jumps over a lazy dog';
let rex = /(\w+)o(\w+)/g;
let result = rex.exec(str);
// To test if the pattern occurs, return true here if the result is not null or false otherwise
// For the location of the pattern, return result.index if the result is not null, or else -1
// For the (first) matching substring, return result[0] if the result is not null
// Other results need some more code, like above
let allRes = [];
while (result!=null) {allRes.push(result); result = rex.exec(str);}
// Now to retrieve the number of matches, return allRes.length
// For the matches themselves, return allRes.map(el => el[0])
// etc. You get the idea.
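If you'd rather have those comments as real functions, this is roughly what they could look like. It's only a sketch: the helper names (firstResult, contains, indexOfMatch, allResults) are made up for this example, and I'm assuming you want every helper to start from the beginning of the string, which is why each one resets lastIndex first.

```javascript
// Hypothetical helpers, built on exec alone. Each one resets lastIndex,
// so a g-flagged regex that was used before starts from the beginning again.
function firstResult(rex, str) {
  rex.lastIndex = 0;
  return rex.exec(str);
}

function contains(rex, str) {
  return firstResult(rex, str) !== null;
}

function indexOfMatch(rex, str) {
  const result = firstResult(rex, str);
  return result ? result.index : -1;
}

function allResults(rex, str) {
  // Without the g flag exec never moves on, and the loop below would not end.
  if (!rex.global) throw new TypeError('allResults needs a regex with the g flag');
  const allRes = [];
  rex.lastIndex = 0;
  let result;
  while ((result = rex.exec(str)) !== null) {
    allRes.push(result);
    // An empty match leaves lastIndex where it was, so step over it by hand.
    if (result[0] === '') rex.lastIndex++;
  }
  return allRes;
}

const str = 'The quick brown fox jumps over a lazy dog';
const rex = /(\w+)o(\w+)/g;
const found = contains(rex, str);                    // true
const where = indexOfMatch(rex, str);                // 10
const words = allResults(rex, str).map(el => el[0]); // ['brown', 'fox', 'dog']
```

Resetting lastIndex in every helper costs nothing and means the order in which you call them doesn't matter.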

That’s about it.
I want to close with a heads-up: this mechanism (fetching the next result if you call the function repeatedly while using the g flag) is also used by the test function. So if you use that in a loop for unrelated reasons, you may get unexpected results:

let str = 'The quick brown fox jumps over a lazy dog';
let rex = /(\w+)o(\w+)/g;
for (let i = 1; i<=10; ++i) {
  // do stuff
  if (rex.test(str)) {
    // do other stuff, but only if str contains rex
  }
  // do more stuff
}

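This will behave the way you want the first three times through the loop, but the fourth call finds no more matches and returns false, even though str does contain the pattern. After that the regex starts over from the beginning, so every fourth pass goes wrong!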
Solution: don’t use the g flag, or call rex.test(str) once and put it in a variable to use later.
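Here is a stripped-down sketch of the problem and both fixes, with the "stuff" left out so only the test calls remain. The variable names are just for this example.

```javascript
const str = 'The quick brown fox jumps over a lazy dog';

// With the g flag, test() carries on from lastIndex on every call.
const rexG = /(\w+)o(\w+)/g;
const withFlag = [];
for (let i = 1; i <= 8; ++i) withFlag.push(rexG.test(str));
// withFlag is [true, true, true, false, true, true, true, false]

// Fix 1: no g flag, so every call starts at the beginning of the string.
const rexPlain = /(\w+)o(\w+)/;
const withoutFlag = [];
for (let i = 1; i <= 8; ++i) withoutFlag.push(rexPlain.test(str));
// withoutFlag is true eight times over

// Fix 2: call test once, before the loop, and reuse the answer inside it.
const containsPattern = /(\w+)o(\w+)/g.test(str);
const reused = [];
for (let i = 1; i <= 8; ++i) reused.push(containsPattern);
// reused is true eight times over as well
```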


HTML in WordPress

This was supposed to be a blog post about how browsers handle ruby annotation, with live examples of different HTML snippets demonstrating how your browser renders them.

Unfortunately, it turns out that the WordPress editor can’t handle esoteric markup like ruby very well; it removes many of the elements, leaving the examples crippled. Shame.

So does anybody know how to insert raw HTML in a WordPress post? I mean, without it being changed? Let me know in the comments!

In the meantime, if you want to know about the situation with ruby and how your browser handles it, you can read the blog post here: http://strictquirks.nl/standards/the-situation-with-ruby-2020.xhtml

html {font-size:62.5%} is a mistake

If you find yourself using html{font-size:62.5%} in your stylesheet, ask yourself why you are doing it.

You may argue that font sizes of 10px are easier to calculate with than font sizes of 16px. But you’d forget a few things.
Firstly, 62.5% of the user’s preferred font size is not 10px. Well, it might be, if the user’s preferred size is 16px, but then again, it might not be!
If you want the root font size to be 10px, then why don’t you make it 10px? Why don’t you write html{font-size:10px}? Tell me that.
You may say that if you use a percentage, you’re still respecting the user’s preferences: no matter what default font size they have, 1.6rem will still be their original size. But that isn’t always true; it depends on many different factors.
Let’s say you have a line of text like <p style="font-size: 1.6rem"> This is the user's preferred size! </p> and the user has set a preferred font size of 15px. Then this line can come out at the following sizes:

  • 15px if you’re lucky
  • 14px if the browser rounds all sizes to whole pixels, so 62.5% of 15 becomes 9 and 1.6 times 9 becomes 14.
  • 9px if the browser doesn’t support rem. The p’s style attribute will be ignored.
  • 12px if the browser doesn’t support rem and the minimum font size is 12px.
  • 19px if the minimum size is 12px and the browser has corrected the root font size to this minimum size (that is, 1rem is now 12px).

(The last example might sound contrived, but let me tell you that one of the major browsers does indeed treat its sizes that way. Test thoroughly!)
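If you want to check the arithmetic behind that list, here it is as a small sketch. It is a simplified model of my own, not how any particular browser is implemented: I'm assuming rounding means rounding to the nearest whole pixel, and that a minimum font size simply clamps the value.

```javascript
// Simplified, hypothetical model of the five cases; real browsers differ
// in where (and whether) they round and clamp.
const preferred = 15;            // the user's preferred font size, in px
const minimum = 12;              // the minimum font size used in the last two cases
const root = preferred * 0.625;  // html{font-size:62.5%} gives 9.375

// Lucky case: no rounding anywhere, so 1.6rem is (within float precision) 15.
const lucky = 1.6 * root;

// Every size rounded to whole pixels: 9.375 -> 9, then 1.6 * 9 = 14.4 -> 14.
const rounded = Math.round(1.6 * Math.round(root));

// No rem support: the p inherits the root size, shown as 9.
const noRem = Math.round(root);

// No rem support, but the minimum size lifts the text to 12.
const noRemWithMinimum = Math.max(minimum, Math.round(root));

// Root corrected to the minimum first, then 1.6 * 12 = 19.2 -> 19.
const rootCorrected = Math.round(1.6 * Math.max(minimum, root));
```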

And what about zooming in and out? If a user has their minimum font size at half the default font size, they can zoom out to 50% before those sizes start interfering. With your setup, they can only zoom out to 80% before running into issues. So test that too!
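The 50% and 80% come from a simple division. A sketch, assuming a default font size of 16px, a minimum of half that, and a browser that applies the minimum to the zoomed size (the function name is made up for this example):

```javascript
// Hypothetical helper: the lowest zoom factor at which text of the given
// size still stays at or above the minimum font size.
function lowestSafeZoom(sizeInPx, minimumInPx) {
  return minimumInPx / sizeInPx;
}

const defaultSize = 16;           // assumed browser default, in px
const minimum = defaultSize / 2;  // the user's minimum font size: 8px

const untouchedRoot = lowestSafeZoom(defaultSize, minimum);      // 0.5, so 50%
const shrunkRoot = lowestSafeZoom(defaultSize * 0.625, minimum); // 0.8, so 80%
```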

Secondly, even disregarding your flawed assumption about every user having 16px for a default, why do you want to make calculations with the font size? And why do you think it’s easier after html{font-size:62.5%}? If you want some header to be 24px, you can just write 24px. There is no need to change the html font size first and then write 2.4rem.
Also, if you hadn’t changed the html font size, you could have written 1.5rem. Why would 1.5rem be more difficult to work with than 2.4rem? In fact, not changing the html and writing 1.5rem will make it clearer that this is one and a half times the standard size. Much more intuitive.

Speaking of intuitive, you’re also messing with the predefined font sizes xx-small, x-small, small, medium, large, x-large and xx-large. After this treatment, these keywords don’t work as they should any more; small will be larger than 1rem!

Now some people believe that if you use pixels, users will not be able to zoom in and out on their webpages. This isn’t true.

Oh, and I know there is another misconception about using pixels: that pixels are no good because not all pixels are the same size. Well, let me tell you, not all percentages are the same size either! How about that.

And some people believe that you need to set a font size on the html element, for whatever reason. That if you don’t, some things won’t work correctly. So they do html{font-size:100%} and think that they are doing the right thing. I am not sure where this misconception comes from.