Today I learned: How to match a portion of text that spans multiple lines with a JavaScript regular expression
Problem
Let's say you have the following five-line quote by Mark Twain:
Keep away from people who try to belittle your ambitions.
Small people always do that,
but the really great make you feel that you,
too,
can become great.
Now you wish to match, with a regular expression, everything from "people" on line 1 up to "really" on line 3.
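To see why this needs more than the usual dot, here is a minimal sketch of the naive attempt. It assumes the quote is stored in a single string with "\n" line breaks; the names quote and naiveMatch are only for illustration.

```javascript
// The five-line quote from above, joined with "\n" line breaks.
const quote = [
  "Keep away from people who try to belittle your ambitions.",
  "Small people always do that,",
  "but the really great make you feel that you,",
  "too,",
  "can become great.",
].join("\n");

// "." does not match line terminators, so ".*" cannot get past the end of a line.
const naiveMatch = quote.match(/people.*really/);

console.log(naiveMatch); // null
```

No line contains both "people" and "really", so the attempt finds nothing.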
Hint: Spotlight on a very useful set of tokens
The token [^] lets you match any character, including a newline.
If you append the * quantifier to it, it matches whatever [^] matches, between zero and unlimited times.
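Beware though: [^] seems to work only in JavaScript, where an empty negated character class is allowed.
A more portable alternative seems to be [\s\S], which combines two classes:
\s matches any whitespace character
\S matches any non-whitespace character
Together they cover every character, line breaks included.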
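A quick way to check these tokens is to compare them with the dot on a short two-line string. This is only a sketch; the sample string and variable names are made up for the demonstration.

```javascript
// A two-line sample string (hypothetical, for illustration only).
const sample = "first\nsecond";

// "." stops at the line break, so only the first line is matched.
const dotMatch = sample.match(/.*/)[0];

// [^] and [\s\S] both accept the line break, so the whole string is matched.
const caretMatch = sample.match(/[^]*/)[0];
const spaceMatch = sample.match(/[\s\S]*/)[0];

console.log(JSON.stringify(dotMatch));   // "first"
console.log(JSON.stringify(caretMatch)); // "first\nsecond"
console.log(JSON.stringify(spaceMatch)); // "first\nsecond"
```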
Solution
Using [^]*
/people[^]*.*really/m
According to regex101.com, this pattern breaks down as follows:
people matches the characters "people" literally (case sensitive)
[^] matches any character, including newline
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
really matches the characters "really" literally (case sensitive)
Global pattern flag:
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
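Here is a small sketch of this pattern applied to the quote, assuming the quote is held in one string with "\n" line breaks. It also tries a shorter variant without the trailing .* and the m flag, which appears to return the same text here, since [^]* already covers every character and the pattern contains no ^ or $. The variable names are illustrative.

```javascript
// The five-line quote, joined with "\n" line breaks.
const quote = [
  "Keep away from people who try to belittle your ambitions.",
  "Small people always do that,",
  "but the really great make you feel that you,",
  "too,",
  "can become great.",
].join("\n");

// The pattern from the post.
const match = quote.match(/people[^]*.*really/m);

console.log(match[0]);
// people who try to belittle your ambitions.
// Small people always do that,
// but the really

// Shorter variant (an assumption to verify on your own input, not the post's solution).
const simpler = quote.match(/people[^]*really/);

console.log(simpler[0] === match[0]); // true
```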
Using [\s\S]*
/people[\s\S]*.*really/m
According to regex101.com, this pattern breaks down as follows:
people matches the characters "people" literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff])
\S matches any non-whitespace character (equivalent to [^\r\n\t\f\v \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff])
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
really matches the characters "really" literally (case sensitive)
Global pattern flag:
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
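As a rough check, the sketch below runs the [\s\S] pattern against the quote and inspects where the match starts and how many lines it covers. It again assumes the quote lives in one string with "\n" line breaks, and the variable names are only for illustration.

```javascript
// The five-line quote, joined with "\n" line breaks.
const quote = [
  "Keep away from people who try to belittle your ambitions.",
  "Small people always do that,",
  "but the really great make you feel that you,",
  "too,",
  "can become great.",
].join("\n");

// The [\s\S] pattern from the post.
const altMatch = quote.match(/people[\s\S]*.*really/m);

// The match starts at the first "people" on line 1...
console.log(altMatch.index); // 15

// ...and runs through "really" on line 3, so it covers three lines.
const linesSpanned = altMatch[0].split("\n").length;
console.log(linesSpanned); // 3

// For comparison, the [^] pattern returns the same text on this input.
const caretMatch = quote.match(/people[^]*.*really/m);
console.log(altMatch[0] === caretMatch[0]); // true
```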