Welcome to Abdul Malik Ikhsan's Blog

Practical Regex 3: Using Nested Negative Lookahead

Posted in regex, tips and tricks by samsonasik on February 18, 2022

So, you need to validate a statement in which one thing must NOT be followed by another thing, unless that other thing continues with something specific. For example:

I like foo                 << - not allowed
I like fooSomeRandomString << - not allowed
I like food                << - allowed
I like football            << - allowed
I like drawing             << - allowed
I dislike drawing          << - not allowed

First, let’s create a regex that makes sure “I like” is not followed by “foo”, using a Negative Lookahead “(?!foo)”:

I like (?!foo)[^ ]+

The regex above matches only “I like drawing”, ref https://regex101.com/r/hAsvYH/1

Important note:

I used the negated character class [^ ] (anything that is not a space) followed by + so that the pattern consumes the whole next word up to the following space. That way we can see which word was matched.
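If you want to try this first pattern outside regex101, here is a minimal PHP sketch. It assumes the sample statements are stored one per array entry without the trailing “<< -” notes, and the variable names are only for illustration:

```php
<?php

// The sample statements from the top of this post, one per entry.
$lines = [
    'I like foo',
    'I like fooSomeRandomString',
    'I like food',
    'I like football',
    'I like drawing',
    'I dislike drawing',
];

$pattern = '#I like (?!foo)[^ ]+#';

// preg_grep() keeps the entries that match the pattern (keys are preserved),
// array_values() re-indexes the result from 0.
$matched = array_values(preg_grep($pattern, $lines));

print_r($matched);
```

Only “I like drawing” is left in $matched, because every other “I like” statement continues with “foo”, and “I dislike drawing” does not contain “I like ” at all.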

Next comes the other requirement: “food” and “football” are exceptions and must be allowed. For that, we put another Negative Lookahead inside the Negative Lookahead, “(?!foo(?!d|tball))”:

I like (?!foo(?!d|tball))[^ ]+

The regex above allows “foo” after “I like” only when “foo” is immediately followed by “d” or “tball”, so it matches:

I like food
I like football
I like drawing

ref https://regex101.com/r/hAsvYH/2
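To use the nested lookahead as a real “allowed / not allowed” check, something like the following sketch could work. It assumes each statement is checked on its own with preg_match(), and the $results array is just a name I made up for the example:

```php
<?php

$pattern = '#I like (?!foo(?!d|tball))[^ ]+#';

$lines = [
    'I like foo',
    'I like fooSomeRandomString',
    'I like food',
    'I like football',
    'I like drawing',
    'I dislike drawing',
];

$results = [];
foreach ($lines as $line) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $results[$line] = preg_match($pattern, $line) === 1;
}

foreach ($results as $line => $isAllowed) {
    echo str_pad($line, 27) . ($isAllowed ? 'allowed' : 'not allowed') . PHP_EOL;
}
```

Keep in mind that the inner lookahead only checks what comes right after “foo”, so a word like “foodie” passes as well.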

Next, how do we get “food”, “football”, and “drawing”? You can use a “Named Capture Group” that wraps the Nested Negative Lookahead plus the character class “[^ ]+”, so the next word is captured under a name, like the following:

I like (?<like>(?!foo(?!d|tball))[^ ]+)

Above, the matched word is captured under the name “like”, ref https://regex101.com/r/hAsvYH/3

Let’s use real code to get it, for example, with PHP:

$text = <<<TEXT
I like foo                 << - not allowed
I like fooSomeRandomString << - not allowed
I like food                << - allowed
I like football            << - allowed
I like drawing             << - allowed
I dislike drawing          << - not allowed
TEXT;

$pattern = '#I like (?<like>(?!foo(?!d|tball))[^ ]+)#';
preg_match_all($pattern, $text, $matches);

foreach($matches['like'] as $like) {
    echo $like . PHP_EOL;
}

Ref https://3v4l.org/R7TfC

That’s it 😉

Practical Regex 2: Using Named Backreference in Positive Lookahead

Posted in regex, tips and tricks by samsonasik on September 15, 2021

When we have the following namespaced classname value:

App\Domain\Foo\FooRepository

and our task is to capture the “Foo” value as the Domain group, with these criteria:

  • must have App\Domain\ prefix
  • must have a \$Domain + “Repository” suffix, where $Domain matches the previous sub-namespace, in this case “Foo”

we can use the following regex, with a Positive Lookbehind to filter the prefix and a Positive Lookahead to filter the suffix:

(?<=App\\Domain\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\1Repository)

Above, the Positive Lookbehind requires App\Domain\ right before $Domain, and the Positive Lookahead requires \$DomainRepository right after it.

We use the backreference \1 to match the exact same text as the first capturing group. If you code in PHP, you can do it like this:

$pattern   = '#(?<=App\\\\Domain\\\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\\\1Repository)#';
$className = 'App\Domain\Foo\FooRepository';

preg_match($pattern, $className, $matches);
if ($matches !== []) {
    echo $matches['Domain'];
}
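To see that the backreference really enforces the “same name” rule, here is a small sketch that runs the same pattern against a few class names. Only the first class name comes from this post; the other two are hypothetical and made up for the example:

```php
<?php

$pattern = '#(?<=App\\\\Domain\\\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\\\1Repository)#';

$classNames = [
    'App\Domain\Foo\FooRepository',  // from the post: prefix and suffix both fit
    'App\Domain\Foo\BarRepository',  // hypothetical: suffix does not repeat "Foo"
    'App\Service\Foo\FooRepository', // hypothetical: wrong prefix
];

$domains = [];
foreach ($classNames as $className) {
    // Store the captured Domain on a match, or null when nothing matches.
    $domains[$className] = preg_match($pattern, $className, $matches) === 1
        ? $matches['Domain']
        : null;
}

var_dump($domains);
```

Only the first class name gives “Foo”. The second fails in the lookahead because “Bar” is not the text captured by the first group, and the third fails in the lookbehind.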

What if the regex gets more complex? As in my previous post on named capturing groups, you would need to remember the numbered index! To handle that, you can use a named backreference, so the regex becomes:

(?<=App\\Domain\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\k<Domain>Repository)

Now you know exactly what the backreference refers to. If you code in PHP, you can do it like this:

$pattern   = '#(?<=App\\\\Domain\\\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\\\k<Domain>Repository)#';
$className = 'App\Domain\Foo\FooRepository';

preg_match($pattern, $className, $matches);
if ($matches !== []) {
    echo $matches['Domain'];
}
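Here is a sketch of why the named version is safer when the regex changes. It assumes a hypothetical new requirement where the prefix is consumed by an extra capturing group “(App|Lib)” instead of the lookbehind (the “Lib” vendor is made up for the example), which pushes the Domain group to number 2:

```php
<?php

// Hypothetical variation: a new group comes first, so Domain is now group 2.
$numbered = '#(App|Lib)\\\\Domain\\\\(?<Domain>[A-Z][a-z]{1,})(?=\\\\\1Repository)#';
$named    = '#(App|Lib)\\\\Domain\\\\(?<Domain>[A-Z][a-z]{1,})(?=\\\\\k<Domain>Repository)#';

$className = 'App\Domain\Foo\FooRepository';

// \1 now points at "App", so the lookahead looks for \AppRepository and fails.
$numberedResult = preg_match($numbered, $className);

// \k<Domain> still points at the Domain group, so this keeps working.
$namedResult = preg_match($named, $className, $matches);

var_dump($numberedResult);         // int(0)
var_dump($namedResult);            // int(1)
echo $matches['Domain'] . PHP_EOL; // Foo
```

With the numbered backreference you would have to remember to change \1 to \2, while the named backreference needs no change.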

That’s it 😉

Ref: https://www.regular-expressions.info/backref.html

Practical Regex 1: Using Named Capturing Groups

Posted in regex, tips and tricks by samsonasik on September 3, 2021

Still using numbered groups to capture values in regex? You may forget the index, or the index may change when the requirements change. For example, you want to get a csrf value from a form field with the following regex:

name="csrf" value="(.{32})"

For an input field csrf with the 32-character value “4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v”, you need to read index 1 to get the value in PHP:

<?php

$pattern = '#name="csrf" value="(.{32})"#';
$content = <<<'HTML_CONTENT'
<form>
    <input type="hidden" name="csrf" value="4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v" />
    <input type="submit" />
</form>
HTML_CONTENT;

preg_match($pattern, $content, $matches);
if ($matches !== []) {
    echo $matches[1];
}

To avoid bugs caused by a forgotten or changed index, you can use named capturing groups, so the regex changes to:

name="csrf" value="(?<csrf_value>.{32})"

Now, you can get it easily:

<?php

$pattern = '#name="csrf" value="(?<csrf_value>.{32})"#';
$content = <<<'HTML_CONTENT'
<form>
    <input type="hidden" name="csrf" value="4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v" />
    <input type="submit" />
</form>
HTML_CONTENT;

preg_match($pattern, $content, $matches);
if ($matches !== []) {
    echo $matches['csrf_value'];
}
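To show the “index changed” problem in practice, here is a sketch with a hypothetical requirement change: the field may be named “csrf” or “token”, which adds a new group in front of the value. The alternative name “token” is my assumption, only for the example:

```php
<?php

// Hypothetical change: a new group for the field name comes before the value.
$pattern = '#name="(csrf|token)" value="(?<csrf_value>.{32})"#';
$content = '<input type="hidden" name="csrf" value="4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v" />';

preg_match($pattern, $content, $matches);

echo $matches[1] . PHP_EOL;            // the field name, no longer the value
echo $matches['csrf_value'] . PHP_EOL; // still the 32-character value
```

Code that reads $matches[1] now silently gets “csrf” instead of the token value, while code that reads $matches['csrf_value'] keeps working.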

That’s it 😉

Ref: https://www.regular-expressions.info/named.html