Welcome to Abdul Malik Ikhsan's Blog

Practical Regex 2: Using Named Backreference in Positive Lookahead

Posted in regex, tips and tricks by samsonasik on September 15, 2021

When we have the following namespaced classname value:

App\Domain\Foo\FooRepository

and our task is to capture “Foo” as the Domain group, with these criteria:

  • must have the App\Domain\ prefix
  • must have a \ + $Domain + “Repository” suffix, where $Domain must match the preceding sub-namespace, in this case “Foo”

we can use the following regex, with a Positive Lookbehind to filter the prefix and a Positive Lookahead to filter the suffix:

(?<=App\\Domain\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\1Repository)

Above, the Positive Lookbehind requires App\Domain\ right before $Domain, and the Positive Lookahead requires \$DomainRepository right after it. Both are zero-width assertions, so only $Domain itself is part of the match.

The backreference \1 matches the exact same text that the first capturing group matched. In PHP, every regex backslash is doubled inside the single-quoted string, so it looks like this:

$pattern   = '#(?<=App\\\\Domain\\\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\\\1Repository)#';
$className = 'App\Domain\Foo\FooRepository';

preg_match($pattern, $className, $matches);
if ($matches !== []) {
    echo $matches['Domain'];
}
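To see both criteria at work, here is a minimal sketch that wraps the same pattern in a function and tries it against a matching and two non-matching class names. The extractDomain() helper and the two extra class names are assumptions for illustration only; they are not part of the original task.

```php
<?php

// Hypothetical helper: returns the Domain name, or null when the criteria are not met.
function extractDomain(string $className): ?string
{
    $pattern = '#(?<=App\\\\Domain\\\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\\\1Repository)#';

    if (preg_match($pattern, $className, $matches) === 1) {
        return $matches['Domain'];
    }

    return null;
}

var_dump(extractDomain('App\Domain\Foo\FooRepository')); // string(3) "Foo"
var_dump(extractDomain('App\Domain\Foo\BarRepository')); // NULL: suffix does not repeat "Foo"
var_dump(extractDomain('Lib\Domain\Foo\FooRepository')); // NULL: prefix is not App\Domain\
```

The second call fails in the lookahead because the backreference expects \FooRepository, and the third one fails in the lookbehind.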

What if the regex gets more complex? As in my previous post about named capturing groups, you would need to remember the numbered index! To avoid that, you can use a named backreference, so the regex becomes:

(?<=App\\Domain\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\k<Domain>Repository)

Now, you know exactly what the backreference refers to. In PHP, it looks like this:

$pattern   = '#(?<=App\\\\Domain\\\\)(?<Domain>[A-Z][a-z]{1,})(?=\\\\\k<Domain>Repository)#';
$className = 'App\Domain\Foo\FooRepository';

preg_match($pattern, $className, $matches);
if ($matches !== []) {
    echo $matches['Domain'];
}
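The benefit shows up as soon as another group is added in front. The sketch below assumes a hypothetical Vendor group before Domain, and matches the prefix directly instead of through a lookbehind to keep things simple. With that change, \1 silently points at Vendor, while \k<Domain> keeps pointing at Domain.

```php
<?php

$className = 'App\Domain\Foo\FooRepository';

// Vendor is now group 1, so \1 looks for \AppRepository instead of \FooRepository.
$numbered = '#(?<Vendor>[A-Z][a-z]+)\\\\Domain\\\\(?<Domain>[A-Z][a-z]+)(?=\\\\\1Repository)#';

// The named backreference is not affected by the new group.
$named = '#(?<Vendor>[A-Z][a-z]+)\\\\Domain\\\\(?<Domain>[A-Z][a-z]+)(?=\\\\\k<Domain>Repository)#';

$numberedResult = preg_match($numbered, $className);
$namedResult    = preg_match($named, $className, $matches);

echo $numberedResult, PHP_EOL;                                                 // 0
echo $namedResult, ' ', $matches['Vendor'], ' ', $matches['Domain'], PHP_EOL; // 1 App Foo
```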

That’s it 😉

Ref: https://www.regular-expressions.info/backref.html

Practical Regex 1: Using Named Capturing Groups

Posted in regex, tips and tricks by samsonasik on September 3, 2021

Still using numbered capturing groups in regex? You may forget the index, or the index may change when the requirement changes. For example, say you want to get a csrf value from a form field with the following regex:

name="csrf" value="(.{32})"

For an input field csrf with the 32-character value “4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v”, you need to read index 1 to get the value in PHP:

<?php

$pattern = '#name="csrf" value="(.{32})"#';
$content = <<<'HTML_CONTENT'
<form>
    <input type="hidden" name="csrf" value="4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v" />
    <input type="submit" />
</form>
HTML_CONTENT;

preg_match($pattern, $content, $matches);
if ($matches !== []) {
    echo $matches[1];
}

To avoid bugs caused by a forgotten or changed index, you can use named capturing groups, so the regex becomes:

name="csrf" value="(?<csrf_value>.{32})"

Now, you can get it by name:

<?php

$pattern = '#name="csrf" value="(?<csrf_value>.{32})"#';
$content = <<<'HTML_CONTENT'
<form>
    <input type="hidden" name="csrf" value="4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v" />
    <input type="submit" />
</form>
HTML_CONTENT;

preg_match($pattern, $content, $matches);
if ($matches !== []) {
    echo $matches['csrf_value'];
}
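As a small sketch of the changed-index problem, assume a hypothetical requirement change where the field may be named either "csrf" or "token". That adds a new capturing group in front, so index 1 no longer holds the value, while the named key still does.

```php
<?php

$content = '<input type="hidden" name="csrf" value="4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v" />';

// The (csrf|token) group becomes index 1 and pushes the value group to index 2.
$pattern = '#name="(csrf|token)" value="(?<csrf_value>.{32})"#';

preg_match($pattern, $content, $matches);

echo $matches[1], PHP_EOL;            // csrf
echo $matches['csrf_value'], PHP_EOL; // 4X0ZfDKr71KHCec7SOkoJ5onq1PTCN3v
```

Code that reads $matches['csrf_value'] keeps working after the change; code that reads $matches[1] would now get the field name instead.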

That’s it 😉

Ref: https://www.regular-expressions.info/named.html