
PHP preg_replace unicode word boundary

I ran into a problem at the beginning of the year, and I have only just figured out what caused it and how to fix it.

I have a converter that turns Eastern James Bay Cree roman orthography into Unicode syllabics.

Once all characters have been converted from roman to syllabics, a final ᐅ has to be changed to ᐤ, but only when it is the word-final u or comes right before a final h.

I decided to use a regular expression in my PHP converter:

$find = '/(\p{Lo})ᐅ(᙮|ᐦ)?\b/ui';

When I ran a test with wiiuh, I got:

  • WAMP (Windows): ᐧᐄᐤᐦ (correct)
  • LAMP (Linux): ᐧᐄᐅᐦ, which should have been ᐧᐄᐤᐦ

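It turns out that the word boundary \b was not Unicode-aware on the Linux server: it did not count the Cree syllabics as word characters, so it could never match between ᐦ and the end of the word. This most likely depends on how PHP and its PCRE library were built (whether Unicode support is switched on for \b and \w) rather than on Linux itself. I swapped \b for the negative lookahead (?!\pL), which only requires that the next character is not a Unicode letter.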
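To see which behaviour a given server has, a small check like the one below could be run on both machines. It is a sketch, not part of the converter: the function names are hypothetical, and it assumes the WAMP/LAMP difference comes from whether \b counts syllabics as word characters. On a server behaving like the LAMP one, the first line would be expected to print "no", while the (?!\pL) check should print "yes" everywhere, because \p{L} is a Unicode property test that does not depend on that setting.

```php
<?php
// Hypothetical helper: reports whether \b treats Cree syllabics as word characters.
// If it does, \b matches between ᐦ and the end of the string; if not, it never matches there.
function wordBoundaryIsUnicodeAware(): bool
{
    return preg_match('/ᐦ\b/u', 'ᐧᐄᐅᐦ') === 1;
}

// Hypothetical helper: the lookahead only asks "is the next character a Unicode letter?",
// which uses Unicode properties directly instead of the \b / \w word-character rules.
function lookaheadSeesEndOfWord(): bool
{
    return preg_match('/ᐦ(?!\pL)/u', 'ᐧᐄᐅᐦ') === 1;
}

echo 'Unicode-aware \b: ', wordBoundaryIsUnicodeAware() ? 'yes' : 'no', PHP_EOL;
echo '(?!\pL) at end of word: ', lookaheadSeesEndOfWord() ? 'yes' : 'no', PHP_EOL;
```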


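// (?!\pL): "not followed by a Unicode letter", used instead of \b
$find = '/(\p{Lo})ᐅ(᙮|ᐦ)?(?!\pL)/ui';
// \1 keeps the preceding syllabic, \2 keeps the optional ᙮ or ᐦ
$processed = preg_replace($find, '\1ᐤ\2', $processed);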
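As a sanity check, the fixed pattern can be wrapped in a function and run on a few sample strings. The name finalizeSyllabics is hypothetical, and apart from ᐧᐄᐅᐦ the samples are made-up strings chosen only to exercise the word-final, before-ᐦ and before-᙮ cases.

```php
<?php
// Hypothetical wrapper around the replacement above.
// Turns ᐅ into ᐤ when it ends a word or is followed only by ᐦ or ᙮.
function finalizeSyllabics(string $processed): string
{
    $find = '/(\p{Lo})ᐅ(᙮|ᐦ)?(?!\pL)/ui';
    // $1 keeps the preceding syllabic, $2 keeps the optional ᐦ or ᙮ (empty if absent).
    // preg_replace returns null on failure (e.g. invalid UTF-8), so fall back to the input.
    return preg_replace($find, '$1ᐤ$2', $processed) ?? $processed;
}

$samples = ['ᐧᐄᐅᐦ', 'ᐧᐄᐅ', 'ᐧᐄᐅ᙮', 'ᐧᐄᐅᐦ ᐧᐄᐅᐦ'];
foreach ($samples as $word) {
    echo $word, ' → ', finalizeSyllabics($word), PHP_EOL;
}
```

A ᐅ that is followed by another letter (including a ᐦ that is not word-final) is left unchanged, because the lookahead fails both with and without the optional group.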

That fixed the problem.