Category Archives: Code bank

bits of code

PHP preg_replace unicode word boundary

I ran into a problem at the beginning of the year, and I have just figured out what it was and how to fix it.

I have a Unicode Eastern James Bay Cree roman orthography to syllabics converter.

Once all the characters have been converted from roman to syllabics, a ᐅ has to be changed to ᐤ, but only if it is the final u of a word or occurs before a final h.

I decided to use a regex in my PHP converter:

$find ='/(\p{Lo})ᐅ(᙮|ᐦ)?\b/ui';

When I ran a test with wiiuh, I got:

  • WAMP: ᐧᐄᐤᐦ (correct)
  • LAMP: ᐧᐄᐅᐦ (wrong; ᐧᐄᐅᐦ needs to be changed to ᐧᐄᐤᐦ)

It turns out that the word boundary \b on the Linux (LAMP) setup is not Unicode-aware, so I had to swap it for the negative lookahead (?!\pL), which matches only where the next character is not a letter:


$find = '/(\p{Lo})ᐅ(᙮|ᐦ)?(?!\pL)/ui';
$processed = preg_replace($find, '\1ᐤ\2', $processed);

That fixed the problem.
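
As a quick check, here is a minimal sketch of the fix wrapped in a function. The function name fix_final_u and the test string ᐧᐄᐅᑲ are made up for illustration (the latter is not meant to be a real word); the pattern and replacement are the ones above. It assumes the PHP file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper name; pattern and replacement are taken from the post.
function fix_final_u(string $text): string
{
    // (\p{Lo})   a preceding letter (syllabics are in category Lo), kept as \1
    // ᐅ          the u syllabic to be replaced
    // (᙮|ᐦ)?     optional syllabics full stop or final h, kept as \2
    // (?!\pL)    not followed by a letter: stands in for \b
    $find = '/(\p{Lo})ᐅ(᙮|ᐦ)?(?!\pL)/ui';

    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($find, '\1ᐤ\2', $text) ?? $text;
}

echo fix_final_u('ᐧᐄᐅᐦ'), "\n"; // ᐅ before a final ᐦ: becomes ᐧᐄᐤᐦ
echo fix_final_u('ᐧᐄᐅ'), "\n";  // word-final ᐅ: becomes ᐧᐄᐤ
echo fix_final_u('ᐧᐄᐅᑲ'), "\n"; // ᐅ followed by another letter: left unchanged
```

Because the lookahead is written in terms of \pL, it should not depend on how \b and \w are defined on a given server.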

Frequently used MySQL commands (UTF-8 safe)

MySQL dump

mysqldump -u user -p database -r file.sql
To dump just the structure:
mysqldump -u user -p --no-data database -r file.sql

MySQL import

mysql -u user -p --default-character-set=utf8 database

mysql> SOURCE file.sql

Alternatively:
mysql -u user -p --default-character-set=utf8 database < file.sql

MySQL string functions

CONCAT

CONCAT('hi',' ','there','!')
UPDATE `table` SET `field`=CONCAT(`field`,'\r\nend')

REPLACE

REPLACE(field,'find','replace')
UPDATE `table` SET `field`=REPLACE(`field`,'hi','hello')