Unicode replacement characters for text matching
- by Christian Harms
I am having some fun with Unicode text sources (all correctly encoded), and I want to match names across them. The classic problem: one source has the names spelled correctly, while another has flattened names without diacritics:
"Elbląg" vs. "Elblag" (see the character ą)
How can I "flatten" a, á, â or à to a for better matching? Are there Unicode-to-ASCII mapping tables?
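One common approach, sketched here as an illustration and not taken from the original post, uses Unicode normalization from Python's standard `unicodedata` module. NFKD decomposition splits a precomposed letter such as "ą" into its base letter "a" plus a combining mark (here the ogonek), and dropping the combining marks leaves the plain letter. Some letters, such as Polish "ł" or German "ß", have no decomposition, so the sketch assumes a small hand-written fallback table (`EXTRA_MAP`, a hypothetical name) that you would extend for your own data.

```python
import unicodedata

# Hypothetical fallback table for letters that have no Unicode decomposition,
# so NFKD alone would leave them untouched. Extend it for your own data.
EXTRA_MAP = str.maketrans({"ł": "l", "Ł": "L", "ß": "ss", "ø": "o", "Ø": "O"})


def flatten(text: str) -> str:
    """Remove diacritics: apply the fallback table, decompose with NFKD,
    then drop every character with a nonzero combining class (the marks)."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA_MAP))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def names_match(a: str, b: str) -> bool:
    """Compare two names after flattening and case-folding both sides."""
    return flatten(a).casefold() == flatten(b).casefold()


if __name__ == "__main__":
    print(flatten("Elbląg"))
    print([flatten(c) for c in "aáâàą"])
    print(names_match("Elbląg", "ELBLAG"))
```

Flattening both sides before comparing keeps the matching symmetric, so it does not matter which source carries the diacritics. If the hand-written table grows large, the third-party `Unidecode` package offers ready-made Unicode-to-ASCII transliteration tables, at the cost of an extra dependency.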