PostgreSQL UTF-8 weirdness

I was recently moving data from a PostgreSQL 8.0 database to an 8.4 one on a new server. The database dump was made on a UTF-8 system and moved to another system using the same encoding. However, I got errors when trying to restore the data: several encoding errors kept popping up. A closer inspection revealed that these were indeed encoding-rule violations. For some odd reason, some fields ended up with bad data: some two-byte characters had their first byte missing (0xc2 in my case).
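For anyone wanting to locate this kind of damage before restoring, here is a minimal sketch of a structural UTF-8 check. The function name first_invalid_utf8 is hypothetical and not part of PostgreSQL or any library. It only checks the lead/continuation byte pattern, so overlong forms and surrogates are not rejected.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte that breaks the
   UTF-8 lead/continuation pattern, or -1 if the buffer is structurally fine.
   Assumption: a structural check is enough; overlong forms are not detected. */
long first_invalid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t need;                              /* continuation bytes expected */
        if (c <= 0x7f)                   need = 0;
        else if (c >= 0xc2 && c <= 0xdf) need = 1;
        else if (c >= 0xe0 && c <= 0xef) need = 2;
        else if (c >= 0xf0 && c <= 0xf4) need = 3;
        else return (long)i;                      /* stray continuation or invalid lead */
        for (size_t k = 1; k <= need; k++) {
            if (i + k >= n || (s[i + k] & 0xc0) != 0x80)
                return (long)i;                   /* truncated or malformed sequence */
        }
        i += need + 1;
    }
    return -1;
}
```

A byte such as 0xa0 that shows up without its 0xc2 in front is reported at its own offset, which matches the symptom described above.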

I solved the problem by creating a small filter program to add the missing byte of these characters in the database dump. Not very nice, but it worked. Why the problem developed in the first place, I do not know.


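#include <stdio.h>

/* Filter stdin to stdout: ASCII and complete two-byte sequences pass through. */
int main(void) {
    int a, c;
    while ((c = getchar()) != EOF) {
        if (c > 0xe0) continue;        /* skip bytes above 0xe0 (leads of longer sequences) */
        if (c > 0xc0) {                /* treated as the lead byte of a two-byte sequence */
            a = getchar();
            if (a == EOF) break;       /* truncated input: stop instead of writing EOF as a byte */
            if (a > 0x7f) putchar(c);  /* keep the lead byte only before a non-ASCII byte */
            putchar(a);
        } else if (c <= 0x7f) {
            putchar(c);                /* plain ASCII */
        }                              /* a lone byte in 0x80..0xc0 is dropped */
    }
    return 0;
}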
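The listing above keeps ASCII and paired two-byte sequences and drops whatever it cannot pair up; it does not insert any byte itself. As a sketch of what restoring the missing byte could look like, the hypothetical function below (restore_c2 is my own name, not the original program) puts 0xc2 in front of every continuation byte that has no lead byte. It assumes every such orphan really lost a 0xc2, as in the dump described here, which would make it a character in U+0080..U+00BF. Other kinds of damage pass through unchanged.

```c
#include <stddef.h>

/* Hypothetical helper: copies in[0..n) to out, inserting 0xc2 before every
   continuation byte (0x80..0xbf) that does not follow a lead byte.
   Returns the number of bytes written; out must have room for 2 * n bytes.
   Assumption: each orphaned byte originally belonged to a 0xc2 0x?? pair. */
size_t restore_c2(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char c = in[i++];
        size_t need = 0;                      /* continuation bytes that may follow */

        if ((c & 0xc0) == 0x80) {             /* orphaned continuation byte */
            out[o++] = 0xc2;                  /* restore the assumed lead byte */
            out[o++] = c;
            continue;
        }
        if (c >= 0xc2 && c <= 0xdf)      need = 1;
        else if (c >= 0xe0 && c <= 0xef) need = 2;
        else if (c >= 0xf0 && c <= 0xf4) need = 3;

        out[o++] = c;                         /* ASCII or lead byte, copied as is */
        while (need > 0 && i < n && (in[i] & 0xc0) == 0x80) {
            out[o++] = in[i++];               /* continuation bytes owned by this lead */
            need--;
        }
    }
    return o;
}
```

To use it on a dump, one would read the file into a buffer, call restore_c2 with an output buffer twice that size, and write the result out before feeding it to the new server.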

Comments

Anonymous said…
Did you send this issue to the mailing list? Let the project find out what went wrong. If it's a bug, they can fix it.

But how did you make your backup? Did you use the latest version of pg_dump to create the backup of your old database version? That's the way to go when upgrading.
Miguel Sánchez said…
I still have to look up the mailing list to see if it is a known issue, but you are right about reporting the problem.

However, since the data comes from unfiltered web forms through PHP and Apache, my first thought is that I cannot directly blame PostgreSQL (other than perhaps for accepting UTF-8 violations as input and then returning them unchanged when queried).

Regarding the backup, I created it with pg_dump. Please note it was not a software upgrade but a system upgrade. The old database is still running happily on the old server, while I used the dump to set up a new server with a more recent version of PostgreSQL.

I'll report back in this entry if I find out anything relevant to this problem.
