PostgreSQL UTF-8 weirdness

I was recently moving data from a PostgreSQL 8.0 database to an 8.4 one on a new server. The database dump was made on a UTF-8 system and moved to another system using the same encoding. However, I was getting errors when trying to restore the data: several encoding errors were popping up. A closer inspection revealed that there were indeed a few encoding-rule violations. For some odd reason some data fields ended up with bad data: some double-byte characters had the first byte missing (it was 0xc2 in my case).
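Violations of this kind can be located mechanically. The following is a minimal sketch, not something taken from the migration itself: the function name first_bad_utf8 is made up for illustration, and it only checks the lead/continuation structure of UTF-8 (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first byte that breaks the
 * lead/continuation structure of UTF-8, or n if no such byte is found. */
size_t first_bad_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t extra;                    /* continuation bytes expected after b */

        if (b < 0x80)
            extra = 0;                   /* ASCII */
        else if (b >= 0xC2 && b <= 0xDF)
            extra = 1;                   /* lead byte of a two-byte sequence */
        else if (b >= 0xE0 && b <= 0xEF)
            extra = 2;                   /* lead byte of a three-byte sequence */
        else if (b >= 0xF0 && b <= 0xF4)
            extra = 3;                   /* lead byte of a four-byte sequence */
        else
            return i;                    /* orphan continuation byte or invalid lead */

        if (extra > n - i - 1)
            return i;                    /* sequence cut off by the end of input */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return i;                /* lead byte without its continuation */
        i += extra + 1;
    }
    return n;
}
```

A byte such as 0xa9 that has lost its 0xc2 lead is reported at its own offset, which makes it easy to find the affected rows in the dump.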

I solved the problem by writing a small filter program to add the missing byte to these characters in the database dump. Not very nice, but it worked. Why the problem developed in the first place I do not know.


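#include <stdio.h>

/* Filter stdin to stdout, keeping ASCII and two-byte sequences only. */
int main(void) {
    int a, c;
    while ((c = getchar()) != EOF) {
        if (c > 0xe0) continue;        /* drop bytes above 0xe0 (mostly leads of 3- and 4-byte sequences) */
        if (c > 0xc0) {                /* 0xc1..0xe0: treated as the lead byte of a two-byte character */
            a = getchar();
            if (a == EOF) break;       /* input ended right after a lead byte */
            if (a > 0x7f) putchar(c);  /* keep the lead byte only if a non-ASCII byte follows */
            putchar(a);                /* the byte after the lead is always written */
        } else if (c <= 0x7f) {
            putchar(c);                /* plain ASCII passes through */
        }
        /* anything else (0x80..0xc0, e.g. a continuation byte without its lead) is dropped */
    }
    return 0;
}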
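The listing above never writes a byte it has not read, so a continuation byte that has lost its lead is not given one. A variant that puts 0xc2 back could look roughly like the sketch below. It is not the program used for the migration: the function name restore_c2_lead and the buffer-based interface are made up for illustration, and it assumes that every orphan continuation byte originally followed 0xc2, which matches this dump but is not true in general.

```c
#include <stddef.h>

/* Hypothetical helper: copy n bytes from in to out, writing 0xC2 in front of
 * every continuation byte (0x80..0xBF) that is not covered by a preceding
 * lead byte. out must have room for 2 * n bytes.
 * Returns the number of bytes written. */
size_t restore_c2_lead(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t written = 0;
    int pending = 0;    /* continuation bytes still owed to the current lead */

    for (size_t i = 0; i < n; i++) {
        unsigned char b = in[i];

        if ((b & 0xC0) == 0x80) {            /* continuation byte */
            if (pending > 0)
                pending--;                   /* belongs to the current character */
            else
                out[written++] = 0xC2;       /* orphan: assume the lost lead was 0xC2 */
        } else if (b >= 0xC2 && b <= 0xDF) {
            pending = 1;                     /* lead byte of a two-byte sequence */
        } else if (b >= 0xE0 && b <= 0xEF) {
            pending = 2;                     /* lead byte of a three-byte sequence */
        } else if (b >= 0xF0 && b <= 0xF4) {
            pending = 3;                     /* lead byte of a four-byte sequence */
        } else {
            pending = 0;                     /* ASCII, or a byte never valid in UTF-8 */
        }
        out[written++] = b;                  /* every input byte is copied through */
    }
    return written;
}
```

To use it as a filter, the dump would be read into a buffer, passed through the function, and written back out. The sketch repairs only the missing-lead case; other damage, such as a lead byte followed by ASCII, is copied through unchanged.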

Comments

Anonymous said…
Did you send this issue to the mailing list? Let the project find out what went wrong. If it's a bug, they can fix it.

But how did you make your backup? Did you use the latest version of pg_dump to create the backup of your old database version? That's the way to go when upgrading.
misan said…
I still have to look up the mailing list to see if it is a known issue, but you are right about reporting the problem.

However, since the data comes from unfiltered web forms through PHP and Apache, my first thought is that I cannot directly blame PostgreSQL (other than maybe for accepting UTF-8 encoding violations as input and then repeating them when queried).

Regarding the backup, I created it with pg_dump. Please note it was not a software upgrade but a system upgrade. The old data is still running happily on the old server, while I used the dump to set up a new server with a more recent version of PostgreSQL.

I'll report back in this entry on what I find out, if it is relevant to this problem.
