OXP list fails to download

For test results, bug reports, announcements of new builds etc.

Moderators: winston, another_commander, Getafix

hiran
Theorethicist
Posts: 2403
Joined: Fri Mar 26, 2021 1:39 pm
Location: a parallel world I created for myself. Some call it a singularity...

Re: OXP list fails to download

Post by hiran »

timer wrote: Tue Sep 12, 2023 1:32 pm
If the hosts file contains

Code:

195.74.38.129 oolite.org addons.oolite.org
and you download directly from the origin server http://addons.oolite.org/api/1.0/overview,
you get a corrupted file (bad UTF-8 symbol).

At the moment, my script (on a VPS) fixes this after downloading and then uploads the file to GitHub.
But what happened, I don't know :(
My actions have no effect at all on the old server.
I downloaded the file and tried to open it using an editor with UTF-8 encoding. It reported illegal character sequences but did not point out which ones.
Running the file command does not reveal much either:

Code:

$ curl -o overview.plist http://addons.oolite.org/api/1.0/overview
$ file overview.plist 
overview.plist: ASCII text, with very long lines (456), with CRLF, LF line terminators
$
If that were true, opening the file as UTF-8 should not have failed.
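Because the file command only guesses from the content, a direct check is more reliable. A minimal shell sketch, assuming GNU iconv and GNU grep are installed (is_valid_utf8 and list_non_ascii are made-up helper names): iconv refuses to pass invalid UTF-8 through, and grep in the C locale lists exactly the lines that contain bytes above 0x7F, which is the part the editor did not show.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a downloaded overview.plist.

# is_valid_utf8 FILE: exit status 0 only if FILE decodes cleanly as UTF-8.
# iconv stops with a non-zero status at the first illegal byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# list_non_ascii FILE: print "lineno:content" for every line containing at
# least one byte in the range 0x80-0xFF, i.e. anything beyond 7-bit ASCII.
# LC_ALL=C makes grep match raw bytes instead of decoding characters.
list_non_ascii() {
  LC_ALL=C grep -n "$(printf '[\200-\377]')" "$1"
}

# Usage (commented out so the sketch also runs offline):
# curl -o overview.plist http://addons.oolite.org/api/1.0/overview
# is_valid_utf8 overview.plist || list_non_ascii overview.plist
```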

Analyzing a bit further: the code that generates the file has certainly not changed. We have no one with access to that old server.
I guess it is a single machine running a web server with PHP and a database. These parts of the installation are out of our hands, but the old provider is likely still patching the machine. A newer version of the database, the database driver or the PHP engine could have an impact.

But actually I rather believe the code naively pulls data from the DB and concatenates it into an output stream without paying attention to any specific encoding. Then the problem may even come from the payload, i.e. the database entries. I cannot blame anyone today for using the omnipresent international character sets we have, which does not make the whole story any better...

Then I loaded the file through my own plist parser, and it consumed it happily. Not a single character issue. But why? Is it reading UTF-8? (Then why can the editor not read it?) Or is it using some special encoding (even though the platform default encoding definitely is UTF-8)? This is a nut to crack.

It might be easier to start from scratch: let's find out what Oolite needs to digest, then ensure we can deliver such a file.
@DEVs, could you please confirm the requirements?
Sunshine - Moonlight - Good Times - Oolite
timer
---- E L I T E ----
Posts: 336
Joined: Sat Mar 17, 2012 8:26 pm
Location: Laenin spiv club

Re: OXP list fails to download

Post by timer »

On my local machine I have many, many overview files, downloaded while debugging.
All the earlier files contain ü as a two-byte character, but now the file is downloaded with the single byte 0xFC. This is the result of how the old server works; we could not influence it.
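The two forms of ü can be reproduced at byte level. A small shell sketch, assuming GNU iconv and od are available (hexbytes is a made-up helper name):

```shell
#!/usr/bin/env bash
# hexbytes (made-up helper): read stdin, print its bytes as one lowercase hex string.
hexbytes() {
  od -An -tx1 | tr -d ' \n'
}

# "ü" (U+00FC) written as its UTF-8 bytes, so the script itself stays ASCII.
u_utf8=$(printf '\303\274')

printf '%s' "$u_utf8" | hexbytes; echo                             # UTF-8 form
printf '%s' "$u_utf8" | iconv -f UTF-8 -t CP1252 | hexbytes; echo  # cp1252 form
```

It should print c3bc (the two-byte UTF-8 form in the older files) followed by fc (the single byte the server now delivers).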

Right now, my script (on the VPS) fixes the file data after downloading it from the origin server with a stupid hack:

Code:

$data =~ s/They are not \w{0,2}ber ships/They are not über ships/;
And Oolite handles UTF-8 without any problems.
hiran wrote: Tue Sep 12, 2023 8:04 pm
We have no one with access to that old server.
I guess it is a single machine running a web server with PHP and a database. These parts of the installation are out of our hands, but the old provider is likely still patching the machine. A newer version of the database, the database driver or the PHP engine could have an impact.
IMHO the most likely explanation at this time!
Cobra MK III owner since 1994
hiran

Re: OXP list fails to download

Post by hiran »

timer wrote: Wed Sep 13, 2023 8:34 am
Right now, my script (on the VPS) fixes the file data after downloading it from the origin server with a stupid hack:

Code:

$data =~ s/They are not \w{0,2}ber ships/They are not über ships/;
Oh, that means your fix focuses on replacing the faulty ü.
timer wrote: Wed Sep 13, 2023 8:34 am
And Oolite handles UTF-8 without any problems.
This is really good information. Then I think I have a different hack for you...

Thanks to you I was able to access the old server and found out something. When I looked for a Content-Encoding header I did not find one, but the MIME type is application/x-plist, and the server is OpenResty. It is an application server built around nginx; you might like it. ;-)
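For reference: Content-Encoding normally describes a content coding such as gzip compression, while a character set, when declared at all, travels as the charset parameter of Content-Type. A shell sketch to pull that parameter out of curl -sI output, assuming GNU sed (charset_from_headers is a made-up name). An empty result means the server declares nothing and every client has to guess.

```shell
#!/usr/bin/env bash
# charset_from_headers (made-up helper): read raw HTTP response headers on
# stdin and print the charset parameter of Content-Type, or nothing if the
# header carries none. Uses GNU sed's case-insensitive "I" flag.
charset_from_headers() {
  tr -d '\r' | sed -n 's/^content-type:.*;[[:space:]]*charset=\([^;[:space:]]*\).*/\1/Ip'
}

# Usage against the live server (commented out so the sketch runs offline):
# curl -sI http://addons.oolite.org/api/1.0/overview | charset_from_headers
```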

Back to the file content: I had several tools analyze it, and they claim the file is 7-bit ASCII. That is simply not true, which makes it even stranger that they all detect it that way.
When converting the file to UTF-8 I tried different source encodings, and guess what: cp1252 worked without problems. It even changed two locations in the file: the one that you fixed for the oolite.oxp.zzz.Montana05.FdL_enhanced_vipers expansion, but also one in oolite.oxp.zzz.Montana05_Griff_Glowroids that we had not yet identified as problematic.

Hence my recommendation is to download the file, then to convert it from cp1252 to UTF-8:

Code:

curl -v -o overview-cp1252.plist http://addons.oolite.org/api/1.0/overview
iconv -f cp1252 -t utf-8 <overview-cp1252.plist >overview.plist
Once that is in place, let's have phkb add some more non-ASCII characters and see how the solution behaves.
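The conversion can be made safe to repeat: if a download is already valid UTF-8 (for example once the server gets fixed), converting it again from cp1252 would double-encode it (ü would turn into Ã¼). A sketch, assuming GNU iconv and assuming, as this analysis found, that anything which is not valid UTF-8 is cp1252 (to_utf8 is a made-up name):

```shell
#!/usr/bin/env bash
# to_utf8 SRC DST (made-up helper): write a UTF-8 copy of SRC to DST.
# Assumption: input that is not valid UTF-8 is cp1252. Pure ASCII is valid
# UTF-8 and is copied unchanged, so running this twice never double-encodes.
to_utf8() {
  local src="$1" dst="$2"
  if iconv -f UTF-8 -t UTF-8 "$src" >/dev/null 2>&1; then
    cp -- "$src" "$dst"                      # already UTF-8: pass through
  else
    iconv -f CP1252 -t UTF-8 "$src" >"$dst"  # non-zero status if conversion fails
  fi
}

# Usage (commented out so the sketch runs offline):
# curl -o overview-cp1252.plist http://addons.oolite.org/api/1.0/overview
# to_utf8 overview-cp1252.plist overview.plist
```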



For the workflow, you can make the logic fairly simple:
1. git clone/pull
2. download
3. convert charset
4. git add
5. git commit
6. git push

You do not need to compare checksums. Why not? Git does that anyway.
If the file is unchanged, git add stages nothing, git commit reports that there is nothing to commit, and the push has no new work for the server, so nothing actually happens in these steps. The first git clone/pull is necessary though; otherwise the script will fail as soon as we add a manual change to that branch.
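The six steps can be sketched as one shell function, assuming an existing clone, a file that was already downloaded (step 2, e.g. with curl -o) and the cp1252 conversion; publish_overview and the file name overview.plist are made up for illustration. One detail: with nothing staged, git commit exits with a non-zero status, so the sketch checks the index with git diff --cached --quiet first instead of letting commit fail.

```shell
#!/usr/bin/env bash
# publish_overview REPO FILE (made-up name)
#   REPO: an existing clone of the target repository
#   FILE: the overview as downloaded from the old server (step 2)
# Commits and pushes REPO/overview.plist only if the converted content changed.
publish_overview() {
  local repo="$1" file="$2"
  # 1. Pull first so manual changes on the branch do not make the push fail.
  git -C "$repo" pull --quiet || return 1
  # 3. Convert the charset.
  iconv -f CP1252 -t UTF-8 "$file" >"$repo/overview.plist" || return 1
  # 4. Stage the file; if its content is unchanged, nothing gets staged.
  git -C "$repo" add overview.plist || return 1
  # Nothing staged: treat "no change" as success instead of a failed commit.
  if git -C "$repo" diff --cached --quiet; then
    return 0
  fi
  # 5. and 6. Commit and push the new version.
  git -C "$repo" commit --quiet -m "Update overview.plist" || return 1
  git -C "$repo" push --quiet || return 1
}
```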
timer

Re: OXP list fails to download

Post by timer »

hiran wrote: Wed Sep 13, 2023 10:32 am
Back to the file content: I had several tools analyze it, and they claim the file is 7-bit ASCII. That is simply not true, which makes it even stranger that they all detect it that way. When converting the file to UTF-8 I tried different source encodings, and guess what: cp1252 worked without problems. It even changed two locations in the file: the one that you fixed for the oolite.oxp.zzz.Montana05.FdL_enhanced_vipers expansion,
mmmm... interesting!
hiran wrote: Wed Sep 13, 2023 10:32 am
but also one in oolite.oxp.zzz.Montana05_Griff_Glowroids that we had not yet identified as problematic.
Yep, but there it changes it to the long dash (3 bytes in UTF-8: E2 80 93), i.e. – (as in charmap)

Interesting that 0xFC is ü in cp1250/cp1252, but 0xE2 0x80 0x93 is the long dash in UTF-8...

We can convert the file to UTF-8, but converting it FROM UTF-8 fails... so do we really have a file in cp1252 encoding?

Well, WHAT is in the DB now?
1) If the DB holds the single byte 0xFC, let's try fixing it to 0xC3 0xBC and see what happens...
2) If the DB holds 0xC3 0xBC, then the server converts the file to cp1252 before sending it (some new PHP library version?)

@phkb, can you help us? I don't know exactly how to check it. I just checked the DB dump (the one you sent me): it has the correct two-byte UTF-8 character. What will be in a new DB dump?
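One way to answer this for a new dump without special tooling is to look at the raw bytes in front of "ber ships". A shell sketch, assuming a plain-text dump that stores the description unescaped (ber_prefix_bytes is a made-up name): fc points to case 1, c3bc to case 2.

```shell
#!/usr/bin/env bash
# ber_prefix_bytes DUMP (made-up helper): print, as hex, the bytes between
# "They are not " and "ber ships" on the first matching line of DUMP.
#   fc   -> the dump holds the single cp1252 (or Latin-1) byte
#   c3bc -> the dump holds proper two-byte UTF-8
ber_prefix_bytes() {
  local raw
  # LC_ALL=C lets sed treat every byte as one character, valid UTF-8 or not.
  raw=$(LC_ALL=C sed -n 's/.*They are not \(.*\)ber ships.*/\1/p' "$1" | head -n 1)
  printf '%s' "$raw" | od -An -tx1 | tr -d ' \n'
}

# Usage (the dump file name is a placeholder):
# ber_prefix_bytes addons-dump.sql
```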

hiran wrote: Wed Sep 13, 2023 10:32 am
For the workflow, you can make the logic fairly simple:
1. git clone/pull
2. download
3. convert charset
4. git add
5. git commit
6. git push

You do not need to compare checksums. Why not? Git does that anyway.
If the file is unchanged, git add stages nothing, git commit reports that there is nothing to commit, and the push has no new work for the server, so nothing actually happens in these steps.
I use the GitHub API; IMHO it always writes data, even if nothing has changed, and I don't want to send it every 10 minutes without any "profit". Comparing the md5sum is two lines of code :)
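The md5 comparison could look like this in shell, assuming coreutils md5sum and a small state file; needs_upload, mark_uploaded and upload_via_github_api are made-up names, and the API call itself is not shown. Recording the hash only after a successful upload means a failed upload is simply retried on the next run.

```shell
#!/usr/bin/env bash
# needs_upload FILE STATE (made-up helper): exit 0 if FILE's md5 differs from
# the hash stored in STATE, or if STATE does not exist yet.
needs_upload() {
  local new old=""
  new=$(md5sum <"$1" | cut -d' ' -f1)
  [ -f "$2" ] && old=$(cat "$2")
  [ "$new" != "$old" ]
}

# mark_uploaded FILE STATE (made-up helper): record FILE's md5 in STATE.
# Call it only after the upload succeeded.
mark_uploaded() {
  md5sum <"$1" | cut -d' ' -f1 >"$2"
}

# Usage (upload_via_github_api stands in for the existing API call):
# if needs_upload overview.plist .overview.md5; then
#   upload_via_github_api overview.plist && mark_uploaded overview.plist .overview.md5
# fi
```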
hiran wrote: Wed Sep 13, 2023 10:32 am
The first git clone/pull is necessary though; otherwise the script will fail as soon as we add a manual change to that branch.
Well... you want to try merging the results of the old editor with manual or other (automatic?) commits? Interesting... maybe by merging different branches: for example, the server commits to a special new branch of its own, which is then merged into only-media... many tiny problems, but... I have to think about it...
hiran

Re: OXP list fails to download

Post by hiran »

timer wrote: Wed Sep 13, 2023 1:20 pm
hiran wrote: Wed Sep 13, 2023 10:32 am
but also one in oolite.oxp.zzz.Montana05_Griff_Glowroids that we had not yet identified as problematic.
Yep, but there it changes it to the long dash (3 bytes in UTF-8: E2 80 93), i.e. – (as in charmap)

Interesting that 0xFC is ü in cp1250/cp1252, but 0xE2 0x80 0x93 is the long dash in UTF-8...
Exactly. The long dash (here the en dash, U+2013) becomes a multi-byte character when stored in UTF-8. :-)
Here you can see the character side by side in UTF-8, UTF-16 and UTF-32. If the encoding allows, no escaping is necessary.
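The same comparison can be produced locally. A shell sketch, assuming glibc iconv (which knows all four encoding names), that prints U+2013 EN DASH as hex bytes in UTF-8, UTF-16, UTF-32 and cp1252 (hexbytes is a made-up helper name):

```shell
#!/usr/bin/env bash
# hexbytes (made-up helper): read stdin, print its bytes as one lowercase hex string.
hexbytes() { od -An -tx1 | tr -d ' \n'; }

dash=$(printf '\342\200\223')   # U+2013 EN DASH, written as its UTF-8 bytes

# The explicit BE variants are used so that iconv emits no byte order mark.
for enc in UTF-8 UTF-16BE UTF-32BE CP1252; do
  printf '%-9s %s\n' "$enc" "$(printf '%s' "$dash" | iconv -f UTF-8 -t "$enc" | hexbytes)"
done
```

Expected output: e28093, 2013, 00002013 and 96. So a cp1252 file holds the single byte 0x96 where the UTF-8 version has three bytes.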
timer wrote: Wed Sep 13, 2023 1:20 pm
We can convert the file to UTF-8, but converting it FROM UTF-8 fails... so do we really have a file in cp1252 encoding?
Exactly what I suspect.
timer wrote: Wed Sep 13, 2023 1:20 pm
Well, WHAT is in the DB now?
1) If the DB holds the single byte 0xFC, let's try fixing it to 0xC3 0xBC and see what happens...
2) If the DB holds 0xC3 0xBC, then the server converts the file to cp1252 before sending it (some new PHP library version?)
I do not believe we should put specific bytes into the DB. Let's just cut and paste text as we used to; there are many encodings in between.
But somehow the characters we saw made it in, and others should go in the same way. When they are retrieved by the plist-generating code, that is when they get serialized into cp1252.
timer wrote: Wed Sep 13, 2023 1:20 pm
@phkb, can you help us? I don't know exactly how to check it. I just checked the DB dump (the one you sent me): it has the correct two-byte UTF-8 character. What will be in a new DB dump?
Let's add a dummy expansion, one that just contains various non-ASCII characters in its title or description. I predict that as long as the characters are taken from here, it will somehow be OK. If we need other characters, like these Mahjong Tiles, we will see strange effects. Or they get escaped correctly and we are good.
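If the generator really serializes into cp1252, any character outside that code page (such as U+1F004 MAHJONG TILE RED DRAGON) cannot survive unchanged. A quick pre-check for candidate titles or descriptions, sketched in shell under that assumption and assuming GNU iconv (fits_cp1252 is a made-up name):

```shell
#!/usr/bin/env bash
# fits_cp1252 TEXT (made-up helper): exit 0 if the UTF-8 string TEXT can be
# represented in cp1252. iconv exits non-zero when it meets a character that
# has no cp1252 equivalent.
fits_cp1252() {
  printf '%s' "$1" | iconv -f UTF-8 -t CP1252 >/dev/null 2>&1
}

# Candidate strings for a dummy expansion, written as UTF-8 byte escapes.
for t in 'plain ASCII' "$(printf 'They are not \303\274ber ships')" \
         "$(printf 'dash \342\200\223 here')" "$(printf 'tile \360\237\200\204')"; do
  if fits_cp1252 "$t"; then echo "ok:   $t"; else echo "FAIL: $t"; fi
done
```

The first three candidates should pass and the Mahjong tile should fail.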
timer wrote: Wed Sep 13, 2023 1:20 pm
I use the GitHub API; IMHO it always writes data, even if nothing has changed, and I don't want to send it every 10 minutes without any "profit". Comparing the md5sum is two lines of code :)
hiran wrote: Wed Sep 13, 2023 10:32 am
The first git clone/pull is necessary though; otherwise the script will fail as soon as we add a manual change to that branch.
Well... you want to try merging the results of the old editor with manual or other (automatic?) commits? Interesting... maybe by merging different branches: for example, the server commits to a special new branch of its own, which is then merged into only-media... many tiny problems, but... I have to think about it...
Ah, so the function you use does not clone at all? Then it is more like updating a file from the browser, just through the API. Stay with it; I did not know it was available.
timer

Re: OXP list fails to download

Post by timer »

hiran wrote: Wed Sep 13, 2023 1:47 pm
Ah, so the function you use does not clone at all? Then it is more like updating a file from the browser, just through the API. Stay with it; I did not know it was available.
Yep, I'm using the API from the server, as some practice before using the API from a Cloudflare Worker (JS) for our new OXP manager.