Unintended Results - Have you allowed for the mathematics bug?

Pegleg · Post by **Pegleg** » Tue Jan 13, 2009 4:30 pm

I'm a noob to this forum and the Oolite game code, but I've been a programmer since the mid 80s, having worked with many of the BASIC languages over the years and ported games across many different machines and OSs, from Radio Shack and Commodore through most of the line of IBM-compatible systems. In browsing your forums I have found occasional references to 'minor' problems and glitches (like flashers leaving the ship and moving ahead of it on the viewscreen) for which no one has yet found a complete workaround. I've run into similar problems in the past with Star Trek games and have some ideas that I've used successfully to correct these kinds of unintended consequences.

By no means do I wish to step on anyone's toes or imply any criticism of the fine, ongoing work going on here... the problem is not so much in your code or even the programming language as such... it's in the computer processors that are running it. Please bear with me as I elaborate.

When 16-bit computing became available, all of us set to work rewriting our code to take advantage of the expanded memory range so that our games wouldn't have to access the hard drive every other second. All of us also started crashing our programs with 'out of range' errors, because even though theoretically we could use any range of numbers, in actuality there were limitations (try using more than 3 gigs of memory on a 32-bit OS). While using direct mode in QBASIC to check variables to try and isolate one of these errors, I discovered that the '4' that should have been returned by my math function call was in reality '4.0000002857'.

About that same time, the early 90s, Intel announced that there was a mathematics bug on some (all) of their computer chips. There was never a fix issued because it was said that most users would never be doing anything that would notice such an insignificant problem. Problem swept under the rug.

The problem is, they didn't correct it in future releases, to the best of my knowledge. I have reproduced this error on 386sx, 386dx, 486sx, 486dx, K-5, K-6, Pentium-60, Pentium-90, and Pentium II machines. I haven't tested any of the newer machines because I have long since changed my programming habits to correct for this. I don't know if the Macs have this problem, but it seems that all of the PCs are vulnerable.

This is a little long - sorry. I'll continue with my workarounds later on... unless of course you all think I'm just a dated old fogey who's full of ...

Pegleg · Post by **Pegleg** » Tue Jan 13, 2009 5:00 pm

To continue...

I've learned that these errors don't seem to appear when doing simple addition and subtraction, but that multiplication can sometimes pick up the extra 'noise' when the process takes it over the 8-bit layer and that division gets corrupted much more easily, let alone square roots and cube roots.

I believe that the problem is caused by the computer's internal translation of very large (and very minute) 'real' numbers as it switches between the different notations (integer, long integer, real, scientific notation, exponent, etc.) as it saves the result internally in each step performed in the computation. I don't know exactly how it happens. I just know that Pythagorean triangles have integer sides and that the computer doesn't come up with the same answer.

The problem is that even if the error has not been compounded enough times to cause an out of range error, it can still screw up logic calls and branching subroutines. 4.000002857 does NOT equal 4. It is however Greater Than 4 so that your branch may go to 5 instead, depending on how the code handles exceptions.
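The branching problem described above can be sketched in a few lines. This is only an illustration in Python, not Oolite code: the function names and the tolerance `tol` are assumptions made up for the example, and `noisy` is simply the value quoted in the post.

```python
import math

# Hypothetical value: the "4" carrying noise, as quoted in the post.
noisy = 4.000002857

def branch_exact(x):
    # Exact comparison: any noise at all sends us down the wrong branch.
    if x == 4:
        return "four"
    elif x > 4:
        return "five"
    return "below"

def branch_tolerant(x, tol=1e-5):
    # Treat x as 4 when it lies within tol of 4 (tol is an assumed threshold).
    if math.isclose(x, 4, abs_tol=tol):
        return "four"
    elif x > 4:
        return "five"
    return "below"

print(branch_exact(noisy))     # takes the "greater than 4" branch
print(branch_tolerant(noisy))  # treated as 4
```

The right tolerance depends on the scale of the numbers involved, so the `1e-5` here is a placeholder rather than a recommendation.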

When you have a looping mathematical distance/range function (like in space games), because of the greater distances involved it is very likely that you will cross one or more of the 'bit' limitations and that eventually you will have an out of range error crash the program. (A 32-bit system cannot natively handle an integer greater than 4 gig! It must internally be translated into a different format/notation for the computer to handle or store it.)
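For a concrete picture of the 32-bit limit mentioned above, here is a rough Python sketch. Python integers do not overflow on their own, so the wraparound is simulated with a bit mask; `add_u32` and `checked_add_u32` are hypothetical helper names invented for this example.

```python
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit integer, just over 4 billion

def add_u32(a, b):
    # Simulate unsigned 32-bit addition: only the low 32 bits survive.
    return (a + b) & 0xFFFFFFFF

def checked_add_u32(a, b):
    # Same addition, but report the overflow instead of wrapping silently.
    total = a + b
    if total > UINT32_MAX:
        raise OverflowError("result does not fit in 32 bits")
    return total

print(UINT32_MAX)              # the "4 gig" ceiling
print(add_u32(UINT32_MAX, 1))  # wraps around to zero
```

Whether a real program wraps, crashes, or switches to a wider format depends on the language and the variable type, which is why the checked version is shown alongside the wrapping one.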

Pegleg · Post by **Pegleg** » Tue Jan 13, 2009 5:47 pm

The problem could also be rooted in how the programming language accesses and calls the math functions from the main/co-processor, as anyone who's ever saved/loaded arrays across different platforms quickly finds out when one array begins with zero and the other starts at one.

What I've learned to do over the years (many apologies if I belabor the obvious or step on any toes) is to break up all of my formulas into smaller entities. While it may slightly interfere with the 'elegance' of the code, it allows me to much more easily check on variables and to clearly specify the order of operations in the equations (which many times allows me to remain clear of the 8-bit and other breakpoints when plausible).
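As an illustration of breaking a formula into smaller steps, here is a minimal Python sketch of a distance calculation. The function name `distance_stepwise` and the choice to raise a `ValueError` are assumptions made for the example, not anything taken from the Oolite source.

```python
import math

def distance_stepwise(p, q):
    # Each intermediate value gets its own name so it can be inspected.
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    dz = q[2] - p[2]

    # Order of operations is explicit: square first, then sum.
    sum_sq = dx * dx + dy * dy + dz * dz

    # Catch an out-of-range intermediate before it reaches the square root.
    if not math.isfinite(sum_sq):
        raise ValueError("intermediate value out of range")

    return math.sqrt(sum_sq)

print(distance_stepwise((0, 0, 0), (3, 4, 0)))
```

The one-line version computes the same thing, but the stepwise form gives you a place to put a breakpoint or a range check on each intermediate value.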

I also use the breaks to clean and/or clear the variables between operations and, if necessary, add error handling scripts to help avoid system crashes when corruption does creep in. Most people assume that operations involving integers return integers, yet it has been my experience that too often a real number gets sent instead until you take the time to convert the answer back into an integer. (You also have to be specific as to how this conversion happens... do you truncate, round up, round down, etc.)
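The point about being specific when converting back to an integer can be shown with a short Python sketch. The helper name `conversions` is made up for this example; the four standard-library calls it uses each follow a different rule.

```python
import math

def conversions(x):
    # Four different ways to turn a real number into an integer.
    return {
        "truncate": math.trunc(x),  # drop the fractional part (toward zero)
        "floor": math.floor(x),     # round down (toward negative infinity)
        "ceil": math.ceil(x),       # round up (toward positive infinity)
        "nearest": round(x),        # nearest integer, ties go to the even one
    }

# Dividing two integers with "/" yields a real number in Python 3.
ratio = 7 / 2
print(type(ratio).__name__, ratio)

print(conversions(2.7))
print(conversions(-2.7))
```

Truncation and rounding down agree for positive values but not for negative ones, which is exactly the sort of difference that matters when a result feeds a branch.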

Lastly, I clear and reuse the same variables over and over so that if one does pick up some 'noise' it gets wiped out before being compounded over and over and over again. That is where the migrating flashers get their legs from.
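Compounding noise of this kind is easy to reproduce. The Python sketch below compares a running total that is updated every step with a value recomputed from scratch; `accumulated` and `recomputed` are hypothetical names, and the step of 0.1 is just a convenient value that binary floating point cannot store exactly.

```python
def accumulated(step, n):
    # Running total: each addition can add a little rounding error.
    total = 0.0
    for _ in range(n):
        total += step
    return total

def recomputed(step, n):
    # Recomputed from the source values: a single rounding, nothing compounds.
    return step * n

n = 1_000_000
drift_running = abs(accumulated(0.1, n) - 100000.0)
drift_fresh = abs(recomputed(0.1, n) - 100000.0)

print(accumulated(0.1, 10) == 1.0)  # False: ten additions already drift
print(drift_running > drift_fresh)  # True: the running total is further off
```

Recomputing a position from its source values each frame, rather than nudging the previous result, is one way to keep the drift from building up.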

I hope you find this useful in removing some of your weird goings on. I've watched programming evolve over the decades and noticed that people tend to forget the old GIGO rule today, which is ultimately where this problem crept in 20 years ago. Your computer is STILL a retard... albeit a very fast retard.

I suppose now I'll have to learn a couple of more new programming languages so that I can put my money where my mouth is. Thank you all for all of the hours you've put in so far. I've really been enjoying shooting up the galaxies again.

wackyman465 · Post by **wackyman465** » Tue Jan 13, 2009 11:32 pm

Which Intel chips does this affect (the chips in Apple computers?)?

DaddyHoggy · Post by **DaddyHoggy** » Wed Jan 14, 2009 12:04 am

I was always led to believe that all the errors in the Pentium multiplications (they were in the inbuilt quick lookup tables, if I remember correctly) were covered by every OS created since about 1997 - the errors in Oolite, as Frame is discovering, are all down to precision - I think the long-standing errors in CPU mathematical functionality are a red herring.
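What "down to precision" can look like is sketched below in Python. It assumes, purely for illustration, that a coordinate is stored as a 32-bit float; nothing in this thread says which type Oolite actually uses, and `to_float32` is a made-up helper that round-trips a value through single precision.

```python
import struct

def to_float32(x):
    # Round-trip through a 4-byte float to see what single precision keeps.
    return struct.unpack("f", struct.pack("f", x))[0]

# Near ten million units, neighbouring 32-bit floats are a whole unit apart,
# so a quarter-unit offset is simply lost.
far = to_float32(10000000.25)
print(far)

# Close to the origin the same offset survives without trouble.
near = to_float32(10.25)
print(near)
```

The further an object is from the origin, the coarser the grid of representable positions becomes, which could produce visible jitter without any fault in the CPU.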

Pegleg · Post by **Pegleg** » Wed Jan 14, 2009 12:44 am

As I haven't studied the core code and done a line-by-line debug myself, it's possible that I'm way off base here. However, I've found at the very least that status changes of variables in the midst of calculations (i.e., changing integers into real numbers and then not changing them back) can also sometimes produce discrepancies like the ones I talked about above. In any event, it may be worth a look-see.

Beyond that, DaddyHoggy, you wouldn't happen to know if it is preferable to move the laser up as opposed to moving the front view down, or both, to correct the aiming point on a ship? I'm flying a Military Stingray right now and keep firing low. Any thoughts?

Commander McLane · Post by **Commander McLane** » Wed Jan 14, 2009 6:41 am

Pegleg wrote:
Beyond that, DaddyHoggy, you wouldn't happen to know if it is preferable to move the laser up as opposed to moving the front view down, or both, to correct the aiming point on a ship? I'm flying a Military Stingray right now and keep firing low. Any thoughts?

The weapon_position_foo-keys in shipdata have been broken since (I think) 1.70. Therefore it isn't currently possible to move the lasers, only the viewpoints.

The developers are aware of the problem, and some day it will get fixed (hopefully).

Pegleg · Post by **Pegleg** » Wed Jan 14, 2009 12:46 pm

Ahhh... That explains why I've run into that problem on so many player ships then. Thanks! I'll be waiting eagerly.