Re: OXP Performance tips
Posted: Sun May 06, 2018 3:42 am
The new version of Telescope is about 35% faster than 1.13 (14% faster than 1.15) and garbage production was decreased by over 70% vs 1.15 (1.13 & 1.15 have similar rates). Here are some of the techniques I used to improve performance:
Take out the garbage
We've talked recently in this thread about reducing garbage by reusing objects & arrays and performing some vector functions locally. That got me much of the 70%. I also got rid of most calls that return arrays. For example, I was using splice to remove an item from a list. Splice returns an array of what was deleted, which I didn't need.
Code: Select all
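//mapping.splice( found, 1 );       // allocates an array for the removed item
for( var i = found, len = mapping.length; i < len - 1; i++ ) {
    mapping[ i ] = mapping[ i + 1 ];    // shift later items down one slot
}
mapping.length = --maplen;          // maplen is the cached list length kept elsewhere in the script
This is a wee bit slower, about 1 microsec (0.001 ms), which I think is a fair trade-off for less garbage.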
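For anyone who wants to try this outside Telescope, here is a minimal self-contained sketch of the same idea. The helper name removeAt and the sample list are my own, purely for illustration; it assumes the order of the remaining items must be preserved.

```javascript
// Remove list[ index ] in place without allocating a result array.
// 'removeAt' is a hypothetical helper name, not something from Telescope.
function removeAt( list, index ) {
    var last = list.length - 1;
    if( index < 0 || index > last ) return false;   // nothing to remove
    for( var i = index; i < last; i++ ) {
        list[ i ] = list[ i + 1 ];                  // shift later items down one slot
    }
    list.length = last;                             // drop the now-duplicated tail item
    return true;
}

var mapping = [ 'a', 'b', 'c', 'd' ];
removeAt( mapping, 1 );                             // mapping is now [ 'a', 'c', 'd' ]
```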
Replace unnecessary function calls
A number of Math functions are convenient during development but can be replaced with inline expressions:
Code: Select all
Math.abs( x ):      x >= 0 ? x : -x
Math.max( x, y ):   x > y ? x : y
Math.min( x, y ):   x < y ? x : y
Math.floor( x ):    ~~x                         // x >= 0 only
Math.round( x ):    ~~( x + 0.5 )               // x >= 0 only
Math.ceil( x ):     ~~x < x ? ~~x + 1 : ~~x
[~ is the bitwise not, so x gets converted to a 32-bit integer w/ all its bits flipped,
the 2nd ~ flips them back, leaving x without its fractional part. That truncates toward zero,
so the floor & round versions only match for x >= 0, and all of them need |x| < 2^31]
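A quick way to convince yourself where the ~~ trick matches the Math functions and where it doesn't. This is only a sketch: the helper names are hypothetical, and they are written as functions just so they can be compared in a loop (in real code you would paste the expression inline).

```javascript
// ~~x truncates toward zero and is only valid for |x| < 2^31.
function truncate( x ) { return ~~x; }

// Inline-style floor and ceil that also cope with negative values.
function floorInline( x ) { var t = ~~x; return t > x ? t - 1 : t; }
function ceilInline( x )  { var t = ~~x; return t < x ? t + 1 : t; }

var samples = [ 0, 2, 2.3, 2.7, -2, -2.3, -2.7 ];
var mismatches = 0;
for( var i = 0; i < samples.length; i++ ) {
    var x = samples[ i ];
    if( floorInline( x ) !== Math.floor( x ) ) mismatches++;
    if( ceilInline( x ) !== Math.ceil( x ) ) mismatches++;
}
// mismatches stays at 0, whereas plain truncate( -2.7 ) gives -2 and Math.floor( -2.7 ) gives -3
```

If you know the value is never negative (distances, counts, list indexes), the plain ~~x form is all you need.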
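We normally never want to have duplicate code (maintenance, readability, duplicate bugs) but exceptions can be made where speed is essential. Telescope has a function for calculating distance which gets called in 14 places. But the loop that repositions effects has some common vector calculations, so moving some of the distance code inline eliminated duplicated work. Just be sure to document it well in both functions.
Prioritize compound conditions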
As many of you know, JS will abort execution of conditional expressions in some situations:
Code: Select all
if( a && b && c ) ...   // If a is false, neither b nor c will be evaluated, and
if( x || y || z ) ...   // if x is true, neither y nor z will be evaluated.
// To do so would be pointless, as they wouldn't change the final value.
We can use this to enhance speed by putting the faster ones first.
Code: Select all
var ws = worldScripts.someOxp;
var allowed = someFunction();
var ps = player.ship;
if( allowed && ps.docked && ps.bounty === 0 && ps.mass < 130000 && ws.$isSpecial )
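This is the most efficient ordering, all things being equal, as 'allowed' is local, 'docked' is a property of PlayerShip, 'bounty' is a property of ship, 'mass' is a property of entity and 'isSpecial' is a worldScripts property. We may re-order them if we know one is almost always true (shift right for &&, left for ||) or almost always false (shift left for &&, right for ||). The goal is to get it to abort quickly when it does abort.
If the 'allowed' variable is only used to determine entrance into the if block, move it into the expression so it's only executed when absolutely necessary.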
Code: Select all
if( ps.docked && ps.bounty === 0 && ps.mass < 130000 && someFunction() && ws.$isSpecial )
[whether to place it before or after the worldScripts property is determined by profiling to see which takes longer]
Profiling shows that WorldScriptsGetProperty is about 18 times slower than EntityGetProperty. Such comparisons are relative, as WorldScriptsGetProperty varies with the number of OXPs you load, and MissionVariablesGetProperty (about 3 times slower) varies with the size of your save file.
A rare example came up recently involving the && of two .indexOf calls, where one list was always much shorter than the other. Putting the short one first saves time.
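To illustrate the two-indexOf case, here is a small sketch with made-up data (the list contents and function names are hypothetical, not from Telescope). A counter shows how often the long list actually gets searched when the short test comes first.

```javascript
// Hypothetical data: a short list of roles and a long list of ship names.
var shortList = [ 'trader', 'police' ];
var longList = [];
for( var n = 0; n < 1000; n++ ) longList.push( 'ship' + n );

var longScans = 0;   // counts how often the long list is actually searched
function inLong( item )  { longScans++; return longList.indexOf( item ) >= 0; }
function inShort( item ) { return shortList.indexOf( item ) >= 0; }

// Short list first: when the role is absent, the long list is never scanned.
function wanted( role, name ) {
    return inShort( role ) && inLong( name );
}

wanted( 'pirate', 'ship999' );   // role fails, so longScans stays 0
wanted( 'trader', 'ship999' );   // role passes, so the long list is scanned once
```

With the operands the other way round, both calls would have walked the 1000-item list before looking at the 2-item one.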
Spread the load over several frames
Some jobs require repeats faster than a Timer (is interval still a min. 0.25 sec?) but take longer than we'd like for a frame callback. I use a scheme that allows a job to be spread over several frames. For example, updating the MFD entails selection, distance & heading calculations and formatting. I've been shooting for 2 ms operations and updating an MFD list of 10 ships takes triple that. The user will never notice a delay of a couple of frames (not so for visual effects; that will be noticed), so I process a few ships and suspend the rest until the next frame. To do this, I maintain an array of pending functions and at the end of my frame callback, if any are present, I do one. Thus the MFD text is prepared over 3 frames.
Strictly speaking, this example doesn't save any time, it just levels out the load (MFD is updated 1/sec, so whether I do it all at once or across frames, the same work is done). I also use this for creating a new scan or updating an existing one. Time is saved though when pending function calls get purged. A new scan purges everything pending, as there's no point updating the MFD, for example, when a new list is being created.
Significant time is saved when you spread high frequency jobs across frames. For updating the position of visual effects, rather than update them all in one frame, I can do half in one frame and the rest in the next, with no noticeable difference. This effectively cuts the per-frame time for updating effect positions in half.
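A rough sketch of a pending-function queue along these lines. The name set_fn_pending appears in Telescope's code, but the body here is only my guess at how such a thing could work; purge_pending, formatShips and the ship list are hypothetical. In Oolite the callback would be registered with addFrameCallback; here it is called by hand so the sketch runs anywhere.

```javascript
// Parallel arrays avoid allocating a wrapper object per job.
var pendingFns = [], pendingArgs = [];

function set_fn_pending( fn, arg ) {    // assumed implementation
    pendingFns.push( fn );
    pendingArgs.push( arg );
}
function purge_pending() {              // e.g. when a new scan makes queued work pointless
    pendingFns.length = 0;
    pendingArgs.length = 0;
}
function frameCallback( delta ) {
    // ... regular per-frame work would go here ...
    if( pendingFns.length > 0 ) {       // do at most one pending job per frame
        var fn = pendingFns.shift(), arg = pendingArgs.shift();
        fn( arg );
    }
}

// Hypothetical job: format 10 ships, 4 per frame.
var ships = [ 's0', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9' ];
var lines = [], CHUNK = 4;
function formatShips( start ) {
    var end = start + CHUNK < ships.length ? start + CHUNK : ships.length;
    for( var i = start; i < end; i++ ) lines.push( ships[ i ] + ' formatted' );
    if( end < ships.length ) set_fn_pending( formatShips, end );   // resume next frame
}

formatShips( 0 );                       // first chunk runs in the current frame
var extraFrames = 0;
while( pendingFns.length > 0 ) {        // simulate the following frames
    frameCallback( 1 / 60 );
    extraFrames++;
}
// all 10 lines are ready after the first frame plus 2 more
```

Passing the resume index as the argument means the job itself needs no state between frames, and purging is just a matter of emptying the two arrays.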
Vary Effects updates by distance
Visual effects are tricky, as the human eye can detect oddities even at 60 Hz. It may not be clear what's up; it just looks wrong. Telescope deals with objects both near and far. The near ones, within 2 * scannerRange, get updated every frame. But those outside that limit can be updated less frequently, like every other frame. And those beyond 4 * scannerRange get updated every 4th frame. This is only done when flying at normal speeds. With the torus drive, everything gets done each frame, as at those speeds, it does become noticeable.
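The distance bands could be expressed something like this. It's a sketch only: the function names are hypothetical, the scanner range of 25600 m is assumed for the demo, and the real code would take the distance from the sighting rather than a parameter.

```javascript
// How many frames to wait between updates for an effect at this distance.
function updateInterval( distance, scannerRange, usingTorus ) {
    if( usingTorus ) return 1;                       // at torus speeds, update everything each frame
    if( distance > 4 * scannerRange ) return 4;      // far: every 4th frame
    if( distance > 2 * scannerRange ) return 2;      // middle band: every other frame
    return 1;                                        // near: every frame
}
function needsUpdate( frameNo, distance, scannerRange, usingTorus ) {
    return frameNo % updateInterval( distance, scannerRange, usingTorus ) === 0;
}

// Count updates over 8 frames for effects in each band.
var R = 25600;                                       // assumed scanner range in metres
var counts = { near: 0, mid: 0, far: 0, farTorus: 0 };
for( var frame = 0; frame < 8; frame++ ) {
    if( needsUpdate( frame, 1.5 * R, R, false ) ) counts.near++;
    if( needsUpdate( frame, 3 * R, R, false ) )   counts.mid++;
    if( needsUpdate( frame, 5 * R, R, false ) )   counts.far++;
    if( needsUpdate( frame, 5 * R, R, true ) )    counts.farTorus++;
}
// counts: near 8, mid 4, far 2, farTorus 8
```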
Cache, Cache, Cache
Whenever you've performed an expensive op, you never want to repeat it if you can avoid it. Say you support a number of other OXPs' planetary naming conventions. Once you figure out what that planet's name is, save it until you leave the system. I know this is obvious but there are often cases that you can miss (I know I can & do!). When your profiling points to a problem function, always check if there is something that doesn't need doing on every call.
Just recently, I ran into this. I thought my MFD formatting was quick enough until I added a name shortening feature (to avoid really squished text when using randomshipnames). Suddenly, that function rose to the top of the list of speed hogs. But, like the planet, a ship's name doesn't change and a cache solved the problem:
Code: Select all
var lastShipReports = {};   // cleared when entering witchspace
function ShowShipReport( map ) {
    var name = '', cached = false, key;
    // 'ent' is the sighting's entity; its lookup is not shown in this excerpt
    key = ent.entityPersonality;    // orbs don't have one, so are not cached (PlanetName has its own cache)
    if( key && lastShipReports.hasOwnProperty( key ) ) {
        name = lastShipReports[ key ];
        cached = true;
    }
    ...
    if( key && !cached ) lastShipReports[ key ] = name;    // never store under a missing key
}
Frame Rate customization
Telescope has a lot going on and its performance varies considerably from machine to machine. One way to combat this is to monitor its frame rate and adjust accordingly; sometimes even that's not enough, so you'll have to reduce functionality.
I wrote an oxp a while back to monitor frame rate (fps_monitor, clever title, no?) that collects lots of data but that may be overkill. If you have a frame callback, just sum the delta values it gets passed and increment a counter on each call. When the total delta summed hits 1, your counter has the # of frames in the last second. What my utility adds is average fps for set intervals, high and low values, and different methods of calculating what the 'average' is.
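The bare-bones version of that counter might look like this. It's a minimal sketch, not fps_monitor's code: the variable and function names are made up, and the loop at the bottom just simulates one second of frames so it can run outside the game.

```javascript
var elapsed = 0, frameCount = 0, lastFps = -1;   // -1 until the first full second has passed

function countFrame( delta ) {       // delta: seconds since the previous frame
    elapsed += delta;
    frameCount++;
    if( elapsed >= 1 ) {             // a second has gone by
        lastFps = frameCount;        // the counter is the # of frames in that second
        elapsed = 0;
        frameCount = 0;
    }
}

// Simulate one second at 32 fps (1/32 sums exactly in floating point).
for( var f = 0; f < 32; f++ ) countFrame( 0.03125 );
// lastFps is now 32
```

In a real script, countFrame's body would simply live at the top of your existing frame callback.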
I regularly zip through my list of sightings, checking their status to see if they need deleting. I've scaled this to be a function of the player's PC. The function has 2 modes, quickly vs full check.
Code: Select all
...
if( parm === true ) {
    quickly = that.quickly = true;
    fps = that.fps = current_fps();         // quickly is fast, so check fps ships/frame
    if( fps < 0 ) fps = that.fps = 30;      // current_fps returns -1 until 1st min. has passed
    starting = i = maplen;
} else if( parm === false || parm === undefined ) {
    quickly = that.quickly = false;
    fps = current_fps();
    if( fps < 0 ) fps = 30;                 // current_fps returns -1 until 1st min. has passed
    fps = that.fps = ~~(fps / 5) || 1;      // store as fn prop for next frames' execution (never 0)
    starting = i = maplen;
} else {                                    // parm is an index # to resume
    quickly = that.quickly !== false;       // keep the stored mode; quick if none was stored
    fps = that.fps || 6;
    starting = maplen;
    i = parm;
}
...
while( ...
    ...
    if( i > 0 && i % fps === 0 ) {              // checking list can take more time than we'd like in a frame
        set_fn_pending( check_Sightings, i );   // so suspend the work until next frame
        return;                                 // so we do a chunk each frame, its size a fn of fps
    }
    ...
I let the PC's frame rate dictate how many ships to check each frame, dividing by 5 for a full check as it takes much longer.
When creating a new list (the telescope 'scan'), I have to sort through all ships in the system, which is usually over 150. Doing that all at once would cause the frame rate to crater, so I use the PC's frame rate to split the job across frames and monitor the effect. If the impact is too high, I increase the number of frames in the spread.
Telescope has a variable, MaxTargets, limiting the size of the list of sightings. Players may specify a value but if their hardware cannot handle it, it gets reduced. The adjustment is not always down, as a temporary dip in the frame rate would cause its adjustment to be too low. Using a long term average (5 min) as a baseline, I compare the relative effect on frame rate and increase/decrease accordingly (see function init_growing if you're intrigued).
I also use frame rate to estimate the distance travelled in one frame, to accurately position effects when travelling at high speed. This can only ever be an estimate, as the frame rate can fluctuate a lot from one frame to the next, so I err on the side of caution.
Ensure gets are not repeated
[spoiler: involves closures]
Long or complex scripts must be broken into smaller functions, if for no other reason than our sanity. Function calls themselves are not very expensive (about 1.2 microsec) and smaller code chunks are easier to deal with logically, test in isolation and be understood by others. The problem with many smaller functions goes back to property gets. Not all can be cached, only the constant ones. Function references are fine as they never change, as are some objects, especially if they're in your control. But many object references cannot be cached, so each function must perform its own lookup. Writing one humungous function to realize the saving of only doing a lookup once is not my preferred solution. Another way is using a closure.
A closure in JS is what you get when a function returns a reference to an inner function: the inner function keeps access to the outer function's variables after the outer one has exited. This feature is handy for supporting independent features on a web page. Imagine a field that takes user input. It has to remember that input for when the user returns to that field. A closure is not required to do this but it makes it a lot easier. Without one, the value would have to be stored somewhere external to the function, as a normal function's variables get tossed when the function exits. By returning a reference to an internal function, JS must preserve its variables for when that referenced function is called. Think of it like a 'Do Not Disturb' sign on a hotel room door; the JS maid stays out and leaves everything as it is.
Closures have gotten a bad rep for causing memory leaks, among other things. This was due to programmer error, often generating these references in loops or creating many copies of the closure.
For our purposes, we only need a single instance of the closure. This can be done either by calling it at start up or by having it self-initiate, so it's created when the script loads. Once created, we can cache distant lookups in local variables, so they are available to all. And we hardly ever need to type the word 'this' again.
Code: Select all
this.startUpComplete = function() { // closure is created here, as the towbar script may not exist when we exit startUp
                                    // could be done in startUp if closure does not reference other oxp's
    if( !this._towedMass ) {
        let mc = this._myClosure();         // create closure by calling it
        this._setTowed = mc.setTowed;       // cache function references in script variables
        this._clearTowed = mc.clearTowed;
        this._towedMass = mc.towedMass;
        // to get the mass of the towed ship, use this._towedMass( ship )
    }
};
this._myClosure = function() {
    var wt = worldScripts.towbar;           // caches reference to towbar script
    var towed = null;                       // reference to ship in question
    var mass = 0;                           // persistent local variable
    // private function that's only available inside _myClosure
    function isTowed( ship ) {
        if( towed === null )                // no towed ship known yet, so look it up
            setTowed();
        return ship === towed;
    }
    // public functions because they are returned
    function setTowed() {
        var newShip = wt && wt.$TowbarShip;
        if( newShip && newShip !== towed ) {
            towed = newShip;
            mass = towed.mass;              // property get only when ship changes
        }
    }
    function clearTowed() {
        towed = null;
        mass = 0;
    }
    function towedMass( ship ) {
        if( !ship ) return 0;
        if( !isTowed( ship ) ) return 0;
        return mass;
    }
    return { setTowed: setTowed,
             clearTowed: clearTowed,
             towedMass: towedMass
           };
};
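The lookup of the towbar script is done once, when the closure is created. The lookup of $TowbarShip is limited to calls to setTowed and the mass property get is only done when the towed ship changes. 'towedMass' can be called repeatedly without repeating a property get.
This is a trivial example but in Telescope, with 100+ functions and 70+ variables, the savings can really add up. The scheme I used involves setting all the local global variables (glocals?) to -1 when I start processing a new sighting. When a function needs a property: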
Code: Select all
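if( mass < 0 ) mass = ship.mass;
switch( mass ) {
    ...
It's longer code but fast. You can do hundreds of '< 0' tests before you come close to the expense of the property get. Another perk of this scheme is you don't need to keep track of when & where a property was gotten. Assume it wasn't, add the test and you're free to concentrate on your oxp!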
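Here is the 'reset to -1, fetch on first use' scheme as a small runnable sketch, written as a self-initiating closure. Everything in it is hypothetical: makeShip fakes a ship whose mass getter counts how often it is read, and the function names and thresholds are invented for the demo. It assumes the cached property can never legitimately be negative, otherwise -1 is no good as the 'not fetched' marker.

```javascript
var gets = 0;                        // counts simulated property gets
function makeShip( m ) {             // hypothetical stand-in for an Oolite ship
    return { get mass() { gets++; return m; } };
}

var sighting = (function() {         // self-initiating closure
    var ship = null;
    var mass = -1;                   // -1 means "not fetched yet"

    function start( newShip ) {      // call when processing of a new sighting begins
        ship = newShip;
        mass = -1;                   // reset every cached property
    }
    function isHeavy() {
        if( mass < 0 ) mass = ship.mass;   // property get only on first use
        return mass > 130000;
    }
    function massClass() {
        if( mass < 0 ) mass = ship.mass;   // already cached if isHeavy ran first
        return mass < 50000 ? 'light' : 'heavy';
    }
    return { start: start, isHeavy: isHeavy, massClass: massClass };
})();

sighting.start( makeShip( 200000 ) );
sighting.isHeavy();
sighting.massClass();                // gets is 1 here: the second call reused the cached mass
```

Neither function needs to know whether the other ran first, which is the whole appeal.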
FYI, fps_monitor is a self-initiating closure and I wrote Station Options as a closure too, so they are much shorter examples to check out.
(800 lines for fps_monitor vs 2300 for Station Options vs 5500 for Telescope)