Tag Archives: as3

Fast AS3 Signals with SignalsLite

I was playing around and ended up writing a lite Signals class (ok, 3 classes). The set works like a basic AS3 Signal, minus most of the extra functionality of AS3 Signals (run-time dispatching argument type checking as one example). The goal was to create a very fast Signal dispatcher, with very little overhead, and to dispatch with absolutely no heap allocation (check, check and check) – targeted mostly for mobile (AIR). Regular AS3 Signals does well, but it seemed to have a lot of extra stuff that I don’t need – and this was a fun kind of exercise anyway.

Some quick numbers from the performance-test with 1,000,000 iterations on a Core 2 Duo 2.6GHz (in milliseconds):

Func call time: 15
Runnable call time: 5
Event (1 listener) time: 863
Signal (1 listener) time: 260
SignalLite (1 listener) time: 232
RunnableSignal (1 listener) time: 56

Func call (10 listeners) time: 190
Runnable call (10 listeners) time: 399
Event (10 listeners) time: 2757
Signal (10 listeners) time: 741
SignalLite (10 listeners) time: 725
RunnableSignal (10 listeners) time: 221

The bold line is a vanilla SignalLite, and the line above Robert Penner’s AS3 Signals. They are pretty close, but SignalsLite takes a modest edge. But let’s look at the same test on iOS (iPhone 4S) with 100,000 iterations:

Func call time: 171
Runnable call time: 26
Event (1 listener) time: 3723
Signal (1 listener) time: 789
SignalLite (1 listener) time: 481
RunnableSignal (1 listener) time: 117

Func call (10 listeners) time: 2004
Runnable call (10 listeners) time: 1892
Event (10 listeners) time: 9217
Signal (10 listeners) time: 4030
SignalLite (10 listeners) time: 2074
RunnableSignal (10 listeners) time: 498

On iPhone you can see that SignalLite is almost twice as fast as AS3 Signals – a more substantial difference than on desktop. I’m not sure why that is, maybe the AOT compiler can optimize something about SignalLite better – IDK, but it sure is fast!

Then there’s that last line in each group – RunnableSignal. Now your talking speed. That one also solves a particular problem with function callback systems that they all seem to have – there is no compile time function signature checking. You have to wait until the thing runs, and then find out you are taking the wrong number of arguments, or the wrong type, etc. But, solving one problem (compile time type checking), solves the other (speed), and that brings us to SignalTyped which RunnableSignal in the test above extends (I’ll probably rename at some point).

SignalTyped is beginnings of a fast executing type safe implementation of AS3 Signals. The idea is, you extend 2 classes – SignalTyped and SlotLite. SignalTyped is effectively an abstract class – you must extend it and implement the dispatch method, and the constructor (at least for now, I’m looking for better ways to handle this). It takes a bit of boilerplate to implement this in a class that would expose signals. This example is based on the performance test from Jackson Dunstan’s CallbackTest which I borrowed (I hope that’s ok!):

// Interface for your class that might have listeners for the SignalTyped.
// Make one of these per listener type.
interface IRunnable {
	function run(): void;
} 

// Custom Slot has a specific property for the Runnable class.
class RunnableSlot extends SlotLite
{
	public function RunnableSlot( runnable:IRunnable ) {
		this.runnable = runnable;
	}
	public var runnable:IRunnable = new EmptyRunnable;
}

// An empty IRunnable class for first node.
class EmptyRunnable implements IRunnable {
	public function run():void {};
}

// You need one of these per dispatch type.
class RunnableSignal extends SignalTyped
{
	// last and first must be set to the typed Slot.
	public function RunnableSignal() {
		last = first = new RunnableSlot;
	} 

	// implement the dispatch method to call the runnable prop directly
	// It's easy to have it take and dispatch any type you want - with compile time type checking!
	public function dispatchRunnable():void
	{
		var node:RunnableSlot = first as RunnableSlot;
		while ( node = (node.next as RunnableSlot) ) {
			node.runnable.run(); // FAST!
		}
	}
}

That’s all necessary for the implementation requirements – a lot of boilerplate, I admit. Then you expose that in a class that might use it all:

class MyDisplayObject
{
	// could probably make this a getter..
	public var signaled:RunnableSignal = new RunnableSignal;
}

Now for the consumer to use this, it’s just a bit more boilerplate than a normal signal:

class MyConsumerOfSignalLite implements IRunnable // boilerplate point 1
{
	public function MyConsumerOfSignalLite()
	{
		var dspObj:MyDisplayObject = new MyDisplayObject();
		// add the signal (boilerplate point 2 - normal)
		dspObj.signaled.addSlot( this );
	} 
    
	// boilerplate 3 - normal, but more strict - naming is specific - FAST!
	public function run(): void {
		// do whatever when signals
	}
}
// boilerplates 2 and 3 are normal for any signal, except the strictness of #3

What’s cool about this is you get compile time type checking for your method signature, and the performance improvement that comes with skipping those checks at runtime.

I’m also thinking about a slightly different signal API that would be more like the Robot Legs’ contract system – think signals by contract – I’m working on it. Since we would be implementing a defined interface per signal type, we could boil the add methods and signal nodes down to one method to add all the listeners of a single object – one add method per dispatching class, instead of one per signal on the dispatching class. This could lead to a reduction in boilerplate. We’d filter by interface type instead of using multiple signal.add nodes and methods. So – improved runtime performance, reduction in (usage) boilerplate (if not implementation) and compile time type checking. I love it!

Note – I tested none of the example in this post, and the code in github is all very early stage stuff. The performance-test class works though – give it a try!

Oh, here’s the github repo:
https://github.com/CaptainN/SignalsLite

Performance Benchmarks with AIR 2.7 for iOS

I’ve been working on this Benchmark based on Iain Lobb’s BunnyMark. Being a bit confused sometimes about what things speed things up or slow things down, I didn’t want to guess anymore, so I grabbed Iain’s code base (cause I’m lazy, and didn’t want to start from scratch), and added some tests for things I suspect are slowing things down (or speeding things up). I think this will also help shed some light on why some folks see a huge gain in AIR 2.7 CPU mode, while others do not.

Some caveats – this only tests instances of flash.display.Bitmap on the display list, at the size they are, moving the way they move. It’s on my list to add Blitting (I have some initial work on that done, thanks to Iain, but I need to add the rotation, and alpha settings to it), and I’d like to add a vector test, and maybe some extra sized Bitmaps (I’ve heard that makes a difference).

Enough! Here are some results – quality had no effect on GPU mode, so I included only one line:

Note: some are reporting they see a difference in GPU mode, but I still don’t. Update: It appears some users are confusing “Mobile Performance Tester” with BunnyMark, which explains the discrepancy. BunnyMark is not currently in any App Store, which is one key distinguishing feature. ;-)

BunnyMark Results – 500 Bunnies
Alpha
Rotation
CaB
CaBM
iPhone 3GS – GPU
FPS 24 18 17 22 13 13 19 1 19
iPhone 3GS – CPU
FPS-L 28 21 19 9 19 5 7 5 5
FPS-M 28 21 19 4 18 3 7 5 3
FPS-H 28 21 19 3 18 2 7 5 2
FPS-B 28 21 19 3 18 2 7 5 2
iPhone 4 (Retina) – GPU
FPS 25 21 20 25 13 13 16 0.5 16
iPhone 4 (Retina) – CPU
FPS-L 32 23 20 10 21 6 8 6 7
FPS-M 32 23 20 5 20 3 8 6 3
FPS-H 32 23 19 4 20 2 8 6 2
FPS-B 32 23 19 4 20 2 8 6 2

Notes about the Benchmark:

  • In general, the CPU mode seems pretty consistent with the way you’d expect things to work on the desktop – the same optimizations you’d apply for the browser plugin, you’d also apply to mobile for CPU mode.
  • Rotation in this benchmark is not continuous – the Bunny graphics are only rotated at the edge of the stage, which is why cacheAsBitmap works to speed those up. If they were constantly updated, it would likely be much more expensive on CPU mode (probably more like rotation without CaB).
  • Alpha is continuous – the alpha value of each Bunny is based on the y position and is updated every frame. I would like to add a mode similar to the rotation, so see what effect CaB has on alpha transparent objects that don’t constantly change.
  • iPhone 4 and 3GS numbers aren’t directly comparable for practical purposes. The Bitmaps on the screen on 3GS take up much more real estate, since the 3GS screen res has 1/4 as many pixels as the iPhone 4. In a normal app, we’d probably resize things to look comparable between the two devices. I’ll try to add a mode that makes this more comparable (because I suspect we’ll find that 3GS can keep up with iPhone 4 with similar looking content).
  • Touching the screen seems to cost about 4 fps across the board.
  • I think there may be an issue with returning to rotation = 0 costing some performance in GPU mode. Still have to test that.
  • I’m definitely getting some variance on default speeds – basically, before any settings are messed with on some runs I get the faster numbers (the baseline numbers in the tables above). Other times it runs at default settings a couple of FPS slower (on start, or after resetting the switches). With any of the settings, everything is consistent across multiple runs.

 

It’d be nice to have more benchmarks for more devices, but I only have the above devices available. This should run just fine on Android, Blackberry Playbook, and iPads. If anyone wants to contribute a set of benchmarks, hit the comments. Here is the source. One of these days I’ll make another post, and try to draw some conclusions, maybe wrap the bullet points into a narrative, and edit some of this, but the tables are there, and the source code, and that’s the important stuff.

In the midst of playing with this benchmark, I found (or was pointed at) some great resources. Here are some of them:

Here is the Benchmark to see it in action:

Flash iPhone Game at Silky 60FPS on 3GS

Well, it’s only a tech demo at the moment. I’ve been playing with this Breakout like game for a while, trying to learn the ins and outs of Flash mobile development – particularly as it relates to performance. I now have the unBrix demo running at close to 60FPS (59.1) – smooth as silk.

This won’t run at 60FPS on Android Flash Player plugin in the browser (or Firefox on Mac!) – this post is about the iPhone build – but here’s the web version to look at anyway.

Here is a blurry video of the thing running as a native iPhone app on a 3GS (I smoothed out the choppy splash transition in a later build by setting the BG element with cacheAsBitmapMatrix):

The most important thing was to make sure GPU acceleration was working, and to learn what things will impact performance in that area.

It turns out, there are some important differences with how GPU accelerated Flash woks compared with the traditional software renderer. In the software Flash renderer keeping your display list shallow, and sparse (using addChild/removeChild a lot) or avoiding the display list completely (by writing to BitmapData – as the Flixel game engine does) is a key optimization for performance. This is how the exploding bunny video demo is done, and why it’s so fast.

My current theory is that on GPU accelerated content (even on desktop) the reverse is true. You want to avoid CPU/system RAM to GPU/video RAM updates as much as possible – which means avoiding BitmapData updates which cause the player to upload a new texture to the GPU VRAM with every update. Because I don’t have access to the internals of the Flash Player architecture, I can’t be sure, but I think the bottle neck comes from clogging up the lanes between the CPU and GPU, and all stressing all three areas of the rendering pipeline (CPU, GPU and the bus) as they juggle around objects in memory. The key observation this conclusion is based on is the large performance impact addChild and removeChild has on the framerate. So I relentlessly avoid that in my iPhone Flash development – I precache everything, and don’t mess with the display list. This is also one reason why filters (which operate on a BitmapData representation of the DisplayObject you apply them to) are not recommended on mobile content.

Anyway, hopefully I can turn this into a full app for iPhone in a reasonable timeframe. :-)

Stop All Child MovieClips in Flash with Actionscript 3.0

While trying to come up with a way to get two different movies loaded at the same time, to play at different frame rates, I came up with a method to recursively stop all child movies of an as3 MovieClip. I didn’t end up using it, but I thought it might be useful for someone, so here it is:

import flash.display.DisplayObjectContainer;
import flash.display.MovieClip;

function stopAll(content:DisplayObjectContainer):void
{
    if (content is MovieClip)
        (content as MovieClip).stop();
   
    if (content.numChildren)
    {
        var child:DisplayObjectContainer;
        for (var i:int, n:int = content.numChildren; i < n; ++i)
        {
            if (content.getChildAt(i) is DisplayObjectContainer)
            {
                child = content.getChildAt(i) as DisplayObjectContainer;
               
                if (child.numChildren)
                    stopAll(child);
                else if (child is MovieClip)
                    (child as MovieClip).stop();
            }
        }
    }
}

The plan was to use that to stop playing movies, every other frame, and restart them on the alternative frames, but there is apparently no way to tell if a MovieClip is currently playing or not, to know which ones to restart.

I did end up hacking Tweener to add support for a Timer based update method. That way I can adjust the stage FPS to match the older timeline content (at 12fps) and have my Tweener based interactions work at a silky smooth 60 FPS.

I’ll post more on that later.