define('DISALLOW_FILE_EDIT', true);
Some quick numbers from the performance-test with 1,000,000 iterations on a Core 2 Duo 2.6GHz (in milliseconds):
Func call time: 15
Runnable call time: 5
Event (1 listener) time: 863
Signal (1 listener) time: 260
SignalLite (1 listener) time: 232
RunnableSignal (1 listener) time: 56
Func call (10 listeners) time: 190
Runnable call (10 listeners) time: 399
Event (10 listeners) time: 2757
Signal (10 listeners) time: 741
SignalLite (10 listeners) time: 725
RunnableSignal (10 listeners) time: 221
The bold line is a vanilla SignalLite, and the line above Robert Penner’s AS3 Signals. They are pretty close, but SignalsLite takes a modest edge. But let’s look at the same test on iOS (iPhone 4S) with 100,000 iterations:
Func call time: 171
Runnable call time: 26
Event (1 listener) time: 3723
Signal (1 listener) time: 789
SignalLite (1 listener) time: 481
RunnableSignal (1 listener) time: 117
Func call (10 listeners) time: 2004
Runnable call (10 listeners) time: 1892
Event (10 listeners) time: 9217
Signal (10 listeners) time: 4030
SignalLite (10 listeners) time: 2074
RunnableSignal (10 listeners) time: 498
On iPhone you can see that SignalLite is almost twice as fast as AS3 Signals – a more substantial difference than on desktop. I’m not sure why that is, maybe the AOT compiler can optimize something about SignalLite better – IDK, but it sure is fast!
Then there’s that last line in each group – RunnableSignal. Now your talking speed. That one also solves a particular problem with function callback systems that they all seem to have – there is no compile time function signature checking. You have to wait until the thing runs, and then find out you are taking the wrong number of arguments, or the wrong type, etc. But, solving one problem (compile time type checking), solves the other (speed), and that brings us to SignalTyped which RunnableSignal in the test above extends (I’ll probably rename at some point).
SignalTyped is beginnings of a fast executing type safe implementation of AS3 Signals. The idea is, you extend 2 classes – SignalTyped and SlotLite. SignalTyped is effectively an abstract class – you must extend it and implement the dispatch method, and the constructor (at least for now, I’m looking for better ways to handle this). It takes a bit of boilerplate to implement this in a class that would expose signals. This example is based on the performance test from Jackson Dunstan’s CallbackTest which I borrowed (I hope that’s ok!):
[sourcecode language=”actionscript3″]
// Interface for your class that might have listeners for the SignalTyped.
// Make one of these per listener type.
interface IRunnable {
function run(): void;
}
// Custom Slot has a specific property for the Runnable class.
class RunnableSlot extends SlotLite
{
public function RunnableSlot( runnable:IRunnable ) {
this.runnable = runnable;
}
public var runnable:IRunnable = new EmptyRunnable;
}
// An empty IRunnable class for first node.
class EmptyRunnable implements IRunnable {
public function run():void {};
}
// You need one of these per dispatch type.
class RunnableSignal extends SignalTyped
{
// last and first must be set to the typed Slot.
public function RunnableSignal() {
last = first = new RunnableSlot;
}
// implement the dispatch method to call the runnable prop directly
// It’s easy to have it take and dispatch any type you want – with compile time type checking!
public function dispatchRunnable():void
{
var node:RunnableSlot = first as RunnableSlot;
while ( node = (node.next as RunnableSlot) ) {
node.runnable.run(); // FAST!
}
}
}
[/sourcecode]
That’s all necessary for the implementation requirements – a lot of boilerplate, I admit. Then you expose that in a class that might use it all:
[sourcecode language=”actionscript3″]
class MyDisplayObject
{
// could probably make this a getter..
public var signaled:RunnableSignal = new RunnableSignal;
}
[/sourcecode]
Now for the consumer to use this, it’s just a bit more boilerplate than a normal signal:
[sourcecode language=”actionscript3″]
class MyConsumerOfSignalLite implements IRunnable // boilerplate point 1
{
public function MyConsumerOfSignalLite()
{
var dspObj:MyDisplayObject = new MyDisplayObject();
// add the signal (boilerplate point 2 – normal)
dspObj.signaled.addSlot( this );
}
// boilerplate 3 – normal, but more strict – naming is specific – FAST!
public function run(): void {
// do whatever when signals
}
}
// boilerplates 2 and 3 are normal for any signal, except the strictness of #3
[/sourcecode]
What’s cool about this is you get compile time type checking for your method signature, and the performance improvement that comes with skipping those checks at runtime.
I’m also thinking about a slightly different signal API that would be more like the Robot Legs’ contract system – think signals by contract – I’m working on it. Since we would be implementing a defined interface per signal type, we could boil the add methods and signal nodes down to one method to add all the listeners of a single object – one add method per dispatching class, instead of one per signal on the dispatching class. This could lead to a reduction in boilerplate. We’d filter by interface type instead of using multiple signal.add nodes and methods. So – improved runtime performance, reduction in (usage) boilerplate (if not implementation) and compile time type checking. I love it!
Note – I tested none of the example in this post, and the code in github is all very early stage stuff. The performance-test class works though – give it a try!
Oh, here’s the github repo:
https://github.com/CaptainN/SignalsLite
Some caveats – this only tests instances of flash.display.Bitmap on the display list, at the size they are, moving the way they move. It’s on my list to add Blitting (I have some initial work on that done, thanks to Iain, but I need to add the rotation, and alpha settings to it), and I’d like to add a vector test, and maybe some extra sized Bitmaps (I’ve heard that makes a difference).
Enough! Here are some results – quality had no effect on GPU mode, so I included only one line:
Note: some are reporting they see a difference in GPU mode, but I still don’t. Update: It appears some users are confusing “Mobile Performance Tester” with BunnyMark, which explains the discrepancy. BunnyMark is not currently in any App Store, which is one key distinguishing feature.
BunnyMark Results – 500 Bunnies | |||||||||
---|---|---|---|---|---|---|---|---|---|
Alpha | ✓ | ✓ | ✓ | ||||||
Rotation | ✓ | ✓ | ✓ | ✓ | |||||
CaB | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
CaBM | ✓ | ✓ | |||||||
iPhone 3GS – GPU | |||||||||
FPS | 24 | 18 | 17 | 22 | 13 | 13 | 19 | 1 | 19 |
iPhone 3GS – CPU | |||||||||
FPS-L | 28 | 21 | 19 | 9 | 19 | 5 | 7 | 5 | 5 |
FPS-M | 28 | 21 | 19 | 4 | 18 | 3 | 7 | 5 | 3 |
FPS-H | 28 | 21 | 19 | 3 | 18 | 2 | 7 | 5 | 2 |
FPS-B | 28 | 21 | 19 | 3 | 18 | 2 | 7 | 5 | 2 |
iPhone 4 (Retina) – GPU | |||||||||
FPS | 25 | 21 | 20 | 25 | 13 | 13 | 16 | 0.5 | 16 |
iPhone 4 (Retina) – CPU | |||||||||
FPS-L | 32 | 23 | 20 | 10 | 21 | 6 | 8 | 6 | 7 |
FPS-M | 32 | 23 | 20 | 5 | 20 | 3 | 8 | 6 | 3 |
FPS-H | 32 | 23 | 19 | 4 | 20 | 2 | 8 | 6 | 2 |
FPS-B | 32 | 23 | 19 | 4 | 20 | 2 | 8 | 6 | 2 |
Notes about the Benchmark:
It’d be nice to have more benchmarks for more devices, but I only have the above devices available. This should run just fine on Android, Blackberry Playbook, and iPads. If anyone wants to contribute a set of benchmarks, hit the comments. Here is the source. One of these days I’ll make another post, and try to draw some conclusions, maybe wrap the bullet points into a narrative, and edit some of this, but the tables are there, and the source code, and that’s the important stuff.
In the midst of playing with this benchmark, I found (or was pointed at) some great resources. Here are some of them:
Here is the Benchmark to see it in action:
This won’t run at 60FPS on Android Flash Player plugin in the browser (or Firefox on Mac!) – this post is about the iPhone build – but here’s the web version to look at anyway.
Here is a blurry video of the thing running as a native iPhone app on a 3GS (I smoothed out the choppy splash transition in a later build by setting the BG element with cacheAsBitmapMatrix):
The most important thing was to make sure GPU acceleration was working, and to learn what things will impact performance in that area.
It turns out, there are some important differences with how GPU accelerated Flash woks compared with the traditional software renderer. In the software Flash renderer keeping your display list shallow, and sparse (using addChild/removeChild a lot) or avoiding the display list completely (by writing to BitmapData – as the Flixel game engine does) is a key optimization for performance. This is how the exploding bunny video demo is done, and why it’s so fast.
My current theory is that on GPU accelerated content (even on desktop) the reverse is true. You want to avoid CPU/system RAM to GPU/video RAM updates as much as possible – which means avoiding BitmapData updates which cause the player to upload a new texture to the GPU VRAM with every update. Because I don’t have access to the internals of the Flash Player architecture, I can’t be sure, but I think the bottle neck comes from clogging up the lanes between the CPU and GPU, and all stressing all three areas of the rendering pipeline (CPU, GPU and the bus) as they juggle around objects in memory. The key observation this conclusion is based on is the large performance impact addChild and removeChild has on the framerate. So I relentlessly avoid that in my iPhone Flash development – I precache everything, and don’t mess with the display list. This is also one reason why filters (which operate on a BitmapData representation of the DisplayObject you apply them to) are not recommended on mobile content.
Anyway, hopefully I can turn this into a full app for iPhone in a reasonable timeframe.
]]>[cc lang=’actionscript3′]
import flash.display.DisplayObjectContainer;
import flash.display.MovieClip;
function stopAll(content:DisplayObjectContainer):void
{
if (content is MovieClip)
(content as MovieClip).stop();
if (content.numChildren)
{
var child:DisplayObjectContainer;
for (var i:int, n:int = content.numChildren; i < n; ++i)
{
if (content.getChildAt(i) is DisplayObjectContainer)
{
child = content.getChildAt(i) as DisplayObjectContainer;
if (child.numChildren)
stopAll(child);
else if (child is MovieClip)
(child as MovieClip).stop();
}
}
}
}
[/cc]
The plan was to use that to stop playing movies, every other frame, and restart them on the alternative frames, but there is apparently no way to tell if a MovieClip is currently playing or not, to know which ones to restart.
I did end up hacking Tweener to add support for a Timer based update method. That way I can adjust the stage FPS to match the older timeline content (at 12fps) and have my Tweener based interactions work at a silky smooth 60 FPS.
I'll post more on that later.