Actionscript 3.0 – unFocus Projects – Kevin Newman and Ken Newman

Fast AS3 Signals with SignalsLite

I was playing around and ended up writing a lite Signals class (ok, 3 classes). The set works like a basic AS3 Signal, minus most of the extra functionality of AS3 Signals (run-time dispatching argument type checking as one example). The goal was to create a very fast Signal dispatcher, with very little overhead, and to dispatch with absolutely no heap allocation (check, check and check) – targeted mostly for mobile (AIR). Regular AS3 Signals does well, but it seemed to have a lot of extra stuff that I don’t need – and this was a fun kind of exercise anyway.

Some quick numbers from the performance-test with 1,000,000 iterations on a Core 2 Duo 2.6GHz (in milliseconds):

Func call time: 15
Runnable call time: 5
Event (1 listener) time: 863
Signal (1 listener) time: 260
SignalLite (1 listener) time: 232
RunnableSignal (1 listener) time: 56

Func call (10 listeners) time: 190
Runnable call (10 listeners) time: 399
Event (10 listeners) time: 2757
Signal (10 listeners) time: 741
SignalLite (10 listeners) time: 725
RunnableSignal (10 listeners) time: 221

The bold line is a vanilla SignalLite, and the line above Robert Penner’s AS3 Signals. They are pretty close, but SignalsLite takes a modest edge. But let’s look at the same test on iOS (iPhone 4S) with 100,000 iterations:

Func call time: 171
Runnable call time: 26
Event (1 listener) time: 3723
Signal (1 listener) time: 789
SignalLite (1 listener) time: 481
RunnableSignal (1 listener) time: 117

Func call (10 listeners) time: 2004
Runnable call (10 listeners) time: 1892
Event (10 listeners) time: 9217
Signal (10 listeners) time: 4030
SignalLite (10 listeners) time: 2074
RunnableSignal (10 listeners) time: 498

On iPhone you can see that SignalLite is almost twice as fast as AS3 Signals – a more substantial difference than on desktop. I’m not sure why that is, maybe the AOT compiler can optimize something about SignalLite better – IDK, but it sure is fast!

Then there’s that last line in each group – RunnableSignal. Now your talking speed. That one also solves a particular problem with function callback systems that they all seem to have – there is no compile time function signature checking. You have to wait until the thing runs, and then find out you are taking the wrong number of arguments, or the wrong type, etc. But, solving one problem (compile time type checking), solves the other (speed), and that brings us to SignalTyped which RunnableSignal in the test above extends (I’ll probably rename at some point).

SignalTyped is beginnings of a fast executing type safe implementation of AS3 Signals. The idea is, you extend 2 classes – SignalTyped and SlotLite. SignalTyped is effectively an abstract class – you must extend it and implement the dispatch method, and the constructor (at least for now, I’m looking for better ways to handle this). It takes a bit of boilerplate to implement this in a class that would expose signals. This example is based on the performance test from Jackson Dunstan’s CallbackTest which I borrowed (I hope that’s ok!):

[sourcecode language=”actionscript3″]
// Interface for your class that might have listeners for the SignalTyped.
// Make one of these per listener type.
interface IRunnable {
function run(): void;
}

// Custom Slot has a specific property for the Runnable class.
class RunnableSlot extends SlotLite
{
public function RunnableSlot( runnable:IRunnable ) {
this.runnable = runnable;
}
public var runnable:IRunnable = new EmptyRunnable;
}

// An empty IRunnable class for first node.
class EmptyRunnable implements IRunnable {
public function run():void {};
}

// You need one of these per dispatch type.
class RunnableSignal extends SignalTyped
{
// last and first must be set to the typed Slot.
public function RunnableSignal() {
last = first = new RunnableSlot;
}

// implement the dispatch method to call the runnable prop directly
// It’s easy to have it take and dispatch any type you want – with compile time type checking!
public function dispatchRunnable():void
{
var node:RunnableSlot = first as RunnableSlot;
while ( node = (node.next as RunnableSlot) ) {
node.runnable.run(); // FAST!
}
}
}
[/sourcecode]

That’s all necessary for the implementation requirements – a lot of boilerplate, I admit. Then you expose that in a class that might use it all:

[sourcecode language=”actionscript3″]
class MyDisplayObject
{
// could probably make this a getter..
public var signaled:RunnableSignal = new RunnableSignal;
}
[/sourcecode]

Now for the consumer to use this, it’s just a bit more boilerplate than a normal signal:

[sourcecode language=”actionscript3″]
class MyConsumerOfSignalLite implements IRunnable // boilerplate point 1
{
public function MyConsumerOfSignalLite()
{
var dspObj:MyDisplayObject = new MyDisplayObject();
// add the signal (boilerplate point 2 – normal)
dspObj.signaled.addSlot( this );
}

// boilerplate 3 – normal, but more strict – naming is specific – FAST!
public function run(): void {
// do whatever when signals
}
}
// boilerplates 2 and 3 are normal for any signal, except the strictness of #3
[/sourcecode]

What’s cool about this is you get compile time type checking for your method signature, and the performance improvement that comes with skipping those checks at runtime.

I’m also thinking about a slightly different signal API that would be more like the Robot Legs’ contract system – think signals by contract – I’m working on it. Since we would be implementing a defined interface per signal type, we could boil the add methods and signal nodes down to one method to add all the listeners of a single object – one add method per dispatching class, instead of one per signal on the dispatching class. This could lead to a reduction in boilerplate. We’d filter by interface type instead of using multiple signal.add nodes and methods. So – improved runtime performance, reduction in (usage) boilerplate (if not implementation) and compile time type checking. I love it!

Note – I tested none of the example in this post, and the code in github is all very early stage stuff. The performance-test class works though – give it a try!

Oh, here’s the github repo:
https://github.com/CaptainN/SignalsLite

Performance Benchmarks with AIR 2.7 for iOS

I’ve been working on this Benchmark based on Iain Lobb’s BunnyMark. Being a bit confused sometimes about what things speed things up or slow things down, I didn’t want to guess anymore, so I grabbed Iain’s code base (cause I’m lazy, and didn’t want to start from scratch), and added some tests for things I suspect are slowing things down (or speeding things up). I think this will also help shed some light on why some folks see a huge gain in AIR 2.7 CPU mode, while others do not.

Some caveats – this only tests instances of flash.display.Bitmap on the display list, at the size they are, moving the way they move. It’s on my list to add Blitting (I have some initial work on that done, thanks to Iain, but I need to add the rotation, and alpha settings to it), and I’d like to add a vector test, and maybe some extra sized Bitmaps (I’ve heard that makes a difference).

Enough! Here are some results – quality had no effect on GPU mode, so I included only one line:

Note: some are reporting they see a difference in GPU mode, but I still don’t. Update: It appears some users are confusing “Mobile Performance Tester” with BunnyMark, which explains the discrepancy. BunnyMark is not currently in any App Store, which is one key distinguishing feature. 😉

BunnyMark Results – 500 Bunnies
Alpha							✓	✓	✓
Rotation				✓	✓	✓			✓
CaB		✓	✓		✓	✓		✓
CaBM			✓			✓
iPhone 3GS – GPU
FPS	24	18	17	22	13	13	19	1	19
iPhone 3GS – CPU
FPS-L	28	21	19	9	19	5	7	5	5
FPS-M	28	21	19	4	18	3	7	5	3
FPS-H	28	21	19	3	18	2	7	5	2
FPS-B	28	21	19	3	18	2	7	5	2
iPhone 4 (Retina) – GPU
FPS	25	21	20	25	13	13	16	0.5	16
iPhone 4 (Retina) – CPU
FPS-L	32	23	20	10	21	6	8	6	7
FPS-M	32	23	20	5	20	3	8	6	3
FPS-H	32	23	19	4	20	2	8	6	2
FPS-B	32	23	19	4	20	2	8	6	2

Notes about the Benchmark:

In general, the CPU mode seems pretty consistent with the way you’d expect things to work on the desktop – the same optimizations you’d apply for the browser plugin, you’d also apply to mobile for CPU mode.
Rotation in this benchmark is not continuous – the Bunny graphics are only rotated at the edge of the stage, which is why cacheAsBitmap works to speed those up. If they were constantly updated, it would likely be much more expensive on CPU mode (probably more like rotation without CaB).
Alpha is continuous – the alpha value of each Bunny is based on the y position and is updated every frame. I would like to add a mode similar to the rotation, so see what effect CaB has on alpha transparent objects that don’t constantly change.
iPhone 4 and 3GS numbers aren’t directly comparable for practical purposes. The Bitmaps on the screen on 3GS take up much more real estate, since the 3GS screen res has 1/4 as many pixels as the iPhone 4. In a normal app, we’d probably resize things to look comparable between the two devices. I’ll try to add a mode that makes this more comparable (because I suspect we’ll find that 3GS can keep up with iPhone 4 with similar looking content).
Touching the screen seems to cost about 4 fps across the board.
I think there may be an issue with returning to rotation = 0 costing some performance in GPU mode. Still have to test that.
I’m definitely getting some variance on default speeds – basically, before any settings are messed with on some runs I get the faster numbers (the baseline numbers in the tables above). Other times it runs at default settings a couple of FPS slower (on start, or after resetting the switches). With any of the settings, everything is consistent across multiple runs.

It’d be nice to have more benchmarks for more devices, but I only have the above devices available. This should run just fine on Android, Blackberry Playbook, and iPads. If anyone wants to contribute a set of benchmarks, hit the comments. Here is the source. One of these days I’ll make another post, and try to draw some conclusions, maybe wrap the bullet points into a narrative, and edit some of this, but the tables are there, and the source code, and that’s the important stuff.

In the midst of playing with this benchmark, I found (or was pointed at) some great resources. Here are some of them:

Understanding GPU Rendering in Adobe AIR for Mobile – Ton of good info here.
Mobile Performance Tester — Now Live in App Stores – This is very similar to what I’ve done, and has some of the same kinds of tests. It shows some different results though, which I’m a bit confused about. I suspect it has something to do with using larger images, which have more alpha transparency (smoother edges). I intend to look at it more closely to see what’s up in there.
Original BunnyMark by Iain Lobb
Source for BunnyMark for AIR Mobile

Here is the Benchmark to see it in action:

unBrix Alpha in Android Marketplace!!

My first Android app is in the market place! Built with Adobe AIR, unBrix Alpha is a quick take on the classic breakout style game. This is more of a “lite” game at this point (hence the “Alpha” suffix), but it is already more complete than many of the other Arkanoid clones in iOS App Store. There also seems to be some last minute performance problems on the Android version. :-/ I guess that’s what I get for only testing on an iPhone for most of the development. I’ll have fixes to that soon. I’m pretty sure it’s related to the scaleMode I set in Flash – the problem is if I set that the faster mode – NO_SCALE – it’s way too small on most Android devices. I’ll probably need to add some manual sizing based on measurement. Of course non of this was needed on the iPhone version.

Download unBrix Alpha and let me know what you think! I’d provide a link, but I don’t know how.

Update: I Nerfed the framerate a bit to get it to run a little smoother. I think the problem will be solved better by setting NO_SCALE, but I’ll have to do that another time (probably when I get to iPad port!). I also fixed the red line, and the icon too (I don’t know that didn’t show up last time). There is a report of the paddle jumping to one side when some users remove their finger from the screen. I haven’t been able to reproduce, but please let me know if this happens to you! Here is a link to unBrix Alpha on appbrain (it isn’t showing the update yet).

Update 2: I switched to CPU rendering, because it seems as though GPU rendering is just slower on Android devices than CPU rendering – at least in this kind of game. Anyway, this solved a lot of problems, including missing text and missing affects. I also had to set a fullScreenRect to match the original intended size of the game (iPhone 3Gs size). Doing these two things cleaned up most of the performance issues and graphics glitches. I’ll work on getting the remainder of the basics in place, like proper shutdowns – so this doesn’t run in the background like it does now (didn’t have to worry about that for iOS!).

Flash iPhone Game at Silky 60FPS on 3GS

Well, it’s only a tech demo at the moment. I’ve been playing with this Breakout like game for a while, trying to learn the ins and outs of Flash mobile development – particularly as it relates to performance. I now have the unBrix demo running at close to 60FPS (59.1) – smooth as silk.

This won’t run at 60FPS on Android Flash Player plugin in the browser (or Firefox on Mac!) – this post is about the iPhone build – but here’s the web version to look at anyway.

Here is a blurry video of the thing running as a native iPhone app on a 3GS (I smoothed out the choppy splash transition in a later build by setting the BG element with cacheAsBitmapMatrix):

The most important thing was to make sure GPU acceleration was working, and to learn what things will impact performance in that area.

It turns out, there are some important differences with how GPU accelerated Flash woks compared with the traditional software renderer. In the software Flash renderer keeping your display list shallow, and sparse (using addChild/removeChild a lot) or avoiding the display list completely (by writing to BitmapData – as the Flixel game engine does) is a key optimization for performance. This is how the exploding bunny video demo is done, and why it’s so fast.

My current theory is that on GPU accelerated content (even on desktop) the reverse is true. You want to avoid CPU/system RAM to GPU/video RAM updates as much as possible – which means avoiding BitmapData updates which cause the player to upload a new texture to the GPU VRAM with every update. Because I don’t have access to the internals of the Flash Player architecture, I can’t be sure, but I think the bottle neck comes from clogging up the lanes between the CPU and GPU, and all stressing all three areas of the rendering pipeline (CPU, GPU and the bus) as they juggle around objects in memory. The key observation this conclusion is based on is the large performance impact addChild and removeChild has on the framerate. So I relentlessly avoid that in my iPhone Flash development – I precache everything, and don’t mess with the display list. This is also one reason why filters (which operate on a BitmapData representation of the DisplayObject you apply them to) are not recommended on mobile content.

Anyway, hopefully I can turn this into a full app for iPhone in a reasonable timeframe. 🙂

Frash shows Flash on iPhone can be Great (with Screenshots)

Warning: RANT ahead

Steve Jobs is full of crap. I could actually understand and respect a straightforward admission that the Flash Platform is a threat to Apple’s iOS business model – which is the real reason Jobs won’t let Flash on the iPhone and iPad. That’s not even a very good reason – the App Store has many compelling features on it’s own, even if Flash is in the browser, not the least of which is the easy to understand path to monitization. Performance is another issue – Flash is fast enough, faster than JS/Canvas by quite a bit, but it’s still not as fast as a native app and all it’s OpenGL goodness (among all the other great Apple APIs). Keeping Flash off iPhone (and especially CS5 iPhone apps) has nothing to do with performance, or compatibility. That’s just bunk! And nobody likes a liar.

/rant

Here are some screenshots to show how well many sites I’ve been involved with work in Flash on the iPhone (3GS using jailbreak and Frash).

adcSTUDIO's home page — adcSTUDIO’s Home Page

adcSTUDIO's about us section — adcSTUDIO’s About Us Screen

Terminal Design (before activating Frash)

Terminal Design's Home Page (with Flash) — Terminal Design’s Home Page (with Flash)

A quit note on the technology: this is using Frash, which is a hack (a wrapper or compatibility layer) to get the Android version of Flash Player 10.1 running inside of iOS – it has not been optimized (or even completed at this point) to run well on iOS yet, and probably will never run as well as it can on Android. A recent test app I’ve been playing with runs 50% faster on a Droid Eris, vs. the iPhone 3GS – the Eris is slower hardware, running Android 2.1. It’s also crashy, and is missing features like streaming movie support (it does work with videos embedded in a swf) – and touch events are not quite as streamlined as they are on a real Android device (hover works better on Android for example, and hot spots are easier to hit). It’s also got all the quirks of the Wii and desktop Flash Player’s “noscale” feature (on Android there is a workaround that solves this, not implemented in Frash yet).

I just like to point that out, because some users are judging the viability of Flash on iOS/iPhone/iPad based on this hack (which is current at version 0.02), which is beyond silly.

For anyone interested, I followed these instructions to install it on my iPhone. Note: this uses SSH and dpkg to install. If you don’t know how to reverse that, you may want to find an apt repo and use Cydia to grab this, so you can uninstall Frash when you are done playing, as it can be quite unstable.