SignalsLite.js, Unit Testing, and Buggy IE

I decided to finally learn unit testing, so I downloaded QUnit (after looking at the 20,000 different unit testing options) and figured I’d try porting the tiny SignalsLite library to JavaScript to see how the process goes.

While doing that, I found a crazy IE7/IE8 JS bug that I’m sure has had me scratching my head in the past. Here is a quick unit test to show the problem:

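test( "Basic Requirements", function testReqs() {
	expect(1);
	var T;
	(function makeT() {
		// In JScript (IE8 and earlier), the named function expression below also
		// leaks a function declaration named T into makeT's own scope, so T
		// inside makeT is no longer the outer T.
		T = function T(){};
		T.prototype.test = 1;
	})();
	ok((new T).test, "Instance of exported T should have prototype methods");
});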

If you run that in IE7 or IE8, it fails!

The cool thing is, without having created unit tests for SignalsLite.js, I would never have known that could be an issue, and instead would continue to scratch my head when stuff like that broke in IE7/8. I found this because I was trying to export SignalLite from within a closure (I try to always define my stuff inside of closures to avoid namespace pollution), with this:

(function() { "use strict"; // standard header

// naming inline functions makes the debug console easier to read.
window.SignalLite = function SignalLite() {
	// stuff
};
SignalLite.prototype = {
	// methods
};

// The fix is to use an anonymous function expression, or to declare
// function SignalLite() {} normally and export it afterward:
// window.SignalLite = SignalLite;

})();

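That doesn’t work in IE7 and IE8 because of a long-standing JScript bug: a named function expression also leaks a separate function declaration with the same name into the enclosing scope. Inside the closure, SignalLite refers to that second function object, so the prototype gets set on it instead of on the one exported as window.SignalLite. Unit testing is crazy!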
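Here is a minimal sketch of the "export elsewhere" fix mentioned in the comment in the snippet above: declare the constructor as a plain function declaration, finish its prototype, and export it last. It is not the actual SignalsLite.js source. `root` is an assumed stand-in for `window` so the snippet also runs outside a browser, and `test` is the same placeholder prototype member the QUnit test checks.

```javascript
// "root" stands in for window so this also runs outside a browser (assumption).
var root = typeof window !== "undefined" ? window : globalThis;

(function () {
  "use strict";

  // A plain function declaration creates exactly one function object,
  // even in JScript, so the prototype set below is the one callers get.
  function SignalLite() {}

  // Placeholder member standing in for the real methods (mirrors the unit test).
  SignalLite.prototype.test = 1;

  // Export last, after the prototype is complete.
  root.SignalLite = SignalLite;
})();

// The check from the QUnit test, without QUnit:
console.log(new root.SignalLite().test === 1); // true
```

Because only one function object ever exists, there is nothing for the named-function-expression bug to split apart.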

If you are interested, go fork SignalsLite.js on GitHub.

P.S. You can run the SignalsLite.js unit tests here to see the failure for yourself! I disabled that test in the SignalsLite.js tests.

Backstage2D – the GPU Augmented Flash Display List

I’ve been playing with some 2D API ideas built on top of Flash’s Stage3D and ActionScript 3.0. I call it Backstage2D, the GPU-augmented Flash display list.

Currently, Backstage2D’s code base is mostly a playground for proof-of-concept API ideas. Some stuff in this post may not match the git repo (for example, I’m still using “layer” instead of “surface”). There’s a bunch left to do, but it works well enough to run a modified version of MoleHill_BunnyMark that some folks from Adobe put together (I actually lifted most of my GPU code from that example code, heh). The BunnyMark example was adapted from Iain Lobb’s BunnyMark, with some additions from Philippe Elsass. You can view the Backstage2D version of BunnyMark here (and check out the original BunnyMark MoleHill here).

Fork Backstage2D at GitHub.

The rest of this post describes the thought process that went into Backstage2D.

The Flash AS3 display list API is not the best way to utilize the massively parallel capabilities of a GPU, or to deal with the other limitations of a CPU/GPU architecture. The display list’s deeply nestable DisplayObject metaphor, and all the fun filters and blend modes, just don’t translate well to a highly parallel, flat GPU hardware renderer. All of this is especially true on mobile devices like iPhones, iPads and Android devices, which are the primary target for Backstage2D.

With an API like the traditional Flash display list, it’s easy to create situations that can’t be batched, due to branching operations and other things that change the GPU state and break parallel processing, slowing everything down. You see this in Adobe AIR’s GPU render mode, where seemingly random things can have a huge negative impact on performance. Behind the scenes, AIR attempts to break the content into batches to speed things up. Using certain features, or using ordinary features in certain ways, can drop you out of a batch. When performance degrades, it’s not always clear why. Because of that, to get great performance you must target just a subset of the normal features, and apply a lot of discipline to make sure everything keeps running smoothly.

I wanted to do something different. I wanted to play with an API that is intentionally unlike the Flash display list – one designed to help the implementor (Flash developer or designer) understand how to arrange their content so that it renders very quickly, even on mobile devices – and still get the benefits of all the glorious Flash stuff we are all used to.

Here are some of the primary principles I came up with, which impact the API:

  • In order to take advantage of the parallel nature of GPUs, we need to group many Quads (think DisplayObjects) into batches. The API should make batches easy to understand and use, so there’s no guessing about what’s going on.
  • GPUs like shallow content – they draw a lot of triangles all at the same time. There is no nesting on the GPU, so while some form of organization is necessary, the infinite nesting model must be reined in.
  • Backstage2D shouldn’t do too much in an automagic kind of way. Having to guess at the performance impact of nesting things a certain way, using a blend mode, or using certain features translates into extra effort and cost during production. Features should work as you expect them to, and the performance impacts of doing certain things should be clear.
  • Think of the GPU as a remote server that you send instructions to. Uploading things like Textures to the GPU from system memory is slow (especially on mobile). Backstage2D should make these stress points clear.
  • Flash’s vector engine is tops, and working with Bitmaps (and sprite sheets) sucks! The API should enable the continued use of the display list, in GPU friendly ways. Drawing vector art on the GPU is hard, and is ugly anyway. So leverage the CPU rasterizer, and make sure the API makes the GPU upload bandwidth and render time overhead clear.
  • Backstage2D objects shouldn’t look like traditional display list objects – we’ll use names other than Sprite, MovieClip, DisplayObject, etc.

Of these, batching is the starting point, since it is the most necessary for advanced performance, and it affects how data must be organized the most. You can draw each Quad (think Flash DisplayObject, or Sprite) individually by setting up the vertex, program, texture, etc. data for each quad, and calling drawTriangles for each and every Sprite. But the GPU can’t optimize to run in parallel if you do that – most of its processing cores end up underutilized in that model.

Batching lets more than one quad be drawn simultaneously, but there are limitations. Every item in a batch must share the same vertex data (a giant array of x, y, and other data), a single texture, and other state information, like blend modes. Additionally, the entire batch must be drawn without interruption, which means you can’t insert items from other batches (with other state settings like a different blend mode) in the middle of the batch.

So batches resemble layers, or surfaces. The model for Backstage2D will be a series of stacked surfaces, instead of a deeply nested tree structure starting at the root.

In this paradigm, the surface gets batched, and the children it contains get rendered in parallel – perfect for GPUs. To eliminate batch-breaking APIs, certain “state changing” operations, such as blend mode settings or adding and removing elements from a surface, apply only to an entire surface rather than to each element. The limitations of the surface API should help the implementor understand the impact of doing certain things. Suppose you need 100 elements, where every other element has a blend mode of Multiply while the one below it has a blend mode of Normal. In the traditional Flash API this is fine, and can actually run pretty well. On the GPU, all 100 elements must be rendered individually in 100 distinct surfaces. Having that many surfaces feels heavy because it is heavy.
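A small sketch of why alternating blend modes explode into separate batches. It assumes each quad carries only a `texture` and a `blendMode` (hypothetical field names, not Backstage2D’s data model) and starts a new batch whenever either changes, because a batch has to be drawn without interruption using one state.

```javascript
// Groups an ordered list of quads into draw batches (illustrative sketch).
// Draw order is preserved; a new batch starts whenever the GPU state
// (texture or blend mode) differs from the previous quad's state.
function buildBatches(quads) {
  var batches = [];
  var current = null;
  for (var i = 0; i < quads.length; i++) {
    var q = quads[i];
    if (!current || current.texture !== q.texture || current.blendMode !== q.blendMode) {
      current = { texture: q.texture, blendMode: q.blendMode, quads: [] };
      batches.push(current);
    }
    current.quads.push(q);
  }
  return batches;
}

// 100 quads on one texture, alternating Multiply and Normal.
var alternating = [];
for (var i = 0; i < 100; i++) {
  alternating.push({ texture: "atlas", blendMode: i % 2 ? "normal" : "multiply" });
}
console.log(buildBatches(alternating).length); // 100

// The same 100 quads, all Normal, collapse into a single batch.
var uniform = alternating.map(function (q) {
  return { texture: q.texture, blendMode: "normal" };
});
console.log(buildBatches(uniform).length); // 1
```

Grouping only consecutive quads (instead of sorting by state) keeps the painter’s order intact, which is exactly why the interleaved case cannot be merged.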

Texture changes are one of the things that break batching – a Shader can deal with only one texture (well, actually up to 8 – but that makes the pixel shader more expensive to run), so a set of elements in a batch must be combined into a sprite sheet or texture atlas. If you’ve tried to use a texture atlas in another 2D rendering engine, you may have noticed these are a pain to deal with – and usually it requires setting them up manually before compilation. This is one thing that Backstage2D handles for you – at runtime – in an automagic kind of way.
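Backstage2D’s actual atlas-building code isn’t described here, so this is only a sketch of one simple approach a runtime atlas builder could take: a "shelf" packer that places rectangles left to right in rows, tallest first. The rectangle shape (`{id, w, h}`) and the function name are assumptions for illustration, and it ignores any maximum atlas height.

```javascript
// Minimal shelf packer (illustrative). Rectangles are sorted by height,
// then placed in rows; when a row is full, a new shelf opens below it.
function packShelves(rects, atlasWidth) {
  var sorted = rects.slice().sort(function (a, b) { return b.h - a.h; });
  var x = 0, y = 0, shelfHeight = 0;
  var placements = [];
  for (var i = 0; i < sorted.length; i++) {
    var r = sorted[i];
    if (r.w > atlasWidth) throw new Error("rect " + r.id + " is wider than the atlas");
    if (x + r.w > atlasWidth) {
      // Current shelf is full: move down by its height and start a new one.
      y += shelfHeight;
      x = 0;
      shelfHeight = 0;
    }
    placements.push({ id: r.id, x: x, y: y, w: r.w, h: r.h });
    x += r.w;
    shelfHeight = Math.max(shelfHeight, r.h);
  }
  return { placements: placements, height: y + shelfHeight };
}

var packed = packShelves([
  { id: "a", w: 64, h: 64 },
  { id: "b", w: 64, h: 32 },
  { id: "c", w: 32, h: 32 }
], 128);
console.log(packed.height); // 96
```

Shelf packing wastes some space compared with smarter packers, but it is fast and predictable, which matters when packing happens on the device at runtime.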

This feature was actually done for a bunch of reasons. One thing I’d like to add is a resolution (screen DPI) independent measurement mode, where assets get generated on each device an app runs on, from high quality vector art, at exactly the DPI the system is running at, and scaled to real-life sizes. Type specified at 12-point should truly measure 12 points.

Additionally, Flash vector art looks great (especially with the new 16x16 quality modes), but it looks its best when rendered to match the screen exactly. Resizing prerendered vector art can ruin its beautiful anti-aliasing. Proper sizing can also help performance on older hardware like the iPhone 3GS, which is actually pretty capable, but doesn’t cope well with material sized for the iPhone 4’s Retina screen (4x more pixels than will be displayed).
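The arithmetic behind "12-point should truly measure 12 points" is short: a typographic point is 1/72 inch, so the pixel size depends only on the screen’s density. This sketch uses the published densities of the iPhone 3GS (163 ppi) and iPhone 4 (326 ppi); the function names are made up for illustration.

```javascript
// 1 point = 1/72 inch, so points convert to device pixels via pixels per inch.
function pointsToPixels(points, dpi) {
  return points * dpi / 72;
}

// How many times more pixels an asset needs at a higher density
// (the linear scale factor applies to both width and height).
function pixelCountRatio(highDpi, lowDpi) {
  var linear = highDpi / lowDpi;
  return linear * linear;
}

console.log(pointsToPixels(12, 163).toFixed(2)); // "27.17" (iPhone 3GS)
console.log(pointsToPixels(12, 326).toFixed(2)); // "54.33" (iPhone 4)
console.log(pixelCountRatio(326, 163));          // 4
```

The last line is where the "4x more pixels" figure comes from: doubling the linear density quadruples the pixel count a 3GS would have to push for Retina-sized art.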

Setting all this up is expensive, especially generating the sprite sheet. But even setting up vertex data and loading predigested textures is already expensive enough that you wouldn’t want to do those tasks while your app is running a smooth animation – it will cause missed frames, and your users will notice. So Backstage2D’s API should guide the user to avoid doing expensive things while an app or animation is running. It exposes build, load, and/or upload commands per layer. That way, the implementor always knows that what they are doing is computationally expensive (down the road, the plan is to move much of that into concurrency – more on that another time).
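Here is a hypothetical sketch of how explicit setup phases keep costly work out of the per-frame path. None of these names (`Surface`, `build`, `upload`, `render`) are taken from the Backstage2D repo, and the expensive work is stubbed out; the point is only that `render()` refuses to do setup implicitly.

```javascript
// Hypothetical surface with explicit, expensive setup phases (names are illustrative).
// Lifecycle: empty -> built -> uploaded. Only render() is meant to run per frame.
function Surface(actors) {
  this.actors = actors;
  this.state = "empty";
  this.atlas = null;
}

// Expensive: rasterize the actors and pack them into an atlas (stubbed here).
Surface.prototype.build = function () {
  this.atlas = { frames: this.actors.length };
  this.state = "built";
};

// Expensive: send the atlas to the GPU (stubbed here).
Surface.prototype.upload = function () {
  if (this.state !== "built") throw new Error("call build() before upload()");
  this.state = "uploaded";
};

// Cheap, per frame: fails loudly instead of quietly doing setup mid-animation.
Surface.prototype.render = function () {
  if (this.state !== "uploaded") {
    throw new Error("surface not uploaded; call build() and upload() before animating");
  }
  return this.actors.length; // stand-in for a single batched draw call
};

var surface = new Surface(["bunny", "carrot"]);
surface.build();   // do this during a loading screen
surface.upload();
console.log(surface.render()); // 2
```

Throwing on a missed phase is one design choice; an engine could instead log a warning, but either way the expensive step stays visible to the implementor.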

The characteristics of this are very different from normal Flash, where you load only the minimum of what’s needed, when it’s needed, and try to keep as much as you can off the display list. In the Backstage2D model (in the standard surface type, anyway), an entire surface and all its children (called “Actors” to avoid colliding with AS3’s “Sprite”, etc.) get rendered up front to a big TextureAtlas, and stored in memory or on disk. How to optimize and organize your assets to avoid running out of memory becomes an entirely different matter from how you optimize for the CPU. A surface will have an associated sprite sheet BitmapData asset, though, which can be measured.
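As a rough sketch of that measurement: an uncompressed 32-bit texture costs 4 bytes per pixel, and Stage3D’s standard `Texture` requires power-of-two width and height, so the padded size is what counts. The helper names are hypothetical, and the estimate ignores mipmaps (which add roughly one third) and any copy kept in system memory.

```javascript
// Smallest power of two >= n (n >= 1).
function nextPowerOfTwo(n) {
  var p = 1;
  while (p < n) p *= 2;
  return p;
}

// Estimated GPU bytes for an uncompressed 32-bit atlas, padded to
// power-of-two dimensions. Mipmaps are not included.
function atlasBytes(width, height) {
  return nextPowerOfTwo(width) * nextPowerOfTwo(height) * 4;
}

console.log(atlasBytes(1000, 600)); // 4194304 (1024 x 1024 x 4, i.e. 4 MB)
```

Note how a 1000x600 sheet pays for a full 1024x1024 texture; trimming it to fit 1024x512 would halve the cost.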

With these restrictions in mind, the idea would be to create a variety of surface types to suit differing kinds of content. For classic content, a standard static Quads surface (done!), still frame animations (sprite sheet animations – generated at runtime), tweened animations (inverse kinematics – the bone tool), and streaming animations (dynamic MovieClips, large MovieClips, or video) – maybe even some surfaces useful for standard UI, like scroll panes. For more advanced 2D assets, a variety of different mesh layer types could be added (that’s where GPU stage3D programming gets fun!).

I’d love to flesh this out with more features, including an animation subsystem that would include a couple of different Animation display types. Alas, free time is short, and I’ll probably never get to it. But I already spent a lot of time on this (I broke my foot, and was couch bound for a while) so thought I’d share where I got to. :-)

Scripts n Styles Update 3.1

Scripts n Styles received a major update today. The two big features added are LESS.js support and Dynamic Shortcodes! The “Global” Settings page now has a LESS editor with syntax highlighting (via CodeMirror) and on-the-fly compiling, so you can see how it will be output on the theme side. The per-page meta box has gained a new tab in which you can create one-off shortcodes containing arbitrary HTML content.

Scripts n Styles is a free, open source GPL project that you can fork and contribute to on GitHub! (You can also fork and contribute to CodeMirror and LESS.js.)

As a shortcode example, I placed the following HTML into the Shortcodes tab and gave it the name “tweet test”:

<a href="https://twitter.com/share" data-via="WraithKenny" data-size="large" data-related="unFocusProjects" data-hashtags="ScriptsnStyles">Tweet</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>

I then use the shortcode [sns_shortcode name="tweet test"] to display the Tweet button.

Fast AS3 Signals with SignalsLite

I was playing around and ended up writing a lite Signals class (ok, 3 classes). The set works like a basic AS3 Signal, minus most of the extra functionality of AS3 Signals (run-time dispatching argument type checking as one example). The goal was to create a very fast Signal dispatcher, with very little overhead, and to dispatch with absolutely no heap allocation (check, check and check) – targeted mostly for mobile (AIR). Regular AS3 Signals does well, but it seemed to have a lot of extra stuff that I don’t need – and this was a fun kind of exercise anyway.
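The core idea of a lightweight signal (listeners kept in a linked list behind an empty first node, which is also the shape of the AS3 `first`/`next` code further down) can be sketched in JavaScript. This is an illustration, not the SignalsLite source: `LiteSignal` and `Slot` are hypothetical names, removal is left out, and "no allocation" here only means the dispatch loop itself creates no objects.

```javascript
// Minimal linked-list signal (illustrative, not the SignalsLite source).
// add() allocates one node per listener; dispatch() only walks existing
// nodes and passes a single argument, so the loop creates no objects.
function Slot(listener) {
  this.listener = listener;
  this.next = null;
}

function LiteSignal() {
  // Empty sentinel node, so add() never special-cases an empty list.
  this.first = this.last = new Slot(null);
}

LiteSignal.prototype.add = function (listener) {
  var slot = new Slot(listener);
  this.last.next = slot;
  this.last = slot;
  return slot;
};

LiteSignal.prototype.dispatch = function (value) {
  var node = this.first;
  // Skip the sentinel, then call each listener in the order it was added.
  while ((node = node.next)) {
    node.listener(value);
  }
};

var signal = new LiteSignal();
var log = [];
signal.add(function (v) { log.push("a" + v); });
signal.add(function (v) { log.push("b" + v); });
signal.dispatch(1);
console.log(log); // [ 'a1', 'b1' ]
```

Avoiding `arguments` or rest parameters in `dispatch` is deliberate: those can create a fresh array on every call, which is exactly the per-dispatch garbage a mobile-focused signal wants to avoid.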

Some quick numbers from the performance-test with 1,000,000 iterations on a Core 2 Duo 2.6GHz (in milliseconds):

Func call time: 15
Runnable call time: 5
Event (1 listener) time: 863
Signal (1 listener) time: 260
SignalLite (1 listener) time: 232
RunnableSignal (1 listener) time: 56

Func call (10 listeners) time: 190
Runnable call (10 listeners) time: 399
Event (10 listeners) time: 2757
Signal (10 listeners) time: 741
SignalLite (10 listeners) time: 725
RunnableSignal (10 listeners) time: 221

The SignalLite line is a vanilla SignalLite, and the line above it is Robert Penner’s AS3 Signals. They are pretty close, but SignalLite has a modest edge. But let’s look at the same test on iOS (iPhone 4S) with 100,000 iterations:

Func call time: 171
Runnable call time: 26
Event (1 listener) time: 3723
Signal (1 listener) time: 789
SignalLite (1 listener) time: 481
RunnableSignal (1 listener) time: 117

Func call (10 listeners) time: 2004
Runnable call (10 listeners) time: 1892
Event (10 listeners) time: 9217
Signal (10 listeners) time: 4030
SignalLite (10 listeners) time: 2074
RunnableSignal (10 listeners) time: 498

On iPhone, SignalLite is about 1.6 times as fast as AS3 Signals with one listener, and almost twice as fast with 10 listeners, a more substantial difference than on desktop. I’m not sure why that is; maybe the AOT compiler can optimize something about SignalLite better. I don’t know, but it sure is fast!

Then there’s that last line in each group – RunnableSignal. Now you’re talking speed. That one also solves a problem that function callback systems all seem to have: there is no compile-time function signature checking. You have to wait until the thing runs, and then find out you are taking the wrong number of arguments, or the wrong type, etc. But solving one problem (compile-time type checking) solves the other (speed), and that brings us to SignalTyped, which RunnableSignal in the test above extends (I’ll probably rename it at some point).

SignalTyped is the beginning of a fast, type-safe implementation of AS3 Signals. The idea is that you extend 2 classes – SignalTyped and SlotLite. SignalTyped is effectively an abstract class – you must extend it and implement the dispatch method and the constructor (at least for now; I’m looking for better ways to handle this). It takes a bit of boilerplate to implement this in a class that would expose signals. This example is based on the performance test from Jackson Dunstan’s CallbackTest, which I borrowed (I hope that’s ok!):

// Interface for your class that might have listeners for the SignalTyped.
// Make one of these per listener type.
interface IRunnable {
	function run(): void;
} 

// Custom Slot has a specific property for the Runnable class.
class RunnableSlot extends SlotLite
{
	// runnable defaults to null so the empty first node can be built with "new RunnableSlot".
	public function RunnableSlot( runnable:IRunnable = null ) {
		if ( runnable ) this.runnable = runnable;
	}
	public var runnable:IRunnable = new EmptyRunnable;
}

// An empty IRunnable class for first node.
class EmptyRunnable implements IRunnable {
	public function run():void {}
}

// You need one of these per dispatch type.
class RunnableSignal extends SignalTyped
{
	// last and first must be set to the typed Slot.
	public function RunnableSignal() {
		last = first = new RunnableSlot;
	} 

	// implement the dispatch method to call the runnable prop directly
	// It's easy to have it take and dispatch any type you want - with compile time type checking!
	public function dispatchRunnable():void
	{
		// first is an empty sentinel slot, so the loop starts at first.next.
		var node:RunnableSlot = first as RunnableSlot;
		while ( node = (node.next as RunnableSlot) ) {
			node.runnable.run(); // FAST!
		}
	}
}

That’s everything needed to meet the implementation requirements – a lot of boilerplate, I admit. Then you expose the signal in a class that uses it:

class MyDisplayObject
{
	// could probably make this a getter..
	public var signaled:RunnableSignal = new RunnableSignal;
}

Now for the consumer to use this, it’s just a bit more boilerplate than a normal signal:

class MyConsumerOfSignalLite implements IRunnable // boilerplate point 1
{
	public function MyConsumerOfSignalLite()
	{
		var dspObj:MyDisplayObject = new MyDisplayObject();
		// add the listener, wrapped in its typed slot (boilerplate point 2 - normal)
		dspObj.signaled.addSlot( new RunnableSlot( this ) );
	} 
    
	// boilerplate 3 - normal, but more strict - naming is specific - FAST!
	public function run(): void {
		// do whatever when signals
	}
}
// boilerplates 2 and 3 are normal for any signal, except the strictness of #3

What’s cool about this is you get compile time type checking for your method signature, and the performance improvement that comes with skipping those checks at runtime.

I’m also thinking about a slightly different signal API that would be more like the Robotlegs contract system – think signals by contract – I’m working on it. Since we would be implementing a defined interface per signal type, we could boil the add methods and signal nodes down to one method that adds all the listeners of a single object: one add method per dispatching class, instead of one per signal on the dispatching class. This could reduce boilerplate. We’d filter by interface type instead of using multiple signal.add nodes and methods. So: improved runtime performance, less (usage) boilerplate (if not implementation boilerplate), and compile-time type checking. I love it!
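JavaScript has no interfaces or compile-time checks, so the following is only a loose, hypothetical sketch of the "one add method per dispatching class" part of that idea: the dispatcher lists the contracts (method names) it can call, and a single `addListener()` registers an object for every contract it implements. In AS3 the filter would presumably be an `is` check against each interface; here it is duck typing on method names, and every name in the code is made up for illustration.

```javascript
// Hypothetical "signals by contract" dispatcher (illustrative names only).
function ContractDispatcher(contracts) {
  this.lists = {};
  for (var i = 0; i < contracts.length; i++) {
    this.lists[contracts[i]] = [];
  }
}

// One call registers the object for every contract it implements.
// Returns how many contracts matched (0 means the object was ignored).
ContractDispatcher.prototype.addListener = function (obj) {
  var names = Object.keys(this.lists);
  var matched = 0;
  for (var i = 0; i < names.length; i++) {
    if (typeof obj[names[i]] === "function") {
      this.lists[names[i]].push(obj);
      matched++;
    }
  }
  return matched;
};

// Calls the contract method directly on each registered listener.
ContractDispatcher.prototype.dispatch = function (name, value) {
  var list = this.lists[name];
  for (var i = 0; i < list.length; i++) {
    list[i][name](value);
  }
};

var resizes = [];
var view = {
  onResize: function (size) { resizes.push(size); },
  onClose: function () {}
};
var dispatcher = new ContractDispatcher(["onResize", "onClose", "onScroll"]);
console.log(dispatcher.addListener(view)); // 2
dispatcher.dispatch("onResize", 320);
console.log(resizes); // [ 320 ]
```

Calling `list[i][name](value)` keeps `this` bound to the listener object, much like calling `node.runnable.run()` directly in the AS3 version rather than going through a stored closure.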

Note – I tested none of the examples in this post, and the code on GitHub is all very early-stage stuff. The performance-test class works, though – give it a try!

Oh, here’s the github repo:
https://github.com/CaptainN/SignalsLite

Adobe’s Flash/AIR Messaging Nightmare

Update: Mike Chambers posted an explanation and clarification on where Adobe is headed with Flash and AIR. Update 2: TechCrunch picks up (part of) the narrative.

I published an old post yesterday with my thoughts on the “Flash is Dead” thing that pops up routinely in media circles whenever something shakes things up (like an Apple ban on Flash, or Adobe dropping a supported platform). In that piece I optimistically highlighted the promise that AIR technology represents – it’s even in the title, “Flash and AIR, Nothing But Opportunity“. I really believe the technology embodies, and could fulfill, all the promise those of us down in the weeds perceive. I also believe that Adobe’s Flash Platform engineers and evangelists see that promise, and would like to see it fulfilled.

Yesterday Adobe unceremoniously dropped support for an entire class of platforms: no more Flash Player in mobile browsers. It’s not a terrible technical decision – working in AIR and native app land offers a ton more flexibility. It even makes business sense. Browser makers are increasingly hostile to Flash. Apple has never let it in the door on iOS (and never will), and Microsoft announced plans to run the browser without plugins in the Windows 8 Metro interface, even on PCs. Browsers have become hostile territory for Flash, so it makes sense to shift emphasis in the two directions the industry is headed: app store apps with AIR (which no one knows about) and HTML5 for browsers. In an important way, this does mean Flash is dead – it’s not going to be in the browser going forward. It really is out of Adobe’s control.

But there’s a problem. The longer Adobe bungles the messaging, the harder it is to say for sure whether there is a lack of commitment to their platform (including AIR), or if it is truly just a PR problem. This kind of announcement had an easy-to-predict effect on Flash’s brand, yet there was no attempt to get out in front of that narrative and show that they are committed to the larger “Flash Platform”, of which AIR is an important part going forward. In the non-technical parts of the industry (the media, managers, and the creative side of production teams), they all heard “Adobe Flash is out of mobile, use HTML5.” It’s even worse in client land, where the term “HTML5 app” is used regularly along with “app store”. This news was so damaging that clients with existing Flash content, which can easily be ported to the app space with AIR, are really freaking out. I can tell them about AIR all I want, but it’s hard for me to counteract all the media buzz (repetition is reality – brain science).

But what if they got the right message? This kind of move could represent a real intent on the part of Adobe’s leadership to get out of the Flash Platform altogether, and maybe out of the platform space entirely, and focus instead only on tooling to produce for the platform commons that HTML5 represents. Look at the kinds of decisions they’ve made recently. Adobe has essentially dropped internal support for their “Flash Platform” on every system platform they can, by either straight up dumping it (Linux, mobile Flash, TV), or by farming out porting and support to partners like RIM.

On the other hand, Adobe and Flash evangelists and engineers seem committed to the “Flash Platform” which in an un-articulated narrative (narrative – it’s how we think – more brain science), really means AIR in app stores (mobile and desktop), but I’m not sure I’m getting the same message from the real decision makers at Adobe. I don’t know if it’s intent, or just plain old bad PR judgement, but it feels like I’m standing on the greasy platform, and it’s getting pretty tough to hold my balance. Some folks are already sliding off.

I think they are in it for the long haul, and they’ve even built some of their own apps on the little-known Flash-based mobile app technology that is AIR. But guessing someone’s intent is problematic – that only makes the PR problem clearer. I shouldn’t have to guess.

It boils down to this. I know technology, and I know the Flash Platform. I know it has merit and potential. But if people can’t tell if the decision makers at Adobe are serious about supporting it into the future, it’s going to be a tough haul to convince anyone to build anything on that platform. I already know a few platforms, including HTML, so learning a new one isn’t scary, but I really prefer Flash and AIR because of its potential and even its legacy, which has value (despite the tar Steve Jobs dumped on it). If Adobe can’t or won’t make it clear that they are committed to AIR and the Flash Platform, I’ll have to find an alternative – and the decision won’t be mine. At this point, we need a clear, unambiguous statement of intent from Adobe: are you committed to the Flash Platform and AIR, or not? A public roadmap wouldn’t hurt either.

Home of Scripts 'n Styles for WordPress, Backstage2D and History Keeper!