iPhone – unFocus Projects – Kevin Newman and Ken Newman

Backstage2D – the GPU Augmented Flash Display List

I’ve been playing with some 2D API ideas built on top of Flash’s Stage3D and ActionScript 3.0. I call it Backstage2D, the GPU augmented Flash display list.

Currently, Backstage2D’s code base is mostly a proof-of-concept playground for some API ideas. Some stuff in this post may not match the git repo (for example, I’m still using “layer” instead of “surface”). There’s a bunch left to do, but it is working well enough to run a modified version of MoleHill_BunnyMark that some folks from Adobe put together (I actually lifted most of my GPU code from that example code, heh). The BunnyMark example was adapted from Iain Lobb’s BunnyMark, with some additions from Philippe Elsass. You can view the Backstage2D version of BunnyMark here (and check out the original BunnyMark MoleHill here).

Fork Backstage2D at GitHub.

The rest of this post describes the thought process that went into Backstage2D.

The Flash AS3 display list API is not the best way to utilize the massively parallel capabilities of a GPU, or to deal with the other limitations of a CPU/GPU architecture. The display list’s deeply nestable DisplayObject metaphor, and all the fun filters and blend modes, just don’t translate well to a very parallel, flat GPU hardware renderer. All of this is especially true on mobile devices like iPhones, iPads and Android devices, and that’s the primary target for Backstage2D.

With an API like the traditional flash display list, it’s easy to create situations that can’t easily be batched due to branching operations and other things which change the GPU state, and break parallel processing – slowing everything down. You see this in Adobe AIR’s GPU render mode, where seemingly random things can have a huge negative impact on performance. Behind the scenes AIR attempts to break the content into batches to speed things up. The use of certain features, or normal features in certain ways can drop you out of a batch. When performance degradation happens, it’s not always clear why. Because of that, to get great performance you must target just a subset of the normal features, and apply a lot of discipline to make sure everything keeps working smoothly.

I wanted to do something different. I wanted to play with an API that is intentionally unlike the Flash display list – one designed to help the implementor (Flash developer or designer) understand how to arrange their content, so that it renders very quickly, even on mobile devices – and still get the benefits of all the glorious Flash stuff we are all used to.

Here are some of the primary principles I came up with, which impact the API:

In order to take advantage of the parallel nature of GPUs, we need to batch many Quads (think, DisplayObject) into batches. The API should make batches easy to understand and use, so there’s no guessing about what’s going on.
GPUs like shallow content – they draw a lot of triangles all at the same time. There is no nesting on the GPU, so while some form of organization is necessary, the infinite nesting model must be reined in.
Backstage2D shouldn’t do too much in an automagic kind of way. Guessing about the impact of nesting things a certain way, or using a blend mode, or the performance impact of using certain features translates into extra effort and cost during production because of the unpredictable negative impact features can have on performance. Features should work as you expect them to, and the performance impacts of doing certain things should be clear.
Think of the GPU as a remote server that you send instructions to. Uploading things like Textures to the GPU from system memory is slow (especially on mobile). Backstage2D should make these stress points clear.
Flash’s vector engine is tops, and working with Bitmaps (and sprite sheets) sucks! The API should enable the continued use of the display list, in GPU friendly ways. Drawing vector art on the GPU is hard, and is ugly anyway. So leverage the CPU rasterizer, and make sure the API makes the GPU upload bandwidth and render time overhead clear.
Backstage2D objects shouldn’t look like traditional display list objects – we’ll use names other than Sprite, MovieClip, DisplayObject, etc.

Of these, batching is the starting point, since it is the most necessary for advanced performance, and affects how data must be organized the most. You can draw each Quad (think Flash DisplayObject, or Sprite) individually by setting up the vertex, program, texture, etc. data for each quad, and calling drawTriangles for each and every Sprite. But the GPU can’t optimize to run in parallel if you do that – most of its processing cores end up underutilized in that model.

Batching lets more than one quad be drawn simultaneously, but there are limitations – every item in a batch must use the same vector data (a giant array of x, y, and other data), a single texture, and the same state information, like blend modes. Additionally, the entire batch must be drawn without interruption, which means you can’t insert items from other batches (with other state settings like a different blend mode) in the middle of the batch.
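To make the "giant array" idea concrete, here is a minimal sketch in Python (not ActionScript, and not Backstage2D's actual code) of packing many quads into one flat vertex list and one index list, so the whole set could be submitted in a single draw call. The `Quad` class and `build_batch` function are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Quad:
    # Hypothetical stand-in for one on-screen element.
    x: float
    y: float
    width: float
    height: float


def build_batch(quads):
    """Pack every quad into one flat vertex list and one index list."""
    vertices = []  # x, y for each corner, four corners per quad
    indices = []   # two triangles (six indices) per quad
    for i, q in enumerate(quads):
        vertices += [
            q.x, q.y,                        # top left
            q.x + q.width, q.y,              # top right
            q.x + q.width, q.y + q.height,   # bottom right
            q.x, q.y + q.height,             # bottom left
        ]
        base = i * 4  # index of this quad's first corner
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices


quads = [Quad(x=i * 10.0, y=0.0, width=8.0, height=8.0) for i in range(500)]
vertices, indices = build_batch(quads)
print(len(vertices), len(indices))  # 4000 3000
```

All 500 quads end up in the same two arrays, which is why they also have to share one texture and one set of state: there is only one draw call to attach that state to.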

So batches resemble layers, or surfaces. The model for Backstage2D will be a series of stacked surfaces, instead of a deeply nested tree structure starting at the root.

In this paradigm, the surface gets batched, and the children it contains get rendered in parallel – perfect for GPUs. To eliminate batch breaking APIs, certain “state changing” operations can be applied only to an entire surface, not to each element – operations like blend mode settings – or adding and removing elements from a surface. The limitations of the surface API should help the implementor understand the impact of doing certain things. If you need to have 100 elements, and every other element has a blend mode of Multiply, while the one below it has a blend mode of Normal – in the traditional Flash API, this is fine, and can actually run pretty well. On the GPU, all 100 elements must be rendered individually in 100 distinct surfaces. Having that many surfaces feels heavy because it is heavy.
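The 100 element example can be checked with a few lines of Python. This is only a sketch of the counting argument, assuming draw order has to be preserved so that only neighbouring elements with the same blend mode can share a batch; `count_batches` is a hypothetical helper, not part of any real API.

```python
from itertools import groupby


def count_batches(blend_modes):
    """Count runs of consecutive elements that share a blend mode.

    Draw order is preserved, so only neighbours can share a batch.
    """
    return sum(1 for _ in groupby(blend_modes))


# Every other element is Multiply, the one below it is Normal.
alternating = ["normal" if i % 2 == 0 else "multiply" for i in range(100)]
# The same 100 elements, arranged so each blend mode sits together.
grouped = ["normal"] * 50 + ["multiply"] * 50

print(count_batches(alternating))  # 100
print(count_batches(grouped))      # 2
```

Regrouping changes the stacking order, so it is not a free optimization. The point is only that the surface count makes the cost visible up front.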

Texture changes are one of the things that break batching – a Shader can deal with only one texture (well, actually up to 8 – but that makes the pixel shader more expensive to run), so a set of elements in a batch must be combined into a sprite sheet or texture atlas. If you’ve tried to use a texture atlas in another 2D rendering engine, you may have noticed these are a pain to deal with – and usually it requires setting them up manually before compilation. This is one thing that Backstage2D handles for you – at runtime – in an automagic kind of way.
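For a rough idea of what packing an atlas at runtime involves, here is a minimal "shelf" packer in Python. It is a generic textbook technique, not necessarily what Backstage2D does, and `pack_shelves` and its parameters are hypothetical names for this sketch.

```python
def pack_shelves(sizes, atlas_width):
    """Place (width, height) rectangles left to right on horizontal shelves.

    Returns the (x, y) position of each rectangle and the atlas height used.
    """
    positions = []
    x = y = shelf_height = 0
    for w, h in sizes:
        if w > atlas_width:
            raise ValueError("rectangle is wider than the atlas")
        if x + w > atlas_width:
            # No room left on this shelf: start a new one below it.
            y += shelf_height
            x = shelf_height = 0
        positions.append((x, y))
        x += w
        shelf_height = max(shelf_height, h)
    return positions, y + shelf_height


sizes = [(64, 64), (32, 48), (64, 32), (16, 16)]
positions, height = pack_shelves(sizes, atlas_width=128)
print(positions)  # [(0, 0), (64, 0), (0, 64), (64, 64)]
print(height)     # 96
```

A real packer would sort by height first to waste less space, and would likely round the final size up to a power of two before uploading the texture.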

This feature was actually done for a bunch of reasons. One thing I’d like to add is a resolution (screen DPI) independent measurement mode, where assets get generated on each device an app runs on, from high quality vector art, for exactly the DPI the system is running at, and scaled to real life sizes. Type specified at 12-point should truly measure 12 points.
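The arithmetic behind "12-point should measure 12 points" is small enough to sketch. A typographic point is 1/72 of an inch, so the pixel size depends only on the screen's DPI. The function name is hypothetical, and the DPI figures below are the commonly published ones for these two phones, used here purely as example inputs.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch


def points_to_pixels(points, screen_dpi):
    """Convert a physical size in points to pixels on a given screen."""
    return points / POINTS_PER_INCH * screen_dpi


# Example inputs only: commonly quoted pixel densities.
for name, dpi in [("iPhone 3GS", 163), ("iPhone 4", 326)]:
    print(name, round(points_to_pixels(12, dpi), 1))
# iPhone 3GS 27.2
# iPhone 4 54.3
```

The same 12-point type needs twice as many pixels on the denser screen, which is the argument for rasterizing vector art per device instead of shipping one prerendered size.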

Additionally, Flash vector art looks great (especially with the new 16X16 quality modes), but it looks its best when rendered to match the screen exactly. Resizing prerendered vector art can ruin its beautiful anti-aliasing. Proper sizing can also help performance with older hardware like an iPhone 3GS, which is actually pretty capable, but doesn’t cope well with iPhone 4 retina screen sized material (4x more pixels than will be displayed).

Setting all this up is expensive – especially generating the sprite sheet. But just setting up vector data and loading even predigested textures is already expensive enough that you wouldn’t want to do those tasks while your app is running some smooth animation – it will cause missed frames, and your users will notice. So Backstage2D’s API should guide the user to avoid doing expensive things while an app or animation is running. It exposes build, load, and/or upload commands per layer. That way, the implementer always knows that what they are doing is computationally expensive (down the road, the plan would be to move much of that into concurrency – more on that another time).
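Here is one way the "expensive steps are explicit" idea could look, sketched in Python. The `Surface` class and its `build`, `upload` and `render` methods are hypothetical and only mirror the commands named above. They are not the actual Backstage2D API, and the bodies are stand-ins for the real work.

```python
class Surface:
    """Hypothetical surface with explicit, visibly expensive setup steps."""

    def __init__(self, actors):
        self.actors = list(actors)
        self.atlas = None       # CPU-side sprite sheet, set by build()
        self.uploaded = False   # True once the atlas is "on the GPU"

    def build(self):
        # Expensive: stands in for rasterizing vector art into a sprite sheet.
        self.atlas = {name: "pixels:" + name for name in self.actors}
        self.uploaded = False

    def upload(self):
        # Expensive: stands in for copying the atlas into GPU memory.
        if self.atlas is None:
            raise RuntimeError("call build() before upload()")
        self.uploaded = True

    def render(self):
        # Cheap per-frame path. It refuses to do hidden setup work.
        if not self.uploaded:
            raise RuntimeError("call upload() before render()")
        return len(self.actors)  # quads drawn in this surface's single batch


surface = Surface(["bunny", "carrot"])
surface.build()    # do this on a loading screen
surface.upload()   # and this too
print(surface.render())  # 2
```

Because `render()` raises instead of quietly building or uploading, a dropped frame caused by setup work shows up as an obvious error during development and not as a mystery stutter.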

The characteristics of this are very different from normal Flash, which is to load only the minimum of what’s needed, when it’s needed, and try to keep as much as you can off the display list. In the Backstage2D model (in the standard surface type anyway), an entire surface, and all its children (called “Actors” to avoid colliding with AS3’s “Sprite”, etc.) get rendered up front to a big TextureAtlas, and stored in memory or on disk. How to optimize and organize your assets to avoid running out of memory becomes an entirely different matter from the way to optimize for the CPU. A surface will have an associated sprite sheet bitmapData asset though, which can be measured.

With these restrictions in mind, the idea would be to create a variety of surface types to suit differing kinds of content. For classic content, a standard static Quads surface (done!), still frame animations (sprite sheet animations – generated at runtime), tweened animations (inverse kinematics – the bone tool), and streaming animations (dynamic MovieClips, large MovieClips, or video) – maybe even some surfaces useful for standard UI, like scroll panes. For more advanced 2D assets, a variety of different mesh layer types could be added (that’s where GPU stage3D programming gets fun!).

I’d love to flesh this out with more features, including an animation subsystem that would include a couple of different Animation display types. Alas, free time is short, and I’ll probably never get to it. But I already spent a lot of time on this (I broke my foot, and was couch bound for a while) so thought I’d share where I got to. 🙂

Flash and AIR, Nothing But Opportunity

Preface: I wrote this one of the last few times the “Flash is dead” thing made the media rounds, because it seems as though many participants in the discussion are simply missing the bigger picture: that the market for rich interactive work is splitting between app store apps (native applications), and desktop browser-based apps (websites), and that those divisions are deep enough to require different development mindsets. The post is overly long – I don’t have an editor – but I figured I’d post it in its current draft state, since this keeps coming up, and so I don’t have to noodle with it anymore. 😛 So here it is. (Instant update: Lee Brimelow has said similar things in fewer words on his blog. Update 2: Thibault Imbert chimes in. Update 3: Mike Chambers rolls the narrative. Now back to making awesome!).

In the technology business, if you aren’t looking ahead, you are being left behind. There is a fundamental shift occurring in the content technology space, where Flash and HTML live their happy lives. This shift has mostly been explained using old terms, like “apps” and “HTML5 vs. Flash” – these explanations miss the point. They all describe how things were yesterday and are today, but miss how they will be tomorrow. The browser has been, and is today, the primary means of application and content delivery. A new set of opportunities for delivering content is changing all that. The Split puts the traditional desktop browser market on one side, and app stores on new platforms, with new hardware, and new interface paradigms on the other.

App stores should be more broadly called content stores, because the line between apps and other kinds of content is pretty thin. Market specific content stores have been around for a while already on the desktop. Game shops like Steam and Direct2Drive already make up the lion’s share of the PC games market, and iTunes was already a form of an app store, before apps were apps.

The companies behind every platform are adopting app stores, including all major operating systems on traditional PCs, such as OSX, and soon Windows. Open source trail blazers like Ubuntu have actually had something like app stores for a long while now. Additionally, more and more types of content are being pulled into them, from apps, to music and movies, to magazines and local newspapers. The models for monetization are so much clearer, and the tools to take advantage of the various models are already built, and for the consumer, very convenient. App stores are the new reality.

To really understand why this is happening, and what it means for those of us who make a living in the weeds, we need to understand where we are, and how we got here.

The PC Era

In the early days of personal computing, “applications” (or “programs”) were the hot action. You needed something to do with your new beige personal computer (PC), so you bought (or borrowed) software or other types of content on diskettes, and later CD-ROM (oh the magic) and installed that software to run on your PC or Mac. It was an offline process, but it was the only realistic way to go. Even if you had access to the internet, you weren’t going to download megabytes of data over your cutting edge 14.4 kbps fax/modem connection. Traditional forms of acquisition ruled in those days. You had to take yourself to the store, and buy a box or a publication or whatever else, to obtain content – probably paying with cash.

When the internet hit mainstream in the 90s, and data speeds increased, the transition from “applications” delivered through boxed diskettes, to continuously updated “websites” began. The internet had some advantages over boxed content. The biggest was that accessing a web site through the internet was exceedingly convenient for consumers. Far more convenient than traveling to the store and buying a box with a CD of clip art on it. For content producers there is also a sense of limitless shelf space compared with traditional retail outlets, so they were quick to try to carve out advantage there. Search engines and content indexing services like Google and Yahoo! made a killing on both ends by providing a way for content producers to get their content in front of users.

Broadband completed the transition. At the dawn of the new millennium, “the internet” became the primary means of content and application delivery (aside from a few important smaller markets like games and productivity apps). The browser was the primary means of application and content delivery, and for good reason. The content is easy to access from multiple platforms, and is super convenient. All you need is an internet connection, and a browser.

A Flash of brilliance

At around the same time, Microsoft mostly won “the browser wars” with Internet Explorer 6, and basically stopped forward movement in their browser and, for many years, on the internet. In the meantime, the commerce of the browser era’s “website” based economy was able to mature. The stagnant development of the dominant browser platform created a challenging environment, one in which it’s easy to see why Flash was able to thrive.

Flash brought many improvements over the browser, through constant performance and scriptability advancements, as well as significant additional features the browsers in the aggregate simply couldn’t match – video being an important notable feature. Additionally, Flash provided consistency across browsers and operating systems, and comparably great performance, when measured against HTML and JavaScript. A browser-based app simply couldn’t (still can’t) match it. Flash in the browser became the go-to platform for serious interactive work on the internet. You just couldn’t get similar levels of awesome out of IE6 and the rest of the browsers of the time.

All good things

The split started to happen in 2006. On the PC, which really means in the PC browser, Adobe was getting more serious about the application space in the browser by releasing the first version of Flash with AVM2 (and ActionScript 3.0), a much more stable foundation than ActionScript 2.0 had been, along with an update to its application framework, Flex, that took advantage of the improvements in ActionScript 3.0. This helped move trends in Flash’s direction, as seemingly every great site was built using the plugin technology. IE7 had come out that same year, but it only added to developers’ pain in the short-term, and it still wasn’t the robust interoperable platform that the browser ecosystem needed to compete in the applications space. So in that space, movement continued toward Flash.

This could be considered the golden age for Flash. Flash ruled the content space during that time, in everything from banner ads, to browser-based games, to anything dealing with charts, and data (so-called RIAs), to just about all the video delivery on the internet.

Browsers didn’t come without problems. They have been slow to innovate, incompatible with one another – universally slow, buggy and crashy – and often full of horrible security holes (especially IE – the dominant player). They were mired in standards battles, forks, company and social politics (open source/EU fines) – but mostly, the leader – Microsoft with IE6, just held everything up. On top of all that, it was difficult for content producers, like traditional newspapers, to find revenue sources other than ad systems. The market was set for change.

That’s about when Apple fired the first warning shots across the bows of the PC browser fleet, by releasing the first iPhone, which could browse the internet, but didn’t run Flash. A brand new platform – software and hardware, with a brand new interface paradigm – touch, instead of mouse and keyboard. This would be a platform built from the lessons of the browser era, and it provided a wide open space for Apple to do what it does best. They rapidly iterated on their ecosystem, and came up with the overwhelmingly successful App Store, a system that seemingly everyone wanted in on. This was a system that came with multiple obvious revenue systems built right in – app sales, technology cross-licensing, advertising, etc. – all things that could be done in the browser space, but that the app store made exceedingly convenient, to both producers and consumers. Apple catered to that demand masterfully, and over time expanded opportunities to include in-app purchases, magazine publishing platforms, and subscription services, among others.

In the same way the internet – the modern PC era – had provided enough advantages over the previous content delivery systems to overshadow any of its shortcomings, the App Store model would provide enough promise to overshadow its possible shortcomings measured against its predecessor. App stores proved so compelling, and so big a threat to the existing browser-based models, it almost immediately ended a cozy relationship between Apple and Google, who ruled the browser era, as the gatekeeper to content, and the owner of essentially all advertising on the web. Google moved quickly to duplicate the app system for Android, and the other platform makers – WebOS, and Microsoft Windows Phone 7 Series – have been playing catch-up ever since. Eventually, Apple brought the app store system to the desktop in OSX Lion, and even Microsoft is picking it up in their Windows 8 Metro interface for full app store coverage in the traditional PC markets.

The rapidly evolving iPhone (later iOS) platform created new ways to think about a lot of things. The most important new things were app and content delivery, and revenue sources through new monetization strategies. The Apple App Store changed everything.

The end of an era

When Apple released the iPad in April, 2010, Steve Jobs announced the “end of the PC era.” With the release of the iPad Apple did nothing less than complete and publish the rule book rewrite they began with the iPhone. More than anyone else, the folks at Apple seemed to understand that there is a divide between the “PC era” – which is really the “PC browser era” – and the new app store era. They understood that these two are on two different trajectories, and the app store era will supersede the browser. From now on, for better or for worse, applications would exist in App Stores, and websites would just be websites.

In the same month Apple released the iPad, Steve Jobs followed up with an open letter titled “Thoughts on Flash”, which highlighted some of the negatives of the browser-based “PC era,” where Flash was settling in as the dominant platform. The letter also exploited a division between the Flash crowd and the standards and open source crowds. And he directly addressed the “full web,” – Adobe’s tone-deaf name for “the PC era”. In that direct critique Jobs highlighted the advantages of the new app store model, by putting the “full web” Flash apps in the “free” – or unprofitable – box, and painting the technology with the old brush. Even the main part of the label “PC” is an old term, from a time that came before the modern browser era.

That letter was truly a brilliant piece of market positioning magic, but it was ultimately unnecessary, and Apple has since backed off. The app store model provides a marvelous promise without the need to denigrate the old browser based economy. Content makers, all of whom struggled to find revenue from websites, now have multiple new revenue streams to explore, through app sales, and licensing, and other kinds of content transactions within apps.

During the PC Era, although browsers dominated users’ mind share and time on the PC, native applications were still the clear leaders in performance, access to hardware, and close integration with the underlying OS platform. Despite that advantage, native apps were hamstrung by seemingly insurmountable inconvenience – the boxed distribution model – an inconvenience that most online distribution stores of the time simply duplicated (download, unzip setup, run setup, store setup file somewhere in case you lose your hard drive, etc.).

App stores solve these native application distribution problems by providing a central hub for content, simple e-commerce (no more credit card into the random unverified website), and can be integrated with the legacy system – the website.

My head hurts.

So what does this all mean for us, the front line Flash developers? It means opportunity. There are now three platforms to develop for!

Yeah, that’s right – three.

The transition to app stores on the desktop will take a while to roll out, old habits die hard, and Flash will stick around in that space for... well, as long as that space exists. There is still a chunk of 98%+ of users out there on the internet, still accessing the web through their existing PCs. That won’t change overnight. Even as initiatives like Microsoft’s plugin blockade with Windows 8 and Metro mode take effect, they will come hand in hand with app stores, so there’s a workaround.

But let’s get real for a second, the Flash Player – in the browser – sits at the core of entire new lucrative markets on the PC, in the browser. Take browser era social gaming and Zynga – a game company with a quirky social media, micro-transaction game library, integrated with Facebook’s social platform, that is more profitable than top traditional PC game companies like E.A. Flash in the browser is having a grand time. Stage3D was just released, and Unreal Engine was shown running on it at MAX. Flash is still tops for the best kinds of awesome on the internet.

Second, you have all the HTML5 opportunity – not directly relevant for Flash devs (yet), but for those of us that have had their hands in both worlds this whole time, this is exciting! HTML, JavaScript and CSS are finally getting to the point where you can build really awesome stuff with it. And, for app store monetization to work, discovery is key. Searchable HTML (and HTML5) will dominate for that. App stores are easy to search and easy to link into – from a website. Websites aren’t going anywhere – in every way, the app store model can’t work without the browser based internet.

And finally, the new kid on the block, the app store. For Flash devs, that means AIR – which is essentially Flash for app stores. If you have Flash (or even HTML) skills to burn, you can almost just recompile your Flash app for AIR. Adobe has built this amazing tool – the best kept secret they didn’t mean to keep (don’t get me started on their PR). The sky is the start with AIR for Mobile, never mind the limit (Apollo indeed). The best part is, once you build for one app store with AIR, you can build for basically all of them, with very little additional effort.

Have a look at Machinarium. A traditionally packaged standalone desktop app, made with Flash, and distributed in a box through traditional outlets (and the specialty PC app stores, like Steam) with an online demo that runs in the desktop browser in Flash Player. Now republished for the Apple App Store with AIR and some optimizations, to run on iPad as a native app.

So where are we? Flash is alive and kicking – thriving even – despite the clueless ramblings of know-nothing media pundits and their bandwagon seeking behavior. You don’t need to listen to them, just get out there, and make cool apps/websites/games/whatever else with the same technology you’ve always used. These are exciting times.

Performance Benchmarks with AIR 2.7 for iOS

I’ve been working on this Benchmark based on Iain Lobb’s BunnyMark. Being a bit confused sometimes about what things speed things up or slow things down, I didn’t want to guess anymore, so I grabbed Iain’s code base (cause I’m lazy, and didn’t want to start from scratch), and added some tests for things I suspect are slowing things down (or speeding things up). I think this will also help shed some light on why some folks see a huge gain in AIR 2.7 CPU mode, while others do not.

Some caveats – this only tests instances of flash.display.Bitmap on the display list, at the size they are, moving the way they move. It’s on my list to add Blitting (I have some initial work on that done, thanks to Iain, but I need to add the rotation, and alpha settings to it), and I’d like to add a vector test, and maybe some extra sized Bitmaps (I’ve heard that makes a difference).

Enough! Here are some results – quality had no effect on GPU mode, so I included only one line:

Note: some are reporting they see a difference in GPU mode, but I still don’t. Update: It appears some users are confusing “Mobile Performance Tester” with BunnyMark, which explains the discrepancy. BunnyMark is not currently in any App Store, which is one key distinguishing feature. 😉

BunnyMark Results – 500 Bunnies
(CaB = cacheAsBitmap, CaBM = cacheAsBitmapMatrix; FPS-L/M/H/B = CPU mode at low, medium, high and best quality; ✓ = setting on, - = setting off)

Test          1     2     3     4     5     6     7     8     9
Alpha         -     -     -     -     -     -     ✓     ✓     ✓
Rotation      -     -     -     ✓     ✓     ✓     -     -     ✓
CaB           -     ✓     ✓     -     ✓     ✓     -     ✓     -
CaBM          -     -     ✓     -     -     ✓     -     -     -

iPhone 3GS – GPU
FPS          24    18    17    22    13    13    19     1    19
iPhone 3GS – CPU
FPS-L        28    21    19     9    19     5     7     5     5
FPS-M        28    21    19     4    18     3     7     5     3
FPS-H        28    21    19     3    18     2     7     5     2
FPS-B        28    21    19     3    18     2     7     5     2

iPhone 4 (Retina) – GPU
FPS          25    21    20    25    13    13    16   0.5    16
iPhone 4 (Retina) – CPU
FPS-L        32    23    20    10    21     6     8     6     7
FPS-M        32    23    20     5    20     3     8     6     3
FPS-H        32    23    19     4    20     2     8     6     2
FPS-B        32    23    19     4    20     2     8     6     2

Notes about the Benchmark:

In general, the CPU mode seems pretty consistent with the way you’d expect things to work on the desktop – the same optimizations you’d apply for the browser plugin, you’d also apply to mobile for CPU mode.
Rotation in this benchmark is not continuous – the Bunny graphics are only rotated at the edge of the stage, which is why cacheAsBitmap works to speed those up. If they were constantly updated, it would likely be much more expensive on CPU mode (probably more like rotation without CaB).
Alpha is continuous – the alpha value of each Bunny is based on the y position and is updated every frame. I would like to add a mode similar to the rotation, to see what effect CaB has on alpha transparent objects that don’t constantly change.
iPhone 4 and 3GS numbers aren’t directly comparable for practical purposes. The Bitmaps on the screen on 3GS take up much more real estate, since the 3GS screen res has 1/4 as many pixels as the iPhone 4. In a normal app, we’d probably resize things to look comparable between the two devices. I’ll try to add a mode that makes this more comparable (because I suspect we’ll find that 3GS can keep up with iPhone 4 with similar looking content).
Touching the screen seems to cost about 4 fps across the board.
I think there may be an issue with returning to rotation = 0 costing some performance in GPU mode. Still have to test that.
I’m definitely getting some variance on default speeds – basically, before any settings are messed with on some runs I get the faster numbers (the baseline numbers in the tables above). Other times it runs at default settings a couple of FPS slower (on start, or after resetting the switches). With any of the settings, everything is consistent across multiple runs.
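To illustrate the difference between the "rotate only at the stage edge" and "alpha follows y every frame" behaviors described in the notes, here is a small Python simulation that just counts how often each property changes for one bunny. It is a sketch under made-up assumptions (stage size, speeds, a 15 degree turn per bounce, and the alpha formula are all invented for illustration). It is not the benchmark's code, and whether a given change actually invalidates a cached bitmap depends on the runtime and render mode.

```python
STAGE_WIDTH = 480.0   # assumed stage size, for illustration only
STAGE_HEIGHT = 320.0


def simulate(frames, speed_x=7.0, speed_y=3.0):
    """Move one bunny and count how often rotation and alpha change."""
    x = y = 0.0
    rotation, alpha = 0.0, 1.0
    rotation_changes = alpha_changes = 0
    for _ in range(frames):
        x += speed_x
        y += speed_y
        new_rotation = rotation
        if x < 0 or x > STAGE_WIDTH:
            # Bounce off a side edge: the only place rotation is touched.
            speed_x = -speed_x
            x = max(0.0, min(x, STAGE_WIDTH))
            new_rotation = (rotation + 15.0) % 360.0
        if y < 0 or y > STAGE_HEIGHT:
            speed_y = -speed_y
            y = max(0.0, min(y, STAGE_HEIGHT))
        # Alpha is recomputed from the y position on every frame.
        new_alpha = 0.3 + 0.7 * y / STAGE_HEIGHT
        if new_rotation != rotation:
            rotation_changes += 1
        if new_alpha != alpha:
            alpha_changes += 1
        rotation, alpha = new_rotation, new_alpha
    return rotation_changes, alpha_changes


rotation_changes, alpha_changes = simulate(frames=300)
print(rotation_changes, alpha_changes)
```

With these made-up numbers the rotation changes only a handful of times over 300 frames, while the alpha changes on every single frame. A cache that survives between changes has a chance to pay off in the first case and none in the second.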

It’d be nice to have more benchmarks for more devices, but I only have the above devices available. This should run just fine on Android, Blackberry Playbook, and iPads. If anyone wants to contribute a set of benchmarks, hit the comments. Here is the source. One of these days I’ll make another post, and try to draw some conclusions, maybe wrap the bullet points into a narrative, and edit some of this, but the tables are there, and the source code, and that’s the important stuff.

In the midst of playing with this benchmark, I found (or was pointed at) some great resources. Here are some of them:

Understanding GPU Rendering in Adobe AIR for Mobile – Ton of good info here.
Mobile Performance Tester — Now Live in App Stores – This is very similar to what I’ve done, and has some of the same kinds of tests. It shows some different results though, which I’m a bit confused about. I suspect it has something to do with using larger images, which have more alpha transparency (smoother edges). I intend to look at it more closely to see what’s up in there.
Original BunnyMark by Iain Lobb
Source for BunnyMark for AIR Mobile

Here is the Benchmark to see it in action: