He puts the abstract-style summary at the end:
- It’s slower than native code by a factor of about 5
- It’s comparable to IE8
- It’s slower than x86 C/C++ by a factor of about 50
- It’s slower than server-side Java/Ruby/Python/C# by a factor of about 10 if your program fits in 35MB, and it degrades exponentially from there
- The most viable path for it to get faster is by pushing the hardware to desktop-level performance. This might be viable long-term, but it’s looking like a pretty long wait.
- The language itself doesn’t seem to be getting faster these days, and people who are working on it are saying that with the current language and APIs, it will never be as fast as native code
- Garbage collection is exponentially bad in a memory-constrained environment. It is way, way worse than it is in desktop-class or server-class environments.
- Every competent mobile developer, whether they use a GCed environment or not, spends a great deal of time thinking about the memory performance of the target device
- If they did change their minds and allowed developers to think about memory, experience suggests this is a technically hard problem.
The genre of “Which is better: native or HTML5?” and “Who will win?”
Three criticisms about benchmarks
- JIT is not appreciably slower where it matters (benchmarks do not reflect real use).
- JIT gets better every day while native code does not; one day soon, JIT will be “faster than native.”
- Python, PHP, and Ruby (fully interpreted code) are already fast enough for ultra-high-scale server work, and this is single-user, so what’s the point?
Performance Baseline & Benchmarks
Performance Evolution and Possibilities
Language Tradeoffs: Native vs Managed
Managed languages optimize for developer productivity, with JIT compilation thrown in to recover some of the overhead; native languages don’t have that overhead. Even the proponents admit this. In archaeological order, not article order:
On Garbage Collection contra Explicit Memory Management
Hertz & Berger; “Quantifying the Performance of Garbage Collection vs. Explicit Memory Management”; OOPSLA 2005.
Claim: garbage collectors need 6x (or at least 4x) more memory than “is necessary” in order to be efficient enough for real-time, UX-type applications. See the chart where the relative memory footprint approaches 1x; consider whether 1.5x to 2x counts as “acceptable performance degradation.”
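The shape of the Hertz & Berger result can be illustrated with a toy cost model (my own sketch, not the paper’s methodology): a full collection is forced roughly every time cumulative allocation fills the headroom between the heap budget and the live data, so total GC work blows up as the heap multiplier approaches 1x.

```python
# Toy model (illustration only): collection frequency is inversely
# proportional to heap headroom (heap budget minus live data), so GC
# work grows sharply as the heap multiplier approaches 1x.

def gc_collections(live_mb, heap_multiplier, allocated_mb):
    """Rough count of full collections needed to allocate `allocated_mb`."""
    headroom = live_mb * (heap_multiplier - 1.0)
    if headroom <= 0:
        raise ValueError("heap budget must exceed live data")
    return allocated_mb / headroom

live = 30.0     # MB of live objects
churn = 3000.0  # MB allocated over the program's lifetime
for mult in (6.0, 4.0, 2.0, 1.5, 1.1):
    print(f"{mult:>4}x heap -> ~{gc_collections(live, mult, churn):7.1f} collections")
```

With a 6x heap the model does 20 collections; squeeze it to 1.5x and the same workload costs 200, an order of magnitude more GC work for the same program.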
How Much Memory is Available on iOS?
- iOSMemoryBudgetTest by Jan Ilavsky
- Observed limits in the field, on his gear
- iPhone 4S
- warned => ~40MB
- killed => ~213MB
- iPad 3
- warned => ~400MB
- killed => ~550MB
- Walk the scenarios against the limits
- iPhone 4S photos are 3264×2448 => 30MB/photo
- iPhone warning at “2 photos”
- iPhone kills at “7 photos.”
- iPad 3 video
- display is “between 2K and 4K”
- video frames => 12MB
- 45 frames uncompressed video animation => warn limit
- 1.5 sec @ 30 Hz (fps)
- 0.75 sec @ 60 Hz (fps)
- 1 sec of buffered video => killed
- Detecting the AirPlay Latency; On StackOverflow; 2012-04-03.
- But 2 sec is the observed AirPlay latency
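The arithmetic behind those scenarios, assuming 4 bytes per pixel (uncompressed 32-bit RGBA; the pixel depth is my assumption, not stated in the notes). Note that 45 frames × 12MB ≈ 540MB, which sits near the ~550MB kill threshold rather than the ~400MB warn threshold:

```python
import math

MB = 1024 * 1024
BYTES_PER_PIXEL = 4   # assumption: uncompressed 32-bit RGBA

def frame_mb(width, height):
    """Size of one uncompressed frame/photo at the given resolution, in MB."""
    return width * height * BYTES_PER_PIXEL / MB

photo = frame_mb(3264, 2448)              # iPhone 4S camera photo
print(f"photo: {photo:.1f} MB")           # about 30 MB, as in the notes
print(f"photos to hit 40 MB warn:  {math.ceil(40 / photo)}")
print(f"photos to hit 213 MB kill: {math.ceil(213 / photo)}")

frame = frame_mb(2048, 1536)              # iPad 3 display-sized video frame
print(f"iPad 3 frame: {frame:.1f} MB")    # 12 MB, as in the notes
frames = int(550 // frame)                # frames that fit under the kill limit
print(f"frames under 550 MB: {frames}"
      f" = {frames / 30:.2f} s @ 30 fps, {frames / 60:.2f} s @ 60 fps")
```

This reproduces the notes’ figures: the 2nd photo crosses the warn threshold, the 7th crosses the kill threshold, and 45 display-sized frames (1.5 s @ 30 fps, 0.75 s @ 60 fps) is all the uncompressed video that fits before being killed.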
- Multiple copies of the same photo in memory
Citing also the slide from Session 242, iOS App Performance: Memory, WWDC 2012
- The camera screen that shows you what the camera sees,
- the photo that the camera actually took,
- the buffer that you’re trying to fill with compressed JPEG data to write to disk,
- the version of the photo that you’re preparing for display in the next screen
- the version of the photo that you’re uploading to some server,
- the buffer that is going to hold a smaller photo suitable for display in the next screen,
- the buffer that resizes the photo in the background because it is too slow to do it in the foreground.
- Multiple copies of the same video frame in memory
Citing also Technical Q&A QA1708 Improving Image Drawing Performance on iOS
- Q: What can I do to improve my image drawing performance?
- “Every UIView is backed with a CALayer and images as layer contents remain in memory as long as the CALayer stays in the hierarchy.”
- Compare the iPad 3 display with a pure display (though those are larger, brighter, faster, etc.)
Packaging of ARM Technology
Addressing the need/ability to add more memory to an ARM PoP (package-on-package) in order to make garbage collection performant; i.e., can one get 6x more memory on some future hypothetical ARM PoP so that GC becomes performant enough to use?
In archaeological order
- Ask the Chromium developers in a Google Group
- The ECMAScript specification does not contain the word “allocation”; the only reference to “memory” essentially says that the entire subject is “host-defined.”
- The ECMAScript 6 wiki has several pages of draft proposals
- RubyMotion does this
- Called RM-3
- It “doesn’t work”
- Other Alternatives
- ASM.js, Mozilla
- Dart, Chrome
- PNaCl, Chrome
- WebKit, Apple
- Trident (IE), Microsoft
- Native (C, Objective-C, C++)
- ICC (Intel-closed-secret-proprietary)
- There is only One. True. Compiler. here, right?
- V8 of Google
- Nitro JS
- A simpler language with a simpler interpreter, via Brendan Eich
- Period Pieces
- Internet Explorer 8 (veeerrrryyyy sllllooowwwww)
- Firefox 3.0.3, when Firefox became “fast”
- Firefox 19 (Firefox 22), current
- Chrome 8, when Chrome became “fast”
Apple Developer Documentation
- Andreas Gal’s dissertation
Pithy, trenchant, money quotes, etc.
Unless otherwise stated, from: Why mobile web apps are slow; In His Blog; 2013-07-09.
- <quote>The ground truth is that in a memory constrained environment garbage collection performance degrades exponentially. If you write Python or Ruby or JS that runs on desktop computers, it’s possible that your entire experience is in the right hand of the chart, and you can go your whole life without ever experiencing a slow garbage collector. Spend some time on the left side of the chart and see what the rest of us deal with.</quote>
- <quote>With garbage collection, the winning move is not to play. A weaker form of this “the winning move is not to play” philosophy is embedded in the official Android documentation:
Object creation is never free. A generational garbage collector with per-thread allocation pools for temporary objects can make allocation cheaper, but allocating memory is always more expensive than not allocating memory. As you allocate more objects in your app, you will force a periodic garbage collection, creating little “hiccups” in the user experience. The concurrent garbage collector introduced in Android 2.3 helps, but unnecessary work should always be avoided. Thus, you should avoid creating object instances you don’t need to… Generally speaking, avoid creating short-term temporary objects if you can. Fewer objects created mean less-frequent garbage collection, which has a direct impact on user experience.</quote>
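The Android guidance above generalizes to any garbage-collected runtime. A minimal Python sketch of the same idea (my illustration, not from the Android docs): preallocate one scratch buffer and reuse it, instead of creating a short-lived temporary on every iteration.

```python
def smooth_naive(frames):
    """Allocates a fresh temporary list for every frame (GC churn)."""
    out = []
    for frame in frames:
        tmp = [v * 0.5 for v in frame]    # new short-lived object each pass
        out.append(sum(tmp))
    return out

def smooth_pooled(frames, scratch):
    """Reuses one caller-provided scratch buffer across all frames."""
    out = []
    for frame in frames:
        for i, v in enumerate(frame):
            scratch[i] = v * 0.5          # write in place, no new temporary
        out.append(sum(scratch))
    return out

frames = [[1.0, 2.0, 3.0]] * 4
scratch = [0.0] * 3                        # allocated once, up front
assert smooth_naive(frames) == smooth_pooled(frames, scratch) == [3.0] * 4
```

Same result either way; the pooled version simply never hands the collector a stream of per-iteration garbage, which is exactly the “fewer objects created” trade the Android docs recommend.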
- <quote>I can give you three frames of reference that are both useful and approximately correct.
- If you are a web developer, think about the iPhone 4S Nitro as IE8, as it benchmarks in the same class. That gets you in the correct frame of mind to write code for it. JS should be used very sparingly, or you will face numerous platform-specific hacks to make it perform. Some apps will just not be cost-effective to write for it, even though it’s a popular browser.
- If you are a Java, Ruby, Python, C# developer, think about iPhone 4S web development in the following way. It’s a computer that runs 10x slower than you expect (since ARM) and performance degrades exponentially if your memory usage goes above 35MB at any point, because that is how garbage collectors behave on the platform. Also, you get killed if at any point you allocate 213MB. And nobody will give you any information about this at runtime “by design”. Oh, and people keep asking you to write high-memory photo-processing and video applications in this environment.</quote>
- <quote>The desktop market is shrinking year-on-year. Computers are going to be what the hardcore professionals use–Photoshop and Visual Studio will always stick around–but mere mortals who spend all day in Excel or Outlook or Powerpoint are going to migrate to ARM tablets. (Maybe even ARM notebooks.) Some of us like desktop computers for ideological reasons, or like x86 on the technical merits, or whatever. But the truth on the ground is that ARM is rising and x86 is falling, like it or not. Even if we throw out all the smartphones and tablets, you have reasonable research firms projecting things like a 60-40 ARM-Intel netbook split for 2013. And once you throw the tablets and smartphones back in, well, let’s just say that more ARM chips were fabbed last year than all the x86 chips ever made. The sky is falling. The building is on fire. Whenever you make a platform decision, you’re making a bet. If you’re writing a web app, you’re essentially betting either 1) that ARM doesn’t matter, 2) that ARM customers will just suck it up and use your slow product, 3) that the web browser guys will wave a wand and make it faster, or 4) that the WiFi guys will fix the speed of light so that everybody has a zero-latency always-on connection to an x86 chip. Unless you’re writing Photoshop, or writing an app with two buttons, I think you’re nuts.</quote>
From: Mobile web apps are slow; In His Blog; 2013-05-06.
Humorous, ironic, or off-the-cuff