It began with Doom 2016 - a Switch port so ambitious, it simply didn't seem possible. However, since then, a procession of technologically ambitious current-gen console titles has migrated onto Nintendo's hybrid console, culminating in the arrival of the wonderful Metro Redux from 4A Games - highly impressive conversions and perhaps the closest, most authentic first-person shooter ports we've seen. So what's the secret? How do developers manage to achieve such impressive results from five-year-old Nvidia mobile hardware?
"At first, I did have really big concerns performance-wise," admits 4A's chief technical officer, Oles Shishkovstov. "You know, going from base PS4/Xbox One with approximately six and a half or seven CPU cores running at 1.6 GHz to 1.75GHz down to only three cores at 1.0GHz sounds scary. The GPU was fine, as graphics can be scaled up and down much easier than, for example, game simulation code."
The results of the conversion work are certainly impressive bearing in mind the yawning gap in CPU specs. 4A started out by translating over the existing Metro Redux games from PS4 and Xbox One (and to stress the point, Switch doesn't get last-gen ports here), a process the 4A team carried out very quickly, but this early version of the game could only manage frame-rates of around seven to 15 frames per second. The games were entirely CPU-bound.
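To put those early numbers in context, here is a minimal sketch of the frame-time arithmetic, comparing the seven to 15 frames per second of the first Switch build against typical 30fps and 60fps targets. The function names are our own, chosen for illustration; nothing here comes from 4A's code.

```python
def frame_time_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame-rate."""
    return 1000.0 / fps


def required_speedup(current_fps: float, target_fps: float) -> float:
    """How many times faster each frame must become to hit the target rate."""
    return target_fps / current_fps


if __name__ == "__main__":
    # Per-frame budget at the early build's range and at common targets.
    for fps in (7, 15, 30, 60):
        print(f"{fps:>2} fps -> {frame_time_ms(fps):6.1f} ms per frame")

    # Speed-up needed from the early build to reach a 30fps target.
    print(f"7 fps to 30 fps:  {required_speedup(7, 30):.1f}x")
    print(f"15 fps to 30 fps: {required_speedup(15, 30):.1f}x")
```

In other words, a 30fps target leaves roughly 33ms per frame, so the first build needed to run somewhere between two and a little over four times faster on the CPU side.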
Halving the target frame-rate from the PS4 and Xbox One's 60fps down to 30fps was required before the task of optimising systems began. "First, we backported some optimisations from Exodus to the Redux codebase," Shishkovstov explains. "Then we focused on animation processing on the high level and on extracting ILP (instruction-level parallelism) out of the A57 on the low level - down to assembly. The low level optimizations alone got us to an unstable 30Hz when we were not GPU bound. Then the bone LODding arrived - the CPU [issue] was 'solved' even with some headroom necessary for stable framerate."

Explained like that, 4A's solution to the Switch's CPU limitation seems fairly straightforward, but the process of coding at the assembly level - literally the native language of the Switch's ARM Cortex-A57 CPU cluster - can't have been a walk in the park. Animation sucks up a lot of processor cycles, so the idea of adding level of detail (LOD) transitions to the system makes a lot of sense.
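As a rough illustration of the bone LOD idea, the sketch below animates fewer bones the further a character is from the camera. This is an assumption about how such a system could work in general, not a description of 4A's implementation: the distance thresholds, the fractions and the names (BONE_LOD_LEVELS, Skeleton, bones_to_animate) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical (max distance in metres, fraction of skeleton animated) pairs.
# Illustrative values only; they are not taken from Metro Redux.
BONE_LOD_LEVELS = [
    (10.0, 1.00),  # close: animate every bone
    (25.0, 0.50),  # mid-range: drop fine detail such as fingers
    (60.0, 0.25),  # far: major limbs and spine only
]
MIN_FRACTION = 0.10  # used beyond the last threshold


@dataclass
class Skeleton:
    # Bones ordered from most to least visually important (root first).
    bones: list


def bones_to_animate(skeleton: Skeleton, distance: float) -> list:
    """Return the subset of bones worth updating at this camera distance."""
    fraction = MIN_FRACTION
    for max_distance, level_fraction in BONE_LOD_LEVELS:
        if distance <= max_distance:
            fraction = level_fraction
            break
    # Always keep at least one bone so the character still has a transform.
    count = max(1, round(len(skeleton.bones) * fraction))
    return skeleton.bones[:count]


if __name__ == "__main__":
    character = Skeleton(bones=[f"bone_{i}" for i in range(20)])
    for distance in (5.0, 20.0, 50.0, 100.0):
        active = bones_to_animate(character, distance)
        print(f"{distance:5.1f} m: {len(active)} of {len(character.bones)} bones")
```

The appeal of a scheme like this is that animation cost falls with distance, which is where the missing detail is hardest to see.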
After this, 4A moved on to GPU optimisations, and it all began with the choice of graphics API. The firm has a long history of supporting the most performant, low-level APIs, with Metro Exodus running on DX11, DX12, Vulkan and GNM across its various multi-platform releases. Switch itself supports OpenGL and Vulkan, but 4A chose NVN, the API developed by Nvidia itself for best performance on Switch.
"NVN is the lowest possible graphics API on NX," explains Shishkovstov. "CPU overhead is negligible, in most cases that's just a few DWORDs written to the GPU command buffer. It is well-designed, clean and exposes everything the hardware is capable of. Much better than Vulkan, for example."
And it's here where we're especially interested in how Switch delivers so much from so little. When the Nintendo hardware was first announced, our only experience of the Tegra X1 processor came from the Shield Android TV, where last-gen console conversions typically under-performed. It seems that NVN really makes a key difference here, with 4A suggesting that it gives direct access to the Nvidia Maxwell architecture. So what Maxwell features are used in Metro Redux?
"I am not sure I can talk about that, but we use all of them it seems," explains Shishkovstov. "Much of our GPU optimisations were focused on reducing memory bandwidth/off-chip traffic. For example, NVN exposes a lot of controls for memory compression, tile cache behavior and binning, memory layout and aliasing. For example, the straight immediate mode rendering is only used during g-buffer creation and shadow map rendering. Every other pass, including forward rendering and deferred lighting uses binning rasteriser with different settings for tile cache."
In common with a lot of games of this generation, Metro Redux also sees the developer make the jump to using temporal super-sampling - or temporal super resolution (TSR), as 4A calls it. The idea is very straightforward. Traditional super-sampling is the process of rendering at a higher-than-native resolution, before downsampling to the developer's chosen pixel-count. TSR is the same basic idea, except additional detail is gleaned from past frames instead. The technique is used extensively to improve smartphone camera quality, and it has other uses outside of games too.

"That's a well-known FBI solution for reading car plate numbers from the space satellites," says Oles Shishkovstov. "The problem is it is extremely texture sampling and math heavy for the Switch's GPU. We have to derive something which is much cheaper and without major quality compromises. It wasn't easy. I spent more than a month on that - it seems like Maxwell GPU ISA is my native language now.
"The end result takes approximately 2ms at 1080p with only nine texture samples and tricky math. It also does anti-aliasing as a byproduct. When pushed way too hard (it happens in 1080p) the algorithm still produces pixel perfect edges and sharp texture details and only AA quality somewhat degrades - but that is barely visible even for the trained eye."
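The basic principle of gleaning detail from past frames can be shown with a toy example. The sketch below is deliberately simplified and is not 4A's algorithm: it assumes a static one-dimensional "scene", renders only every second sample per frame with a shifting offset (the jitter), and writes each frame's samples into a full-resolution history buffer. A real implementation also has to reproject the history using per-pixel velocities and reject stale samples, which is where the cost and the "tricky math" come in. All function names here are hypothetical.

```python
def render_low_res(scene, scale, jitter):
    """Sample every `scale`-th value of the scene, starting at offset `jitter`."""
    return scene[jitter::scale]


def accumulate(history, low_res, scale, jitter):
    """Write this frame's samples into the matching slots of the history."""
    updated = list(history)
    for i, value in enumerate(low_res):
        updated[jitter + i * scale] = value
    return updated


def temporal_upscale(scene, scale, frames):
    """Build a full-resolution result from several jittered low-res frames."""
    history = [None] * len(scene)  # None marks samples not yet seen
    for frame in range(frames):
        jitter = frame % scale  # shift the sample positions every frame
        low_res = render_low_res(scene, scale, jitter)
        history = accumulate(history, low_res, scale, jitter)
    return history


if __name__ == "__main__":
    scene = [0, 1, 2, 3, 4, 5, 6, 7]  # stand-in for full-resolution detail
    print(temporal_upscale(scene, scale=2, frames=1))  # half the samples known
    print(temporal_upscale(scene, scale=2, frames=2))  # full detail recovered
```

Each frame only pays for half the samples, yet after two frames of a static scene the history holds the full-resolution signal. Motion is what makes the real problem hard.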
With temporal super resolution in play, Shishkovstov reckons that the concept of native resolution rendering as we know it isn't particularly relevant, which raises some interesting questions. Look back at our analysis and you'll see that we were able to pull a few pixel counts from individual frames. However, it's games like this, Modern Warfare 2019 and many others that are making us consider new techniques for measuring image quality. Redux on Switch doesn't look as clean as the PS4 version, but if we pull a like-for-like image of Metro from the locked 720p of the last-gen versions, image quality is on another level.
Whether you're docked or running in handheld mode, the accumulated output is 1080p or 720p respectively, but the clarity of the image does adjust, according to content. In terms of overall clarity, the technique chosen does look especially impressive when played portably, which raises the question of how 4A scaled the game across docked and handheld modes.
"Going docked you get 2x faster-clocked GPU but only moderately more bandwidth, so it is not magically 2x faster at all, but still considerably faster," explains Shishkovstov. "That allowed us, for example, to render per-pixel velocities for more objects resulting in slightly more correct TSR and AA. In handheld mode we only draw velocity for HUD/weapon - that's all we can afford.
"Also, Redux content was lacking geometry LODs for a lot of meshes. As the art team was busy with Exodus' (huge) DLCs - we programmatically generated missing ones. Both docked and handheld use original PS4/X1 geometry, but handheld uses more aggressive LOD switching, although it is barely noticeable on a small screen. From the user/gamer point of view, handheld is always 720p, docked is always 1080p, otherwise they are the same."
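Shishkovstov's point that a GPU clocked twice as fast is "not magically 2x faster" can be illustrated with a simple bottleneck model, in which each render pass takes as long as the slower of its arithmetic work and its memory traffic. Every number below is hypothetical, including the 1.2x bandwidth gain standing in for "moderately more bandwidth" and the per-pass costs; none of it is measured from Metro Redux.

```python
def pass_time_ms(compute_ms, memory_ms):
    """A pass is limited by whichever is slower: ALU work or memory traffic."""
    return max(compute_ms, memory_ms)


def docked_pass_time_ms(compute_ms, memory_ms, clock_gain=2.0, bandwidth_gain=1.2):
    """Scale handheld costs by assumed docked gains (hypothetical defaults)."""
    return pass_time_ms(compute_ms / clock_gain, memory_ms / bandwidth_gain)


def frame_speedup(passes, clock_gain=2.0, bandwidth_gain=1.2):
    """Overall handheld-to-docked speed-up for a list of (compute, memory) passes."""
    handheld = sum(pass_time_ms(c, m) for c, m in passes)
    docked = sum(docked_pass_time_ms(c, m, clock_gain, bandwidth_gain) for c, m in passes)
    return handheld / docked


if __name__ == "__main__":
    # Hypothetical handheld costs per pass: (compute ms, memory ms).
    passes = [
        (4.0, 2.0),  # compute-bound pass: benefits fully from the clock
        (3.0, 6.0),  # bandwidth-bound pass: barely benefits
        (5.0, 5.0),  # balanced pass: becomes bandwidth-bound when docked
    ]
    print(f"Docked speed-up: {frame_speedup(passes):.2f}x")
```

Under these made-up costs the frame as a whole speeds up by well under 2x, because only the compute-bound work sees the full benefit of the higher clock. That also fits 4A's focus on cutting memory bandwidth and off-chip traffic.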
What's also impressive about the Metro Redux port is its sheer consistency in maintaining its target 30fps frame-rate. It's an important point to make because whether we're talking about the id Tech 6 conversions, The Witcher 3, Warframe or most of the other 'impossible ports' to the Switch, it's rare that you find a consistent performance level.
"I am glad we hit a consistent 30fps," shares Shishkovstov. "The only way to hit close to 60 would be to run two render-frames per one simulation frame, at radically reduced quality and inconsistent input lag. That's not the price I want to pay. Running at 30fps allowed no quality compromises - even the material and lighting shaders are exactly the same as PS4 and Xbox One."
As for how the game runs so doggedly at 30fps, 4A puts it down to over-optimisation. "Even without any TSR, the game keeps producing consistent 30fps at 720p in handheld mode in over 99 per cent of frames across the whole game. TSR is more [useful] for 1080p/docked mode."
With continued rumours of improved Switch hardware in development, I thought it would be interesting to see where Nintendo and Nvidia might choose to innovate. After all, a lot of the success of PlayStation 4's design comes from Sony shifting focus and taking on board developer feedback.
"Since we are generally CPU bound, additional cores would definitely be on the list. Bandwidth and GPU power never hurts either," offers Shishkovstov. Putting CPU power at the forefront may sound surprising, but graphics scale much more easily than the core game code - and in our Switch overclocking tests, ramping up CPU frequency proved more impactful on many games than up-clocking the graphics core.
And while we're on the subject of new hardware, what about the next-gen consoles from Sony and Microsoft? Developers are under NDA, so can't talk about the technical specifics of the hardware. However, key aspects of the new machines are public knowledge - such as the fact that both PS5 and Xbox Series X feature hardware accelerated support in the GPU for real-time ray tracing.
"We are fully into ray tracing, dropping old-school codepath/techniques completely," reveals Shishkovstov - and in terms of how RT has evolved since Metro Exodus? "Internally we experimented a lot, and with spectacular results so far. You will need to wait to see what we implement into our future projects."