Google regularly introduces features that reshape the technology landscape. This time, Google’s Research team has unveiled two new approaches that use machine learning to enhance images: ‘SR3 – Image Super-Resolution’ and ‘CDM – Class-Conditional ImageNet Generation’. Super-resolution can be used to restore old family portraits and to improve medical imaging systems.
With SR3 and CDM, Google says it has “pushed the performance of diffusion models to state-of-the-art on super-resolution and class-conditional ImageNet generation benchmarks.” Researchers from Google Research’s Brain Team have published a post on Google’s AI blog, detailing both SR3 and CDM diffusion models.
Today we present connected approaches that push the limits of high-fidelity image synthesis through use of a pipeline of multiple diffusion models that perform progressive iterative refinement and super-resolution. Learn more here: https://t.co/V28qyOc4ky
— Google AI (@GoogleAI) July 16, 2021
SR3: Image Super-Resolution
Google notes that SR3 is a super-resolution diffusion model that takes a low-resolution image as input and builds a corresponding high-resolution image from pure noise. The model is trained on an image corruption process in which noise is progressively added to a high-resolution image until only pure noise remains. It then learns to reverse this process, starting from pure noise and progressively removing it, guided by the input low-resolution image, until it reaches the target. According to Google, SR3 performs strongly when upscaling portraits and natural images. On 8x upscaling of faces, it achieved a ‘confusion rate’ (the rate at which human raters mistake the model’s output for a real photograph, with 50 per cent being ideal) of nearly 50 per cent, while existing methods reach at most 34 per cent.
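To make the iterative refinement idea concrete, here is a minimal NumPy sketch of the two processes described above: a forward pass that corrupts a high-resolution image with noise, and a reverse pass that starts from pure noise and denoises step by step under the guidance of the low-resolution input. The function names, step counts and noise schedule are illustrative assumptions, and `dummy_denoiser` merely stands in for SR3’s trained neural network, which is not reproduced here.

```python
import numpy as np

def forward_noising(hr_image, num_steps=1000, beta=0.0005):
    """Training-time corruption: progressively mix Gaussian noise into a
    high-resolution image until (almost) pure noise remains."""
    noisy = hr_image.copy()
    for _ in range(num_steps):
        noisy = np.sqrt(1.0 - beta) * noisy + np.sqrt(beta) * np.random.randn(*noisy.shape)
    return noisy

def dummy_denoiser(noisy, lr_image, step):
    """Placeholder for SR3's trained network: it should predict the noise to
    remove at this step, guided by the low-resolution input. Here we simply
    treat the residual against a naive 8x upsample as the 'noise'."""
    lr_upsampled = np.kron(lr_image, np.ones((8, 8)))    # nearest-neighbour 8x upsample
    return noisy - lr_upsampled

def reverse_refinement(lr_image, hr_shape, num_steps=1000, beta=0.0005):
    """Inference: start from pure noise and iteratively remove the predicted
    noise so the sample drifts toward a plausible high-resolution image."""
    x = np.random.randn(*hr_shape)                       # pure Gaussian noise
    for step in reversed(range(num_steps)):
        predicted_noise = dummy_denoiser(x, lr_image, step)
        x = (x - np.sqrt(beta) * predicted_noise) / np.sqrt(1.0 - beta)
    return x

lr = np.random.rand(64, 64)                              # toy 64x64 low-resolution input
corrupted = forward_noising(np.random.rand(512, 512))    # what training corrupts step by step
sr = reverse_refinement(lr, hr_shape=(512, 512))         # 8x upscaled 512x512 output
print(sr.shape)                                          # (512, 512)
```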
Google has shared a few impressive examples in which a 64×64-pixel image is upscaled to a 1,024×1,024-pixel photo using SR3.
CDM: Class-conditional Diffusion
This model is trained on ImageNet data to generate high-resolution natural images. Since ImageNet is a difficult, high-entropy dataset, Google built CDM as a cascade of multiple diffusion models. This approach chains together multiple generative models over several spatial resolutions: one diffusion model generates data at a low resolution, and a sequence of SR3 super-resolution diffusion models then gradually raises the resolution of the generated image to the highest level. Using the CDM method, a low-resolution 64×64 image can be upscaled to 256×256 resolution and then further to 1,024×1,024.
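The cascading idea can be illustrated with a short sketch. Both helper functions below are assumptions made for illustration only: `base_generator` stands in for CDM’s class-conditional base diffusion model, and `sr3_stage` stands in for one SR3 super-resolution model, which in reality would run the iterative refinement loop rather than a simple upsample.

```python
import numpy as np

def base_generator(class_label, size=64):
    """Stand-in for CDM's class-conditional base diffusion model:
    produces a low-resolution sample for the given ImageNet class index."""
    rng = np.random.default_rng(class_label)
    return rng.random((size, size, 3))

def sr3_stage(image, scale):
    """Stand-in for one SR3 super-resolution diffusion model in the cascade.
    Here it just repeats pixels; the real model would run iterative
    refinement conditioned on `image`."""
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)

def cascaded_generation(class_label):
    """Chain the models: 64x64 base sample -> 256x256 -> 1024x1024."""
    x = base_generator(class_label, size=64)    # 64x64 class-conditional sample
    x = sr3_stage(x, scale=4)                   # 64x64   -> 256x256
    x = sr3_stage(x, scale=4)                   # 256x256 -> 1024x1024
    return x

sample = cascaded_generation(class_label=207)   # e.g. an ImageNet class index
print(sample.shape)                              # (1024, 1024, 3)
```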
Google has also introduced a new data augmentation technique called conditioning augmentation, which further improves the sample quality of CDM. It applies augmentation, such as Gaussian noise, to the low-resolution conditioning inputs of the super-resolution models during training, which helps reduce the compounding of errors between stages of the cascade.
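As a rough sketch of what conditioning augmentation might look like in training code (the function name and noise level are assumptions for illustration, not Google’s implementation), the low-resolution conditioning image is corrupted before being handed to the super-resolution stage:

```python
import numpy as np

def conditioning_augmentation(lr_conditioning, noise_std=0.1):
    """Illustrative sketch: corrupt the low-resolution conditioning image with
    Gaussian noise during training, so the super-resolution stage learns to
    tolerate the imperfect samples produced by the previous stage."""
    return lr_conditioning + noise_std * np.random.randn(*lr_conditioning.shape)

lr = np.random.rand(64, 64, 3)                 # sample from the base model during training
augmented_lr = conditioning_augmentation(lr)   # fed to the SR stage instead of the clean input
```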
With the introduction of these models, Google is looking to advance natural image synthesis, which has wide-ranging applications but poses significant design challenges.