Generating visually coherent images, preserving clothing texture, maintaining the source identity, and realistically rendering key human features such as the face and hands remain great challenges in the pose transfer task. To tackle these challenges, we first conduct a study to determine the most robust conditioning labels for this task and for the baseline method [??] that we choose. We then improve upon the baseline by injecting deep source features from an auto-encoder through an attention mechanism. Finally, we add region discriminators focused on key human features, obtaining results competitive with the state of the art.
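As a rough sketch of the region-discriminator idea, the snippet below applies a small PatchGAN-style discriminator to a crop around one key region (e.g. the face); the crop source, channel widths, and layer depths are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class RegionDiscriminator(nn.Module):
    """PatchGAN-style discriminator applied to a cropped body region
    (e.g. face or a hand). The crop box is assumed to be derived from
    the pose keypoints that already condition the generator."""
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, 4, 1, 1),  # patch-wise real/fake logits
        )

    def forward(self, image, box):
        x0, y0, x1, y1 = box              # region located from keypoints
        crop = image[:, :, y0:y1, x0:x1]  # discriminate only this region
        return self.net(crop)
```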
CVPRW 2021
Perceptual Image Quality Assessment with Transformers
Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, and Junwoo Lee
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Jun. 2021
In this paper, we propose an image quality transformer (IQT) that successfully applies a transformer architecture to the perceptual full-reference image quality assessment (IQA) task. Perceptual representations are becoming increasingly important in image quality assessment. In this context, we extract perceptual feature representations from each of the input images using a convolutional neural network (CNN) backbone. The extracted feature maps are fed into the transformer encoder and decoder in order to compare the reference and distorted images. Following the approach of transformer-based vision models, we use an extra learnable quality embedding and position embeddings. The output of the transformer is passed to a prediction head to predict the final quality score. The experimental results show that our proposed model achieves outstanding performance on the standard IQA datasets. On a large-scale IQA dataset containing the output images of generative models, our model also shows promising results. The proposed IQT was ranked first among 13 participants in the NTIRE 2021 perceptual image quality assessment challenge. We hope our work provides an opportunity to further extend transformer-based approaches to the perceptual IQA task.
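A minimal sketch of this encoder-decoder comparison might look as follows, assuming the CNN backbone features have already been flattened into token sequences; the token count, model width, layer counts, and the shared position/quality embeddings are illustrative choices rather than the exact IQT configuration.

```python
import torch
import torch.nn as nn

class SimpleIQT(nn.Module):
    """Sketch of the IQT idea: reference-image feature tokens go through a
    transformer encoder, distorted-image tokens act as decoder input, and a
    learnable quality token plus a prediction head produce the score."""
    def __init__(self, feat_dim=256, n_tokens=196):
        super().__init__()
        self.quality_token = nn.Parameter(torch.zeros(1, 1, feat_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_tokens + 1, feat_dim))
        self.transformer = nn.Transformer(
            d_model=feat_dim, nhead=4, num_encoder_layers=2,
            num_decoder_layers=2, dim_feedforward=512, batch_first=True)
        self.head = nn.Linear(feat_dim, 1)  # prediction head -> quality score

    def forward(self, ref_feats, dist_feats):
        # ref_feats, dist_feats: (B, n_tokens, feat_dim) from a CNN backbone
        b = ref_feats.size(0)
        tok = self.quality_token.expand(b, -1, -1)
        src = torch.cat([tok, ref_feats], dim=1) + self.pos_embed
        tgt = torch.cat([tok, dist_feats], dim=1) + self.pos_embed
        out = self.transformer(src, tgt)
        return self.head(out[:, 0])  # score read off the quality token
```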
CVPRW 2021
NTIRE 2021 Challenge on Perceptual Image Quality Assessment
Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, and Junwoo Lee
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Jun. 2021
This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GANs) produce images with more realistic textures. These output images have completely different characteristics from traditional distortions and thus pose a new challenge for IQA methods to evaluate their visual quality. In comparison with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores, so they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge had 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them achieved much better results than existing IQA methods, and the winning method demonstrated state-of-the-art performance.
2020
CVPRW 2020
NTIRE 2020 Challenge on Image Demoireing: Methods and Results
Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, and Junwoo Lee
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Jun. 2020
This paper reviews the Challenge on Image Demoireing that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2020. Demoireing is the difficult task of removing moire patterns from an image to reveal the underlying clean image. The challenge was divided into two tracks. Track 1 targeted the single-image demoireing problem, which seeks to remove moire patterns from a single image. Track 2 focused on the burst demoireing problem, where a set of degraded moire images of the same scene were provided as input, with the goal of producing a single demoired image as output. The methods were ranked in terms of their fidelity, measured by the peak signal-to-noise ratio (PSNR) between the ground-truth clean images and the restored images produced by the participants' methods. The tracks had 142 and 99 registered participants, respectively, with a total of 14 and 6 submissions in the final testing stage. The entries span the current state of the art in image and burst-image demoireing.
2019
KSII TIIS
Weighted DCT-IF for Image up Scaling
Jae-Yung Lee, Sung-Jun Yoon, Jae-Gon Kim, and Jong-Ki Han
KSII Transactions on Internet and Information Systems, Feb. 2019
The design of an efficient scaler that enhances edge data is one of the most important issues in video signal applications, because the perceptual quality of the processed image is sensitively affected by the degradation of edge data. Various conventional scaling schemes have been proposed for this purpose. In this paper, we propose an efficient scaling algorithm based on the discrete cosine transform-based interpolation filter (DCT-IF), which outperforms other scaling algorithms in various configurations. The proposed DCT-IF incorporates weighting parameters that are optimized on training data. Simulation results show that the quality of the resized images produced by the proposed DCT-IF is much higher than that of images produced by conventional schemes, although the proposed DCT-IF is more complex than other conventional scaling algorithms.
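For intuition, the sketch below derives 1-D DCT-IF taps by transforming a window of samples with a forward DCT and evaluating the inverse transform at a fractional position; applying one weight per DCT coefficient is only an assumption about how the paper's trained weighting parameters enter the filter.

```python
import numpy as np

def dct_if_taps(n_taps=8, frac=0.5, weights=None):
    """Derive 1-D DCT-IF interpolation taps: forward-DCT the n_taps
    neighbouring samples, then evaluate the inverse DCT at fractional
    position `frac` between the two centre samples. `weights` (one per
    DCT coefficient) stands in for the paper's trained weighting
    parameters; identity weights reproduce the plain DCT-IF."""
    N = n_taps
    x = N // 2 - 1 + frac            # evaluation point inside the window
    if weights is None:
        weights = np.ones(N)
    taps = np.zeros(N)
    for m in range(N):               # contribution of sample m to position x
        for k in range(N):
            ck = 1.0 / N if k == 0 else 2.0 / N
            taps[m] += (weights[k] * ck
                        * np.cos(np.pi * k * (2 * m + 1) / (2 * N))
                        * np.cos(np.pi * k * (2 * x + 1) / (2 * N)))
    return taps

# Half-pel taps for an 8-tap filter; image rows are then convolved with
# these taps to produce the interpolated (upscaled) samples.
print(dct_if_taps().round(4))
```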
2018
IEEE TIP
Hierarchical Extended Bilateral Motion Estimation-based Frame Rate Up-Conversion using Learning-based Linear Mapping
We present a novel and effective learning-based frame rate up-conversion (FRUC) scheme using linear mapping. The proposed learning-based FRUC scheme consists of: 1) a new hierarchical extended bilateral motion estimation (HEBME) method; 2) a light-weight motion deblur (LWMD) method; and 3) a synthesis-based motion-compensated frame interpolation (S-MCFI) method. First, the HEBME method considerably enhances the accuracy of the motion estimation (ME), which leads to a significant improvement in FRUC performance. The HEBME method consists of two ME pyramids with a three-layered hierarchy, where the motion vectors (MVs) are searched in a coarse-to-fine manner through each pyramid. The found MVs are further refined at four times enhanced resolution by jointly combining the MVs from the two pyramids. The HEBME method employs a new, elaborate matching criterion for precise ME, which effectively combines a bilateral absolute difference, an edge variance, pixel variances, and the MV difference between two consecutive blocks and their neighboring blocks. Second, the LWMD method uses the MVs found by the HEBME method and removes small motion blurs in the original frames via transformations by linear mapping. Third, the S-MCFI method generates interpolated frames by applying linear mapping kernels to the deblurred original frames. Consequently, our FRUC scheme is capable of precisely generating interpolated frames based on the HEBME for accurate ME, the S-MCFI for elaborate frame interpolation, and the LWMD for motion deblurring. The experimental results show that our FRUC significantly outperforms the state-of-the-art non-deep-learning-based schemes by an average of 1.42 dB in peak signal-to-noise ratio (PSNR) and shows comparable performance with the state-of-the-art deep learning-based scheme.
* Sung-Jun Yoon and Hyun-Ho Kim contributed equally to this work.
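The HEBME matching criterion can be pictured roughly as a weighted combination of the four terms named above; the sketch below is one such combination, with every weight and normalization a placeholder rather than the paper's actual formulation.

```python
import numpy as np

def hebme_cost(prev_blk, next_blk, mv, neighbour_mvs,
               w_edge=1.0, w_var=1.0, w_mv=1.0):
    """Illustrative bilateral matching cost: absolute difference between
    the two blocks a candidate MV points at, combined with edge-variance
    and pixel-variance terms and the deviation from neighbouring MVs."""
    prev_blk = prev_blk.astype(float)
    next_blk = next_blk.astype(float)
    bad = np.abs(prev_blk - next_blk).mean()          # bilateral abs. diff.
    gy, gx = np.gradient((prev_blk + next_blk) / 2)   # block gradients
    edge_var = np.hypot(gx, gy).var()                 # edge variance
    pix_var = prev_blk.var() + next_blk.var()         # pixel variances
    mv_diff = np.mean([np.abs(mv - n).sum() for n in neighbour_mvs])
    return bad + w_edge * edge_var + w_var * pix_var + w_mv * mv_diff
```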
MASTER THESIS
A Study on Learning-based Approaches to Video Frame Interpolation using Linear Mapping Kernels and CNN-based Nonlinear Mapping Kernels
Frame rate up-conversion (FRUC), also called video frame interpolation (VFI), is a low-level computer vision problem of generating one or more intermediate frames between two consecutive original frames in a video. The FRUC problem has been addressed for several decades by heuristic approaches, and deep-learning-based FRUC has recently been studied. We propose two approaches to FRUC: (i) a learning-based direct linear mapping approach; and (ii) a kernel-based approach using a hierarchical deep convolutional neural network (CNN). First, we present a novel and effective learning-based FRUC scheme using linear mapping, which consists of (i) a novel hierarchical extended bilateral motion estimation (HEBME) method and (ii) a synthesis-based motion-compensated frame interpolation (S-MCFI) method. Second, we present a kernel-based FRUC scheme based on a CNN, where two sets of horizontal and vertical kernels are learned for the two consecutive input frames by the proposed hierarchical CNN. The proposed kernel-based FRUC scheme consists of (i) kernel estimation and (ii) shift-able local convolution for interpolating intermediate pixels. The shift-able local convolution can yield estimated kernels that cover large regions which are often out of range for conventional kernel-based approaches. The experimental results show that our linear-mapping-based FRUC significantly outperforms the state-of-the-art heuristic schemes by an average of 1.50 dB in PSNR, and our hierarchical CNN-based FRUC outperforms the state-of-the-art schemes, including the latest deep-learning-based FRUC scheme. In particular, the hierarchical CNN-based FRUC scheme with our proposed shift-able local convolution can interpolate an intermediate frame with high quality when objects in the original frames have fast motions.
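A toy, per-pixel version of the shift-able local convolution could look like the following; the kernel length, the single shift per pixel, and the scalar output are assumptions made for illustration, not the thesis's exact operator.

```python
import numpy as np

def shiftable_separable_interp(frame, kv, kh, shift, y, x):
    """Interpolate one pixel from `frame` with learned separable kernels
    kv (vertical) and kh (horizontal), applying the convolution window at
    an offset `shift=(dy, dx)` instead of centring it on (y, x). The shift
    is what lets the kernel reach fast-moving content that falls outside a
    fixed window. Border handling is omitted for brevity."""
    r = len(kv) // 2
    dy, dx = shift
    patch = frame[y + dy - r : y + dy + r + 1,
                  x + dx - r : x + dx + r + 1]
    return kv @ patch @ kh  # separable 2-D convolution via two 1-D products

# Each interpolated pixel is the sum of such contributions from the two
# input frames, with kernels and shifts predicted by the hierarchical CNN.
```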
KCVS
A Study on Hierarchical CNN-based Frame Rate Up-Conversion
In this paper, we propose a frame rate enhancement method using a hierarchical convolutional neural network. The convolutional neural network is constructed hierarchically, and the convolution operation is performed adaptively at the optimal position, so that the performance is robust when interpolating fast-moving objects. As a result, compared with the ICCV 2017 "Video Frame Interpolation via Adaptive Separable Convolution" algorithm, the improvement was 0.41 dB in terms of peak signal-to-noise ratio (PSNR).