Multispectral Demosaicing via Dual Cameras


¹AI Center–Toronto, Samsung Electronics    ²York University
*Equal Contribution
ICCV 2025 [Highlight]

Abstract

Multispectral (MS) images capture detailed scene information across a wide range of spectral bands, making them invaluable for applications requiring rich spectral data. Integrating MS imaging into multi-camera devices, such as smartphones, has the potential to enhance both spectral applications and RGB image quality. A critical step in processing MS data is demosaicing, which reconstructs color information from the mosaic MS images captured by the camera. This paper proposes a method for MS image demosaicing specifically designed for dual-camera setups where both RGB and MS cameras capture the same scene. Our approach leverages co-captured RGB images, which typically have higher spatial fidelity, to guide the demosaicing of lower-fidelity MS images. We introduce the Dual-camera RGB-MS Dataset — a large collection of paired RGB and MS mosaiced images with ground-truth demosaiced outputs — that enables training and evaluation of our method. Experimental results demonstrate that our method achieves state-of-the-art accuracy compared to existing techniques.



Material


Releases (Models & Dataset)

This release packages everything needed to reproduce and extend our RGB-guided MS demosaicing work. It includes a checkpoints/ directory with pretrained weights: checkpoints/base_models/ holds backbone weights, and checkpoints/models/ holds the task-specific RGB-guided MS demosaicing checkpoints. The dataset ships in NPY format under npys/ as four zip archives: cropped_ms_npys.zip, cropped_rgb_npys.zip, ms_npys.zip, and rgb_npys.zip. The dataset pairs co-captured RGB frames with MS mosaics and provides high-fidelity ground-truth MS images, enabling rigorous training and evaluation of RGB-guided MS demosaicing methods. Please refer to the steps on the Code page to set up the experiments.
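After unzipping, a co-captured pair can be read directly with NumPy. The sketch below is illustrative only: the per-scene file names (e.g. `scene01.npy`) and array shapes are assumptions, so consult the Code page for the actual on-disk layout.

```python
import numpy as np
from pathlib import Path

def load_pair(ms_dir: str, rgb_dir: str, scene: str):
    """Load one co-captured MS/RGB pair by scene name.

    Assumes each directory (extracted from ms_npys.zip / rgb_npys.zip)
    contains one .npy file per scene -- a naming convention we guess at here.
    """
    ms = np.load(Path(ms_dir) / f"{scene}.npy")    # e.g. HxW MS mosaic
    rgb = np.load(Path(rgb_dir) / f"{scene}.npy")  # e.g. HxWx3 RGB frame
    return ms, rgb
```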



Citation

If you use our dataset or code, please cite:

@inproceedings{TedlaLee2025Multispectral,
  title={{Multispectral Demosaicing via Dual Cameras}},
  author={Tedla, SaiKiran and Lee, Junyong and Yang, Beixuan and Afifi, Mahmoud and Brown, Michael S.},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}

Results Visualization

We visualize two asymmetric dual-camera scenarios: (1) different CFA patterns at equal resolution, and (2) different sensor resolutions (low-res MS guided by high-res RGB). Each scenario compares our method with baselines and ground truth.

Equal Resolution
[Interactive viewer: per-scene comparisons of the NAFNet and Restormer baselines, each baseline + Ours, and the ground truth.]

Asymmetric Resolution
[Interactive viewer: per-scene comparisons of the NAFSR and Restormer baselines, each baseline + Ours, and the ground truth.]


Dual-Camera MS-RGB Capture Setup

We built a controlled MS-RGB rig that emulates two synchronized cameras with a fixed 1 cm baseline: a Sony Alpha 1 is shuttled between the MS and RGB positions on an Arduino-driven linear stage, while a Telelumen OctaLight box provides the only illumination in a darkroom. RGB frames are captured under D65. MS data are simulated by re-illuminating the scene with seven narrow-band spectra and combining the captures, yielding 21 responses (7 illuminations × 3 RGB channels). We then keep the 16 most informative channels (discarding five low-signal/redundant ones) to form the 16-channel MS representation used in our experiments.

Capture setup. A camera on a linear stage faces a scene inside an illumination box (labels mark the pixel-shift camera, illumination box, and linear stage); the linear stage and programmable illumination emulate a fixed-baseline MS-RGB dual camera.
Channel formation. Seven illuminations × the RGB CFA yield 21 channel responses; 16 are selected (highlighted by red boxes) to simulate a 4×4 MSFA.

Dataset Visualization

Explore sample scenes that include both multispectral (MS) and RGB data. The left panels cycle through seven LED illuminations plus D65 for both camera positions (with a baseline of 1 cm between view 1 and view 2); the middle panels loop through channels (16 MS and 3 RGB); and the right panels display the mosaics.

[Interactive viewer: for each scene, the left panels cycle the eight illuminations (seven LEDs plus D65) for views 1 and 2; the middle panels step through the 16 MS and 3 RGB channels; the right panels show the MS and RGB mosaics.]

Dataset Overview

We set up 28 scenes featuring colorful, high-frequency materials. Each scene includes 15-30 different arrangements or camera positions (objects moved, rearranged, added/removed), yielding 490+ distinct views in total. Below we visualize one representative image per scene.

[Image carousel: one representative sample from each of the 28 scenes.]