
Review of Optical Neural Networks

One of the most influential papers on diffractive deep neural networks (D2NNs) is “All-optical machine learning using diffractive deep neural networks” [1].

In this lecture, we reproduce the core D2NN architecture from [1] in SVETlANNa and train it on MNIST using the same physical design principles. The goal is to understand how a conventional neural-network operation can be mapped to optical propagation and phase modulation.

Linear Diffractive Neural Network (D2NN)

Theory

From conventional neural networks to optical layers

For a classical neural network, the output of layer $l$ is

$$\vec{X}^{l+1} = F^l(\hat{W}^{l} \vec{X}^l + \vec{B}^l),$$

where:

  • $\hat{W}^l$ is the weight matrix,
  • $\vec{B}^l$ is the bias vector,
  • $F^l$ is a nonlinear activation function.

Stacking layers gives a composition of linear transformations and nonlinearities.
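
For reference, here is this layer rule as a minimal PyTorch sketch (the sizes and the ReLU activation are arbitrary illustrative choices, not from [1]):

```python
import torch

# One classical layer: X^{l+1} = F^l(W^l X^l + B^l).
W = torch.randn(64, 784)        # weight matrix W^l
B = torch.randn(64)             # bias vector B^l
X = torch.randn(784)            # input vector X^l

X_next = torch.relu(W @ X + B)  # F^l chosen here as ReLU
```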

In a D2NN, the trainable part is implemented by diffractive phase masks, and layer-to-layer coupling is performed by free-space propagation. At each optical layer:

  1. The complex field is multiplied by a phase mask.
  2. The modulated field propagates to the next plane.

This can be written as

$$\vec{Y}^{l+1} = \hat{W}^l * \left(e^{i\phi^l} \odot \vec{Y}^l\right),$$

where:

  • $\vec{Y}^l$ is the complex optical field at layer $l$,
  • $e^{i\phi^l}$ is the phase-only modulation at layer $l$,
  • $\hat{W}^l$ is the free-space propagation operator,
  • $*$ denotes the convolution induced by propagation,
  • $\odot$ denotes element-wise multiplication.

Therefore, the trainable parameters are the phase values $\phi^l$ of each diffractive layer.
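
To make this concrete, below is a minimal PyTorch sketch of one optical layer: a trainable phase mask followed by free-space propagation, here implemented with the angular spectrum method. This is our own illustration under simplifying assumptions (square grid, scalar field, evanescent components discarded), not the SVETlANNa API.

```python
import torch

def angular_spectrum_propagate(field, dx, wavelength, distance):
    """Free-space propagation operator (the W^l above) via the
    angular spectrum method; dx is the neuron (pixel) pitch in meters."""
    n = field.shape[-1]
    fx = torch.fft.fftfreq(n, d=dx)
    FX, FY = torch.meshgrid(fx, fx, indexing="ij")
    kz_sq = (1.0 / wavelength) ** 2 - FX ** 2 - FY ** 2
    kz = 2 * torch.pi * torch.sqrt(torch.clamp(kz_sq, min=0.0))
    H = torch.exp(1j * kz * distance)
    H = torch.where(kz_sq > 0, H, torch.zeros_like(H))  # drop evanescent waves
    return torch.fft.ifft2(torch.fft.fft2(field) * H)

class DiffractiveLayer(torch.nn.Module):
    """One optical layer: trainable phase mask, then propagation."""
    def __init__(self, n, dx, wavelength, distance):
        super().__init__()
        self.phase = torch.nn.Parameter(torch.zeros(n, n))  # phi^l, trainable
        self.dx, self.wavelength, self.distance = dx, wavelength, distance

    def forward(self, field):
        modulated = field * torch.exp(1j * self.phase)  # e^{i phi^l} ⊙ Y^l
        return angular_spectrum_propagate(modulated, self.dx,
                                          self.wavelength, self.distance)
```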

Connection between a conventional neural network and a D2NN

Implementation Plan

To implement the D2NN from [1], we will follow this workflow:

  1. Define physical and simulation parameters (wavelength, layer resolution, neuron size, axial distances).
  2. Prepare MNIST and encode each image as an input optical field.
  3. Build the optical model in SVETlANNa:
    • define detector regions at the output plane (one region per class),
    • create a sequence of diffractive layers separated by free-space propagation.
  4. Define the training pipeline.
  5. Train, validate, and visualize the learned optical system.

Parameters Reported in the Original Papers

Below are the key experimental and simulation settings reported by the authors.

From [1]:

  • Task: MNIST digit classification (digits 0 to 9).
  • Architecture: five-layer phase-only D2NN.
  • Dataset split used in training: 55,000 training and 5,000 validation samples.
  • Illumination frequency: 0.4 THz (continuous-wave).
  • Neuron size: 400 μm.
  • Axial spacing between successive layers: 3.0 cm.
  • Detector size: $6.4\lambda \times 6.4\lambda$.
  • Batch size: 8.
  • Optimizer: Adam.

Additional clarifications from [2]:

  • Neuron size expressed relative to wavelength: approximately $0.53\lambda$.
  • Layer dimensions: $200 \times 200 = 40{,}000$ neurons per diffractive layer.
  • Detector normalization at the output plane:
$$I_l' = \frac{I_l}{\max\{I_l\}} \times 10,$$

where $I_l$ is the total optical intensity measured by detector $l$.

  • Optimization details: Adam with learning rate $10^{-3}$.

These values define a practical baseline for reproducing the published D2NN behavior in SVETlANNa.
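
For convenience, the same baseline collected as Python constants (the variable names are ours; the wavelength is derived from the 0.4 THz illumination frequency, which is also what makes the 400 μm neuron size come out at roughly $0.53\lambda$):

```python
# Baseline physical/simulation parameters from [1] and [2].
C = 3.0e8                     # speed of light, m/s
FREQUENCY = 0.4e12            # 0.4 THz continuous-wave illumination
WAVELENGTH = C / FREQUENCY    # = 0.75 mm, so 400 um ≈ 0.53 * wavelength
NEURON_SIZE = 400e-6          # m
LAYER_SHAPE = (200, 200)      # 40,000 neurons per diffractive layer
NUM_LAYERS = 5                # five phase-only diffractive layers
LAYER_SPACING = 3.0e-2        # m, axial distance between successive layers
BATCH_SIZE = 8
LEARNING_RATE = 1e-3          # Adam

def normalize_detector_intensities(I):
    """Output normalization from [2]: I_l' = I_l / max{I_l} * 10."""
    return I / I.max() * 10.0
```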

Diffractive Recurrent Neural Network (D-RNN)

In this lecture, we implement the recurrent diffractive architecture proposed in [1] for human action recognition on the Weizmann dataset.

The main objective is to show how temporal memory, usually handled by recurrent neural networks, can be realized in an optical diffractive system.

Theory

Why recurrence is needed

For sequential inputs (for example, video frames), the prediction at time $t$ should depend not only on the current frame $x_t$, but also on what the system has seen before. This is the core idea behind recurrent models.

In a diffractive recurrent neural network (D-RNN), the optical layers are reused across time steps. The model maintains a hidden optical state $h_t$ that combines:

  • memory from the previous state $h_{t-1}$,
  • information from the current input $x_t$.

The recurrence is written as

$$h_t = \lambda f_1(h_{t-1}) + (1-\lambda) f_2(x_t),$$

where:

  • $f_1$ is the memory mapping,
  • $f_2$ is the input (read-in) mapping,
  • $\lambda \in (0,1)$ controls the memory-input tradeoff.

Before the detectors, a read-out mapping is applied:

$$y = f_3(h_t).$$

In our optical implementation, $f_1$, $f_2$, and $f_3$ are realized using diffractive layers and free-space propagation.

For three time steps, the computation is:

$$\begin{aligned} h_0 &= f_2(x_0), \\ h_1 &= \lambda f_1(h_0) + (1-\lambda) f_2(x_1), \\ h_2 &= \lambda f_1(h_1) + (1-\lambda) f_2(x_2), \\ y &= f_3(h_2). \end{aligned}$$
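
A minimal sketch of this unrolled computation, with `f1`, `f2`, and `f3` as generic callables; in the optical model each would be a stack of diffractive layers with free-space propagation:

```python
import torch

def drnn_forward(frames, f1, f2, f3, lam=0.5):
    """Unrolled D-RNN forward pass over a frame sequence.
    f1: memory mapping, f2: read-in mapping, f3: read-out mapping."""
    h = f2(frames[0])                        # h_0 = f2(x_0)
    for x in frames[1:]:
        h = lam * f1(h) + (1 - lam) * f2(x)  # h_t = λ f1(h_{t-1}) + (1-λ) f2(x_t)
    return f3(h)                             # y = f3(h_T)
```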

Architecture of the D-RNN

Implementation Plan

To implement the D-RNN from [1], we follow this workflow:

  1. Define physical and simulation parameters (wavelength, grid size, neuron size, propagation distances).
  2. Prepare the Weizmann dataset and convert videos into short frame sequences.
  3. Build the optical model in SVETlANNa:
    • define detector regions at the output plane (one region per class),
    • implement read-in, memory, and read-out diffractive mappings.
  4. Define the training pipeline for sequence-wise forward passes and loss computation.
  5. Train, validate, and visualize classification performance and optical fields.

Parameters Reported in the Original Paper

Key settings reported by the authors in [1]:

  • Input masks were extracted by background subtraction; the provided mask data include both original and aligned masks.
  • Mask resolution: $64 \times 64$.
  • Data split: six subjects (60 videos) for training and three subjects (30 videos) for testing.
  • Sequence construction: each video is split into sub-sequences of three frames with a frame interval of two.
  • Number of detector regions equals number of classes:
    • ten regions for MNIST, Fashion-MNIST, and Weizmann,
    • six regions for KTH,
    • each output region has width 0.46 mm.
  • Recurrent state update:
$$h_t = \lambda f_1(h_{t-1}) + (1-\lambda) f_2(x_t),$$

with $m_t = f_1(h_{t-1})$ as the memory mapping, $i_t = f_2(x_t)$ as the input mapping, and $\lambda \in (0,1)$ controlling the relative contribution of memory and current input.

These settings provide a practical baseline for reproducing D-RNN experiments in SVETlANNa.
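
As an illustration, here is one possible construction of the three-frame sub-sequences. Our reading of "frame interval of two" is a stride of two frames between the selected frames, and the overlapping sliding window is an assumption:

```python
def make_subsequences(video, length=3, interval=2):
    """Split a video (a list or array of frames) into sub-sequences of
    `length` frames taken every `interval` frames."""
    span = (length - 1) * interval
    return [video[start : start + span + 1 : interval]
            for start in range(len(video) - span)]
```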

Convolutional Diffractive Network

In this lecture, we implement the Convolutional Diffractive Network introduced in “Optical Diffractive Convolutional Neural Networks Implemented in an All-Optical Way” [1].

The key idea is to perform the convolution operation optically with a 4f system, and then feed the result into a trainable diffractive network for classification.

Theory

Convolution in the spatial domain is equivalent to multiplication in the Fourier domain:

$$f * g = \mathcal{F}^{-1}\left(\mathcal{F}(f)\cdot\mathcal{F}(g)\right),$$

where $\mathcal{F}$ denotes the Fourier transform.
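
A quick numerical sanity check of this identity in PyTorch; note that the FFT route computes circular convolution, which matches linear convolution once the inputs are zero-padded:

```python
import torch

n = 4
f = torch.randn(n, n)
g = torch.randn(n, n)

# Fourier route: IFFT of the product of FFTs.
via_fft = torch.fft.ifft2(torch.fft.fft2(f) * torch.fft.fft2(g)).real

# Direct circular (wrap-around) convolution for comparison.
direct = torch.zeros(n, n)
for u in range(n):
    for v in range(n):
        direct[u, v] = sum(f[i, j] * g[(u - i) % n, (v - j) % n]
                           for i in range(n) for j in range(n))

assert torch.allclose(via_fft, direct, atol=1e-4)
```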

This identity is especially useful in optics because a lens can perform a Fourier transform. In a 4f optical setup:

  1. The first lens transforms the input field $g$ into $\mathcal{F}(g)$.
  2. A filter in the Fourier plane applies multiplication by $\mathcal{F}(f)$.
  3. The second lens performs the inverse transform, producing the convolution result $f * g$.

In this lecture, the Fourier-plane diffractive mask that defines the convolution kernel is fixed. The output of the 4f convolutional stage is then propagated through a trainable diffractive deep neural network (D2NN), which learns the final classification mapping.
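
A compact sketch of this composition, reusing the `DiffractiveLayer` from the D2NN section above; the 4f stage is idealized here as FFT, fixed Fourier-plane mask, inverse FFT (our simplification, not the SVETlANNa API):

```python
import torch

class ConvolutionalDiffractiveNet(torch.nn.Module):
    """Idealized 4f convolution stage (fixed Fourier-plane mask) followed
    by trainable diffractive layers."""
    def __init__(self, kernel_ft, layers):
        super().__init__()
        self.register_buffer("kernel_ft", kernel_ft)  # fixed F(f), not trained
        self.layers = torch.nn.ModuleList(layers)     # trainable D2NN part

    def forward(self, field):
        # 4f stage: lens -> Fourier-plane mask -> lens.
        field = torch.fft.ifft2(torch.fft.fft2(field) * self.kernel_ft)
        # Trainable diffractive layers with free-space propagation.
        for layer in self.layers:
            field = layer(field)
        return field.abs() ** 2                       # detector intensity readout
```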

Architecture of the Convolutional Diffractive Network

Implementation Plan

To implement the convolutional diffractive architecture from [1], we follow this workflow:

  1. Define physical and simulation parameters (wavelength, layer resolution, neuron size, axial distances).
  2. Prepare MNIST and encode each image as an input optical field.
  3. Build the optical model in SVETlANNa:
    • define detector regions at the output plane (one region per class),
    • define fixed convolution kernels in the Fourier plane,
    • create a 4f system to perform optical convolution,
    • add trainable diffractive layers with free-space propagation.
  4. Define the training pipeline (forward propagation, detector readout, loss, optimization).
  5. Train, validate, and visualize the learned optical system.

Parameters Used in This Lecture

Paper [1] does not provide all low-level physical parameters needed for a full implementation. For reproducibility in SVETlANNa, we reuse compatible physical settings from [2], as in previous lectures.

One important preprocessing detail is:

  • the input image of size $28 \times 28$ is zero-padded to $200 \times 200$.
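
A minimal sketch of this padding step in PyTorch:

```python
import torch
import torch.nn.functional as F

image = torch.rand(28, 28)               # stand-in for an MNIST digit
padded = F.pad(image, (86, 86, 86, 86))  # (200 - 28) / 2 = 86 pixels per side
assert padded.shape == (200, 200)
```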