Implementing the Horn and Schunck optical flow algorithm

GPU implementation of Horn and Schunck’s variational method for estimating optical flow.

Background

The Horn and Schunck variational method for computing optical flow is one of the seminal works in the field. It introduces the idea of imposing a global smoothness constraint on the estimated optical flow. This constraint helps the numerical solution converge to a good flow estimate even in image regions with poor texture.

Let $\mathbf{E}(x, y, t)$ be the image brightness at point $(x, y)$ and time $t$. The constant brightness assumption states that the brightness of a moving point does not change over time, that is,

$$ \frac{d \mathbf{E}}{d t} = 0 $$

Expanding the total derivative with the chain rule over $(x, y, t)$, one has:

$$ \frac{\partial \mathbf{E}}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial \mathbf{E}}{\partial y} \frac{\partial y}{\partial t} + \frac{\partial \mathbf{E}}{\partial t} = 0 $$

For convenience, let:

$$ \begin{align*} \mathbf{E}_x &= \frac{\partial \mathbf{E}}{\partial x} \\ \mathbf{E}_y &= \frac{\partial \mathbf{E}}{\partial y} \\ \mathbf{E}_t &= \frac{\partial \mathbf{E}}{\partial t} \end{align*} $$

be the image gradient in the $x$ and $y$ directions, and the partial derivative in time, respectively, and

$$ \begin{align*} u &= \frac{\partial x}{\partial t} \\ v &= \frac{\partial y}{\partial t} \end{align*} $$

be the $x$ and $y$ components of the optical flow, respectively. The constant brightness equation is then

$$ \mathbf{E}_x u + \mathbf{E}_y v + \mathbf{E}_t = 0 $$

which is the basis for differential methods for computing optical flow (e.g. Lucas-Kanade).

Minimization

Differential methods for estimating optical flow try to minimize the brightness error

$$ \epsilon_b = \mathbf{E}_x u + \mathbf{E}_y v + \mathbf{E}_t $$

that is, to find values $(u, v)$ of the optical flow such that the constant brightness constraint is satisfied. Notice that there is a single equation and two unknowns $(u, v)$, so the problem is underdetermined. To solve this, the Horn and Schunck algorithm adds a smoothness constraint based on the average value of the flow in a neighborhood:

$$ \epsilon_c^2 = (\bar{u} - u )^2 + (\bar{v} - v)^2 $$

Combining both terms into a single cost, with $\alpha$ acting as a regularization gain on the smoothness term, one has

$$ \epsilon^2 = \epsilon_b^2 + \alpha^2 \epsilon_c^2 $$

From these equations, a numerical solution is derived; the reader is encouraged to refer to the paper for the details. The iterative solution for $(u, v)$ is

$$ \begin{align*} u^{n+1} &= \bar{u}^n - \mathbf{E}_x \frac{\mathbf{E}_x \bar{u}^n + \mathbf{E}_y \bar{v}^n + \mathbf{E}_t}{\alpha^2 + \mathbf{E}_x^2 + \mathbf{E}_y^2} \\ v^{n+1} &= \bar{v}^n - \mathbf{E}_y \frac{\mathbf{E}_x \bar{u}^n + \mathbf{E}_y \bar{v}^n + \mathbf{E}_t}{\alpha^2 + \mathbf{E}_x^2 + \mathbf{E}_y^2} \end{align*} $$

where $(u^{n+1}, v^{n+1})$ is the estimated optical flow at iteration $n + 1$, computed from the flow estimate at the previous iteration and image parameters extracted from the image pair.
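The iterative scheme above can be sketched on the CPU with plain NumPy. This is a reference sketch for clarity, not the GPU code; the function names are illustrative, and the averaging kernel weights follow those suggested in the original paper.

```python
import numpy as np

def average_flow(f):
    """Weighted neighborhood average used for u_bar and v_bar.

    Weights: 1/6 for the 4-connected neighbors and 1/12 for the diagonals,
    as in the original Horn and Schunck paper. np.roll wraps around the
    image borders, which is enough for a sketch.
    """
    n4 = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
          np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 6.0
    nd = (np.roll(np.roll(f, 1, 0), 1, 1) + np.roll(np.roll(f, 1, 0), -1, 1) +
          np.roll(np.roll(f, -1, 0), 1, 1) + np.roll(np.roll(f, -1, 0), -1, 1)) / 12.0
    return n4 + nd

def horn_schunck(E0, E1, alpha=0.05, iterations=100):
    """Estimate optical flow between two grayscale frames E0, E1 in [0, 1]."""
    Ex = np.gradient(E1, axis=1)   # image gradient in x
    Ey = np.gradient(E1, axis=0)   # image gradient in y
    Et = E1 - E0                   # temporal derivative

    u = np.zeros_like(E1)
    v = np.zeros_like(E1)
    denom = alpha**2 + Ex**2 + Ey**2

    for _ in range(iterations):
        u_bar = average_flow(u)
        v_bar = average_flow(v)
        t = (Ex * u_bar + Ey * v_bar + Et) / denom
        u = u_bar - Ex * t
        v = v_bar - Ey * t

    return u, v
```

For a uniform translation of a linear brightness ramp, the fixed point of the iteration recovers the true displacement exactly, since the brightness constancy equation holds at every pixel.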

Implementation

The figure below illustrates the pipeline implementing the algorithm:

@startuml
skinparam linetype ortho

state HS as "HornSchunck" {

  state in_gray <<inputPin>>

  state ImageProcessor
  state ImageNormalize_uint_C1

  state NI_1 as "NumericIteration 1"
  state NI_2 as "NumericIteration 2"
  state NI_3 as "NumericIteration 3"
  state NI_N as "NumericIteration N"
  
  in_gray -down-> ImageProcessor
  in_gray -down-> ImageNormalize_uint_C1
  
  
  ImageNormalize_uint_C1 -down-> ImageProcessor: in_gray_old
  
  ImageProcessor -down-> NI_1: in_image_params
  ImageProcessor -down-> NI_2
  ImageProcessor -down-> NI_3
  ImageProcessor -down-> NI_N: in_image_params
  
  NI_1 -> NI_2
  NI_2 -> NI_3
  NI_3 -> NI_N: ...
  NI_N -> NI_1: in_flow, used for next image iteration
  
  NI_N -down-> out_flow <<outputPin>>
  ImageNormalize_uint_C1 -down-> out_gray <<outputPin>>
}

note top of HS
Parameters
----------
alpha : float. Defaults to 0.05.
    Regularization gain.

iterations : int. Defaults to 1.
    Number of iterations run to compute the optical flow.

float_precision : int. Defaults to ll.FloatPrecision.FP32.
    Floating point precision used across the algorithm. The outputs out_gray
    and out_flow will be of this floating point precision.
end note
@enduml

The HornSchunck node is a ContainerNode that instantiates several ComputeNodes implementing the algorithm. In particular, the ImageProcessor node computes image parameters from the pair of images in_gray and in_gray_old. Those parameters are transferred to the instances of NumericIteration through in_image_params, organized as follows:

  • in_image_params.x: X component of the image gradient.
  • in_image_params.y: Y component of the image gradient.
  • in_image_params.z: temporal derivative between in_gray and in_gray_old.
  • in_image_params.w: gain for this pixel, computed from the image gradient and the alpha parameter.

This packaging of the image parameters is convenient, as all values are packed together in a single RGBA pixel. The floating point precision of this texture, and of the estimated optical flow, is controlled by the float_precision parameter.
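As a rough CPU sketch of what ImageProcessor produces: the exact derivative filters used by the GPU shader are not shown in this post, and the definition of the gain channel is an assumption here, taken as the reciprocal of the denominator in the iterative update.

```python
import numpy as np

def compute_image_params(in_gray, in_gray_old, alpha=0.05):
    """Pack per-pixel image parameters into a single (H, W, 4) "RGBA" array.

    Channel layout mirrors the in_image_params port:
      x: X component of the image gradient
      y: Y component of the image gradient
      z: temporal derivative between in_gray and in_gray_old
      w: gain, assumed here to be 1 / (alpha^2 + Ex^2 + Ey^2)
    """
    Ex = np.gradient(in_gray, axis=1)
    Ey = np.gradient(in_gray, axis=0)
    Et = in_gray - in_gray_old
    gain = 1.0 / (alpha**2 + Ex**2 + Ey**2)
    return np.stack([Ex, Ey, Et, gain], axis=-1)
```

Precomputing the gain once per frame pair avoids recomputing the denominator inside every iteration.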

The NumericIteration node takes the image parameters and a prior estimate of the optical flow, in_flow, and computes the next iteration of the flow field. The algorithm requires several iterations for the estimated flow to reach acceptable quality. In the figure above, the last iteration is denoted as NumericIteration N; it feeds its output back as input to the first iteration, as well as to the out_flow output of the HornSchunck node. The number of iterations is controlled by the iterations parameter.
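A CPU sketch of a single NumericIteration step, under the same assumptions: the gain channel is taken to be the reciprocal of the update denominator, and a simple 4-neighbor average stands in for the paper's weighted flow averaging.

```python
import numpy as np

def flow_average(f):
    # simple 4-neighbor average; a stand-in for the paper's weighted kernel
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
            np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0

def numeric_iteration(in_image_params, in_flow):
    """One Horn and Schunck update.

    in_image_params: (H, W, 4) array with channels (Ex, Ey, Et, gain).
    in_flow:         (H, W, 2) array with channels (u, v).
    Returns the next (H, W, 2) flow estimate.
    """
    Ex, Ey, Et, gain = np.moveaxis(in_image_params, -1, 0)
    u_bar = flow_average(in_flow[..., 0])
    v_bar = flow_average(in_flow[..., 1])
    t = (Ex * u_bar + Ey * v_bar + Et) * gain
    return np.stack([u_bar - Ex * t, v_bar - Ey * t], axis=-1)
```

Chaining N calls of this function on the same image parameters mirrors the chain of NumericIteration nodes in the diagram above.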

The figure below illustrates a simple pipeline using the HornSchunck node:

@startuml
skinparam linetype ortho

state RGBA2Gray
state HS as "HornSchunck"
state Flow2RGBA

RGBA2Gray -down-> HS: in_gray
HS -down-> Flow2RGBA: in_flow
@enduml

where RGBA2Gray converts an input RGBA image to gray scale, HornSchunck computes the optical flow, and Flow2RGBA converts the optical flow to a color representation. The code below shows how to run this pipeline:

import lluvia as ll
import lluvia.util as ll_util
import matplotlib.pyplot as plt

# read two images as numpy arrays
frame_0 = ll_util.readRGBA('path to first image...')
frame_1 = ll_util.readRGBA('path to second image...')

# global session and memory objects
session = ll.createSession()
memory = session.createMemory(ll.MemoryPropertyFlagBits.DeviceLocal)

# this is the input of the complete pipeline
in_rgba = memory.createImageViewFromHost(frame_0)

RGBA2Gray = session.createComputeNode('lluvia/color/RGBA2Gray')
RGBA2Gray.bind('in_rgba', in_rgba)
RGBA2Gray.init()
RGBA2Gray.run() # run the node immediately in order to populate out_gray with valid values

in_gray = RGBA2Gray.getPort('out_gray')

HornSchunck = session.createContainerNode('lluvia/opticalflow/HornSchunck/HornSchunck')
HornSchunck.setParameter('alpha', ll.Parameter(0.05))
HornSchunck.setParameter('iterations', ll.Parameter(1000))
HornSchunck.setParameter('float_precision', ll.Parameter(ll.FloatPrecision.FP32.value))
HornSchunck.bind('in_gray', in_gray)

# when the node is initialized, it transfers the content of in_gray to out_gray.
HornSchunck.init()

out_gray = HornSchunck.getPort('out_gray')
out_flow = HornSchunck.getPort('out_flow')

# Convert the optical flow field to color images
flow2RGBA = session.createComputeNode('lluvia/viz/Flow2RGBA')
flow2RGBA.setParameter('max_flow', ll.Parameter(float(2)))
flow2RGBA.bind('in_flow', out_flow)
flow2RGBA.init()

out_flow_rgba = flow2RGBA.getPort('out_rgba')

duration = session.createDuration()

# Record the command buffer to run the pipeline in one go
cmdBuffer = session.createCommandBuffer()
cmdBuffer.begin()
cmdBuffer.run(RGBA2Gray)
cmdBuffer.memoryBarrier()
cmdBuffer.durationStart(duration) # start recording the duration to measure runtime
cmdBuffer.run(HornSchunck)
cmdBuffer.memoryBarrier()
cmdBuffer.durationEnd(duration)   # stop recording duration
cmdBuffer.run(flow2RGBA)
cmdBuffer.end()

# copy the content of the second frame to the in_rgba image before running the whole pipeline
in_rgba.fromHost(frame_1)

# run the pipeline
session.run(cmdBuffer)

# print runtime in milliseconds
print('{0:.02f} ms'.format(duration.nanoseconds / 1e6))

fig = plt.figure(figsize=(10, 6)); fig.set_tight_layout(True)
plt.subplot2grid((1,2), (0, 0)); plt.imshow(out_gray.toHost(), vmin=0, vmax=1, cmap='gray')
plt.subplot2grid((1,2), (0, 1)); plt.imshow(out_flow_rgba.toHost())
plt.show()

Evaluation on the Middlebury dataset

The Middlebury optical flow dataset from Baker et al. provides several real-life and synthetic image sequences with ground truth optical flow. The figures below show the estimated optical flow for the test sequences whose ground truth is available.

The Horn and Schunck algorithm is not well suited for large pixel displacements. Considering this, the input images are scaled to half resolution before entering the compute pipeline. The ground truth flow is scaled accordingly in order to be compared with the estimated flow. The endpoint error (EE) measures the magnitude of the difference between the ground truth and the estimated flow; it is computed as:

$$ EE = \sqrt{(u - u_\text{gt})^2 + (v - v_\text{gt})^2} $$
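A direct NumPy version of this metric (function and variable names are illustrative):

```python
import numpy as np

def endpoint_error(flow, flow_gt):
    """Per-pixel endpoint error between estimated and ground truth flow.

    Both inputs are (H, W, 2) arrays with u in channel 0 and v in channel 1.
    Returns an (H, W) array of per-pixel errors.
    """
    du = flow[..., 0] - flow_gt[..., 0]
    dv = flow[..., 1] - flow_gt[..., 1]
    return np.sqrt(du**2 + dv**2)
```

Averaging the returned array over valid pixels yields the average endpoint error commonly reported on Middlebury.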

The algorithm is configured as follows:

  • alpha: 15.0/255
  • iterations: 2000
  • float_precision: FP32

In general, the estimated optical flow yields acceptable results in image regions with small displacements (e.g. Dimetrodon, Grove2, Hydrangea, and RubberWhale). In image regions with large displacements, the method is not able to compute good results, as can be seen in the Urban2 and Urban3 sequences.

The results reported in this post were obtained on a Razer Blade 2021 laptop equipped with an Nvidia RTX 3070 GPU. The runtime is reported in the title of each figure and is in the order of 20 milliseconds for most of the image sequences. The Runtime performance section below evaluates the algorithm on different devices, resolutions, and floating point precisions.

Runtime performance

For the runtime analysis of the algorithm, two GPU devices were used:

  • A Nvidia GTX 1080 Desktop GPU.
  • A Nvidia RTX 3070 Laptop GPU running on a Razer Blade 2021.

The Horn and Schunck pipeline is configured with the same number of iterations used for the Middlebury evaluation, that is, iterations = 2000. The pipeline is configured for 5 different image resolutions (VGA 640x480, HD 1280x720, FHD 1920x1080, WQHD 2560x1440, UHD 3840x2160). For each resolution, the pipeline is run using both FP16 and FP32 floating point precision. The table and figure below show the runtime performance for each configuration.

| Resolution     | Float precision | Device   | Runtime median (ms) |
|----------------|-----------------|----------|---------------------|
| VGA 640x480    | FP16            | GTX 1080 | 68.8196             |
| VGA 640x480    | FP16            | RTX 3070 | 39.4354             |
| VGA 640x480    | FP32            | GTX 1080 | 97.5005             |
| VGA 640x480    | FP32            | RTX 3070 | 63.6458             |
| HD 1280x720    | FP16            | GTX 1080 | 193.977             |
| HD 1280x720    | FP16            | RTX 3070 | 115.626             |
| HD 1280x720    | FP32            | GTX 1080 | 279.538             |
| HD 1280x720    | FP32            | RTX 3070 | 175.635             |
| FHD 1920x1080  | FP16            | GTX 1080 | 429.256             |
| FHD 1920x1080  | FP16            | RTX 3070 | 257.624             |
| FHD 1920x1080  | FP32            | GTX 1080 | 623.555             |
| FHD 1920x1080  | FP32            | RTX 3070 | 386.718             |
| WQHD 2560x1440 | FP16            | GTX 1080 | 757.101             |
| WQHD 2560x1440 | FP16            | RTX 3070 | 449.536             |
| WQHD 2560x1440 | FP32            | GTX 1080 | 1099.35             |
| WQHD 2560x1440 | FP32            | RTX 3070 | 682.558             |
| UHD 3840x2160  | FP16            | GTX 1080 | 1694.16             |
| UHD 3840x2160  | FP16            | RTX 3070 | 1010.16             |
| UHD 3840x2160  | FP32            | GTX 1080 | 2453.45             |
| UHD 3840x2160  | FP32            | RTX 3070 | 1551.34             |

It is not surprising that the RTX 3070 GPU is faster than the GTX 1080, as the former is of a newer generation than the latter.

Discussion

This post presented a GPU implementation of the Horn and Schunck optical flow algorithm. The evaluation on the Middlebury test sequences shows the validity of the implementation. A runtime performance analysis was conducted on two GPUs using several image resolutions and floating point precisions.

Future work includes:

  • Implementing a pyramidal scheme, for instance that of Meinhardt-Llopis et al., to improve the accuracy of the algorithm in the presence of large displacements.
  • Using the smoothness constraint and numerical scheme in the FlowFilter algorithm to improve its accuracy.

References

  • Horn, B.K.P. and Schunck, B.G., 1981. Determining optical flow. Artificial Intelligence, 17(1-3), pp.185-203. Google Scholar.
  • Baker, S., Scharstein, D., Lewis, J.P., Roth, S., Black, M.J. and Szeliski, R., 2011. A database and evaluation methodology for optical flow. International Journal of Computer Vision, 92(1), pp.1-31. Google Scholar.
  • Meinhardt-Llopis, E. and Sánchez, J., 2013. Horn-Schunck optical flow with a multi-scale strategy. Image Processing On Line. Google Scholar.
  • Adarve, J.D. and Mahony, R., 2016. A filter formulation for computing real time optical flow. IEEE Robotics and Automation Letters, 1(2), pp.1192-1199. Google Scholar.