[1]:
# -*- coding: utf-8 -*-
# Copyright 2025 United Kingdom Research and Innovation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authored by: Mariam Demir (STFC-UKRI)
# Maike Meier (STFC-UKRI)
Memory and Performance Profiling CIL Algorithms: CGLS vs. LSQR#
In this notebook we will:
- Introduce the LSQR algorithm, and the LSQR Tikhonov variant
- Compare the reconstructions obtained from LSQR to CGLS, and to a ground truth (FBP reconstruction)
- Compare LSQR’s running times across iterations with CGLS
- Benchmark and validate two memory profiling techniques, using CGLS as an example
- Use the memory profiling techniques to investigate where the LSQR algorithm increases the memory
- Compare the memory usage of LSQR to CGLS
- Compare and investigate the memory usage of CGLS and LSQR with block operators, and of LSQR Tikhonov
This notebook was tested and developed with the following package versions:

| Package | Version |
|---------|---------|
| CIL (Core Imaging Library) | 24.2.0 |
| NumPy | 1.26.4 |
| Matplotlib | 3.10.1 |
| memory-profiler | 0.61.0 |
| astra-toolbox | 2.1.0 |
| scipy | 1.15.2 |
| ipp | 2021.12.1 |
| jinja2 | 3.1.6 |
Create the environment using the command:
conda create --name cil -c conda-forge -c https://software.repos.intel.com/python/conda -c ccpi cil=24.2.0 ipp=2021.12 astra-toolbox=*=cuda* ipykernel ipywidgets scipy Jinja2
And install the following packages using pip:
pip install -U memory_profiler
[ ]:
import sys
sys.path.append("./Algorithms")
sys.path.append("./Scripts")
from LSQR import *
from LSQR_Tikhonov import *
from LSQRLP import *
from LSQRMP import *
from CGLSLP import *
from CGLSMP import *
# CIL core components needed
from cil.processors import Normaliser, TransmissionAbsorptionConverter, Padder, CentreOfRotationCorrector
from cil.framework import AcquisitionGeometry, AcquisitionData, BlockDataContainer
# CIL optimisation algorithms and linear operators
from cil.optimisation.algorithms import CGLS
from cil.optimisation.operators import BlockOperator, IdentityOperator
# CIL example synthetic test image
from cil.utilities import dataexample
# CIL display tools
from cil.utilities.display import show2D
# CIL FBP
from cil.recon import FBP
# Forward/backprojector from CIL ASTRA plugin
from cil.plugins.astra import ProjectionOperator
# Utility functions for this dataset & notebook
from Utilities import timed_iterations, mem_prof_process, line_prof_process, line_peak_process, plot_mem, print_output_subset
from IPython.display import display, HTML
# Third-party imports
import numpy as np
import scipy
import matplotlib.pyplot as plt
import os
MB = 1/(1024*1024)  # bytes -> megabytes (binary, i.e. MiB) conversion factor
dtypeSize = 4       # float32 uses 4 bytes per element
Loading The Sandstone Dataset#
The dataset used in this notebook is a 2D Tomogram of sandstone, containing:
- 1500 projections
- A single vertical slice
- 2560 horizontal pixels (which will later be padded to 3760 pixels)
From this information, we can calculate the estimated sizes of the AcquisitionData, and the ImageData (a.k.a Reconstruction Volume). These values will be important when looking at memory usage:
- AcquisitionData size: (1500 * 3760 * dtypeSize) = 21.5 MB
- ImageData size: (3760 * 3760 * dtypeSize) = 53.9 MB
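As a quick check, here is the same arithmetic in plain Python (a minimal sketch; the notebook later prints these values from the actual data containers):
[ ]:
# Rough size estimates for float32 (4 bytes per element) arrays, in MiB
dtype_size = 4
to_mib = 1 / (1024 * 1024)
acq_mib = 1500 * 3760 * dtype_size * to_mib   # projections x padded pixels
img_mib = 3760 * 3760 * dtype_size * to_mib   # reconstruction volume
print(f"AcquisitionData: {acq_mib:.1f} MiB, ImageData: {img_mib:.1f} MiB")
# AcquisitionData: 21.5 MiB, ImageData: 53.9 MiB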
We will download the data using the method dataexample.SANDSTONE.download_data(). Then, we pre-process and reconstruct the data using Filtered Back Projection to get a ground truth. Finally, we apply noise to the dataset sinogram, which will be used when running the algorithms.
[3]:
dataexample.SANDSTONE.download_data(data_dir='.', prompt=False)
Dataset folder already exists in .
[4]:
datapath = './sandstone'
filename = "slice_0270_data.mat"
all_data = scipy.io.loadmat(os.path.join(datapath,filename))
[5]:
sandstone = all_data['X_proj'].astype(np.float32)
sandstone.shape
[5]:
(2560, 1500)
[6]:
ag = AcquisitionGeometry.create_Parallel2D() \
.set_panel(num_pixels=(2560)) \
.set_angles(angles=np.linspace(0,180,1500,endpoint=False)) \
.set_labels(['horizontal','angle'])
print(ag.dimension_labels)
sandstone = AcquisitionData(sandstone, geometry=ag, deep_copy=False)
sandstone.reorder('astra')
show2D(sandstone)
(<AcquisitionDimension.HORIZONTAL: 'horizontal'>, <AcquisitionDimension.ANGLE: 'angle'>)
[6]:
<cil.utilities.display.show2D at 0x7f3d02c7dc40>
Ground Truth - FBP Reconstruction of pre-processed sandstone#
Pre-processing:#
[7]:
flats = all_data['X_flat'].astype(np.float32)
flats.shape
darks = all_data['X_dark'].astype(np.float32)
darks.shape
[7]:
(2560, 30)
[8]:
sandstone_norm = Normaliser(flat_field=flats.mean(axis=1),
dark_field=darks.mean(axis=1))(sandstone)
sandstone_norm = TransmissionAbsorptionConverter()(sandstone_norm)
[9]:
show2D([sandstone, sandstone_norm], title=['Sandstone', 'Sandstone Normalised'])
[9]:
<cil.utilities.display.show2D at 0x7f3ce5ae2d50>
[10]:
padsize = 600
sandstone_pad = Padder.edge(pad_width={'horizontal': padsize})(sandstone_norm)
[11]:
show2D(sandstone_pad)
[11]:
<cil.utilities.display.show2D at 0x7f3d08465310>
[12]:
sandstone_cor = CentreOfRotationCorrector.image_sharpness(backend='astra', search_range=100, tolerance=0.1)(sandstone_pad)
sandstone_cor.geometry.get_centre_of_rotation(distance_units='pixels')
[12]:
{'offset': (44.48700388605622, 'pixels'), 'angle': (0.0, 'radian')}
Obtain ground truth reconstruction:#
[13]:
ig = sandstone_norm.geometry.get_ImageGeometry()
recon = FBP(sandstone_cor, ig, backend='astra').run()
FBP recon
Input Data:
angle: 1500
horizontal: 3760
Reconstruction Volume:
horizontal_y: 2560
horizontal_x: 2560
Reconstruction Options:
Backend: astra
Filter: ram-lak
Filter cut-off frequency: 1.0
FFT order: 13
Filter_inplace: False
Split processing: 0
Reconstructing in 1 chunk(s):
[14]:
show2D(recon, title='Sandstone Reconstruction (Ground Truth)')
[14]:
<cil.utilities.display.show2D at 0x7f3cdfea6b10>
Generating Test Sinogram - Apply Noise:#
[15]:
# Incident intensity: lower counts will increase the noise
background_counts = 10000
# Convert the simulated absorption sinogram to transmission values using Lambert-Beer.
# Use sandstone_cor as mean for Poisson data generation.
# Convert back to absorption sinogram.
counts = background_counts * np.exp(-sandstone_cor.as_array())
noisy_counts = np.random.poisson(counts)
sand_noisy_data = -np.log(noisy_counts/background_counts)
# Create new AcquisitionData object with same geometry as sandstone_cor and fill with noisy data.
sandstone_noisy = sandstone_cor.geometry.allocate()
sandstone_noisy.fill(sand_noisy_data)
plots = [sandstone_cor, sandstone_noisy]
titles = ["sandstone sinogram", "sandstone sinogram noisy"]
show2D(plots, titles)
[15]:
<cil.utilities.display.show2D at 0x7f3cdffba030>
CGLS and LSQR without regularisation#
Before describing Tikhonov regularisation, we recall the least squares problem solved by LSQR:
\[\min_u \begin{Vmatrix}A u - b\end{Vmatrix}^2_2\]
where:
- \(A\) is the projection operator
- \(b\) is the acquired data
- \(u\) is the unknown image to be solved for, found iteratively by progressively minimising the objective function
In the solution provided by LSQR, the low-frequency components tend to converge faster than the high-frequency components, so the number of iterations must be controlled carefully to select the optimal solution (so-called semi-convergence).
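As a minimal sketch of one way to handle this semi-convergence, run one iteration at a time and keep the iterate with the lowest error (alg and gt are hypothetical names for a configured algorithm and a ground-truth image):
[ ]:
# Illustrative only: track the best iterate as the algorithm semi-converges.
best_err, best_x = float("inf"), None
for k in range(100):
    alg.run(1, verbose=0)                            # one iteration at a time
    err = (alg.solution - gt).norm() / gt.norm()     # relative error vs. ground truth
    if err < best_err:
        best_err, best_x = err, alg.solution.copy()  # keep the best iterate so far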
Set up the LSQR algorithm, including specifying its initial point to start from:
[16]:
sandstone_noisy.reorder('astra')
ig = sandstone_noisy.geometry.get_ImageGeometry()
ag = sandstone_noisy.geometry # ig and ag must be consistent with each other
device = "gpu"
A = ProjectionOperator(ig, ag, device)
initial = ig.allocate(0)
maxit = 100
itsAtATime = 1
N = round(maxit/itsAtATime)
xx = np.arange(0, maxit, itsAtATime)
The timed_iterations function (imported from Utilities) will be used to run the algorithms in increments, so we can store the residuals, errors and time taken per iteration:
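The essence of this helper might look like the following minimal sketch (the actual timed_iterations also computes the relative residuals and errors; timed_run is an illustrative name):
[ ]:
import time

def timed_run(alg, n_chunks, its_at_a_time):
    """Run `alg` in chunks, recording the cumulative wall-clock time."""
    times = []
    t0 = time.time()
    for _ in range(n_chunks):
        alg.run(its_at_a_time, verbose=0)
        times.append(time.time() - t0)
    return times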
Once set up, we can initialise and run the algorithms for the specified number of iterations:
[17]:
cgls_simple = CGLS(initial=initial, operator=A, data=sandstone_noisy)
times_cgls, rel_res_vec_cgls, rel_err_vec_cgls = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, cgls_simple, N, itsAtATime)
lsqr_simple = LSQR(initial=initial, operator=A, data=sandstone_noisy)
times_lsqr, rel_res_vec_lsqr, rel_err_vec_lsqr = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, lsqr_simple, N, itsAtATime)
Convergence/reduction of residuals:#
The residual norm
\[\begin{Vmatrix}A u - b\end{Vmatrix}^2_2\]
describes how well the algorithm’s reconstruction matches the data. LSQR is mathematically equivalent to the conjugate gradient least squares method CGLS, so it produces the same residuals; however, LSQR is generally more reliable in the presence of numerical errors.
The (relative) error describes how well the reconstruction matches the ground truth, which is the FBP reconstruction we computed earlier.
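For reference, the relative quantities plotted below are presumably of the following form (the exact normalisation is an assumption here, implemented inside timed_iterations):
\[\text{relative residual} = \frac{\begin{Vmatrix}A u_k - b\end{Vmatrix}_2}{\begin{Vmatrix}b\end{Vmatrix}_2}, \qquad \text{relative error} = \frac{\begin{Vmatrix}u_k - u_{FBP}\end{Vmatrix}_2}{\begin{Vmatrix}u_{FBP}\end{Vmatrix}_2}\]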
[18]:
plt.subplot(1,2,1)
plt.semilogy(xx, rel_res_vec_cgls, 'r-', xx, rel_res_vec_lsqr, 'b.--')
plt.xlabel('Iteration')
plt.ylabel('Residual')
plt.gca().legend(('CGLS','LSQR'))
plt.subplot(1,2,2)
plt.semilogy(xx, rel_err_vec_cgls, 'r-', xx, rel_err_vec_lsqr, 'b.--')
plt.xlabel('Iteration')
plt.ylabel('Error')
plt.gca().legend(('CGLS','LSQR'))
plt.tight_layout()
Figure 1: (Left) The progress of the residuals across 100 iterations. CGLS and LSQR converge to the same residuals, as their iterations are mathematically equivalent. The residuals decrease across the iterations, meaning the reconstructions match the data increasingly closely. (Right) The relative error across 100 iterations. CGLS and LSQR display identical errors, again because the iterations are mathematically equivalent. The reconstructions match the FBP reconstruction (ground truth) most closely at around iteration 20.
We can also plot and compare the time taken/elapsed across each iteration:
[ ]:
plt.plot(xx, times_cgls, 'r-', xx, times_lsqr, 'b.--')
plt.xlabel('Iteration')
plt.ylabel('Time (s)')
plt.gca().legend(('CGLS','LSQR'))
<matplotlib.legend.Legend at 0x7f3d023669f0>
Figure 2: This graph shows the time elapsed at each iteration for CGLS and LSQR. For both algorithms, this relationship is linear. On this device, the elapsed times across 100 iterations are very similar.
Memory Profiling#
CGLS_MP uses the memory_profiler module to profile the __init__, set_up and run methods. CGLS_LP uses psutil to manually track the memory usage after each line in these methods; this version also tracks the memory usage in a separate thread, which reveals the peak usage within each method. The same was done for LSQR. Below, we will run the memory-sandstone.py script using the modified files, and analyse the outputs:
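As a minimal, self-contained illustration of the two techniques (not the actual CGLSMP/CGLSLP code; allocate and rss_mib are illustrative names):
[ ]:
import os
import psutil
from memory_profiler import profile

@profile(precision=2)          # memory_profiler: prints a line-by-line report
def allocate():
    return [0.0] * 10_000_000  # a list of 10 million references (~80 MB)

def rss_mib():                 # psutil: manual resident-set-size sampling
    return psutil.Process(os.getpid()).memory_info().rss / 1024**2

before = rss_mib()
x = allocate()
print(f"RSS increase: {rss_mib() - before:.2f} MiB")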
Validating Profiling Tools:#
[20]:
%%capture cgls_mem_prof
!python3 Scripts/memory-sandstone.py cgls_mp
[21]:
%%capture cgls_line_prof
!python3 Scripts/memory-sandstone.py cgls_lp --track-peak
First, we can view the line-by-line memory increases from the memory_profiler output:
[22]:
print(*cgls_mem_prof.stdout.splitlines()[:117], sep="\n") # Print set_up, __init__ & run 1
Dataset folder already exists in ..
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/CGLSMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
81 681.71 MiB 681.71 MiB 1 @profile(precision=2)
82 def set_up(self, initial, operator, data):
83 r'''Initialisation of the algorithm
84 Parameters
85 ------------
86 operator : Operator
87 Linear operator for the inverse problem
88 initial : (optional) DataContainer in the domain of the operator, default is a DataContainer filled with zeros.
89 Initial guess
90 data : DataContainer in the range of the operator
91 Acquired data to reconstruct
92
93 '''
94 681.71 MiB 0.00 MiB 1 log.info("%s setting up", self.__class__.__name__)
95 735.62 MiB 53.91 MiB 1 self.x = initial.copy()
96 735.62 MiB 0.00 MiB 1 self.operator = operator
97
98 758.32 MiB 22.71 MiB 1 self.r = data - self.operator.direct(self.x)
99 813.20 MiB 54.88 MiB 1 self.s = self.operator.adjoint(self.r)
100
101 867.13 MiB 53.93 MiB 1 self.p = self.s.copy()
102 888.44 MiB 21.31 MiB 1 self.q = self.operator.range_geometry().allocate()
103
104 888.66 MiB 0.21 MiB 1 self.norms0 = self.s.norm()
105 888.66 MiB 0.00 MiB 1 self.norms = self.s.norm()
106
107 888.66 MiB 0.00 MiB 1 self.gamma = self.norms0**2
108 888.72 MiB 0.06 MiB 1 self.normx = self.x.norm()
109
110 888.72 MiB 0.00 MiB 1 self.configured = True
111 888.72 MiB 0.00 MiB 1 log.info("%s configured", self.__class__.__name__)
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/CGLSMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
63 681.71 MiB 681.71 MiB 1 @profile(precision=2)
64 def __init__(self, initial=None, operator=None, data=None, **kwargs):
65 '''initialisation of the algorithm
66 '''
67 #We are deprecating tolerance
68 681.71 MiB 0.00 MiB 1 self.tolerance=kwargs.pop("tolerance", None)
69 681.71 MiB 0.00 MiB 1 if self.tolerance is not None:
70 warnings.warn( stacklevel=2, category=DeprecationWarning, message="Passing tolerance directly to CGLS is being deprecated. Instead we recommend using the callback functionality: https://tomographicimaging.github.io/CIL/nightly/optimisation/#callbacks and in particular the CGLSEarlyStopping callback replicated the old behaviour")
71 else:
72 681.71 MiB 0.00 MiB 1 self.tolerance = 0
73
74 681.71 MiB 0.00 MiB 1 super(CGLS_MP, self).__init__(**kwargs)
75
76 681.71 MiB 0.00 MiB 1 if initial is None and operator is not None:
77 initial = operator.domain_geometry().allocate(0)
78 681.71 MiB 0.00 MiB 1 if initial is not None and operator is not None and data is not None:
79 888.72 MiB 207.01 MiB 1 self.set_up(initial=initial, operator=operator, data=data)
run 1
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/CGLSMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
113 888.72 MiB 888.72 MiB 1 @profile(precision=2)
114 def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs):
115 r"""run upto :code:`iterations` with callbacks/logging.
116
117 For a demonstration of callbacks see https://github.com/TomographicImaging/CIL-Demos/blob/main/misc/callback_demonstration.ipynb
118
119 Parameters
120 -----------
121 iterations: int, default is None
122 Number of iterations to run. If not set the algorithm will run until :code:`should_stop()` is reached
123 callbacks: list of callables, default is Defaults to :code:`[ProgressCallback(verbose)]`
124 List of callables which are passed the current Algorithm object each iteration. Defaults to :code:`[ProgressCallback(verbose)]`.
125 verbose: 0=quiet, 1=info, 2=debug
126 Passed to the default callback to determine the verbosity of the printed output.
127 """
128
129 888.72 MiB 0.00 MiB 1 if 'print_interval' in kwargs:
130 warnings.warn("use `TextProgressCallback(miniters)` instead of `run(print_interval)`",
131 DeprecationWarning, stacklevel=2)
132 888.72 MiB 0.00 MiB 1 if callbacks is None:
133 888.72 MiB 0.00 MiB 1 callbacks = [ProgressCallback(verbose=verbose)]
134
135 # transform old-style callbacks into new
136 888.72 MiB 0.00 MiB 1 callback = kwargs.get('callback', None)
137
138 888.72 MiB 0.00 MiB 1 if callback is not None:
139 callbacks.append(_OldCallback(callback, verbose=verbose))
140 888.72 MiB 0.00 MiB 1 if hasattr(self, '__log_file'):
141 callbacks.append(LogfileCallback(self.__log_file, verbose=verbose))
142
143 888.72 MiB 0.00 MiB 1 if self.should_stop():
144 print("Stop criterion has been reached.")
145 888.72 MiB 0.00 MiB 1 if iterations is None:
146 warnings.warn("`run()` missing `iterations`", DeprecationWarning, stacklevel=2)
147 iterations = self.max_iteration
148
149 888.72 MiB 0.00 MiB 1 if self.iteration == -1 and self.update_objective_interval>0:
150 888.72 MiB 0.00 MiB 1 iterations+=1
151
152 # call `__next__` upto `iterations` times or until `StopIteration` is raised
153 888.72 MiB 0.00 MiB 1 self.max_iteration = self.iteration + iterations
154
155 888.72 MiB 0.00 MiB 2 iters = (count(self.iteration) if numpy.isposinf(self.max_iteration)
156 888.72 MiB 0.00 MiB 1 else range(self.iteration, self.max_iteration))
157
158 888.92 MiB 0.06 MiB 3 for _ in zip(iters, self):
159 888.92 MiB 0.00 MiB 2 try:
160 888.92 MiB 0.00 MiB 4 for callback in callbacks:
161 888.92 MiB 0.14 MiB 2 callback(self)
162 except StopIteration:
163 break
[23]:
# ImageData and AcquisitionData Sizes
print(f"Estimate of 'initial' (ImageData) size: {initial.size*dtypeSize*MB:.1f} MB")
print(f"Estimate of 'sandstone_noisy' (AcquisitionData) size: {sandstone_noisy.size*dtypeSize*MB:.1f} MB")
Estimate of 'initial' (ImageData) size: 53.9 MB
Estimate of 'sandstone_noisy' (AcquisitionData) size: 21.5 MB
In the set_up method, we can see:
- 3 increases of ~53.9 MB, corresponding to the size of the ImageData space
- 2 increases of ~21.5 MB, corresponding to the AcquisitionData space
- totalling an increase of (3 × 53.9) + (2 × 21.5) ≈ 205 MB
In the __init__ method, the same increase is seen as a result of the call to set_up (Highlighted in turquoise). The run calls add very little on top, since the per-iteration work happens in the update method. For easier visualisation of the initial and final usages per method, we process this output and display it as a dataframe. The first usage reading was ~653 MB, which is the memory allocated when loading and processing the dataset. To show the memory usage of the CGLS algorithm alone, the dataframe has been normalised by subtracting this prior memory usage from the results:
[24]:
cgls_mp = mem_prof_process(cgls_mem_prof, norm=True)
cgls_mp.style.map(lambda val: 'background-color: turquoise' if val > 200 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[24]:
| | Method | Method Call | Initial Memory Usage (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) |
|---|---|---|---|---|---|
| 0 | __init__ | def __init__(self, initial=None, operator=None, data=None, **kwargs): | 0.00 | 207.01 | 207.01 |
| 1 | set_up | def set_up(self, initial, operator, data): | 0.00 | 207.01 | 207.01 |
| 2 | run 1 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.01 | 207.21 | 0.20 |
| 3 | run 2 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.21 | 207.30 | 0.09 |
| 4 | run 3 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.30 | 207.33 | 0.03 |
| 5 | run 4 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.33 | 207.36 | 0.03 |
| 6 | run 5 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.36 | 207.39 | 0.03 |
| 7 | run 6 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.39 | 207.41 | 0.02 |
| 8 | run 7 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.41 | 207.49 | 0.08 |
| 9 | run 8 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.49 | 207.53 | 0.04 |
| 10 | run 9 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.53 | 207.56 | 0.03 |
| 11 | run 10 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.56 | 207.59 | 0.03 |
Let’s compare these values to the line-by-line profiling results:
[25]:
cgls_lp = line_prof_process(cgls_line_prof, norm=True)
cgls_lp.style.map(lambda val: 'background-color: turquoise' if val > 200 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[25]:
| | Method | Initial Memory Usage (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) |
|---|---|---|---|---|
| 0 | init | 0.00 | 206.93 | 206.93 |
| 1 | setup | 0.00 | 206.93 | 206.93 |
| 2 | run 1 | 206.93 | 207.19 | 0.26 |
| 3 | run 2 | 207.19 | 207.28 | 0.09 |
| 4 | run 3 | 207.28 | 207.31 | 0.03 |
| 5 | run 4 | 207.31 | 207.34 | 0.03 |
| 6 | run 5 | 207.34 | 207.37 | 0.03 |
| 7 | run 6 | 207.37 | 207.40 | 0.03 |
| 8 | run 7 | 207.40 | 207.43 | 0.03 |
| 9 | run 8 | 207.43 | 207.49 | 0.06 |
| 10 | run 9 | 207.49 | 207.49 | 0.00 |
| 11 | run 10 | 207.49 | 207.51 | 0.02 |
As we can see, the major memory changes corresponding to set_up and __init__ are the same, with only minor variations in the memory usage of the run calls.
Peak Memory Usages:#
We have validated the memory usage tracking using two techniques. For the remainder of the notebook, we will be using output snippets from both to illustrate memory changes.
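As a minimal sketch of how such peak tracking on a separate thread can work (illustrative only; track_peak is a hypothetical name, not the actual *_LP implementation):
[ ]:
import os
import threading
import time
import psutil

def track_peak(stop_event, out, interval=0.01):
    """Sample RSS on a background thread and record the maximum seen."""
    proc = psutil.Process(os.getpid())
    out["peak"] = 0.0
    while not stop_event.is_set():
        out["peak"] = max(out["peak"], proc.memory_info().rss / 1024**2)
        time.sleep(interval)

stop, result = threading.Event(), {}
t = threading.Thread(target=track_peak, args=(stop, result))
t.start()
work = [0.0] * 10_000_000      # stand-in for the code being profiled
stop.set()
t.join()
print(f"Peak RSS: {result['peak']:.2f} MiB")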
With the line-by-line profiling technique, we can also take a look at the peak memory usages that were measured on a separate thread, and the corresponding calls:
[26]:
cgls_lp = line_peak_process(cgls_line_prof, norm=True)
cgls_lp.style.apply(
lambda x: ['background-color: pink' if (58 > val > 50) and (i % 2 == 1) else '' for i, val in enumerate(x)],
subset='Peak Memory Increase (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[26]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 206.93 | 206.93 | 206.93 | 206.93 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 206.93 | 206.93 | 206.93 | 206.93 | self.norms0 = self.s.norm() |
| 2 | run 1 | 206.93 | 261.08 | 207.19 | 0.26 | 54.15 | update(self) |
| 3 | update 1 | 207.13 | 261.08 | 207.19 | 0.06 | 53.95 | self.operator.direct(self.p, out=self.q) |
| 4 | run 2 | 207.19 | 261.24 | 207.28 | 0.09 | 54.05 | update(self) |
| 5 | update 2 | 207.19 | 261.24 | 207.28 | 0.09 | 54.05 | self.operator.direct(self.p, out=self.q) |
| 6 | run 3 | 207.28 | 261.27 | 207.31 | 0.03 | 53.99 | update(self) |
| 7 | update 3 | 207.28 | 261.27 | 207.31 | 0.03 | 53.99 | self.operator.direct(self.p, out=self.q) |
| 8 | run 4 | 207.31 | 261.30 | 207.34 | 0.03 | 53.99 | update(self) |
| 9 | update 4 | 207.31 | 261.30 | 207.34 | 0.03 | 53.99 | self.operator.direct(self.p, out=self.q) |
| 10 | run 5 | 207.34 | 261.33 | 207.37 | 0.03 | 53.99 | update(self) |
| 11 | update 5 | 207.34 | 261.33 | 207.37 | 0.03 | 53.99 | self.operator.direct(self.p, out=self.q) |
| 12 | run 6 | 207.37 | 261.36 | 207.40 | 0.03 | 53.99 | update(self) |
| 13 | update 6 | 207.37 | 261.36 | 207.40 | 0.03 | 53.99 | self.operator.direct(self.p, out=self.q) |
| 14 | run 7 | 207.40 | 261.38 | 207.43 | 0.03 | 53.98 | update(self) |
| 15 | update 7 | 207.40 | 261.38 | 207.43 | 0.03 | 53.98 | self.operator.direct(self.p, out=self.q) |
| 16 | run 8 | 207.43 | 261.41 | 207.49 | 0.06 | 53.98 | update(self) |
| 17 | update 8 | 207.43 | 261.41 | 207.49 | 0.06 | 53.98 | self.operator.direct(self.p, out=self.q) |
| 18 | run 9 | 207.49 | 261.44 | 207.49 | 0.00 | 53.95 | update(self) |
| 19 | update 9 | 207.49 | 261.44 | 207.49 | 0.00 | 53.95 | self.operator.direct(self.p, out=self.q) |
| 20 | run 10 | 207.49 | 261.47 | 207.51 | 0.02 | 53.98 | update(self) |
| 21 | update 10 | 207.49 | 261.47 | 207.51 | 0.02 | 53.98 | self.operator.direct(self.p, out=self.q) |
This table shows that CGLS’s set_up and __init__ calls do not have a spike in memory that exceeds the final usage. However, during the runs, the update(self) call produces peak increases around the size of the ImageData space. This memory usage is transient, and is released by the time the call has finished.
Looking at the Peak Line entries of the update rows (Highlighted in pink), we can see the biggest peak is attributed to the line self.operator.direct(self.p, out=self.q), which creates a copy of the ImageData space during its execution and releases the memory once it has run.
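Such transient allocations can also be measured directly with memory_profiler’s memory_usage helper; a minimal sketch (transient is an illustrative stand-in for the profiled call):
[ ]:
import numpy as np
from memory_profiler import memory_usage

def transient():
    tmp = np.ones((3760, 3760), dtype=np.float32)  # ~54 MiB of scratch space
    return float(tmp.sum())                        # tmp is freed on return

peak = memory_usage(transient, max_usage=True)     # peak RSS (MiB) during the call
print(f"Peak during the call: {peak:.1f} MiB")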
Below, we visualise this by plotting the total memory increases and the peak memory increases as sequential line graphs:
[27]:
plot_mem((cgls_lp,), ("CGLS",))
Figure 3: This graph shows the memory usages before (initial), during (peak) and after (final) the ‘init’ and ‘run’ methods in CGLS. The first two points show the ~205 MB memory increase during ‘init’, which remains stable. The spikes during each subsequent run show that the memory usage reaches a peak, and then drops back down, resulting in a net increase of ~0 MB. This is consistent across all ‘run’ calls (iterations).
Comparison With LSQR:#
Let’s compare this to LSQR’s memory usage:
[28]:
%%capture lsqr_line_prof
!python3 Scripts/memory-sandstone.py lsqr_lp --track-peak
[29]:
lsqr_lp = line_peak_process(lsqr_line_prof, norm=True)
lsqr_lp.style.apply(
lambda x: ['background-color: pink' if (58 > val > 50) and (i % 2 == 1) else '' for i, val in enumerate(x)],
subset='Peak Memory Increase (MiB)')\
.map(lambda val: 'background-color: turquoise' if val > 200 else '', subset='Total Memory Usage (MiB)')\
.map(lambda val: 'background-color: orange' if 58 > val > 50 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[29]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 207.02 | 207.02 | 207.02 | 207.02 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 207.02 | 207.02 | 207.02 | 207.02 | self.tmp_range = data.geometry.allocate(None) |
| 2 | run 1 | 207.02 | 261.20 | 261.17 | 54.15 | 54.18 | update(self) |
| 3 | update 1 | 207.25 | 261.20 | 261.17 | 53.92 | 53.95 | self.operator.direct(self.v, out=self.tmp_range) |
| 4 | run 2 | 261.17 | 315.19 | 261.22 | 0.05 | 54.02 | update(self) |
| 5 | update 2 | 261.17 | 315.19 | 261.22 | 0.05 | 54.02 | self.operator.direct(self.v, out=self.tmp_range) |
| 6 | run 3 | 261.22 | 315.22 | 261.25 | 0.03 | 54.00 | update(self) |
| 7 | update 3 | 261.22 | 315.22 | 261.25 | 0.03 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
| 8 | run 4 | 261.25 | 315.25 | 261.37 | 0.12 | 54.00 | update(self) |
| 9 | update 4 | 261.25 | 315.25 | 261.37 | 0.12 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
| 10 | run 5 | 261.37 | 315.31 | 261.36 | -0.01 | 53.94 | update(self) |
| 11 | update 5 | 261.37 | 315.31 | 261.36 | -0.01 | 53.94 | self.operator.direct(self.v, out=self.tmp_range) |
| 12 | run 6 | 261.36 | 315.35 | 261.42 | 0.06 | 53.99 | update(self) |
| 13 | update 6 | 261.36 | 315.35 | 261.42 | 0.06 | 53.99 | self.operator.direct(self.v, out=self.tmp_range) |
| 14 | run 7 | 261.42 | 315.38 | 261.47 | 0.05 | 53.96 | update(self) |
| 15 | update 7 | 261.42 | 315.38 | 261.47 | 0.05 | 53.96 | self.operator.direct(self.v, out=self.tmp_range) |
| 16 | run 8 | 261.47 | 315.42 | 261.53 | 0.06 | 53.95 | update(self) |
| 17 | update 8 | 261.47 | 315.42 | 261.52 | 0.05 | 53.95 | self.operator.direct(self.v, out=self.tmp_range) |
| 18 | run 9 | 261.53 | 315.47 | 261.54 | 0.01 | 53.94 | update(self) |
| 19 | update 9 | 261.53 | 315.47 | 261.54 | 0.01 | 53.94 | self.operator.direct(self.v, out=self.tmp_range) |
| 20 | run 10 | 261.54 | 315.50 | 261.54 | 0.00 | 53.96 | update(self) |
| 21 | update 10 | 261.54 | 315.50 | 261.54 | 0.00 | 53.96 | self.operator.direct(self.v, out=self.tmp_range) |
During every run/update call we again see a transient ~54 MB peak, due to the ImageData copy made during update’s self.operator.direct(self.v, out=self.tmp_range) call (Highlighted in pink), which is deallocated once the line completes. Notably, there is also a ~54 MB increase in total memory usage in the first run/update calls (Highlighted in orange). This is because in set_up, self.tmp_domain is initialised with None, which does not allocate memory; when the first update executes self.operator.adjoint(self.u, out=self.tmp_domain), the buffer is filled with values, which triggers the actual allocation.
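One plausible low-level explanation is lazy page commitment by the operating system: NumPy’s np.empty reserves address space without touching the pages, so resident memory only grows on first write. A minimal sketch of this behaviour, assuming CIL’s allocate(None) behaves like np.empty:
[ ]:
import os
import numpy as np
import psutil

def rss_mib():
    return psutil.Process(os.getpid()).memory_info().rss / 1024**2

base = rss_mib()
a = np.empty((3760, 3760), dtype=np.float32)  # address space reserved, pages untouched
print(f"after np.empty:    +{rss_mib() - base:.1f} MiB")  # ~0 MiB (typically, on Linux)
a.fill(0)                                     # first write commits the pages
print(f"after first write: +{rss_mib() - base:.1f} MiB")  # ~54 MiB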
In total with the initialisation cost, LSQR makes:
- 4 increases of ~53.9 MB, corresponding to the size of the ImageData space
- 2 increases of ~21.5 MB, corresponding to the AcquisitionData space
The raw line-by-line output below shows the self.operator.direct peak and subsequent deallocation, and the self.operator.adjoint increase:
[30]:
# View a subset of the line-by-line & peak usage output:
print_output_subset(lsqr_line_prof, 'run 1') # Truncated to show run 1 & update 1
run 1
Start of run | Memory Usage: 887.12 MB | line:
16 | Memory Usage: 887.12 MB | line: callbacks = [ProgressCallback(verbose=verbose)]
17 | Memory Usage: 887.12 MB | line: callback = kwargs.get('callback', None)
18 | Memory Usage: 887.12 MB | line: callbacks.append(_OldCallback(callback, verbose=verbose))
19 | Memory Usage: 887.12 MB | line: iterations = self.max_iteration
20 | Memory Usage: 887.12 MB | line: iterations+=1
21 | Memory Usage: 887.12 MB | line: self.max_iteration = self.iteration + iterations
22 | Memory Usage: 887.23 MB | line: iters = (count(self.iteration) if numpy.isposinf(self.max_iteration) else range(self.iteration, self.max_iteration))
23 | Memory Usage: 887.23 MB | line: update(self)
24 | Memory Usage: 887.35 MB | line: callback(self)
Start of update | Memory Usage: 887.35 MB | line:
Memory Usage Log (Time, Memory in MB): 15:55:14.623491, 941.25 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.633770, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.643983, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.654148, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.664296, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.674446, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.684598, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.694766, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.704964, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.715156, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.725320, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.735482, 941.30 MB
25 | Memory Usage: 887.37 MB | line: self.operator.direct(self.v, out=self.tmp_range)
Memory Usage Log (Time, Memory in MB): 15:55:14.747662, 887.37 MB
26 | Memory Usage: 887.36 MB | line: self.tmp_range.sapyb(1., self.u,-self.alpha, out=self.u)
27 | Memory Usage: 887.36 MB | line: self.beta = self.u.norm()
28 | Memory Usage: 887.36 MB | line: self.u /= self.beta
Memory Usage Log (Time, Memory in MB): 15:55:14.758079, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.783936, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.794185, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.804395, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.814572, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.824746, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.834923, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.845119, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.855290, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.865475, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.875650, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.885855, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.896072, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.906363, 899.14 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.916694, 923.14 MB
29 | Memory Usage: 941.29 MB | line: self.operator.adjoint(self.u, out=self.tmp_domain)
Memory Usage Log (Time, Memory in MB): 15:55:14.928414, 941.29 MB
30 | Memory Usage: 941.29 MB | line: self.v.sapyb(-self.beta, self.tmp_domain, 1., out=self.v)
Memory Usage Log (Time, Memory in MB): 15:55:14.938845, 941.29 MB
31 | Memory Usage: 941.29 MB | line: self.alpha = self.v.norm()
Memory Usage Log (Time, Memory in MB): 15:55:14.949036, 941.29 MB
32 | Memory Usage: 941.29 MB | line: self.v /= self.alpha
33 | Memory Usage: 941.29 MB | line: rhobar1 = self.rhobar, psi = 0
34 | Memory Usage: 941.29 MB | line: self.phibar = s * self.phibar
Memory Usage Log (Time, Memory in MB): 15:55:14.959302, 941.28 MB
35 | Memory Usage: 941.28 MB | line: self.x.sapyb(1, self.d, phi/rho, out=self.x)
Memory Usage Log (Time, Memory in MB): 15:55:14.969583, 941.27 MB
36 | Memory Usage: 941.27 MB | line: self.d.sapyb(-theta/rho, self.v, 1, out=self.d)
End of update | Memory Usage: 941.27 MB | line: self.normr = math.sqrt(self.phibar ** 2 + self.res2)
23 | Memory Usage: 941.27 MB | line: update(self)
24 | Memory Usage: 941.27 MB | line: callback(self)
End of run | Memory Usage: 941.27 MB | line:
By contrast, in the second run/update calls we still see the transient self.operator.direct peak, but no overall usage change from self.operator.adjoint:
[31]:
print_output_subset(lsqr_line_prof, 'run 2') # Truncated to show run 2 & update 2
run 2
Start of run | Memory Usage: 941.27 MB | line:
16 | Memory Usage: 941.27 MB | line: callbacks = [ProgressCallback(verbose=verbose)]
17 | Memory Usage: 941.27 MB | line: callback = kwargs.get('callback', None)
18 | Memory Usage: 941.27 MB | line: callbacks.append(_OldCallback(callback, verbose=verbose))
19 | Memory Usage: 941.27 MB | line: iterations = self.max_iteration
20 | Memory Usage: 941.27 MB | line: iterations+=1
21 | Memory Usage: 941.27 MB | line: self.max_iteration = self.iteration + iterations
22 | Memory Usage: 941.27 MB | line: iters = (count(self.iteration) if numpy.isposinf(self.max_iteration) else range(self.iteration, self.max_iteration))
Start of update | Memory Usage: 941.27 MB | line:
Memory Usage Log (Time, Memory in MB): 15:55:15.033527, 995.12 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.043790, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.053982, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.064149, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.074310, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.084502, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.094703, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.104903, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.115103, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.125278, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.135451, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.145736, 995.29 MB
25 | Memory Usage: 941.35 MB | line: self.operator.direct(self.v, out=self.tmp_range)
Memory Usage Log (Time, Memory in MB): 15:55:15.155986, 941.35 MB
26 | Memory Usage: 941.34 MB | line: self.tmp_range.sapyb(1., self.u,-self.alpha, out=self.u)
27 | Memory Usage: 941.34 MB | line: self.beta = self.u.norm()
28 | Memory Usage: 941.34 MB | line: self.u /= self.beta
Memory Usage Log (Time, Memory in MB): 15:55:15.190489, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.200770, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.210964, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.221151, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.231526, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.241818, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.252074, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.262294, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.272539, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.282943, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.293293, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.303709, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.313990, 941.34 MB
29 | Memory Usage: 941.34 MB | line: self.operator.adjoint(self.u, out=self.tmp_domain)
Memory Usage Log (Time, Memory in MB): 15:55:15.324311, 941.34 MB
30 | Memory Usage: 941.34 MB | line: self.v.sapyb(-self.beta, self.tmp_domain, 1., out=self.v)
31 | Memory Usage: 941.34 MB | line: self.alpha = self.v.norm()
Memory Usage Log (Time, Memory in MB): 15:55:15.334620, 941.34 MB
32 | Memory Usage: 941.34 MB | line: self.v /= self.alpha
33 | Memory Usage: 941.34 MB | line: rhobar1 = self.rhobar, psi = 0
34 | Memory Usage: 941.34 MB | line: self.phibar = s * self.phibar
Memory Usage Log (Time, Memory in MB): 15:55:15.344957, 941.33 MB
35 | Memory Usage: 941.33 MB | line: self.x.sapyb(1, self.d, phi/rho, out=self.x)
Memory Usage Log (Time, Memory in MB): 15:55:15.355259, 941.32 MB
36 | Memory Usage: 941.32 MB | line: self.d.sapyb(-theta/rho, self.v, 1, out=self.d)
End of update | Memory Usage: 941.32 MB | line: self.normr = math.sqrt(self.phibar ** 2 + self.res2)
23 | Memory Usage: 941.32 MB | line: update(self)
24 | Memory Usage: 941.32 MB | line: callback(self)
End of run | Memory Usage: 941.32 MB | line:
Below is a graph illustrating the overall trends across the methods, showing the comparison for CGLS and LSQR:
[32]:
plot_mem(dfs=(cgls_lp, lsqr_lp), algNms=("CGLS", "LSQR"))
Figure 4: This graph shows the memory usages before (initial), during (peak) and after (final) the ‘init’ and ‘run’ methods in CGLS (Blue) and LSQR (Orange). The first two points show the memory increase during ‘init’, which remains stable. Unlike CGLS, LSQR’s first ‘run’ (iteration) shows another memory increase, which also stays consistent and shows that LSQR uses more memory than CGLS. The spikes during each subsequent run show that the memory usage reaches a peak, and then drops back down, resulting in a net increase of ~0 MB. This is consistent across all ‘run’ calls (iterations).
LSQR creates 4 copies of the ImageData space and 2 copies of the AcquisitionData, totalling (4 × 53.9) + (2 × 21.5) ≈ 259 MB. This can be seen clearly in the breakdown below:
[33]:
%%capture lsqr_mem_prof
!python3 Scripts/memory-sandstone.py lsqr_mp
[34]:
# View the memory_profiler breakdown:
print(*lsqr_mem_prof.stdout.splitlines()[:42], sep="\n") # Print set_up
print(*lsqr_mem_prof.stdout.splitlines()[77:135], sep="\n") # Print run 1
Dataset folder already exists in ..
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/LSQRMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
92 680.4 MiB 680.4 MiB 1 @profile
93 def set_up(self, initial, operator, data):
94 r'''Initialisation of the algorithm
95 Parameters
96 ------------
97 operator : Operator
98 Linear operator for the inverse problem
99 initial : (optional) DataContainer in the domain of the operator, default is a DataContainer filled with zeros.
100 Initial guess
101 data : DataContainer in the range of the operator
102 Acquired data to reconstruct
103
104 '''
105
106 680.4 MiB 0.0 MiB 1 log.info("%s setting up", self.__class__.__name__)
107 734.3 MiB 53.9 MiB 1 self.x = initial.copy() #1 domain
108 734.3 MiB 0.0 MiB 1 self.operator = operator
109
110 # Initialise Golub-Kahan bidiagonalisation (GKB)
111
112 #self.u = data - self.operator.direct(self.x)
113 756.8 MiB 22.5 MiB 1 self.u = self.operator.direct(self.x) #1 range
114 757.1 MiB 0.3 MiB 1 self.u.sapyb(-1, data, 1, out=self.u)
115 757.1 MiB 0.0 MiB 1 self.beta = self.u.norm()
116 757.1 MiB 0.0 MiB 1 self.u /= self.beta
117
118 812.1 MiB 55.0 MiB 1 self.v = self.operator.adjoint(self.u) #2 domain
119 812.1 MiB 0.0 MiB 1 self.alpha = self.v.norm()
120 812.1 MiB 0.0 MiB 1 self.v /= self.alpha
121
122 812.1 MiB 0.0 MiB 1 self.rhobar = self.alpha
123 812.1 MiB 0.0 MiB 1 self.phibar = self.beta
124 812.1 MiB 0.0 MiB 1 self.normr = self.beta
125 812.1 MiB 0.0 MiB 1 self.regalphasq = self.regalpha**2
126
127 865.8 MiB 53.7 MiB 1 self.d = self.v.copy() #3 domain
128 887.6 MiB 21.8 MiB 1 self.tmp_range = data.geometry.allocate(None) #2 range
run 1
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/LSQRMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
136 887.6 MiB 887.6 MiB 1 @profile
137 def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs):
138 r"""run upto :code:`iterations` with callbacks/logging.
139
140 For a demonstration of callbacks see https://github.com/TomographicImaging/CIL-Demos/blob/main/misc/callback_demonstration.ipynb
141
142 Parameters
143 -----------
144 iterations: int, default is None
145 Number of iterations to run. If not set the algorithm will run until :code:`should_stop()` is reached
146 callbacks: list of callables, default is Defaults to :code:`[ProgressCallback(verbose)]`
147 List of callables which are passed the current Algorithm object each iteration. Defaults to :code:`[ProgressCallback(verbose)]`.
148 verbose: 0=quiet, 1=info, 2=debug
149 Passed to the default callback to determine the verbosity of the printed output.
150 """
151
152 887.6 MiB 0.0 MiB 1 if 'print_interval' in kwargs:
153 warnings.warn("use `TextProgressCallback(miniters)` instead of `run(print_interval)`",
154 DeprecationWarning, stacklevel=2)
155 887.6 MiB 0.0 MiB 1 if callbacks is None:
156 887.6 MiB 0.0 MiB 1 callbacks = [ProgressCallback(verbose=verbose)]
157
158 # transform old-style callbacks into new
159 887.6 MiB 0.0 MiB 1 callback = kwargs.get('callback', None)
160
161 887.6 MiB 0.0 MiB 1 if callback is not None:
162 callbacks.append(_OldCallback(callback, verbose=verbose))
163 887.6 MiB 0.0 MiB 1 if hasattr(self, '__log_file'):
164 callbacks.append(LogfileCallback(self.__log_file, verbose=verbose))
165
166 887.6 MiB 0.0 MiB 1 if self.should_stop():
167 print("Stop criterion has been reached.")
168 887.6 MiB 0.0 MiB 1 if iterations is None:
169 warnings.warn("`run()` missing `iterations`", DeprecationWarning, stacklevel=2)
170 iterations = self.max_iteration
171
172 887.6 MiB 0.0 MiB 1 if self.iteration == -1 and self.update_objective_interval>0:
173 887.6 MiB 0.0 MiB 1 iterations+=1
174
175 # call `__next__` upto `iterations` times or until `StopIteration` is raised
176 887.6 MiB 0.0 MiB 1 self.max_iteration = self.iteration + iterations
177
178 887.6 MiB 0.0 MiB 2 iters = (count(self.iteration) if numpy.isposinf(self.max_iteration)
179 887.6 MiB 0.0 MiB 1 else range(self.iteration, self.max_iteration))
180
181 941.7 MiB 53.9 MiB 3 for _ in zip(iters, self):
182 941.7 MiB 0.0 MiB 2 try:
183 941.7 MiB 0.0 MiB 4 for callback in callbacks:
184 941.7 MiB 0.2 MiB 2 callback(self)
185 except StopIteration:
186 break
Tikhonov regularisation using CGLS and LSQR - with or without a block#
Tikhonov regularisation#
We can add a regularisation term to the problem solved by CGLS; this gives us a minimisation problem of the following form, known as Tikhonov regularisation:
\[\min_u \begin{Vmatrix}A u - b\end{Vmatrix}^2_2 + \alpha^2\begin{Vmatrix}L u\end{Vmatrix}^2_2\]
where:
- \(A\) is the projection operator
- \(b\) is the acquired data
- \(u\) is the unknown image to be solved for
- \(\alpha\) is the regularisation parameter
- \(L\) is the regularisation operator
The first term measures the fidelity of the solution to the data. The second term measures the fidelity to the prior knowledge we have imposed on the system through the operator \(L\), and \(\alpha\) controls the trade-off between these terms. \(L\) is often chosen to be a smoothing operator such as the identity matrix or a gradient operator, penalised here in the squared L2-norm.
This can be re-written equivalently in block matrix form:
\[\min_u \begin{Vmatrix} \binom{A}{\alpha L} u - \binom{b}{0} \end{Vmatrix}^2_2\]
With the definitions:
- \(\tilde{A} = \binom{A}{\alpha L}\)
- \(\tilde{b} = \binom{b}{0}\)
this can now be recognised as a least squares problem:
\[\min_u \begin{Vmatrix}\tilde{A} u - \tilde{b}\end{Vmatrix}^2_2\]
and, being a least squares problem, it can be solved using CGLS with \(\tilde{A}\) as operator and \(\tilde{b}\) as data.
LSQR in CIL has the option to include Tikhonov regularisation without using blocks. In this section, we will compare LSQR Tikhonov with a block, CGLS with a block, and LSQR Tikhonov without a block.
Set up the algorithms:
[35]:
initial = ig.allocate(0)
L = IdentityOperator(ig)
alpha = 0.1
operator_block = BlockOperator(A, alpha*L)
zero_data = L.range.allocate(0)
data_block = BlockDataContainer(sandstone_noisy, zero_data)
maxit = 100
itsAtATime = 1
N = round(maxit/itsAtATime)
xx = np.arange(0, maxit, itsAtATime)
[36]:
cgls_tik_block = CGLS(initial=initial, operator=operator_block, data=data_block, update_objective_interval = 10)
times_cgls_tik_block, rel_res_vec_cgls_tik_block, rel_err_vec_cgls_tik_block = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, cgls_tik_block, N, itsAtATime)
lsqr_tik_block = LSQR(initial=initial, operator=operator_block, data=data_block, update_objective_interval = 10)
times_lsqr_tik_block, rel_res_vec_lsqr_tik_block, rel_err_vec_lsqr_tik_block = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, lsqr_tik_block, N, itsAtATime)
lsqr_tik_simple = LSQR_Tikhonov(initial=initial, operator=A, data=sandstone_noisy, alpha = alpha)
times_lsqr_tik_simple, rel_res_vec_lsqr_tik_simple, rel_err_vec_lsqr_tik_simple = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, lsqr_tik_simple, N, itsAtATime)
Convergence/reduction of residuals:#
The plots below show the progress of the residuals across each iteration, and the error across iterations:
[37]:
plt.subplot(1,2,1)
plt.semilogy(xx, rel_res_vec_lsqr_tik_block, '*-', xx, rel_res_vec_cgls_tik_block, '-', xx, rel_res_vec_lsqr_tik_simple, '--')
plt.xlabel('Iterations')
plt.ylabel('Residuals')
plt.gca().legend(('LSQR Block', 'CGLS Block', 'LSQR Tik'))
plt.subplot(1,2,2)
plt.semilogy(xx, rel_err_vec_lsqr_tik_block, '*-', xx, rel_err_vec_cgls_tik_block, '-', xx, rel_err_vec_lsqr_tik_simple, '--')
plt.xlabel('Iterations')
plt.ylabel('Error')
plt.gca().legend(('LSQR Block', 'CGLS Block', 'LSQR Tik'))
plt.tight_layout()
Figure 5: (Left) The progress of the residuals across 100 iterations. The CGLS Block, LSQR Block and LSQR Tikhonov iterations are mathematically equivalent, so they converge to the same residuals. The residuals decrease across the iterations, meaning the reconstructions match the data increasingly closely. (Right) The relative error across 100 iterations. All three algorithms display identical errors due to mathematically equivalent iterations. The reconstructions match the FBP reconstruction (ground truth) most closely at around iteration 20.
Compare the time taken/elapsed across each iteration for the 3 algorithms:
[ ]:
plt.plot(xx, times_cgls_tik_block, 'r-', xx, times_lsqr_tik_block, 'b.--', xx, times_lsqr_tik_simple, 'g.-' )
plt.xlabel('Iteration')
plt.ylabel('Time (s)')
plt.gca().legend(('CGLS Block', 'LSQR Block', 'LSQR Tik'))
<matplotlib.legend.Legend at 0x7f3d909333e0>
Figure 6: This graph shows the time elapsed at each iteration for CGLS Block, LSQR Block and LSQR Tikhonov. For all algorithms, this relationship is linear. On this device, the elapsed times across 100 iterations are very similar, with CGLS Block being slightly faster overall.
Memory costs for LSQR+Block, CGLS+Block and LSQR+Tikhonov#
Setting Up Memory Tracking:#
Set up the memory tracking for the three algorithms:
[39]:
%%capture cgls_block_prof
!python3 Scripts/memory-sandstone.py cgls_lp_tik_block --track-peak
[40]:
%%capture lsqr_block_prof
!python3 Scripts/memory-sandstone.py lsqr_lp_tik_block --track-peak
[41]:
%%capture lsqr_block_mp
!python3 Scripts/memory-sandstone.py lsqr_mp_tik_block
[42]:
%%capture lsqr_tik_prof
!python3 Scripts/memory-sandstone.py lsqr_tik_lp --track-peak
[43]:
%%capture lsqr_tik_mp
!python3 Scripts/memory-sandstone.py lsqr_tik_mp
Memory Usage Results:#
[44]:
# ImageData and AcquisitionData Sizes
print(f"Estimate of 'initial' (ImageData) size: {initial.size*dtypeSize*MB:.1f} MB")
print(f"Estimate of 'sandstone_noisy' (AcquisitionData) size: {sandstone_noisy.size*dtypeSize*MB:.1f} MB")
print(f"Estimate of 'data_block' (Block AcquisitionData) size: {(sandstone_noisy.size+zero_data.size)*dtypeSize*MB:.1f} MB")
Estimate of 'initial' (ImageData) size: 53.9 MB
Estimate of 'sandstone_noisy' (AcquisitionData) size: 21.5 MB
Estimate of 'data_block' (Block AcquisitionData) size: 75.4 MB
In the dataframe below, there is a transient peak increase of ~129.5 MB in the first update (Highlighted in pink) due to the line self.operator.direct(self.p, out=self.q). This corresponds to the ImageData plus Block AcquisitionData sizes: 53.9 + 75.4 ≈ 129.3 MB. Most of this memory is deallocated by the end of the line; ~21.5 MB (1× the original AcquisitionData) remains occupied in the total usage (Highlighted in orange).
The subsequent update calls have a peak of ~107.9 MB, which is 1× AcqData (~21.5 MB) less than the first peak, suggesting that the AcqData buffer array in self.q is populated with values during the first call to update.
In total, CGLS Block makes:
- 3 copies of ImgData (53.9 MB)
- 2 copies of Block AcqData (75.4 MB)
- 1 copy of AcqData (21.5 MB)
The CGLS Block DataFrame shows these results below:
[45]:
cgls_block_lp = line_peak_process(cgls_block_prof, norm=True)
cgls_block_lp.iloc[:12].style.apply(
lambda x: ['background-color: pink' if (134 > val > 126) and (i % 2 == 1) else '' for i, val in enumerate(x)],
subset='Peak Memory Increase (MiB)')\
.map(lambda val: 'background-color: turquoise' if val > 300 else '', subset='Total Memory Usage (MiB)')\
.map(lambda val: 'background-color: orange' if 24 > val > 18 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[45]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 314.77 | 314.77 | 314.77 | 314.77 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 314.77 | 314.77 | 314.77 | 314.77 | self.gamma = self.norms0**2 |
| 2 | run 1 | 314.77 | 444.35 | 336.67 | 21.90 | 129.58 | update(self) |
| 3 | update 1 | 314.96 | 444.35 | 336.67 | 21.71 | 129.39 | self.operator.direct(self.p, out=self.q) |
| 4 | run 2 | 336.67 | 444.56 | 336.73 | 0.06 | 107.89 | update(self) |
| 5 | update 2 | 336.67 | 444.56 | 336.73 | 0.06 | 107.89 | self.operator.direct(self.p, out=self.q) |
| 6 | run 3 | 336.73 | 444.60 | 336.78 | 0.05 | 107.87 | update(self) |
| 7 | update 3 | 336.73 | 444.60 | 336.76 | 0.03 | 107.87 | self.operator.direct(self.p, out=self.q) |
| 8 | run 4 | 336.78 | 444.65 | 336.80 | 0.02 | 107.87 | update(self) |
| 9 | update 4 | 336.78 | 444.65 | 336.80 | 0.02 | 107.87 | self.operator.direct(self.p, out=self.q) |
| 10 | run 5 | 336.80 | 444.68 | 336.85 | 0.05 | 107.88 | update(self) |
| 11 | update 5 | 336.80 | 444.68 | 336.84 | 0.04 | 107.88 | self.operator.direct(self.p, out=self.q) |
In set_up, the allocations (Highlighted in turquoise) are: (53.9 × 3) + 75.4 + 21.5 ≈ 258.6 MB. In the first update, the allocations (Highlighted in orange) are: 53.9 + 75.4 ≈ 129.3 MB. The memory increases in update 1 occur because the buffer arrays initialised in set_up become populated with values.
The peak of ~183 MB (Highlighted in pink) is due to self.operator.direct(self.v, out=self.tmp_range) creating temporary buffers of ImgData + Block AcqData, and self.operator.adjoint(self.u, out=self.tmp_domain) creating 2 buffers of ImgData during update 1.
In total, LSQR Block makes:
- 4 copies of ImgData (53.9 MB)
- 2 copies of Block AcqData (75.4 MB)
- 1 copy of AcqData (21.5 MB)
[46]:
lsqr_block_lp = line_peak_process(lsqr_block_prof, norm=True)
lsqr_block_lp.iloc[:12].style.apply(
lambda x: ['background-color: pink' if (190 > val > 175) and (i % 2 == 1) else '' for i, val in enumerate(x)],
subset='Peak Memory Increase (MiB)')\
.map(lambda val: 'background-color: turquoise' if val > 255 else '', subset='Total Memory Usage (MiB)')\
.map(lambda val: 'background-color: orange' if 134 > val > 126 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[46]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 260.78 | 260.78 | 260.78 | 260.78 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 260.78 | 260.78 | 260.78 | 260.78 | self.tmp_range = data.geometry.allocate(None) |
| 2 | run 1 | 260.78 | 444.51 | 390.60 | 129.82 | 183.73 | update(self) |
| 3 | update 1 | 261.14 | 444.51 | 390.55 | 129.41 | 183.37 | self.operator.adjoint(self.u, out=self.tmp_domain) |
| 4 | run 2 | 390.60 | 498.47 | 390.64 | 0.04 | 107.87 | update(self) |
| 5 | update 2 | 390.60 | 498.47 | 390.59 | -0.01 | 107.87 | self.operator.direct(self.v, out=self.tmp_range) |
| 6 | run 3 | 390.64 | 498.51 | 390.64 | 0.00 | 107.87 | update(self) |
| 7 | update 3 | 390.64 | 498.51 | 390.64 | 0.00 | 107.87 | self.operator.direct(self.v, out=self.tmp_range) |
| 8 | run 4 | 390.64 | 498.56 | 390.68 | 0.04 | 107.92 | update(self) |
| 9 | update 4 | 390.64 | 498.56 | 390.68 | 0.04 | 107.92 | self.operator.direct(self.v, out=self.tmp_range) |
| 10 | run 5 | 390.68 | 498.61 | 390.74 | 0.06 | 107.93 | update(self) |
| 11 | update 5 | 390.68 | 498.61 | 390.74 | 0.06 | 107.93 | self.operator.direct(self.v, out=self.tmp_range) |
[47]:
# ImageData and AcquisitionData Sizes
print(f"Estimate of 'initial' (ImageData) size: {initial.size*dtypeSize*MB:.1f} MB")
print(f"Estimate of 'sandstone_noisy' (AcquisitionData) size: {sandstone_noisy.size*dtypeSize*MB:.1f} MB")
Estimate of 'initial' (ImageData) size: 53.9 MB
Estimate of 'sandstone_noisy' (AcquisitionData) size: 21.5 MB
The setup for LSQR Tikhonov does not use the Block operator, so the ImageData and AcquisitionData sizes are 53.9 MB and 21.5 MB, respectively.
We expect the Total Memory Usage to be the same as for the LSQR run previously, without the BlockOperator:
- 4 copies of ImageData (3 in set_up, Highlighted in turquoise; 1 in run/update, Highlighted in orange)
- 2 copies of AcquisitionData (Highlighted in turquoise)
This totals (53.9 × 4) + (21.5 × 2) ≈ 258.6 MB, as shown in the outputs below, meaning LSQR Tikhonov uses less memory than the block versions of CGLS and LSQR.
[48]:
# View the memory_profiler breakdown:
print(*lsqr_tik_mp.stdout.splitlines()[:49], sep="\n") # Print set_up
print(*lsqr_tik_mp.stdout.splitlines()[77:135], sep="\n") # Print run 1
Dataset folder already exists in ..
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/LSQRMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
92 681.6 MiB 681.6 MiB 1 @profile
93 def set_up(self, initial, operator, data):
94 r'''Initialisation of the algorithm
95 Parameters
96 ------------
97 operator : Operator
98 Linear operator for the inverse problem
99 initial : (optional) DataContainer in the domain of the operator, default is a DataContainer filled with zeros.
100 Initial guess
101 data : DataContainer in the range of the operator
102 Acquired data to reconstruct
103
104 '''
105
106 681.6 MiB 0.0 MiB 1 log.info("%s setting up", self.__class__.__name__)
107 735.3 MiB 53.7 MiB 1 self.x = initial.copy() #1 domain
108 735.3 MiB 0.0 MiB 1 self.operator = operator
109
110 # Initialise Golub-Kahan bidiagonalisation (GKB)
111
112 #self.u = data - self.operator.direct(self.x)
113 758.2 MiB 22.9 MiB 1 self.u = self.operator.direct(self.x) #1 range
114 758.2 MiB 0.0 MiB 1 self.u.sapyb(-1, data, 1, out=self.u)
115 758.3 MiB 0.0 MiB 1 self.beta = self.u.norm()
116 758.4 MiB 0.1 MiB 1 self.u /= self.beta
117
118 813.2 MiB 54.9 MiB 1 self.v = self.operator.adjoint(self.u) #2 domain
119 813.2 MiB 0.0 MiB 1 self.alpha = self.v.norm()
120 813.2 MiB 0.0 MiB 1 self.v /= self.alpha
121
122 813.2 MiB 0.0 MiB 1 self.rhobar = self.alpha
123 813.2 MiB 0.0 MiB 1 self.phibar = self.beta
124 813.2 MiB 0.0 MiB 1 self.normr = self.beta
125 813.2 MiB 0.0 MiB 1 self.regalphasq = self.regalpha**2
126
127 866.9 MiB 53.7 MiB 1 self.d = self.v.copy() #3 domain
128 888.5 MiB 21.6 MiB 1 self.tmp_range = data.geometry.allocate(None) #2 range
129 888.5 MiB 0.0 MiB 1 self.tmp_domain = self.x.geometry.allocate(None) #4 domain
130
131 888.5 MiB 0.0 MiB 1 self.res2 = 0
132
133 888.5 MiB 0.0 MiB 1 self.configured = True
134 888.5 MiB 0.0 MiB 1 log.info("%s configured", self.__class__.__name__)
run 1
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/LSQRMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
136 888.7 MiB 888.7 MiB 1 @profile
137 def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs):
138 r"""run upto :code:`iterations` with callbacks/logging.
139
140 For a demonstration of callbacks see https://github.com/TomographicImaging/CIL-Demos/blob/main/misc/callback_demonstration.ipynb
141
142 Parameters
143 -----------
144 iterations: int, default is None
145 Number of iterations to run. If not set the algorithm will run until :code:`should_stop()` is reached
146 callbacks: list of callables, default is Defaults to :code:`[ProgressCallback(verbose)]`
147 List of callables which are passed the current Algorithm object each iteration. Defaults to :code:`[ProgressCallback(verbose)]`.
148 verbose: 0=quiet, 1=info, 2=debug
149 Passed to the default callback to determine the verbosity of the printed output.
150 """
151
152 888.7 MiB 0.0 MiB 1 if 'print_interval' in kwargs:
153 warnings.warn("use `TextProgressCallback(miniters)` instead of `run(print_interval)`",
154 DeprecationWarning, stacklevel=2)
155 888.7 MiB 0.0 MiB 1 if callbacks is None:
156 888.7 MiB 0.0 MiB 1 callbacks = [ProgressCallback(verbose=verbose)]
157
158 # transform old-style callbacks into new
159 888.7 MiB 0.0 MiB 1 callback = kwargs.get('callback', None)
160
161 888.7 MiB 0.0 MiB 1 if callback is not None:
162 callbacks.append(_OldCallback(callback, verbose=verbose))
163 888.7 MiB 0.0 MiB 1 if hasattr(self, '__log_file'):
164 callbacks.append(LogfileCallback(self.__log_file, verbose=verbose))
165
166 888.7 MiB 0.0 MiB 1 if self.should_stop():
167 print("Stop criterion has been reached.")
168 888.7 MiB 0.0 MiB 1 if iterations is None:
169 warnings.warn("`run()` missing `iterations`", DeprecationWarning, stacklevel=2)
170 iterations = self.max_iteration
171
172 888.7 MiB 0.0 MiB 1 if self.iteration == -1 and self.update_objective_interval>0:
173 888.7 MiB 0.0 MiB 1 iterations+=1
174
175 # call `__next__` upto `iterations` times or until `StopIteration` is raised
176 888.7 MiB 0.0 MiB 1 self.max_iteration = self.iteration + iterations
177
178 888.7 MiB 0.0 MiB 2 iters = (count(self.iteration) if numpy.isposinf(self.max_iteration)
179 888.7 MiB 0.0 MiB 1 else range(self.iteration, self.max_iteration))
180
181 942.7 MiB 53.9 MiB 3 for _ in zip(iters, self):
182 942.7 MiB 0.0 MiB 2 try:
183 942.7 MiB 0.0 MiB 4 for callback in callbacks:
184 942.7 MiB 0.1 MiB 2 callback(self)
185 except StopIteration:
186 break
We can see the increase of ~54 MB at the first iteration, caused by the line `self.operator.adjoint(self.u, out=self.tmp_domain)`; the same behaviour is observed in the non-block version of LSQR:
[49]:
print_output_subset(lsqr_tik_prof, 'run 1')
run 1
Start of run | Memory Usage: 888.03 MB | line:
16 | Memory Usage: 888.03 MB | line: callbacks = [ProgressCallback(verbose=verbose)]
17 | Memory Usage: 888.03 MB | line: callback = kwargs.get('callback', None)
18 | Memory Usage: 888.03 MB | line: callbacks.append(_OldCallback(callback, verbose=verbose))
19 | Memory Usage: 888.03 MB | line: iterations = self.max_iteration
20 | Memory Usage: 888.03 MB | line: iterations+=1
21 | Memory Usage: 888.05 MB | line: self.max_iteration = self.iteration + iterations
22 | Memory Usage: 888.05 MB | line: iters = (count(self.iteration) if numpy.isposinf(self.max_iteration) else range(self.iteration, self.max_iteration))
23 | Memory Usage: 888.05 MB | line: update(self)
24 | Memory Usage: 888.18 MB | line: callback(self)
Start of update | Memory Usage: 888.18 MB | line:
Memory Usage Log (Time, Memory in MB): 16:00:11.557260, 941.89 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.567545, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.577777, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.587952, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.598127, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.608298, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.618444, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.628593, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.638785, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.649060, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.659228, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.669382, 942.11 MB
25 | Memory Usage: 888.18 MB | line: self.operator.direct(self.v, out=self.tmp_range)
Memory Usage Log (Time, Memory in MB): 16:00:11.681125, 888.18 MB
26 | Memory Usage: 888.18 MB | line: self.tmp_range.sapyb(1., self.u,-self.alpha, out=self.u)
27 | Memory Usage: 888.17 MB | line: self.beta = self.u.norm()
Memory Usage Log (Time, Memory in MB): 16:00:11.691371, 888.17 MB
28 | Memory Usage: 888.17 MB | line: self.u /= self.beta
Memory Usage Log (Time, Memory in MB): 16:00:11.718584, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.728793, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.738955, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.749109, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.759262, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.769433, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.779634, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.789839, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.800049, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.810232, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.820399, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.830621, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.840947, 899.95 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.851269, 923.95 MB
29 | Memory Usage: 942.11 MB | line: self.operator.adjoint(self.u, out=self.tmp_domain)
Memory Usage Log (Time, Memory in MB): 16:00:11.861507, 942.11 MB
30 | Memory Usage: 942.11 MB | line: self.v.sapyb(-self.beta, self.tmp_domain, 1., out=self.v)
Memory Usage Log (Time, Memory in MB): 16:00:11.871839, 942.11 MB
31 | Memory Usage: 942.11 MB | line: self.alpha = self.v.norm()
32 | Memory Usage: 942.11 MB | line: self.v /= self.alpha
33 | Memory Usage: 942.11 MB | line: self.phibar = c1 * self.phibar
34 | Memory Usage: 942.11 MB | line: self.phibar = s * self.phibar
Memory Usage Log (Time, Memory in MB): 16:00:11.882094, 942.11 MB
35 | Memory Usage: 942.10 MB | line: self.x.sapyb(1, self.d, phi/rho, out=self.x)
Memory Usage Log (Time, Memory in MB): 16:00:11.892373, 942.10 MB
36 | Memory Usage: 942.09 MB | line: self.d.sapyb(-theta/rho, self.v, 1, out=self.d)
End of update | Memory Usage: 942.09 MB | line: self.normr = math.sqrt(self.phibar ** 2 + self.res2)
23 | Memory Usage: 942.09 MB | line: update(self)
24 | Memory Usage: 942.09 MB | line: callback(self)
End of run | Memory Usage: 942.09 MB | line:
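The jump from ~888 MB to ~942 MB occurs inside the adjoint call even though an `out=` container is supplied, which suggests the back-projector allocates a transient buffer internally. A minimal, hypothetical check of a single call (assuming `alg` is a configured LSQR Tikhonov instance; this name is not defined in the notebook above) could wrap the call with memory_profiler's `memory_usage`:
[ ]:
from memory_profiler import memory_usage

# `alg` is a hypothetical, already-configured algorithm instance.
baseline = memory_usage(-1, max_usage=True)  # single sample of current process memory
peak = memory_usage((alg.operator.adjoint, (alg.u,), {"out": alg.tmp_domain}),
                    max_usage=True)          # peak memory while the adjoint runs
print(f"Transient increase during adjoint: {peak - baseline:.1f} MiB")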
[50]:
# Highlight the large set_up allocations (turquoise) and the ~54 MiB
# first-iteration peak (orange) in the 'Total Memory Usage (MiB)' column
lsqr_tik_lp = line_peak_process(lsqr_tik_prof, norm=True)
lsqr_tik_lp.iloc[:12].style.map(lambda val: 'background-color: turquoise' if val > 204 else '', subset='Total Memory Usage (MiB)')\
.map(lambda val: 'background-color: orange' if 58 > val > 50 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[50]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 207.10 | 207.10 | 207.10 | 207.10 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 207.10 | 207.10 | 207.10 | 207.10 | self.tmp_range = data.geometry.allocate(None) |
| 2 | run 1 | 207.10 | 261.18 | 261.16 | 54.06 | 54.08 | update(self) |
| 3 | update 1 | 207.25 | 261.18 | 261.16 | 53.91 | 53.93 | self.operator.direct(self.v, out=self.tmp_range) |
| 4 | run 2 | 261.16 | 315.17 | 261.19 | 0.03 | 54.01 | update(self) |
| 5 | update 2 | 261.16 | 315.17 | 261.19 | 0.03 | 54.01 | self.operator.direct(self.v, out=self.tmp_range) |
| 6 | run 3 | 261.19 | 315.19 | 261.23 | 0.04 | 54.00 | update(self) |
| 7 | update 3 | 261.19 | 315.19 | 261.23 | 0.04 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
| 8 | run 4 | 261.23 | 315.23 | 261.27 | 0.04 | 54.00 | update(self) |
| 9 | update 4 | 261.23 | 315.23 | 261.27 | 0.04 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
| 10 | run 5 | 261.27 | 315.27 | 261.30 | 0.03 | 54.00 | update(self) |
| 11 | update 5 | 261.27 | 315.27 | 261.30 | 0.03 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
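Before plotting, the end-of-run footprints of the three algorithms can also be compared directly from the dataframes (a sketch assuming each `*_lp` dataframe carries the 'Final Memory Usage (MiB)' column shown above):
[ ]:
# Compare the steady-state (end of last run) footprint of each algorithm
for name, df in (("CGLS Block", cgls_block_lp),
                 ("LSQR Block", lsqr_block_lp),
                 ("LSQR Tik", lsqr_tik_lp)):
    print(f"{name}: final footprint {df['Final Memory Usage (MiB)'].iloc[-1]:.1f} MiB")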
[51]:
plot_mem(dfs=(cgls_block_lp, lsqr_block_lp, lsqr_tik_lp), algNms=("CGLS Block", "LSQR Block", "LSQR Tik"), linestyles=("-","-","-"))
Figure 7: This graph shows the memory usage before (initial), during (peak) and after (final) the `init` and `run` methods in CGLS Block (blue), LSQR Block (orange) and LSQR Tikhonov (green). The first two points show the memory increases during `init`, which remain stable. In `run 1`, CGLS Block and LSQR Block show a peak which then drops down slightly, leaving a net increase in memory usage at the end of `run 1`. LSQR Tikhonov shows a net increase in memory usage in `run 1` too, but without a peak. The spikes during each subsequent run show that the memory usage reaches a peak and then drops back down, resulting in a net increase of ~0 MB; this is consistent across all `run` calls (iterations). Amongst these three algorithms, LSQR Block uses the most memory, and LSQR Tikhonov uses the least.
Conclusion#
In this notebook, we showed:
LSQR and CGLS produce very similar residuals and perform similarly with respect to iteration time
The simple (non-block) version of CGLS uses less memory than LSQR
The block versions of LSQR and CGLS, and LSQR Tikhonov, produce very similar residuals and show similar iteration times
LSQR Tikhonov uses the same amount of memory as regular LSQR
LSQR Tikhonov has the benefit of a regularisation term, which reduces the likelihood of overfitting to the noise (see the note below)
The profiling data tells us where the memory usage comes from, and where it could be optimised
There may be some room to optimise peak memory usage, e.g. in the `ProjectionOperator` code
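Note: the problem LSQR Tikhonov solves is commonly written (with regularisation parameter \(\alpha\), matching `regalpha`/`regalphasq` in the `set_up` output above) as

$$\min_x \;\|Ax - b\|_2^2 + \alpha^2 \|x\|_2^2 \;=\; \min_x \left\|\begin{pmatrix} A \\ \alpha I \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix}\right\|_2^2 .$$

Because the damping term enters only through scalars in the Golub-Kahan bidiagonalisation, the stacked operator on the right never has to be materialised as a BlockOperator, which is consistent with the lower memory usage observed here for LSQR Tikhonov compared with the block formulations.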