[1]:
# -*- coding: utf-8 -*-
# Copyright 2025 United Kingdom Research and Innovation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authored by: Mariam Demir (STFC-UKRI)
# Maike Meier (STFC-UKRI)
Memory and Performance Profiling CIL Algorithms: CGLS vs. LSQR#
In this notebook we will:
- Introduce the LSQR algorithm, and the LSQR Tikhonov variant
- Compare the reconstructions obtained from LSQR to CGLS, and to a ground truth (FBP reconstruction)
- Compare LSQR’s running times across iterations with CGLS
- Benchmark and validate two memory profiling techniques, using CGLS as an example
- Use the memory profiling techniques to investigate where the LSQR algorithm increases the memory
- Compare the memory usage of LSQR to CGLS
- Compare and investigate the memory usage of CGLS and LSQR with block operators, and of LSQR Tikhonov
This notebook was tested and developed with the following package versions:

| Package | Version |
|---------|---------|
| CIL (Core Imaging Library) | 24.2.0 |
| NumPy | 1.26.4 |
| Matplotlib | 3.10.1 |
| memory-profiler | 0.61.0 |
| astra-toolbox | 2.1.0 |
| scipy | 1.15.2 |
| ipp | 2021.12.1 |
| jinja2 | 3.1.6 |
Create the environment using the command:
conda create --name cil -c conda-forge -c https://software.repos.intel.com/python/conda -c ccpi cil=24.2.0 ipp=2021.12 astra-toolbox=*=cuda* ipykernel ipywidgets scipy Jinja2
And install the following packages using pip:
pip install -U memory_profiler
[ ]:
import sys
sys.path.append("./Algorithms")
sys.path.append("./Scripts")
from LSQR import *
from LSQR_Tikhonov import *
from LSQRLP import *
from LSQRMP import *
from CGLSLP import *
from CGLSMP import *
# CIL core components needed
from cil.processors import Normaliser, TransmissionAbsorptionConverter, Padder, CentreOfRotationCorrector
from cil.framework import AcquisitionGeometry, AcquisitionData, BlockDataContainer
# CIL optimisation algorithms and linear operators
from cil.optimisation.algorithms import CGLS
from cil.optimisation.operators import BlockOperator, IdentityOperator
# CIL example synthetic test image
from cil.utilities import dataexample
# CIL display tools
from cil.utilities.display import show2D
# CIL FBP
from cil.recon import FBP
# Forward/backprojector from CIL ASTRA plugin
from cil.plugins.astra import ProjectionOperator
# Utility functions for this dataset & notebook
from Utilities import timed_iterations, mem_prof_process, line_prof_process, line_peak_process, plot_mem, print_output_subset
from IPython.display import display, HTML
# Third-party imports
import numpy as np
import scipy
import matplotlib.pyplot as plt
import os
MB = 1/(1024*1024)  # bytes -> megabytes (binary, i.e. MiB) conversion factor
dtypeSize = 4       # float32 uses 4 bytes per element
Loading The Sandstone Dataset#
The dataset used in this notebook is a 2D Tomogram of sandstone, containing:
- 1500 projections
- A single vertical slice
- 2560 horizontal pixels (which will later be padded to 3760 pixels)
From this information, we can calculate the estimated sizes of the AcquisitionData, and the ImageData (a.k.a Reconstruction Volume). These values will be important when looking at memory usage:
- AcquisitionData size: (1500 * 3760 * dtypeSize) = 21.5 MB
- ImageData size: (3760 * 3760 * dtypeSize) = 53.9 MB
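As a quick check, here is the same arithmetic in plain Python (a minimal sketch; the notebook later prints these values from the actual data containers):
[ ]:
# Rough size estimates for float32 (4 bytes per element) arrays, in MiB
dtype_size = 4
to_mib = 1 / (1024 * 1024)
acq_mib = 1500 * 3760 * dtype_size * to_mib   # projections x padded pixels
img_mib = 3760 * 3760 * dtype_size * to_mib   # reconstruction volume
print(f"AcquisitionData: {acq_mib:.1f} MiB, ImageData: {img_mib:.1f} MiB")
# AcquisitionData: 21.5 MiB, ImageData: 53.9 MiB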
We will download the data using the method dataexample.SANDSTONE.download_data(). Then, we pre-process and reconstruct the data using Filtered Back Projection to get a ground truth. Finally, we apply noise to the dataset sinogram, which will be used when running the algorithms.
[3]:
dataexample.SANDSTONE.download_data(data_dir='.', prompt=False)
Dataset folder already exists in .
[4]:
datapath = './sandstone'
filename = "slice_0270_data.mat"
all_data = scipy.io.loadmat(os.path.join(datapath,filename))
[5]:
sandstone = all_data['X_proj'].astype(np.float32)
sandstone.shape
[5]:
(2560, 1500)
[6]:
ag = AcquisitionGeometry.create_Parallel2D() \
.set_panel(num_pixels=(2560)) \
.set_angles(angles=np.linspace(0,180,1500,endpoint=False)) \
.set_labels(['horizontal','angle'])
print(ag.dimension_labels)
sandstone = AcquisitionData(sandstone, geometry=ag, deep_copy=False)
sandstone.reorder('astra')
show2D(sandstone)
(<AcquisitionDimension.HORIZONTAL: 'horizontal'>, <AcquisitionDimension.ANGLE: 'angle'>)
[6]:
<cil.utilities.display.show2D at 0x7f3d02c7dc40>
Ground Truth - FBP Reconstruction of pre-processed sandstone#
Pre-processing:#
[7]:
flats = all_data['X_flat'].astype(np.float32)
flats.shape
darks = all_data['X_dark'].astype(np.float32)
darks.shape
[7]:
(2560, 30)
[8]:
sandstone_norm = Normaliser(flat_field=flats.mean(axis=1),
dark_field=darks.mean(axis=1))(sandstone)
sandstone_norm = TransmissionAbsorptionConverter()(sandstone_norm)
[9]:
show2D([sandstone, sandstone_norm], title=['Sandstone', 'Sandstone Normalised'])
[9]:
<cil.utilities.display.show2D at 0x7f3ce5ae2d50>
[10]:
padsize = 600
sandstone_pad = Padder.edge(pad_width={'horizontal': padsize})(sandstone_norm)
[11]:
show2D(sandstone_pad)
[11]:
<cil.utilities.display.show2D at 0x7f3d08465310>
[12]:
sandstone_cor = CentreOfRotationCorrector.image_sharpness(backend='astra', search_range=100, tolerance=0.1)(sandstone_pad)
sandstone_cor.geometry.get_centre_of_rotation(distance_units='pixels')
[12]:
{'offset': (44.48700388605622, 'pixels'), 'angle': (0.0, 'radian')}
Obtain ground truth reconstruction:#
[13]:
ig = sandstone_norm.geometry.get_ImageGeometry()
recon = FBP(sandstone_cor, ig, backend='astra').run()
FBP recon
Input Data:
angle: 1500
horizontal: 3760
Reconstruction Volume:
horizontal_y: 2560
horizontal_x: 2560
Reconstruction Options:
Backend: astra
Filter: ram-lak
Filter cut-off frequency: 1.0
FFT order: 13
Filter_inplace: False
Split processing: 0
Reconstructing in 1 chunk(s):
[14]:
show2D(recon, title='Sandstone Reconstruction (Ground Truth)')
[14]:
<cil.utilities.display.show2D at 0x7f3cdfea6b10>
Generating Test Sinogram - Apply Noise:#
[15]:
# Incident intensity: lower counts will increase the noise
background_counts = 10000
# Convert the simulated absorption sinogram to transmission values using Lambert-Beer.
# Use sandstone_cor as mean for Poisson data generation.
# Convert back to absorption sinogram.
counts = background_counts * np.exp(-sandstone_cor.as_array())
noisy_counts = np.random.poisson(counts)
sand_noisy_data = -np.log(noisy_counts/background_counts)
# Create new AcquisitionData object with same geometry as sandstone_cor and fill with noisy data.
sandstone_noisy = sandstone_cor.geometry.allocate()
sandstone_noisy.fill(sand_noisy_data)
plots = [sandstone_cor, sandstone_noisy]
titles = ["sandstone sinogram", "sandstone sinogram noisy"]
show2D(plots, titles)
[15]:
<cil.utilities.display.show2D at 0x7f3cdffba030>
CGLS and LSQR without regularisation#
Before describing Tikhonov regularisation, we recall the least squares problem solved by LSQR:
\[\min_u \begin{Vmatrix}A u - b\end{Vmatrix}^2_2\]
where:
- \(A\) is the projection operator
- \(b\) is the acquired data
- \(u\) is the unknown image to be solved for, found iteratively by progressively minimising the objective function
In the solution provided by LSQR, the low-frequency components tend to converge faster than the high-frequency components, so the number of iterations must be controlled carefully to select the optimal solution (so-called semi-convergence).
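As a minimal sketch of one way to handle this semi-convergence, run one iteration at a time and keep the iterate with the lowest error (alg and gt are hypothetical names for a configured algorithm and a ground-truth image):
[ ]:
# Illustrative only: track the best iterate as the algorithm semi-converges.
best_err, best_x = float("inf"), None
for k in range(100):
    alg.run(1, verbose=0)                            # one iteration at a time
    err = (alg.solution - gt).norm() / gt.norm()     # relative error vs. ground truth
    if err < best_err:
        best_err, best_x = err, alg.solution.copy()  # keep the best iterate so far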
Set up the LSQR algorithm, including specifying its initial point to start from:
[16]:
sandstone_noisy.reorder('astra')
ig = sandstone_noisy.geometry.get_ImageGeometry()
ag = sandstone_noisy.geometry # ig and ag must be consistent with each other
device = "gpu"
A = ProjectionOperator(ig, ag, device)
initial = ig.allocate(0)
maxit = 100
itsAtATime = 1
N = round(maxit/itsAtATime)
xx = np.arange(0, maxit, itsAtATime)
The timed_iterations function (imported from Utilities) will be used to run the algorithms in increments, so we can store the residuals, errors and time taken per iteration:
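The essence of this helper might look like the following minimal sketch (the actual timed_iterations also computes the relative residuals and errors; timed_run is an illustrative name):
[ ]:
import time

def timed_run(alg, n_chunks, its_at_a_time):
    """Run `alg` in chunks, recording the cumulative wall-clock time."""
    times = []
    t0 = time.time()
    for _ in range(n_chunks):
        alg.run(its_at_a_time, verbose=0)
        times.append(time.time() - t0)
    return times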
Once set up, we can initialise and run the algorithms for the specified number of iterations:
[17]:
cgls_simple = CGLS(initial=initial, operator=A, data=sandstone_noisy)
times_cgls, rel_res_vec_cgls, rel_err_vec_cgls = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, cgls_simple, N, itsAtATime)
lsqr_simple = LSQR(initial=initial, operator=A, data=sandstone_noisy)
times_lsqr, rel_res_vec_lsqr, rel_err_vec_lsqr = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, lsqr_simple, N, itsAtATime)
Convergence/reduction of residuals:#
The residual norm
\[\begin{Vmatrix}A u - b\end{Vmatrix}^2_2\]
describes how well the algorithm’s reconstruction matches the data. LSQR is mathematically equivalent to the conjugate gradient least squares method CGLS, so it produces the same residuals; however, LSQR is generally more reliable in the presence of numerical errors.
The (relative) error describes how well the reconstruction matches the ground truth, which is the FBP reconstruction we computed earlier.
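For reference, the relative quantities plotted below are presumably of the following form (the exact normalisation is an assumption here, implemented inside timed_iterations):
\[\text{relative residual} = \frac{\begin{Vmatrix}A u_k - b\end{Vmatrix}_2}{\begin{Vmatrix}b\end{Vmatrix}_2}, \qquad \text{relative error} = \frac{\begin{Vmatrix}u_k - u_{FBP}\end{Vmatrix}_2}{\begin{Vmatrix}u_{FBP}\end{Vmatrix}_2}\]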
[18]:
plt.subplot(1,2,1)
plt.semilogy(xx, rel_res_vec_cgls, 'r-', xx, rel_res_vec_lsqr, 'b.--')
plt.xlabel('Iteration')
plt.ylabel('Residual')
plt.gca().legend(('CGLS','LSQR'))
plt.subplot(1,2,2)
plt.semilogy(xx, rel_err_vec_cgls, 'r-', xx, rel_err_vec_lsqr, 'b.--')
plt.xlabel('Iteration')
plt.ylabel('Error')
plt.gca().legend(('CGLS','LSQR'))
plt.tight_layout()
Figure 1: (Left) The progress of the residuals across 100 iterations. CGLS and LSQR converge to the same residuals, as their iterations are mathematically equivalent. The residuals decrease across the iterations, meaning the reconstructions match the data increasingly closely. (Right) The relative error across 100 iterations. CGLS and LSQR display identical errors, again because the iterations are mathematically equivalent. The reconstructions match the FBP reconstruction (ground truth) most closely at around iteration 20.
We can also plot and compare the time taken/elapsed across each iteration:
[ ]:
plt.plot(xx, times_cgls, 'r-', xx, times_lsqr, 'b.--')
plt.xlabel('Iteration')
plt.ylabel('Time (s)')
plt.gca().legend(('CGLS','LSQR'))
<matplotlib.legend.Legend at 0x7f3d023669f0>
Figure 2: This graph shows the time elapsed at each iteration for CGLS and LSQR. For both algorithms, this relationship is linear. On this device, the elapsed times across 100 iterations are very similar.
Memory Profiling#
CGLS_MP uses the memory_profiler module to profile the __init__, set_up and run methods. CGLS_LP uses psutil to manually track the memory usage after each line in these methods; this version also tracks the memory usage in a separate thread, which reveals the peak usage within each method. The same was done for LSQR. Below, we will run the memory-sandstone.py script using the modified files, and analyse the outputs:
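As a minimal, self-contained illustration of the two techniques (not the actual CGLSMP/CGLSLP code; allocate and rss_mib are illustrative names):
[ ]:
import os
import psutil
from memory_profiler import profile

@profile(precision=2)          # memory_profiler: prints a line-by-line report
def allocate():
    return [0.0] * 10_000_000  # a list of 10 million references (~80 MB)

def rss_mib():                 # psutil: manual resident-set-size sampling
    return psutil.Process(os.getpid()).memory_info().rss / 1024**2

before = rss_mib()
x = allocate()
print(f"RSS increase: {rss_mib() - before:.2f} MiB")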
Validating Profiling Tools:#
[20]:
%%capture cgls_mem_prof
!python3 Scripts/memory-sandstone.py cgls_mp
[21]:
%%capture cgls_line_prof
!python3 Scripts/memory-sandstone.py cgls_lp --track-peak
First, we can view the line-by-line memory increases from the memory_profiler output:
[22]:
print(*cgls_mem_prof.stdout.splitlines()[:117], sep="\n") # Print set_up, __init__ & run 1
Dataset folder already exists in ..
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/CGLSMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
81 681.71 MiB 681.71 MiB 1 @profile(precision=2)
82 def set_up(self, initial, operator, data):
83 r'''Initialisation of the algorithm
84 Parameters
85 ------------
86 operator : Operator
87 Linear operator for the inverse problem
88 initial : (optional) DataContainer in the domain of the operator, default is a DataContainer filled with zeros.
89 Initial guess
90 data : DataContainer in the range of the operator
91 Acquired data to reconstruct
92
93 '''
94 681.71 MiB 0.00 MiB 1 log.info("%s setting up", self.__class__.__name__)
95 735.62 MiB 53.91 MiB 1 self.x = initial.copy()
96 735.62 MiB 0.00 MiB 1 self.operator = operator
97
98 758.32 MiB 22.71 MiB 1 self.r = data - self.operator.direct(self.x)
99 813.20 MiB 54.88 MiB 1 self.s = self.operator.adjoint(self.r)
100
101 867.13 MiB 53.93 MiB 1 self.p = self.s.copy()
102 888.44 MiB 21.31 MiB 1 self.q = self.operator.range_geometry().allocate()
103
104 888.66 MiB 0.21 MiB 1 self.norms0 = self.s.norm()
105 888.66 MiB 0.00 MiB 1 self.norms = self.s.norm()
106
107 888.66 MiB 0.00 MiB 1 self.gamma = self.norms0**2
108 888.72 MiB 0.06 MiB 1 self.normx = self.x.norm()
109
110 888.72 MiB 0.00 MiB 1 self.configured = True
111 888.72 MiB 0.00 MiB 1 log.info("%s configured", self.__class__.__name__)
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/CGLSMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
63 681.71 MiB 681.71 MiB 1 @profile(precision=2)
64 def __init__(self, initial=None, operator=None, data=None, **kwargs):
65 '''initialisation of the algorithm
66 '''
67 #We are deprecating tolerance
68 681.71 MiB 0.00 MiB 1 self.tolerance=kwargs.pop("tolerance", None)
69 681.71 MiB 0.00 MiB 1 if self.tolerance is not None:
70 warnings.warn( stacklevel=2, category=DeprecationWarning, message="Passing tolerance directly to CGLS is being deprecated. Instead we recommend using the callback functionality: https://tomographicimaging.github.io/CIL/nightly/optimisation/#callbacks and in particular the CGLSEarlyStopping callback replicated the old behaviour")
71 else:
72 681.71 MiB 0.00 MiB 1 self.tolerance = 0
73
74 681.71 MiB 0.00 MiB 1 super(CGLS_MP, self).__init__(**kwargs)
75
76 681.71 MiB 0.00 MiB 1 if initial is None and operator is not None:
77 initial = operator.domain_geometry().allocate(0)
78 681.71 MiB 0.00 MiB 1 if initial is not None and operator is not None and data is not None:
79 888.72 MiB 207.01 MiB 1 self.set_up(initial=initial, operator=operator, data=data)
run 1
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/CGLSMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
113 888.72 MiB 888.72 MiB 1 @profile(precision=2)
114 def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs):
115 r"""run upto :code:`iterations` with callbacks/logging.
116
117 For a demonstration of callbacks see https://github.com/TomographicImaging/CIL-Demos/blob/main/misc/callback_demonstration.ipynb
118
119 Parameters
120 -----------
121 iterations: int, default is None
122 Number of iterations to run. If not set the algorithm will run until :code:`should_stop()` is reached
123 callbacks: list of callables, default is Defaults to :code:`[ProgressCallback(verbose)]`
124 List of callables which are passed the current Algorithm object each iteration. Defaults to :code:`[ProgressCallback(verbose)]`.
125 verbose: 0=quiet, 1=info, 2=debug
126 Passed to the default callback to determine the verbosity of the printed output.
127 """
128
129 888.72 MiB 0.00 MiB 1 if 'print_interval' in kwargs:
130 warnings.warn("use `TextProgressCallback(miniters)` instead of `run(print_interval)`",
131 DeprecationWarning, stacklevel=2)
132 888.72 MiB 0.00 MiB 1 if callbacks is None:
133 888.72 MiB 0.00 MiB 1 callbacks = [ProgressCallback(verbose=verbose)]
134
135 # transform old-style callbacks into new
136 888.72 MiB 0.00 MiB 1 callback = kwargs.get('callback', None)
137
138 888.72 MiB 0.00 MiB 1 if callback is not None:
139 callbacks.append(_OldCallback(callback, verbose=verbose))
140 888.72 MiB 0.00 MiB 1 if hasattr(self, '__log_file'):
141 callbacks.append(LogfileCallback(self.__log_file, verbose=verbose))
142
143 888.72 MiB 0.00 MiB 1 if self.should_stop():
144 print("Stop criterion has been reached.")
145 888.72 MiB 0.00 MiB 1 if iterations is None:
146 warnings.warn("`run()` missing `iterations`", DeprecationWarning, stacklevel=2)
147 iterations = self.max_iteration
148
149 888.72 MiB 0.00 MiB 1 if self.iteration == -1 and self.update_objective_interval>0:
150 888.72 MiB 0.00 MiB 1 iterations+=1
151
152 # call `__next__` upto `iterations` times or until `StopIteration` is raised
153 888.72 MiB 0.00 MiB 1 self.max_iteration = self.iteration + iterations
154
155 888.72 MiB 0.00 MiB 2 iters = (count(self.iteration) if numpy.isposinf(self.max_iteration)
156 888.72 MiB 0.00 MiB 1 else range(self.iteration, self.max_iteration))
157
158 888.92 MiB 0.06 MiB 3 for _ in zip(iters, self):
159 888.92 MiB 0.00 MiB 2 try:
160 888.92 MiB 0.00 MiB 4 for callback in callbacks:
161 888.92 MiB 0.14 MiB 2 callback(self)
162 except StopIteration:
163 break
[23]:
# ImageData and AcquisitionData Sizes
print(f"Estimate of 'initial' (ImageData) size: {initial.size*dtypeSize*MB:.1f} MB")
print(f"Estimate of 'sandstone_noisy' (AcquisitionData) size: {sandstone_noisy.size*dtypeSize*MB:.1f} MB")
Estimate of 'initial' (ImageData) size: 53.9 MB
Estimate of 'sandstone_noisy' (AcquisitionData) size: 21.5 MB
In the set_up method, we can see:
- 3 increases of ~53.9 MB, corresponding to the size of the ImageData space
- 2 increases of ~21.5 MB, corresponding to the AcquisitionData space
- totalling an increase of (3 × 53.9) + (2 × 21.5) ≈ 205 MB
In the __init__ method, the same increase is seen as a result of the call to set_up (Highlighted in turquoise). The run calls add very little on top, since the per-iteration work happens in the update method. For easier visualisation of the initial and final usages per method, we process this output and display it as a dataframe. The first usage reading was ~653 MB, which is the memory allocated when loading and processing the dataset. To show the memory usage of the CGLS algorithm alone, the dataframe has been normalised by subtracting this prior memory usage from the results:
[24]:
cgls_mp = mem_prof_process(cgls_mem_prof, norm=True)
cgls_mp.style.map(lambda val: 'background-color: turquoise' if val > 200 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[24]:
| | Method | Method Call | Initial Memory Usage (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) |
|---|---|---|---|---|---|
| 0 | __init__ | def __init__(self, initial=None, operator=None, data=None, **kwargs): | 0.00 | 207.01 | 207.01 |
| 1 | set_up | def set_up(self, initial, operator, data): | 0.00 | 207.01 | 207.01 |
| 2 | run 1 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.01 | 207.21 | 0.20 |
| 3 | run 2 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.21 | 207.30 | 0.09 |
| 4 | run 3 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.30 | 207.33 | 0.03 |
| 5 | run 4 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.33 | 207.36 | 0.03 |
| 6 | run 5 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.36 | 207.39 | 0.03 |
| 7 | run 6 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.39 | 207.41 | 0.02 |
| 8 | run 7 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.41 | 207.49 | 0.08 |
| 9 | run 8 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.49 | 207.53 | 0.04 |
| 10 | run 9 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.53 | 207.56 | 0.03 |
| 11 | run 10 | def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs): | 207.56 | 207.59 | 0.03 |
Let’s compare these values to the line-by-line profiling results:
[25]:
cgls_lp = line_prof_process(cgls_line_prof, norm=True)
cgls_lp.style.map(lambda val: 'background-color: turquoise' if val > 200 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[25]:
| | Method | Initial Memory Usage (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) |
|---|---|---|---|---|
| 0 | init | 0.00 | 206.93 | 206.93 |
| 1 | setup | 0.00 | 206.93 | 206.93 |
| 2 | run 1 | 206.93 | 207.19 | 0.26 |
| 3 | run 2 | 207.19 | 207.28 | 0.09 |
| 4 | run 3 | 207.28 | 207.31 | 0.03 |
| 5 | run 4 | 207.31 | 207.34 | 0.03 |
| 6 | run 5 | 207.34 | 207.37 | 0.03 |
| 7 | run 6 | 207.37 | 207.40 | 0.03 |
| 8 | run 7 | 207.40 | 207.43 | 0.03 |
| 9 | run 8 | 207.43 | 207.49 | 0.06 |
| 10 | run 9 | 207.49 | 207.49 | 0.00 |
| 11 | run 10 | 207.49 | 207.51 | 0.02 |
As we can see, the major memory changes corresponding to set_up and __init__ are the same, with only minor variations in the memory usage of the run calls.
Peak Memory Usages:#
We have validated the memory usage tracking using two techniques. For the remainder of the notebook, we will be using output snippets from both to illustrate memory changes.
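As a minimal sketch of how such peak tracking on a separate thread can work (illustrative only; track_peak is a hypothetical name, not the actual *_LP implementation):
[ ]:
import os
import threading
import time
import psutil

def track_peak(stop_event, out, interval=0.01):
    """Sample RSS on a background thread and record the maximum seen."""
    proc = psutil.Process(os.getpid())
    out["peak"] = 0.0
    while not stop_event.is_set():
        out["peak"] = max(out["peak"], proc.memory_info().rss / 1024**2)
        time.sleep(interval)

stop, result = threading.Event(), {}
t = threading.Thread(target=track_peak, args=(stop, result))
t.start()
work = [0.0] * 10_000_000      # stand-in for the code being profiled
stop.set()
t.join()
print(f"Peak RSS: {result['peak']:.2f} MiB")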
With the line-by-line profiling technique, we can also take a look at the peak memory usages that were measured on a separate thread, and the corresponding calls:
[26]:
cgls_lp = line_peak_process(cgls_line_prof, norm=True)
cgls_lp.style.apply(
lambda x: ['background-color: pink' if (58 > val > 50) and (i % 2 == 1) else '' for i, val in enumerate(x)],
subset='Peak Memory Increase (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[26]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 206.93 | 206.93 | 206.93 | 206.93 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 206.93 | 206.93 | 206.93 | 206.93 | self.norms0 = self.s.norm() |
| 2 | run 1 | 206.93 | 261.08 | 207.19 | 0.26 | 54.15 | update(self) |
| 3 | update 1 | 207.13 | 261.08 | 207.19 | 0.06 | 53.95 | self.operator.direct(self.p, out=self.q) |
| 4 | run 2 | 207.19 | 261.24 | 207.28 | 0.09 | 54.05 | update(self) |
| 5 | update 2 | 207.19 | 261.24 | 207.28 | 0.09 | 54.05 | self.operator.direct(self.p, out=self.q) |
| 6 | run 3 | 207.28 | 261.27 | 207.31 | 0.03 | 53.99 | update(self) |
| 7 | update 3 | 207.28 | 261.27 | 207.31 | 0.03 | 53.99 | self.operator.direct(self.p, out=self.q) |
| 8 | run 4 | 207.31 | 261.30 | 207.34 | 0.03 | 53.99 | update(self) |
| 9 | update 4 | 207.31 | 261.30 | 207.34 | 0.03 | 53.99 | self.operator.direct(self.p, out=self.q) |
| 10 | run 5 | 207.34 | 261.33 | 207.37 | 0.03 | 53.99 | update(self) |
| 11 | update 5 | 207.34 | 261.33 | 207.37 | 0.03 | 53.99 | self.operator.direct(self.p, out=self.q) |
| 12 | run 6 | 207.37 | 261.36 | 207.40 | 0.03 | 53.99 | update(self) |
| 13 | update 6 | 207.37 | 261.36 | 207.40 | 0.03 | 53.99 | self.operator.direct(self.p, out=self.q) |
| 14 | run 7 | 207.40 | 261.38 | 207.43 | 0.03 | 53.98 | update(self) |
| 15 | update 7 | 207.40 | 261.38 | 207.43 | 0.03 | 53.98 | self.operator.direct(self.p, out=self.q) |
| 16 | run 8 | 207.43 | 261.41 | 207.49 | 0.06 | 53.98 | update(self) |
| 17 | update 8 | 207.43 | 261.41 | 207.49 | 0.06 | 53.98 | self.operator.direct(self.p, out=self.q) |
| 18 | run 9 | 207.49 | 261.44 | 207.49 | 0.00 | 53.95 | update(self) |
| 19 | update 9 | 207.49 | 261.44 | 207.49 | 0.00 | 53.95 | self.operator.direct(self.p, out=self.q) |
| 20 | run 10 | 207.49 | 261.47 | 207.51 | 0.02 | 53.98 | update(self) |
| 21 | update 10 | 207.49 | 261.47 | 207.51 | 0.02 | 53.98 | self.operator.direct(self.p, out=self.q) |
This table shows that CGLS’s set_up and __init__ calls do not have a spike in memory that exceeds the final usage. However, during the runs, the update(self) call produces peak increases around the size of the ImageData space. This memory usage is transient, and is released by the time the call has finished.
Looking at the Peak Line entries of the update rows (Highlighted in pink), we can see the biggest peak is attributed to the line self.operator.direct(self.p, out=self.q), which creates a copy of the ImageData space during its execution and releases the memory once it has run.
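Such transient allocations can also be measured directly with memory_profiler’s memory_usage helper; a minimal sketch (transient is an illustrative stand-in for the profiled call):
[ ]:
import numpy as np
from memory_profiler import memory_usage

def transient():
    tmp = np.ones((3760, 3760), dtype=np.float32)  # ~54 MiB of scratch space
    return float(tmp.sum())                        # tmp is freed on return

peak = memory_usage(transient, max_usage=True)     # peak RSS (MiB) during the call
print(f"Peak during the call: {peak:.1f} MiB")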
Below, we visualise this by plotting the total memory increases and the peak memory increases as sequential line graphs:
[27]:
plot_mem((cgls_lp,), ("CGLS",))
Figure 3: This graph shows the memory usages before (initial), during (peak) and after (final) the ‘init’ and ‘run’ methods in CGLS. The first two points show the ~205 MB memory increase during ‘init’, which remains stable. The spikes during each subsequent run show that the memory usage reaches a peak, and then drops back down, resulting in a net increase of ~0 MB. This is consistent across all ‘run’ calls (iterations).
Comparison With LSQR:#
Let’s compare this to LSQR’s memory usage:
[28]:
%%capture lsqr_line_prof
!python3 Scripts/memory-sandstone.py lsqr_lp --track-peak
[29]:
lsqr_lp = line_peak_process(lsqr_line_prof, norm=True)
lsqr_lp.style.apply(
lambda x: ['background-color: pink' if (58 > val > 50) and (i % 2 == 1) else '' for i, val in enumerate(x)],
subset='Peak Memory Increase (MiB)')\
.map(lambda val: 'background-color: turquoise' if val > 200 else '', subset='Total Memory Usage (MiB)')\
.map(lambda val: 'background-color: orange' if 58 > val > 50 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[29]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 207.02 | 207.02 | 207.02 | 207.02 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 207.02 | 207.02 | 207.02 | 207.02 | self.tmp_range = data.geometry.allocate(None) |
| 2 | run 1 | 207.02 | 261.20 | 261.17 | 54.15 | 54.18 | update(self) |
| 3 | update 1 | 207.25 | 261.20 | 261.17 | 53.92 | 53.95 | self.operator.direct(self.v, out=self.tmp_range) |
| 4 | run 2 | 261.17 | 315.19 | 261.22 | 0.05 | 54.02 | update(self) |
| 5 | update 2 | 261.17 | 315.19 | 261.22 | 0.05 | 54.02 | self.operator.direct(self.v, out=self.tmp_range) |
| 6 | run 3 | 261.22 | 315.22 | 261.25 | 0.03 | 54.00 | update(self) |
| 7 | update 3 | 261.22 | 315.22 | 261.25 | 0.03 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
| 8 | run 4 | 261.25 | 315.25 | 261.37 | 0.12 | 54.00 | update(self) |
| 9 | update 4 | 261.25 | 315.25 | 261.37 | 0.12 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
| 10 | run 5 | 261.37 | 315.31 | 261.36 | -0.01 | 53.94 | update(self) |
| 11 | update 5 | 261.37 | 315.31 | 261.36 | -0.01 | 53.94 | self.operator.direct(self.v, out=self.tmp_range) |
| 12 | run 6 | 261.36 | 315.35 | 261.42 | 0.06 | 53.99 | update(self) |
| 13 | update 6 | 261.36 | 315.35 | 261.42 | 0.06 | 53.99 | self.operator.direct(self.v, out=self.tmp_range) |
| 14 | run 7 | 261.42 | 315.38 | 261.47 | 0.05 | 53.96 | update(self) |
| 15 | update 7 | 261.42 | 315.38 | 261.47 | 0.05 | 53.96 | self.operator.direct(self.v, out=self.tmp_range) |
| 16 | run 8 | 261.47 | 315.42 | 261.53 | 0.06 | 53.95 | update(self) |
| 17 | update 8 | 261.47 | 315.42 | 261.52 | 0.05 | 53.95 | self.operator.direct(self.v, out=self.tmp_range) |
| 18 | run 9 | 261.53 | 315.47 | 261.54 | 0.01 | 53.94 | update(self) |
| 19 | update 9 | 261.53 | 315.47 | 261.54 | 0.01 | 53.94 | self.operator.direct(self.v, out=self.tmp_range) |
| 20 | run 10 | 261.54 | 315.50 | 261.54 | 0.00 | 53.96 | update(self) |
| 21 | update 10 | 261.54 | 315.50 | 261.54 | 0.00 | 53.96 | self.operator.direct(self.v, out=self.tmp_range) |
During every run/update call we again see a transient ~54 MB peak, due to the ImageData copy made during update’s self.operator.direct(self.v, out=self.tmp_range) call (Highlighted in pink), which is deallocated once the line completes. Notably, there is also a ~54 MB increase in total memory usage in the first run/update calls (Highlighted in orange). This is because in set_up, self.tmp_domain is initialised with None, which does not allocate memory; when the first update executes self.operator.adjoint(self.u, out=self.tmp_domain), the buffer is filled with values, which triggers the actual allocation.
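One plausible low-level explanation is lazy page commitment by the operating system: NumPy’s np.empty reserves address space without touching the pages, so resident memory only grows on first write. A minimal sketch of this behaviour, assuming CIL’s allocate(None) behaves like np.empty:
[ ]:
import os
import numpy as np
import psutil

def rss_mib():
    return psutil.Process(os.getpid()).memory_info().rss / 1024**2

base = rss_mib()
a = np.empty((3760, 3760), dtype=np.float32)  # address space reserved, pages untouched
print(f"after np.empty:    +{rss_mib() - base:.1f} MiB")  # ~0 MiB (typically, on Linux)
a.fill(0)                                     # first write commits the pages
print(f"after first write: +{rss_mib() - base:.1f} MiB")  # ~54 MiB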
In total with the initialisation cost, LSQR makes:
- 4 increases of ~53.9 MB, corresponding to the size of the ImageData space
- 2 increases of ~21.5 MB, corresponding to the AcquisitionData space
The raw line-by-line output below shows the self.operator.direct peak and subsequent deallocation, and the self.operator.adjoint increase:
[30]:
# View a subset of the line-by-line & peak usage output:
print_output_subset(lsqr_line_prof, 'run 1') # Truncated to show run 1 & update 1
run 1
Start of run | Memory Usage: 887.12 MB | line:
16 | Memory Usage: 887.12 MB | line: callbacks = [ProgressCallback(verbose=verbose)]
17 | Memory Usage: 887.12 MB | line: callback = kwargs.get('callback', None)
18 | Memory Usage: 887.12 MB | line: callbacks.append(_OldCallback(callback, verbose=verbose))
19 | Memory Usage: 887.12 MB | line: iterations = self.max_iteration
20 | Memory Usage: 887.12 MB | line: iterations+=1
21 | Memory Usage: 887.12 MB | line: self.max_iteration = self.iteration + iterations
22 | Memory Usage: 887.23 MB | line: iters = (count(self.iteration) if numpy.isposinf(self.max_iteration) else range(self.iteration, self.max_iteration))
23 | Memory Usage: 887.23 MB | line: update(self)
24 | Memory Usage: 887.35 MB | line: callback(self)
Start of update | Memory Usage: 887.35 MB | line:
Memory Usage Log (Time, Memory in MB): 15:55:14.623491, 941.25 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.633770, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.643983, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.654148, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.664296, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.674446, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.684598, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.694766, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.704964, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.715156, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.725320, 941.30 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.735482, 941.30 MB
25 | Memory Usage: 887.37 MB | line: self.operator.direct(self.v, out=self.tmp_range)
Memory Usage Log (Time, Memory in MB): 15:55:14.747662, 887.37 MB
26 | Memory Usage: 887.36 MB | line: self.tmp_range.sapyb(1., self.u,-self.alpha, out=self.u)
27 | Memory Usage: 887.36 MB | line: self.beta = self.u.norm()
28 | Memory Usage: 887.36 MB | line: self.u /= self.beta
Memory Usage Log (Time, Memory in MB): 15:55:14.758079, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.783936, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.794185, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.804395, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.814572, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.824746, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.834923, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.845119, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.855290, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.865475, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.875650, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.885855, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.896072, 887.36 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.906363, 899.14 MB
Memory Usage Log (Time, Memory in MB): 15:55:14.916694, 923.14 MB
29 | Memory Usage: 941.29 MB | line: self.operator.adjoint(self.u, out=self.tmp_domain)
Memory Usage Log (Time, Memory in MB): 15:55:14.928414, 941.29 MB
30 | Memory Usage: 941.29 MB | line: self.v.sapyb(-self.beta, self.tmp_domain, 1., out=self.v)
Memory Usage Log (Time, Memory in MB): 15:55:14.938845, 941.29 MB
31 | Memory Usage: 941.29 MB | line: self.alpha = self.v.norm()
Memory Usage Log (Time, Memory in MB): 15:55:14.949036, 941.29 MB
32 | Memory Usage: 941.29 MB | line: self.v /= self.alpha
33 | Memory Usage: 941.29 MB | line: rhobar1 = self.rhobar, psi = 0
34 | Memory Usage: 941.29 MB | line: self.phibar = s * self.phibar
Memory Usage Log (Time, Memory in MB): 15:55:14.959302, 941.28 MB
35 | Memory Usage: 941.28 MB | line: self.x.sapyb(1, self.d, phi/rho, out=self.x)
Memory Usage Log (Time, Memory in MB): 15:55:14.969583, 941.27 MB
36 | Memory Usage: 941.27 MB | line: self.d.sapyb(-theta/rho, self.v, 1, out=self.d)
End of update | Memory Usage: 941.27 MB | line: self.normr = math.sqrt(self.phibar ** 2 + self.res2)
23 | Memory Usage: 941.27 MB | line: update(self)
24 | Memory Usage: 941.27 MB | line: callback(self)
End of run | Memory Usage: 941.27 MB | line:
By contrast, in the second run/update calls we still see the transient self.operator.direct peak, but no overall usage change from self.operator.adjoint:
[31]:
print_output_subset(lsqr_line_prof, 'run 2') # Truncated to show run 2 & update 2
run 2
Start of run | Memory Usage: 941.27 MB | line:
16 | Memory Usage: 941.27 MB | line: callbacks = [ProgressCallback(verbose=verbose)]
17 | Memory Usage: 941.27 MB | line: callback = kwargs.get('callback', None)
18 | Memory Usage: 941.27 MB | line: callbacks.append(_OldCallback(callback, verbose=verbose))
19 | Memory Usage: 941.27 MB | line: iterations = self.max_iteration
20 | Memory Usage: 941.27 MB | line: iterations+=1
21 | Memory Usage: 941.27 MB | line: self.max_iteration = self.iteration + iterations
22 | Memory Usage: 941.27 MB | line: iters = (count(self.iteration) if numpy.isposinf(self.max_iteration) else range(self.iteration, self.max_iteration))
Start of update | Memory Usage: 941.27 MB | line:
Memory Usage Log (Time, Memory in MB): 15:55:15.033527, 995.12 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.043790, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.053982, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.064149, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.074310, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.084502, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.094703, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.104903, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.115103, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.125278, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.135451, 995.29 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.145736, 995.29 MB
25 | Memory Usage: 941.35 MB | line: self.operator.direct(self.v, out=self.tmp_range)
Memory Usage Log (Time, Memory in MB): 15:55:15.155986, 941.35 MB
26 | Memory Usage: 941.34 MB | line: self.tmp_range.sapyb(1., self.u,-self.alpha, out=self.u)
27 | Memory Usage: 941.34 MB | line: self.beta = self.u.norm()
28 | Memory Usage: 941.34 MB | line: self.u /= self.beta
Memory Usage Log (Time, Memory in MB): 15:55:15.190489, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.200770, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.210964, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.221151, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.231526, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.241818, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.252074, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.262294, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.272539, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.282943, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.293293, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.303709, 941.34 MB
Memory Usage Log (Time, Memory in MB): 15:55:15.313990, 941.34 MB
29 | Memory Usage: 941.34 MB | line: self.operator.adjoint(self.u, out=self.tmp_domain)
Memory Usage Log (Time, Memory in MB): 15:55:15.324311, 941.34 MB
30 | Memory Usage: 941.34 MB | line: self.v.sapyb(-self.beta, self.tmp_domain, 1., out=self.v)
31 | Memory Usage: 941.34 MB | line: self.alpha = self.v.norm()
Memory Usage Log (Time, Memory in MB): 15:55:15.334620, 941.34 MB
32 | Memory Usage: 941.34 MB | line: self.v /= self.alpha
33 | Memory Usage: 941.34 MB | line: rhobar1 = self.rhobar, psi = 0
34 | Memory Usage: 941.34 MB | line: self.phibar = s * self.phibar
Memory Usage Log (Time, Memory in MB): 15:55:15.344957, 941.33 MB
35 | Memory Usage: 941.33 MB | line: self.x.sapyb(1, self.d, phi/rho, out=self.x)
Memory Usage Log (Time, Memory in MB): 15:55:15.355259, 941.32 MB
36 | Memory Usage: 941.32 MB | line: self.d.sapyb(-theta/rho, self.v, 1, out=self.d)
End of update | Memory Usage: 941.32 MB | line: self.normr = math.sqrt(self.phibar ** 2 + self.res2)
23 | Memory Usage: 941.32 MB | line: update(self)
24 | Memory Usage: 941.32 MB | line: callback(self)
End of run | Memory Usage: 941.32 MB | line:
Below is a graph illustrating the overall trends across the methods, showing the comparison for CGLS and LSQR:
[32]:
plot_mem(dfs=(cgls_lp, lsqr_lp), algNms=("CGLS", "LSQR"))
Figure 4: This graph shows the memory usages before (initial), during (peak) and after (final) the ‘init’ and ‘run’ methods in CGLS (Blue) and LSQR (Orange). The first two points show the memory increase during ‘init’, which remains stable. Unlike CGLS, LSQR’s first ‘run’ (iteration) shows another memory increase, which also stays consistent and shows that LSQR uses more memory than CGLS. The spikes during each subsequent run show that the memory usage reaches a peak, and then drops back down, resulting in a net increase of ~0 MB. This is consistent across all ‘run’ calls (iterations).
LSQR creates 4 copies of the ImageData space and 2 copies of the AcquisitionData, totalling (4 × 53.9) + (2 × 21.5) ≈ 259 MB. This can be seen clearly in the breakdown below:
[33]:
%%capture lsqr_mem_prof
!python3 Scripts/memory-sandstone.py lsqr_mp
[34]:
# View the memory_profiler breakdown:
print(*lsqr_mem_prof.stdout.splitlines()[:42], sep="\n") # Print set_up
print(*lsqr_mem_prof.stdout.splitlines()[77:135], sep="\n") # Print run 1
Dataset folder already exists in ..
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/LSQRMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
92 680.4 MiB 680.4 MiB 1 @profile
93 def set_up(self, initial, operator, data):
94 r'''Initialisation of the algorithm
95 Parameters
96 ------------
97 operator : Operator
98 Linear operator for the inverse problem
99 initial : (optional) DataContainer in the domain of the operator, default is a DataContainer filled with zeros.
100 Initial guess
101 data : DataContainer in the range of the operator
102 Acquired data to reconstruct
103
104 '''
105
106 680.4 MiB 0.0 MiB 1 log.info("%s setting up", self.__class__.__name__)
107 734.3 MiB 53.9 MiB 1 self.x = initial.copy() #1 domain
108 734.3 MiB 0.0 MiB 1 self.operator = operator
109
110 # Initialise Golub-Kahan bidiagonalisation (GKB)
111
112 #self.u = data - self.operator.direct(self.x)
113 756.8 MiB 22.5 MiB 1 self.u = self.operator.direct(self.x) #1 range
114 757.1 MiB 0.3 MiB 1 self.u.sapyb(-1, data, 1, out=self.u)
115 757.1 MiB 0.0 MiB 1 self.beta = self.u.norm()
116 757.1 MiB 0.0 MiB 1 self.u /= self.beta
117
118 812.1 MiB 55.0 MiB 1 self.v = self.operator.adjoint(self.u) #2 domain
119 812.1 MiB 0.0 MiB 1 self.alpha = self.v.norm()
120 812.1 MiB 0.0 MiB 1 self.v /= self.alpha
121
122 812.1 MiB 0.0 MiB 1 self.rhobar = self.alpha
123 812.1 MiB 0.0 MiB 1 self.phibar = self.beta
124 812.1 MiB 0.0 MiB 1 self.normr = self.beta
125 812.1 MiB 0.0 MiB 1 self.regalphasq = self.regalpha**2
126
127 865.8 MiB 53.7 MiB 1 self.d = self.v.copy() #3 domain
128 887.6 MiB 21.8 MiB 1 self.tmp_range = data.geometry.allocate(None) #2 range
run 1
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/LSQRMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
136 887.6 MiB 887.6 MiB 1 @profile
137 def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs):
138 r"""run upto :code:`iterations` with callbacks/logging.
139
140 For a demonstration of callbacks see https://github.com/TomographicImaging/CIL-Demos/blob/main/misc/callback_demonstration.ipynb
141
142 Parameters
143 -----------
144 iterations: int, default is None
145 Number of iterations to run. If not set the algorithm will run until :code:`should_stop()` is reached
146 callbacks: list of callables, default is Defaults to :code:`[ProgressCallback(verbose)]`
147 List of callables which are passed the current Algorithm object each iteration. Defaults to :code:`[ProgressCallback(verbose)]`.
148 verbose: 0=quiet, 1=info, 2=debug
149 Passed to the default callback to determine the verbosity of the printed output.
150 """
151
152 887.6 MiB 0.0 MiB 1 if 'print_interval' in kwargs:
153 warnings.warn("use `TextProgressCallback(miniters)` instead of `run(print_interval)`",
154 DeprecationWarning, stacklevel=2)
155 887.6 MiB 0.0 MiB 1 if callbacks is None:
156 887.6 MiB 0.0 MiB 1 callbacks = [ProgressCallback(verbose=verbose)]
157
158 # transform old-style callbacks into new
159 887.6 MiB 0.0 MiB 1 callback = kwargs.get('callback', None)
160
161 887.6 MiB 0.0 MiB 1 if callback is not None:
162 callbacks.append(_OldCallback(callback, verbose=verbose))
163 887.6 MiB 0.0 MiB 1 if hasattr(self, '__log_file'):
164 callbacks.append(LogfileCallback(self.__log_file, verbose=verbose))
165
166 887.6 MiB 0.0 MiB 1 if self.should_stop():
167 print("Stop criterion has been reached.")
168 887.6 MiB 0.0 MiB 1 if iterations is None:
169 warnings.warn("`run()` missing `iterations`", DeprecationWarning, stacklevel=2)
170 iterations = self.max_iteration
171
172 887.6 MiB 0.0 MiB 1 if self.iteration == -1 and self.update_objective_interval>0:
173 887.6 MiB 0.0 MiB 1 iterations+=1
174
175 # call `__next__` upto `iterations` times or until `StopIteration` is raised
176 887.6 MiB 0.0 MiB 1 self.max_iteration = self.iteration + iterations
177
178 887.6 MiB 0.0 MiB 2 iters = (count(self.iteration) if numpy.isposinf(self.max_iteration)
179 887.6 MiB 0.0 MiB 1 else range(self.iteration, self.max_iteration))
180
181 941.7 MiB 53.9 MiB 3 for _ in zip(iters, self):
182 941.7 MiB 0.0 MiB 2 try:
183 941.7 MiB 0.0 MiB 4 for callback in callbacks:
184 941.7 MiB 0.2 MiB 2 callback(self)
185 except StopIteration:
186 break
Tikhonov regularisation using CGLS and LSQR - with or without a block#
Tikhonov regularisation#
We can add a regularisation term to the problem solved by CGLS; this gives us a minimisation problem of the following form, known as Tikhonov regularisation:
\[\min_u \begin{Vmatrix}A u - b\end{Vmatrix}^2_2 + \alpha^2\begin{Vmatrix}L u\end{Vmatrix}^2_2\]
where:
- \(A\) is the projection operator
- \(b\) is the acquired data
- \(u\) is the unknown image to be solved for
- \(\alpha\) is the regularisation parameter
- \(L\) is the regularisation operator
The first term measures the fidelity of the solution to the data. The second term measures the fidelity to the prior knowledge we have imposed on the system through the operator \(L\), and \(\alpha\) controls the trade-off between these terms. \(L\) is often chosen to be a smoothing operator such as the identity matrix or a gradient operator, penalised here in the squared L2-norm.
This can be re-written equivalently in block matrix form:
\[\min_u \begin{Vmatrix} \binom{A}{\alpha L} u - \binom{b}{0} \end{Vmatrix}^2_2\]
With the definitions:
- \(\tilde{A} = \binom{A}{\alpha L}\)
- \(\tilde{b} = \binom{b}{0}\)
this can now be recognised as a least squares problem:
\[\min_u \begin{Vmatrix}\tilde{A} u - \tilde{b}\end{Vmatrix}^2_2\]
and, being a least squares problem, it can be solved using CGLS with \(\tilde{A}\) as operator and \(\tilde{b}\) as data.
LSQR in CIL has the option to include Tikhonov regularisation without using blocks. In this section, we will compare LSQR Tikhonov with a block, CGLS with a block, and LSQR Tikhonov without a block.
Set up the algorithms:
[35]:
initial = ig.allocate(0)
L = IdentityOperator(ig)
alpha = 0.1
operator_block = BlockOperator(A, alpha*L)
zero_data = L.range.allocate(0)
data_block = BlockDataContainer(sandstone_noisy, zero_data)
maxit = 100
itsAtATime = 1
N = round(maxit/itsAtATime)
xx = np.arange(0, maxit, itsAtATime)
[36]:
cgls_tik_block = CGLS(initial=initial, operator=operator_block, data=data_block, update_objective_interval = 10)
times_cgls_tik_block, rel_res_vec_cgls_tik_block, rel_err_vec_cgls_tik_block = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, cgls_tik_block, N, itsAtATime)
lsqr_tik_block = LSQR(initial=initial, operator=operator_block, data=data_block, update_objective_interval = 10)
times_lsqr_tik_block, rel_res_vec_lsqr_tik_block, rel_err_vec_lsqr_tik_block = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, lsqr_tik_block, N, itsAtATime)
lsqr_tik_simple = LSQR_Tikhonov(initial=initial, operator=A, data=sandstone_noisy, alpha = alpha)
times_lsqr_tik_simple, rel_res_vec_lsqr_tik_simple, rel_err_vec_lsqr_tik_simple = timed_iterations(sandstone_noisy, A, sandstone_cor, recon, padsize, lsqr_tik_simple, N, itsAtATime)
Convergence/reduction of residuals:#
The plots below show the progress of the residuals across each iteration, and the error across iterations:
[37]:
plt.subplot(1,2,1)
plt.semilogy(xx, rel_res_vec_lsqr_tik_block, '*-', xx, rel_res_vec_cgls_tik_block, '-', xx, rel_res_vec_lsqr_tik_simple, '--')
plt.xlabel('Iterations')
plt.ylabel('Residuals')
plt.gca().legend(('LSQR Block', 'CGLS Block', 'LSQR Tik'))
plt.subplot(1,2,2)
plt.semilogy(xx, rel_err_vec_lsqr_tik_block, '*-', xx, rel_err_vec_cgls_tik_block, '-', xx, rel_err_vec_lsqr_tik_simple, '--')
plt.xlabel('Iterations')
plt.ylabel('Error')
plt.gca().legend(('LSQR Block', 'CGLS Block', 'LSQR Tik'))
plt.tight_layout()
Figure 5: (Left) The progress of the residuals across 100 iterations. The CGLS Block, LSQR Block and LSQR Tikhonov iterations are mathematically equivalent, so they converge to the same residuals. The residuals decrease across the iterations, meaning the reconstructions match the data increasingly closely. (Right) The relative error across 100 iterations. All three algorithms display identical errors due to mathematically equivalent iterations. The reconstructions match the FBP reconstruction (ground truth) most closely at around iteration 20.
Compare the time taken/elapsed across each iteration for the 3 algorithms:
[ ]:
plt.plot(xx, times_cgls_tik_block, 'r-', xx, times_lsqr_tik_block, 'b.--', xx, times_lsqr_tik_simple, 'g.-' )
plt.xlabel('Iteration')
plt.ylabel('Time (s)')
plt.gca().legend(('CGLS Block', 'LSQR Block', 'LSQR Tik'))
<matplotlib.legend.Legend at 0x7f3d909333e0>
Figure 6: This graph shows the time elapsed at each iteration for CGLS Block, LSQR Block and LSQR Tikhonov. For all algorithms, this relationship is linear. On this device, the elapsed times across 100 iterations are very similar, with CGLS Block being slightly faster overall.
Memory costs for LSQR+Block, CGLS+Block and LSQR+Tikhonov#
Setting Up Memory Tracking:#
Set up the memory tracking for the three algorithms:
[39]:
%%capture cgls_block_prof
!python3 Scripts/memory-sandstone.py cgls_lp_tik_block --track-peak
[40]:
%%capture lsqr_block_prof
!python3 Scripts/memory-sandstone.py lsqr_lp_tik_block --track-peak
[41]:
%%capture lsqr_block_mp
!python3 Scripts/memory-sandstone.py lsqr_mp_tik_block
[42]:
%%capture lsqr_tik_prof
!python3 Scripts/memory-sandstone.py lsqr_tik_lp --track-peak
[43]:
%%capture lsqr_tik_mp
!python3 Scripts/memory-sandstone.py lsqr_tik_mp
Memory Usage Results:#
[44]:
# ImageData and AcquisitionData Sizes
print(f"Estimate of 'initial' (ImageData) size: {initial.size*dtypeSize*MB:.1f} MB")
print(f"Estimate of 'sandstone_noisy' (AcquisitionData) size: {sandstone_noisy.size*dtypeSize*MB:.1f} MB")
print(f"Estimate of 'data_block' (Block AcquisitionData) size: {(sandstone_noisy.size+zero_data.size)*dtypeSize*MB:.1f} MB")
Estimate of 'initial' (ImageData) size: 53.9 MB
Estimate of 'sandstone_noisy' (AcquisitionData) size: 21.5 MB
Estimate of 'data_block' (Block AcquisitionData) size: 75.4 MB
In the dataframe below, there is a transient peak increase of ~129.5 MB in the first update (Highlighted in pink) due to the line self.operator.direct(self.p, out=self.q). This corresponds to the ImageData plus Block AcquisitionData sizes: 53.9 + 75.4 ≈ 129.3 MB. Most of this memory is deallocated by the end of the line; ~21.5 MB (1× the original AcquisitionData) remains occupied in the total usage (Highlighted in orange).
The subsequent update calls have a peak of ~107.9 MB, which is 1× AcqData (~21.5 MB) less than the first peak, suggesting that the AcqData buffer array in self.q is populated with values during the first call to update.
In total, CGLS Block makes:
- 3 copies of ImgData (53.9 MB)
- 2 copies of Block AcqData (75.4 MB)
- 1 copy of AcqData (21.5 MB)
The CGLS Block DataFrame shows these results below:
[45]:
cgls_block_lp = line_peak_process(cgls_block_prof, norm=True)
cgls_block_lp.iloc[:12].style.apply(
lambda x: ['background-color: pink' if (134 > val > 126) and (i % 2 == 1) else '' for i, val in enumerate(x)],
subset='Peak Memory Increase (MiB)')\
.map(lambda val: 'background-color: turquoise' if val > 300 else '', subset='Total Memory Usage (MiB)')\
.map(lambda val: 'background-color: orange' if 24 > val > 18 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[45]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 314.77 | 314.77 | 314.77 | 314.77 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 314.77 | 314.77 | 314.77 | 314.77 | self.gamma = self.norms0**2 |
| 2 | run 1 | 314.77 | 444.35 | 336.67 | 21.90 | 129.58 | update(self) |
| 3 | update 1 | 314.96 | 444.35 | 336.67 | 21.71 | 129.39 | self.operator.direct(self.p, out=self.q) |
| 4 | run 2 | 336.67 | 444.56 | 336.73 | 0.06 | 107.89 | update(self) |
| 5 | update 2 | 336.67 | 444.56 | 336.73 | 0.06 | 107.89 | self.operator.direct(self.p, out=self.q) |
| 6 | run 3 | 336.73 | 444.60 | 336.78 | 0.05 | 107.87 | update(self) |
| 7 | update 3 | 336.73 | 444.60 | 336.76 | 0.03 | 107.87 | self.operator.direct(self.p, out=self.q) |
| 8 | run 4 | 336.78 | 444.65 | 336.80 | 0.02 | 107.87 | update(self) |
| 9 | update 4 | 336.78 | 444.65 | 336.80 | 0.02 | 107.87 | self.operator.direct(self.p, out=self.q) |
| 10 | run 5 | 336.80 | 444.68 | 336.85 | 0.05 | 107.88 | update(self) |
| 11 | update 5 | 336.80 | 444.68 | 336.84 | 0.04 | 107.88 | self.operator.direct(self.p, out=self.q) |
In set_up, the allocations (Highlighted in turquoise) are: (53.9 × 3) + 75.4 + 21.5 ≈ 258.6 MB. In the first update, the allocations (Highlighted in orange) are: 53.9 + 75.4 ≈ 129.3 MB. The memory increases in update 1 occur because the buffer arrays initialised in set_up become populated with values.
The peak of ~183 MB (Highlighted in pink) is due to self.operator.direct(self.v, out=self.tmp_range) creating temporary buffers of ImgData + Block AcqData, and self.operator.adjoint(self.u, out=self.tmp_domain) creating 2 buffers of ImgData during update 1.
In total, LSQR Block makes:
- 4 copies of ImgData (53.9 MB)
- 2 copies of Block AcqData (75.4 MB)
- 1 copy of AcqData (21.5 MB)
[46]:
lsqr_block_lp = line_peak_process(lsqr_block_prof, norm=True)
lsqr_block_lp.iloc[:12].style.apply(
lambda x: ['background-color: pink' if (190 > val > 175) and (i % 2 == 1) else '' for i, val in enumerate(x)],
subset='Peak Memory Increase (MiB)')\
.map(lambda val: 'background-color: turquoise' if val > 255 else '', subset='Total Memory Usage (MiB)')\
.map(lambda val: 'background-color: orange' if 134 > val > 126 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[46]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 260.78 | 260.78 | 260.78 | 260.78 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 260.78 | 260.78 | 260.78 | 260.78 | self.tmp_range = data.geometry.allocate(None) |
| 2 | run 1 | 260.78 | 444.51 | 390.60 | 129.82 | 183.73 | update(self) |
| 3 | update 1 | 261.14 | 444.51 | 390.55 | 129.41 | 183.37 | self.operator.adjoint(self.u, out=self.tmp_domain) |
| 4 | run 2 | 390.60 | 498.47 | 390.64 | 0.04 | 107.87 | update(self) |
| 5 | update 2 | 390.60 | 498.47 | 390.59 | -0.01 | 107.87 | self.operator.direct(self.v, out=self.tmp_range) |
| 6 | run 3 | 390.64 | 498.51 | 390.64 | 0.00 | 107.87 | update(self) |
| 7 | update 3 | 390.64 | 498.51 | 390.64 | 0.00 | 107.87 | self.operator.direct(self.v, out=self.tmp_range) |
| 8 | run 4 | 390.64 | 498.56 | 390.68 | 0.04 | 107.92 | update(self) |
| 9 | update 4 | 390.64 | 498.56 | 390.68 | 0.04 | 107.92 | self.operator.direct(self.v, out=self.tmp_range) |
| 10 | run 5 | 390.68 | 498.61 | 390.74 | 0.06 | 107.93 | update(self) |
| 11 | update 5 | 390.68 | 498.61 | 390.74 | 0.06 | 107.93 | self.operator.direct(self.v, out=self.tmp_range) |
[47]:
# ImageData and AcquisitionData Sizes
print(f"Estimate of 'initial' (ImageData) size: {initial.size*dtypeSize*MB:.1f} MB")
print(f"Estimate of 'sandstone_noisy' (AcquisitionData) size: {sandstone_noisy.size*dtypeSize*MB:.1f} MB")
Estimate of 'initial' (ImageData) size: 53.9 MB
Estimate of 'sandstone_noisy' (AcquisitionData) size: 21.5 MB
The setup for LSQR Tikhonov does not use the Block operator, so the ImageData and AcquisitionData sizes are 53.9 MB and 21.5 MB, respectively.
We expect the Total Memory Usage to be the same as for the LSQR run previously, without the BlockOperator:
- 4 copies of ImageData (3 in set_up, Highlighted in turquoise; 1 in run/update, Highlighted in orange)
- 2 copies of AcquisitionData (Highlighted in turquoise)
This totals (53.9 × 4) + (21.5 × 2) ≈ 258.6 MB, as shown in the outputs below, meaning LSQR Tikhonov uses less memory than the block versions of CGLS and LSQR.
[48]:
# View the memory_profiler breakdown:
print(*lsqr_tik_mp.stdout.splitlines()[:49], sep="\n") # Print set_up
print(*lsqr_tik_mp.stdout.splitlines()[77:135], sep="\n") # Print run 1
Dataset folder already exists in ..
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/LSQRMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
92 681.6 MiB 681.6 MiB 1 @profile
93 def set_up(self, initial, operator, data):
94 r'''Initialisation of the algorithm
95 Parameters
96 ------------
97 operator : Operator
98 Linear operator for the inverse problem
99 initial : (optional) DataContainer in the domain of the operator, default is a DataContainer filled with zeros.
100 Initial guess
101 data : DataContainer in the range of the operator
102 Acquired data to reconstruct
103
104 '''
105
106 681.6 MiB 0.0 MiB 1 log.info("%s setting up", self.__class__.__name__)
107 735.3 MiB 53.7 MiB 1 self.x = initial.copy() #1 domain
108 735.3 MiB 0.0 MiB 1 self.operator = operator
109
110 # Initialise Golub-Kahan bidiagonalisation (GKB)
111
112 #self.u = data - self.operator.direct(self.x)
113 758.2 MiB 22.9 MiB 1 self.u = self.operator.direct(self.x) #1 range
114 758.2 MiB 0.0 MiB 1 self.u.sapyb(-1, data, 1, out=self.u)
115 758.3 MiB 0.0 MiB 1 self.beta = self.u.norm()
116 758.4 MiB 0.1 MiB 1 self.u /= self.beta
117
118 813.2 MiB 54.9 MiB 1 self.v = self.operator.adjoint(self.u) #2 domain
119 813.2 MiB 0.0 MiB 1 self.alpha = self.v.norm()
120 813.2 MiB 0.0 MiB 1 self.v /= self.alpha
121
122 813.2 MiB 0.0 MiB 1 self.rhobar = self.alpha
123 813.2 MiB 0.0 MiB 1 self.phibar = self.beta
124 813.2 MiB 0.0 MiB 1 self.normr = self.beta
125 813.2 MiB 0.0 MiB 1 self.regalphasq = self.regalpha**2
126
127 866.9 MiB 53.7 MiB 1 self.d = self.v.copy() #3 domain
128 888.5 MiB 21.6 MiB 1 self.tmp_range = data.geometry.allocate(None) #2 range
129 888.5 MiB 0.0 MiB 1 self.tmp_domain = self.x.geometry.allocate(None) #4 domain
130
131 888.5 MiB 0.0 MiB 1 self.res2 = 0
132
133 888.5 MiB 0.0 MiB 1 self.configured = True
134 888.5 MiB 0.0 MiB 1 log.info("%s configured", self.__class__.__name__)
run 1
Filename: /home/efv97572/cil-showcase/CIL-User-Showcase/015_Memory_Profiling_LSQR_CGLS/Algorithms/LSQRMP.py
Line # Mem usage Increment Occurrences Line Contents
=============================================================
136 888.7 MiB 888.7 MiB 1 @profile
137 def run(self, iterations=None, callbacks: Optional[List[Callback]]=None, verbose=1, **kwargs):
138 r"""run upto :code:`iterations` with callbacks/logging.
139
140 For a demonstration of callbacks see https://github.com/TomographicImaging/CIL-Demos/blob/main/misc/callback_demonstration.ipynb
141
142 Parameters
143 -----------
144 iterations: int, default is None
145 Number of iterations to run. If not set the algorithm will run until :code:`should_stop()` is reached
146 callbacks: list of callables, default is Defaults to :code:`[ProgressCallback(verbose)]`
147 List of callables which are passed the current Algorithm object each iteration. Defaults to :code:`[ProgressCallback(verbose)]`.
148 verbose: 0=quiet, 1=info, 2=debug
149 Passed to the default callback to determine the verbosity of the printed output.
150 """
151
152 888.7 MiB 0.0 MiB 1 if 'print_interval' in kwargs:
153 warnings.warn("use `TextProgressCallback(miniters)` instead of `run(print_interval)`",
154 DeprecationWarning, stacklevel=2)
155 888.7 MiB 0.0 MiB 1 if callbacks is None:
156 888.7 MiB 0.0 MiB 1 callbacks = [ProgressCallback(verbose=verbose)]
157
158 # transform old-style callbacks into new
159 888.7 MiB 0.0 MiB 1 callback = kwargs.get('callback', None)
160
161 888.7 MiB 0.0 MiB 1 if callback is not None:
162 callbacks.append(_OldCallback(callback, verbose=verbose))
163 888.7 MiB 0.0 MiB 1 if hasattr(self, '__log_file'):
164 callbacks.append(LogfileCallback(self.__log_file, verbose=verbose))
165
166 888.7 MiB 0.0 MiB 1 if self.should_stop():
167 print("Stop criterion has been reached.")
168 888.7 MiB 0.0 MiB 1 if iterations is None:
169 warnings.warn("`run()` missing `iterations`", DeprecationWarning, stacklevel=2)
170 iterations = self.max_iteration
171
172 888.7 MiB 0.0 MiB 1 if self.iteration == -1 and self.update_objective_interval>0:
173 888.7 MiB 0.0 MiB 1 iterations+=1
174
175 # call `__next__` upto `iterations` times or until `StopIteration` is raised
176 888.7 MiB 0.0 MiB 1 self.max_iteration = self.iteration + iterations
177
178 888.7 MiB 0.0 MiB 2 iters = (count(self.iteration) if numpy.isposinf(self.max_iteration)
179 888.7 MiB 0.0 MiB 1 else range(self.iteration, self.max_iteration))
180
181 942.7 MiB 53.9 MiB 3 for _ in zip(iters, self):
182 942.7 MiB 0.0 MiB 2 try:
183 942.7 MiB 0.0 MiB 4 for callback in callbacks:
184 942.7 MiB 0.1 MiB 2 callback(self)
185 except StopIteration:
186 break
We can see the increase of ~54 MB at the first iteration, caused by the line `self.operator.adjoint(self.u, out=self.tmp_domain)`; the same behaviour is observed in the non-block version of LSQR:
[49]:
print_output_subset(lsqr_tik_prof, 'run 1')
run 1
Start of run | Memory Usage: 888.03 MB | line:
16 | Memory Usage: 888.03 MB | line: callbacks = [ProgressCallback(verbose=verbose)]
17 | Memory Usage: 888.03 MB | line: callback = kwargs.get('callback', None)
18 | Memory Usage: 888.03 MB | line: callbacks.append(_OldCallback(callback, verbose=verbose))
19 | Memory Usage: 888.03 MB | line: iterations = self.max_iteration
20 | Memory Usage: 888.03 MB | line: iterations+=1
21 | Memory Usage: 888.05 MB | line: self.max_iteration = self.iteration + iterations
22 | Memory Usage: 888.05 MB | line: iters = (count(self.iteration) if numpy.isposinf(self.max_iteration) else range(self.iteration, self.max_iteration))
23 | Memory Usage: 888.05 MB | line: update(self)
24 | Memory Usage: 888.18 MB | line: callback(self)
Start of update | Memory Usage: 888.18 MB | line:
Memory Usage Log (Time, Memory in MB): 16:00:11.557260, 941.89 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.567545, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.577777, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.587952, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.598127, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.608298, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.618444, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.628593, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.638785, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.649060, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.659228, 942.11 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.669382, 942.11 MB
25 | Memory Usage: 888.18 MB | line: self.operator.direct(self.v, out=self.tmp_range)
Memory Usage Log (Time, Memory in MB): 16:00:11.681125, 888.18 MB
26 | Memory Usage: 888.18 MB | line: self.tmp_range.sapyb(1., self.u,-self.alpha, out=self.u)
27 | Memory Usage: 888.17 MB | line: self.beta = self.u.norm()
Memory Usage Log (Time, Memory in MB): 16:00:11.691371, 888.17 MB
28 | Memory Usage: 888.17 MB | line: self.u /= self.beta
Memory Usage Log (Time, Memory in MB): 16:00:11.718584, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.728793, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.738955, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.749109, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.759262, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.769433, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.779634, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.789839, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.800049, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.810232, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.820399, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.830621, 888.17 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.840947, 899.95 MB
Memory Usage Log (Time, Memory in MB): 16:00:11.851269, 923.95 MB
29 | Memory Usage: 942.11 MB | line: self.operator.adjoint(self.u, out=self.tmp_domain)
Memory Usage Log (Time, Memory in MB): 16:00:11.861507, 942.11 MB
30 | Memory Usage: 942.11 MB | line: self.v.sapyb(-self.beta, self.tmp_domain, 1., out=self.v)
Memory Usage Log (Time, Memory in MB): 16:00:11.871839, 942.11 MB
31 | Memory Usage: 942.11 MB | line: self.alpha = self.v.norm()
32 | Memory Usage: 942.11 MB | line: self.v /= self.alpha
33 | Memory Usage: 942.11 MB | line: self.phibar = c1 * self.phibar
34 | Memory Usage: 942.11 MB | line: self.phibar = s * self.phibar
Memory Usage Log (Time, Memory in MB): 16:00:11.882094, 942.11 MB
35 | Memory Usage: 942.10 MB | line: self.x.sapyb(1, self.d, phi/rho, out=self.x)
Memory Usage Log (Time, Memory in MB): 16:00:11.892373, 942.10 MB
36 | Memory Usage: 942.09 MB | line: self.d.sapyb(-theta/rho, self.v, 1, out=self.d)
End of update | Memory Usage: 942.09 MB | line: self.normr = math.sqrt(self.phibar ** 2 + self.res2)
23 | Memory Usage: 942.09 MB | line: update(self)
24 | Memory Usage: 942.09 MB | line: callback(self)
End of run | Memory Usage: 942.09 MB | line:
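The jump from ~888 MB to ~942 MB occurs inside the adjoint call even though an `out=` container is supplied, which suggests the back-projector allocates a transient buffer internally. A minimal, hypothetical check of a single call (assuming `alg` is a configured LSQR Tikhonov instance; this name is not defined in the notebook above) could wrap the call with memory_profiler's `memory_usage`:
[ ]:
from memory_profiler import memory_usage

# `alg` is a hypothetical, already-configured algorithm instance.
baseline = memory_usage(-1, max_usage=True)  # single sample of current process memory
peak = memory_usage((alg.operator.adjoint, (alg.u,), {"out": alg.tmp_domain}),
                    max_usage=True)          # peak memory while the adjoint runs
print(f"Transient increase during adjoint: {peak - baseline:.1f} MiB")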
[50]:
# Highlight the large set_up allocations (turquoise) and the ~54 MiB
# first-iteration peak (orange) in the 'Total Memory Usage (MiB)' column
lsqr_tik_lp = line_peak_process(lsqr_tik_prof, norm=True)
lsqr_tik_lp.iloc[:12].style.map(lambda val: 'background-color: turquoise' if val > 204 else '', subset='Total Memory Usage (MiB)')\
.map(lambda val: 'background-color: orange' if 58 > val > 50 else '', subset='Total Memory Usage (MiB)')\
.format(lambda x: f"{x:.2f}" if isinstance(x, (int, float)) else x)
[50]:
| | Method | Initial Memory Usage (MiB) | Peak Memory (MiB) | Final Memory Usage (MiB) | Total Memory Usage (MiB) | Peak Memory Increase (MiB) | Peak Line |
|---|---|---|---|---|---|---|---|
| 0 | init | 0.00 | 207.10 | 207.10 | 207.10 | 207.10 | self.set_up(initial=initial, operator=operator, data=data) |
| 1 | setup | 0.00 | 207.10 | 207.10 | 207.10 | 207.10 | self.tmp_range = data.geometry.allocate(None) |
| 2 | run 1 | 207.10 | 261.18 | 261.16 | 54.06 | 54.08 | update(self) |
| 3 | update 1 | 207.25 | 261.18 | 261.16 | 53.91 | 53.93 | self.operator.direct(self.v, out=self.tmp_range) |
| 4 | run 2 | 261.16 | 315.17 | 261.19 | 0.03 | 54.01 | update(self) |
| 5 | update 2 | 261.16 | 315.17 | 261.19 | 0.03 | 54.01 | self.operator.direct(self.v, out=self.tmp_range) |
| 6 | run 3 | 261.19 | 315.19 | 261.23 | 0.04 | 54.00 | update(self) |
| 7 | update 3 | 261.19 | 315.19 | 261.23 | 0.04 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
| 8 | run 4 | 261.23 | 315.23 | 261.27 | 0.04 | 54.00 | update(self) |
| 9 | update 4 | 261.23 | 315.23 | 261.27 | 0.04 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
| 10 | run 5 | 261.27 | 315.27 | 261.30 | 0.03 | 54.00 | update(self) |
| 11 | update 5 | 261.27 | 315.27 | 261.30 | 0.03 | 54.00 | self.operator.direct(self.v, out=self.tmp_range) |
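Before plotting, the end-of-run footprints of the three algorithms can also be compared directly from the dataframes (a sketch assuming each `*_lp` dataframe carries the 'Final Memory Usage (MiB)' column shown above):
[ ]:
# Compare the steady-state (end of last run) footprint of each algorithm
for name, df in (("CGLS Block", cgls_block_lp),
                 ("LSQR Block", lsqr_block_lp),
                 ("LSQR Tik", lsqr_tik_lp)):
    print(f"{name}: final footprint {df['Final Memory Usage (MiB)'].iloc[-1]:.1f} MiB")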
[51]:
plot_mem(dfs=(cgls_block_lp, lsqr_block_lp, lsqr_tik_lp), algNms=("CGLS Block", "LSQR Block", "LSQR Tik"), linestyles=("-","-","-"))
Figure 7: This graph shows the memory usage before (initial), during (peak) and after (final) the `init` and `run` methods in CGLS Block (blue), LSQR Block (orange) and LSQR Tikhonov (green). The first two points show the memory increases during `init`, which remain stable. In `run 1`, CGLS Block and LSQR Block show a peak which then drops down slightly, leaving a net increase in memory usage at the end of `run 1`. LSQR Tikhonov shows a net increase in memory usage in `run 1` too, but without a peak. The spikes during each subsequent run show that the memory usage reaches a peak and then drops back down, resulting in a net increase of ~0 MB; this is consistent across all `run` calls (iterations). Amongst these three algorithms, LSQR Block uses the most memory, and LSQR Tikhonov uses the least.
Conclusion#
In this notebook, we showed:
LSQR and CGLS produce very similar residuals and perform similarly with respect to iteration time
The simple (non-block) version of CGLS uses less memory than LSQR
The block versions of LSQR and CGLS, and LSQR Tikhonov, produce very similar residuals and show similar iteration times
LSQR Tikhonov uses the same amount of memory as regular LSQR
LSQR Tikhonov has the benefit of a regularisation term, which reduces the likelihood of overfitting to the noise (see the note below)
The profiling data tells us where the memory usage comes from, and where it could be optimised
There may be some room to optimise peak memory usage, e.g. in the `ProjectionOperator` code
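Note: the problem LSQR Tikhonov solves is commonly written (with regularisation parameter \(\alpha\), matching `regalpha`/`regalphasq` in the `set_up` output above) as

$$\min_x \;\|Ax - b\|_2^2 + \alpha^2 \|x\|_2^2 \;=\; \min_x \left\|\begin{pmatrix} A \\ \alpha I \end{pmatrix} x - \begin{pmatrix} b \\ 0 \end{pmatrix}\right\|_2^2 .$$

Because the damping term enters only through scalars in the Golub-Kahan bidiagonalisation, the stacked operator on the right never has to be materialised as a BlockOperator, which is consistent with the lower memory usage observed here for LSQR Tikhonov compared with the block formulations.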