Optimisation framework#

This package allows rapid prototyping of optimisation-based reconstruction problems, i.e. defining and solving different optimization problems to enforce different properties on the reconstructed image.

Firstly, it provides an object-oriented framework for defining mathematical operators and functions as well a collection of useful example operators and functions. Both smooth and non-smooth functions can be used.

Further, it provides a number of high-level generic implementations of optimisation algorithms to solve generically formulated optimisation problems constructed from operator and function objects.

The fundamental components are:

Operator: A class specifying a (currently linear) operator.
Function: A class specifying mathematical functions such as a least squares data fidelity.
Algorithm: Implementation of an iterative optimisation algorithm to solve a particular generic optimisation problem. Algorithms are iterable Python object which can be run in a for loop. Can be stopped and warm restarted.

Algorithms (Deterministic)#

A number of generic algorithm implementations are provided including Gradient Descent (GD), Conjugate Gradient Least Squares (CGLS), Simultaneous Iterative Reconstruction Technique (SIRT), Primal Dual Hybrid Gradient (PDHG), Primal dual three-operator (PD3O), Iterative Shrinkage Thresholding Algorithm (ISTA), and Fast Iterative Shrinkage Thresholding Algorithm (FISTA).

An algorithm is designed for a particular generic optimisation problem accepts and number of instances of Function derived classes and/or Operator derived classes as input to define a specific instance of the generic optimisation problem to be solved. They are iterable objects which can be run in a for loop. The user can provide a stopping criterion different than the default.

New algorithms can be easily created by extending the Algorithm class. The user is required to implement only 4 methods: set_up, __init__, update and update_objective.

set_up and __init__ are used to configure the algorithm
update is the actual iteration updating the solution
update_objective defines how the objective is calculated.

For example, the implementation of the update of the Gradient Descent algorithm to minimise a Function will only be:

def update(self):
    self.x += -self.rate * self.objective_function.gradient(self.x)
def update_objective(self):
    self.loss.append(self.objective_function(self.x))

The Algorithm provides the infrastructure to continue iteration, to access the values of the objective function in subsequent iterations, the time for each iteration, and to provide a nice print to screen of the status of the optimisation.

Base class#

class cil.optimisation.algorithms.Algorithm(update_objective_interval=1, max_iteration=None, log_file=None)[source]#

Base class providing minimal infrastructure for iterative algorithms.

An iterative algorithm is designed to solve an optimization problem by repeatedly refining a solution. In CIL, we use iterative algorithms to minimize an objective function, often referred to as a loss. The process begins with an initial guess, and with each iteration, the algorithm updates the current solution based on the results of previous iterations (previous iterates). Iterative algorithms typically continue until a stopping criterion is met, indicating that an optimal or sufficiently good solution has been found. In CIL, stopping criteria can be implemented using a callback function (cil.optimisation.utilities.callbacks).

The user is required to implement the set_up, __init__, update and update_objective methods.

The method run is available to run n iterations. The method accepts callbacks: a list of callables, each of which receive the current Algorithm object (which in turn contains the iteration number and the actual objective value) and can be used to trigger print to screens and other user interactions. The run method will stop when the stopping criterion is met or StopIteration is raised.

update_objective_interval: int, optional, default 1
The objective (or loss) is calculated and saved every update_objective_interval. 1 means every iteration, 2 every 2 iterations and so forth. This is by default 1 and should be increased when evaluating the objective is computationally expensive.

set_up(*args, **kwargs)[source]#: Set up the algorithm

update()[source]#: A single iteration of the algorithm

should_stop()[source]#

default stopping criterion: number of iterations

The user can change this in concrete implementation of iterative algorithms.

max_iteration_stop_criterion()[source]#: Do not use: this is being deprecated

get_output()[source]#

Returns the current solution.

Returns:: The current solution
Return type:: DataContainer

is_provably_convergent()[source]#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property solution#: Returns the current solution.

get_last_loss(return_all=False)[source]#

Returns the last stored value of the loss function. “Loss” is an alias for “objective value”. If update_objective_interval is 1 it is the value of the objective at the current iteration. If update_objective_interval > 1 it is the last stored value.

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

update_objective()[source]#: calculates the objective with the current solution

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property objective#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

property update_objective_interval#: gets the update_objective_interval

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)[source]#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

objective_to_dict(verbose=False)[source]#: Internal function to save and print objective functions

objective_to_string(verbose=False)[source]#: Do not use: this is being deprecated

verbose_output(*_, **__)[source]#: Do not use: this is being deprecated

verbose_header(*_, **__)[source]#: Do not use: this is being deprecated

GD#

class cil.optimisation.algorithms.GD(initial=None, f=None, step_size=None, preconditioner=None, **kwargs)[source]#

Gradient Descent algorithm

Parameters:

initial (DataContainer (e.g. ImageData)) – The initial point for the optimisation
f (CIL function (Function(). ) with a defined gradient method) – The function to be minimised.
step_size (positive real float or subclass of StepSizeRule(), default = None) – If you pass a float this will be used as a constant step size. If left as None and do not pass a step size rule then the Armijo rule will be used to perform backtracking to choose a step size at each iteration (ArmijoStepSizeRule()). If a child class of cil.optimisation.utilities.StepSizeRule()’ is passed then its method get_step_size is called for each update.
preconditioner (class with a apply method or a function that takes an initialised CIL function as an argument and modifies a provided gradient.) – This could be a custom preconditioner or one provided in preconditioner(). If None is passed then self.gradient_update will remain unmodified.

set_up(initial, f, step_size, preconditioner)[source]#

initialisation of the algorithm

Parameters:

initial (DataContainer (e.g. ImageData)) – The initial point for the optimisation
f (CIL function with a defined gradient) – The function to be minimised.
step_size (positive real float or subclass of StepSizeRule(), default = None) – If you pass a float this will be used as a constant step size. If left as None and do not pass a step_size_rule then the Armijio rule will be used to perform backtracking to choose a step size at each iteration. If a child class of cil.optimisation.utilities.StepSizeRule()’ is passed then it’s method get_step_size is called for each update.
preconditioner (class with a apply method or a function that takes an initialised CIL function as an argument and modifies a provided gradient.) – This could be a custom preconditioner or one provided in preconditioner(). If None is passed then self.gradient_update will remain unmodified.

update()[source]#: Performs a single iteration of the gradient descent algorithm

update_objective()[source]#: calculates the objective with the current solution

should_stop()[source]#: Stopping criterion for the gradient descent algorithm

property step_size#: Returns the most recently used step size. Note, if the step-size is set by a non-constant step size rule, you must use the algorithm run or update method before this getter will return the most recently used step size.

calculate_objective_function_at_point(x)[source]#

Calculates the objective at a given point x

\[f(x)\]

Parameters:: x (DataContainer)

get_last_loss(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_output()#

Returns the current solution.

Returns:: The current solution
Return type:: DataContainer

is_provably_convergent()#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

max_iteration_stop_criterion()#: Do not use: this is being deprecated

property objective#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

objective_to_dict(verbose=False)#: Internal function to save and print objective functions

objective_to_string(verbose=False)#: Do not use: this is being deprecated

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

property solution#: Returns the current solution.

property update_objective_interval#: gets the update_objective_interval

verbose_header(*_, **__)#: Do not use: this is being deprecated

verbose_output(*_, **__)#: Do not use: this is being deprecated

CGLS#

class cil.optimisation.algorithms.CGLS(initial=None, operator=None, data=None, **kwargs)[source]#

Conjugate Gradient Least Squares (CGLS) algorithm

The Conjugate Gradient Least Squares (CGLS) algorithm is commonly used for solving large systems of linear equations, due to its fast convergence.

Problem:

\[\min_x || A x - b ||^2_2\]

Parameters:

operator (Operator) – Linear operator for the inverse problem
initial ((optional) DataContainer in the domain of the operator, default is a DataContainer filled with zeros.) – Initial guess
data (DataContainer in the range of the operator) – Acquired data to reconstruct

Note

Passing tolerance directly to CGLS is being deprecated. Instead we recommend using the callback functionality: https://tomographicimaging.github.io/CIL/nightly/optimisation/#callbacks and in particular the CGLSEarlyStopping callback replicated the old behaviour.

https://web.stanford.edu/group/SOL/software/cgls/

set_up(initial, operator, data)[source]#: Initialisation of the algorithm :param operator: Linear operator for the inverse problem :type operator: Operator :param initial: Initial guess :type initial: (optional) DataContainer in the domain of the operator, default is a DataContainer filled with zeros. :param data: Acquired data to reconstruct :type data: DataContainer in the range of the operator

update()[source]#: single iteration

update_objective()[source]#: calculates the objective with the current solution

should_stop()[source]#

default stopping criterion: number of iterations

The user can change this in concrete implementation of iterative algorithms.

flag()[source]#: returns whether the tolerance has been reached

get_last_loss(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_output()#

Returns the current solution.

Returns:: The current solution
Return type:: DataContainer

is_provably_convergent()#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

max_iteration_stop_criterion()#: Do not use: this is being deprecated

property objective#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

objective_to_dict(verbose=False)#: Internal function to save and print objective functions

objective_to_string(verbose=False)#: Do not use: this is being deprecated

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

property solution#: Returns the current solution.

property update_objective_interval#: gets the update_objective_interval

verbose_header(*_, **__)#: Do not use: this is being deprecated

verbose_output(*_, **__)#: Do not use: this is being deprecated

SIRT#

class cil.optimisation.algorithms.SIRT(initial=None, operator=None, data=None, lower=None, upper=None, constraint=None, **kwargs)[source]#

Simultaneous Iterative Reconstruction Technique, see [8].

Simultaneous Iterative Reconstruction Technique (SIRT) solves the following problem

\[A x = b\]

The SIRT update step for iteration \(k\) is given by

\[x^{k+1} = \mathrm{proj}_{C}( x^{k} + \omega D ( A^{T} ( M (b - Ax^{k}) ) ) ),\]

where, \(M = \frac{1}{A\mathbb{1}}\), \(D = \frac{1}{A^{T}\mathbb{1}}\), \(\mathbb{1}\) is a DataContainer of ones, \(\mathrm{proj}_{C}\) is the projection over a set \(C\), and \(\omega\) is the relaxation parameter.

Parameters:

initial (DataContainer, default = None) – Starting point of the algorithm, default value = DataContainer in the domain of the operator allocated with zeros.
operator (LinearOperator) – The operator A.
data (DataContainer) – The data b.
lower (float, default = None) – Lower bound constraint
upper (float, default = None) – Upper bound constraint
constraint (Function, default = None) – A function with proximal method, e.g., IndicatorBox function and IndicatorBox.proximal(), or TotalVariation function and TotalVariation.proximal().

Note

If constraint is not passed, lower and upper are used to create an IndicatorBox and apply its proximal.

If constraint is passed, proximal method is required to be implemented.

Note

The preconditioning arrays (weights) M and D used in SIRT are defined as

\[M = \frac{1}{A\mathbb{1}}\]

\[D = \frac{1}{A^T\mathbb{1}}\]

Examples

\[\underset{x}{\mathrm{argmin}} \frac{1}{2}\| Ax - d\|^{2}\]

>>> sirt = SIRT(initial = ig.allocate(0), operator = A, data = d)

set_up(initial, operator, data, lower=None, upper=None, constraint=None)[source]#: Initialisation of the algorithm

property relaxation_parameter#: Get the relaxation parameter \(\omega\)

property D#: Get the preconditioning array \(D\)

set_relaxation_parameter(value=1.0)[source]#

Set the relaxation parameter \(\omega\)

Parameters:: value (float) – The relaxation parameter to be applied to the update. Must be between 0 and 2 to guarantee asymptotic convergence.

update()[source]#: Performs a single iteration of the SIRT algorithm. The update step for iteration \(k\) is given by

\[x^{k+1} = \mathrm{proj}_{C}( x^{k} + \omega D ( A^{T} ( M (b - Ax^{k}) ) ) )\]

update_objective()[source]#: Appends the current objective value to the list of previous objective values

\[\frac{1}{2}\|A x - b\|^{2}\]

get_last_loss(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_output()#

Returns the current solution.

Returns:: The current solution
Return type:: DataContainer

is_provably_convergent()#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

max_iteration_stop_criterion()#: Do not use: this is being deprecated

property objective#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

objective_to_dict(verbose=False)#: Internal function to save and print objective functions

objective_to_string(verbose=False)#: Do not use: this is being deprecated

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

should_stop()#

default stopping criterion: number of iterations

The user can change this in concrete implementation of iterative algorithms.

property solution#: Returns the current solution.

property update_objective_interval#: gets the update_objective_interval

verbose_header(*_, **__)#: Do not use: this is being deprecated

verbose_output(*_, **__)#: Do not use: this is being deprecated

ISTA/PGD#

The Iterative Soft Thresholding Algorithm (ISTA) is also known as Proximal Gradient Descent (PGD). Note that in CIL, PGD is an alias of ISTA.

class cil.optimisation.algorithms.ISTA(initial, f, g, step_size=None, preconditioner=None, **kwargs)[source]#

Iterative Shrinkage-Thresholding Algorithm (ISTA), see [1], [2], is used to solve:

\[\min_{x} f(x) + g(x)\]

where \(f\) is differentiable, and \(g\) has a simple proximal operator.

In each update, the algorithm computes:

\[x_{k+1} = \mathrm{prox}_{\alpha g}(x_{k} - \alpha\nabla f(x_{k}))\]

where \(\alpha\) is the step_size.

Note

For a constant step size, \(\alpha\), convergence of ISTA is guaranteed if

\[\alpha\in(0, \frac{2}{L}),\]

where \(L\) is the Lipschitz constant of \(f\), see [4].

Parameters:

initial (DataContainer) – Initial guess of ISTA. \(x_{0}\)
f (Function) – Differentiable function. If None is passed, the algorithm will use the ZeroFunction.
g (Function or None) – Convex function with simple proximal operator. If None is passed, the algorithm will use the ZeroFunction.
step_size (positive float or child class of cil.optimisation.utilities.StepSizeRule()’, default = None) – Step size for the gradient step of ISTA. If a float is passed, this is used as a constant step size. If a child class of cil.optimisation.utilities.StepSizeRule() is passed then its method get_step_size() is called for each update. The default step_size is a constant \(\frac{0.99*2}{L}\) or 1 if f=None.
preconditioner (class with an apply method or a function that takes an initialised CIL function as an argument and modifies a provided gradient.) – This could be a custom preconditioner or one provided in preconditoner(). If None is passed then self.gradient_update will remain unmodified.

kwargs: Keyword arguments: Arguments from the base class Algorithm.

Note

If the function g is set to None or to the ZeroFunction then the ISTA algorithm is equivalent to Gradient Descent.

If the function f is set to None or to the ZeroFunction then the ISTA algorithm is equivalent to a Proximal Point Algorithm.

Note

The ISTA algorithm is equivalent to the Proximal Gradient Descent (PGD) algorithm. In CIL, the PGD algorithm is an alias for the ISTA algorithm.

Examples

\[\underset{x}{\mathrm{argmin}}\|A x - b\|^{2}_{2}\]

>>> f = LeastSquares(A, b=b, c=0.5)
>>> g = ZeroFunction()
>>> ig = Aop.domain
>>> ista = ISTA(initial = ig.allocate(), f = f, g = g)
>>> ista.run()

FISTA#

The Fast Iterative Soft Thresholding Algorithm (FISTA).

class cil.optimisation.algorithms.FISTA(initial, f, g, step_size=None, preconditioner=None, **kwargs)[source]#

Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), see [1], [2], is used to solve:

\[\min_{x} f(x) + g(x)\]

where \(f\) is differentiable and \(g\) has a simple proximal operator.

In each update the algorithm completes the following steps:

\[\begin{split}\begin{cases} x_{k} = \mathrm{prox}_{\alpha g}(y_{k} - \alpha\nabla f(y_{k}))\\ t_{k+1} = \frac{1+\sqrt{1+ 4t_{k}^{2}}}{2}\\ y_{k+1} = x_{k} + \frac{t_{k}-1}{t_{k+1}}(x_{k} - x_{k-1}) \end{cases}\end{split}\]

where \(\alpha\) is the step_size.

Note that the above applies for \(k\geq 1\). For \(k=0\), \(x_{0}\) and \(y_{0}\) are initialised to initial, and \(t_{1}=1\).

Parameters:

initial (DataContainer) – Starting point of the algorithm
f (Function) – Differentiable function. If None is passed, the algorithm will use the ZeroFunction.
g (Function or None) – Convex function with simple proximal operator. If None is passed, the algorithm will use the ZeroFunction.
step_size (positive float or child class of cil.optimisation.utilities.StepSizeRule()’, default = None) – Step size for the gradient step of ISTA. If a float is passed, this is used as a constant step size. If a child class of cil.optimisation.utilities.StepSizeRule() is passed then it’s method get_step_size() is called for each update. The default step_size is a constant \(\frac{1}{L}\) or 1 if f=None.
preconditioner (class with an apply method or a function that takes an initialised CIL function as an argument and modifies a provided gradient.) – This could be a custom preconditioner or one provided in preconditoner(). If None is passed then self.gradient_update will remain unmodified.
kwargs (Keyword arguments) – Arguments from the base class Algorithm.

Note

If the function g is set to None or to the ZeroFunction then the FISTA algorithm is equivalent to Accelerated Gradient Descent by Nesterov ([9] algorithm 2.2.9).

If the function f is set to None or to the ZeroFunction then the FISTA algorithm is equivalent to Guler’s First Accelerated Proximal Point Method ([6] sec 2).

Note

FISTA is sometimes also known as Accelerated Proximal Gradient Descent (APGD). Note that in CIL, ‘APGD’ with the default Nesterov momentum is equivalent to ‘FISTA’ but the ‘APGD’ algorithm gives more options for momentum.

Examples

\[\underset{x}{\mathrm{argmin}}\|A x - b\|^{2}_{2}\]

>>> f = LeastSquares(A, b=b, c=0.5)
>>> g = ZeroFunction()
>>> ig = Aop.domain
>>> fista = FISTA(initial = ig.allocate(), f = f, g = g)
>>> fista.run()

APGD#

The Accelerated Proximal Gradient Descent Algorithm (APGD). This is an extension of the PGD/ISTA algorithm allowing you to either use a constant momemtum or a momentum that is updated at each iteration.

class cil.optimisation.algorithms.APGD(initial, f, g, step_size=None, preconditioner=None, momentum=None, **kwargs)[source]#

The Accelerated Proximal Gradient Descent (APGD) algorithm is used to solve:

\[\min_{x} f(x) + g(x)\]

where \(f\) is differentiable, and \(g\) has a simple proximal operator.

In each update, the algorithm computes:

\[x_{k+1} = \mathrm{prox}_{\alpha g}(y_{k} - \alpha\nabla f(y_{k}))\]

where \(\alpha\) is the step_size.

Then, \(y_{k+1}\), is then calculated from \(x_{k+1}\), based on a momentum rule. Note that \(y_0=x_0\). Users have flexibility to do this however they wish by passing to momentum a class that has an apply_montemum_in_APGD function which takes an intialised algorithm and returns the next iterate.

Currently, we have implemented options for a scalar momentum coefficient (see cil.optimisation.algorithms.APGD.ScalarMomentumCoefficient class.). In this case, the momentum term is added as follows:

\[y_{k+1} = x_{k+1} + M(x_{k+1} - x_{k}).\]

The default momentum coefficient is the Nesterov momentum coefficient which varies with each iteration.

Parameters:

initial (DataContainer) – Initial guess of ISTA. \(x_{0}\)
f (Function) – Differentiable function. If None is passed, the algorithm will use the ZeroFunction.
g (Function or None) – Convex function with simple proximal operator. If None is passed, the algorithm will use the ZeroFunction.
step_size (positive float or child class of cil.optimisation.utilities.StepSizeRule()’, the default step_size is a constant \(\frac{1}{L}\) or 1 if f=None.) – Step size for the gradient step of APGD. If a float is passed, this is used as a constant step size. If a child class of cil.optimisation.utilities.StepSizeRule() is passed then its method get_step_size() is called for each update.
preconditioner (class with an apply method or a function that takes an initialised CIL algorithm as an argument and modifies a provided gradient.) – This could be a custom preconditioner or one provided in preconditoner(). If None is passed then self.gradient_update will remain unmodified.
momentum (float or class with an apply_momentum_in_APGD function that takes an intialised CIL algorithm and returns the next iterate, default is Nesterov Momentum.) – This could be a custom momentum class or one provided by CIL (see cil.optimisation.algorithms.APGD.ScalarMomentumCoefficient). If a float is passed, scalar momentum is used with the float as a fixed constant.

kwargs: Keyword arguments: Arguments from the base class Algorithm.

Note

The APGD algorithm with (default) Nesterov momentum is equivalent to the FISTA algorithm.

set_up(initial, f, g, step_size, preconditioner, momentum)[source]#: Set up of the algorithm

update()[source]#

Performs a single iteration of APGD. For \(k\geq 1\):

\[\begin{split}x_{k+1} = \mathrm{prox}_{\alpha g}(y_{k} - \alpha\nabla f(y_{k}))\\\end{split}\]

where \(\alpha\) is the step size. From \(x_{k+1}\) (and any other information available in the algorithm class) the momentum function then calculates \(y_{k+1}\).

update_objective()[source]#: Updates the objective

\[f(x) + g(x)\]

calculate_objective_function_at_point(x)[source]#

Calculates the objective at a given point x

\[f(x) + g(x)\]

Parameters:: x (DataContainer)

get_output()[source]#: Returns the current solution.

property step_size#: Returns the most recently used step size. Note, if the step-size is set by a non-constant step size rule, you must use the algorithm run or update method before this getter will return the most recently used step size.

get_last_loss(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

is_provably_convergent()#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

max_iteration_stop_criterion()#: Do not use: this is being deprecated

property objective#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

objective_to_dict(verbose=False)#: Internal function to save and print objective functions

objective_to_string(verbose=False)#: Do not use: this is being deprecated

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

should_stop()#

default stopping criterion: number of iterations

The user can change this in concrete implementation of iterative algorithms.

property solution#: Returns the current solution.

property update_objective_interval#: gets the update_objective_interval

verbose_header(*_, **__)#: Do not use: this is being deprecated

verbose_output(*_, **__)#: Do not use: this is being deprecated

Current options for are based on the scalar momentum, with base class:

class cil.optimisation.algorithms.APGD.ScalarMomentumCoefficient[source]#

Abstract base class for MomentumCoefficient objects. The __call__ method of this class returns the momentum coefficient for the given iteration.

The call method of the ScalarMomentumCoefficient returns a scalar value. Given access to the algorithm object, the momentum coefficient can be a function of the algorithm state.

The apply_momentum_in_APGD function, updates the solution in the APGD algorithm as

\[y_{k+1}=x_{k+1}+M(x_{k+1}-x_{k}),\]

where \(M\) is the calculated scalar momentum value.

apply_momentum_in_APGD(algorithm, out=None)[source]#

Calculates the momentum cofficient, applies a scalar momentum update in the APGD algorithm and returns the next iterate.

Uses the calculation, \(y_{k+1}=x_{k+1}+M(x_{k+1}-x_{k})\), where \(M\) is the calculated scalar momentum value.

Parameters:

algorithm (instantiated CIL Algorithm) – The algorithm object.
out (DataContainer, default is None) – Object to contain the next iterate.

Implemented examples are:

class cil.optimisation.algorithms.APGD.ConstantMomentum(momentum)[source]#

MomentumCoefficient object that returns a constant momentum coefficient.

Parameters:: momentum (float) – The constant momentum coefficient.

class cil.optimisation.algorithms.APGD.NesterovMomentum[source]#

MomentumCoefficient object that returns the Nesterov momentum coefficient.

Starting with \(t=1\), the Nesterov algorithm updates with each iteration:

\[t_{k+1}=\dfrac{1}{2}(1+\sqrt{1+4t_{k}^2})\]

The momentum coefficient is then returned as \(\dfrac{t_{k}-1}{t_{k}}\).

PDHG#

class cil.optimisation.algorithms.PDHG(f, g, operator, tau=None, sigma=None, initial=None, gamma_g=None, gamma_fconj=None, **kwargs)[source]#

Primal Dual Hybrid Gradient (PDHG) algorithm, see [3], [5].

Parameters:

f (Function) – A convex function with a “simple” proximal method of its conjugate.
g (Function) – A convex function with a “simple” proximal.
operator (LinearOperator) – A Linear Operator.
sigma (positive float, or np.ndarray, DataContainer, BlockDataContainer, optional, default is 1.0/norm(K) or 1.0/ (tau*norm(K)**2) if tau is provided) – Step size for the dual problem.
tau (positive float, or np.ndarray, DataContainer, BlockDataContainer, optional, default is 1.0/norm(K) or 1.0/ (sigma*norm(K)**2) if sigma is provided) – Step size for the primal problem.
initial (DataContainer, or list or tuple of `DataContainer`s, optional, default is a DataContainer of zeros for both primal and dual variables) – Initial point for the PDHG algorithm. If just one data container is provided, it is used for the primal and the dual variable is initialised as zeros. If a list or tuple is passed, the first element is used for the primal variable and the second one for the dual variable. If either of the two is not provided, it is initialised as a DataContainer of zeros.
gamma_g (positive float, optional, default=None) – Strongly convex constant if the function g is strongly convex. Allows primal acceleration of the PDHG algorithm.
gamma_fconj (positive float, optional, default=None) – Strongly convex constant if the convex conjugate of f is strongly convex. Allows dual acceleration of the PDHG algorithm.
**kwargs –

update_objective_intervalint, optional, default=1
Evaluates objectives, e.g., primal/dual/primal-dual gap every update_objective_interval.

check_convergenceboolean, default=True
Checks scalar sigma and tau values satisfy convergence criterion and warns if not satisfied. Can be computationally expensive for custom sigma or tau values.

thetaFloat between 0 and 1, default 1.0
Relaxation parameter for the over-relaxation of the primal variable.

Example

In our CIL-Demos repository (TomographicImaging/CIL-Demos) you can find examples using the PDHG algorithm for different imaging problems, such as Total Variation denoising, Total Generalised Variation inpainting and Total Variation Tomography reconstruction. More examples can also be found in [7], [10].

Note

Currently, the strongly convex constants are passed as parameters of PDHG. In the future, these parameters will be properties of the corresponding functions.

Notes

A first-order primal-dual algorithm for convex optimization problems with known saddle-point structure with applications in imaging.

The general problem considered in the PDHG algorithm is the generic saddle-point problem

\[\min_{x\in X}\max_{y\in Y} \langle Kx, y \rangle + g(x) - f^{*}(x)\]

where \(f\) and \(g\) are convex functions with “simple” proximal operators.

\(X\) and \(Y\) are two two finite-dimensional vector spaces with an inner product and representing the domain of \(g\) and \(f^{*}\), the convex conjugate of \(f\), respectively.

The operator \(K\) is a continuous linear operator with operator norm defined as

\[\|K\| = \max\{ \|Kx\| : x\in X, \|x\|\leq1\}\]

The saddle point problem is decomposed into the primal problem:

\[\min_{x\in X} f(Kx) + g(x),\]

and its corresponding dual problem

\[\max_{y\in Y} - g^{*}(-K^{*}y) - f^{*}(y).\]

The PDHG algorithm consists of three steps:

gradient ascent step for the dual problem,
gradient descent step for the primal problem and
an over-relaxation of the primal variable.

\[y^{n+1} = \mathrm{prox}_{\sigma f^{*}}(y^{n} + \sigma K \bar{x}^{n})\]

\[x^{n+1} = \mathrm{prox}_{\tau g}(x^{n} - \tau K^{*}y^{n+1})\]

\[\bar{x}^{n+1} = x^{n+1} + \theta (x^{n+1} - x^{n})\]

Notes

Convergence is guaranteed if \(\theta\) = 1.0, the operator norm \(\|K\|\), the dual step size \(\sigma\) and the primal step size \(\tau\), satisfy the following inequality:

\[\tau \sigma \|K\|^2 < 4/3\]

For reference, see Li, Y. and Yan, M., 2022. On the improved conditions for some primal-dual algorithms. arXiv preprint arXiv:2201.00139.

By default, the step sizes \(\sigma\) and \(\tau\) are positive scalars and defined as below:
- If sigma is None and tau is None:
\[\sigma = \frac{1}{\|K\|}, \tau = \frac{1}{\|K\|}\]
- If tau is None:
\[\tau = \frac{1}{\sigma\|K\|^{2}}\]
- If sigma is None:
\[\sigma = \frac{1}{\tau\|K\|^{2}}\]
To monitor the convergence of the algorithm, we compute the primal/dual objectives and the primal-dual gap in update_objective().

The primal objective is

\[f(Kx) + g(x)\]

and the dual objective is

\[- g^{*}(-K^{*}y) - f^{*}(y)\]

The primal-dual gap (or duality gap) is

\[f(Kx) + g(x) + g^{*}(-K^{*}y) + f^{*}(y)\]

and measures how close is the primal-dual pair (x,y) to the primal-dual solution. It is always non-negative and is used to monitor convergence of the PDHG algorithm. For more information, see Duality Gap.

Note

The primal objective is printed if verbose=1, pdhg.run(verbose=1).
All the objectives are printed if verbose=2, pdhg.run(verbose=2).

Computing these objectives can be costly, so it is better to compute every some iterations. To do this, use update_objective_interval = #number.

PDHG algorithm can be accelerated if the functions \(f^{*}\) and/or \(g\) are strongly convex. In these cases, the step-sizes \(\sigma\) and \(\tau\) are updated using the update_step_sizes() method. A function \(f\) is strongly convex with constant \(\gamma>0\) if

\[f(x) - \frac{\gamma}{2}\|x\|^{2} \quad\mbox{ is convex. }\]
- For instance the function \(\frac{1}{2}\|x\|^{2}_{2}\) is \(\gamma\) strongly convex for \(\gamma\in(-\infty,1]\). We say it is 1-strongly convex because it is the largest constant for which \(f - \frac{1}{2}\|\cdot\|^{2}\) is convex.
- The \(\|\cdot\|_{1}\) norm is not strongly convex. For more information, see Strongly Convex.
- If \(g\) is strongly convex with constant \(\gamma\) then the step-sizes \(\sigma\), \(\tau\) and \(\theta\) are updated as:
\begin{aligned} \theta_{n} & = \frac{1}{\sqrt{1 + 2\gamma\tau_{n}}}\\ \tau_{n+1} & = \theta_{n}\tau_{n}\\ \sigma_{n+1} & = \frac{\sigma_{n}}{\theta_{n}} \end{aligned}
- If \(f^{*}\) is strongly convex, we swap \(\sigma\) with \(\tau\).

Note

The case where both functions are strongly convex is not available at the moment.

property tau#: The primal step-size

property sigma#: The dual step-size

property theta#: The relaxation parameter for the over-relaxation of the primal variable

property gamma_g#: The strongly convex constant for the function g

property gamma_fconj#: The strongly convex constant for the convex conjugate of the function f

set_gamma_g(value)[source]#

Set the value of the strongly convex constant for function g

Parameters:: value (a positive number or None)

set_gamma_fconj(value)[source]#

Set the value of the strongly convex constant for the convex conjugate of function f

Parameters:: value (a positive number or None)

set_up(f, g, operator, tau=None, sigma=None, initial=None)[source]#

Initialisation of the algorithm

Parameters:

f (Function) – A convex function with a “simple” proximal method of its conjugate.
g (Function) – A convex function with a “simple” proximal.
operator (LinearOperator) – A Linear Operator.
sigma (positive float, or np.ndarray, DataContainer, BlockDataContainer, optional, default is 1.0/norm(K) or 1.0/ (tau*norm(K)**2) if tau is provided) – Step size for the dual problem.
tau (positive float, or np.ndarray, DataContainer, BlockDataContainer, optional, default is 1.0/norm(K) or 1.0/ (sigma*norm(K)**2) if sigma is provided) – Step size for the primal problem.
initial (DataContainer, or list or tuple of `DataContainer`s, optional, default is a DataContainer of zeros for both primal and dual variables) – Initial point for the PDHG algorithm. If just one data container is provided, it is used for the primal and the dual variable is initialised as zeros. If a list or tuple is passed, the first element is used for the primal variable and the second one for the dual variable. If either of the two is not provided, it is initialised as a DataContainer of zeros.

get_output()[source]#: Returns the current solution.

update()[source]#: Performs a single iteration of the PDHG algorithm

check_convergence()[source]#

Check whether convergence criterion for PDHG is satisfied with scalar values of tau and sigma

Returns:

Boolean – True if convergence criterion is satisfied. False if not satisfied or convergence is unknown.
Reference
———-
Li, Y. and Yan, M., 2022. On the improved conditions for some primal-dual algorithms. arXiv preprint arXiv (2201.00139.)

set_step_sizes(sigma=None, tau=None)[source]#

Sets sigma and tau step-sizes for the PDHG algorithm. The step sizes can be either scalar or array-objects.

Parameters:

sigma (positive float, or np.ndarray, DataContainer, BlockDataContainer, optional, default=None) – Step size for the dual problem.
tau (positive float, or np.ndarray, DataContainer, BlockDataContainer, optional, default=None) – Step size for the primal problem.

The user can set either, both or none. Values passed by the user will be accepted as long as they are positive numbers, or correct shape array like objects.

update_step_sizes()[source]#: Updates step sizes in the cases of primal or dual acceleration using the strongly convexity property. The case where both functions are strongly convex is not available at the moment.

update_objective()[source]#: Evaluates the primal objective, the dual objective and the primal-dual gap.

property objective#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

get_last_loss(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

is_provably_convergent()#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

max_iteration_stop_criterion()#: Do not use: this is being deprecated

objective_to_dict(verbose=False)#: Internal function to save and print objective functions

objective_to_string(verbose=False)#: Do not use: this is being deprecated

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

should_stop()#

default stopping criterion: number of iterations

The user can change this in concrete implementation of iterative algorithms.

property solution#: Returns the current solution.

property update_objective_interval#: gets the update_objective_interval

verbose_header(*_, **__)#: Do not use: this is being deprecated

verbose_output(*_, **__)#: Do not use: this is being deprecated

LADMM#

class cil.optimisation.algorithms.LADMM(f=None, g=None, operator=None, tau=None, sigma=1.0, initial=None, **kwargs)[source]#

LADMM is the Linearized Alternating Direction Method of Multipliers (LADMM).

The general form of ADMM is given by the following optimization problem:

\[\min_{x} f(x) + g(y), \text{ subject to } Ax + By = b\]

In CIL, we have implemented the case where \(A = Id\), \(B = -K\), \(b = 0\) which gives

\[\min_x f(Kx) + g(x).\]

The algorithm is given by the following iteration, for \(k\geq 1\):

\[\begin{split}\begin{cases} x_{k} = \mathrm{prox}_{\tau f} \left(x_{k-1} - \dfrac{\tau}{\sigma} A_{T}\left(Ax_{k-1} - z_{k-1} + u_{k-1} \right) \right)\\ z_{k} = \mathrm{prox}_{\sigma g} \left(Ax_{k} + u_{k-1}\right) \\ u_{k} = u_{k-1} + Ax_{k} - z_{k} \end{cases}\end{split}\]

where \(\mathrm{prox}_{\tau f}\) is the proximal operator of \(f\) and \(\mathrm{prox}_{\sigma g}\) is the proximal operator of \(g\).

Parameters:

operator (CIL Linear Operator) – Operator \(K\) in the objective function
f (CIL Function) – Convex function with “simple” proximal
g (CIL Function) – Convex function with “simple” proximal
sigma (float, positive, defaults to 1.) – Positive step size parameter
tau (float, positive, defaults to \(\frac{\sigma}{\|K\|^{2}}\)) – Positive step size parameter
initial (DataContainer, defaults to DataContainer filled with zeros) – Initial guess

Note

This implementation of ADMM minimises the same objective function as the Primal-Dual Hybrid Gradient (PDHG) method. The main algorithmic difference is that in ADMM we compute the proximal of \(f\) and \(g\) where in the PDHG this is a proximal-conjugate and proximal.

Note

Reference (Section 8) : O’Connor, D., Vandenberghe, L. On the equivalence of the primal-dual hybrid gradient method and Douglas–Rachford splitting. Math. Program. 179, 85–108 (2020). https://doi.org/10.1007/s10107-018-1321-1

set_up(f, g, operator, tau=None, sigma=1.0, initial=None)[source]#: Set up of the algorithm.

update()[source]#: Performs a single iteration of the LADMM algorithm

update_objective()[source]#: Update the objective function value

get_last_loss(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_output()#

Returns the current solution.

Returns:: The current solution
Return type:: DataContainer

is_provably_convergent()#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

max_iteration_stop_criterion()#: Do not use: this is being deprecated

property objective#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

objective_to_dict(verbose=False)#: Internal function to save and print objective functions

objective_to_string(verbose=False)#: Do not use: this is being deprecated

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

should_stop()#

default stopping criterion: number of iterations

The user can change this in concrete implementation of iterative algorithms.

property solution#: Returns the current solution.

property update_objective_interval#: gets the update_objective_interval

verbose_header(*_, **__)#: Do not use: this is being deprecated

verbose_output(*_, **__)#: Do not use: this is being deprecated

PD3O#

class cil.optimisation.algorithms.PD3O(f, g, h, operator, delta=None, gamma=None, initial=None, **kwargs)[source]#

Primal Dual Three Operator Splitting (PD3O) algorithm, see “A New Primal–Dual Algorithm for Minimizing the Sum of Three Functions with a Linear Operator”. This is a primal dual algorithm for minimising \(f(x)+g(x)+h(Ax)\) where all functions are proper, lower semi-continuous and convex, \(f\) should be differentiable with a Lipschitz continuous gradient and \(A\) is a bounded linear operator.

Parameters:

f (Function) – A smooth function with Lipschitz continuous gradient.
g (Function) – A convex function with a computationally computable proximal.
h (Function) – A composite convex function.
operator (Operator) – Bounded linear operator
delta (Float, optional, default is 1./(gamma*operator.norm()**2)) – The dual step-size
gamma (Float, optional, default is 2.0/f.L) – The primal step size
initial (DataContainer, optional default is a container of zeros, in the domain of the operator) – Initial point for the algorithm.

Note

Note that currently \(f\) in PD3O can be any type of deterministic function, an ApproximateGradientSumFunction or a scaled ApproximateGradientSumFunction.

Note

We have not implemented or tested when \(f\) is a stochastic function (ApproximateGradientSumFunction) wrapped as a SumFunction or TranslatedFunction.

Yan, M. A New Primal–Dual Algorithm for Minimizing the Sum of Three Functions with a Linear Operator. J Sci Comput 76, 1698–1717 (2018). https://doi.org/10.1007/s10915-018-0680-3

get_last_loss(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_output()#

Returns the current solution.

Returns:: The current solution
Return type:: DataContainer

is_provably_convergent()#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

max_iteration_stop_criterion()#: Do not use: this is being deprecated

property objective#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

objective_to_dict(verbose=False)#: Internal function to save and print objective functions

objective_to_string(verbose=False)#: Do not use: this is being deprecated

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

set_up(f, g, h, operator, delta=None, gamma=None, initial=None, **kwargs)[source]#: Set up the algorithm

should_stop()#

default stopping criterion: number of iterations

The user can change this in concrete implementation of iterative algorithms.

property solution#: Returns the current solution.

property update_objective_interval#: gets the update_objective_interval

verbose_header(*_, **__)#: Do not use: this is being deprecated

verbose_output(*_, **__)#: Do not use: this is being deprecated

update()[source]#: Performs a single iteration of the PD3O algorithm

update_objective()[source]#: Evaluates the primal objective

Algorithms (Stochastic)#

Consider optimisation problems that take the form of a separable sum:

\[\min_{x} f(x)+g(x) = \min_{x} \sum_{i=0}^{n-1} f_{i}(x) + g(x) = \min_{x} (f_{0}(x) + f_{1}(x) + ... + f_{n-1}(x))+g(x)\]

where \(n\) is the number of functions. Where there is a large number of \(f_i\) or their gradients are expensive to calculate, stochastic optimisation methods could prove more efficient. There is a growing range of Stochastic optimisation algorithms available with potential benefits of faster convergence in number of iterations or in computational cost. This is an area of continued development for CIL and, depending on the properties of the \(f_i\) and the regulariser \(g\), there is a range of different options for the user.

SPDHG#

Stochastic Primal Dual Hybrid Gradient (SPDHG) is a stochastic version of PDHG and deals with optimisation problems of the form:

\[\min_{x} f(Kx) + g(x) = \min_{x} \sum f_i(K_i x) + g(x)\]

where \(f_i\) and the regulariser \(g\) need only be proper, convex and lower semi-continuous ( i.e. do not need to be differentiable). Each iteration considers just one index of the sum, potentially reducing computational cost. For more examples see our [user notebooks]( vais-ral/CIL-Demos).

class cil.optimisation.algorithms.SPDHG(f=None, g=None, operator=None, tau=None, sigma=None, initial=None, sampler=None, prob_weights=None, **kwargs)[source]#

Stochastic Primal Dual Hybrid Gradient (SPDHG) solves separable optimisation problems of the type:

\[\min_{x} f(Kx) + g(x) = \min_{x} \sum f_i(K_i x) + g(x)\]

where \(f_i\) and the regulariser \(g\) need to be proper, convex and lower semi-continuous.

Parameters:

f (BlockFunction) – Each must be a convex function with a “simple” proximal method of its conjugate
g (Function) – A convex function with a “simple” proximal
operator (BlockOperator) – BlockOperator must contain Linear Operators
tau (positive float, optional) – Step size parameter for the primal problem. If None will be computed by algorithm, see note for details.
sigma (list of positive float, optional) – List of Step size parameters for dual problem. If None will be computed by algorithm, see note for details.
initial (DataContainer, optional) – Initial point for the SPDHG algorithm. The default value is a zero DataContainer in the range of the operator.
gamma (float, optional) – Parameter controlling the trade-off between the primal and dual step sizes
sampler (cil.optimisation.utilities.Sampler, optional) – A Sampler controllingthe selection of the next index for the SPDHG update. If None, a sampler will be created for uniform random sampling with replacement. See notes.
prob_weights (list of floats, optional,) – Consider that the sampler is called a large number of times this argument holds the expected number of times each index would be called, normalised to 1. Note that this should not be passed if the provided sampler has it as an attribute: if the sampler has a prob_weight attribute it will take precedence on this parameter. Should be a list of floats of length num_indices that sum to 1. If no sampler with prob_weights is passed, it defaults to [1/len(operator)]*len(operator).

Note

The sampler can be an instance of the cil.optimisation.utilities.Sampler class or a custom class with the __next__(self) method implemented, which outputs an integer index from {1, …, len(operator)}.

Note

“Random sampling with replacement” will select the next index with equal probability from 1 - len(operator).

Example

>>> data = dataexample.SIMULATED_PARALLEL_BEAM_DATA.get()
>>> data.reorder('astra')
>>> data = data.get_slice(vertical='centre')
>>> ig = ag.get_ImageGeometry()

>>> data_partitioned = data.partition(num_batches=10, mode='staggered')
>>> A_partitioned = ProjectionOperator(ig, data_partitioned.geometry, device = "cpu")

>>> F = BlockFunction(*[L2NormSquared(b=data_partitioned[i])
                        for i in range(10)])

>>> alpha = 0.025
>>> G = alpha * TotalVariation()
>>> spdhg = SPDHG(f=F, g=G, operator=A_partitioned, sampler=Sampler.sequential(len(A)),
                  initial=A.domain_geometry().allocate(1), update_objective_interval=10)
>>> spdhg.run(100)

Example

Further examples of usage see the [CIL demos.](vais-ral/CIL-Demos)

Note

When setting sigma and tau, there are 4 possible cases considered by setup function. In all cases the probabilities \(p_i\) are set by a default or user defined sampler:

Case 1: If neither sigma or tau are provided then sigma is set using the formula:

\[\sigma_i= \frac{0.99}{\|K_i\|^2}\]

and tau is set as per case 2

Case 2: If sigma is provided but not tau then tau is calculated using the formula

\[\tau = 0.99\min_i( \frac{p_i}{ (\sigma_i \|K_i\|^2) })\]

Case 3: If tau is provided but not sigma then sigma is calculated using the formula

\[\sigma_i= \frac{0.99 p_i}{\tau\|K_i\|^2}\]

Case 4: Both sigma and tau are provided.

Note

Convergence is guaranteed provided that [2, eq. (12)]:

\[\|\sigma[i]^{1/2} K[i] \tau^{1/2} \|^2 < p_i \text{ for all } i\]

References

[1]”Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications”, Chambolle, Antonin, Matthias J. Ehrhardt, Peter Richtárik, and Carola-Bibiane Schonlieb, SIAM Journal on Optimization 28, no. 4 (2018): 2783-2808. https://doi.org/10.1137/17M1134834

[2]”Faster PET reconstruction with non-smooth priors by randomization and preconditioning”, Matthias J Ehrhardt, Pawel Markiewicz and Carola-Bibiane Schönlieb, Physics in Medicine & Biology, Volume 64, Number 22, 2019. https://doi.org/10.1088/1361-6560/ab3d07

set_up(f, g, operator, sigma=None, tau=None, initial=None, sampler=None, prob_weights=None, **deprecated_kwargs)[source]#: set-up of the algorithm

set_step_sizes_from_ratio(gamma=1.0, rho=0.99)[source]#

Sets gamma, the step-size ratio for the SPDHG algorithm. Currently gamma takes a scalar value.

The step sizes sigma and tau are set using the equations:

\[\sigma_i= \frac{\gamma\rho }{\|K_i\|^2}\]

\[\tau = \rho\min_i([ \frac{p_i }{\sigma_i \|K_i\|^2})\]

Parameters:

gamma (Positive float) – parameter controlling the trade-off between the primal and dual step sizes
rho (Positive float) – parameter controlling the size of the product \(\sigma\tau\)

set_step_sizes(sigma=None, tau=None)[source]#

Sets sigma and tau step-sizes for the SPDHG algorithm after the initial set-up. The step sizes can be either scalar or array-objects.

When setting sigma and tau, there are 4 possible cases considered by setup function:

Case 1: If neither sigma or tau are provided then sigma is set using the formula:

\[\sigma_i= \frac{0.99}{\|K_i\|^2}\]

and tau is set as per case 2

Case 2: If sigma is provided but not tau then tau is calculated using the formula

\[\tau = 0.99\min_i( \frac{p_i}{ (\sigma_i \|K_i\|^2) })\]

Case 3: If tau is provided but not sigma then sigma is calculated using the formula

\[\sigma_i= \frac{0.99 p_i}{\tau\|K_i\|^2}\]

Case 4: Both sigma and tau are provided.

Parameters:

sigma (list of positive float, optional, default= see docstring) – List of Step size parameters for dual problem
tau (positive float, optional, default= see docstring) – Step size parameter for primal problem

check_convergence()[source]#

Checks whether convergence criterion for SPDHG is satisfied with the current scalar values of tau and sigma

Returns:: True if convergence criterion is satisfied. False if not satisfied or convergence is unknown.
Return type:: Boolean

Note

Convergence criterion currently can only be checked for scalar values of tau.

Note

This checks the convergence criterion. Numerical errors may mean some sigma and tau values that satisfy the convergence criterion may not converge. Alternatively, step sizes outside the convergence criterion may still allow (fast) convergence.

update()[source]#: Runs one iteration of SPDHG

update_objective()[source]#: calculates the objective with the current solution

property objective#

The saved primal objectives.

Returns:: The saved primal objectives from update_objective. The number of saved values depends on the update_objective_interval kwarg.
Return type:: list

property dual_objective#

The saved dual objectives.

Returns:: The saved dual objectives from update_objective. The number of saved values depends on the update_objective_interval kwarg.
Return type:: list

property primal_dual_gap#

The saved primal-dual gap.

Returns:: The saved primal dual gap from update_objective. The number of saved values depends on the update_objective_interval kwarg.
Return type:: list

get_last_loss(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_last_objective(return_all=False)#

Parameters:: return_all (Boolean, default is False) – If True, returns all the stored loss functions
Returns:: Last stored value of the loss function
Return type:: Float

get_output()#

Returns the current solution.

Returns:: The current solution
Return type:: DataContainer

is_provably_convergent()#

Check if the algorithm is convergent based on the provable convergence criterion.

Returns:: Outcome of the convergence check
Return type:: Boolean

property iterations#: returns the iterations at which the objective has been evaluated

property loss#

returns a list of the values of the objective (alias of loss) during the iteration

The length of this list may be shorter than the number of iterations run when the update_objective_interval > 1

property max_iteration#: gets the maximum number of iterations

max_iteration_stop_criterion()#: Do not use: this is being deprecated

objective_to_dict(verbose=False)#: Internal function to save and print objective functions

objective_to_string(verbose=False)#: Do not use: this is being deprecated

run(iterations=None, callbacks: List[Callback] | None = None, verbose=1, **kwargs)#

run upto iterations with callbacks/logging.

For a demonstration of callbacks see TomographicImaging/CIL-Demos

Parameters:

iterations (int, default is None) – Number of iterations to run. If a positive infinity is passed, the algorithm will run indefinitely until a callback raises StopIteration.
callbacks (list of callables, default is Defaults to [ProgressCallback(verbose)]) – List of callables which are passed the current Algorithm object each iteration. Defaults to [ProgressCallback(verbose)].
verbose (0=quiet, 1=info, 2=debug) – Passed to the default callback to determine the verbosity of the printed output.

should_stop()#

default stopping criterion: number of iterations

The user can change this in concrete implementation of iterative algorithms.

property solution#: Returns the current solution.

property update_objective_interval#: gets the update_objective_interval

verbose_header(*_, **__)#: Do not use: this is being deprecated

verbose_output(*_, **__)#: Do not use: this is being deprecated

Approximate gradient methods#

Alternatively, consider that, in addition to the functions \(f_i\) and the regulariser \(g\) being proper, convex and lower semi-continuous, the \(f_i\) are differentiable. In this case we consider stochastic methods that replace a gradient calculation in a deterministic algorithm with a, potentially cheaper to calculate, approximate gradient. For example, when \(g(x)=0\), the standard Gradient Descent algorithm utilises iterations of the form

\[x_{k+1}=x_k-\alpha \nabla f(x_k) =x_k-\alpha \sum_{i=0}^{n-1}\nabla f_i(x_k).\]

\(\nabla f(x_k)=\sum_{i=0}^{n-1}\nabla f_i(x_k)\) with \(n \nabla f_i(x_k)\), for an index \(i\) which changes each iteration, leads to the well known stochastic gradient descent algorithm.

Replacing, \(\nabla f(x_k)=\sum_{i=0}^{n-1}\nabla f_i(x_k)\) with \(n \nabla f_i(x_k)\), for an index \(i\) which changes each iteration, leads to the well known stochastic gradient descent algorithm.

In addition, if \(g(x)\neq 0\) and has a calculable proximal ( need not be differentiable) one can consider ISTA iterations:

\[x_{k+1}=prox_{\alpha g}(x_k-\alpha \nabla f(x_k) )=prox_{\alpha g}(x_k-\alpha \sum_{i=0}^{n-1}\nabla f_i(x_k))\]

and again replacing \(\nabla f(x_k)=\sum_{i=0}^{n-1}\nabla f_i(x_k)\) with an approximate gradient.

In a similar way, plugging approximate gradient calculations into deterministic algorithms can lead to a range of stochastic algorithms. In the following table, the left hand column has the approximate gradient function subclass, Approximate Gradient base class the header row has one of CIL’s deterministic optimisation algorithm and the body of the table has the resulting stochastic algorithm.

*In development

The stochastic gradient functions can be found listed under functions in the documentation.

Stochastic Gradient Descent Example#

The below is an example of Stochastic Gradient Descent built of the SGFunction and Gradient Descent algorithm:

from cil.optimisation.utilities import Sampler
from cil.optimisation.algorithms import GD
from cil.optimisation.functions import LeastSquares, SGFunction
from cil.utilities import dataexample
from cil.plugins.astra.operators import ProjectionOperator

# get the data
data = dataexample.SIMULATED_PARALLEL_BEAM_DATA.get()
data.reorder('astra')
data = data.get_slice(vertical='centre')

# create the geometries
ag = data.geometry
ig = ag.get_ImageGeometry()

# partition the data and build the projectors
n_subsets = 10
partitioned_data = data.partition(n_subsets, 'sequential')
A_partitioned = ProjectionOperator(ig, partitioned_data.geometry, device = "cpu")

# create the list of functions for the stochastic sum
list_of_functions = [LeastSquares(Ai, b=bi) for Ai,bi in zip(A_partitioned, partitioned_data)]

#define the sampler and the stochastic gradient function
sampler = Sampler.staggered(len(list_of_functions), stride=2)
f = SGFunction(list_of_functions, sampler=sampler)

#set up and run the gradient descent algorithm
alg = GD(initial=ig.allocate(0), objective_function=f, step_size=1/f.L)
alg.run(300)

Note#

All the approximate gradients written in CIL are of a similar order of magnitude to the full gradient calculation. For example, in the SGFunction we approximate the full gradient by \(n\nabla f_i\) for an index \(i\) given by the sampler. The multiplication by \(n\) is a choice to more easily allow comparisons between stochastic and non-stochastic methods and between stochastic methods with varying numbers of subsets. The multiplication ensures that the (SAGA, SGD, and SVRG and LSVRG) approximate gradients are an unbiased estimator of the full gradient ie \(\mathbb{E}\left[\tilde\nabla f(x)\right] =\nabla f(x)\).

This has an implication when choosing step sizes. For example, a suitable step size for GD with a SGFunction could be \(\propto 1/(L_{max}*n)\), where \(L_{max}\) is the largest Lipschitz constant of the list of functions in the SGFunction and the additional factor of \(n\) reflects this multiplication by \(n\) in the approximate gradient.

Memory requirements#

Note that the approximate gradient methods have different memory requirements: + The SGFunction has the same requirements as a SumFunction, so no increased memory usage + SAGFunction and SAGAFunction both store n+3 times the image size in memory to store the last calculated gradient for each function in the sum and for intermediary calculations. + SVRGFunction and LSVRGFunction with the default store_gradients = False store 4 times the image size in memory, including the “snapshot” point and gradient. If store_gradients = True, some computational effort is saved, at the expensive of stored memory n+4 times the image size.

Operators#

The two most important methods are direct and adjoint methods that describe the result of applying the operator, and its adjoint respectively, onto a compatible DataContainer input. The output is another DataContainer object or subclass hereof. An important special case is to represent the tomographic forward and backprojection operations.

Operator base classes#

All operators extend the Operator class. A special class is the LinearOperator which represents an operator for which the adjoint operation is defined. A ScaledOperator represents the multiplication of any operator with a scalar.

class cil.optimisation.operators.Operator(domain_geometry, **kwargs)[source]#

Operator that maps from a space X -> Y

Parameters:

domain_geometry (ImageGeometry or AcquisitionGeometry) – domain of the operator
range_geometry (ImageGeometry or AcquisitionGeometry, optional, default None) – range of the operator

is_linear()[source]#: Returns if the operator is linear :rtype: Bool

is_orthogonal()[source]#: Returns if the operator is orthogonal :rtype: Bool

direct(x, out=None)[source]#

Calls the operator

Parameters:

x (DataContainer or BlockDataContainer) – Element in the domain of the Operator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

norm(**kwargs)[source]#

Returns the norm of the Operator. On first call the norm will be calculated using the operator’s calculate_norm method. Subsequent calls will return the cached norm.

Returns:: norm
Return type:: positive:float

set_norm(norm=None)[source]#

Sets the norm of the operator to a custom value.

Parameters:: norm (float, optional) – Positive real valued number or None

Note

The passed values are cached so that when self.norm() is called, the saved value will be returned and not calculated via the power method. If None is passed, the cache is cleared prompting the function to call the power method to calculate the norm the next time self.norm() is called.

calculate_norm()[source]#: Returns the norm of the Operator. Note that this gives a NotImplementedError if the SumOperator is not linear. :returns: Scalar :rtype: the norm of the Operator

range_geometry()[source]#: Returns the range of the Operator: Y space

domain_geometry()[source]#: Returns the domain of the Operator: X space

class cil.optimisation.operators.LinearOperator(domain_geometry, **kwargs)[source]#

Linear operator that maps from a space X <-> Y

Parameters:

domain_geometry (ImageGeometry or AcquisitionGeometry) – domain of the operator
range_geometry (ImageGeometry or AcquisitionGeometry, optional, default None) – range of the operator

is_linear()[source]#: Returns if the operator is linear

adjoint(x, out=None)[source]#

Returns the adjoint/inverse operation evaluated at the point \(x\)

Parameters:

x (DataContainer or BlockDataContainer) – Element in the domain of the Operator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

Note

Only available to linear operators

static PowerMethod(operator, max_iteration=10, initial=None, tolerance=1e-05, return_all=False, method='auto', seed=None)[source]#

Power method or Power iteration algorithm

The Power method computes the largest (dominant) eigenvalue of a matrix in magnitude, e.g., absolute value in the real case and modulus in the complex case.

Parameters:

operator (LinearOperator)
max_iteration (positive:int, default=10) – Number of iterations for the Power method algorithm.
initial (DataContainer, default = None, optional) – Starting point for the Power method.
tolerance (positive:float, default = 1e-5) – Stopping criterion for the Power method. Check if two consecutive eigenvalue evaluations are below the tolerance.
return_all (boolean, default = False) – Toggles the verbosity of the return
method (string one of “auto”, “composed_with_adjoint” and “direct_only”, default = “auto”) – The default auto lets the code choose the method, this can be specified with “direct_only” or “composed_with_adjoint”
seed (int or default = None, optional) – If initial is not provided, this random is seed used by the random number generator to create the initial starting point

Returns:

dominant eigenvalue (positive:float)
number of iterations (positive:int) – Number of iterations run. Only returned if return_all is True.
eigenvector (DataContainer) – Corresponding eigenvector of the dominant eigenvalue. Only returned if return_all is True.
list of eigenvalues (list) – List of eigenvalues. Only returned if return_all is True.
convergence (boolean) – Check on wether the difference between the last two iterations is less than tolerance. Only returned if return_all is True.

Note

The power method contains two different algorithms chosen by the method flag.

In the case method=”direct_only”, for operator, \(A\), the power method computes the iterations \(x_{k+1} = A (x_k/\|x_{k}\|)\) initialised with a random vector \(x_0\) and returning the largest (dominant) eigenvalue in magnitude given by \(\|x_k\|\).

In the case method=”composed_with_adjoint”, the algorithm computes the largest (dominant) eigenvalue of \(A^{T}A\) returning the square root of this value, i.e. the iterations: \(x_{k+1} = A^TA (x_k/\|x_{k}\|)\) and returning \(\sqrt{\|x_k\|}\).

The default flag is method=”auto”, the algorithm checks to see if the operator.domain_geometry() == operator.range_geometry() and if so uses the method “direct_only” and if not the method “composed_with_adjoint”.

Examples

>>> M = np.array([[1.,0],[1.,2.]])
>>> Mop = MatrixOperator(M)
>>> Mop_norm = Mop.PowerMethod(Mop)
>>> Mop_norm
2.0000654846240296

PowerMethod is called when we compute the norm of a matrix or a LinearOperator.

>>> Mop_norm = Mop.norm()
2.0005647295658866

calculate_norm()[source]#: Returns the norm of the LinearOperator calculated by the PowerMethod with default values.

static dot_test(operator, domain_init=None, range_init=None, tolerance=1e-06, **kwargs)[source]#

Does a dot linearity test on the operator Evaluates if the following equivalence holds .. math:

Ax\times y = y \times A^Tx

Parameters:

operator – operator to test the dot_test
range_init – optional initialisation container in the operator range
domain_init – optional initialisation container in the operator domain
seed (int, default = 1) – Seed random generator
tolerance (float, default 1e-6) – Check if the following expression is below the tolerance
math:: (..) – |Ax\times y - y \times A^Tx|/(|A||x||y| + 1e-12) < tolerance

Return type:

boolean, True if the test is passed.

class cil.optimisation.operators.ScaledOperator(operator, scalar, **kwargs)[source]#

A class to represent the scalar multiplication of an Operator with a scalar. It holds an operator and a scalar. Basically it returns the multiplication of the result of direct and adjoint of the operator with the scalar. For the rest it behaves like the operator it holds.

Parameters

a scalar multiplier

Example

The scaled operator behaves like the following:

sop = ScaledOperator(operator, scalar)
sop.direct(x) = scalar * operator.direct(x)
sop.adjoint(x) = scalar * operator.adjoint(x)
sop.norm() = operator.norm()
sop.range_geometry() = operator.range_geometry()
sop.domain_geometry() = operator.domain_geometry()

direct(x, out=None)[source]#: direct method

adjoint(x, out=None)[source]#: adjoint method

norm(**kwargs)[source]#: norm of the operator

is_linear()[source]#: returns a boolean indicating whether the operator is linear

class cil.optimisation.operators.CompositionOperator(*operators, **kwargs)[source]#

Composes one or more operators. For example, CompositionOperator(left, right).direct(x) is equivalent to left.direct(right.direct(x))

Parameters:: args (Operator s) – Operators to be composed. As in mathematical notation, the operators will be applied right to left

direct(x, out=None)[source]#

Calls the composition operator at the point \(x\).

Parameters:

x (DataContainer or BlockDataContainer) – Element in the domain of the CompositionOperator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the CompositionOperator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

adjoint(x, out=None)[source]#

Calls the adjoint of the composition operator at the point \(x\).

Parameters:

x (DataContainer or BlockDataContainer) – Element in the range of the CompositionOperator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the adjoint of the CompositionOperator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

is_linear()[source]#: Returns if the operator is linear :rtype: Bool

class cil.optimisation.operators.DiagonalOperator(diagonal, domain_geometry=None)[source]#

Performs an element-wise multiplication, i.e., Hadamard Product of a DataContainer x and DataContainer diagonal, d .

\[(D\circ x) = \sum_{i,j}^{M,N} D_{i,j} x_{i, j}\]

In matrix-vector interpretation, if D is a \(M\times N\) dense matrix and is flattened, we have a \(M*N \times M*N\) vector. A sparse diagonal matrix, i.e., DigaonalOperator can be created if we add the vector above to the main diagonal. If the DataContainer x is also flattened, we have a \(M*N\) vector. Now, matrix-vector multiplcation is allowed and results to a \((M*N,1)\) vector. After reshaping we recover a \(M\times N\) DataContainer.

Parameters:

diagonal (DataContainer) – DataContainer with the same dimensions as the data to be operated on.
domain_geometry (ImageGeometry) – Specifies the geometry of the operator domain. If ‘None’ will use the diagonal geometry directly. default=None .

direct(x, out=None)[source]#: Returns \(D\circ x\)

adjoint(x, out=None)[source]#: Returns \(D^*\circ x\)

calculate_norm(**kwargs)[source]#: Returns the operator norm of DiagonalOperator which is the \(\infty\) norm of diagonal

\[\|D\|_{\infty} = \max_{i}\{|D_{i}|\}\]

class cil.optimisation.operators.ChannelwiseOperator(op, channels, dimension='prepend')[source]#

The ChannelwiseOperator takes in an operator and the number of channels to be used, and creates a new multi-channel ChannelwiseOperator, which will apply the operator independently on each channel for the number of channels specified.

ChannelwiseOperator does not currently support BlockOperators. Typically, if such behaviour is desired, it can be achieved by creating instead a BlockOperator of ChannelwiseOperators.

Parameters:

op (Operator) – Single-channel operator
channels (int) – Number of channels
dimension (str, optional) – ‘prepend’ (default) or ‘append’ channel dimension onto existing dimensions

Example

In this example, we create a ChannelwiseOperator that applies the same diagonal operator to each channel of a 2 channel image data.

>>> M = 3
>>> channels = 2
>>> ig = ImageGeometry(M, M, channels=channels)
>>> single_ig = ImageGeometry(M, M)
>>> x = ImageData( np.stack( [np.ones((M,M)),  2*np.ones((M,M))] , axis=0), geometry=ig )
>>> diag = ImageData(np.array(range(M*M), dtype=np.float64).reshape((M,M)), geometry=single_ig)
>>> D = DiagonalOperator(diag)
>>> C = ChannelwiseOperator(D,channels)
>>> y = C.direct(x)
>>> print('The original image data is:')
>>> print(x.as_array())
The original image data is:
[[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
[[2. 2. 2.]
[2. 2. 2.]
[2. 2. 2.]]]
>>> print('The channel wise operator multiplies each channel element wise by:')
>>> print(diag.as_array())
The channel wise operator multiplies each channel element wise by:
[[0. 1. 2.]
[3. 4. 5.]
[6. 7. 8.]]
>>> print('The result of applying the channel wise operator is:')
>>> print(y.as_array())
The result of applying the channel wise operator is:
[[[ 0.  1.  2.]
[ 3.  4.  5.]
[ 6.  7.  8.]]
[[ 0.  2.  4.]
[ 6.  8. 10.]
[12. 14. 16.]]]

direct(x, out=None)[source]#

Returns \(D(x)\) where \(D\) is the ChannelwiseOperator and \(x\) is the input data.

Parameters:

x (DataContainer) – Input data
out (DataContainer, optional) – Output data, if not provided a new DataContainer will be created

Returns:

out – Output data

Return type:

DataContainer

adjoint(x, out=None)[source]#

Returns \(D^{*}(x)\) where \(D\) is the ChannelwiseOperator and \(x\) is the input data.

Parameters:

x (DataContainer) – Input data
out (DataContainer, optional) – Output data, if not provided a new DataContainer will be created

Returns:

out – Output data

Return type:

DataContainer

calculate_norm(**kwargs)[source]#: Evaluates operator norm of ChannelWiseOperator

class cil.optimisation.operators.SumOperator(operator1, operator2)[source]#

Sums two operators. For example, SumOperator(left, right).direct(x) is equivalent to left.direct(x)+right.direct(x)

Parameters:

operator1 (Operator) – The first Operator in the sum
operator2 (Operator) – The second Operator in the sum

Note

Both operators must have the same domain and range.

direct(x, out=None)[source]#

Calls the sum operator

Parameters:

x (DataContainer or BlockDataContainer) – Element in the domain of the SumOperator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the SumOperator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

adjoint(x, out=None)[source]#

Calls the adjoint of the sum operator, evaluated at the point \(x\).

Parameters:

x (DataContainer or BlockDataContainer) – Element in the range of the SumOperator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the adjoint of the SumOperator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

is_linear()[source]#: Returns if the operator is linear :rtype: Bool

Trivial operators#

Trivial operators are the following.

class cil.optimisation.operators.IdentityOperator(domain_geometry, range_geometry=None)[source]#

IdentityOperator: \(\mathrm{Id}: X \rightarrow Y\), \(\mathrm{Id}(x) = x\)

\(X\) : domain \(Y\) : range ( Default: \(Y = X\) )

Parameters:

domain_geometry (CIL Geometry) – domain of the operator
range_geometry (CIL Geometry, optional) – range of the operator, default: same as domain

direct(x, out=None)[source]#

Returns the input data \(x\)

Parameters:

x (DataContainer or BlockDataContainer) – Input data
out (DataContainer or BlockDataContainer, optional) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned. The default is None.

Returns:

\(\mathrm{Id}(x) = x\)

Return type:

DataContainer or BlockDataContainer

adjoint(x, out=None)[source]#

Returns the input data, \(x\)

Parameters:

x (DataContainer or BlockDataContainer) – Input data
out (DataContainer or BlockDataContainer, optional) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned. The default is None.

Returns:

\(\mathrm{Id}^*(x)=x\)

Return type:

DataContainer or BlockDataContainer

calculate_norm(**kwargs)[source]#: Evaluates operator norm of IdentityOperator

is_orthogonal()[source]#

Returns if the operator is orthogonal

Returns:: Always returns True for IdentityOperator
Return type:: Bool

class cil.optimisation.operators.ZeroOperator(domain_geometry, range_geometry=None)[source]#

ZeroOperator: \(\mathrm{O}: X \rightarrow Y\), maps any element of \(x\in X\) into the zero element in the space \(Y\), so \(\mathrm{O}(x) = \mathrm{O}_{Y}\).

Parameters:

domain_geometry (CIL Geometry) – domain of the operator
range_geometry (CIL Geometry, optional) – range of the operator, default: same as domain

Note

\[O^{*}: Y^{*} -> X^{*} \text{(Adjoint)} \quad \text{such that} \quad \langle O(x), y \rangle = \langle x, O^{*}(y) \rangle\]

direct(x, out=None)[source]#

Returns an element of the range space filled with zeros

Parameters:

x (DataContainer or BlockDataContainer) – Input data
out (DataContainer or BlockDataContainer, optional) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned. The default is None.

Returns:

\(\mathrm{O}(x)\)

Return type:

DataContainer

adjoint(x, out=None)[source]#

Returns an element of the domain space filled with zeros

Parameters:

x (DataContainer or BlockDataContainer) – Input data
out (DataContainer or BlockDataContainer, optional) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned. The default is None.

Returns:

\(\mathrm{O}^{*}(y)\)

Return type:

DataContainer

calculate_norm(**kwargs)[source]#: Evaluates operator norm of ZeroOperator

class cil.optimisation.operators.MatrixOperator(A)[source]#

Matrix wrapped in a CIL Operator to be used in optimisation algorithms.

Parameters:: A (a numpy matrix) – The matrix to be wrapped into a CIL Operator

direct(x, out=None)[source]#

Returns the matrix vector product \(Ax\)

Parameters:

x (DataContainer) – Input data
out (DataContainer, optional) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned. The default is None.

Returns:

\(Ax\)

Return type:

DataContainer

adjoint(x, out=None)[source]#

Returns the matrix vector product \(A^{T}x\)

Parameters:

x (DataContainer) – Input data
out (DataContainer, optional) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned. The default is None.

Returns:

\(A^{T}x\)

Return type:

DataContainer

size()[source]#: Returns the shape of the matrix

class cil.optimisation.operators.MaskOperator(mask, domain_geometry=None)[source]#

Parameters:

mask (DataContainer) – Boolean array with the same dimensions as the data to be operated on.
domain_geometry (ImageGeometry) – Specifies the geometry of the operator domain. If ‘None’ will use the mask geometry size and spacing as float32. default = None .

class cil.optimisation.operators.ProjectionMap(domain_geometry, index, range_geometry=None)[source]#

Projection Map or Canonical Projection (https://en.wikipedia.org/wiki/Projection_(mathematics))

Takes an element \(x = (x_{0},\dots,x_{i},\dots,x_{n})\) from a Cartesian product space \(X_{1}\times\cdots\times X_{n}\rightarrow X_{i}\) and projects it to element \(x_{i}\) specified by the index \(i\).

\[\pi_{i}: X_{1}\times\cdots\times X_{n}\rightarrow X_{i}\]

\[\pi_{i}(x_{0},\dots,x_{i},\dots,x_{n}) = x_{i}\]

The adjoint operation, is defined as

\[\pi_{i}^{*}(x_{i}) = (0, \cdots, x_{i}, \cdots, 0)\]

Parameters:

domain_geometry (BlockGeometry) – The domain of the ProjectionMap. A BlockGeometry is expected.
index (int) – Index to project to the corresponding ImageGeometry

direct(x, out=None)[source]#

Returns the ith (index) element of the Block data container, \(x\)

Parameters:

x (BlockDataContainer)
out (DataContainer, default None) – If out is not None the output of the ProjectionMap will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer

adjoint(x, out=None)[source]#

Returns a BlockDataContainer of zeros with the ith (index) filled with the DataContainer, \(x\)

Parameters:

x (DataContainer)
out (BlockDataContainer, default None) – If out is not None the output of the adjoint of the ProjectionMap will be filled in out, otherwise a new object is instantiated and returned.

Return type:

BlockDataContainer

GradientOperator#

class cil.optimisation.operators.GradientOperator(domain_geometry, method='forward', bnd_cond='Neumann', **kwargs)[source]#

Gradient Operator: Computes first-order forward/backward differences on 2D, 3D, 4D ImageData under Neumann/Periodic boundary conditions

Parameters:

domain_geometry (ImageGeometry) – Set up the domain of the function
method (str, default 'forward') – Accepts: ‘forward’, ‘backward’, ‘centered’, note C++ optimised routine only works with ‘forward’
bnd_cond (str, default, 'Neumann') – Set the boundary conditions to use ‘Neumann’ or ‘Periodic’
**kwargs –

correlation: str, default ‘Space’
’Space’ will compute the gradient on only the spatial dimensions, ‘SpaceChannels’ will include the channel dimension direction

backend: str, default ‘c’
’c’ or ‘numpy’, defaults to ‘c’ if correlation is ‘SpaceChannels’ or channels = 1

num_threads: int
If backend is ‘c’ specify the number of threads to use. Default is number of cpus/2

split: boolean
If ‘True’, and backend ‘c’ will return a BlockDataContainer with grouped spatial domains. i.e. [Channel, [Z, Y, X]], otherwise [Channel, Z, Y, X]

Returns:

a BlockDataContainer containing images of the derivatives order given by dimension_labels i.e. [‘horizontal_y’,’horizontal_x’] will return [d(‘horizontal_y’), d(‘horizontal_x’)]

Return type:

BlockDataContainer

Example

2D example

\begin{eqnarray} \nabla : X \rightarrow Y\\ u \in X, \nabla(u) &=& [\partial_{y} u, \partial_{x} u]\\ u^{*} \in Y, \nabla^{*}(u^{*}) &=& \partial_{y} v1 + \partial_{x} v2 \end{eqnarray}

direct(x, out=None)[source]#

Computes the first-order forward differences

Parameters:

x (ImageData)
out (BlockDataContainer, optional) – pre-allocated output memory to store result

Returns:

result data if out not specified

Return type:

BlockDataContainer

adjoint(x, out=None)[source]#

Computes the first-order backward differences

Parameters:

x (BlockDataContainer) – Gradient images for each dimension in ImageGeometry domain
out (ImageData, optional) – pre-allocated output memory to store result

Returns:

result data if out not specified

Return type:

ImageData

calculate_norm()[source]#

Returns the analytical norm of the GradientOperator.

\[\begin{split}(\partial_{z}, \partial_{y}, \partial_{x}) &= \sqrt{\|\partial_{z}\|^{2} + \|\partial_{y}\|^{2} + \|\partial_{x}\|^{2} } \\ &= \sqrt{ \frac{4}{h_{z}^{2}} + \frac{4}{h_{y}^{2}} + \frac{4}{h_{x}^{2}}}\end{split}\]

Where the voxel sizes in each dimension are equal to 1 this simplifies to:

2D geometries \(norm = \sqrt{8}\)

3D geometries \(norm = \sqrt{12}\)

class cil.optimisation.operators.FiniteDifferenceOperator(domain_geometry, range_geometry=None, direction=None, method='forward', bnd_cond='Neumann')[source]#

Computes forward/backward/centered finite differences of a DataContainer under Neumann/Periodic boundary conditions

Parameters:

domain_geometry – Domain geometry for the FiniteDifferenceOperator
direction (string label from domain geometry or integer number) – Direction to evaluate finite differences
method ('forward', 'backward', 'centered') – Method for finite differences
bnd_cond – ‘Neumann’, ‘Periodic’

direct(x, out=None)[source]#

Calls the operator

Parameters:

x (DataContainer or BlockDataContainer) – Element in the domain of the Operator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

adjoint(x, out=None)[source]#

Returns the adjoint/inverse operation evaluated at the point \(x\)

Parameters:

x (DataContainer or BlockDataContainer) – Element in the domain of the Operator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

Note

Only available to linear operators

class cil.optimisation.operators.SparseFiniteDifferenceOperator(domain_geometry, range_geometry=None, direction=0, bnd_cond='Neumann')[source]#

Create Sparse Matrices for the Finite Difference Operator

direct(x)[source]#

Calls the operator

Parameters:

x (DataContainer or BlockDataContainer) – Element in the domain of the Operator
out (DataContainer or BlockDataContainer, default None) – If out is not None the output of the Operator will be filled in out, otherwise a new object is instantiated and returned.

Return type:

DataContainer or BlockDataContainer containing the result.

class cil.optimisation.operators.SymmetrisedGradientOperator(domain_geometry, bnd_cond='Neumann', **kwargs)[source]#

The symmetrised gradient is the operator, \(E\), defined by \(E: V \rightarrow W\) where V is BlockGeometry and W is the range of the Symmetrised Gradient and

\[\begin{split}E(v) = 0.5 ( \nabla v + (\nabla v)^{T} ) \\\end{split}\]

In 2 dimensions, let \(v(x,y)=(v_1(x,y),v_2(x,y))\) which gives

\[\begin{split}\nabla v =\left( \begin{matrix} \partial_{x} v_1 & \partial_x v_2\\ \partial_{y}v_1 & \partial_y v_2 \end{matrix}\right)\end{split}\]

and thus

\[\begin{split}E(v) = 0.5 ( \nabla v + (\nabla v)^{T} ) =\left( \begin{matrix} \partial_{x} v_1 & 0.5 (\partial_{y} v_1 + \partial_{x} v_2) \\ 0.5 (\partial_{x} v_1 + \partial_{y} v_2) & \partial_{y} v_2 \end{matrix}\right)\end{split}\]

Parameters:

domain_geometry (BlockGeometry with shape (2,1) or (3,1)) – Set up the domain of the function.
bnd_cond (str, optional, default Neumann) – Boundary condition either Neumann or Periodic
correlation (str, optional, default Channel) – Correlation either SpaceChannel or Channel

direct(x, out=None)[source]#: Returns \(E(v) = 0.5 * ( \nabla v + (\nabla v)^{T} )\)

Parameters:#

x: BlockDataContainer out: BlockDataContainer, default None

If out is not None the output of direct will be filled in out, otherwise a new object is instantiated and returned.

adjoint(x, out=None)[source]#: Returns the adjoint of the symmetrised gradient operator

Parameters:#

x: BlockDataContainer out: BlockDataContainer, default None

If out is not None the output of adjoint will be filled in out, otherwise a new object is instantiated and returned.

WaveletOperator#

We utilise PyWavelets (https://pywavelets.readthedocs.io/en/latest/index.html) to build wavelet operators in CIL:

class cil.optimisation.operators.WaveletOperator(domain_geometry, range_geometry=None, level=None, wname='haar', axes=None, **kwargs)[source]#

Computes forward or inverse (adjoint) discrete wavelet transform (DWT) of the input

Parameters:

domain_geometry (cil geometry) – Domain geometry for the WaveletOperator
range_geometry (cil geometry, optional) – Output geometry for the WaveletOperator. Default = domain_geometry with the right coefficient array size deduced from pywavelets
level (int, optional, default= log_2(min(shape(axes)))) – integer for decomposition level. Default = log_2(min(shape(axes))), i.e. the maximum number of accurate downsamplings possible
wname (string, optional, default='haar') – label for wavelet used.
axes (list of ints, optional, default=`None`) – Defines the dimensions to decompose along. Note that channel is the first dimension: for example, spatial DWT is given by axes=range(1,3) and channelwise DWT is axes=range(1) Default = None, meaning all dimensions are transformed. Same as axes = range(ndim)

**kwargs#

correlation: str, default ‘All’. Note: Only applied if axes = None!: ‘All’ will compute the wavelet decomposition on every possible dimension. ‘Space’ will compute the wavelet decomposition on only the spatial dimensions. If there are multiple channels, each channel is decomposed independently. ‘Channels’ will compute the wavelet decomposition on only the channels, independently for every spatial point.
bnd_cond: str, default ‘symmetric’. More commonly known as the padding or extension method used in discrete convolutions. All options supported by PyWavelets are valid.: Most common examples are ‘symmetric’ (padding by mirroring edge values), ‘zero’ (padding with zeros), ‘periodic’ (wrapping values around as in circular convolution). Some padding methods can have unexpected effect on the wavelet coefficients at the edges. See https://pywavelets.readthedocs.io/en/latest/ref/signal-extension-modes.html for more details and all options.

true_adjoint: bool, default True. For biorthogonal wavelets the true mathematical adjoint should no longer produce perfect reconstructions, setting true_adjoint`as `False the reconstruction (using the adjoint) should be (almost) the original input.

Note

The default decomposition level is the theoretical maximum: log_2(min(input.shape)). However, this is not always recommended and pywavelets should give a warning if the coarsest scales are too small to be meaningful.

Note

We currently do not support wavelets that are not orthogonal or bi-orthogonal.

direct(x, out=None)[source]#

Returns the value of the WaveletOperator applied to \(x\)

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the WaveletOperator applied to \(x\) or None if out

adjoint(Wx, out=None)[source]#

Returns the value of the adjoint of the WaveletOperator applied to \(x\)

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the adjoint of the WaveletOperator applied to \(x\) or None if out

calculate_norm()[source]#

Returns the norm of WaveletOperator, which is equal to 1.0 if the wavelet is orthogonal

Returns:: norm
Return type:: float

is_orthogonal()[source]#

Returns if the operator is orthogonal

Return type:: Bool

Functions#

A Function represents a mathematical function of one or more inputs and is intended to accept DataContainers as input as well as any additional parameters.

Fixed parameters can be passed in during the creation of the function object. The methods of the function reflect the properties of it, for example, if the function represented is differentiable the function should contain a method gradient which should return the gradient of the function evaluated at an input point. If the function is not differentiable but allows a simple proximal operator, the method proximal should return the proximal operator evaluated at an input point. The function value is evaluated by calling the function itself, e.g. f(x) for a Function f and input point x.

Base classes#

class cil.optimisation.functions.Function(L=None)[source]#

Abstract class representing a function

Parameters:: L (number, positive, default None) – Lipschitz constant of the gradient of the function F(x), when it is differentiable.

Note

The Lipschitz of the gradient of the function is a positive real number, such that \(\|f'(x) - f'(y)\| \leq L \|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

gradient(x, out=None)[source]#

Returns the value of the gradient of function \(F\) evaluated at \(x\), if it is differentiable

\[F'(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the gradient of the function at x.

proximal(x, tau, out=None)[source]#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

convex_conjugate(x)[source]#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

proximal_conjugate(x, tau, out=None)[source]#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

centered_at(center)[source]#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

class cil.optimisation.functions.SumFunction(*functions)[source]#

SumFunction represents the sum of \(n\geq2\) functions

\[(F_{1} + F_{2} + ... + F_{n})(\cdot) = F_{1}(\cdot) + F_{2}(\cdot) + ... + F_{n}(\cdot)\]

Parameters:: *functions (Functions) – Functions to set up a SumFunction
Raises:: ValueError – If the number of function is strictly less than 2.

Examples

\[F(x) = \|x\|^{2} + \frac{1}{2}\|x - 1\|^{2}\]

>>> from cil.optimisation.functions import L2NormSquared
>>> from cil.framework import ImageGeometry
>>> f1 = L2NormSquared()
>>> f2 = 0.5 * L2NormSquared(b = ig.allocate(1))
>>> F = SumFunction(f1, f2)

\[F(x) = \sum_{i=1}^{50} \|x - i\|^{2}\]

>>> F = SumFunction(*[L2NormSquared(b=i) for i in range(50)])

property L#

Returns the Lipschitz constant for the SumFunction

\[L = \sum_{i} L_{i}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

property Lmax#

Returns the maximum Lipschitz constant for the SumFunction

\[L = \max_{i}\{L_{i}\}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

gradient(x, out=None)[source]#

Returns the value of the sum of the gradient of functions evaluated at \(x\), if all of them are differentiable.

\[(F'_{1} + F'_{2} + ... + F'_{n})(x) = F'_{1}(x) + F'_{2}(x) + ... + F'_{n}(x)\]

Parameters:

x (DataContainer) – Point to evaluate the gradient at.
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the sum of the gradients evaluated at point \(x\).

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

class cil.optimisation.functions.ScaledFunction(function, scalar)[source]#

ScaledFunction represents the scalar multiplication with a Function.

Let a function F then and a scalar \(\alpha\).

If \(G(x) = \alpha F(x)\) then:

\(G(x) = \alpha F(x)\) ( __call__ method )
\(G'(x) = \alpha F'(x)\) ( gradient method )
\(G^{*}(x^{*}) = \alpha F^{*}(\frac{x^{*}}{\alpha})\) ( convex_conjugate method )
\(\text{prox}_{\tau G}(x) = \text{prox}_{(\tau\alpha) F}(x)\) ( proximal method )

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

convex_conjugate(x)[source]#

Returns the convex conjugate of the scaled function.

\[G^{*}(x^{*}) = \alpha F^{*}(\frac{x^{*}}{\alpha})\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the scaled function.

gradient(x, out=None)[source]#

Returns the gradient of the scaled function evaluated at \(x\).

\[G'(x) = \alpha F'(x)\]

Parameters:

x (DataContainer) – Point to evaluate the gradient at.
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the gradient of the scaled function evaluated at \(x\).

proximal(x, tau, out=None)[source]#

Returns the proximal operator of the scaled function, evaluated at \(x\).

\[\text{prox}_{\tau G}(x) = \text{prox}_{(\tau\alpha) F}(x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the scaled function evaluated at \(x\) with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)[source]#

This returns the proximal conjugate operator for the function at \(x\), \(\tau\)

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal conjugate operator for the function evaluated at \(x\) and \(\tau\).

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

class cil.optimisation.functions.SumScalarFunction(function, constant)[source]#

SumScalarFunction represents the sum a function with a scalar.

\[(F + scalar)(x) = F(x) + scalar\]

Although SumFunction has no general expressions for

convex_conjugate
proximal
proximal_conjugate

if the second argument is a ConstantFunction then we can derive the above analytically.

convex_conjugate(x)[source]#

Returns the convex conjugate of a \((F+scalar)\), evaluated at \(x\).

\[(F+scalar)^{*}(x^{*}) = F^{*}(x^{*}) - scalar\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate evaluated at \(x\).

proximal(x, tau, out=None)[source]#

Returns the proximal operator of \(F+scalar\)

\[\text{prox}_{\tau (F+scalar)}(x) = \text{prox}_{\tau F}\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the evaluation of the proximal operator evaluated at \(x\) and \(\tau\).

property L#

Returns the Lipschitz constant for the SumFunction

\[L = \sum_{i} L_{i}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

property Lmax#

Returns the maximum Lipschitz constant for the SumFunction

\[L = \max_{i}\{L_{i}\}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

gradient(x, out=None)#

Returns the value of the sum of the gradient of functions evaluated at \(x\), if all of them are differentiable.

\[(F'_{1} + F'_{2} + ... + F'_{n})(x) = F'_{1}(x) + F'_{2}(x) + ... + F'_{n}(x)\]

Parameters:

x (DataContainer) – Point to evaluate the gradient at.
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the sum of the gradients evaluated at point \(x\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

class cil.optimisation.functions.TranslateFunction(function, center)[source]#

TranslateFunction represents the translation of function F with respect to the center b.

Let a function F and consider \(G(x) = F(x - center)\).

Function F is centered at 0, whereas G is centered at point b.

If \(G(x) = F(x - b)\) then:

\(G(x) = F(x - b)\) ( __call__ method )
\(G'(x) = F'(x - b)\) ( gradient method )
\(G^{*}(x^{*}) = F^{*}(x^{*}) + <x^{*}, b >\) ( convex_conjugate method )
\(\text{prox}_{\tau G}(x) = \text{prox}_{\tau F}(x - b) + b\) ( proximal method )

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

gradient(x, out=None)[source]#

Returns the gradient of the translated function.

\[G'(x) = F'(x - b)\]

Parameters:

x (DataContainer) – Point to evaluate the gradient at.
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the gradient of the translated function evaluated at \(x\).

proximal(x, tau, out=None)[source]#

Returns the proximal operator of the translated function.

\[\text{prox}_{\tau G}(x) = \text{prox}_{\tau F}(x-b) + b\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the translated function at \(x\) and \(\tau\).

convex_conjugate(x)[source]#

Returns the convex conjugate of the translated function.

\[G^{*}(x^{*}) = F^{*}(x^{*}) + <x^{*}, b >\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the translated function at \(x\).

Simple functions#

class cil.optimisation.functions.ConstantFunction(constant=0)[source]#

ConstantFunction: \(F(x) = constant, constant\in\mathbb{R}\)

gradient(x, out=None)[source]#

Returns the value of the gradient of the function, \(F'(x)=0\) :param x: Point to evaluate the gradient at. :type x: DataContainer :param out: :type out: return DataContainer, if None a new DataContainer is returned, default None.

Return type:: A DataContainer of zeros, the same size as \(x\).

convex_conjugate(x)[source]#

The convex conjugate of constant function \(F(x) = c\in\mathbb{R}\) is

\[\begin{split}F(x^{*}) = \begin{cases} -c, & if x^{*} = 0\\ \infty, & \mbox{otherwise} \end{cases}\end{split}\]

However, \(x^{*} = 0\) only in the limit of iterations, so in fact this can be infinity. We do not want to have inf values in the convex conjugate, so we have to penalise this value accordingly. The following penalisation is useful in the PDHG algorithm, when we compute primal & dual objectives for convergence purposes.

\[F^{*}(x^{*}) = \sum \max\{x^{*}, 0\}\]

Parameters:: x (DataContainer)
Return type:: The maximum of x and 0, summed over the entries of x.

proximal(x, tau, out=None)[source]#

Returns the proximal operator of the constant function, which is the same element, i.e.,

\[\text{prox}_{\tau F}(x) = x\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, equal to \(x\).

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

class cil.optimisation.functions.ZeroFunction[source]#

ZeroFunction represents the zero function, \(F(x) = 0\)

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

The convex conjugate of constant function \(F(x) = c\in\mathbb{R}\) is

\[\begin{split}F(x^{*}) = \begin{cases} -c, & if x^{*} = 0\\ \infty, & \mbox{otherwise} \end{cases}\end{split}\]

\[F^{*}(x^{*}) = \sum \max\{x^{*}, 0\}\]

Parameters:: x (DataContainer)
Return type:: The maximum of x and 0, summed over the entries of x.

gradient(x, out=None)#

Return type:: A DataContainer of zeros, the same size as \(x\).

proximal(x, tau, out=None)#

Returns the proximal operator of the constant function, which is the same element, i.e.,

\[\text{prox}_{\tau F}(x) = x\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, equal to \(x\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

class cil.optimisation.functions.Rosenbrock(alpha, beta)[source]#

Rosenbrock function

\[\]

F(x,y) = (alpha - x)^2 + beta(y-x^2)^2

The function has a global minimum at .. math:: (x,y)=(alpha, alpha^2)

gradient(x, out=None)[source]#

Gradient of the Rosenbrock function

\[\]

nabla f(x,y) = left[ 2*((x-alpha) - 2beta x(y-x^2)) ; 2beta (y - x^2) right]

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

Composition of operator and a function#

This class allows the user to write a function which does the following:

\[F ( x ) = G ( Ax )\]

where \(A\) is an operator. For instance the least squares function can be expressed as

\[F(x) = || Ax - b ||^2_2 \qquad \text{where} \qquad G(y) = || y - b ||^2_2\]

F1 = Norm2Sq(A, b)
# or equivalently
F2 = OperatorCompositionFunction(L2NormSquared(b=b), A)

class cil.optimisation.functions.OperatorCompositionFunction(function, operator)[source]#

Composition of a function with an operator as : \((F \circ A)(x) = F(Ax)\)

parameter function:

Function F

parameter operator:

Operator A

For general operator, we have no explicit formulas for convex_conjugate, proximal and proximal_conjugate

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

gradient(x, out=None)[source]#

Return the gradient of \(F(Ax)\),

\((F(Ax))' = A^{T}F'(Ax)\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

Indicator box#

class cil.optimisation.functions.IndicatorBox(lower=None, upper=None, accelerated=True)[source]#

Indicator function for box constraint

\[\begin{split}f(x) = \mathbb{I}_{[a, b]} = \begin{cases} 0, \text{ if } x \in [a, b] \\ \infty, \text{otherwise} \end{cases}\end{split}\]

Parameters:

lower (float, DataContainer or numpy array, default None) – Lower bound. If set to None, it is equivalent to -np.inf.
upper (float, DataContainer or numpy array, default None) – Upper bound. If set to None, it is equivalent to np.inf.
accelerated (bool, default True) – Specifies whether to use the accelerated version or not, using numba or numpy backends respectively.

If lower or upper are passed a DataContainer (or derived class such as ImageData or AcquisitionData) or a numpy array, the bounds can be set to different values for each element.

In order to save computing time it is possible to suppress the evaluation of the function. This is achieved by setting suppress_evaluation to True. IndicatorBox evaluated on any input will then return 0.

If accelerated is set to True (default), the Numba backend is used. Otherwise, the Numpy backend is used. An optional parameter to set the number of threads used by Numba can be set with set_num_threads. Setting the number of threads when accelerate is set to False will not have any effect. The default number of threads is defined in the cil.utilities.multiprocessing module, and it is equivalent to half of the CPU cores available.

Example:#

In order to save computing time it is possible to suppress the evaluation of the function.

ib = IndicatorBox(lower=0, upper=1)
ib.set_suppress_evaluation(True)
ib.evaluate(x) # returns 0

Example:#

Set the number of threads used in accelerated mode.

num_threads = 4
ib = IndicatorBox(lower=0, upper=1)
ib.set_num_threads(num_threads)

set_suppress_evaluation(value)[source]#

Suppresses the evaluation of the function

Parameters:: value (bool) – If True, the function evaluation on any input will return 0, without calculation.

proximal(x, tau, out=None)[source]#

Proximal operator of IndicatorBox at x

\[prox_{\tau * f}(x)\]

Parameters:

x (DataContainer) – Input to the proximal operator
tau (float) – Step size. Notice it is ignored in IndicatorBox
out (DataContainer, optional) – Output of the proximal operator. If not provided, a new DataContainer is created.

Note

tau is ignored but it is in the signature of the generic Function class

gradient(x, out=None)[source]#: IndicatorBox is not differentiable, so calling gradient will raise a ValueError

property num_threads#

Get the optional number of threads parameter to use for the accelerated version.

Defaults to the value set in the CIL multiprocessing module.

set_num_threads(value)[source]#

Set the optional number of threads parameter to use for the accelerated version.

This is discarded if accelerated=False.

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

KullbackLeibler#

class cil.optimisation.functions.KullbackLeibler(b, eta=None, mask=None, backend='numba')[source]#

Kullback Leibler

\[\begin{split}F(u, v) = \begin{cases} u \log(\frac{u}{v}) - u + v & \mbox{ if } u > 0, v > 0\\ v & \mbox{ if } u = 0, v \ge 0 \\ \infty, & \mbox{otherwise} \end{cases}\end{split}\]

where the \(0\log0 := 0\) convention is used.

Parameters:

b (DataContainer, non-negative) – Translates the function at point b.
eta (DataContainer, default = 0) – Background noise
mask (DataContainer, default = None) – Mask for the data b
backend ({'numba','numpy'}, optional) – Backend for the KullbackLeibler methods. Numba is the default backend.

Note

The Kullback-Leibler function is used in practice as a fidelity term in minimisation problems where the acquired data follow Poisson distribution. If we denote the acquired data with b then, we write

\[\underset{i}{\sum} F(b_{i}, (v + \eta)_{i})\]

where, \(\eta\) is an additional noise.

In the case of Positron Emission Tomography reconstruction \(\eta\) represents scatter and random events contribution during the PET acquisition. Hence, in that case the KullbackLeibler fidelity measures the distance between \(\mathcal{A}v + \eta\) and acquisition data \(b\), where \(\mathcal{A}\) is the projection operator. This is related to PoissonLogLikelihoodWithLinearModelForMean , definition that is used in PET reconstruction in the SIRF software.

Note

The default implementation uses the build-in function kl_div from scipy. The methods of the KullbackLeibler are accelerated provided that numba library is installed.

Examples

>>> from cil.optimisation.functions import KullbackLeibler
>>> from cil.framework import ImageGeometry
>>> ig = ImageGeometry(3,4)
>>> data = ig.allocate('random')
>>> F = KullbackLeibler(b = data)

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

gradient(x, out=None)#

Returns the value of the gradient of function \(F\) evaluated at \(x\), if it is differentiable

\[F'(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the gradient of the function at x.

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

L1 Norm#

class cil.optimisation.functions.L1Norm(b=None, weight=None)[source]#

L1Norm function

Consider the following cases:

\[F(x) = ||x||_{1}\]
\[F(x) = ||x - b||_{1}\]

In the weighted case, \(w\) is an array of non-negative weights.

\[F(x) = ||x||_{L^1(w)}\]
\[F(x) = ||x - b||_{L^1(w)}\]

with \(||x||_{L^1(w)} = || x w||_1 = \sum_{i=1}^{n} |x_i| w_i\).

Parameters:

weight (DataContainer, numpy ndarray, default None) – Array of non-negative weights. If None returns the L1 Norm.
b (DataContainer, default None) – Translation of the function.

convex_conjugate(x)[source]#

Returns the value of the convex conjugate of the L1 Norm function at \(x\).

This is the indicator of the unit \(L^{\infty}\) norm:

\[F^{*}(x^{*}) = \mathbb{I}_{\{\|\cdot\|_{\infty}\leq1\}}(x^{*})\]
\[F^{*}(x^{*}) = \mathbb{I}_{\{\|\cdot\|_{\infty}\leq1\}}(x^{*}) + \langle x^{*},b\rangle\]

\[\begin{split}\mathbb{I}_{\{\|\cdot\|_{\infty}\leq1\}}(x^{*}) = \begin{cases} 0, \mbox{if } \|x^{*}\|_{\infty}\leq1\\ \infty, \mbox{otherwise} \end{cases}\end{split}\]

In the weighted case the convex conjugate is the indicator of the unit \(L^{\infty}(w^{-1})\) norm.

See: https://math.stackexchange.com/questions/1533217/convex-conjugate-of-l1-norm-function-with-weight

\[F^{*}(x^{*}) = \mathbb{I}_{\{\|\cdot\|_{L^\infty(w^{-1})}\leq 1\}}(x^{*})\]
\[F^{*}(x^{*}) = \mathbb{I}_{\{\|\cdot\|_{L^\infty(w^{-1})}\leq 1\}}(x^{*}) + \langle x^{*},b\rangle\]

with \(\|x\|_{L^\infty(w^{-1})} = \max_{i} \frac{|x_i|}{w_i}\) and possible cases of \(\frac{0}{0}\) are defined to be 1.

Parameters:: x (DataContainer) – where to evaluate the convex conjugate of the L1 Norm function.
Returns:: The value of the convex conjugate of the WeightedL1Norm function at \(x\)
Return type:: float

proximal(x, tau, out=None)[source]#

Returns the value of the proximal operator of the L1 Norm function at \(x\) with scaling parameter \(\tau\).

Consider the following cases:

\[\mathrm{prox}_{\tau F}(x) = \mathrm{ShrinkOperator}_\tau(x)\]
\[\mathrm{prox}_{\tau F}(x) = \mathrm{ShrinkOperator}_\tau(x - b) + b\]

where,

\[\mathrm{ShrinkOperator}_\tau(x) = sgn(x) * \max\{ |x| - \tau, 0 \}\]

The weighted case follows from Example 6.23 in Chapter 6 of “First-Order Methods in Optimization” by Amir Beck, SIAM 2017 https://archive.siam.org/books/mo25/mo25_ch6.pdf

\[\mathrm{prox}_{\tau F}(x) = \mathrm{ShrinkOperator}_{\tau*w}(x)\]
\[\mathrm{prox}_{\tau F}(x) = \mathrm{ShrinkOperator}_{\tau*w}(x - b) + b\]

Parameters:

x (DataContainer)
tau (float, real, ndarray, DataContainer)
out (DataContainer, default None) – If not None, the result will be stored in this object.

Returns:

The value of the proximal operator of the L1 norm function at \(x\).

Return type:

float

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

gradient(x, out=None)#

Returns the value of the gradient of function \(F\) evaluated at \(x\), if it is differentiable

\[F'(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the gradient of the function at x.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

L2 Norm Squared#

class cil.optimisation.functions.L2NormSquared(**kwargs)[source]#

L2NormSquared function: \(F(x) = \| x\|^{2}_{2} = \underset{i}{\sum}x_{i}^{2}\)

Following cases are considered:

\(F(x) = \|x\|^{2}_{2}\)

\(F(x) = \|x - b\|^{2}_{2}\)

Parameters:: b (DataContainer, optional) – Translation of the function

Note

For case b) we can use F = L2NormSquared().centered_at(b), see TranslateFunction.

Example

>>> F = L2NormSquared()
>>> F = L2NormSquared(b=b)
>>> F = L2NormSquared().centered_at(b)

gradient(x, out=None)[source]#

Returns the value of the gradient of the L2NormSquared function at x.

Following cases are considered:

\(F'(x) = 2x\)

\(F'(x) = 2(x-b)\)

convex_conjugate(x)[source]#

Returns the value of the convex conjugate of the L2NormSquared function at x.

Consider the following cases:

\[F^{*}(x^{*}) = \frac{1}{4}\|x^{*}\|^{2}_{2}\]

\[F^{*}(x^{*}) = \frac{1}{4}\|x^{*}\|^{2}_{2} + \langle x^{*}, b\rangle\]

proximal(x, tau, out=None)[source]#

Returns the value of the proximal operator of the L2NormSquared function at x.

Consider the following cases:

\[\text{prox}_{\tau F}(x) = \frac{x}{1+2\tau}\]

\[\text{prox}_{\tau F}(x) = \frac{x-b}{1+2\tau} + b\]

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

class cil.optimisation.functions.WeightedL2NormSquared(**kwargs)[source]#

WeightedL2NormSquared function: \(F(x) = \|x\|_{W,2}^2 = \Sigma_iw_ix_i^2 = \langle x, Wx\rangle = x^TWx\) where \(W=\text{diag}(weight)\) if weight is a DataContainer or \(W=\text{weight} I\) if weight is a scalar.

Parameters:

**kwargs
weight (a scalar or a DataContainer with the same shape as the intended domain of this WeightedL2NormSquared function)
b (a DataContainer with the same shape as the intended domain of this WeightedL2NormSquared function) – A shift so that the function becomes \(F(x) = \| x-b\|_{W,2}^2 = \Sigma_iw_i(x_i-b_i)^2 = \langle x-b, W(x-b) \rangle = (x-b)^TW(x-b)\)

gradient(x, out=None)[source]#: Returns the value of \(F'(x) = 2Wx\) or, if b is defined, \(F'(x) = 2W(x-b)\) where \(W=\text{diag}(weight)\) if weight is a DataContainer or \(\text{weight}I\) if weight is a scalar.

convex_conjugate(x)[source]#: Returns the value of the convex conjugate of the WeightedL2NormSquared function at x.

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

proximal(x, tau, out=None)[source]#: Returns the value of the proximal operator of the WeightedL2NormSquared function at x.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

Least Squares#

class cil.optimisation.functions.LeastSquares(A, b, c=1.0, weight=None)[source]#

(Weighted) Least Squares function

\[F(x) = c\|Ax-b\|_2^2\]

or if weighted

\[F(x) = c\|Ax-b\|_{2,W}^{2}\]

where \(W=\text{diag}(weight)\).

Parameters:

A (LinearOperator)
b (Data, DataContainer)
c (Scaling Constant, float, default 1.0)
weight (DataContainer with all positive elements of size of the range of operator A, default None)

Note

L is the Lipshitz Constant of the gradient of \(F\) which is \(2 c ||A||_2^2 = 2 c \sigma_1(A)^2\), or \(2 c ||W|| ||A||_2^2 = 2c||W|| \sigma_1(A)^2\), where \(\sigma_1(A)\) is the largest singular value of \(A\) and \(W=\text{diag}(weight)\).

gradient(x, out=None)[source]#

Returns the value of the gradient of \(F(x)\):

\[F'(x) = 2cA^T(Ax-b)\]

\[F'(x) = 2cA^T(W(Ax-b))\]

where \(W=\text{diag}(weight)\).

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

L1 Sparsity#

class cil.optimisation.functions.L1Sparsity(Q, b=None, weight=None)[source]#

L1Sparsity function

Calculates the following cases, depending on if the optional parameter weight or data b is passed. For weight=None:

\[F(x) = ||Qx||_{1}\]
\[F(x) = ||Qx - b||_{1}\]

In the weighted case, weight = \(w\) is an array of non-negative weights.

\[F(x) = ||Qx||_{L^1(w)}\]
\[F(x) = ||Qx - b||_{L^1(w)}\]

with \(||x||_{L^1(w)} = || x \cdot w||_1 = \sum_{i=1}^{n} |x_i| w_i\).

In all cases \(Q\) is an orthogonal operator.

Parameters:

Q (orthogonal Operator) – Note that for the correct calculation of the proximal the provided operator must be orthogonal
b (Data, DataContainer, default is None)
weight (array, optional, default=None) – non-negative weight array matching the size of the range of operator \(Q\).

convex_conjugate(x)[source]#

Returns the value of the convex conjugate of the L1Sparsity function at x. Here, we need to use the convex conjugate of L1Sparsity, which is the Indicator of the unit \(\ell^{\infty}\) norm on the range of the (bijective) operator Q.

Consider the non-weighted case:

\[F^{*}(x^{*}) = \mathbb{I}_{\{\|\cdot\|_{\infty}\leq1\}}(Qx^{*})\]
\[F^{*}(x^{*}) = \mathbb{I}_{\{\|\cdot\|_{\infty}\leq1\}}(Qx^{*}) + \langle Qx^{*},b\rangle\]

\[\begin{split}\mathbb{I}_{\{\|\cdot\|_{\infty}\leq1\}}(x^{*}) = \begin{cases} 0, \mbox{if } \|x^{*}\|_{\infty}\leq1\\ \infty, \mbox{otherwise} \end{cases}\end{split}\]

In the weighted case the convex conjugate is the indicator of the unit \(L^{\infty}( w^{-1} )\) norm.

See: https://math.stackexchange.com/questions/1533217/convex-conjugate-of-l1-norm-function-with-weight

\[F^{*}(x^{*}) = \mathbb{I}_{\{\|\cdot\|_{L^\infty(w^{-1})}\leq 1\}}(Qx^{*})\]
\[F^{*}(x^{*}) = \mathbb{I}_{\{\|\cdot\|_{L^\infty(w^{-1})}\leq 1\}}(Qx^{*}) + \langle Qx^{*},b\rangle\]

with \(\|x\|_{L^\infty(w^{-1})} = \max_{i} \frac{|x_i|}{w_i}\) and possible cases of 0 / 0 are defined to be 1.

proximal(x, tau, out=None)[source]#

Returns the value of the proximal operator of the L1 Norm function at x with scaling parameter tau.

Consider the following cases:

\[\mathrm{prox}_{\tau F}(x) = Q^T \mathrm{ShinkOperator}_{\tau}(Qx)\]
\[\mathrm{prox}_{\tau F}(x) = Q^T \left( \mathrm{ShinkOperator}_\tau(Qx- b) + b \right)\]

where,

\[\mathrm{prox}_{\tau | \cdot |}(x) = \mathrm{ShinkOperator}(x) = sgn(x) * \max\{ |x| - \tau, 0 \}\]

The weighted case follows from Example 6.23 in Chapter 6 of “First-Order Methods in Optimization” by Amir Beck, SIAM 2017 https://archive.siam.org/books/mo25/mo25_ch6.pdf

\[\mathrm{prox}_{\tau F}(x) = Q^T \mathrm{ShinkOperator}_{\tau*w}(Qx)\]
\[\mathrm{prox}_{\tau F}(x) = Q^T \left( \mathrm{ShinkOperator}_{\tau*w}(Qx-b) + b \right)\]

Parameters:

x (DataContainer)
tau (float, ndarray, DataContainer)
out (DataContainer, default None) – If not None, the result will be stored in this object.

Returns:

The value of the proximal operator of the L1 norm function at x

Return type:

DataContainer.

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

gradient(x, out=None)#

Returns the value of the gradient of function \(F\) evaluated at \(x\), if it is differentiable

\[F'(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the gradient of the function at x.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

Mixed L21 norm#

class cil.optimisation.functions.MixedL21Norm(**kwargs)[source]#

MixedL21Norm function: \(F(x) = ||x||_{2,1} = \sum |x|_{2} = \sum \sqrt{ (x^{1})^{2} + (x^{2})^{2} + \dots}\)

where x is a BlockDataContainer, i.e., \(x=(x^{1}, x^{2}, \dots)\)

convex_conjugate(x)[source]#

Returns the value of the convex conjugate of the MixedL21Norm function at x.

This is the Indicator function of \(\mathbb{I}_{\{\|\cdot\|_{2,\infty}\leq1\}}(x^{*})\),

i.e.,

\[\begin{split}\mathbb{I}_{\{\|\cdot\|_{2, \infty}\leq1\}}(x^{*}) = \begin{cases} 0, \mbox{if } \|x\|_{2, \infty}\leq1\\ \infty, \mbox{otherwise} \end{cases}\end{split}\]

where,

\[\|x\|_{2,\infty} = \max\{ \|x\|_{2} \} = \max\{ \sqrt{ (x^{1})^{2} + (x^{2})^{2} + \dots}\}\]

proximal(x, tau, out=None)[source]#

Returns the value of the proximal operator of the MixedL21Norm function at x.

\[\mathrm{prox}_{\tau F}(x) = \frac{x}{\|x\|_{2}}\max\{ \|x\|_{2} - \tau, 0 \}\]

where the convention 0 · (0/0) = 0 is used.

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

gradient(x, out=None)#

Returns the value of the gradient of function \(F\) evaluated at \(x\), if it is differentiable

\[F'(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the gradient of the function at x.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

Smooth Mixed L21 norm#

class cil.optimisation.functions.SmoothMixedL21Norm(epsilon)[source]#

SmoothMixedL21Norm function: \(F(x) = ||x||_{2,1} = \sum |x|_{2} = \sum \sqrt{ (x^{1})^{2} + (x^{2})^{2} + \epsilon^2 + \dots}\)

where x is a BlockDataContainer, i.e., \(x=(x^{1}, x^{2}, \dots)\)

Conjugate, proximal and proximal conjugate methods no closed-form solution

gradient(x, out=None)[source]#

Returns the value of the gradient of the SmoothMixedL21Norm function at x.

frac{x}{|x|}

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

Mixed L11 norm#

class cil.optimisation.functions.MixedL11Norm(**kwargs)[source]#

MixedL11Norm function

\[F(x) = ||x||_{1,1} = \sum |x_{1}| + |x_{2}| + \cdots + |x_{n}|\]

Note

MixedL11Norm is a separable function, therefore it can also be defined using the BlockFunction.

Total variation#

class cil.optimisation.functions.TotalVariation(max_iteration=10, tolerance=None, correlation='Space', backend='c', lower=None, upper=None, isotropic=True, split=False, strong_convexity_constant=0, warm_start=True)[source]#

Total variation Function

\[\mathrm{TV}(u) := \|\nabla u\|_{2,1} = \sum \|\nabla u\|_{2},\, (\mbox{isotropic})\]

\[\mathrm{TV}(u) := \|\nabla u\|_{1,1} = \sum \|\nabla u\|_{1}\, (\mbox{anisotropic})\]

Notes

The TotalVariation (TV) Function acts as a composite function, i.e., the composition of the MixedL21Norm function and the GradientOperator operator,

\[f(u) = \|u\|_{2,1}, \Rightarrow (f\circ\nabla)(u) = f(\nabla x) = \mathrm{TV}(u)\]

In that case, the proximal operator of TV does not have an exact solution and we use an iterative algorithm to solve:

\[\mathrm{prox}_{\tau \mathrm{TV}}(b) := \underset{u}{\mathrm{argmin}} \frac{1}{2\tau}\|u - b\|^{2} + \mathrm{TV}(u)\]

The algorithm used for the proximal operator of TV is the Fast Gradient Projection algorithm (or FISTA) applied to the _dual problem_ of the above problem, see [1], [2], [12].

See also “Multicontrast MRI Reconstruction with Structure-Guided Total Variation”, Ehrhardt, Betcke, 2016.

Parameters:

max_iteration (int, default = 5) – Maximum number of iterations for the FGP algorithm to solve to solve the dual problem of the Total Variation Denoising problem (ROF). If warm_start=False, this should be around 100, or larger, with a set tolerance.
tolerance (float, default = None) –
Stopping criterion for the FGP algorithm used to to solve the dual problem of the Total Variation Denoising problem (ROF). If the difference between iterates in the FGP algorithm is less than the tolerance the iterations end before the max_iteration number.

\[\|x^{k+1} - x^{k}\|_{2} < \mathrm{tolerance}\]
correlation (str, default = Space) – Correlation between Space and/or SpaceChannels for the GradientOperator.
backend (str, default = c) – Backend to compute the GradientOperator
lower ('float, default = None) – A constraint is enforced using the IndicatorBox function, e.g., IndicatorBox(lower, upper).
upper ('float, default = None) – A constraint is enforced using the IndicatorBox function, e.g., IndicatorBox(lower, upper).
isotropic (boolean, default = True) –
Use either isotropic or anisotropic definition of TV.

\[|x|_{2} = \sqrt{x_{1}^{2} + x_{2}^{2}},\, (\mbox{isotropic})\]

\[|x|_{1} = |x_{1}| + |x_{2}|\, (\mbox{anisotropic})\]
split (boolean, default = False) – Splits the Gradient into spatial gradient and spectral or temporal gradient for multichannel data.
strong_convexity_constant (float, default = 0) –
A strongly convex term weighted by the strong_convexity_constant (\(\gamma\)) parameter is added to the Total variation. Now the TotalVariation function is \(\gamma\) - strongly convex and the proximal operator is

\[\underset{u}{\mathrm{argmin}} \frac{1}{2\tau}\|u - b\|^{2} + \mathrm{TV}(u) + \frac{\gamma}{2}\|u\|^{2} \Leftrightarrow\]

\[\underset{u}{\mathrm{argmin}} \frac{1}{2\frac{\tau}{1+\gamma\tau}}\|u - \frac{b}{1+\gamma\tau}\|^{2} + \mathrm{TV}(u)\]
warm_start (boolean, default = True) – If set to true, the FGP algorithm used to solve the dual problem of the Total Variation Denoising problem (ROF) is initiated by the final value from the previous iteration and not at zero. This allows the max_iteration value to be reduced to 5-10 iterations.

Note

With warm_start set to the default, True, the TV function will keep in memory the range of the gradient of the image to be denoised, i.e. N times the dimensionality of the image. This increases the memory requirements. However, during the evaluation of proximal the memory requirements will be unchanged as the same amount of memory will need to be allocated and deallocated.

Note

In the case where the Total variation becomes a \(\gamma\) - strongly convex function, i.e.,

\[\mathrm{TV}(u) + \frac{\gamma}{2}\|u\|^{2}\]

\(\gamma\) should be relatively small, so as the second term above will not act as an additional regulariser. For more information, see [11], [3].

Examples

\[\underset{u}{\mathrm{argmin}} \frac{1}{2}\|u - b\|^{2} + \alpha\|\nabla u\|_{2,1}\]

>>> alpha = 2.0
>>> TV = TotalVariation()
>>> sol = TV.proximal(b, tau = alpha)

Examples

\[\underset{u}{\mathrm{argmin}} \frac{1}{2}\|u - b\|^{2} + \alpha\|\nabla u\|_{1,1} + \mathbb{I}_{C}(u)\]

where \(C = \{1.0\leq u\leq 2.0\}\).

>>> alpha = 2.0
>>> TV = TotalVariation(isotropic=False, lower=1.0, upper=2.0)
>>> sol = TV.proximal(b, tau = alpha)

Examples

\[\underset{u}{\mathrm{argmin}} \frac{1}{2}\|u - b\|^{2} + (\alpha\|\nabla u\|_{2,1} + \frac{\gamma}{2}\|u\|^{2})\]

>>> alpha = 2.0
>>> gamma = 1e-3
>>> TV = alpha * TotalVariation(isotropic=False, strong_convexity_constant=gamma)
>>> sol = TV.proximal(b, tau = 1.0)

proximal(x, tau, out=None)[source]#: Returns the proximal operator of the TotalVariation function at x .

convex_conjugate(x)[source]#: Returns the value of convex conjugate of the TotalVariation function at x .

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

calculate_Lipschitz()[source]#: Default value for the Lipschitz constant.

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

gradient(x, out=None)#

Returns the value of the gradient of function \(F\) evaluated at \(x\), if it is differentiable

\[F'(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the gradient of the function at x.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

property gradient_operator#: GradientOperator is created if it is not instantiated yet. The domain of the _gradient, is created in the __call__ and proximal methods.

Function of Absolute Value#

class cil.optimisation.functions.FunctionOfAbs(function: Function, assume_lower_semi: bool = False)[source]#

A function which acts on the absolute value of the complex-valued input,

\[G(z) = H(abs(z))\]

This function is initialised with another CIL function, \(H\) in the above formula. When this function is called, first the absolute value of the input is taken, and then the input is passed to the provided function.

Included in this class is the proximal map for FunctionOfAbs. From this, the proximal conjugate is also available from the parent CIL Function class, which is valid for this function. In the case that \(H\) is lower semi-continuous, convex, non-decreasing and finite at the origin, (and thus assume_lower_semi is set to True in the init) the convex conjugate is also defined. The gradient is not defined for this function.

For reference: see https://doi.org/10.48550/arXiv.2410.22161

Parameters:

function (Function) – Function acting on a real input, \(H\) in the above formula.
assume_lower_semi (bool, default False) – If True, assume that the function is lower semi-continuous, convex, non-decreasing and finite at the origin. This allows the convex conjugate to be calculated as the monotone conjugate, which is less than or equal to the convex conjugate. If False, the convex conjugate returned as 0. This is to ensure compatibility with Algorithms such as PDHG.

proximal(x: DataContainer, tau: float, out: DataContainer | None = None) → DataContainer[source]#

Returns the proximal map of function \(\tau G\) evaluated at x

\[\text{prox}_{\tau G}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau G(z)\]

This is accomplished by calculating a bounded proximal map and making a change of phase, \(prox_G(z) = prox^+_H(r) \circ \Phi\) where \(z = r \circ \Phi\), \(r = abs(z)\), \(\Phi = \exp(i \angle(z))\), and \(\circ\) is element-wise product. Also define \(prox^+\) to be the proximal map of \(H\) in which the minimisation carried out over the positive orthant.

Parameters:

x (DataContainer) – The input to the function
tau (scalar) – The scalar multiplying the function in the proximal map
out (return DataContainer, if None a new DataContainer is returned, default None.) – DataContainer to store the result of the proximal map

Return type:

DataContainer, the proximal map of the function at x with scalar \(\tau\).

convex_conjugate(x: DataContainer) → float[source]#

Evaluation of the function G* at x, where G* is the convex conjugate of function G,

\[G^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - G(x)\]

If \(H\) is lower semi-continuous, convex, non-decreasing finite at the origin, then \(G^*(z*) = H^+(|z*|)\), where the monotone conjugate \(g^+\) is

\[H^+(z^*) =sup {(z, z^*) - H(z) : z >= O}\]

The monotone conjugate will therefore be less than or equal to the convex conjugate, since it is taken over a smaller set. It is not available directly, but may coincide with the convex conjugate, which is therefore the best estimate we have. This is only valid for real x. In other cases, a general convex conjugate is not available or defined.

For reference see: Convex Analysis, R. Tyrrell Rocakfellar, pp110-111.

Parameters:

x (DataContainer) – The input to the function

Returns:

float
The value of the convex conjugate of the function at x.

gradient(x)[source]#: Gradient of the function at x is not defined for this function.

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

Approximate Gradient base class#

class cil.optimisation.functions.ApproximateGradientSumFunction(functions, sampler=None)[source]#

ApproximateGradientSumFunction represents the following sum

\[\sum_{i=0}^{n-1} f_{i} = (f_{0} + f_{2} + ... + f_{n-1})\]

where there are \(n\) functions. This function class has two ways of calling gradient

full_gradient calculates the gradient of the sum \(\sum_{i=0}^{n-1} \nabla f_{i}\)

gradient calls an approximate_gradient function which may be less computationally expensive to calculate than the full gradient

This class is an abstract class.

Parameters:

functions (list of functions) – A list of functions: \([f_{0}, f_{2}, ..., f_{n-1}]\). Each function is assumed to be smooth with an implemented gradient() method. All functions must have the same domain. The number of functions (equivalently the length of the list) must be strictly greater than 1.
sampler (An instance of a CIL Sampler class ( sampler()) or of another class which has a __next__ function implemented to output integers in \({0,...,n-1}\).) – This sampler is called each time gradient is called and sets the internal function_num passed to the approximate_gradient function. Default is Sampler.random_with_replacement(len(functions)).

Note

We ensure that the approximate gradient is of a similar order of magnitude to the full gradient calculation. For example, in the SGFunction we approximate the full gradient by \(n\nabla f_i\) for an index \(i\) given by the sampler. The multiplication by \(n\) is a choice to more easily allow comparisons between stochastic and non-stochastic methods and between stochastic methods with varying numbers of subsets.

Note

Each time gradient is called the class keeps track of which functions have been used to calculate the gradient. This may be useful for debugging or plotting after using this function in an iterative algorithm.

The property data_passes_indices is a list of lists holding the indices of the functions that are processed in each call of gradient. This list is updated each time gradient is called by appending a list of the indices of the functions used to calculate the gradient.

The property data_passes is a list of floats that holds the amount of data that has been processed up until each call of gradient. This list is updated each time gradient is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Warning: if your functions do not contain an equal amount of data, for example your data was not partitioned into equal batches, then you must first use the `set_data_partition_weights” function for this to be accurate.

Note

The gradient() returns the approximate gradient depending on an index provided by the sampler method.

Example

This class is an abstract base class, so we give an example using the SGFunction child class.

Consider the objective is to minimise:

\[\sum_{i=0}^{n-1} f_{i}(x) = \sum_{i=0}^{n-1}\|A_{i} x - b_{i}\|^{2}\]

>>> list_of_functions = [LeastSquares(Ai, b=bi)] for Ai,bi in zip(A_subsets, b_subsets))
>>> f = ApproximateGradientSumFunction(list_of_functions)

>>> list_of_functions = [LeastSquares(Ai, b=bi)] for Ai,bi in zip(A_subsets, b_subsets))
>>> sampler = Sampler.sequential(len(list_of_functions))
>>> f = SGFunction(list_of_functions, sampler=sampler)
>>> f.full_gradient(x)
This will return :math:`\sum_{i=0}^{n-1} \nabla f_{i}(x)`
>>> f.gradient(x)
As per the approximate gradient implementation in the SGFunction this will return :math:`\nabla f_{0}`. The choice of the `0` index is because we chose a `sequential` sampler and this is the first time we called `gradient`.
>>> f.gradient(x)
This will return :math:`\nabla f_{1}` because we chose  a `sequential` sampler and this is the second time we called `gradient`.

full_gradient(x, out=None)[source]#

Returns the value of the full gradient of the sum of functions at \(x\).

\[\nabla_x(f_{0} + f_{1} + ... + f_{n-1})(x) = \nabla_xf_{0}(x) + \nabla_xf_{1}(x) + ... + \nabla_xf_{n-1}(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

The value of the gradient of the sum function at x or nothing if out

Return type:

DataContainer

abstractmethod approximate_gradient(x, function_num, out=None)[source]#

Returns the approximate gradient at a given point x given a function_number in {0,…,len(functions)-1}.

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)
function_num (int) – Between 0 and the number of functions in the list

Returns:

the value of the approximate gradient of the sum function at x given a function_number in {0,…,len(functions)-1}

Return type:

DataContainer

gradient(x, out=None)[source]#

Selects a random function using the sampler and then calls the approximate gradient at x

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

the value of the approximate gradient of the sum function at x

Return type:

DataContainer

set_data_partition_weights(weights)[source]#

Setter for the partition weights used to calculate the data passes

Parameters:: weights (list of positive floats that sum to one.) – The proportion of the data held in each function. Equivalent to the proportions that you partitioned your data into.

property data_passes_indices#: The property data_passes_indices is a list of lists holding the indices of the functions that are processed in each call of gradient. This list is updated each time gradient is called by appending a list of the indices of the functions used to calculate the gradient.

property data_passes#: The property data_passes is a list of floats that holds the amount of data that has been processed up until each call of gradient. This list is updated each time gradient is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Note that if your functions do not contain an equal amount of data, for example your data was not partitioned into equal batches, then you must first use the set_data_partition_weights function for this to be accurate.

property L#

Returns the Lipschitz constant for the SumFunction

\[L = \sum_{i} L_{i}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

property Lmax#

Returns the maximum Lipschitz constant for the SumFunction

\[L = \max_{i}\{L_{i}\}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

Stochastic Gradient function#

class cil.optimisation.functions.SGFunction(functions, sampler=None)[source]#

Stochastic gradient function, a child class of ApproximateGradientSumFunction, which defines from a list of functions, \({f_0,...,f_{n-1}}\) a SumFunction, \(f_0+...+f_{n-1}\) where each time the gradient is called, the sampler provides an index, \(i \in {0,...,n-1}\) and the gradient method returns the approximate gradient \(n \nabla_x f_i(x)\). This can be used with the cil.optimisation.algorithms algorithm GD to give a stochastic gradient descent algorithm.

Parameters:

functions (list of functions) – A list of functions: [f_{0}, f_{1}, ..., f_{n-1}]. Each function is assumed to be smooth with an implemented gradient() method. Although CIL does not define a domain of a Function, all functions are supposed to have the same domain. The number of functions must be strictly greater than 1.
sampler (An instance of a CIL Sampler class ( sampler()) or of another class which has a __next__ function implemented to output integers in {0,…,n-1}.) – This sampler is called each time gradient is called and sets the internal function_num passed to the approximate_gradient function. Default is Sampler.random_with_replacement(len(functions)).

approximate_gradient(x, function_num, out=None)[source]#

Returns the gradient of the function at index function_num at x.

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)
function_num (int) – Between 0 and the number of functions in the list

Returns:

the value of the approximate gradient of the sum function at x given a function_number in {0,…,len(functions)-1}

Return type:

DataContainer

property L#

Returns the Lipschitz constant for the SumFunction

\[L = \sum_{i} L_{i}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

property Lmax#

Returns the maximum Lipschitz constant for the SumFunction

\[L = \max_{i}\{L_{i}\}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

property data_passes#: The property data_passes is a list of floats that holds the amount of data that has been processed up until each call of gradient. This list is updated each time gradient is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Note that if your functions do not contain an equal amount of data, for example your data was not partitioned into equal batches, then you must first use the set_data_partition_weights function for this to be accurate.

property data_passes_indices#: The property data_passes_indices is a list of lists holding the indices of the functions that are processed in each call of gradient. This list is updated each time gradient is called by appending a list of the indices of the functions used to calculate the gradient.

full_gradient(x, out=None)#

Returns the value of the full gradient of the sum of functions at \(x\).

\[\nabla_x(f_{0} + f_{1} + ... + f_{n-1})(x) = \nabla_xf_{0}(x) + \nabla_xf_{1}(x) + ... + \nabla_xf_{n-1}(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

The value of the gradient of the sum function at x or nothing if out

Return type:

DataContainer

gradient(x, out=None)#

Selects a random function using the sampler and then calls the approximate gradient at x

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

the value of the approximate gradient of the sum function at x

Return type:

DataContainer

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

set_data_partition_weights(weights)#

Setter for the partition weights used to calculate the data passes

Parameters:: weights (list of positive floats that sum to one.) – The proportion of the data held in each function. Equivalent to the proportions that you partitioned your data into.

SAG function#

class cil.optimisation.functions.SAGFunction(functions, sampler=None)[source]#

The stochastic average gradient (SAG) function takes a index \(i_k\) and calculates the approximate gradient of \(\sum_{i=0}^{n-1}f_i\) at iteration \(x_k\) as

\[\begin{split}\sum_{i=0}^{n-1} g_i^k \qquad \text{where} \qquad g_i^k= \begin{cases} \nabla f_i(x_k), \text{ if } i=i_k\\ g_i^{k-1},\text{ otherwise } \end{cases}\end{split}\]

The idea is that by incorporating a memory of previous gradient values the SAG method can achieve a faster convergence rate than black-box stochastic gradient methods.

Note

Compared with the literature, we do not divide by \(n\), the number of functions, so that we return an approximate gradient of the whole sum function and not an average gradient.

Note

Reference: Schmidt, M., Le Roux, N. and Bach, F., 2017. Minimizing finite sums with the stochastic average gradient. Mathematical Programming, 162, pp.83-112. https://doi.org/10.1007/s10107-016-1030-6.

Parameters:

functions (list of functions) – A list of functions: \(f_{0}, f_{1}, ..., f_{n-1}\). Each function is assumed to be smooth with an implemented gradient() method. All functions must have the same domain. The number of functions (equivalently the length of the list n) must be strictly greater than 1.
sampler (An instance of a CIL Sampler class ( sampler()) or of another class which has a next function implemented to output integers in \({0,...,n-1}\).) – This sampler is called each time gradient is called and sets the internal function_num passed to the approximate_gradient function. Default is Sampler.random_with_replacement(len(functions)).

Note

The user has the option of calling the class method warm_start_approximate_gradients after initialising this class. This will compute and store the gradient for each function at an initial point, equivalently setting \(g_i^0=\nabla f_i(x_0)\) for initial point \(x_0\). If this method is not called, the gradients are initialised with zeros.

Note

This function’s memory requirements are n + 3 times the image space, that is with 100 subsets the memory requirement is 103 images, which is huge.

approximate_gradient(x, function_num, out=None)[source]#

SAG approximate gradient, calculated at the point \(x\) and updated using the function index given by function_num.

Parameters:

x (DataContainer (e.g. ImageData object)) – Element in the domain of the functions
function_num (int) – Between 0 and the number of functions in the list

warm_start_approximate_gradients(initial)[source]#

A function to warm start SAG or SAGA algorithms by initialising all the gradients at an initial point. Equivalently setting \(g_i^0 = \nabla f_i(x_0)\) for initial point \(x_0\).

Parameters:: initial (DataContainer,) – The initial point to warmstart the calculation

Note

When using SAG or SAGA with a deterministic algorithm, you should warm start the SAG-SAGA Function with the same initial point that you initialise the algorithm

property data_passes_indices#: The property data_passes_indices is a list of lists holding the indices of the functions that are processed in each call of gradient. This list is updated each time gradient is called by appending a list of the indices of the functions used to calculate the gradient. This is overwritten from the base class to first check to see if the approximate gradient was warm started and, if it was, ensure that the first element of data_passes_indices contains each index used to warm start and the index used in the first call to gradient. Thus the length of data_passes_indices is always equal to the number of calls to gradient.

property L#

Returns the Lipschitz constant for the SumFunction

\[L = \sum_{i} L_{i}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

property Lmax#

Returns the maximum Lipschitz constant for the SumFunction

\[L = \max_{i}\{L_{i}\}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

property data_passes#: The property data_passes is a list of floats that holds the amount of data that has been processed up until each call of gradient. This list is updated each time gradient is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Note that if your functions do not contain an equal amount of data, for example your data was not partitioned into equal batches, then you must first use the set_data_partition_weights function for this to be accurate.

full_gradient(x, out=None)#

Returns the value of the full gradient of the sum of functions at \(x\).

\[\nabla_x(f_{0} + f_{1} + ... + f_{n-1})(x) = \nabla_xf_{0}(x) + \nabla_xf_{1}(x) + ... + \nabla_xf_{n-1}(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

The value of the gradient of the sum function at x or nothing if out

Return type:

DataContainer

gradient(x, out=None)#

Selects a random function using the sampler and then calls the approximate gradient at x

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

the value of the approximate gradient of the sum function at x

Return type:

DataContainer

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

set_data_partition_weights(weights)#

Setter for the partition weights used to calculate the data passes

Parameters:: weights (list of positive floats that sum to one.) – The proportion of the data held in each function. Equivalent to the proportions that you partitioned your data into.

SAGA function#

class cil.optimisation.functions.SAGAFunction(functions, sampler=None)[source]#

SAGA (SAG-Ameliore) is an accelerated version of the stochastic average gradient (SAG) function which takes a index \(i_k\) and calculates the approximate gradient of \(\sum_{i=0}^{n-1}f_i\) at iteration \(x_k\) as

\[\begin{split}n\left(g_{i_k}^{k}-g_{i_k}^{k-1}\right)+\sum_{i=0}^{n-1} g_i^{k-1} \qquad \text{where} \qquad g_i^k= \begin{cases} \nabla f_i(x_k), \text{ if } i=i_k\\ g_i^{k-1},\text{ otherwise} \end{cases}\end{split}\]

SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates. Compared to SAG it is an unbiased estimator.

Note

Compared with the literature, we do not divide by \(n\), the number of functions, so that we return an approximate gradient of the whole sum function and not an average gradient.

Note

This function’s memory requirements are n + 3 times the image space, that is with 100 subsets the memory requirement is 103 images, which is huge.

Note

Reference: Defazio, A., Bach, F. and Lacoste-Julien, S., 2014. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in neural information processing systems, 27. https://proceedings.neurips.cc/paper_files/paper/2014/file/ede7e2b6d13a41ddf9f4bdef84fdc737-Paper.pdf

Parameters:

functions (list of functions) – A list of functions: [f_{0}, f_{1}, ..., f_{n-1}]. Each function is assumed to be smooth function with an implemented gradient() method. Each function must have the same domain. The number of functions must be strictly greater than 1.
sampler (An instance of one of the sampler() classes which has a next function implemented and a num_indices property.) – This sampler is called each time gradient is called and sets the internal function_num passed to the approximate_gradient function. The num_indices must match the number of functions provided. Default is Sampler.random_with_replacement(len(functions)).

Note

property L#

Returns the Lipschitz constant for the SumFunction

\[L = \sum_{i} L_{i}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

property Lmax#

Returns the maximum Lipschitz constant for the SumFunction

\[L = \max_{i}\{L_{i}\}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

approximate_gradient(x, function_num, out=None)#

SAG approximate gradient, calculated at the point \(x\) and updated using the function index given by function_num.

Parameters:

x (DataContainer (e.g. ImageData object)) – Element in the domain of the functions
function_num (int) – Between 0 and the number of functions in the list

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

property data_passes#: The property data_passes is a list of floats that holds the amount of data that has been processed up until each call of gradient. This list is updated each time gradient is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Note that if your functions do not contain an equal amount of data, for example your data was not partitioned into equal batches, then you must first use the set_data_partition_weights function for this to be accurate.

property data_passes_indices#: The property data_passes_indices is a list of lists holding the indices of the functions that are processed in each call of gradient. This list is updated each time gradient is called by appending a list of the indices of the functions used to calculate the gradient. This is overwritten from the base class to first check to see if the approximate gradient was warm started and, if it was, ensure that the first element of data_passes_indices contains each index used to warm start and the index used in the first call to gradient. Thus the length of data_passes_indices is always equal to the number of calls to gradient.

full_gradient(x, out=None)#

Returns the value of the full gradient of the sum of functions at \(x\).

\[\nabla_x(f_{0} + f_{1} + ... + f_{n-1})(x) = \nabla_xf_{0}(x) + \nabla_xf_{1}(x) + ... + \nabla_xf_{n-1}(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

The value of the gradient of the sum function at x or nothing if out

Return type:

DataContainer

gradient(x, out=None)#

Selects a random function using the sampler and then calls the approximate gradient at x

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

the value of the approximate gradient of the sum function at x

Return type:

DataContainer

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

set_data_partition_weights(weights)#

Setter for the partition weights used to calculate the data passes

Parameters:: weights (list of positive floats that sum to one.) – The proportion of the data held in each function. Equivalent to the proportions that you partitioned your data into.

warm_start_approximate_gradients(initial)#

A function to warm start SAG or SAGA algorithms by initialising all the gradients at an initial point. Equivalently setting \(g_i^0 = \nabla f_i(x_0)\) for initial point \(x_0\).

Parameters:: initial (DataContainer,) – The initial point to warmstart the calculation

Note

When using SAG or SAGA with a deterministic algorithm, you should warm start the SAG-SAGA Function with the same initial point that you initialise the algorithm

Stochastic Variance Reduced Gradient Function#

class cil.optimisation.functions.SVRGFunction(functions, sampler=None, snapshot_update_interval=None, store_gradients=False)[source]#

The Stochastic Variance Reduced Gradient (SVRG) function calculates the approximate gradient of \(\sum_{i=1}^{n-1}f_i\). For this approximation, every snapshot_update_interval number of iterations, a full gradient calculation is made at this “snapshot” point. Intermediate gradient calculations update this snapshot by taking a index \(i_k\) and calculating the gradient of \(f_{i_k}\)’s at the current iterate and the snapshot, updating the approximate gradient to be:

\[n*\nabla f_{i_k}(x_k) - n*\nabla f_{i_k}(\tilde{x}) + \nabla \sum_{i=0}^{n-1}f_i(\tilde{x}),\]

where \(\tilde{x}\) is the latest “snapshot” point and \(x_k\) is the value at the current iteration.

Note

Compared with the literature, we multiply by \(n\), the number of functions, so that we return an approximate gradient of the whole sum function and not an average gradient.

Note

In the case where store_gradients=False the memory requirements are 4 times the image size (1 stored full gradient at the “snapshot”, one stored “snapshot” point and two lots of intermediary calculations). Alternatively, if store_gradients=True the memory requirement is n+4 (n gradients at the snapshot for each function in the sum, one stored full gradient at the “snapshot”, one stored “snapshot” point and two lots of intermediary calculations).

Johnson, R. and Zhang, T., 2013. Accelerating stochastic gradient descent using predictive variance reduction. Advances in neural information processing systems, 26.https://proceedings.neurips.cc/paper_files/paper/2013/file/ac1dd209cbcc5e5d1c6e28598e8cbbe8-Paper.pdf

Parameters:

functions (list of functions) – A list of functions: [f_{0}, f_{1}, ..., f_{n-1}]. Each function is assumed to be smooth with an implemented gradient() method. All functions must have the same domain. The number of functions must be strictly greater than 1.
sampler (An instance of a CIL Sampler class ( sampler()) or of another class which has a next function implemented to output integers in {0, 1, …, n-1}.) – This sampler is called each time gradient is called and sets the internal function_num passed to the approximate_gradient function. Default is Sampler.random_with_replacement(len(functions)).
snapshot_update_interval (positive int or None, optional) – The interval for updating the full gradient (taking a snapshot). The default is 2*len(functions) so a “snapshot” is taken every 2*len(functions) iterations. If the user passes 0 then no full gradient snapshots will be taken.
store_gradients (bool, default: False) – Flag indicating whether to store and update a list of gradients for each function \(f_i\) or just to store the snapshot point \(\tilde{x}\) and its gradient \(\nabla \sum_{i=0}^{n-1}f_i(\tilde{x})\).

gradient(x, out=None)[source]#

Selects a random function using the sampler and then calls the approximate gradient at x or calculates a full gradient depending on the update frequency

Parameters:

x (DataContainer (e.g. ImageData object))
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

the value of the approximate gradient of the sum function at x

Return type:

DataContainer (e.g. ImageData object)

approximate_gradient(x, function_num, out=None)[source]#

Calculates the stochastic gradient at the point \(x\) by using the gradient of the selected function, indexed by \(i_k\), the function_number in {0,…,len(functions)-1}, and the full gradient at the snapshot \(\tilde{x}\)

\[n*\nabla f_{i_k}(x_k) - n*\nabla f_{i_k}(\tilde{x}) + \nabla \sum_{i=0}^{n-1}f_i(\tilde{x})\]

Note

Compared with the literature, we multiply by \(n\), the number of functions, so that we return an approximate gradient of the whole sum function and not an average gradient.

Parameters:

x (DataContainer (e.g. ImageData object))
out (return DataContainer, if None a new DataContainer is returned, default None.)
function_num (int) – Between 0 and n-1, where n is the number of functions in the list

Returns:

the value of the approximate gradient of the sum function at x given a function_number in {0,…,len(functions)-1}

Return type:

DataContainer (e.g. ImageData object)

property L#

Returns the Lipschitz constant for the SumFunction

\[L = \sum_{i} L_{i}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

property Lmax#

Returns the maximum Lipschitz constant for the SumFunction

\[L = \max_{i}\{L_{i}\}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

property data_passes#: The property data_passes is a list of floats that holds the amount of data that has been processed up until each call of gradient. This list is updated each time gradient is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Note that if your functions do not contain an equal amount of data, for example your data was not partitioned into equal batches, then you must first use the set_data_partition_weights function for this to be accurate.

property data_passes_indices#: The property data_passes_indices is a list of lists holding the indices of the functions that are processed in each call of gradient. This list is updated each time gradient is called by appending a list of the indices of the functions used to calculate the gradient.

full_gradient(x, out=None)#

Returns the value of the full gradient of the sum of functions at \(x\).

\[\nabla_x(f_{0} + f_{1} + ... + f_{n-1})(x) = \nabla_xf_{0}(x) + \nabla_xf_{1}(x) + ... + \nabla_xf_{n-1}(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

The value of the gradient of the sum function at x or nothing if out

Return type:

DataContainer

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

set_data_partition_weights(weights)#

Setter for the partition weights used to calculate the data passes

Parameters:: weights (list of positive floats that sum to one.) – The proportion of the data held in each function. Equivalent to the proportions that you partitioned your data into.

Loopless Stochastic Variance Reduced Gradient Function#

class cil.optimisation.functions.LSVRGFunction(functions, sampler=None, snapshot_update_probability=None, store_gradients=False, seed=None)[source]#

A class representing a function for Loopless Stochastic Variance Reduced Gradient (SVRG) approximation. This is similar to SVRG, except the full gradient at a “snapshot” is calculated at random intervals rather than at fixed numbers of iterations.

Parameters:

functions (list of functions) – A list of functions: [f_{0}, f_{1}, ..., f_{n-1}]. Each function is assumed to be smooth with an implemented gradient() method. All functions must have the same domain. The number of functions n must be strictly greater than 1.
sampler (An instance of a CIL Sampler class ( sampler()) or of another class which has a next function implemented to output integers in {0,…,n-1}.) – This sampler is called each time gradient is called and sets the internal function_num passed to the approximate_gradient function. Default is Sampler.random_with_replacement(len(functions)).
snapshot_update_probability (positive float, default: 1/n) – The probability of updating the full gradient (taking a snapshot) at each iteration. The default is \(1./n\) so, in expectation, a snapshot will be taken every \(n\) iterations.
store_gradients (bool, default: False) – Flag indicating whether to store and update a list of gradients for each function \(f_i\) or just to store the snapshot point \(\tilde{x}\) and it’s gradient \(\nabla \sum_{i=0}^{n-1}f_i(\tilde{x})\).
seed (int) – Seed for calculating the random update snapshot intervals (generated using numpy.random).

Reference#

Kovalev, D., Horváth, S. &; Richtárik, P.. (2020). Don’t Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop. Proceedings of the 31st International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 117:451-467 Available from https://proceedings.mlr.press/v117/kovalev20a.html.

Note

property L#

Returns the Lipschitz constant for the SumFunction

\[L = \sum_{i} L_{i}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

property Lmax#

Returns the maximum Lipschitz constant for the SumFunction

\[L = \max_{i}\{L_{i}\}\]

where \(L_{i}\) is the Lipschitz constant of the smooth function \(F_{i}\).

approximate_gradient(x, function_num, out=None)#

\[n*\nabla f_{i_k}(x_k) - n*\nabla f_{i_k}(\tilde{x}) + \nabla \sum_{i=0}^{n-1}f_i(\tilde{x})\]

Note

Compared with the literature, we multiply by \(n\), the number of functions, so that we return an approximate gradient of the whole sum function and not an average gradient.

Parameters:

x (DataContainer (e.g. ImageData object))
out (return DataContainer, if None a new DataContainer is returned, default None.)
function_num (int) – Between 0 and n-1, where n is the number of functions in the list

Returns:

the value of the approximate gradient of the sum function at x given a function_number in {0,…,len(functions)-1}

Return type:

DataContainer (e.g. ImageData object)

centered_at(center)#

Returns a translated function, namely if we have a function \(F(x)\) the center is at the origin.: TranslateFunction is \(F(x - b)\) and the center is at point b.

Parameters:: center (DataContainer) – The point to center the function at.
Return type:: The translated function.

convex_conjugate(x)#

Evaluation of the function F* at x, where F* is the convex conjugate of function F,

\[F^{*}(x^{*}) = \underset{x}{\sup} \langle x^{*}, x \rangle - F(x)\]

Parameters:: x (DataContainer)
Return type:: The value of the convex conjugate of the function at x.

property data_passes#: The property data_passes is a list of floats that holds the amount of data that has been processed up until each call of gradient. This list is updated each time gradient is called by appending the proportion of the data used when calculating the approximate gradient since the class was initialised (a full gradient calculation would be 1 full data pass). Note that if your functions do not contain an equal amount of data, for example your data was not partitioned into equal batches, then you must first use the set_data_partition_weights function for this to be accurate.

property data_passes_indices#: The property data_passes_indices is a list of lists holding the indices of the functions that are processed in each call of gradient. This list is updated each time gradient is called by appending a list of the indices of the functions used to calculate the gradient.

full_gradient(x, out=None)#

Returns the value of the full gradient of the sum of functions at \(x\).

\[\nabla_x(f_{0} + f_{1} + ... + f_{n-1})(x) = \nabla_xf_{0}(x) + \nabla_xf_{1}(x) + ... + \nabla_xf_{n-1}(x)\]

Parameters:

x (DataContainer)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

The value of the gradient of the sum function at x or nothing if out

Return type:

DataContainer

proximal(x, tau, out=None)#

Returns the proximal operator of function \(\tau F\) evaluated at x

\[\text{prox}_{\tau F}(x) = \underset{z}{\text{argmin}} \frac{1}{2}\|z - x\|^{2} + \tau F(z)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the proximal operator of the function at x with scalar \(\tau\).

proximal_conjugate(x, tau, out=None)#

Returns the proximal operator of the convex conjugate of function \(\tau F\) evaluated at \(x^{*}\)

\[\text{prox}_{\tau F^{*}}(x^{*}) = \underset{z^{*}}{\text{argmin}} \frac{1}{2}\|z^{*} - x^{*}\|^{2} + \tau F^{*}(z^{*})\]

Due to Moreau’s identity, we have an analytic formula to compute the proximal operator of the convex conjugate \(F^{*}\)

\[\text{prox}_{\tau F^{*}}(x) = x - \tau\text{prox}_{\tau^{-1} F}(\tau^{-1}x)\]

Parameters:

x (DataContainer)
tau (scalar)
out (return DataContainer, if None a new DataContainer is returned, default None.)

Return type:

DataContainer, the value of the proximal operator of the convex conjugate at point \(x\) for scalar \(\tau\) or None if out.

set_data_partition_weights(weights)#

Setter for the partition weights used to calculate the data passes

Parameters:: weights (list of positive floats that sum to one.) – The proportion of the data held in each function. Equivalent to the proportions that you partitioned your data into.

gradient(x, out=None)[source]#

Selects a random function using the sampler and then calls the approximate gradient at x or calculates a full gradient depending on the update probability.

Parameters:

x (DataContainer (e.g. ImageData objects))
out (return DataContainer, if None a new DataContainer is returned, default None.)

Returns:

the value of the approximate gradient of the sum function at x

Return type:

DataContainer (e.g. ImageData object)

Utilities#

Contains utilities for the CIL optimisation framework.

Samplers#

Here, we define samplers that select from a list of indices {0, 1, …, N-1} either randomly or by some deterministic pattern. The cil.optimisation.utilities.sampler class defines a function next() which gives the next sample. It also has utility to get_samples to access which samples have or will be drawn.

For ease of use we provide the following static methods in cil.optimisation.utilities.sampler that allow you to configure your sampler object rather than initialising the classes directly:

static Sampler.from_function(num_indices, function, prob_weights=None)[source]#

Instantiate a sampler that wraps a function for index selection.

Parameters:

num_indices (int) – The sampler will select from a range of indices 0 to num_indices.
function (callable) – A deterministic function that takes an integer as an argument, representing the iteration number, and returns an integer between 0 and num_indices. The function signature should be function(iteration_number: int) -> int
prob_weights (list of floats of length num_indices that sum to 1. Default is [1 / num_indices] * num_indices) – Consider that the sampler is incremented a large number of times this argument holds the expected number of times each index would be outputted, normalised to 1.

Returns:

An instance of the Sampler class which samples from a function.

Return type:

Sampler

Example

This example creates a sampler that always outputs 2. The probability weights are passed to the sampler as they are not uniform.

>>> num_indices=3
>>>
>>> def my_sampling_function(iteration_number):
>>>     return 2
>>>
>>> sampler = Sampler.from_function(num_indices=num_indices, function=my_sampling_function, prob_weights=[0, 0, 1])
>>> print(list(sampler.view_samples(12)))
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

Example

This example creates a sampler that outputs sequential indices, starting from 1. The probability weights are not passed to the sampler as they are uniform.

>>> num_indices=10
>>>
>>> def my_sampling_function(iteration_number):
>>>     return (iteration_number+1)%10
>>>
>>> sampler = Sampler.from_function(num_indices=num_indices, function=my_sampling_function)
>>> print(list(sampler.view_samples(25)))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]

Example

This example creates a sampler that samples in order from a custom list. The num_indices is 6, although note that the index 5 is never output by the sampler. The number of indices must be at least one greater than any of the elements in the custom_list. The probability weights are passed to the sampler as they are not uniform.

>>> custom_list = [0,0,0,0,0,0,3,2,1,4]
>>> num_indices = 6
>>>
>>> def my_sampling_function(iteration_number, custom_list=custom_list]):
>>>    return(custom_list[iteration_number%len(custom_list)])
>>>
>>> sampler = Sampler.from_function(num_indices=num_indices, function=my_sampling_function, prob_weights=[0.6, 0.1, 0.1, 0.1, 0.1, 0.0])
>>> print(list(sampler.view_samples(25)))
[0, 0, 0, 0, 0, 0, 3, 2, 1, 4, 0, 0, 0, 0, 0, 0, 3, 2, 1, 4, 0, 0, 0, 0, 0]
>>> print(sampler)
Sampler that wraps a function that takes an iteration number and selects from a list of indices {0, 1, …, N-1}, where N is the number of indices.
Type : from_function
Current iteration number : 0
number of indices : 6
Probability weights : [0.6, 0.1, 0.1, 0.1, 0.1, 0.0]

static Sampler.sequential(num_indices)[source]#

Instantiates a sampler that outputs sequential indices.

Parameters:: num_indices (int) – The sampler will select from a range of indices 0 to num_indices.
Returns:: An instance of the Sampler class that will generate indices sequentially.
Return type:: Sampler

Example

>>> sampler = Sampler.sequential(10)
>>> print(sampler.view_samples(5))
[0 1 2 3 4]
>>> print(next(sampler))
0
>>> print(sampler.next())
1
>>> print(sampler.get_current_sample())
1

static Sampler.staggered(num_indices, stride)[source]#

Instantiates a sampler which outputs in a staggered order.

Parameters:

num_indices (int) – The sampler will select from a range of indices 0 to num_indices.
stride (int) – The stride between returned indices. The stride should be less than the num_indices.

Returns:

An instance of the Sampler class that will generate indices in a staggered pattern.

Return type:

Sampler

Example

>>> sampler = Sampler.staggered(num_indices=21, stride=4)
>>> print(next(sampler))
0
>>> print(sampler.next())
4
>>> print(sampler.view_samples(5))
[ 0  4  8 12 16]
>>> print(sampler.get_previous_samples())
[0 4]

Example

>>> sampler = Sampler.staggered(num_indices=17, stride=8)
>>> print(next(sampler))
0
>>> print(sampler.next())
8
>>> print(sampler.view_samples(10))
[ 0  8  16 1 9 2 10 3 11 4]
>>> print(sampler.get_previous_samples())
[0 8]
>>> print(sampler.get_current_sample())
8

static Sampler.herman_meyer(num_indices)[source]#

Instantiates a sampler which outputs in a Herman Meyer order.

Parameters:: num_indices (int) – The sampler will select from a range of indices 0 to num_indices. For Herman-Meyer sampling this number should not be prime.
Returns:: An instance of the Sampler class which outputs in a Herman Meyer order.
Return type:: Sampler

Reference#

With thanks to Imraj Singh and Zeljko Kereta for their help with the initial implementation of the Herman Meyer sampling. Their implementation was used in:

Singh I, et al. Deep Image Prior PET Reconstruction using a SIRF-Based Objective - IEEE MIC, NSS & RTSD 2022. https://discovery.ucl.ac.uk/id/eprint/10176077/1/MIC_Conference_Record.pdf

The sampling method was introduced in:

Herman GT, Meyer LB. Algebraic reconstruction techniques can be made computationally efficient. IEEE Trans Med Imaging. doi: 10.1109/42.241889.

Example

>>> sampler=Sampler.herman_meyer(12)
>>> print(sampler.view_samples(16))
[ 0  6  3  9  1  7  4 10  2  8  5 11  0  6  3  9]

static Sampler.random_with_replacement(num_indices, prob=None, seed=None)[source]#

Instantiates a sampler which outputs an index between 0 - num_indices with a given probability.

Parameters:

num_indices (int) – The sampler will select from a range of indices 0 to num_indices
prob (list of floats, optional) – The probability for each index to be selected by the ‘next’ operation. If not provided, the indices will be sampled uniformly. The list should have a length equal to num_indices, and the values should sum to 1
seed (int, optional) – Used to initialise the random number generator where repeatability is required.

Returns:

An instance of the RandomSampler class that will generate indices randomly with replacement

Return type:

RandomSampler

Example

>>> sampler = Sampler.random_with_replacement(5)
>>> print(sampler.view_samples(10))
[3 4 0 0 2 3 3 2 2 1]
>>> print(next(sampler))
3
>>> print(sampler.next())
4
>>> print(sampler.get_current_sample())
4

Example

>>> sampler = Sampler.random_with_replacement(num_indices=4, prob=[0.7,0.1,0.1,0.1])
>>> print(sampler.view_samples(10))
[0 1 3 0 0 3 0 0 0 0]
>>> print(sampler.get_previous_samples())
[]

static Sampler.random_without_replacement(num_indices, seed=None)[source]#

Instantiates a sampler which outputs an index between 0 - num_indices. Once sampled the index will not be sampled again until all indices have been returned.

Parameters:

num_indices (int) – The sampler will select from a range of indices 0 to num_indices.
seed (int, optional) – Used to initialise the random number generator where repeatability is required.

Returns:

An instance of the RandomSampler class that will generate indices randomly without replacement

Return type:

RandomSampler

Example

>>> sampler=Sampler.randomWithoutReplacement(num_indices=7, seed=1)
>>> print(sampler.view_samples(16))
[6 2 1 0 4 3 5 1 0 4 2 5 6 3 3 2]

They will all instantiate a Sampler defined in the following class:

class cil.optimisation.utilities.Sampler(num_indices, function, sampling_type=None, prob_weights=None)[source]#

Initialises a sampler that returns and then increments indices from a sequence defined by a function.

Static methods to easily configure several samplers are provided, such as sequential, staggered, Herman-Mayer, random with and without replacement.

Custom deterministic samplers can be created by using the from_function static method or by subclassing this sampler class.

Parameters:

function (Callable[[int], int]) – A function that takes an integer iteration number and returns an integer between 0 and num_indices.
num_indices (int) – The sampler will select from a range of indices 0 to num_indices.
sampling_type (str, optional, default = None) – The sampling type used. This is recorded for reference and printed when print is called.
prob_weights (list of floats of length num_indices that sum to 1. Default is [1 / num_indices] * num_indices) – Consider that the sampler is incremented a large number of times this argument holds the expected number of times each index would be outputted, normalised to 1.

Returns:

An instance of the Sampler class representing the desired configuration.

Return type:

Sampler

Example

>>> sampler = Sampler.random_with_replacement(5)
>>> print(sampler.view_samples(20))
[3 4 0 0 2 3 3 2 2 1 1 4 4 3 0 2 4 4 2 4]
>>> print(next(sampler))
3
>>> print(sampler.next())
4
>>> print(sampler.get_current_sample())
4

Example

>>> sampler = Sampler.staggered(num_indices=21, stride=4)
>>> print(next(sampler))
0
>>> print(sampler.next())
4
>>> print(sampler.view_samples(5))
[ 0  4  8 12 16]
>>> print(sampler.get_previous_samples())
[0 4]

Example

>>> sampler = Sampler.sequential(10)
>>> print(sampler.get_previous_samples())
[]
>>> print(sampler.view_samples(5))
[0 1 2 3 4]
>>> print(next(sampler))
0
>>> print(sampler.next())
1
>>> print(sampler.get_current_sample())
1
>>> print(sampler.get_previous_samples())
[0 1]

Example

>>> sampler = Sampler.herman_meyer(12)
>>> print(sampler.view_samples(16))
[ 0  6  3  9  1  7  4 10  2  8  5 11  0  6  3  9]

Example

This example creates a sampler that outputs sequential indices, starting from 1.

>>> num_indices=10
>>>
>>> def my_sampling_function(iteration_number):
>>>     return (iteration_number+1)%10
>>>
>>> sampler = Sampler.from_function(num_indices=num_indices, function=my_sampling_function)
>>> print(list(sampler.view_samples(25)))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]

Note

The optimal choice of sampler depends on the data and the number of calls to the sampler. Note that a low number of calls to a random sampler won’t give an even distribution. For a small number of samples (e.g. <5*num_indices) the user may wish to consider another sampling method e.g. random without replacement, which, when calling num_indices samples is guaranteed to draw each index exactly once.

The random samplers are instantiated from a random sampling class which is a child class of cil.optimisation.utilities.sampler and provides options for sampling with and without replacement:

class cil.optimisation.utilities.SamplerRandom(num_indices, seed=None, replace=True, prob=None, sampling_type='random_with_replacement')[source]#

The user is recommended to not instantiate this class directly but instead use one of the static methods in the parent Sampler class that will return instances of different samplers.

This class produces Samplers that output random samples with and without replacement from the set {0, 1, …, N-1} where N=num_indices.

Custom random samplers can be created by subclassing this sampler class.

Parameters:

num_indices (int) – The sampler will select from a range of indices 0 to num_indices.
sampling_type (str, optional, default = 'random_with_replacement") – The sampling type used. This is recorded for reference and printed when print is called.
prob_weights (list of floats of length num_indices that sum to 1. Default is [1 / num_indices] * num_indices) – Consider that the sampler is incremented a large number of times this argument holds the expected number of times each index would be outputted, normalised to 1.
replace (bool, default is True) – If True, sample with replace, otherwise sample without replacement
seed (int, optional) – Used to initialise the random number generator where repeatability is required.

Returns:

An instance of the Sampler class representing the desired configuration.

Return type:

Sampler

Example

>>> sampler = Sampler.random_with_replacement(5)
>>> print(sampler.view_samples(20))
[3 4 0 0 2 3 3 2 2 1 1 4 4 3 0 2 4 4 2 4]

Example

>>> sampler=Sampler.randomWithoutReplacement(num_indices=7, seed=1)
>>> print(sampler.view_samples(16))
[6 2 1 0 4 3 5 1 0 4 2 5 6 3 3 2]

Callbacks#

A list of Callback s to be executed each iteration can be passed to Algorithm’s run method.

from cil.optimisation.utilities.callbacks import LogfileCallback
...
algorithm.run(..., callbacks=[LogfileCallback("log.txt")])

class cil.optimisation.utilities.callbacks.Callback(verbose=1)[source]#

Base Callback to inherit from for use in Algorithm.run(callbacks: list[Callback]).

Parameters:: verbose (int, choice of 0,1,2, default 1) – 0=quiet, 1=info, 2=debug.

Built-in callbacks include:

class cil.optimisation.utilities.callbacks.ProgressCallback(verbose=1, tqdm_class=<class 'tqdm.asyncio.tqdm_asyncio'>, **tqdm_kwargs)[source]#

tqdm-based progress bar.

Parameters:

tqdm_class (default tqdm.auto.tqdm)
**tqdm_kwargs – Passed to tqdm_class.

class cil.optimisation.utilities.callbacks.TextProgressCallback(verbose=1, *, tqdm_class=<class 'cil.optimisation.utilities.callbacks._TqdmText'>, **tqdm_kwargs)[source]#

ProgressCallback but printed on separate lines to screen.

Parameters:: miniters (int, default Algorithm.update_objective_interval) – Number of algorithm iterations between screen prints.

class cil.optimisation.utilities.callbacks.LogfileCallback(log_file, mode='a', **kwargs)[source]#

TextProgressCallback but to a file instead of screen.

Parameters:

log_file (FileDescriptorOrPath) – Passed to open().
mode (str) – Passed to open().

class cil.optimisation.utilities.callbacks.EarlyStoppingObjectiveValue(threshold=1e-06)[source]#

Callback that stops iterations if the change in the objective value is less than a provided threshold value.

Parameters:: threshold (float, default 1e-6)

Note

This callback only compares the last two calculated objective values. If update_objective_interval is greater than 1, the objective value is not calculated at each iteration (which is the default behaviour), only every update_objective_interval iterations.

class cil.optimisation.utilities.callbacks.CGLSEarlyStopping(epsilon=1e-06, omega=1000000.0)[source]#

Callback to work with CGLS. It causes the algorithm to terminate if \(||A^T(Ax-b)||_2 < \epsilon||A^T(Ax_0-b)||_2\) where epsilon is set to default as ‘1e-6’, \(x\) is the current iterate and \(x_0\) is the initial value. It will also terminate if the algorithm begins to diverge i.e. if \(||x||_2> \omega\), where omega is set to default as 1e6.

Parameters:

epsilon (float, default 1e-6) – Usually a small number: the algorithm to terminate if \(||A^T(Ax-b)||_2 < \epsilon||A^T(Ax_0-b)||_2\)
omega (float, default 1e6) – Usually a large number: the algorithm will terminate if \(||x||_2> \omega\)

Note

This callback is implemented to replicate the automatic behaviour of CGLS in CIL versions <=24. It also replicates the behaviour of https://web.stanford.edu/group/SOL/software/cgls/.

Users can also write custom callbacks.

Below is an example of a custom callback implementing early stopping. In each iteration of the TestAlgo, the objective \(x\) is reduced by \(5\). The EarlyStopping callback terminates the algorithm when \(x \le -15\). The algorithm thus terminates after \(3\) iterations.

from cil.optimisation.algorithms import Algorithm
from cil.optimisation.utilities import callbacks

class TestAlgo(Algorithm):
    def __init__(self, *args, **kwargs):
        self.x = 0
        super().__init__(*args, **kwargs)
        self.configured = True

    def update(self):
        self.x -= 5

    def update_objective(self):
        self.loss.append(2 ** self.x)

class EarlyStopping(callbacks.Callback):
    def __call__(self, algorithm: Algorithm):
        if algorithm.x <= -15:  # arbitrary stopping criterion
            raise StopIteration

algo = TestAlgo()
algo.run(20, callbacks=[callbacks.ProgressCallback(), EarlyStopping()])

Output:
 15%|███                 | 3/20 [00:00<00:00, 11770.73it/s, objective=3.05e-5]

Step size methods#

A step size method is a class which acts on an algorithm and can be passed to cil.optimisation.algorithm.GD, cil.optimisation.algorithm.ISTA cil.optimisation.algorithm.FISTA and it’s method get_step_size is called after the calculation of the gradient before the gradient descent step is taken. It outputs a float value to be used as the step-size.

Currently in CIL we have a base class:

class cil.optimisation.utilities.StepSizeMethods.StepSizeRule[source]#

Abstract base class for a step size rule. The abstract method, get_step_size takes in an algorithm and thus can access all parts of the algorithm (e.g. current iterate, current gradient, objective functions etc) and from this should return a float as a step size.

abstractmethod get_step_size(algorithm)[source]#

Returns:: the calculated step size
Return type:: float

We also have a number of example classes:

class cil.optimisation.utilities.StepSizeMethods.ConstantStepSize(step_size)[source]#

Step-size rule that always returns a constant step-size.

Parameters:: step_size (float) – The step-size to be returned with each call.

get_step_size(algorithm)[source]#

Returns:: the calculated step size
Return type:: float

class cil.optimisation.utilities.StepSizeMethods.ArmijoStepSizeRule(alpha=1000000.0, beta=0.5, max_iterations=None, warmstart=True)[source]#

Applies the Armijo rule to calculate the step size (step_size).

The Armijo rule runs a while loop to find the appropriate step_size by starting from a very large number (alpha). The step_size is found by reducing the step size (by a factor beta) in an iterative way until a certain criterion is met. To avoid infinite loops, we add a maximum number of times (max_iterations) the while loop is run.

Reference#

Algorithm 3.1 in Nocedal, J. and Wright, S.J. eds., 1999. Numerical optimization. New York, NY: Springer New York. https://www.math.uci.edu/~qnie/Publications/NumericalOptimization.pdf)
https://projecteuclid.org/download/pdf_1/euclid.pjm/1102995080

param alpha:: The starting point for the step size iterations
type alpha:: float, optional, default=1e6
param beta:: The amount the step_size is reduced if the criterion is not met
type beta:: float between 0 and 1, optional, default=0.5
param max_iterations:: The maximum number of iterations to find a suitable step size
type max_iterations:: integer, optional, default is numpy.ceil (2 * numpy.log10(alpha) / numpy.log10(2))
param warmstart:: If warmstart = True the initial step size at each Armijo iteration is the calculated step size from the last iteration. If warmstart = False at each Armijo iteration, the initial step size is reset to the original, large alpha. In the case of well-behaved convex functions, warmstart = True is likely to be computationally less expensive. In the case of non-convex functions, or particularly tricky functions, setting warmstart = False may be beneficial.
type warmstart:: Boolean, default is True

get_step_size(algorithm)[source]#

Applies the Armijo rule to calculate the step size (step_size)

Returns:: the calculated step size
Return type:: float

class cil.optimisation.utilities.StepSizeMethods.BarzilaiBorweinStepSizeRule(initial, mode='short', stabilisation_param='auto')[source]#

Applies the Barzilai- Borwein rule to calculate the step size (step_size).

Let \(\Delta x=x_k-x_{k-1}\) and \(\Delta g=g_k-g_{k-1}\). Where \(x_k\) is the \(k\) th iterate (current solution after iteration \(k\) ) and \(g_k\) is the gradient calculation in the \(k\) th iterate, found in algorithm.gradient_update. A Barzilai-Borwein (BB) iteration is \(x_{k+1}=x_k-\alpha_kg_k\) where the step size \(\alpha _k\) is either

\(\alpha_k^{LONG}=\frac{\Delta x\cdot\Delta x}{\Delta x\cdot\Delta g}\), or
\(\alpha_k^{SHORT}=\frac{\Delta x \cdot\Delta g}{\Delta g \cdot\Delta g}\).

Where the operator \(\cdot\) is the standard inner product between two vectors.

This is suitable for use with gradient based iterative methods where the calculated gradient is stored as algorithm.gradient_update.

Parameters:

initial (float, greater than zero) – The step-size for the first iteration. We recommend something of the order \(1/f.L\) where \(f\) is the (differentiable part of) the objective you wish to minimise.
mode (One of 'long', 'short' or 'alternate', default is 'short'.) – This calculates the step-size based on the LONG, SHORT or alternating between the two, starting with short.
stabilisation_param ('auto', float or 'off', default is 'auto') – In order to add stability the step-size has an upper limit of \(\Delta/\|g_k\|\) where by ‘default’, the stabilisation_param, \(\Delta\) is determined automatically to be the minimium of \(\Delta x\) from the first 3 iterations. The user can also pass a fixed constant or turn “off” the stabilisation, equivalently passing np.inf.

Reference#

Barzilai, Jonathan; Borwein, Jonathan M. (1988). “Two-Point Step Size Gradient Methods”. IMA Journal of Numerical Analysis. 8: 141–148, https://doi.org/10.1093/imanum/8.1.141
Burdakov, O., Dai, Y. and Huang, N., 2019. STABILIZED BARZILAI-BORWEIN METHOD. Journal of Computational Mathematics, 37(6). https://doi.org/10.4208/jcm.1911-m2019-0171
https://en.wikipedia.org/wiki/Barzilai-Borwein_method

get_step_size(algorithm)[source]#

Applies the B-B rule to calculate the step size (step_size)

Returns:: the calculated step size
Return type:: float

Preconditioners#

A preconditioner is a class which acts on an algorithm and can be passed to cil.optimisation.algorithm.GD, cil.optimisation.algorithm.ISTA or cil.optimisation.algorithm.FISTA and it’s method apply is called after the calculation of the gradient before the gradient descent step is taken. It modifies and returns a passed gradient.

Currently in CIL we have a base class:

class cil.optimisation.utilities.preconditioner.Preconditioner[source]#

Abstract base class for Preconditioner objects. The apply method of this class takes an initialised CIL function as an argument and modifies a provided gradient.

apply(x)[source]#: Abstract method to call the preconditioner.

abstractmethod apply(algorithm, gradient, out)[source]#

Abstract method to apply the preconditioner.

Parameters:

algorithm (Algorithm) – The algorithm object.
gradient (DataContainer) – The calculated gradient to modify
out (DataContainer,) – Container to fill with the modified gradient.

Note

In CIL algorithms, the preconditioners are used in-place. Make sure this method is safe to use in place.

Returns:: The modified gradient
Return type:: DataContainer

We also have a number of already provided pre-conditioners

class cil.optimisation.utilities.preconditioner.Sensitivity(operator)[source]#

Sensitivity preconditioner class.

In each call to the preconditioner the gradient is multiplied by \(1/(A^T \mathbf{1})\) where \(A\) is an operator, \(\mathbf{1}\) is an object in the range of the operator filled with ones.

Parameters:: operator (CIL Operator) – The operator used for sensitivity computation.

compute_preconditioner_matrix()[source]#: Compute the sensitivity. \(A^T \mathbf{1}\) where \(A\) is the operator and \(\mathbf{1}\) is an object in the range of the operator filled with ones. Then perform safe division by the sensitivity to store the preconditioner array \(1/(A^T \mathbf{1})\)

apply(algorithm, gradient, out=None)[source]#

Update the preconditioner.

Parameters:

algorithm (object) – The algorithm object.
gradient (DataContainer) – The calculated gradient to modify
out (DataContainer,) – Container to fill with the modified gradient

Returns:

The modified gradient

Return type:

DataContainer

class cil.optimisation.utilities.preconditioner.AdaptiveSensitivity(operator, delta=1e-06, max_iterations=100, reference=None)[source]#

Adaptive Sensitivity preconditioner class.

In each call to the preconditioner the gradient is multiplied by \((x+\delta) /(A^T \mathbf{1})\) where \(A\) is an operator, \(\mathbf{1}\) is an object in the range of the operator filled with ones. The point \(x\) is the current iteration, or a reference image, and \(\delta\) is a small positive float.

Parameters:

operator (CIL object) – The operator used for sensitivity computation.
delta (float, optional) – The delta value for the preconditioner.
reference (DataContainer e.g. ImageData, default is None) – Reference data, an object in the domain of the operator. Recommended to be a best guess reconstruction. If reference data is passed the preconditioner is always fixed.
max_iterations (int, default = 100) – The maximum number of iterations before the preconditoner is frozen and no-longer updates. Note that if reference data is passed the preconditioner is always frozen and iterations is set to -1.

Note

A reference for the freezing of the preconditioner can be found: Twyman R., Arridge S., Kereta Z., Jin B., Brusaferri L., Ahn S., Stearns CW., Hutton B.F., Burger I.A., Kotasidis F., Thielemans K.. An Investigation of Stochastic Variance Reduction Algorithms for Relative Difference Penalized 3D PET Image Reconstruction. IEEE Trans Med Imaging. 2023 Jan;42(1):29-41. doi: 10.1109/TMI.2022.3203237. Epub 2022 Dec 29. PMID: 36044488.

apply(algorithm, gradient, out=None)[source]#

Update the preconditioner.

Parameters:

algorithm (object) – The algorithm object.
gradient (DataContainer) – The calculated gradient to modify
out (DataContainer,) – Container to fill with the modified gradient

Returns:

The modified gradient

Return type:

DataContainer

Block Framework#

To be able to express more advanced optimisation problems we developed the Block Framework, which provides a generic strategy to treat variational problems in the following form:

\[\min \text{Regulariser} + \text{Fidelity}\]

The block framework consists of:

BlockDataContainer
BlockFunction
BlockOperator

The block framework allows writing more advanced optimisation problems. Consider the typical Tikhonov regularisation:

\[\underset{u}{\mathrm{argmin}}\begin{Vmatrix}A u - b \end{Vmatrix}^2_2 + \alpha^2\|Lu\|^2_2\]

where,

\(A\) is the projection operator
\(b\) is the acquired data
\(u\) is the unknown image to be solved for
\(\alpha\) is the regularisation parameter
\(L\) is a regularisation operator

The first term measures the fidelity of the solution to the data. The second term measures the fidelity to the prior knowledge we have imposed on the system, operator \(L\).

This can be re-written equivalently in the block matrix form:

\[\underset{u}{\mathrm{argmin}}\begin{Vmatrix}\binom{A}{\alpha L} u - \binom{b}{0}\end{Vmatrix}^2_2\]

With the definitions:

\(\tilde{A} = \binom{A}{\alpha L}\)
\(\tilde{b} = \binom{b}{0}\)

this can now be recognised as a least squares problem which can be solved by any algorithm in the cil.optimisation which can solve least squares problem, e.g. CGLS.

\[\underset{u}{\mathrm{argmin}}\begin{Vmatrix}\tilde{A} u - \tilde{b}\end{Vmatrix}^2_2\]

To be able to express our optimisation problems in the matrix form above, we developed the so-called, Block Framework comprising 4 main actors: BlockGeometry, BlockDataContainer, BlockFunction and BlockOperator.

BlockDataContainer#

BlockDataContainer holds DataContainer as column vector. It is possible to do basic algebra between BlockDataContainer s and with numbers, list or numpy arrays.

\[ \begin{align}\begin{aligned}x = [x_{1}, x_{2} ]\in (X_{1}\times X_{2})\\y = [y_{1}, y_{2}, y_{3} ]\in(Y_{1}\times Y_{2} \times Y_{3})\end{aligned}\end{align} \]

class cil.framework.BlockDataContainer(*args, **kwargs)[source]#

Class to hold DataContainers as column vector

Provides basic algebra between BlockDataContainer’s, DataContainer’s and subclasses and Numbers

algebra between `BlockDataContainer`s will be element-wise, only if the shape of the 2 `BlockDataContainer`s is the same, otherwise it will fail
algebra between BlockDataContainer`s and `list or numpy array will work as long as the number of rows and element of the arrays match, independently on the fact that the BlockDataContainer could be nested
algebra between BlockDataContainer and one DataContainer is possible. It will require all the DataContainers in the block to be compatible with the DataContainer we want to operate with.
algebra between BlockDataContainer and a Number is possible and it will be done with each element of the BlockDataContainer even if nested

A = [ [B,C] , D] A * 3 = [ 3 * [B,C] , 3* D] = [ [ 3*B, 3*C] , 3*D ]

__iter__()[source]#: BlockDataContainer is Iterable

next()[source]#: python2 backwards compatibility

is_compatible(other)[source]#: basic check if the size of the 2 objects fit

add(other, *args, **kwargs)[source]#

Algebra: add method of BlockDataContainer with number/DataContainer or BlockDataContainer

Parameters:

other (number, DataContainer or subclasses or BlockDataContainer)
out (BlockDataContainer, optional) – Provides a placeholder for the result

subtract(other, *args, **kwargs)[source]#

Algebra: subtract method of BlockDataContainer with number/DataContainer or BlockDataContainer

Parameters:

other (number, DataContainer or subclasses or BlockDataContainer)
out (BlockDataContainer, optional) – Provides a placeholder for the result

multiply(other, *args, **kwargs)[source]#

Algebra: multiply method of BlockDataContainer with number/DataContainer or BlockDataContainer

Parameters:

other (number, DataContainer or subclasses or BlockDataContainer)
out (BlockDataContainer, optional) – Provides a placeholder for the result

divide(other, *args, **kwargs)[source]#

Algebra: divide method of BlockDataContainer with number/DataContainer or BlockDataContainer

Parameters:

other (number, DataContainer or subclasses or BlockDataContainer)
out (BlockDataContainer, optional) – Provides a placeholder for the result

power(other, *args, **kwargs)[source]#

Algebra: power method of BlockDataContainer with number/DataContainer or BlockDataContainer

Parameters:

other (number, DataContainer or subclasses or BlockDataContainer)
out (BlockDataContainer, optional) – Provides a placeholder for the result

maximum(other, *args, **kwargs)[source]#

Algebra: maximum method of BlockDataContainer with number/DataContainer or BlockDataContainer

Parameters:

other (number, DataContainer or subclasses or BlockDataContainer)
out (BlockDataContainer, optional) – Provides a placeholder for the result

minimum(other, *args, **kwargs)[source]#

Algebra: minimum method of BlockDataContainer with number/DataContainer or BlockDataContainer

Parameters:

other (number, DataContainer or subclasses or BlockDataContainer)
out (BlockDataContainer, optional) – Provides a placeholder for the result

sapyb(a, y, b, out=None, num_threads=2)[source]#

performs axpby element-wise on the BlockDataContainer containers

Does the operation .. math:: a*x+b*y and stores the result in out, where x is self

Parameters:

a (scalar or BlockDataContainer)
b (scalar or BlockDataContainer)
y (compatible (Block)DataContainer)
out (BlockDataContainer, optional) – Provides a placeholder for the result

Example

>>> a = 2
>>> b = 3
>>> ig = ImageGeometry(10,11)
>>> x = ig.allocate(1)
>>> y = ig.allocate(2)
>>> bdc1 = BlockDataContainer(2*x, y)
>>> bdc2 = BlockDataContainer(x, 2*y)
>>> out = bdc1.sapyb(a,bdc2,b)

binary_operations(operation, other, *args, **kwargs)[source]#

Algebra: generic method of algebric operation with BlockDataContainer with number/DataContainer or BlockDataContainer

Provides commutativity with DataContainer and subclasses, i.e. this class’s reverse algebraic methods take precedence w.r.t. direct algebraic methods of DataContainer and subclasses.

This method is not to be used directly

unary_operations(operation, *args, **kwargs)[source]#

Unary operation on BlockDataContainer:

generic method of unary operation with BlockDataContainer: abs, sign, sqrt and conjugate

This method is not to be used directly

copy()[source]#: alias of clone

__radd__(other)[source]#

Reverse addition

to make sure that this method is called rather than the __mul__ of a numpy array the class constant __array_priority__ must be set > 0 https://docs.scipy.org/doc/numpy-1.15.1/reference/arrays.classes.html#numpy.class.__array_priority__

__rsub__(other)[source]#

Reverse subtraction

__rmul__(other)[source]#

Reverse multiplication

__rdiv__(other)[source]#

Reverse division

__rtruediv__(other)[source]#

Reverse truedivision

__rpow__(other)[source]#

Reverse power

__iadd__(other)[source]#: Inline addition

__isub__(other)[source]#: Inline subtraction

__imul__(other)[source]#: Inline multiplication

__idiv__(other)[source]#: Inline division

__itruediv__(other)[source]#: Inline truedivision

__neg__()[source]#: Return - self

__weakref__#: list of weak references to the object

Block Function#

BlockFunction acts on BlockDataContainer as a separable sum function:

\[ \begin{align}\begin{aligned}f = [f_1,...,f_n] \newline\\f([x_1,...,x_n]) = f_1(x_1) + .... + f_n(x_n)\end{aligned}\end{align} \]

\[ \begin{align}\begin{aligned}\begin{split}Y = \begin{bmatrix} y_{1}\\ y_{2}\\ y_{3}\\ \end{bmatrix}, \quad F = [ f_{1}, f_{2}, f_{3} ]\end{split}\\F(Y) : = f_{1}(y_{1}) + f_{2}(y_{2}) + f_{3}(y_{3})\end{aligned}\end{align} \]

class cil.optimisation.functions.BlockFunction(*functions)[source]#

BlockFunction represents a separable sum function \(F\) defined as

\[F:X_{1}\times X_{2}\cdots\times X_{m} \rightarrow (-\infty, \infty]\]

where \(F\) is the separable sum of functions \((f_{i})_{i=1}^{m}\),

\[F(x_{1}, x_{2}, \cdots, x_{m}) = \overset{m}{\underset{i=1}{\sum}}f_{i}(x_{i}), \mbox{ with } f_{i}: X_{i} \rightarrow (-\infty, \infty].\]

A nice property (due to it’s separability structure) is that the proximal operator can be decomposed along the proximal operators of each function \(f_{i}\).

\[\mathrm{prox}_{\tau F}(x) = ( \mathrm{prox}_{\tau f_{i}}(x_{i}) )_{i=1}^{m}\]

In addition, if \(\tau := (\tau_{1},\dots,\tau_{m})\), then

\[\mathrm{prox}_{\tau F}(x) = ( \mathrm{prox}_{\tau_{i} f_{i}}(x_{i}) )_{i=1}^{m}\]

__init__(*functions)[source]#

property L#

Lipschitz of the gradient of function f.

L is positive real number, such that \(\|f'(x) - f'(y)\| \leq L\|x-y\|\), assuming \(f: IG \rightarrow \mathbb{R}\)

__call__(x)[source]#

Returns the value of the BlockFunction \(F\)

\[F(x) = \overset{m}{\underset{i=1}{\sum}}f_{i}(x_{i}), \mbox{ where } x = (x_{1}, x_{2}, \cdots, x_{m}), \quad i = 1,2,\dots,m\]

Parameter:

x : BlockDataContainer and must have as many rows as self.length

returns ..math:: sum(f_i(x_i))

convex_conjugate(x)[source]#

Returns the value of the convex conjugate of the BlockFunction at \(x^{*}\).

\[F^{*}(x^{*}) = \overset{m}{\underset{i=1}{\sum}}f_{i}^{*}(x^{*}_{i})\]

Parameter:

x : BlockDataContainer and must have as many rows as self.length

proximal(x, tau, out=None)[source]#

Proximal operator of the BlockFunction at x:

\[\mathrm{prox}_{\tau F}(x) = (\mathrm{prox}_{\tau f_{i}}(x_{i}))_{i=1}^{m}\]

Parameter:

x : BlockDataContainer and must have as many rows as self.length

gradient(x, out=None)[source]#

Returns the value of the gradient of the BlockFunction function at x.

\[F'(x) = [f_{1}'(x_{1}), ... , f_{m}'(x_{m})]\]

Parameter:

x : BlockDataContainer and must have as many rows as self.length

proximal_conjugate(x, tau, out=None)[source]#

Proximal operator of the convex conjugate of BlockFunction at x:

\[\mathrm{prox}_{\tau F^{*}}(x) = (\mathrm{prox}_{\tau f^{*}_{i}}(x^{*}_{i}))_{i=1}^{m}\]

Parameter:

x : BlockDataContainer and must have as many rows as self.length

__rmul__(other)[source]#

Define multiplication with a scalar

Parameters:: other – number

Returns a new BlockFunction containing the product of the scalar with all the functions in the block

Block Operator#

BlockOperator represent a block matrix with operators

\[\begin{split}K = \begin{bmatrix} A_{1} & A_{2} \\ A_{3} & A_{4} \\ A_{5} & A_{6} \end{bmatrix}_{(3,2)} * \quad \underbrace{\begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}_{(2,1)}}_{\textbf{x}} = \begin{bmatrix} A_{1}x_{1} + A_{2}x_{2}\\ A_{3}x_{1} + A_{4}x_{2}\\ A_{5}x_{1} + A_{6}x_{2}\\ \end{bmatrix}_{(3,1)} = \begin{bmatrix} y_{1}\\ y_{2}\\ y_{3} \end{bmatrix}_{(3,1)} = \textbf{y}\end{split}\]

Column: Share the same domains \(X_{1}, X_{2}\)

Rows: Share the same ranges \(Y_{1}, Y_{2}, Y_{3}\)

\[K : (X_{1}\times X_{2}) \rightarrow (Y_{1}\times Y_{2} \times Y_{3})\]

\(A_{1}, A_{3}, A_{5}\): share the same domain \(X_{1}\) and \(A_{2}, A_{4}, A_{6}\): share the same domain \(X_{2}\)

\[\begin{split}A_{1}: X_{1} \rightarrow Y_{1} \\ A_{3}: X_{1} \rightarrow Y_{2} \\ A_{5}: X_{1} \rightarrow Y_{3} \\ A_{2}: X_{2} \rightarrow Y_{1} \\ A_{4}: X_{2} \rightarrow Y_{2} \\ A_{6}: X_{2} \rightarrow Y_{3}\end{split}\]

For instance with these ingredients one may write the following objective function,

\[\alpha ||\nabla u||_{2,1} + ||u - g||_2^2\]

where \(g\) represent the measured values, \(u\) the solution \(\nabla\) is the gradient operator, \(|| ~~ ||_{2,1}\) is a norm for the output of the gradient operator and \(|| x-g ||^2_2\) is least squares fidelity function as

\[ \begin{align}\begin{aligned}\begin{split}K = \begin{bmatrix} \nabla \\ \mathbb{1} \end{bmatrix}\end{split}\\F(x) = \Big[ \alpha \lVert ~x~ \rVert_{2,1} ~~ , ~~ || x - g||_2^2 \Big]\\w = [ u ]\end{aligned}\end{align} \]

Then we have rewritten the problem as

\[F(Kw) = \alpha \left\lVert \nabla u \right\rVert_{2,1} + ||u-g||^2_2\]

Which in Python would be like

op1 = GradientOperator(ig, correlation=GradientOperator.CORRELATION_SPACE)
op2 = IdentityOperator(ig, ag)

# Create BlockOperator
K = BlockOperator(op1, op2, shape=(2,1) )

# Create functions
F = BlockFunction(alpha * MixedL21Norm(), 0.5 * L2NormSquared(b=noisy_data))

class cil.optimisation.operators.BlockOperator(*args, **kwargs)[source]#

A Block matrix containing Operators

Parameters:

*args (Operator) – Operators in the block.
**kwargs (dict) – shape (tuple, optional): If shape is passed the Operators in vararg are considered input in a row-by-row fashion.

Note

The Block Framework is a generic strategy to treat variational problems in the following form:

\[\min Regulariser + Fidelity\]

BlockOperators have a generic shape M x N, and when applied on an Nx1 BlockDataContainer, will yield and Mx1 BlockDataContainer.

Note

BlockDatacontainer are only allowed to have the shape of N x 1, with N rows and 1 column.

User may specify the shape of the block, by default is a row vector

Operators in a Block are required to have the same domain column-wise and the same range row-wise.

Examples

BlockOperator(op0,op1) results in a row block

BlockOperator(op0,op1,shape=(1,2)) results in a column block

__init__(*args, **kwargs)[source]#

column_wise_compatible()[source]#: Operators in a Block should have the same domain per column

row_wise_compatible()[source]#: Operators in a Block should have the same range per row

get_item(row, col)[source]#: Returns the Operator at specified row and col :param row: The row index required. :type row: int :param col: The column index required. :type col: int

norm()[source]#: Returns the Euclidean norm of the norms of the individual operators in the BlockOperators

get_norms_as_list()[source]#: Returns a list of the individual norms of the Operators in the BlockOperator

set_norms(norms)[source]#

Uses the set_norm() function in Operator to set the norms of the operators in the BlockOperator from a list of custom values.

Parameters:: norms (list) – A list of positive real values the same length as the number of operators in the BlockOperator.

direct(x, out=None)[source]#

Direct operation for the BlockOperator

Parameters:

x (BlockDataContainer) – The input BlockDataContainer to apply the BlockOperator on. Can be a DataContainer if the domain geometry permits.
out (BlockDataContainer, optional) – The output BlockDataContainer to store the result of the operation. If not provided, a new BlockDataContainer is created. Can be a DataContainer if the range geometry permits.

Note

BlockOperators work on BlockDataContainers, but they will also work on DataContainers and inherited classes by simple wrapping the input in a BlockDataContainer of shape (1,1)

adjoint(x, out=None)[source]#

Adjoint operation for the BlockOperator

Parameters:

x (BlockDataContainer) – The input BlockDataContainer to apply the BlockOperator adjoint on. Can be a DataContainer if the range geometry permits.
out (BlockDataContainer, optional) – The output BlockDataContainer to store the result of the operation. If not provided, a new BlockDataContainer is created. Can be a DataContainer if the domain geometry permits.

Note

BlockOperator may contain both LinearOperator and Operator This method exists in BlockOperator as it is not known what type of Operator it will contain.

BlockOperators work on BlockDataContainers, but they will also work on DataContainers and inherited classes by simple wrapping the input in a BlockDataContainer of shape (1,1)

Raises: ValueError if the contained Operators are not linear

is_linear()[source]#: Returns whether all the elements of the BlockOperator are linear

get_output_shape(xshape, adjoint=False)[source]#

Returns the shape of the output BlockDataContainer :param xshape: :type xshape: BlockDataContainer :param adjoint: :type adjoint: bool

Examples

A(N,M) direct u(M,1) -> N,1

A(N,M)^T adjoint u(N,1) -> M,1

__rmul__(scalar)[source]#

Defines the left multiplication with a scalar. Returns a block operator with Scaled Operators inside.

Parameters:: scalar (number or iterable containing numbers)

property T#

Returns the transposed of self.

Recall the input list is shaped in a row-by-row fashion

domain_geometry()[source]#

Returns the domain of the BlockOperator

If the shape of the BlockOperator is (N,1) the domain is a ImageGeometry or AcquisitionGeometry. Otherwise it is a BlockGeometry.

range_geometry()[source]#: Returns the range of the BlockOperator

__getitem__(index)[source]#: Returns the index-th operator in the block irrespectively of it’s shape

get_as_list()[source]#: Returns the list of operators

Return Home

References#

[1] (1,2,3)

Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009. URL: https://doi.org/10.1137/080716542, arXiv:https://doi.org/10.1137/080716542, doi:10.1137/080716542.

[2] (1,2,3)

Amir Beck and Marc Teboulle. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11):2419–2434, 2009. doi:10.1109/TIP.2009.2028250.

[3] (1,2)

Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, May 2011. URL: https://doi.org/10.1007/s10851-010-0251-1, doi:10.1007/s10851-010-0251-1.

[4]

Patrick L. Combettes and Valérie R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168–1200, 2005. URL: https://doi.org/10.1137/050626090, arXiv:https://doi.org/10.1137/050626090, doi:10.1137/050626090.

[5]

Ernie Esser, Xiaoqun Zhang, and Tony F. Chan. A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences, 3(4):1015–1046, 2010. URL: https://doi.org/10.1137/09076934X, arXiv:https://doi.org/10.1137/09076934X, doi:10.1137/09076934X.

[6]

Osman Güler. New proximal point algorithms for convex minimization. SIAM Journal on Optimization, 2(4):649–664, 1992.

[7]

J. S. Jörgensen, E. Ametova, G. Burca, G. Fardell, E. Papoutsellis, E. Pasca, K. Thielemans, M. Turner, R. Warr, W. R. B. Lionheart, and P. J. Withers. Core imaging library - part i: a versatile python framework for tomographic imaging. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2204):20200192, 2021. URL: https://royalsocietypublishing.org/doi/abs/10.1098/rsta.2020.0192, arXiv:https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.2020.0192, doi:10.1098/rsta.2020.0192.

[8]

Avinash C. Kak and Malcolm Slaney. Principles of Computerized Tomographic Imaging. Society for Industrial and Applied Mathematics, January 2001. URL: https://doi.org/10.1137/1.9780898719277, doi:10.1137/1.9780898719277.

[9]

Yurii Nesterov. Introductory lectures on convex optimization: A basic course. Volume 87. Springer Science & Business Media, 2003.

[10]

Evangelos Papoutsellis, Evelina Ametova, Claire Delplancke, Gemma Fardell, Jakob S. Jörgensen, Edoardo Pasca, Martin Turner, Ryan Warr, William R. B. Lionheart, and Philip J. Withers. Core imaging library - part ii: multichannel reconstruction for dynamic and spectral tomography. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2204):20200193, 2021. URL: https://royalsocietypublishing.org/doi/abs/10.1098/rsta.2020.0193, arXiv:https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.2020.0193, doi:10.1098/rsta.2020.0193.

[11]

Julian Rasch and Antonin Chambolle. Inexact first-order primal–dual algorithms. Computational Optimization and Applications, 76(2):381–430, Jun 2020. URL: https://doi.org/10.1007/s10589-020-00186-y, doi:10.1007/s10589-020-00186-y.

[12]

Mingqiang Zhu, Stephen J. Wright, and Tony F. Chan. Duality-based algorithms for total-variation-regularized image restoration. Computational Optimization and Applications, 47(3):377–400, Nov 2010. URL: https://doi.org/10.1007/s10589-008-9225-2, doi:10.1007/s10589-008-9225-2.

Table of Contents

Optimisation framework#

Algorithms (Deterministic)#

Base class#

GD#

CGLS#

SIRT#

ISTA/PGD#

FISTA#

APGD#

PDHG#

LADMM#

PD3O#

Algorithms (Stochastic)#

SPDHG#

Approximate gradient methods#

Stochastic Gradient Descent Example#

Note#

Memory requirements#

Operators#

Operator base classes#

Trivial operators#

GradientOperator#

Parameters:#

Parameters:#

WaveletOperator#

**kwargs#

Functions#

Base classes#

Simple functions#

Composition of operator and a function#

Indicator box#

Example:#

Example:#

KullbackLeibler#

L1 Norm#

L2 Norm Squared#

Least Squares#

L1 Sparsity#

Mixed L21 norm#

Smooth Mixed L21 norm#

Mixed L11 norm#

Total variation#

Function of Absolute Value#

Approximate Gradient base class#

Stochastic Gradient function#

SAG function#

SAGA function#

Stochastic Variance Reduced Gradient Function#

Loopless Stochastic Variance Reduced Gradient Function#

Reference#

Utilities#

Samplers#

Reference#

Callbacks#

Step size methods#

Reference#

Reference#

Preconditioners#

Block Framework#

BlockDataContainer#

Block Function#

Block Operator#

References#