LDPC Edge Distribution Optimization I: Local Optimization
Published:
One can locally or globally optimize the LDPC edge distribution. This post briefly discusses how to locally optimize the edge distribution following Richardson’s work.
Notations
We want to optimize the variable node edge distribution \(\lambda(x)=\sum_{i=2}^{d_v}\lambda_i x^{i-1}\) and the check node edge distribution \(\rho(x)\) such that the threshold is maximized under a specific channel.
The threshold is defined as the supremum of all channel parameters for which the probability of error under density evolution (DE) converges to zero.
In the optimization, we set a finite maximum number of iterations \(m\) and a target error probability \(\epsilon\); therefore the computed threshold is a lower bound on the true threshold.
As pointed out in the paper, the implementation of DE inevitably involves quantization. The quantization error accumulates in each iteration and can eventually invalidate the computation. However,
By carefully performing the quantization, one can ensure that the quantized density evolution corresponds to the exact density evolution of a quantized message passing algorithm.
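To make these definitions concrete, here is a minimal sketch for the binary erasure channel (BEC), where density evolution reduces to the scalar recursion \(p_l=\epsilon\,\lambda(1-\rho(1-p_{l-1}))\) and the (lower-bound) threshold can be found by bisection over the channel parameter. The function names and conventions below are my own, not the paper’s; the general-channel, quantized-density case treated by Richardson is more involved.

```python
import numpy as np

def de_bec(eps, lam, rho, m=200, target=1e-6):
    """BEC density evolution for at most m iterations.

    lam[i] / rho[i] are the edge fractions of degree-(i+2) variable / check
    nodes, i.e. lambda(x) = sum_i lam[i] * x**(i+1), and similarly for rho(x).
    Returns the erasure-probability sequence p_0, ..., p_L.
    """
    lam_poly = lambda x: sum(c * x ** (i + 1) for i, c in enumerate(lam))
    rho_poly = lambda x: sum(c * x ** (i + 1) for i, c in enumerate(rho))
    p = [eps]                                   # p_0: raw channel erasure prob.
    for _ in range(m):
        p.append(eps * lam_poly(1.0 - rho_poly(1.0 - p[-1])))
        if p[-1] < target:
            break
    return np.array(p)

def threshold_lower_bound(lam, rho, m=200, target=1e-6, tol=1e-4):
    """Bisection on eps: the largest erasure probability for which DE reaches
    `target` within m iterations -- a lower bound on the true threshold."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if de_bec(mid, lam, rho, m, target)[-1] < target:
            lo = mid                            # DE converged: eps achievable
        else:
            hi = mid                            # DE stalled: eps too large
    return lo

# (3,6)-regular ensemble: lambda(x) = x^2, rho(x) = x^5
lam = [0.0, 1.0]                        # variable degrees 2, 3
rho = [0.0, 0.0, 0.0, 0.0, 1.0]         # check degrees 2..6
print(threshold_lower_bound(lam, rho))  # a bit below the true threshold of ~0.4294
```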
Hill-Climbing Approach
This section covers the implementation of local optimization. Richardson’s paper introduced several local optimization methods; here we show two. One uses a gradient descent approach and the other uses linear programming. \(\lambda(x)\) will be used as an example to illustrate the optimization.
Gradient Descent Approach
Define \(L(\lambda)\) as the number of iterations needed to decrease the error probability to \(\epsilon\) under \(\lambda(x)\). If \(L(\lambda)\) is continuous w.r.t. \(\lambda\), then we only need to find \(h=-\frac{dL}{d\lambda}\) and take \(\tilde{\lambda}=\lambda+\eta h\) for some step size \(\eta\).
Let \(\{p_l\}_{l=0}^m\) be the sequence of error probabilities produced by density evolution. \(p_0\) is the raw channel error probability and \(p_l\) is the error probability after the \(l^{th}\) iteration. In particular, we have \(p_m\leq\epsilon\leq p_{m-1}\). (The \(\epsilon\) here is some error probability reached between the \((m-1)^{th}\) and \(m^{th}\) iterations; we can also view this optimization as driving \(\epsilon\) down to our target. There is some abuse of notation, but I will keep using \(\epsilon\).)
Define \(A_{l,j}\) for \(1\leq l \leq m\), \(2\leq j \leq d_v\): \(A_{l,j}\) is the error probability after the \(l^{th}\) iteration, assuming that in this iteration every variable node has degree \(j\) (in the previous iterations, the edge distribution is still \(\lambda(x)\)). \(A_{l,j}\) and \(\{p_l\}_{l=0}^m\) satisfy the following relationship:
\[\begin{aligned} p_l=\sum_{j=2}^{d_v}A_{l,j}\lambda_j \end{aligned}\]\(\frac{d}{d\lambda_j}L(\lambda)\) is given by:
\[\begin{aligned} \frac{d}{d\lambda_j}L(\lambda)=\sum_{l=1}^{m}\frac{A_{l,j}-p_l}{p_{l-1}-p_l} \end{aligned}\]Then, we can use gradient descent to obtain \(\tilde{\lambda}\).
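To make this concrete, here is a sketch of the gradient computation for the BEC example from earlier. For the BEC, \(A_{l,j}\) has the closed form \(A_{l,j}=\epsilon\, q_{l-1}^{\,j-1}\) with \(q_{l-1}=1-\rho(1-p_{l-1})\), so that \(p_l=\sum_j\lambda_j A_{l,j}\) as required. The helper below is my own sketch, not code from the paper.

```python
import numpy as np

def gradient_L(eps, lam, rho, p):
    """Approximate dL/dlambda_j from one BEC density-evolution run.

    p is the sequence p_0, ..., p_m produced under the current lambda
    (assumed strictly decreasing so the denominators are positive).
    """
    rho_poly = lambda x: sum(c * x ** (i + 1) for i, c in enumerate(rho))
    degrees = np.arange(2, 2 + len(lam))        # variable-node degrees 2..d_v
    grad = np.zeros(len(lam))
    for l in range(1, len(p)):
        q = 1.0 - rho_poly(1.0 - p[l - 1])      # check-to-variable erasure prob.
        A_l = eps * q ** (degrees - 1)          # A_{l,j} for all j at once
        grad += (A_l - p[l]) / (p[l - 1] - p[l])
    return grad
```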
However, we still need to make sure that \(\tilde{\lambda}\) is a valid distribution and has the desired code rate; hence there are two constraints:
\[\begin{aligned} \text{valid distribution:}\quad \sum_j\tilde{\lambda}_j =1 \end{aligned}\] \[\begin{aligned} \text{invariant rate:}\quad \sum_j\frac{\tilde{\lambda}_j}{j} = \sum_j\frac{\lambda_j}{j}\end{aligned}\]To satisfy these two constraints, after finding \(\tilde{\lambda}\), we project it onto the closest vector (in Euclidean distance) that satisfies them:
The first projection is an orthogonal projection of the update direction \(h\) onto the subspace determined by \(\sum_j h_j=0\) and \(\sum_j\frac{1}{j}h_j=0\), so that \(\tilde{\lambda}=\lambda+\eta h\) keeps both constraints;
The second projection sets \(\eta h_j = - \lambda_j\) if, prior to the projection, \(\eta h_j+\lambda_j <0\).
(Computation suggestion: one can first compute the projected \(\tilde{\lambda}\) and then adjust it until the second projection is no longer triggered, i.e., no component is negative.) But Richardson in his paper said:
One can then compute the maximum step size \(\eta\) for which the constraints remain satisfied and then recompute the projection at that point.
However, I didn’t fully understand this sentence. I think that if \(\eta\) is computed this way, then once you recompute the projection there may again be terms less than zero; clipping them with the second projection would then break the constraints enforced by the first projection.
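For reference, here is a minimal sketch of one hill-climbing step with the two projections described above, continuing the BEC example. The helpers `project_direction` and `hill_climb_step` are hypothetical names of mine, and the plain clipping shown is exactly the naive second projection whose side effect (slightly breaking the equality constraints) is discussed above; it is not Richardson’s step-size recomputation.

```python
import numpy as np

def project_direction(h, degrees):
    """First projection: orthogonal projection of the step direction h onto
    the subspace { sum_j h_j = 0, sum_j h_j / j = 0 }, so that
    lambda + eta * h preserves normalization and code rate."""
    C = np.vstack([np.ones_like(h), 1.0 / degrees])     # constraint rows
    return h - C.T @ np.linalg.solve(C @ C.T, C @ h)

def hill_climb_step(lam, grad, eta):
    """One naive update: step against the gradient, project, then clip.

    Clipping is the 'second projection' (eta*h_j = -lambda_j wherever
    lambda_j + eta*h_j < 0); it slightly violates the two equality
    constraints, which is what the step-size recomputation is meant to avoid."""
    lam = np.asarray(lam, dtype=float)
    degrees = np.arange(2, 2 + len(lam), dtype=float)
    h = project_direction(-np.asarray(grad, dtype=float), degrees)
    return np.maximum(lam + eta * h, 0.0)
```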
Linear Programming Approach
Suppose the new distribution is \(\tilde{\lambda}\); we use the old \(A_{l,j}\) to approximate the new error probability sequence \(\{\tilde{p}_l\}\):
\[\begin{aligned} \tilde{p}_l := \sum_{j=2}^{d_v}A_{l,j}\tilde{\lambda}_j \end{aligned}\]Then,
\[\begin{aligned} L(\tilde{\lambda}) - L(\lambda) \simeq \sum_{l=1}^m\frac{\tilde{p}_l-p_l}{p_{l-1}-p_l} \end{aligned}\](this follows from the derivative formula above, since \(\sum_j(\tilde{\lambda}_j-\lambda_j)A_{l,j}=\tilde{p}_l-p_l\) and \(\sum_j(\tilde{\lambda}_j-\lambda_j)=0\)). To guarantee that this approximation is valid, \(\tilde{\lambda}\) must not differ from \(\lambda\) too much, which induces the following two constraints:
\[\begin{aligned} \max_l \frac{\mid p_l-\tilde{p}_l \mid }{p_{l-1}-p_l}&<\delta,\quad \delta \ll 1 \\ \tilde{p}_l &< p_{l-1},\quad 1\leq l\leq m \end{aligned}\]
Therefore, the linear programming problem is:
\[\begin{aligned} \min_{\tilde{\lambda}}\quad & \sum_{l=1}^m \frac{\tilde{p}_l-p_l}{p_{l-1}-p_l} \\ \text{s.t.}\quad & \max_l \frac{\mid p_l-\tilde{p}_l \mid }{p_{l-1}-p_l}<\delta,\quad \delta \ll 1 \\ & \tilde{p}_l < p_{l-1},\quad 1\leq l\leq m \\ & \sum_j\frac{\tilde{\lambda}_j}{j} = \sum_j\frac{\lambda_j}{j} \\ & \sum_j\tilde{\lambda}_j =1 \end{aligned}\]
This LP is solved repeatedly until a good edge distribution is found.
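As a rough sketch (not the paper’s implementation), one LP step could be set up with `scipy.optimize.linprog` as follows. The matrix \(A\) and the sequence \(p\) come from a DE run under the current \(\lambda\), the strict inequalities are handled as non-strict, and the function name is my own.

```python
import numpy as np
from scipy.optimize import linprog

def lp_step(lam, A, p, delta=0.05):
    """One linear-programming update of lambda (a sketch).

    lam : current coefficients lambda_2..lambda_dv
    A   : matrix with A[l-1, j-2] = A_{l,j} from the current DE run, l = 1..m
    p   : error-probability sequence p_0, ..., p_m from the current DE run
    """
    lam = np.asarray(lam, dtype=float)
    A = np.asarray(A, dtype=float)
    p = np.asarray(p, dtype=float)
    m = A.shape[0]
    dp = p[:m] - p[1:]                      # p_{l-1} - p_l > 0
    degrees = np.arange(2, 2 + len(lam))

    # Objective: minimize sum_l (p~_l - p_l) / (p_{l-1} - p_l); constant terms dropped.
    c = (A / dp[:, None]).sum(axis=0)

    # |p~_l - p_l| <= delta * (p_{l-1} - p_l)  and  p~_l <= p_{l-1}
    A_ub = np.vstack([A, -A, A])
    b_ub = np.concatenate([p[1:] + delta * dp, -(p[1:] - delta * dp), p[:m]])

    # Normalization and invariant-rate equality constraints.
    A_eq = np.vstack([np.ones(len(lam)), 1.0 / degrees])
    b_eq = np.array([1.0, np.sum(lam / degrees)])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * len(lam), method="highs")
    return res.x if res.success else lam
```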