LDPC Edge Distribution Optimization II: Global Optimization
Published:
Introduction
Global optimization approach can also be used in finding good edge distributions with desired coderate. Ref [1] used differential evolution (DE) to find optimal edge distribution under erasure channel, Ref [2] mentioned that DE can be used to design edge distribution under AWGN channel. There are also other papers that utilize DE to desige edge distributions for NB-LDPC code (NB-LDPC code will be introduced in the future). In this post, we first introduce differential evolution[3], then we will discuss how to implement DE to design LDPC code.
P.S. There are also somthing that I couldn’t fully understand, I will post my opinions and thoughts, you are very welcome to discuss with me via my email.
Differential Evolution
Differential evolution (DE) is a simple but robust and fast-converge global optimization approach. One advantage of DE is that it can be implemented on non-linear, not differentiable circumstances. Suppose we want to minimize a cost function $L(w)$, where $w$ is a $D$-dimensional vector with continuous vector space. In each iteration, we have $M$ population vectors. You can think these population vectors as candidates minimizing $L(w)$. In each iteration, a new set of population vectors will be derived from the previous ones. More specifically, in each iteration, $M$ parameter vectors are calculated by adding weighted difference between two population vecotors and a third vector. That’s where differential comes from. This step is called mutation. Next step is crossover the elements of $M$ parameter vectors will be mixed with populated vectors and generate trial vectors. Now, each trial vector will be compared with population vectors with same index and replace the population vector if it has smaller cost function. This step is called selection. Now let’s discuss these three steps in more details.
Mutation
Let’s say in iteration $G$, the $M$ population vectors are denoted by :
$\begin{align} x_ {i,G}, \quad i=1,…,M\quad ,G=1,…,N_T \end{align}$
Where $N_T$ is predefined maximum iteration time. We denote $j^{th}$ element in $x_ {i,G}$ by $x_ {ji,G}$. $x_ {0,G}$ can be generated in different ways, Ref [3] says:
In case a preliminary solution is available, the initial population might be generated by adding normally distributed random deviations to the nominal solution $x_ {norm,0}$.
Denote parameter vecotors in $G+1$ iteration by $v_ {i,G+1}$, $i=1,…,M$. Parameter vectors can be computed by:
$\begin{align} v_ {i,G+1} = x_ {r_ 1,G} + F\cdot(x_ {r_ 2,G}-x_ {r_ 3, G}) \end{align}$
$r_ 1$, $r_ 2$ and $r_ 3$ are mutuallly different and randomly chosen from \(\{1,...,M\}/\{i\}\). Therefore, in order to implement mutation, $M\geq 4$. Finally, $F$ is a real constant and $F\in[0,2]$
Crossover
In iteration $G+1$, crossover generates $M$ trial vectors $u_ {i,G+1}$, \(i\in\{1,2,...,M\}\). The reason why this step is called crossover is that $u_ {ji, G+1}$ is either $v_ {ji,G+1}$ or $x_ {ji, G}$. Yes, a trial vector is a hybrid of a parameter vector and population vector. $u_ {ji,G+1}$ can be generated by:
$\begin{align} u_ {ji,G+1} =
\begin{cases} v_ {ji, G+1} & \text{if $randb(j)$ $\leq$ CR or $j=rnbr(i)$} \newline x_{ji, G} & \text{if $randb(j)$ $>$ CR and $j\neq rnbr(i)$ } \newline \end{cases}
\end{align}$
$randb(j)$ generate a random number ranging 0 to 1. CR is called “crossover constant” which is a constant number from 0 to 1 and defined by the user. Finally, $rnber(i)$ is a random number between 1, 2, … $D$. It ensures that at least one element of a trial vector comes from the parameter vector.
Selection
Now in the iteration $G+1$, we have generated $M$ trial vectors, we will evaluate these vectors and generate population vectors $x_ {i,G+1}$ by:
$\begin{align} x_ {i,G+1} =
\begin{cases} u_ {i, G+1} & \text{if } L(u_ {i, G+1})< L(x_ {i, G}) \newline x_{i, G} & \text{otherwise} \newline \end{cases}
\end{align}$
Variant : $DE/x/y/z$
There are variants of DE states above, and a specific DE can be represented by $DE x/y/z$, these three parameters have the following meanings:
$x$: tells how to choose $r_ 1$ in mutation step. “rand”: $r_1$ is chosen randomly; “best”: $r_1=\arg\min_iL(x_ {i,G})$.
$y$: is the number of diffence vector used. In the previous example, $y=1$. If $y=2$: $\begin{align} v_ {i,G+1} = x_ {r_ 1,G} + F(x_ {r_ 2,G}+x_ {r_ 3,G}-x_ {r_ 4,G}-x_ {r_ 5,G}) \end{align}$
$z$: specifies the crossover scheme, the privious section uses “bin”.
The DE introduced in the previous section can be expressed by $DE/\text{rand}/1/\text{bin}$.
DE for design edge distrbuion
It is straightforward to fit DE to designing edge distribution:
If we want to solely optimize $\lambda(x)$, we form the $\lambda_ i$ into a vector. After we finished optimizing $\lambda(x)$ then we use similar way to optimize $\rho(x)$. This process will be repeated untill the result (e.g. threshold) is coverged.
Also, we can jointly optimize $\lambda(x)$ and $\rho(x)$, we just put all free coefficients into a single vector and we optimize this single vector using DE.
The word free coefficients means the coeffients that we can play with — it also means that we would like to freeze some $\lambda_i$ and $\rho_i$ as 0. As Ref [1] and [2] mentioned:
For variable node edge distribution, we chose the degrees 2, 3, a highest degree and one degree in between. For check node edge distribution, we chose two consecutive degrees (such as 7 and 8, or 8 and 9).
However, a valid variable node and check node degree have the following constraints:
$\begin{align} \sum_ i \lambda_ i &= \sum_ i \rho_ i =1, \newline \sum_i \frac{1}{i} \lambda_ i &= \sum_ i \frac{1}{i}\rho_ i = \text{code rate}, \end{align}$
The first constraint makes sure a valid probability distribution and the second constraint confines the core rate we desired for. Hence, personally, when I am trying to implement DE, there are two problems I need to tackle:
How to generate valid initial poluation vectors ?
How to generate valid trial vectors in each iteration ?
For the first question, Ref [1] gave a solution:
For doing this, note first that the conditions relating the coefficients of $\lambda(x)$ and $\rho(x)$ force the free coefficients of these polynomials to lie in a finite polytope. Our first task is then to choose random elements from this polytope. To achieve this, we implemented a different strategy, known as the “Queen’s move”:we started with some point inside the polytope constructed deterministically, and repeated the following procedure between 50 and 100 times: we randomly selected a line through the point, and randomly selected a point on that line inside the polytope. This gave us one population member. For the next members, we repeated the whole procedure again, until all the (initial) population members were generated.
However, it is still vague to me. Besides I don’t think the points are lied in polytope but a hyperplane $\mathcal{G}$, determined by equation (6), (7) and some basic constriants such as non-negativity of each element (However, I think it’s me who didn’t clearly get the point of author, because this paper is an ISIT paper and citation is high). So I would have the following way to construct initial vectors, for each vector $x_{i,0}$:
- $x_{ji,0}\sim Uni[0,1)$
- find a point \(x_ {i,0}^*\) on the hyperplane, which has minimum distance with $x_ {i,0}$ under some distance metric: $\begin{align} x^*_ {i,0} = \arg\min_{y\in \mathcal{G}}d(y,x_{i,0}) \end{align}$
Similarly, for the second question, we can use equation (8) to solve.
However, I haven’t implemented this program, so there maybe more practical problems than I thought. I will update it once I have progress on this.
s
sss
Reference
- Shokrollahi, A., and R. Storn. 2000. “Design of Efficient Erasure Codes with Differential Evolution.” In 2000 IEEE International Symposium on Information Theory (Cat. No.00CH37060), 5 –.
- Richardson, T. J., M. A. Shokrollahi, and R. L. Urbanke. 2001. “Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes.” IEEE Transactions on Information Theory / Professional Technical Group on Information Theory 47 (2): 619–37.
- Storn, Rainer, and Kenneth Price. 1997. “Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces.” Journal of Global Optimization 11 (4): 341–59.