Multisource Objective Function

Due to frequency division, only a subset of the spectrum is covered at each source at each iteration, so ringy migration artifacts are expected. An effective method to reduce migration artifacts (Nemeth et al., 1999; Duquet et al., 2000) is least-squares migration (LSM), which iteratively updates a trial model so as to minimize a data misfit function. A widely adopted misfit function is the squared $ L_2$ norm of the data error. In addition, regularization with the Cauchy norm (Wang and Sacchi, 2007; Sacchi, 1997; Amundsen, 1991) is used in this chapter. In the Bayesian framework (Aster et al., 2005; Debski, 2010), the regularization corresponds to the negative logarithm of the a priori distribution of the model. The choice of the Cauchy distribution is meant to capture the sparse nature of typical reflectivity models. Following the Bayesian approach, I write the regularization as

$\displaystyle R({\bf {m}})$ $\displaystyle = -\ln p_{c}({\bf {m}}) = -\ln \left[ \prod_i \frac{c}{\pi (c^2 + m_i^2)} \right]$ (2.33)
  $\displaystyle = \sum_{i} \ln \left( c^2 + m_i^2 \right) + \textrm{~constants},$ (2.34)

where $ p_{c}({\bf {m}})$ is a 0-median Cauchy distribution with parameter $ c$ ; and I write the misfit function as

$\displaystyle e({\bf {m}})$ $\displaystyle = -\ln g_{\sigma^2}(\widetilde{\mathfrak{d}}\vert{\bf {m}}) = \frac{1}{2\sigma^2} \vert\vert\widetilde{\mathfrak{d}}- \widetilde{\mathfrak{L}}{\bf {m}}\vert\vert^2 + \textrm{~constants},$ (2.35)

where $ g_{\sigma^2}(\cdot)$ is a 0-mean Gaussian distribution with variance $ \sigma^2$ . The probabilistic formulations allow us to determine the parameters $ c$ and $ \sigma^2$ by maximum likelihood estimation (MLE). In equations 2.34 and 2.35 the constants are independent of $ {\bf {m}}$ . In equation 2.35, $ \widetilde{\mathfrak{d}}\in \mathbb{C}^{n_{htot} n_\omega M_{ga}}$ and $ \widetilde{\mathfrak{L}}\in \mathbb{C}^{n_{htot} n_\omega M_{ga}\times M}$ are formed by concatenating $ \tilde{{\bf {d}}}^{(\gamma,j)}$ and $ \tilde{{\bf {L}}}^{(\gamma,j)}$ , respectively, along the column dimension in dictionary order of $ (\gamma,j)$ , where $ \gamma=1,\dots,M_{ga}$ is the supergather index, with $ M_{ga}$ being the number of supergathers, and $ j=1,\ldots,n_\omega$ is the frequency index. Here, the superscript $ (\gamma,j)$ makes explicit that $ \tilde{{\bf {d}}}$ and $ \tilde{{\bf {L}}}$ , as defined in equations 2.6 and 2.8, respectively, are specific to a particular supergather and frequency. Note that in the case of marine streamer acquisition, the first dimension of $ \tilde{{\bf {d}}}$ and $ \tilde{{\bf {L}}}$ is extended from $ n_h$ to $ n_{htot}$ . In contrast, in the standard approach of a single shot gather, the counterparts of $ \widetilde{\mathfrak{d}}$ and $ \widetilde{\mathfrak{L}}$ would be of sizes $ \mathbb{C}^{n_h n_\omega S_{tot}}$ and $ \mathbb{C}^{n_h n_\omega S_{tot}\times M}$ , respectively, where $ S_{tot} = S M_{ga}$ is the total number of sources.
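As a concrete illustration of equations 2.34 and 2.35, the following Python sketch evaluates the Cauchy regularization and the Gaussian misfit on a small random stand-in for the concatenated data vector and operator. All sizes here, and the values chosen for $ c$ and $ \sigma^2$ , are illustrative assumptions, not the chapter's actual dimensions or MLE estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the concatenated quantities of equation 2.35;
# n_rows plays the role of n_htot * n_omega * M_ga (assumed small here).
n_rows, M = 64, 16
L = rng.standard_normal((n_rows, M)) + 1j * rng.standard_normal((n_rows, M))
m = rng.standard_normal(M)          # trial reflectivity model (real-valued)
d = L @ m + 0.1 * (rng.standard_normal(n_rows)
                   + 1j * rng.standard_normal(n_rows))

c, sigma2 = 0.5, 0.01               # assumed Cauchy and Gaussian parameters

def R(m, c):
    """Cauchy regularization of equation 2.34, additive constants dropped."""
    return np.sum(np.log(c**2 + m**2))

def e(m, L, d, sigma2):
    """Gaussian misfit of equation 2.35, additive constants dropped."""
    r = d - L @ m
    return np.real(np.vdot(r, r)) / (2.0 * sigma2)

# Scaling the sum by sigma^2 yields the objective of equation 2.36.
J = sigma2 * (e(m, L, d, sigma2) + R(m, c))
```

Note that $ \sigma^2\,e({\bf m})$ reduces to $ \tfrac{1}{2}\vert\vert\widetilde{\mathfrak{d}}-\widetilde{\mathfrak{L}}{\bf m}\vert\vert^2$ , which is the data term that survives in equation 2.36.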

The objective function is then constructed as

$\displaystyle J({\bf {m}})$ $\displaystyle = \sigma^2 (e({\bf {m}}) + R({\bf {m}})) = \frac{1}{2} \vert\vert\widetilde{\mathfrak{d}}- \widetilde{\mathfrak{L}}{\bf {m}}\vert\vert^2 + \sigma^2 \sum_{i} \ln \left( c^2 + m_i^2 \right),$ (2.36)

where additive constants have been dropped. Its negative gradient is given as

$\displaystyle g_i \stackrel{\mathrm{def}}{=}-\left[\nabla_{{\bf {m}}}J({\bf {m}})\right]_i$ $\displaystyle = \left[\widetilde{\mathfrak{L}}^\dagger(\widetilde{\mathfrak{d}}- \widetilde{\mathfrak{L}}{\bf {m}})\right]_i - 2\sigma^2 Q(m_i)\, m_i,$ (2.37)
where $\displaystyle Q(m_i)$ $\displaystyle = \frac{1}{c^2 + m_i^2}.$ (2.38)
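The gradient of equations 2.37 and 2.38 can be checked numerically against finite differences of equation 2.36. In this sketch the operator and data are random stand-ins, and the real part of the data term is taken because the model is real-valued, which is an assumption of this illustration rather than something stated in equation 2.37.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rows, M = 48, 12
L = rng.standard_normal((n_rows, M)) + 1j * rng.standard_normal((n_rows, M))
m = rng.standard_normal(M)
d = L @ m + 0.05 * (rng.standard_normal(n_rows)
                    + 1j * rng.standard_normal(n_rows))
c, sigma2 = 0.3, 0.02               # assumed parameter values

def J(m):
    """Objective of equation 2.36 (constants dropped)."""
    r = d - L @ m
    return 0.5 * np.real(np.vdot(r, r)) + sigma2 * np.sum(np.log(c**2 + m**2))

def neg_grad(m):
    """Negative gradient, equations 2.37-2.38; real part taken for a real model."""
    Q = 1.0 / (c**2 + m**2)                          # equation 2.38
    return np.real(L.conj().T @ (d - L @ m)) - 2.0 * sigma2 * Q * m

# Central finite-difference approximation of -dJ/dm_i, component by component.
g = neg_grad(m)
eps = 1e-6
I = np.eye(M)
g_fd = np.array([-(J(m + eps * I[i]) - J(m - eps * I[i])) / (2 * eps)
                 for i in range(M)])
```

The two vectors `g` and `g_fd` should agree to within finite-difference accuracy, confirming the sign and the factor of 2 on the Cauchy term.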

Note that the shape of the objective function $ J({\bf {m}})$ typically changes with the iteration step $ k$ , because every iteration typically requires a new pass of frequency-selection encodings for the $ M_{ga}$ supergathers to generate $ \widetilde{\mathfrak{d}}$ and to form $ \widetilde{\mathfrak{L}}$ . That the objective function depends on $ k$ is a topic studied in stochastic optimization (Spall, 2003). Our problem (albeit of much larger size) is similar to the `stochastic bowl' studied by Schraudolph and Graepel (2002), because, as shown in Appendix A, the Hessian of the misfit function pertaining to frequency-selection encoded supergathers consists of terms sampled from the standard full Hessian.
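A minimal sketch of why $ J$ depends on $ k$ : each pass assigns every shot in a supergather a disjoint subset of the available frequencies, and redrawing that assignment changes which terms of the full Hessian are sampled. The partitioning below is a schematic stand-in for the chapter's actual encoding scheme, with assumed counts of frequencies and shots.

```python
import numpy as np

n_omega, S = 12, 4      # assumed: frequencies per band, shots per supergather

def encode(rng, n_omega, S):
    """Draw one frequency-selection encoding: a random partition of the
    n_omega frequency indices into S disjoint subsets, one per shot."""
    perm = rng.permutation(n_omega)
    return np.array_split(perm, S)

rng = np.random.default_rng(2)
# A fresh draw per (outer) iteration k changes the sampled Hessian terms,
# and hence the shape of J(m) at that iteration.
encoding_k = encode(rng, n_omega, S)
encoding_k_plus_1 = encode(rng, n_omega, S)
```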

As frequency-selection encoding can significantly alter the Hessian, the conjugacy condition of conjugate gradient (CG) cannot be maintained if supergathers are formed with a new frequency-selection encoding at each iteration, a strategy known as `dynamic encoding'. To accelerate convergence on the one hand and to reduce I/O cost on the other, I adopt a hybrid CG strategy (termed `CG within mini-batch' in Schraudolph and Graepel, 2002), whereby supergathers are encoded anew every $ K_{CGit}$ iterations; $ K_{CGit}=3$ is chosen in this study. Given fixed supergathers and a fixed $ Q(m_i)$ defined in equation 2.38, $ K_{CGit}$ iterations are carried out by a CG scheme (outlined in Algorithm 1 in Appendix C). Then the supergathers are randomly encoded again; the $ Q(m_i)$ 's are updated, a procedure known as the `Iteratively Reweighted Least-Squares' method (Scales et al., 1988); the parameters $ c$ and $ \sigma^2$ of the probability distributions are re-estimated through MLE; and the search direction of CG is reset to the negative gradient.
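The hybrid loop can be sketched as follows. This is a toy, real-valued stand-in: re-encoding the supergathers is mimicked by row-subsampling a fixed operator, $ c$ and $ \sigma^2$ are held fixed rather than re-estimated by MLE, and linear CG is run on the quadratic obtained by freezing the IRLS weights $ Q(m_i)$ , so the Hessian used is $ \widetilde{\mathfrak{L}}^\dagger\widetilde{\mathfrak{L}} + 2\sigma^2\,\mathrm{diag}(Q)$ .

```python
import numpy as np

rng = np.random.default_rng(3)
n_rows, M = 64, 16
L_full = rng.standard_normal((n_rows, M))
m_true = np.zeros(M)
m_true[[2, 9]] = 1.0                    # sparse toy reflectivity
d_full = L_full @ m_true

K_CGit = 3                              # CG iterations per encoding, as in the text
n_outer = 5                             # number of re-encodings (assumed)
c, sigma2 = 0.1, 1e-3                   # held fixed here; the text re-estimates them

m = np.zeros(M)
for outer in range(n_outer):
    # Stand-in for drawing a new frequency-selection encoding.
    rows = rng.choice(n_rows, size=n_rows // 2, replace=False)
    L, d = L_full[rows], d_full[rows]
    Q = 1.0 / (c**2 + m**2)             # IRLS weights, frozen for this mini-batch
    g = L.T @ (d - L @ m) - 2 * sigma2 * Q * m   # negative gradient (eq. 2.37)
    p = g.copy()                        # search direction reset to -grad
    for _ in range(K_CGit):
        if np.sqrt(g @ g) < 1e-12:      # converged for this mini-batch
            break
        Hp = L.T @ (L @ p) + 2 * sigma2 * Q * p   # frozen-Q Hessian times p
        alpha = (g @ g) / (p @ Hp)
        m = m + alpha * p
        g_new = L.T @ (d - L @ m) - 2 * sigma2 * Q * m
        beta = (g_new @ g_new) / (g @ g)          # Fletcher-Reeves
        p = g_new + beta * p
        g = g_new
```

Resetting `p` to the negative gradient at each re-encoding is what preserves validity despite the Hessian changing between mini-batches.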

Yunsong Huang 2013-09-22