A few tricks in probability and statistics (1): log-derivative trick
Some tricks are well known, while others are less so.
How do you estimate the gradient of $\mathbb{E}[f(x)]$ when the parameter $\theta$ appears in the density, assuming that you can sample $x$:
\[\nabla_{\theta} \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x)] = ?\]Following basic calculus (and assuming we may exchange differentiation and integration),
\[\nabla_{\theta} \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x)] = \int f(x) \nabla_{\theta} p_{\theta}(x) \, dx,\]but this alone does not lead us further: the integrand is no longer an expectation with respect to $p_{\theta}$, so sampling $x \sim p_{\theta}$ does not directly yield an estimate. The log-derivative trick, on the other hand, notes that
\begin{equation}\label{eqn:grad.log} \nabla_{\theta} \log p_{\theta}(x) = \frac{\nabla_{\theta} p_{\theta}(x)}{p_{\theta}(x)}. \end{equation}
Hence, substituting \eqref{eqn:grad.log} into the integral, we have
\[\int f(x) \nabla_{\theta} p_{\theta}(x) \, dx = \int f(x) \nabla_{\theta} \log p_{\theta}(x) \cdot p_{\theta}(x) \, dx.\]In other words,
\[\boxed{\nabla_{\theta} \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x)] = \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x) \nabla_{\theta} \log p_{\theta}(x)],}\]which allows us to take Monte Carlo approximation:
\[\nabla_{\theta} \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x)] \approx \frac{1}{N} \sum_{i=1}^N f(x_i) \nabla_{\theta} \log p_{\theta}(x_i) \quad\text{where}\quad x_i \sim p_{\theta} \text{ for } i = 1, \ldots, N.\]
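To make the Monte Carlo estimator concrete, here is a minimal sketch in Python/NumPy, using a toy model of my own choosing (not from the derivation above): $x \sim \mathcal{N}(\theta, 1)$ and $f(x) = x^2$, for which the exact gradient is $\nabla_{\theta} \mathbb{E}[x^2] = 2\theta$ and the score is $\nabla_{\theta} \log p_{\theta}(x) = x - \theta$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (illustrative assumption): x ~ Normal(theta, 1), f(x) = x^2.
# Then E[f(x)] = theta^2 + 1, so the exact gradient is 2 * theta.
# For a unit-variance Gaussian, grad_theta log p_theta(x) = x - theta.

def score_function_gradient(theta, n_samples=100_000):
    """Monte Carlo estimate of grad_theta E_{x ~ N(theta, 1)}[x^2]
    via the log-derivative trick: average of f(x_i) * score(x_i)."""
    x = rng.normal(loc=theta, scale=1.0, size=n_samples)
    f = x ** 2            # f(x_i)
    score = x - theta     # grad_theta log p_theta(x_i)
    return np.mean(f * score)

theta = 1.5
print("score-function estimate:", score_function_gradient(theta))  # close to 3.0
print("exact gradient:         ", 2 * theta)                       # 3.0
```

This estimator is also known as the score-function or REINFORCE estimator; in practice it can have high variance, which is why a baseline is often subtracted from $f$ before averaging.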