A few tricks in probability and statistics (1): log-derivative trick

Some tricks in probability and statistics are well known, while others are less so. This post is about the log-derivative trick.

How do you estimate the gradient of $\mathbb{E}[f(x)]$ when the parameter $\theta$ appears in the density $p_{\theta}$ (but not in $f$), assuming that you can sample $x$?

\[\nabla_{\theta} \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x)] = ?\]

Following basic calculus (and assuming we may exchange the gradient and the integral),

\[\nabla_{\theta} \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x)] = \int f(x) \nabla_{\theta} p_{\theta}(x) \, dx,\]

but this by itself does not lead us further: the right-hand side is no longer an expectation with respect to $p_{\theta}$, so we cannot approximate it by sampling. The log-derivative trick notes that

\begin{equation}\label{eqn:grad.log} \nabla_{\theta} \log p_{\theta}(x) = \frac{\nabla_{\theta} p_{\theta}(x)}{p_{\theta}(x)}. \end{equation}
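For instance, if $p_{\theta}$ is a Gaussian with mean $\theta$ and fixed variance $\sigma^2$ (a concrete example chosen here purely for illustration), then

\[\nabla_{\theta} \log p_{\theta}(x) = \nabla_{\theta} \left( -\frac{(x - \theta)^2}{2\sigma^2} \right) = \frac{x - \theta}{\sigma^2}.\]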

Hence, substituting \eqref{eqn:grad.log} into the integral, we have

\[\int f(x) \nabla_{\theta} p_{\theta}(x) \, dx = \int f(x) \nabla_{\theta} \log p_{\theta}(x) \cdot p_{\theta}(x) \, dx.\]

In other words,

\[\boxed{\nabla_{\theta} \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x)] = \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x) \nabla_{\theta} \log p_{\theta}(x)],}\]

which allows us to form a Monte Carlo approximation:

\[\nabla_{\theta} \mathbb{E}_{ x \sim p_{\theta}(x) } [f(x)] \approx \frac{1}{N} \sum_{i=1}^N f(x_i) \nabla_{\theta} \log p_{\theta}(x_i) \quad\text{where}\quad x_i \sim p_{\theta} \text{ for } i = 1, \ldots, N.\]
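Below is a minimal numerical sketch of this estimator, assuming $p_{\theta} = \mathcal{N}(\theta, 1)$ and $f(x) = x^2$ (choices made here only for illustration). In that case $\mathbb{E}[f(x)] = \theta^2 + 1$, so the exact gradient is $2\theta$ and we can check the Monte Carlo estimate against it.

```python
import numpy as np

# Score-function (log-derivative) gradient estimator: a sketch.
# Illustrative assumptions (not from the derivation above):
#   p_theta = N(theta, 1),  f(x) = x^2,
# so E[f(x)] = theta^2 + 1 and the exact gradient is 2 * theta.

rng = np.random.default_rng(0)

theta = 1.5
N = 1_000_000

x = rng.normal(loc=theta, scale=1.0, size=N)  # x_i ~ p_theta
f = x ** 2                                    # f(x_i)
score = x - theta                             # grad_theta log N(x; theta, 1) = (x - theta) / 1

grad_estimate = np.mean(f * score)            # (1/N) * sum_i f(x_i) * grad_theta log p_theta(x_i)

print(f"estimate: {grad_estimate:.3f}, exact: {2 * theta:.3f}")
```

The estimator is unbiased (it is simply a sample average of the boxed expectation), but its variance can be large, so the number of samples $N$ matters in practice.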


