$$
  \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
l_n = \left( x_n - y_n \right)^2,
$$
where \(N\) is the batch size. If reduction is not 'none'
(default 'mean'), then:
$$
  \ell(x, y) =
  \left\{
  \begin{array}{ll}
  \mbox{mean}(L), & \mbox{if reduction} = \mbox{'mean';}\\
  \mbox{sum}(L),  & \mbox{if reduction} = \mbox{'sum'.}
  \end{array}
  \right.
$$
\(x\) and \(y\) are tensors of arbitrary shapes with a total
of \(n\) elements each; here \(n\) denotes the element count, not the index used in \(l_n\).
With reduction = 'mean', the loss is averaged over all elements, not only over the batch dimension, so the sum is divided by \(n\).
Set reduction = 'sum' to skip the division by \(n\).
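
The three reduction modes can be illustrated with a minimal pure-Python sketch. It is not the library implementation: the function name `mse_loss` is hypothetical here, and the inputs are assumed to be flat sequences of equal length rather than tensors.

```python
def mse_loss(x, y, reduction="mean"):
    """Squared-error loss over two equal-length flat sequences (illustrative only)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    # Unreduced loss: l_n = (x_n - y_n)^2 for every element.
    losses = [(a - b) ** 2 for a, b in zip(x, y)]
    if reduction == "none":
        return losses
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        # Divide by the total number of elements n.
        return sum(losses) / len(losses)
    raise ValueError(f"unknown reduction: {reduction!r}")


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 0.0, 3.0, 2.0]
print(mse_loss(x, y, "none"))  # [0.0, 4.0, 0.0, 4.0]
print(mse_loss(x, y, "sum"))   # 8.0
print(mse_loss(x, y, "mean"))  # 2.0
```

With four elements, 'sum' gives 8.0 and 'mean' gives 8.0 / 4 = 2.0, which matches the division by \(n\) described above.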