In the simplest case, the output value of the layer with input size \((N, C, H, W)\),
output \((N, C, H_{out}, W_{out})\) and kernel_size \((kH, kW)\)
can be precisely described as:
$$
\begin{array}{ll}
out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\
& \mbox{input}(N_i, C_j, \mbox{stride[0]} \times h + m,
\mbox{stride[1]} \times w + n)
\end{array}
$$
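The formula can be checked by hand with a small reference sketch. The function below is purely illustrative: `max_pool2d_naive` is a hypothetical name, it handles a single \((H, W)\) plane stored as a list of rows, and it assumes no padding and a dilation of 1. It is not how the layer is actually implemented.

```python
def max_pool2d_naive(plane, kernel_size, stride):
    """Max-pool one (H, W) plane given as a list of rows (no padding, dilation 1)."""
    kH, kW = kernel_size
    sH, sW = stride
    H, W = len(plane), len(plane[0])
    # Number of window positions that fit entirely inside the plane.
    H_out = (H - kH) // sH + 1
    W_out = (W - kW) // sW + 1
    return [
        [
            # out(h, w) = max over m, n of input(stride[0] * h + m, stride[1] * w + n)
            max(plane[sH * h + m][sW * w + n] for m in range(kH) for n in range(kW))
            for w in range(W_out)
        ]
        for h in range(H_out)
    ]


plane = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool2d_naive(plane, kernel_size=(2, 2), stride=(2, 2))
print(pooled)  # [[6, 8], [14, 16]]
```

Each output entry is the maximum of one non-overlapping 2x2 window, which matches the double max in the formula with stride (2, 2).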
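If padding is non-zero, then the input is implicitly padded with negative infinity on both sides
for padding number of points, so a padded position can never be selected as the maximum.
dilation controls the spacing between the kernel points: with dilation d, neighboring kernel points are d input elements apart.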
It is harder to describe, but this link has a nice visualization of what dilation does.
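The effect of padding and dilation can be illustrated with a pure-Python sketch. Everything here is an assumption made for illustration: `max_pool2d_padded` is a hypothetical helper that works on a single plane, pads with negative infinity, and samples kernel points `dilation` elements apart. The output size uses the usual floor-mode expression \(\lfloor (H + 2p - d(k - 1) - 1) / s \rfloor + 1\).

```python
import math


def max_pool2d_padded(plane, kernel_size, stride, padding=(0, 0), dilation=(1, 1)):
    """Max-pool one (H, W) plane with -inf padding and dilated kernel points."""
    kH, kW = kernel_size
    sH, sW = stride
    pH, pW = padding
    dH, dW = dilation
    H, W = len(plane), len(plane[0])
    neg_inf = -math.inf
    # Surround the plane with -inf so padded cells never become the maximum.
    padded = (
        [[neg_inf] * (W + 2 * pW) for _ in range(pH)]
        + [[neg_inf] * pW + list(row) + [neg_inf] * pW for row in plane]
        + [[neg_inf] * (W + 2 * pW) for _ in range(pH)]
    )
    H_out = (H + 2 * pH - dH * (kH - 1) - 1) // sH + 1
    W_out = (W + 2 * pW - dW * (kW - 1) - 1) // sW + 1
    return [
        [
            # Kernel point (m, n) reads the cell dH * m rows and dW * n columns
            # away from the window's top-left corner.
            max(
                padded[sH * h + dH * m][sW * w + dW * n]
                for m in range(kH)
                for n in range(kW)
            )
            for w in range(W_out)
        ]
        for h in range(H_out)
    ]


# With all-negative input, zero padding would wrongly produce zeros.
negatives = [[-1, -2], [-3, -4]]
print(max_pool2d_padded(negatives, (2, 2), (2, 2), padding=(1, 1)))
# [[-1, -2], [-3, -4]]

# Dilation 2 makes a 2x2 kernel read every other row and column.
grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool2d_padded(grid, (2, 2), (1, 1), dilation=(2, 2)))
# [[11, 12], [15, 16]]
```

In the first call every window contains exactly one real value and three padded cells, so the output reproduces the input. In the second call the top-left window reads positions (0, 0), (0, 2), (2, 0) and (2, 2), whose maximum is 11.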
The parameters kernel_size, stride, padding, dilation can either be:
- a single int -- in which case the same value is used for both the height and width dimensions
- a tuple of two ints -- in which case the first int is used for the height dimension
  and the second int for the width dimension
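The int-or-tuple convention amounts to a small normalization step. The sketch below uses a hypothetical helper, `to_pair`, written only for illustration; it is not the library's own function.

```python
def to_pair(value):
    """Expand an int to (value, value); pass a sequence of two ints through."""
    if isinstance(value, int):
        return (value, value)
    pair = tuple(value)
    if len(pair) != 2:
        raise ValueError(f"expected an int or two ints, got {value!r}")
    return pair


print(to_pair(3))       # (3, 3): same size for height and width
print(to_pair((3, 2)))  # (3, 2): height 3, width 2
```

Under this convention, a kernel_size of 3 describes a square 3x3 window, while (3, 2) describes a window 3 rows high and 2 columns wide.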