If you have a single-channel image and run a 3x3 conv on it, why is choosing 8 output channels problematic?
Answer
8 channels are too many: at each position the 3x3 kernel reads 9 numbers describing a patch and the layer outputs 8, so the decrease is too small. The layer is not forced to do any useful computation; it is essentially just reordering the numbers.
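The arithmetic behind this answer can be sketched in a few lines. This is only an illustration: the helper name `patch_compression` and the alternative settings it is called with (4 output channels, a 5x5 kernel) are assumptions made for comparison, not something stated in the answer above.

```python
def patch_compression(in_channels: int, out_channels: int, kernel_size: int = 3) -> dict:
    """Compare how many numbers a conv reads per patch with how many it emits.

    Hypothetical helper for illustration only; it does no actual convolution.
    """
    # Each output position looks at a kernel_size x kernel_size window
    # across all input channels.
    inputs_per_patch = in_channels * kernel_size * kernel_size
    return {
        "inputs_per_patch": inputs_per_patch,
        "outputs_per_position": out_channels,
        # How many input numbers are summarized by each output number.
        "ratio": inputs_per_patch / out_channels,
    }


# The case from the question: 1 input channel, 3x3 kernel, 8 output channels.
first_layer_8 = patch_compression(in_channels=1, out_channels=8)

# Two assumed alternatives, shown only for contrast.
first_layer_4 = patch_compression(in_channels=1, out_channels=4)
first_layer_5x5 = patch_compression(in_channels=1, out_channels=8, kernel_size=5)

for name, stats in [
    ("3x3, 8 channels", first_layer_8),
    ("3x3, 4 channels", first_layer_4),
    ("5x5, 8 channels", first_layer_5x5),
]:
    print(name, stats)
```

With 8 output channels the ratio is 9/8, so each output summarizes barely more than one input number; the two contrast cases give each output more inputs to combine.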
Relevant part of lecture
supplementary material
An interesting discussion on this subject arose on Twitter. Some of the highlights are: