Pixels can be output by bit-banging, i.e. having the CPU write each one to the port itself, paced by its own instruction clock.
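To make that concrete, here is a minimal sketch of a bit-banged line in C. It is only an illustration: `fake_gpio_odr` is a plain variable standing in for a GPIO output register so the snippet runs on a PC, and `bitbang_line` is a made-up name, not something from an actual library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a plain variable standing in for a GPIO output data register.
   On the real chip this would be a memory-mapped register. */
static volatile uint16_t fake_gpio_odr;

/* Bit-bang one line: the CPU itself stores every pixel to the port.
   Returns the number of pixels written. */
static size_t bitbang_line(const uint8_t *pixels, size_t count)
{
    assert(pixels != NULL);
    for (size_t i = 0; i < count; i++) {
        /* One store per pixel: the CPU is busy for the whole line. */
        fake_gpio_odr = pixels[i];
        /* Real code would also have to pad each iteration so that the
           loop runs at exactly the pixel rate. */
    }
    return count;
}
```

Every pixel costs a trip through that loop, and the loop timing has to be exact.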
The problem is that we won’t have much time left to do anything else. The main CPU is perfectly good at outputting bytes or halfwords, but it is capable of much more, so all those cycles would be better spent on more useful things, such as adding 4 bytes in parallel or running nice effects. It would be nice if we had a small bit of silicon on the MCU able to move data from memory to a peripheral (here, the GPIOs).
As a matter of fact, we do! It’s called a DMA, for direct memory access. The STM32F4 has two DMA controllers.
The only things we need to do are:
- generate hsyncs at 31 kHz, one per line, from a clock-based interrupt (see VGA generation posts and VGA timings references)
- for some of those lines, generate vsync
- for the actual (visible) lines:
  - point the DMA to a part of memory and tell it the pace / width of the output,
  - let it run in the background,
  - fill another place in memory with the next line of pixels (or the whole screen)
- return from the interrupt ASAP, letting the processor do interesting things in the foreground
- in the foreground, process user input, calculate the next frame or decompress a nice purple tentacle from a PNG to RAM, ...
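The steps above can be sketched as a small per-line interrupt handler. This is a PC-runnable sketch, not the real driver. The vertical timings are the usual 640x480@60 ones (480 visible lines, 10 front porch, 2 sync, 33 back porch), which is an assumption about the mode being used. `dma_start`, `set_vsync` and `render_line` are hypothetical stand-ins for the real DMA, sync pin and drawing code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 640x480@60 vertical timings, counted in lines. */
enum {
    VISIBLE_LINES = 480,
    FRONT_PORCH   = 10,
    VSYNC_LINES   = 2,
    BACK_PORCH    = 33,
    TOTAL_LINES   = VISIBLE_LINES + FRONT_PORCH + VSYNC_LINES + BACK_PORCH
};
#define LINE_PIXELS 640

typedef enum { LINE_VISIBLE, LINE_BLANK, LINE_VSYNC } line_kind;

/* Decide what to do on a given line of the frame. */
static line_kind classify_line(unsigned line)
{
    assert(line < TOTAL_LINES);
    if (line < VISIBLE_LINES)                             return LINE_VISIBLE;
    if (line < VISIBLE_LINES + FRONT_PORCH)               return LINE_BLANK;
    if (line < VISIBLE_LINES + FRONT_PORCH + VSYNC_LINES) return LINE_VSYNC;
    return LINE_BLANK;
}

static uint8_t line_buf[2][LINE_PIXELS]; /* one buffer shown, one being filled */
static unsigned current_line;
static const uint8_t *dma_source;        /* where the fake DMA was last pointed */
static int vsync_active;

/* Hypothetical hooks: on hardware these would program the DMA stream,
   drive the vsync pin and draw real pixels. */
static void dma_start(const uint8_t *src, unsigned count) { (void)count; dma_source = src; }
static void set_vsync(int active) { vsync_active = active; }
static void render_line(uint8_t *dst, unsigned line)
{
    for (unsigned x = 0; x < LINE_PIXELS; x++)
        dst[x] = (uint8_t)(x ^ line);    /* dummy test pattern */
}

/* Called once per line by the (here simulated) 31 kHz interrupt. */
static void hsync_irq(void)
{
    line_kind kind = classify_line(current_line);
    set_vsync(kind == LINE_VSYNC);

    /* Visible line: point the DMA at the ready buffer and let it run. */
    if (kind == LINE_VISIBLE)
        dma_start(line_buf[current_line & 1], LINE_PIXELS);

    /* Meanwhile, fill the other buffer with the next visible line.
       Line 0 of the very first frame is never rendered, so it shows black. */
    unsigned next = (current_line + 1) % TOTAL_LINES;
    if (next < VISIBLE_LINES)
        render_line(line_buf[next & 1], next);

    current_line = next;                 /* then return as fast as possible */
}
```

Two line buffers keep the DMA and the renderer from touching the same memory: while line N is being sent from `line_buf[N & 1]`, line N+1 is drawn into the other one. Rendering a whole frame in the foreground instead would drop `render_line` from the interrupt and make it even shorter, at the price of much more RAM.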