Pico serial bootloader

Serial bootloader for the Raspberry Pi Pico (RP2040)

My Pi Wars 2022 entry, M0o+, is powered by a Raspberry Pi Pico, which itself is powered by the RP2040 chip.

The RP2040 has a built-in boot ROM with a USB bootloader, which allows the chip to show up as a USB flash drive for uploading code. This is usually a very convenient interface, making it extremely quick and simple to upload code to the device.

However, in a Pi Wars robot, I don’t want to have to bring the Pico back to my computer and plug it in to iterate on the code - I see it as essential to be able to download new code to the robot wirelessly without needing to touch it.

I knew I’d be using an ESP32 to provide Bluetooth connectivity, and so decided to write a simple serial bootloader which allows code upload to the Pico via a UART, and then use the ESP32 as a WiFi-to-UART bridge to upload code wirelessly.

What’s a bootloader?

A bootloader is a program whose purpose is to load (and run) other programs. Bootloaders are among the first things to run at boot, and their job is to load whatever code runs next.

Bootloaders are often described in terms of stages. Typically the “first stage bootloader” is difficult or impossible to change (for example, baked into non-modifiable ROM), and is therefore kept very simple to minimise the potential for bugs.

The job of the first stage is just to start the next stage. The next stage might be the program itself, or it might be another (“second stage”) bootloader, which is more complicated/capable and (crucially), modifiable.

On a “full size” Raspberry Pi, there’s a “first stage” bootloader baked into some ROM in the chip. It has one job: find the second stage bootloader (bootcode.bin) on the SD card [1] and run it. The second stage bootloader has a lot more capability: it provides functionality like USB and network boot, as well as understanding config.txt. Its job is to load the next stage - usually the Linux kernel - and boot that. Because the second stage is on the SD card, it can be modified, fixed and updated easily, and this has been used throughout the Raspberry Pi’s life to add functionality even to devices with “old” chips.

Pico boot sequence

As mentioned above, the Pico also has a first stage bootloader baked into ROM [2]. This is always the very first thing that runs when the RP2040 starts up, and after the physical chip is manufactured, it can never be changed.

The built-in bootloader has a fixed sequence, described in full in the datasheet. In summary, it checks if the BOOTSEL button is pressed, and if so, enters the USB mass storage mode for code upload. If the BOOTSEL button is not pressed, and it looks like the flash contains a valid program, then it starts executing the “program” from flash.

The Pico SDK actually puts a second stage bootloader (called boot2) at the start of every program, which we’ll discuss further below. So when the first stage starts your program, what it’s actually doing is starting boot2, which then starts the program.

My bootloader will act like a “third stage” bootloader, executing after boot2 and allowing code to be uploaded over UART, before finally executing the actual program.

[Figure: RP2040 address map]

Second stage bootloader: boot2

Unlike more traditional microcontrollers, the RP2040 uses an external flash chip to store program code. This has become more common in recent years, with notable other examples being the ESP8266 and ESP32 chips from Espressif.

The RP2040 datasheet says:

RP2040 is a stateless device, with support for cached execute-in-place from external QSPI memory. This design decision allows you to choose the appropriate density of non-volatile storage for your application, and to benefit from the low pricing of commodity Flash parts.

This presumably helps keep the size (and therefore price) of the RP2040 chip down and allows a single chip variant to serve different applications (through different sizes of flash chip). I suspect that there are also technical reasons not to put the flash on the same silicon die as the logic, which may factor in to the decision not to put any flash in the RP2040.

There is a standard [3] for SPI flash chips, which makes it possible to read data from almost any flash chip using the same protocol. This protocol is baked in to the Pico bootrom, allowing it to read the second stage boot2 bootloader from flash, without needing to know what specific brand or size of flash is connected.

However, different flash manufacturers and products have different protocols for configuring their chips to provide the best performance for reading and executing code. boot2’s job is to know which specific flash chip is connected, and exactly how to configure it for fast, efficient code access.

It wouldn’t be sensible to put this chip-specific flash setup code into the hard-coded, non-modifiable boot ROM on the chip, because then the variants of flash chip that can be supported by RP2040 would be set in stone.

Instead, when you build a program in the Pico SDK, it selects an appropriate second stage bootloader based on what kind of board you are building for. There are several versions of boot2 for different flash chips, and each one is exactly 256 bytes of code which is put right at the start of the eventual program binary. This code configures the flash chip using the appropriate commands for that specific chip, and then runs the main program which directly follows boot2 in flash.

Bootloader implementation

Code: https://github.com/usedbytes/rp2040-serial-bootloader

I’ve prioritised simple code and ease of development over minimising the size of my bootloader, and as a result it takes up nearly 12 kB of flash!

The flash can only be erased in 4 kB sectors, so the first 12 kB (3 sectors) of flash are used for boot2 and my bootloader, followed by one 4 kB sector which stores an image header describing the program. The program can then be written to any other area of flash (for example starting at 16 kB in the diagram below).

[Figure: flash addresses]
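To make the layout concrete, here’s a rough sketch of the same addresses as Go constants (Go because that’s what my host-side code is written in). The XIP base of 0x10000000 is the FLASH origin from the SDK’s default linker script; the constant and function names are made up for illustration and aren’t taken from the bootloader source.

```go
package main

import "fmt"

const (
	xipBase       = 0x10000000                // start of the memory-mapped (XIP) flash window
	sectorSize    = 4 * 1024                  // smallest erasable unit
	bootloaderLen = 3 * sectorSize            // boot2 + bootloader: the first 12 kB
	headerOffset  = bootloaderLen             // image header sector, at 12 kB
	appMinOffset  = headerOffset + sectorSize // programs start at 16 kB or later
)

// flashAddr converts an offset from the start of flash into an XIP address.
func flashAddr(offset uint32) uint32 {
	return xipBase + offset
}

func main() {
	fmt.Printf("bootloader: 0x%08x-0x%08x\n", flashAddr(0), flashAddr(bootloaderLen)-1)
	fmt.Printf("header:     0x%08x\n", flashAddr(headerOffset))
	fmt.Printf("program:    0x%08x and up\n", flashAddr(appMinOffset))
}
```

So the earliest address a program can live at is 0x10004000, which is the same number as the `0x10000000 + 16k` origin used in the linker script later on.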

It is absolutely possible to write a serial bootloader for the Pico in a fraction of this size - for example I found this example (rhulme/pico-flashloader) which looks like it was being written about the same time I was writing mine, and fits in < 4 kB; or this one (dwelch67/raspberrypi-pico/bootloader10) which is very minimal indeed.

When I started out, as is often the case, I couldn’t find anything which met my requirements, so I invented my own. The code is available on GitHub.

Mine implements a simple command state machine, with a set of commands to enable erasing, writing, reading and verifying the flash. The Pico SDK provides helper functions for flash erasure and programming, making that part straightforward.

Similar to the bootrom first stage bootloader, mine will enter programming mode if a button is pressed, if some special values are present in the watchdog registers, or if there is no valid program already loaded.

The actual program can be written anywhere in flash (as long as it’s not in the first 16 kB), and an image header is stored which describes the start address and size of the program, and a CRC checksum of its data for validation. This image header is what the code uses to determine if there’s a valid program to be run.
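The checks described above boil down to something like the following sketch. It’s written in Go for readability rather than being the bootloader’s actual C code, and the struct layout, field names and function names are all assumptions of mine. The flash contents are modelled as a plain byte slice.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const (
	xipBase      = 0x10000000 // start of memory-mapped flash
	appMinOffset = 16 * 1024  // programs live at 16 kB or later
)

// imageHeader holds the three fields described above. The names and the
// use of uint32 are assumptions, not the bootloader's actual struct.
type imageHeader struct {
	Start uint32 // XIP address of the program
	Size  uint32 // program length in bytes
	CRC   uint32 // IEEE 802.3 CRC-32 of the program data
}

// makeHeader builds the header for a program that will be written at start.
func makeHeader(start uint32, program []byte) imageHeader {
	return imageHeader{
		Start: start,
		Size:  uint32(len(program)),
		CRC:   crc32.ChecksumIEEE(program),
	}
}

// valid reports whether h describes an intact program. flash is the whole
// flash contents, with flash[0] corresponding to address xipBase.
func (h imageHeader) valid(flash []byte) bool {
	if h.Start < xipBase+appMinOffset || h.Size == 0 {
		return false
	}
	off := uint64(h.Start) - xipBase
	end := off + uint64(h.Size)
	if end > uint64(len(flash)) {
		return false
	}
	return crc32.ChecksumIEEE(flash[off:end]) == h.CRC
}

// stayInBootloader combines the three reasons for entering programming mode.
func stayInBootloader(buttonPressed, watchdogRequest, imageValid bool) bool {
	return buttonPressed || watchdogRequest || !imageValid
}

func main() {
	flash := make([]byte, 32*1024)
	program := []byte("pretend this is firmware")
	copy(flash[appMinOffset:], program)

	h := makeHeader(xipBase+appMinOffset, program)
	fmt.Println("image valid:", h.valid(flash))
	fmt.Println("stay in bootloader:", stayInBootloader(false, false, h.valid(flash)))
}
```

A nice side effect of this kind of check is that a freshly erased header sector (all 0xFF) fails validation on its own, because the size it describes runs far past the end of flash.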

The command set is relatively simple:

CMD_SYNC

The bootloader uses a UART, which has no built-in framing like an SPI chip-select or an I2C start condition, so there needs to be a way for the “host” and the Pico to synchronise.

The CMD_SYNC command exists for this purpose. The idea is that the host just keeps sending the 4 bytes SYNC over and over until the bootloader replies: PICO.

Whenever the bootloader encounters an error state, it goes back to “sync” mode, and it won’t do anything until the host re-synchronises.
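From the host’s point of view, the handshake looks roughly like this. It’s a simplified sketch rather than my actual host code: it assumes the serial port is an io.ReadWriter whose reads time out with an error when nothing arrives, and the function name, the retry limit and the fakePort test double are all invented for the example.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// syncWithBootloader sends "SYNC" until the bootloader answers "PICO",
// giving up after maxTries attempts.
func syncWithBootloader(port io.ReadWriter, maxTries int) error {
	reply := make([]byte, 4)
	for try := 0; try < maxTries; try++ {
		if _, err := port.Write([]byte("SYNC")); err != nil {
			return err
		}
		// A read error here (for example a timeout) just means "not yet".
		if _, err := io.ReadFull(port, reply); err == nil && bytes.Equal(reply, []byte("PICO")) {
			return nil
		}
	}
	return errors.New("bootloader did not answer SYNC")
}

// fakePort stands in for a serial port: it ignores the first `deaf` writes,
// then answers every "SYNC" with "PICO".
type fakePort struct {
	deaf int
	out  bytes.Buffer
}

func (p *fakePort) Write(b []byte) (int, error) {
	if p.deaf > 0 {
		p.deaf--
		return len(b), nil
	}
	if bytes.Equal(b, []byte("SYNC")) {
		p.out.WriteString("PICO")
	}
	return len(b), nil
}

func (p *fakePort) Read(b []byte) (int, error) { return p.out.Read(b) }

func main() {
	port := &fakePort{deaf: 2}
	if err := syncWithBootloader(port, 5); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println("bootloader is in sync")
}
```

A more robust host would also cope with a reply that arrives split across reads or mixed in with stale bytes, which this sketch doesn’t attempt.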

CMD_READ

This isn’t actually used for anything in my eventual set-up, but it was the first command I implemented because it’s simple.

CMD_READ has two arguments: An address and a size. It will simply return the data from that address.

The total transfer size is limited to 1024 bytes, so a larger region has to be read 1 kB at a time.
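Reading a bigger region therefore means issuing several CMD_READs. As a small sketch of how a host could split a range up (the type and function names are invented for this example):

```go
package main

import "fmt"

const maxTransfer = 1024 // largest read the bootloader will service

// span is one CMD_READ request: an address and a length.
type span struct {
	Addr uint32
	Len  uint32
}

// readSpans splits [addr, addr+size) into pieces of at most maxTransfer bytes.
func readSpans(addr, size uint32) []span {
	var spans []span
	for size > 0 {
		n := uint32(maxTransfer)
		if size < n {
			n = size
		}
		spans = append(spans, span{Addr: addr, Len: n})
		addr += n
		size -= n
	}
	return spans
}

func main() {
	for _, s := range readSpans(0x10004000, 2500) {
		fmt.Printf("read %4d bytes from 0x%08x\n", s.Len, s.Addr)
	}
}
```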

CMD_CSUM

CMD_CSUM takes two arguments: address and size, and computes a checksum. It uses the RP2040’s DMA engine “sniffer” to add up all of the values of all of the bytes in the provided range, and returns the result (modulo 2^32).

Initially I couldn’t get the sniffer CRC to match a “reference” implementation, so I implemented this as a stop-gap. That issue turned out to be my own mistake.

This is/was used to check the integrity of data transferred.
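For reference, the host-side equivalent of that calculation is tiny. This sketch assumes the sum really is over individual bytes, as described above; the function name is mine.

```go
package main

import "fmt"

// checksum adds up every byte of data. uint32 arithmetic wraps around,
// which gives the "modulo 2^32" behaviour for free.
func checksum(data []byte) uint32 {
	var sum uint32
	for _, b := range data {
		sum += uint32(b)
	}
	return sum
}

func main() {
	fmt.Println(checksum([]byte{0x01, 0x02, 0xFF})) // 1 + 2 + 255 = 258

	// A plain sum cannot tell when bytes arrive in the wrong order.
	fmt.Println(checksum([]byte{1, 2}) == checksum([]byte{2, 1})) // true
}
```

That blindness to reordered bytes is one reason a CRC is the better integrity check.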

CMD_CRC

Just like CMD_CSUM, but instead of a checksum it computes a CRC, still using the DMA sniffer. It calculates values matching the IEEE 802.3 CRC-32 algorithm, which is widely supported (e.g. the crc32 command-line utility and Go’s hash/crc32 library).
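On the host, computing a value to compare against is a one-liner with Go’s standard library. The wrapper function name below is just for illustration.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// imageCRC returns the IEEE 802.3 CRC-32 of data, the same variant the
// bootloader is described as producing.
func imageCRC(data []byte) uint32 {
	return crc32.ChecksumIEEE(data)
}

func main() {
	// "123456789" is the conventional CRC test input; for IEEE 802.3
	// CRC-32 the well-known result is 0xcbf43926.
	fmt.Printf("0x%08x\n", imageCRC([]byte("123456789")))
}
```

If the host and the Pico disagree on that test vector, the mismatch is in the CRC configuration rather than in the data transfer.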

CMD_ERASE

Erases one or more sectors of flash. Takes an address (which must be in the “XIP” region) and a size to erase. The address and size must be aligned to the flash “sector size”, which is 4 kB.

It doesn’t let you wipe the bootloader itself.
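The argument checks amount to something like the sketch below. It’s Go rather than the bootloader’s C, and it assumes a 2 MB flash (as on the Pico) with only the first 12 kB protected; the names and error messages are invented.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	xipBase       = 0x10000000        // start of memory-mapped flash
	flashSize     = 2 * 1024 * 1024   // assumed: 2 MB, as fitted to the Pico
	sectorSize    = 4 * 1024          // erase granularity
	bootloaderEnd = xipBase + 12*1024 // boot2 + bootloader occupy the first 12 kB
)

// checkErase reports why an erase request would be refused, or nil if it is fine.
func checkErase(addr, size uint32) error {
	switch {
	case size == 0:
		return errors.New("nothing to erase")
	case addr%sectorSize != 0 || size%sectorSize != 0:
		return errors.New("address and size must be 4 kB aligned")
	case addr < bootloaderEnd:
		return errors.New("range is outside XIP or overlaps the bootloader")
	case uint64(addr)+uint64(size) > xipBase+flashSize:
		return errors.New("range runs past the end of flash")
	}
	return nil
}

func main() {
	fmt.Println(checkErase(0x10004000, 8192)) // <nil>
	fmt.Println(checkErase(0x10000000, 4096)) // refused: overlaps the bootloader
}
```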

CMD_WRITE

This is the real business function. It writes data to flash, given an address (which must be in the “XIP” region), size and data. The address and data length must be aligned to the flash page size - 256 bytes. The maximum data length is 1 kB.

After writing the data, this calculates the CRC of the written data and returns it, so that the host can determine if everything was written correctly.
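Those alignment rules mean the host has to cut an image into suitably sized pieces before sending it. Here’s a hedged sketch of one way to do that: pad the image up to a page boundary, then slice it into chunks of at most 1 kB. Padding with 0xFF (the erased state of flash) is my choice for the example, and the type and function names are invented. The start address is assumed to be page-aligned already.

```go
package main

import "fmt"

const (
	pageSize = 256  // write alignment
	maxData  = 1024 // largest payload per CMD_WRITE
)

// chunk is one CMD_WRITE: a destination address and a page-aligned payload.
type chunk struct {
	Addr uint32
	Data []byte
}

// splitForWrite pads image with 0xFF up to a page boundary and cuts it into
// chunks of at most maxData bytes. addr must already be page-aligned.
func splitForWrite(addr uint32, image []byte) []chunk {
	padded := append([]byte(nil), image...)
	for len(padded)%pageSize != 0 {
		padded = append(padded, 0xFF)
	}
	var chunks []chunk
	for off := 0; off < len(padded); off += maxData {
		end := off + maxData
		if end > len(padded) {
			end = len(padded)
		}
		chunks = append(chunks, chunk{Addr: addr + uint32(off), Data: padded[off:end]})
	}
	return chunks
}

func main() {
	for _, c := range splitForWrite(0x10004000, make([]byte, 2500)) {
		fmt.Printf("write %4d bytes at 0x%08x\n", len(c.Data), c.Addr)
	}
}
```

After each write the host can compute the IEEE 802.3 CRC-32 of the chunk it just sent and compare it with the value the bootloader returns.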

CMD_SEAL

This is how the host indicates that it’s done writing a program and sets the program header.

The host provides:

  • The program start address
  • The program length
  • The expected CRC of the program

The implementation will calculate the CRC of the specified program range and, if it matches the expected CRC, store these values as the program header.
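As an illustration of what the host has to put together for this command, the sketch below packs the three values into 12 bytes. The encoding (three little-endian 32-bit words, in the order listed above) is an assumption made for the example, so check the bootloader source for the real wire format. The function name is invented too.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// sealArgs packs the CMD_SEAL arguments for a program written at start.
// Assumed layout: start address, length, CRC, each a little-endian uint32.
func sealArgs(start uint32, program []byte) []byte {
	args := make([]byte, 12)
	binary.LittleEndian.PutUint32(args[0:], start)
	binary.LittleEndian.PutUint32(args[4:], uint32(len(program)))
	binary.LittleEndian.PutUint32(args[8:], crc32.ChecksumIEEE(program))
	return args
}

func main() {
	fmt.Printf("% x\n", sealArgs(0x10004000, []byte("123456789")))
}
```

The CRC has to cover exactly the bytes in the range being sealed, so if the host pads the image before writing it, the length and CRC need to agree on whether that padding is included.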

CMD_GO

CMD_GO just jumps to an address provided by the host. It has one argument: the address. It performs no validation, just resets a bunch of peripherals, sets the VTOR to the provided address, and jumps!

CMD_INFO

CMD_INFO lets the host query the parameters it needs to know to be able to use the bootloader:

  • Flash (XIP) start address
  • Flash size
  • Erase alignment (4 kB)
  • Write alignment (256 bytes)
  • Max data length (1024 bytes) (not including command opcode and arguments)
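A host could decode the reply into a struct along these lines. The field order follows the list above, but the encoding (five little-endian 32-bit words) and all of the names are assumptions for the sake of the example, not the documented wire format.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// info holds the five values CMD_INFO reports.
// Assumed encoding: five little-endian uint32 words in this order.
type info struct {
	FlashStart uint32
	FlashSize  uint32
	EraseAlign uint32
	WriteAlign uint32
	MaxDataLen uint32
}

// parseInfo decodes a CMD_INFO reply, failing if it is too short.
func parseInfo(resp []byte) (info, error) {
	var in info
	err := binary.Read(bytes.NewReader(resp), binary.LittleEndian, &in)
	return in, err
}

func main() {
	resp := []byte{
		0x00, 0x00, 0x00, 0x10, // flash start: 0x10000000
		0x00, 0x00, 0x20, 0x00, // flash size: 2 MB
		0x00, 0x10, 0x00, 0x00, // erase alignment: 4096
		0x00, 0x01, 0x00, 0x00, // write alignment: 256
		0x00, 0x04, 0x00, 0x00, // max data length: 1024
	}
	in, err := parseInfo(resp)
	if err != nil {
		fmt.Println("bad reply:", err)
		return
	}
	fmt.Printf("%+v\n", in)
}
```

Querying these values at run time means the host doesn’t need to hard-code the flash geometry.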

CMD_REBOOT

This triggers a reboot. It takes one argument - if the argument is non-zero then it sets the watchdog registers to stay in the bootloader, instead of starting the user application (if there is one).

Host side

On the “host” side, I’ve written some Go code to communicate with the bootloader, including a simple command line application for writing an .elf or .bin file over a serial port.

There’s not too much to say about this. The application is used like so:

./serial-flash /dev/ttyUSB0 firmware.elf

…but really, I’m just using the library functions it implements and embedding the functionality in my UI application (a future post!).

Building programs to work with the bootloader

As described above, the Pico SDK by default places boot2 at the start of the program binary, and will build the program so that it expects to run from the start of flash.

To make programs compatible with my bootloader, we need to skip adding boot2 (not strictly necessary) and set them up so that they can be written to a section of flash after my bootloader (starting at least 16 kB from the start of flash).

This is easy to do with a custom linker script, which differs from the default one in two ways:

  1. Modify the FLASH configuration to set the start address to 16 kB, and the size to (2 MB - 16 kB)
  2. Don’t add boot2

diff --git a/pico-sdk/src/rp2_common/pico_standard_link/memmap_default.ld b/blink_noboot2.ld
index 07d5812..448f834 100644
--- a/pico-sdk/src/rp2_common/pico_standard_link/memmap_default.ld
+++ b/blink_noboot2.ld
@@ -21,9 +21,10 @@
     __stack (== StackTop)
 */

+/* Skip 16kB at the start of flash, that's where our bootloader is */
 MEMORY
 {
-    FLASH(rx) : ORIGIN = 0x10000000, LENGTH = 2048k
+    FLASH(rx) : ORIGIN = 0x10000000 + 16k, LENGTH = 2048k - 16k
     RAM(rwx) : ORIGIN =  0x20000000, LENGTH = 256k
     SCRATCH_X(rwx) : ORIGIN = 0x20040000, LENGTH = 4k
     SCRATCH_Y(rwx) : ORIGIN = 0x20041000, LENGTH = 4k
@@ -33,30 +34,11 @@ ENTRY(_entry_point)

 SECTIONS
 {
-    /* Second stage bootloader is prepended to the image. It must be 256 bytes big
-       and checksummed. It is usually built by the boot_stage2 target
-       in the Raspberry Pi Pico SDK
-    */
-
     .flash_begin : {
         __flash_binary_start = .;
     } > FLASH

-    .boot2 : {
-        __boot2_start__ = .;
-        KEEP (*(.boot2))
-        __boot2_end__ = .;
-    } > FLASH
-
-    ASSERT(__boot2_end__ - __boot2_start__ == 256,
-        "ERROR: Pico second stage bootloader must be 256 bytes in size")
-
-    /* The second stage will always enter the image at the start of .text.
-       The debugger will use the ELF entry point, which is the _entry_point
-       symbol if present, otherwise defaults to start of .text.
-       This can be used to transfer control back to the bootrom on debugger
-       launches only, to perform proper flash setup.
-    */
+    /* boot2 would go here, but we don't want it */

     .text : {
         __logical_binary_start = .;

You can tell the SDK to use this custom linker script with the pico_set_linker_script command in CMakeLists.txt:

pico_set_linker_script(blink_noboot2 ${CMAKE_CURRENT_SOURCE_DIR}/blink_noboot2.ld)

I’ve put a simple “blinky” example which is compatible with my bootloader on GitHub here: https://github.com/usedbytes/pico-blink-noboot2

Next…

So this all allows me to upload code over UART. The next step is to hook the UART up to an ESP32, allowing code upload over WiFi, which will be the next post.


  1. The Raspberry Pi 4 is slightly different, having the second stage bootloader in an EEPROM instead of on the SD card.

  2. Note that the ROM contains other code in addition to the first stage bootloader, such as optimised mathematical routines and flash programming functions.

  3. I can’t find a decent reference for what that standard actually is, but it’s referred to as “Standard SPI” in various places, and everyone seems to have agreed on how it works.