In this post, I will explain how we can implement a very simple blink program using only ARM Assembly language.

Hardware

I will use the STM32F407VG Discovery board for this example, but you can use any other STM32 board by changing the STM32F407VG-specific parts.

STM32F407VG Discovery Board

The STM32F407VG is based on the ARM Cortex-M4 core, a 32-bit RISC processor that implements the ARMv7-M architecture and the Thumb-2 instruction set. Please note that Cortex-M4 only supports the Thumb instruction set, not the ARM instruction set. Other ARM processors support both and can switch between them at run time, so when we write assembly code for Cortex-M4 we need to specify that we are targeting the Thumb instruction set. The selection is carried in the least significant bit (LSB) of any address that is loaded into the Program Counter (PC) by an interworking branch such as bx or by a vector table entry. If that bit is 1, the processor executes the target as Thumb code; if it is 0, the processor would try to switch to the ARM instruction set. Since Cortex-M4 only supports Thumb, that bit must always be 1; an address with the bit cleared leads to a fault. The bit is not stored in the PC itself, which always holds a halfword-aligned address, but in the T bit of the execution program status register (EPSR).

There are four fundamental documents that you need to refer to when writing assembly code for STM32F407VG Discovery board:

  1. ARMv7-M Architecture Reference Manual: This document describes the ARMv7-M architecture in detail. It includes all the information about the instruction set, registers, and exception handling.
  2. STM32F4 Reference Manual: This document describes the STM32F4 series microcontrollers in detail. It includes information about the memory map, peripheral registers, and clock configuration.
  3. STM32F407VG Datasheet: This document provides the specifications of the STM32F407VG microcontroller, including pin configuration and electrical characteristics.
  4. STM32F4 Discovery Schematic: This document provides the schematic of the STM32F4 Discovery board. It includes information about the connections between the microcontroller and other components on the board.

Prerequisites

We will need an ARM assembler to assemble our source files and a linker to link the resulting object files. I will use arm-none-eabi-as and arm-none-eabi-ld from the GNU Arm Embedded Toolchain. In addition to the assembler and linker, we will also need arm-none-eabi-objcopy to convert the linked ELF file to a raw binary file. All of these tools are distributed together in the GNU Arm Embedded Toolchain. You can download it from here, or install it via your package manager: brew install --cask gcc-arm-embedded on macOS, or sudo apt install gcc-arm-none-eabi on Ubuntu.

We will also need a tool to flash the binary to the STM32 device. You can use STM32CubeProgrammer or any other flashing tool that you prefer. I will use the st-flash command from stlink, an open source implementation of the ST-Link utility. You can download it from here, or install it via your package manager: brew install stlink on macOS, or sudo apt install stlink-tools on Ubuntu.

Outline

Writing an embedded application is very much like writing a regular application. Each application can be divided into two main parts:

  1. The part that handles the logic of the application
  2. The part that handles the I/O operations

Writing the application logic is completely independent from which MCU you are using. It just depends on the instruction set architecture (ISA) of the MCU. Therefore, it is enough to understand the ARM ISA to write the application logic.

On the other hand, I/O operations are highly dependent on the MCU you are using. Each MCU has its own memory map and peripheral registers. Generally, MCUs use memory-mapped I/O, which means that peripherals are mapped to specific memory addresses. Performing an I/O operation is therefore just a matter of reading from or writing to specific memory addresses.

Therefore, if we understand the ISA’s instructions and the memory map of the MCU, we can write an embedded application for a bare-metal system using only Assembly language (or any other low-level language). In this post, I will explain neither the ARM ISA nor the STM32 memory map in detail. I will just give a brief overview of the necessary parts to get you started.

On the STM32F407VG Discovery board, the LEDs are connected to GPIOD pins 12, 13, 14 and 15. To initialize the GPIOD peripheral we need to do the following steps:

  1. Enable the clock for GPIOD peripheral by writing 0x00000008 to the RCC_AHB1ENR register at address 0x40023830. (Please check RCC_AHB1ENR)
  2. Set the mode of pins 12, 13, 14 and 15 of GPIOD to output mode by writing 0x55000000 to the GPIOD_MODER register at address 0x40020C00. (Please check GPIOx_MODER)
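
As a quick sanity check on the two constants above, the following host-side Python sketch derives them from bit positions. It is not part of the firmware. The helper and constant names are hypothetical, and the layout it assumes (RCC base 0x40023800, RCC_AHB1ENR at offset 0x30, GPIODEN at bit 3, two mode bits per pin with 01 meaning output) should be checked against your copy of the reference manual.

```python
# Host-side sanity check, not firmware. All names here are illustrative.
RCC_BASE = 0x40023800      # assumed RCC base address
RCC_AHB1ENR_OFFSET = 0x30  # assumed register offset
GPIODEN_BIT = 3            # assumed: bit 3 of RCC_AHB1ENR enables the GPIOD clock
LED_PINS = (12, 13, 14, 15)


def gpio_output_mode_mask(pins):
    """Return a MODER value with mode 0b01 (general-purpose output) for each pin."""
    value = 0
    for pin in pins:
        value |= 0b01 << (2 * pin)  # each pin owns two mode bits
    return value


rcc_ahb1enr_addr = RCC_BASE + RCC_AHB1ENR_OFFSET
clock_enable_value = 1 << GPIODEN_BIT
moder_value = gpio_output_mode_mask(LED_PINS)

print(hex(rcc_ahb1enr_addr), hex(clock_enable_value), hex(moder_value))
```

Note that writing these values with a plain store replaces the whole register, so any other bits that were set before are cleared. That is acceptable for this tiny example, but a larger program would usually read, modify and write back.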

To change the state of the LEDs, we need to do the following steps:

  1. To turn on the LEDs, we need to write 0x0000F000 to the GPIOD_ODR register at address 0x40020C14. (Please check GPIOx_ODR)
  2. To turn off the LEDs, we need to write 0x00000000 to the GPIOD_ODR register at address 0x40020C14. (Please check GPIOx_ODR)
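
The output data register uses one bit per pin, so the "on" value can be derived from the pin numbers. The sketch below is host-side Python with hypothetical names; the 0x14 offset for the output data register is an assumption to verify in the reference manual.

```python
GPIOD_BASE = 0x40020C00
GPIOX_ODR_OFFSET = 0x14   # assumed offset of the output data register
LED_PINS = (12, 13, 14, 15)


def pin_mask(pins):
    """Return a mask with one bit set per pin number."""
    mask = 0
    for pin in pins:
        mask |= 1 << pin
    return mask


odr_addr = GPIOD_BASE + GPIOX_ODR_OFFSET
leds_on_value = pin_mask(LED_PINS)
leds_off_value = 0

print(hex(odr_addr), hex(leds_on_value), hex(leds_off_value))
```

Passing a single pin, for example pin_mask([12]), gives the value that would light only one of the four LEDs.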

Let’s implement these steps in assembly language.

Initialization of Peripherals

Before using our peripherals, we need to initialize them first. We can do this by writing the following assembly code:

.global init
.section .text

init:
  // Enable the GPIOD clock: write 0x00000008 to RCC_AHB1ENR (0x40023830)
  mov r0, #0x08
  movw r1, #0x3830
  movt r1, #0x4002
  str r0, [r1, #0]

  // Set pins 12-15 to output: write 0x55000000 to GPIOD_MODER (0x40020C00)
  mov r0, #0
  movt r0, #0x5500
  movw r1, #0x0C00
  movt r1, #0x4002
  str r0, [r1, #0]
  bx lr

With the first assembler directive, .global init, we make the init label visible to the linker so that we can call it from other object files. The second assembler directive, .section .text, indicates that the following code should be placed in the .text section of the object file, which is the section that contains the executable code. The rest of the code is straightforward: each block loads a value into r0 and an address into r1, then stores the value at that address. We can assemble this file with the following command:

arm-none-eabi-as -m thumb init.s -o init.o

The -m thumb flag tells the assembler that we are targeting the Thumb instruction set. We need to say so because the ARM architecture has two instruction sets: ARM and Thumb. ARM instructions are always 32 bits wide, while Thumb is a more compact encoding that started out as 16-bit only and, since Thumb-2, mixes 16-bit and 32-bit instructions. Our target is Thumb because Cortex-M processors only support the Thumb instruction set.
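
The init routine builds each 32-bit constant from two 16-bit halves: movw loads the lower half and clears the upper half, and movt then replaces only the upper half. As a rough illustration, this host-side Python sketch (the function names are hypothetical) splits a value into the two immediates and recombines them the way the instruction pair does.

```python
def split_halves(value):
    """Split a 32-bit value into (low16, high16), the movw and movt immediates."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return value & 0xFFFF, value >> 16


def movw(imm16):
    """Model movw: the register becomes the zero-extended 16-bit immediate."""
    return imm16 & 0xFFFF


def movt(reg, imm16):
    """Model movt: replace the upper 16 bits of reg, keep the lower 16 bits."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


low, high = split_halves(0x40023830)
r1 = movt(movw(low), high)
print(hex(low), hex(high), hex(r1))
```

Because movw already clears the upper half of the register, a mov of zero is only needed before a lone movt, as in the code that builds 0x55000000.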

Blinking

Now we can write the main blinking logic. The main logic is very simple:

  1. Initialize the peripherals
  2. Turn on the LEDs
  3. Wait for some time
  4. Turn off the LEDs
  5. Wait for some time
  6. Repeat from step 2

Let’s implement this logic in assembly:

 .global _start
 .section .text

 _start:
   bl init
 loop:
   bl led_on
   bl delay
   bl led_off
   bl delay
   b loop

 // Consume some cycles
 delay:
   movw r0, #0xFFFF
   movt r0, #0x0004
 delay_loop:
   sub r0, r0, #1
   cmp r0, #0
   bne delay_loop
   bx lr

 led_on:
   // Write 0x0000F000 to GPIOD_ODR (0x40020C14)
   movw r0, #0xF000
   movw r1, #0x0C14
   movt r1, #0x4002
   str r0, [r1, #0]
   bx lr

 led_off:
   // Write 0x00000000 to GPIOD_ODR (0x40020C14)
   mov r0, #0
   movw r1, #0x0C14
   movt r1, #0x4002
   str r0, [r1, #0]
   bx lr

Again, we need to assemble this file to generate a relocatable object file.

arm-none-eabi-as -m thumb main.s -o main.o
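
How long does the delay routine actually wait? Here is a rough host-side estimate in Python. It assumes the core is still running from the 16 MHz internal oscillator (the reset default, since this code never configures the clock tree) and about 4 cycles per loop iteration (one each for sub and cmp, plus a taken branch). Both figures are assumptions rather than measurements, and the function name is hypothetical.

```python
def delay_seconds(count, cycles_per_iteration=4, core_hz=16_000_000):
    """Estimate the busy-wait time of a count-down loop.

    cycles_per_iteration and core_hz are assumptions, not measured values.
    """
    return count * cycles_per_iteration / core_hz


DELAY_COUNT = 0x0004FFFF  # value loaded by movw/movt in delay

one_delay = delay_seconds(DELAY_COUNT)
print(f"{DELAY_COUNT} iterations, about {one_delay * 1000:.0f} ms per delay")
```

Under these assumptions each delay is somewhat under a tenth of a second. If the blink looks too fast or too slow on your board, changing the movt immediate in delay is the easiest knob to turn.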

Defining the Interrupt Vector Table (IVT)

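Then, we need to define the interrupt vector table, which maps exceptions and interrupts to their handlers. On Cortex-M, the first word of the table holds the initial stack pointer and the second word holds the address of the reset handler. We leave the stack pointer at 0 here because this program never uses the stack; a real application would normally point it at the end of RAM.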

 .section .isr_vector

 .word 0x00000000 // Initial Stack Pointer
 .word _start + 1 // Reset Handler
 // Other interrupt vectors can be defined here also

Why are we adding 1 to the _start label? Because the LSB of an address loaded into the PC tells the processor whether the target is Thumb or ARM code, and on Cortex-M it must be 1. We could get the same result by putting the .thumb_func directive before the _start label in the main assembly code, in which case the linker sets the bit for us and the + 1 is no longer needed. I prefer the explicit version here for educational purposes.

Again, we need to assemble the IVT source file to generate a relocatable object file.
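
To make the extra bit concrete, here is a small host-side Python sketch that packs the two vector table words the way they end up in flash. The address used for _start (0x08000008, directly after the two-word table) is an assumption about where the linker places it, and the function name is hypothetical.

```python
import struct


def vector_table(initial_sp, reset_handler_addr):
    """Pack the first two vector table entries as little-endian 32-bit words.

    Bit 0 of the reset vector is set to mark the handler as Thumb code.
    """
    return struct.pack("<II", initial_sp, reset_handler_addr | 1)


START_ADDR = 0x08000008  # assumed address of _start
table = vector_table(0x00000000, START_ADDR)
print(table.hex())
```

Since instruction addresses are always even, setting bit 0 with an OR gives the same result as adding 1.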

arm-none-eabi-as -m thumb ivt.s -o ivt.o

Linking the Object Files and Creating the Final Binary

At this point, we have three object files: main.o, init.o and ivt.o. We need to link them together to create the final binary. But linking is not as straightforward as assembling. We need to specify how the memory is laid out in the embedded device. In addition to that, we need to specify how the sections in the object files are mapped to the memory regions. Here is our link.ld linker script that defines the memory layout and section mapping:

MEMORY {
  FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 1M
  RAM   (rwx): ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS {
  .text : {
    KEEP(*(.isr_vector*))
    *(.text*)
  } > FLASH
}

The MEMORY block defines two memory regions: FLASH and RAM. The FLASH region starts at address 0x08000000 and has a length of 1MB. The RAM region starts at address 0x20000000 and has a length of 128KB. These are just definitions. We will use these definitions in the SECTIONS block to map the sections in the object files to the memory regions.

The SECTIONS block defines how the sections in the object files are mapped to the memory regions. Its syntax is a bit weird, so let me explain it step by step.

First, we define the sections of the output file along with their target memory region. .text : { ... } > FLASH indicates that the output file will have a .text section and that it will be placed in the FLASH memory region.

Inside the curly braces, we list which input sections will be merged into that output section. We can also restrict a pattern to a specific object file. For example, xxx.o(.text*) selects all sections whose names start with .text from the xxx.o object file.

In the first line inside the curly braces, we are using the KEEP command, which prevents the linker from discarding a section. When section garbage collection is enabled (the --gc-sections option), the linker removes any sections that are not reachable from the entry point or from other kept sections, which is useful for reducing the size of the final binary. However, in embedded systems, some sections must be present in the final binary even if nothing references them. The interrupt vector table is the classic example: the hardware reads it, but no code refers to it. Therefore, we wrap it in KEEP so that it survives even if garbage collection is turned on.

The first line places all input sections whose names start with .isr_vector in the output. Because .text is the first (and only) output section assigned to FLASH, it starts at the beginning of the FLASH memory region, and because the .isr_vector sections are listed first inside it, the vector table ends up at the very start of flash, at address 0x08000000.

Then, we are collecting all sections that start with .text from all object files and placing them in the .text section of the final object file.
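
As a mental model only, the placement rules above can be mimicked in a few lines of host-side Python: input sections are appended to the output section in the order the script lists them, starting at the origin of FLASH. The section sizes below are made-up placeholders and the function name is hypothetical. A real linker also honours alignment, which this sketch ignores.

```python
FLASH_ORIGIN = 0x08000000
FLASH_LENGTH = 1024 * 1024  # 1M, as in the MEMORY block


def place_sections(origin, sections):
    """Assign consecutive addresses to (name, size) pairs, in the given order."""
    layout = {}
    address = origin
    for name, size in sections:
        layout[name] = address
        address += size
    return layout, address - origin


# Order follows the script: .isr_vector first, then every .text input section.
inputs = [
    ("ivt.o(.isr_vector)", 8),   # two 32-bit words
    ("main.o(.text)", 64),       # placeholder size
    ("init.o(.text)", 32),       # placeholder size
]
layout, used = place_sections(FLASH_ORIGIN, inputs)
assert used <= FLASH_LENGTH, "output section would overflow FLASH"

for name, address in layout.items():
    print(f"{address:#010x}  {name}")
```

The point of the model is the ordering: whatever is listed first inside .text lands at the lowest address, which is why the vector table line comes before the code.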

That’s it! We can now link the object files using the following command:

arm-none-eabi-ld -T link.ld main.o init.o ivt.o -o firmware.elf

However, the resulting firmware.elf file is not suitable for flashing to the device. We need to convert it to a binary file first. We can do this using the following command:

arm-none-eabi-objcopy -O binary firmware.elf firmware.bin

Flashing the Binary to the Device

We can now flash the resulting firmware.bin file to the STM32 device using the following command:

st-flash --reset write firmware.bin 0x08000000

This command will write the firmware.bin file to the device starting at address 0x08000000, which is the start address of the FLASH memory region defined in the linker script. The --reset flag will reset the device after flashing, and we should see the LEDs blinking on the board.

Why are we specifying the start address again? Because a raw binary carries no information about sections or memory regions; that information is dropped during the conversion from ELF to binary. The st-flash tool therefore cannot know where the image belongs, and we need to give it the start address manually.
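
Before flashing, it can be reassuring to peek at the first two words of the image. The following host-side Python sketch (a hypothetical helper, not part of stlink or the toolchain) decodes them from raw bytes and checks that the reset vector has its Thumb bit set and points inside the image once the load address is added back.

```python
import struct

FLASH_BASE = 0x08000000  # the same address passed to st-flash


def check_image(image, load_addr=FLASH_BASE):
    """Return (initial_sp, reset_vector) after basic sanity checks on a raw image."""
    if len(image) < 8:
        raise ValueError("image too short to contain a vector table")
    initial_sp, reset_vector = struct.unpack_from("<II", image, 0)
    if reset_vector & 1 == 0:
        raise ValueError("reset vector does not have the Thumb bit set")
    entry = reset_vector & ~1
    if not load_addr + 8 <= entry < load_addr + len(image):
        raise ValueError("reset vector points outside the image")
    return initial_sp, reset_vector


# Synthetic 16-byte image for illustration; with a real build you would pass
# the bytes read from firmware.bin instead.
sample = struct.pack("<II", 0x00000000, 0x08000009) + bytes(8)
sp, reset = check_image(sample)
print(hex(sp), hex(reset))
```

With a real build, the call would look like check_image(open("firmware.bin", "rb").read()).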