Aug 132016
 

Sometimes I wonder.  Take this evening for example.

I recently purchased some microcontrollers to evaluate for a project, some Atmel ATTiny85s, because they have a rather nice PLL function which means they can do VHF-speed PWM, and some NXP LPC810s, because they happen to be the only DIP-package ARM chip on the market I know of.

The project I’m looking at is a re-work of my bicycle horn… the ATMega32U4 works well, but the LeoStick boards are expensive compared to a bare DIP MCU, and the wiring inside the original prototype is a mess.  I also never got USB working on them, so there’s no point in a USB-capable MCU.

I initially got ATMega1284s owing to the flash storage, but these being 40-pin DIPs, they’re bigger than anticipated, and the fact they’ve got dual USARTs, lots of GPIOs and plenty of storage space, I figured I’d put them aside for another project.

What to use?  Well I have some AT89C2051s from way back (but no programmer for them), some ATTiny24As which I bought for my solar cluster project, an ATMega8L from another project, a LeoStick (Arduino Leonardo clone).  The LeoStick I’m in the process of turning into a debugWire debugger so that I can figure out what the ADCs are doing in my cluster’s power controller (ATTiny24A).

I started building a programmer for the ‘2051s using my ATMega8L last weekend.  The MAX232 IC I grabbed for serial I/O was giving me jibberish, and today I confirmed it was misbehaving.  The board in general is misbehaving in that after flashing the MCU, it seems to stay in reset, so I’ve got more work to do.  If I got that going, I was thinking I could have PCM recordings in an I²C EEPROM and use port 1 on the ‘2051 with an R2R ladder DAC to play sound.  (These chips do not feature PWM.)

Thinking this morning, I thought the LPC810 might be worth a shot.  It only has 4kB of flash, half that of the ATTiny85, and doesn’t have as impressive PWM capabilities, but is good enough.  I really need about 16kB to store the waveforms in flash.  I do have some I²C EEPROMs, mostly <2kB ones that are sourced off old motherboards, but also a handful of 32kB ones that I had just bought especially for this… but then left behind on my desk at work.

I considered audio compression, and experimenting with ADPCM-style techniques, came to the conclusion that I didn’t like the reduced audio quality.  It really sounded harsh.  (Okay, I realise 4-bits per sample is never going to win over the audiophiles!)

Maybe instead of PCM, I could do a crude polyphonic synthesizer?  My horn effect is in fact synthesized using a Python script: the same can be done in C, and the chip probably has the CPU grunt to do it.  It’d save the flash space as I’d be basically doing “poor man’s MIDI” on the thing.  Similar has been done before on lesser hardware.

I did some rough design of data structures.  I figured out a data structure that would allow me to store the state of a “voice” in 8 bytes, and could describe note and timing events in 8-byte blocks.  So in a 2kB EEPROM, I’d store 256 notes, and could easily accommodate 8 or 16 voices in RAM, provided the CPU could keep up at 30MHz.

So, I pull a chip out, slap it in my breadboard, and start hooking it up to power, and to my shiny new USB-TTL serial cable.  Fire up lpc21isp and, nothing, no response from the chip.  Huh?  Check wiring, probe around, still nothing.  Tried different baud rates, etc.  No dice.

This stubborn chip was not going to talk to lpc21isp.  Okay, let’s see if it’ll do SWD.  I dig out my STLink/V2 and hook that up.

OpenOCD reports no response from the device.

Great, maybe a dud chip.  After a good hour or so of fruitless poking and prodding, I pull it out of the breadboard and go to get another from the tube it came from when I notice “Atmel” written on the tube.

I look closer at the chip: it was an ATTiny85!  Different pin-out, different ISP procedure, and even if the .hex file had uploaded, it almost certainly would not have executed.

Swap the chip for an actual LPC810, and OpenOCD reports:

Open On-Chip Debugger 0.10.0-dev-00120-g7a8915f (2015-11-25-18:49)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use 'transport select '.
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 10 kHz
adapter_nsrst_delay: 200
Info : Unable to match requested speed 10 kHz, using 5 kHz
Info : Unable to match requested speed 10 kHz, using 5 kHz
Info : clock speed 5 kHz
Info : STLINK v2 JTAG v23 API v2 SWIM v4 VID 0x0483 PID 0x3748
Info : using stlink api v2
Info : Target voltage: 2.979527
Warn : UNEXPECTED idcode: 0x0bc11477
Error: expected 1 of 1: 0x0bb11477
in procedure 'init'
in procedure 'ocd_bouncer'

I haven’t figured out the cause of this yet, whether the ST programmer doesn’t like talking to a competitor’s part. It’d be nice to get SWD going since single-stepping code and peering into memory really spoils a developer like myself. I try lpc21isp again.

Success!  I see a LED blinking, consistent with the demo .hex file I loaded.  Of course now the next step is to try building my own, but at least I can load code onto the device now.

Sep 272015
 

Well, lately I’ve been doing a bit of work hacking the firmware on the Rowetel SM1000 digital microphone.  For those who don’t know it, this is a hardware (microcontroller) implementation of the FreeDV digital voice mode: it’s a modem that plugs into the microphone/headphone ports of any SSB-capable transceiver and converts FreeDV modem tones to analogue voice.

I plan to set this unit of mine up on the bicycle, but there’s a few nits that I had.

  • There’s no time-out timer
  • The unit is half-duplex

If there’s no timeout timer, I really need to hear the tones coming from the radio to tell me it has timed out.  Others might find a VOX feature useful, and there’s active experimentation in the FreeDV 700B mode (the SM1000 currently only supports FreeDV 1600) which has been very promising to date.

Long story short, the unit needed a more capable UI, and importantly, it also needed to be able to remember settings across power cycles.  There’s no EEPROM chip on these things, and while the STM32F405VG has a pin for providing backup-battery power, there’s no battery or supercapacitor, so the SM1000 forgets everything on shut down.

ST do have an application note on their website on precisely this topic.  AN3969 (and its software sources) discuss a method for using a portion of the STM32’s flash for this task.  However, I found their “license” confusing.  So I decided to have a crack myself.  How hard can it be, right?

There’s 5 things that a virtual EEPROM driver needs to bear in mind:

  • The flash is organised into sectors.
  • These sectors when erased contain nothing but ones.
  • We store data by programming zeros.
  • The only way to change a zero back to a one is to do an erase of the entire sector.
  • The sector may be erased a limited number of times.

So on this note, a virtual EEPROM should aim to do the following:

  • It should keep tabs on what parts of the sector are in use.  For simplicity, we’ll divide this into fixed-size blocks.
  • When a block of data is to be changed, if the change can’t be done by changing ones to zeros, a copy of the entire block should be written to a new location, and a flag set (by writing zeros) on the old block to mark it as obsolete.
  • When a sector is full of obsolete blocks, we may erase it.
  • We try to put off doing the erase until such time as the space is needed.

Step 1: making room

The first step is to make room for the flash variables.  They will be directly accessible in the same manner as variables in RAM, however from the application point of view, they will be constant.  In many microcontroller projects, there’ll be several regions of memory, defined by memory address.  This comes from the datasheet of your MCU.

An example, taken from the SM1000 firmware, prior to my hacking (stm32_flash.ld at r2389):

/* Specify the memory areas */
MEMORY
{
  FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 1024K
  RAM (rwx)       : ORIGIN = 0x20000000, LENGTH = 128K
  CCM (rwx)       : ORIGIN = 0x10000000, LENGTH = 64K
}

The MCU here is the STM32F405VG, which has 1MB of flash starting at address 0x08000000. This 1MB is divided into (in order):

  • Sectors 0…3: 16kB starting at 0x08000000
  • Sector 4: 64kB starting at 0x0800c000
  • Sector 5 onwards: 128kB starting at 0x08010000

We need at least two sectors, as when one fills up, we will swap over to the other. Now it would have been nice if the arrangement were reversed, with the smaller sectors at the end of the device.

The Cortex M4 CPU is basically hard-wired to boot from address 0, the BOOT pins on the STM32F4 decide how that gets mapped. The very first few instructions are the interrupt vector table, and it MUST be the thing the CPU sees first. Unless told to boot from external memory, or system memory, then address 0 is aliased to 0x08000000. i.e. flash sector 0, thus if you are booting from internal flash, you have no choice, the vector table MUST reside in sector 0.

Normally code and interrupt vector table live together as one happy family. We could use a couple of 128k sectors, but 256k is rather a lot for just an EEPROM storing maybe 1kB of data tops. Two 16kB sectors is just dandy, in fact, we’ll throw in the third one for free since we’ve got plenty to go around.

However, the first one will have to be reserved for the interrupt vector table that will have the space to itself.

So here’s what my new memory regions look like (stm32_flash.ld at 2390):

/* Specify the memory areas */
MEMORY
{
  /* ISR vectors *must* be placed here as they get mapped to address 0 */
  VECTOR (rx)     : ORIGIN = 0x08000000, LENGTH = 16K
  /* Virtual EEPROM area, we use the remaining 16kB blocks for this. */
  EEPROM (rx)     : ORIGIN = 0x08004000, LENGTH = 48K
  /* The rest of flash is used for program data */
  FLASH (rx)      : ORIGIN = 0x08010000, LENGTH = 960K
  /* Memory area */
  RAM (rwx)       : ORIGIN = 0x20000000, LENGTH = 128K
  /* Core Coupled Memory */
  CCM (rwx)       : ORIGIN = 0x10000000, LENGTH = 64K
}

This is only half the story, we also need to create the section that will be emitted in the ELF binary:

SECTIONS
{
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >FLASH

  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
	*(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
    _exit = .;
  } >FLASH…

There’s rather a lot here, and so I haven’t reproduced all of it, but this is the same file as before at revision 2389, but a little further down. You’ll note the .isr_vector is pointed at the region called FLASH which is most definitely NOT what we want. The image will not boot with the vectors down there. We need to change it to put the vectors in the VECTOR region.

Whilst we’re here, we’ll create a small region for the EEPROM.

SECTIONS
{
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector))
    . = ALIGN(4);
  } >VECTOR


  .eeprom :
  {
    . = ALIGN(4);
    *(.eeprom)         /* special section for persistent data */
    . = ALIGN(4);
  } >EEPROM


  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */

THAT’s better! Things will boot now. However, there is still a subtle problem that initially caught me out here. Sure, the shiny new .eeprom section is unpopulated, BUT the linker has helpfully filled it with zeros. We cannot program zeroes back into ones! Either we have to erase it in the program, or we tell the linker to fill it with ones for us. Thankfully, the latter is easy (stm32_flash.ld at 2395):

  .eeprom :
  {
    . = ALIGN(4);
    KEEP(*(.eeprom))   /* special section for persistent data */
    . = ORIGIN(EEPROM) + LENGTH(EEPROM) - 1;
    BYTE(0xFF)
    . = ALIGN(4);
  } >EEPROM = 0xff

Credit: Erich Styger

We have to do two things. One, is we need to tell it that we want the region filled with the pattern 0xff. Two, we need to make sure it gets filled with ones by telling the linker to write one as the very last byte. Otherwise, it’ll think, “Huh? There’s nothing here, I won’t bother!” and leave it as a string of zeros.

Step 2: Organising the space

Having made room, we now need to decide how to break this data up.  We know the following:

  • We have 3 sectors, each 16kB
  • The sectors have an endurance of 10000 program-erase cycles

Give some thought as to what data you’ll be storing.  This will decide how big to make the blocks.  If you’re storing only tiny bits of data, more blocks makes more sense.  If however you’ve got some fairly big lumps of data, you might want bigger blocks to reduce overheads.

I ended up dividing the sectors into 256-byte blocks.  I figured that was a nice round (binary sense) figure to work with.  At the moment, we have 16 bytes of configuration data, so I can do with a lot less, but I expect this to grow.  The blocks will need a header to tell you whether or not the block is being used.  Some checksumming is usually not a bad idea either, since that will clue you in to when the sector has worn out prematurely.  So some data in each block will be header data for our virtual EEPROM.

If we don’t care about erase cycles, this is fine, we can just make all blocks data blocks, however it’d be wise to track this, and avoid erasing and attempting to use a depleted sector, so we need somewhere to track this.  256 bytes gives us enough space to stash an erase counter and a map of what blocks are in use within that sector.

So we’ll reserve the first block in the sector to act as this index for the entire sector.  This gives us enough room to have 16-bits worth of flags for each block stored in the index.  That gives us 63 blocks per sector for data use.

It’d be handy to be able to use this flash region for a few virtual EEPROMs, so we’ll allocate some space to give us a virtual ROM ID.  It is prudent to do some checksumming, and the STM32F4 has a CRC32 module, so in that goes, and we might choose to not use all of a block, so we should throw in a size field (8 bits, since the size can’t be bigger than 255).  If we pad this out a bit to give us a byte for reserved data, we get a header with the following structure:

15 14 13 12 11 10 19 8 7 6 5 4 3 2 1 0
+0 CRC32 Checksum
+2
+4 ROM ID Block Index
+6 Block Size Reserved

So that subtracts 8 bytes from the 256 bytes, leaving us 248 for actual program data. If we want to store 320 bytes, we use two blocks, block index 0 stores bytes 0…247 and has a size of 248, and block index 1 stores bytes 248…319 and has a size of 72.

I mentioned there being a sector header, it looks like this:

15 14 13 12 11 10 19 8 7 6 5 4 3 2 1 0
+0 Program Cycles Remaining
+2
+4
+6
+8 Block 0 flags
+10 Block 1 flags
+12 Block 2 flags

No checksums here, because it’s constantly changing.  We can’t re-write a CRC without erasing the entire sector, we don’t want to do that unless we have to.  The flags for each block are currently allocated accordingly:

15 14 13 12 11 10 19 8 7 6 5 4 3 2 1 0
+0 Reserved In use

When the sector is erased, all blocks show up as having all flags set as ones, so the flags is considered “inverted”.  When we come to use a block, we mark the “in use” bit with a zero, leaving the rest as ones.  When we erase, we mark the entire flags block as zeros.  We can set other bits here as we need for accounting purposes.

Thus we have now a format for our flash sector header, and for our block headers.  We can move onto the algorithm.

Step 3: The Code

This is the implementation of the above ideas.  Our code needs to worry about 3 basic operations:

  • reading
  • writing
  • erasing

This is good enough if the size of a ROM image doesn’t change (normal case).  For flexibility, I made my code so that it works crudely like a file, you can seek to any point in the ROM image and start reading/writing, or you can blow the whole thing away.

Constants

It is bad taste to leave magic numbers everywhere, so constants should be used to represent some quantities:

  • VROM_SECT_SZ=16384:
    The virtual ROM sector size in bytes.  (Those watching Codec2 Subversion will note I cocked this one up at first.)
  • VROM_SECT_CNT=3:
    The number of sectors.
  • VROM_BLOCK_SZ=256:
    The size of a block
  • VROM_START_ADDR=0x08004000:
    The address where the virtual ROM starts in Flash
  • VROM_START_SECT=1:
    The base sector number where our ROM starts
  • VROM_MAX_CYCLES=10000:
    Our maximum number of program-erase cycles

Our programming environment may also define some, for example UINTx_MAX.

Derived constants

From the above, we can determine:

  • VROM_DATA_SZ = VROM_BLOCK_SZ – sizeof(block_header):
    The amount of data per block.
  • VROM_BLOCK_CNT = VROM_SECT_SZ / VROM_BLOCK_SZ:
    The number of blocks per sector, including the index block
  • VROM_SECT_APP_BLOCK_CNT = VROM_BLOCK_CNT – 1
    The number of application blocks per sector (i.e. total minus the index block)

CRC32 computation

I decided to use the STM32’s CRC module for this, which takes its data in 32-bit words.  There’s also the complexity of checking the contents of a structure that includes its own CRC.  I played around with Python’s crcmod module, but couldn’t find some arithmetic that would allow it to remain there.

So I copy the entire block, headers and all to a temporary copy (on the stack), set the CRC field to zero in the header, then compute the CRC. Since I need to read it in 32-bit words, I pack 4 bytes into a word, big-endian style. In cases where I have less than 4 bytes, the least-significant bits are left at zero.

Locating blocks

We identify each block in an image by the ROM ID and the block index.  We need to search for these when requested, as they can be located literally anywhere in flash.  There are probably cleverer ways to do this, but I chose the brute force method.  We cycle through each sector and block, see if the block is allocated (in the index), see if the checksum is correct, see if it belongs to the ROM we’re looking for, then look and see if it’s the right index.

Reading data

To read from the above scheme, having been told a ROM ID (rom), start offset and a size, the latter two being in byte sand given a buffer we’ll call out, we first need to translate the start offset to a sector and block index and block offset.  This is simple integer division and modulus.

The first and last blocks of our read, we’ll probably only read part of.  The rest, we’ll read entire blocks in.  The block offset is only relevant for this first block.

So we start at the block we calculate to have the start of our data range.  If we can’t find it, or it’s too small, then we stop there, otherwise, we proceed to read out the data.  Until we run out of data to read, we increment the block index, try to locate the block, and if found, copy its data out.

Writing and Erasing

Writing is a similar affair.  We look for each block, if we find one, we overwrite it by copying the old data to a temporary buffer, copy our new data in over the top then mark the old block as obsolete before writing the new one out with a new checksum.

Trickery is in invoking the wear levelling algorithm on an as-needed basis.  We mark a block obsolete by setting its header fields to zero, but when we run out of free blocks, then we go looking for sectors that are full of obsolete blocks waiting to be erased.  When we encounter a sector that has been erased, we write a new header at the start and proceed to use its first data block.

In the case of erasing, we don’t bother writing anything out, we just mark the blocks as obsolete.

Implementation

The full C code is in the Codec2 Subversion repository.  For those who prefer Git, I have a git-svn mirror (yes, I really should move it off that domain).  The code is available under the Lesser GNU General Public License v2.1 and may be ported to run on any CPU you like, not just ST’s.