Archive for the ‘gettingstartedwithxmega’ Category


Xmega fractional baud-rate source code


Earlier I posted a spreadsheet I created that calculated the BSEL and BSCALE for the Xmega’s fractional baud-rate generator.  This works well to determine what the potential is for getting your chip to run a viable baud-rate for a given clock, but isn’t so useful when you actually want to write a configurable piece of code.

Since then I’ve developed two methods for generating the appropriate register settings for a given baud rate.  The first method was designed around the original constraints I had, which were that the CPU frequency and baud-rate were set statically in the source code, and never changed or dealt with programmatically.  As such, it’s a set of macros that determine the best available BSEL and BSCALE:

#ifndef __XMEGA_BAUD_H__
#define __XMEGA_BAUD_H__

#define _BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,bscale) (                \
((bscale) < 0) ?                                                      \
  (int)((((float)(f_cpu)/(8*(float)(baud)))-1)*(1<<-(bscale)))        \
: (int)((float)(f_cpu)/((1<<(bscale))*8*(float)(baud)))-1 )

#define _BSCALE(f_cpu,baud) (                                         \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-7) < 4096) ? -7 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-6) < 4096) ? -6 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-5) < 4096) ? -5 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-4) < 4096) ? -4 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-3) < 4096) ? -3 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-2) < 4096) ? -2 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,-1) < 4096) ? -1 :              \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,0) < 4096) ? 0 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,1) < 4096) ? 1 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,2) < 4096) ? 2 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,3) < 4096) ? 3 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,4) < 4096) ? 4 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,5) < 4096) ? 5 :                \
(_BAUD_BSEL_FROM_BAUDSCALE(f_cpu,baud,6) < 4096) ? 6 :                \
7 )

#define BSEL(f_cpu,baud)                                              \

#define BSCALE(f_cpu,baud) ((_BSCALE(f_cpu,baud)<0) ? (16+_BSCALE(f_cpu,baud)) : _BSCALE(f_cpu,baud))

#endif /* __XMEGA_BAUD_H__ */

(beware the line continuations!) Basically, the BSCALE macro steps through the +-7 range hunting for the highest legal (12-bit) BSEL value, and the BSEL macro uses that to generate the right divider.  A typical usage would be something like this:

#define F_CPU 32000000
#define BAUDRATE 115200


More recently I’ve been developing a more “object oriented” set of routines that allow me to stack one thing on top of another (more about that later).  As a result, I needed to develop a form of the above code that would work at runtime.  As you can see from the first macro in the above code, a naive approach would bring a microcontroller to its knees in a matter of seconds (as in: it could take entire seconds to calculate…).  In order to solve this problem I took a look at the problem from a different perspective, and ended up with the following code:

#define F_CPU 32000000

uint8_t xmega_usart_setspeed (USART_t *usart, uint32_t baud) {
  uint32_t div1k;
  uint8_t bscale = 0;
  uint16_t bsel;

  if (baud > (F_CPU/16)) return 0;

  div1k = ((F_CPU*128) / baud) - 1024;
  while ((div1k < 2096640) && (bscale < 7)) {
    div1k <<= 1;

  bsel = div1k >> 10;

  usart->BAUDCTRLA = bsel&0xff;
  usart->BAUDCTRLB = (bsel>>8) | ((16-bscale) << 4);

  return 1;

The above code will result in the best available baud rate, calculated with 0.1% precision (but does not guarantee 0.1% baud-rate accuracy), using only a single 32-bit divide.  My current headache prevents me from properly explaining how it works, but the clever reader should be able to puzzle it out pretty quickly.  I’ll try to replace this excuse with an actual explanation at some point in the future.  If I haven’t yet, write a comment reminding me….

If you’re running on a system with a variable system clock (e.g. stepping the clock up for a burst of performance and back down for a long sleep), you could easily modify the function to take the F_CPU as a function parameter rather than a #define.  Replacing the (F_CPU/16) with (F_CPU>>4) and (F_CPU*128) with (F_CPU<<7) might be necessary to hint the compiler, but everything else should work the same.  You could then precalculate and store the BAUDCTRL values for each clock speed, and swap them in as needed, or if your clock is more variable than that, just run the calculation each time.

I haven’t profiled the runtime of the code yet, but I suspect it’s well under 1000 cycles, dominated by the 32-bit divide.


Getting started with Xmega: differences from ATmega (part 2)


In Part 1 I explained some of the high-level differences between the older ATmega and the newer Xmega chips.  This includes things like pinout cleanliness, enhanced peripheral count, and much less arbitrary overlap between functions.  In Part 2, I’ll be delving deeper into the architectural changes that result from this design, and how they make writing software for the Xmega much more manageable.

The issue at hand now is not where the peripherals are placed physically on the chip, but how they’re interacted with by software, logically.  As with any other microcontroller, this is done via registers.  These are specific locations in memory (or sometimes a “third” address space, besides memory and code) that when read from or written to will cause some behavior within the peripheral that the register is associated with.  For instance, writing to a USART data register will typically push the written byte into a temporary buffer and start transmitting that byte over the serial port.  Reading from the same register will pull from a different temporary buffer and retrieve the byte that was most recently received.  Other registers contain flags, such as the Transmit Enable flag in one of the USART’s control registers.

To start off with, we’ll again go back to the venerable ATmega*8 as used in the Arduino.  Let’s list all the registers that have anything to do with any of the Port D pins, and what their register address is:

  • 0xC6 UDR0
  • 0xC5 UBRR0H
  • 0xC4 UBRR0L
  • 0xC2 UCSR0C
  • 0xC1 UCSR0B
  • 0xC0 UCSR0A
  • 0xB4 OCR2B
  • 0xB3 OCR2A
  • 0xB2 TCNT2
  • 0xB1 TCCR2B
  • 0xB0 TCCR2A
  • 0x7F DIDR1
  • 0x7B ADCSRB
  • 0x70 TIMSK2
  • 0x6E TIMSK1
  • 0x6D PCMSK2
  • 0x69 EICRA
  • 0x50 ACSR
  • 0x48 OCR0B
  • 0x47 OCR0A
  • 0x46 TCNT0
  • 0x45 TCCR0B
  • 0x44 TCCR0A
  • 0x3D EIMSK
  • 0x3E EIFR
  • 0x3B PCIFR
  • 0x37 TIFR2
  • 0x35 TIFR0
  • 0x2B PORTD
  • 0x2A DRD
  • 0x29 PIND

That’s a lot of registers!  Now while I’m not going to claim that the Xmega uses particularly fewer registers than the ATmega, I challenge you to tell me quickly what every one of those registers does…  In comparison, the registers needed for Port D on an Xmega:

  • PORTD.
    • IN
  • TCD0.
    • CTRLA
    • CTRLE
    • TEMP
  • TCD1.*
  • USARTD0.
    • DATA
    • STATUS
  • USARTD1.*
  • TWID.
    • CTRL
    • MASTER.
      • STATUS
      • BAUD
      • ADDR
      • DATA
    • SLAVE.
      • STATUS
      • ADDR
      • DATA
      • ADDRMASK
  • SPID.
    • CTRL
    • STATUS
    • DATA

Now this is somewhat more comprehensible.  Yes, there are a metric ton more registers, but they all represent significantly enhanced capabilities.  More importantly, they’re all grouped very clearly by module.  If you want to use the first USART on Port D, you start by setting USARTD0.CTRLA, and work from there, rather than trying to remember UCSR0A.  Good luck remembering which UCSR* goes with which port on a bigger chip like the ATmega128…

You’ll notice that both TCD1 and USARTD1 aren’t enumerated, but just listed with a *.  That’s because they have the exact same registers as their D0 counterparts (except that TCx1 drop the 2nd and 3rd compare registers).  Compared to the ATmega, that’s a major bonus: all the peripherals are the same, both between multiple instances in the same chip and between chips in the series.

Delving even deeper, let’s look at how the SPI port is described first in the ATmega8 header file:

/* SPI */
#define SPCR    _SFR_IO8(0x0D)
#define SPSR    _SFR_IO8(0x0E)
#define SPDR    _SFR_IO8(0x0F)

…and now how it’s defined in the Xmega headers:

/* Serial Peripheral Interface */
typedef struct SPI_struct
    register8_t CTRL;  /* Control Register */
    register8_t INTCTRL;  /* Interrupt Control Register */
    register8_t STATUS;  /* Status Register */
    register8_t DATA;  /* Data Register */
} SPI_t;
#define SPIC    (*(SPI_t *) 0x08C0)  /* Serial Peripheral Interface C */
#define SPID    (*(SPI_t *) 0x09C0)  /* Serial Peripheral Interface D */

In the Xmega, every peripheral is given a block of register space, and all the individual registers are allocated within that block.  The SPID register block looks exactly like the SPIC, and SPIE, and SPIF register blocks, except for the starting address.  Thus, the only difference between the ATxmega*A4 and ATxmega*A3 is the following:

#define SPIE    (*(SPI_t *) 0x0AC0)  /* Serial Peripheral Interface E */
#define SPIF    (*(SPI_t *) 0x0BC0)  /* Serial Peripheral Interface F */

A major side effect of all these structures is that you can now easily construct functions and other code structures that can operate on one of these peripherals purely by address:

void spi_init(SPI_t *port) {
    port->CTRL = 0xd0;
    port->STATUS = 0x80;

With simply dot notation, you can significantly simplify your hardware configuration:

#define LED_PORT     PORTC
#define LED_RED_bp   0
#define LED_GREEN_bp 0
#define LED_BLUE_bp  0

// . . .

In comparison, the same for the ATmega8 would be:

#define LED_PORT    PORTC
#define LED_DIR     DDRC
// . . .

If you are using peripherals more complex than just an output port, you can imagine how having to #define all the various registers to keep track of which of potentially several similar peripherals (e.g. USARTs) is used by that logical device would get rather obnoxious.

This particular feature has saved me uncounted hours on the product I’m developing, by allowing me to keep the same codebase across multiple revisions of the hardware.  “hardware.h” contains a switchout that loads “hardware-rev1.h” or “hardware-rev2.h” or whichever.  All these files list the same board-level peripherals (the LED, the debug and RS-485 ports, I2C for the clock PLL, etc.), and as long as I use them exclusively in my actual code, all I have to do when I switch around the hardware layout is to generate a new file and change which ports and such are referenced.

(Part 3: how this peripheral interchangeability can make your project design radically more flexible)


Getting started with Xmega: differences from ATmega (part 1)


To start out this series on getting started with Atmel’s new Xmega chips, I first need to explain what it is that makes it an upgrade from the original AVR ATmega chips.  While there are a lot of common elements, the combination of a large number of peripherals and the mechanisms Atmel provides to connect them all together makes for a very powerful chip.  The Xmega are capable of things that an ordinary AVR can only dream about.

For reference, let’s start with the configuration of the ever-popular ATmega*8, the core of the Arduino series:

Here we have a color-coded diagram showing the pins with all their alternate functions.  There are a total of 3 ports, only two of which have all 8 bits.  Port C is missing PC7 entirely, and PC6 is generally unavailable as it is multiplexed with the RESET pin, required to reprogram the device.  Port B will be lacking PB6 and PB7 in most applications, as they are multiplexed with the crystal driver.  In addition, notice that the pins of a given port are not only scattered around in various places on the chip, but not necessarily even in order.  The ATmega*8 does better than some, and certainly light-years better than any PIC I’ve seen, but it’s still a routing challenge waiting to happen.

Even more than the pin orderings, notice the fact that there’s only one serial port, one I2C (sorry, TWI) port, and one SPI port.  Three timers give a total of 6 potential PWM outputs if you don’t need the timers for anything else, and you don’t need the SPI port that overlaps 2 of them.  Analog is spread between the 6 ADC pins on Port C (two of which are lost if you need I2C), and the comparator steals another PWM output from a different port entirely.  However a major upgrade to the ATmega*8 series versus previous generations is the addition of the PCINT* capability.  Instead of being stuck with just INT0 and INT1 for external interrupts, every single pin can be configured to trigger one of a cluster of interrupts.

Now let’s look at another popular AVR in a bigger package, the ATmega1284:

This looks a lot better, due in part to the larger package.  Not only do we get all 8 bits of every port, but they’re actually all in order.  We gain an additional serial port (TXD/RXD1) though without synchronous capability (no XCK1).  A couple more ADC pins are available, since Port A is complete and the TWI pins have moved elsewhere.  We’re still stuck with only 6 PWM’s, but only one of them is potentially unavailable, and only if the SPI module is used in slave mode (since SS# can be moved anywhere when the chip is in master mode).  The RESET# and XTAL pins have also moved to their own dedicated pins, so that’s even fewer lost pins, though with the drawback that we end up “losing” two pins if there’s no crystal attached.

Now let’s take a look at the ATxmega*A4, the smallest of the new line:

Right off the bat we notice a slight change: the package is no longer DIP, but TQFP.  This is the main drawback of the chips: they’re only available in surface-mount package.  However, I’ve rectified that by developing (and selling) adapters that convert the chips into standard DIP pinouts: (insert link here).

The next thing you should notice is a preponderance of highlighted pin functions.  Instead of 9 or 11 “major” alternate functions (serial, TWI, SPI), we have 27.  This chip has 5 serial ports, 2 TWI ports, and 2 SPI ports.  Even better, every single one is identical from a software perspective, but more on that later.  We also see a total of 12 ADC inputs, and even two DAC outputs!  Spread between ports C through E we find 16 PWM outputs, and the diagram doesn’t even bother showing the “PCINT” functionality, because every single pin of every single port is capable of various types of interrupts.  The crystal pins are available for use as a normal port (R) if you only need the internal oscillators.

A key feature is the fact that the programming pins are completely dedicated to the task.  Marked in purple above, RESET# and the CLK/DATA pins are all that are needed to program the Xmega chips (besides reference power and ground).  These pins are never multiplexed with anything else, so no more careful wiring of the SPI port so you can still flash the chip…

On the bigger end of things, we have the ATxmega*A1 chips:

Being the largest chip in the series it has 100 pins.  You should be able to click on the above image to get a larger one you might be able to read the labels on…

Working from ports A to R, this chip has: 16 inputs on 2 separate ADCs, 4 outputs on 2 separate DACs, 8 serial ports, 4 TWI ports, 4 SPI ports, 24 PWM outputs, and a memory interface capable of both SRAM and SDRAM up to 16MB.  A “timer” crystal connection is available on the extra 4 pins at the top just in case.

The pin arrangement is very clean, with every port in order around the chip, all contiguous, and all running in the same pin order (though the same can’t really be said of the BGA version, Atmel has been made aware of the serious flaws in pin placement there…).  There are power and ground pins for every port, capable of 200mA each.  In particular, that makes the chip capable of driving 20mA on every single pin simultaneously, a potential boon for those using discrete LEDs.

(Part 2: structural differences in how registers are managed make the plethora of peripherals more manageable)


Xmega USART fractional baud-rate speadsheet


The default baud rate on the Bluetooth adapters I’m using to program and debug the current generation of board for my main contract is 115.2K.  That’s actually rather slow when shoving 50-80KB onto the chip every time I make a code change.  The adapters are capable of up to 921.6K, but even at 32MHz a normal USART baud-rate generator ends up with a particularly ugly error percentage (8.5% as it happens, well outside the allowed 2.0%).  However, the Xmega has a fractional baud-rate generator.  I’m not actually sure how it operates, but I know it’s capable of generating much more accurate serial clocks.

Because the calculations are rather tedious, I designed a simple spreadsheet to tell you what the usable BSEL and BSCALE values are for a given rate.  Plug in your main clock rate and target baud rate, and it’ll show you the viable combinations.  For 32MHz 921.6Kbaud, the BSCALE has to be set to -2 or lower, with -7 providing a combination that’s only off 0.1% of nominal.


Getting started with Xmega: clock management



The Xmega has made some major advances in clock management, likely as a conceptual follow-on to the picoPower AVR. While clock selection used to be both limited to a few options and mostly configured via fuses (at programming time), the Xmega has a very wide range of clocking choices that are entirely run-time configured.

The Xmega has the following clock sources available:

– 2MHz internal RC
– 32MHz internal ring oscillator
– 32.768 internal precision RC
– 400kHz – 20MHz external crystal
– 32.768 external crystal
– External (square wave) clock

In addition there are not only the usual clock dividers, but a PLL capable of anywhere from 2x to 31x multiplication.

All chips power up off the 2MHz internal RC at all times, because it’s the first to come up and is guaranteed to be available. All user code that wishes to use something else must configure it before continuing.

Each clock source has both an enable bit and a ready flag, since various clocks take different (and sometimes relatively long) periods of time to stabilize. The external crystal drivers also have failure detectors with dedicated interrupts that fire after the clock is switched back to the 2MHz fallback.

Several of the key registers are covered by the Configuration Change Protection (CCP) mechanism, which is nearly identical to the mechanism used to change or disable the watchdog timer on conventional AVRs. To write the protected register one must first write the CCP register, then the desired register within some short number of cycles (typically 4). This is to protect mostly from rogue memory pointers drifting into critical registers.

There are a total of 3 dividers, which allow the clock to be dropped by as much as 4096(?)x from the base frequency, in powers of 2. However, the first two have more specific purpose: they allow higher peripheral clocks to specific modules. CLKper4 and CLKper2 drive the high-res timer and External Bus Interface modules, respectively.

To make use of these higher frequencies, the (???) register can be set to predivide the input clock in order to bring it down to the range in which the actual CPU can operate. For instance: 8MHz crystal with 16x PLL gives 128MHz, which when divided /2,/2, gives 32MHz to the CPU, 64MHz to the EBI, and 128MHz to the highres timer extention.

Additionally for those who need the lower parts count you get from running off one of the internal oscillators, the Xmega theoretically offers two DFLL’s (Digital Frequency-Locked Loop) that are supposed to adjust the timings (a la OSCCAL) relative to a 32.768kHz reference, either internal or external. However, this module appears to be particularly buggy, and for the most part can be treated as if it doesn’t exist, at least in current silicon.

I will be posting a form of my clock routines sometime tomorrow to help get people boot-strapped on something other than ~2MHz…


Getting started with Xmega: toolchain


(first edit of iPhone version)

Another major advantage of the Xmega is the fact that it is running the exact same CPU core as the original AVR, which means the toolchain is almost exactly the same. The only major things new to these chips are the linker scripts (since flash addresses are different) and the header files that define the registers for the various peripherals.  However, it is still the case that if your toolchain doesn’t have the necessary changes and is thus aware of the particular chips you’re using, you’re sunk.  The situation is still somewhat inconsistent enough that you may end up with a few challenges between you and writing code for the Xmega.

GCC is the default compiler in the open-source world, and it has been made to work with the AVR even though it’s not strictly the right kind of processor (GCC dislikes Harvard-architecture chips as a rule).  The avr-gcc project maintains the patches necessary to compile for AVR chips, and thus is responsible for managing the patches that make the Xmega work as well.  The current problem is that the various distributors of avr-gcc are responsible for actually bringing those patches into their builds of the software, and that’s where the breakdown is right now.  avr-gcc is in the process of getting their patches folded into the GCC mainline, which will eliminate the problem, but that could still be in the works in 2011…

Xmega support was initially added back when the silicon was still a glimmer in the hobbyists’ eye, and began in earnest when early silicon was delivered to select developers (to te best of my knowledge). As such, full support of the whole range of devices took a little while to fully materialize.  The current set of patches (as of roughly early June 2010) finally contains support for all the Xmega devices Atmel has actually physically shipped, plus a few extras still in the pipeline or available only in select (export) markets.

The Win-AVR bundle, which has recently been discontinued in anticipation of upgrades to AVRStudio, was generally the most up-to-date toolchain you can get.  The “most official” place to get patches would be either the Win-AVR site, or the FreeBSD repository, seemingly dependent on the weather.  Folding into GCC mainline will resolve that problem, but Win-AVR is still the “easiest” place to find what you need.

Unfortunately, distributions like Ubuntu (and its underlying Debian) haven’t been as careful in updating.  The Ubuntu 10.04 avr-gcc packages are no different than the year-old 9.04 packages, and thus contain only small fragments of Xmega support.  A number of scripts and other documents are out there that purport to solve the problem, but the several that I’ve looked at have the same problem of having been inconsistently updated.  As a result it took me longer than it should have to piece together a toolchain for my machine.  I will be posting the .deb’s of those packages in the near future, hopefully before the Open Lab on June 27th.


Getting started with Xmega: programming software


(first edit of iPhone version)

Once you have the required programming hardware for your Xmega, you need to figure out how to actually use it to load your code.  You can’t just shove it at the programmer and hope it does something intelligent (though that would be nice), you have to talk the particular language of each programmer.  At least for the Atmel programmers there’s one main protocol (STK500[v2]), but there are so many variations even within that one protocol as to make anybody’s head spin.  Luckily, that’s why people write programming tools: to hide the (sometimes unnecessary) complexity of the programming hardware.

Right off the bat, I’ll list the default AVRStudio as the most integrated solution. However, that’s about all I can tell you, since I only use it for reference and recovery when I run into problems or complications with my usual method. Hopefully somebody who already uses it or learns how can post further details, which I can fold into this tutorial if desired.

AVRDude is the other major piece of software, and the only one I use to actually load code. It’s an open-source tool capable of using any of several dozen types of programming hardware to flash most every AVR out there (and then some). It works best on Unix-style systems (Linux, Mac) but presumably runs just fine on Windows as well, as it’s packaged with the Win-AVR bundle.

When I put projects together, I typically start with a simple shell script that compiles, preps and downloads the firmware in one go. This is somewhat peculiar to my work patterns as I don’t use an IDE of any sort (not even emacs).  Makefiles are the more common but also somewhat more involved method.  Most projects I create that use more than one source file are implemented with Make, but more on that later.

The actual command to flash the chip is something along the lines of:

avrdude -pavrisp2 -Pusb -cx192a3 -U w:flash:myprog.srec

The -p argument sets the programming tool to be used, and -P tells avrdude which port to use.  usb is the connection method for all of the currently available Xmega-compatible programmers, though my current project uses a bootloader that implements the avr910 protocol over a standard serial port, e.g. /dev/ttyUSB0.  The chip being programmed is given by the -c, and -U specifies an upload operation, in this case writing to flash the contents of myprog.srec.


Get every new post delivered to your Inbox.