An overview of modern AVR

for users of classic AVRs

These terms are unofficial. A "modern" AVR is any 8-bit AVR released in 2016 or later with 2k or more flash (the 0.5-1k flash parts have major differences in design and use a different variant of the AVR instruction set called AVRrc). At the time of writing this includes the tinyAVR (ATtiny) 0/1/2-series (the part number has a 0, 1, or 2 in the tens place and a 2, 4, 6, or 7 in the ones place, but no other digit there - the 828 is a classic AVR - with the hundreds and thousands places indicating flash size), the megaAVR 0-series (ATmega xx08 and xx09), and the AVR Dx-series and upcoming EA-series. The latter two use a new numbering scheme, AVRffDxpp, where ff is the flash size in KB, x is a capital letter indicating the feature set, and pp is the pin count. For example, the AVR128DA64 has 128k of flash and 64 pins, and its feature set differs from that of the AVR128DB64: the DB loses the PTC, but gains a block of MVIO pins, the INLVL setting, 3 on-chip opamps, and support for a crystal, and is priced about 20% higher.
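The new numbering scheme is regular enough to take apart mechanically. The following is a minimal host-side sketch, assuming names of exactly the form AVR, flash size, two capital letters, pin count; `avr_part_t` and `parse_avr_part` are hypothetical names made up for this illustration and do not come from any Microchip header or core.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type for this illustration only. */
typedef struct {
    int  flash_kb;   /* ff: flash size in KB */
    char family[3];  /* two capital letters, e.g. "DA" or "DB" */
    int  pins;       /* pp: pin count */
} avr_part_t;

/* Parse a name such as "AVR128DA64".
   Returns 1 on success, 0 if the name does not fit the scheme. */
int parse_avr_part(const char *name, avr_part_t *out)
{
    assert(name != NULL && out != NULL);

    if (strncmp(name, "AVR", 3) != 0) return 0;
    const char *p = name + 3;
    if (!isdigit((unsigned char)*p)) return 0;

    char *end;
    long flash = strtol(p, &end, 10);          /* ff */

    /* Two capital letters: series letter plus feature-set letter. */
    if (!isupper((unsigned char)end[0]) || !isupper((unsigned char)end[1]))
        return 0;
    out->family[0] = end[0];
    out->family[1] = end[1];
    out->family[2] = '\0';

    p = end + 2;
    if (!isdigit((unsigned char)*p)) return 0;
    long pins = strtol(p, &end, 10);           /* pp */
    if (*end != '\0') return 0;                /* trailing junk */

    out->flash_kb = (int)flash;
    out->pins = (int)pins;
    return 1;
}
```

Calling `parse_avr_part("AVR128DA64", &part)` fills in 128, "DA" and 64, while a tinyAVR name such as "ATtiny1614" is rejected because it follows the older scheme.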

Registers

Registers are now named like PERIPHERAL0.REGISTERNAME, being members of a struct of type PERIPHERAL_t. This is mostly shorthand (the headers also #define PERIPHERAL0_REGISTERNAME, so you can test for the existence of registers with preprocessor macros and use that form in cases where the . would cause problems), except when you have more than one of a kind of peripheral. Then it becomes incredibly powerful: each instance of, say, HardwareSerial can just keep a pointer to its instance of USART and refer to its members easily. Bits within registers are defined with PERIPHERAL_BITNAME_bp (bit position, like the bit names defined on classic parts) and PERIPHERAL_BITNAME_bm (bitmask, i.e., 1 << bit position). This generally makes for more readable code. Where a group of bits together defines something, there is a PERIPHERAL_BITFIELD_gm (group mask) defined, and an enum (not a #define - you can't use it in preprocessor macros!) containing a PERIPHERAL_BITFIELD_OPTION_gc (group code) for each option. Code is much more readable and easier to follow and understand... but I now have to keep the io header file open in a text editor to keep checking exactly how these are all named. As before, you are also always free to use numeric values and write completely incomprehensible code.
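To make the suffixes concrete, here is a small host-side sketch of the naming pattern. The peripheral "DEMO", its registers, and its bit layout are all invented for this illustration; on a real part the equivalent names come from the io header, and the struct would sit at a fixed hardware address rather than in ordinary memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical peripheral "DEMO": a stand-in that mimics the header style. */
typedef struct {
    volatile uint8_t CTRLA;
    volatile uint8_t STATUS;
} DEMO_t;

#define DEMO_ENABLE_bp 0                      /* bit position */
#define DEMO_ENABLE_bm (1 << DEMO_ENABLE_bp)  /* bit mask     */
#define DEMO_MODE_gm   0x06                   /* group mask: bits 1 and 2 */

/* Group codes are enum members, so they are invisible to #if. */
typedef enum {
    DEMO_MODE_OFF_gc  = (0x00 << 1),
    DEMO_MODE_SLOW_gc = (0x01 << 1),
    DEMO_MODE_FAST_gc = (0x02 << 1)
} DEMO_MODE_t;

/* Select a mode without disturbing unrelated bits, then enable.
   Taking a pointer means the same code serves every instance. */
void demo_configure(DEMO_t *p, DEMO_MODE_t mode)
{
    assert((mode & ~DEMO_MODE_gm) == 0);  /* group code must fit the group mask */
    p->CTRLA = (uint8_t)((p->CTRLA & ~DEMO_MODE_gm) | mode);
    p->CTRLA |= DEMO_ENABLE_bm;
}
```

The pointer argument is the part that pays off with multiple instances: a driver stores one `DEMO_t *` per object instead of a set of per-register macros.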

Pins

Pins remain in ports of 8 (0-7) pins named by letters. PA7 refers to pin 7, which is the eighth pin on port A. PORTx.IN, PORTx.DIR, and PORTx.OUT replace the PINx, DDRx, and PORTx registers respectively. There is also a PORTx.INTFLAGS, replacing the PCIFn register. Each pin also has its own PORTx.PINnCTRL register to control pullups (replacing the "input set high" of most classic AVRs, and the PUEx register of the others) or digital-input-disabling (replacing the DIDRn registers). This register also has a bit to invert all pin reads and writes, and on MVIO parts (DB-series and DD-series), one to set the input levels to either the normal Schmitt trigger based on supply voltage, or "TTL" levels, which have a fixed threshold and are ideal for interfacing with lower voltage logic.

Since these PORTx registers are no longer in the low IO space, |= and &= operations on a single bit aren't atomic anymore. They provide two solutions:

  • OUTSET, OUTCLR, OUTTGL, DIRSET, DIRCLR and DIRTGL registers, which allow multiple pins, which need not be known at compile time, to be flipped atomically, at the cost of more flash (it takes an LDI to load a constant value, and then a 2-word STS to write it).
  • VPORTx registers that ARE in the low IO space: VPORTx.IN, OUT, DIR, and INTFLAGS. Writing a 1 to a bit in VPORTx.IN toggles that pin's output (writing to PORTx.IN doesn't do that).

In fact, the only registers in the low IO space are these VPORTs (for ports A through G) and the 4 general purpose registers, named GPIORn or GPR.GPRn.
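Why the loss of atomicity matters can be modelled on a host machine. The sketch below is not hardware code: `out` stands in for PORTx.OUT, and the function names are invented for this illustration. It assumes an ISR that sets its own pin fires in the middle of the main code's read-modify-write.

```c
#include <assert.h>
#include <stdint.h>

/* PORTx.OUT |= main_mask, split into the load, OR and store the CPU
   performs, with an ISR firing between the load and the store. */
uint8_t rmw_set_interrupted(uint8_t out, uint8_t main_mask, uint8_t isr_mask)
{
    assert((main_mask & isr_mask) == 0);  /* the two contexts own different pins */
    uint8_t tmp = out;   /* load: read PORTx.OUT into a register          */
    out |= isr_mask;     /* ISR runs here and sets its pin                */
    tmp |= main_mask;    /* modify the now stale copy                     */
    out = tmp;           /* store: write back, wiping out the ISR's change */
    return out;
}

/* PORTx.OUTSET = main_mask: a single store. The hardware sets only the
   bits written as 1, so there is no stale copy to write back. */
uint8_t outset_interrupted(uint8_t out, uint8_t main_mask, uint8_t isr_mask)
{
    assert((main_mask & isr_mask) == 0);
    out |= isr_mask;     /* ISR:  PORTx.OUTSET = isr_mask  */
    out |= main_mask;    /* main: PORTx.OUTSET = main_mask */
    return out;
}
```

Starting from an all-low port with main code setting pin 0 and the ISR setting pin 7, the read-modify-write version ends with only pin 0 high, while the OUTSET version ends with both pins high.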

Some parts (Dx-series so far) can set the PINnCTRL registers for multiple pins at a time. These same parts also usually have a PORTCTRL register, which can turn on slew rate limiting for the whole port (it cannot be set for individual pins).

Sleep Modes

Both modern and classic AVRs have a power-down sleep mode, in which only pin interrupts, a low speed timer (the WDT on classic AVRs, the PIT on modern ones) and the TWI slave address match interrupt can wake the part (didn't know that about classic AVRs? Me neither). As on classic AVRs, a pin change on any pin can wake the part from all sleep modes. Unlike classic AVRs, ALL pins can also do the low level wake, and 2 pins per port are termed "fully asynchronous" and can even wake from power-down sleep mode on a rising or falling edge (as opposed to "either"); those pins can also react to events that are shorter than 1 system clock cycle, which may be good or bad depending on whether that brief spike is signal or noise. The PIT

On classic AVRs... that was pretty much your only choice. There was an Idle mode that didn't save much power at all (that's still here), and a standby that kept the oscillator running and used many times more power than power-down mode without much benefit. We still have a standby mode - but it's not that old thing.

The standby mode on modern AVRs is far, far more powerful and flexible. You can configure almost any peripheral to run in standby, and everything not enabled... won't run. If nothing that needs the main clock is enabled, the main clock won't run, saving a ton of power. If something that needs it is enabled, well, standby won't save as much power (though it's still not bad at all). On tinyAVR parts, standby with no peripheral's RUNSTBY bit set draws the same current as power-down sleep; on the larger parts, standby takes a little more power. I'm unsure why this is, and rather curious.

Interrupts

We get a more modern interrupt controller than we have had in the past. Nothing crazy, and probably nothing you want your code to need - but it is there if you do.

Interrupts do not always clear the corresponding INTFLAG! The conditions under which an INTFLAG is cleared are stated under the register description in the datasheet; be sure to read that. Failing to clear an interrupt flag will result in your code executing one instruction outside of the ISR for every iteration of the ISR, as the ISR will be called again immediately after it returns. Particularly since the other code still runs, only very slowly, when this is going wrong, it can be very hard to recognize when it happens.

As before, misspelled interrupt vector names are compile warnings, not compile errors, which is no less stupid now than it ever was, but you can't blame Microchip for this; it's endemic to C.

Pin interrupts - INT0 and PCINT hybrid

All pins can now be configured like INT0 could be, that is for low level, rising, falling, or change, instead of just the one. Inverting the pin will turn low level interrupt into high level, at least according to the block diagrams... All pin interrupts are in banks per port, like old PCINTs were. There are a few other peripherals with options to interrupt on rising, falling or any change - the analog comparator(s) and CCL outputs come to mind.

Scheduling and priority

There is now an option to designate one interrupt as priority 1, allowing it to interrupt other (priority 0) interrupts. Opposite the spoken convention of "first priority" being more important than "second priority", here priority 1 is higher priority than priority 0. The global interrupt bit is no longer turned on or off automatically when running ISRs - it stays in its current state (this doesn't change anything about how you use it though - library and core functions that need to disable interrupts to ensure atomicity still need to save and restore the state because they could be called with interrupts disabled - it's just a less common occurrence). There is now a bit in CPUINT.STATUS for whether it's currently running an interrupt of a given priority.

Traditionally, interrupts have had only one scheduling strategy: when interrupts are enabled, the lowest numbered interrupt that has its interrupt flag set is serviced. This is still the default, but there is now the option of round robin scheduling - the interrupt most recently serviced (assuming it isn't the priority 1 interrupt) has the lowest priority next time. This allows you to ensure that, say, a serial port RX interrupt is not put off for so long that it loses characters, or a timer interrupt for so long that you're losing time, during intense interrupt pressure. You would like very badly to never be in that situation, though, since it means your interrupts are not executing quickly enough, or are coming in at a hopelessly fast pace. Scheduling only comes into play when either two interrupts are triggered simultaneously or, when one interrupt finishes, there's more than one waiting in line, and it's only important in the second case - imagine two lower numbered interrupts that end up trading off control of the system, each being triggered while the other is executing, blocking execution of a serial RX or other key interrupt.
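The difference between the two strategies can be sketched as a pair of selection functions. This is a simplified host-side model, not the hardware's implementation: it ignores the priority 1 interrupt, assumes a hypothetical table of 8 vectors, and the names `pick_static` and `pick_round_robin` are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VECTORS 8  /* hypothetical small vector table */

/* Default strategy: the lowest numbered pending vector always wins.
   Bit v of `pending` is set when vector v is waiting. Returns -1 if none. */
int pick_static(uint8_t pending)
{
    for (int v = 0; v < NUM_VECTORS; v++)
        if (pending & (1u << v)) return v;
    return -1;
}

/* Round robin: the search starts just after the most recently serviced
   vector, so that vector is considered last. Pass -1 if nothing has
   been serviced yet. */
int pick_round_robin(uint8_t pending, int last_serviced)
{
    assert(last_serviced >= -1 && last_serviced < NUM_VECTORS);
    for (int i = 1; i <= NUM_VECTORS; i++) {
        int v = (last_serviced + i) % NUM_VECTORS;
        if (pending & (1u << v)) return v;
    }
    return -1;
}
```

With vectors 1 and 2 constantly retriggering and vector 5 (standing in for a serial RX) waiting, the static picker keeps choosing between 1 and 2 forever, while the round robin picker reaches vector 5 as soon as 2 has been serviced.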

Other notes

There is a nonmaskable interrupt that can only be triggered by the CRC scan system; it overrides the priority of everything else, and can be forced on by fuses. This is clearly intended for safety-critical use cases, where you don't want to risk the part having corrupted firmware which appears to work but does not perform as expected in some situation, and is irrelevant to Arduino users. There is also a "Compact Vector Table" option, also probably not relevant to Arduino users, which allows you to condense the whole vector table to one vector each for NMI, the priority 1 interrupt, and everything else, to save the flash used for the vector table. With the worst case of a tiny212 with 2k of flash and a vector table with 26 entries (52 bytes), this saves... 46 bytes - but you've gotta set that bit, which will likely be LDI, STS - 6 bytes of flash. I guess, sure, that's still about 2% of the flash, but if you've got much more than 2 sources of interrupts, you'll lose most of the flash you saved sorting out which one was triggered.
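The arithmetic behind that estimate, written out as a tiny sketch. The function name is invented for this illustration, and the 2 bytes per vector and 6 bytes of setup code are the figures quoted above rather than measured values.

```c
#include <assert.h>

/* Net flash saved by the compact vector table, in bytes.
   The compact table keeps 3 vectors: NMI, priority 1, everything else. */
int cvt_net_savings(int vectors, int bytes_per_vector, int enable_cost_bytes)
{
    assert(vectors >= 3 && bytes_per_vector > 0);
    int full_table    = vectors * bytes_per_vector;  /* 26 * 2 = 52 */
    int compact_table = 3 * bytes_per_vector;        /*  3 * 2 =  6 */
    return full_table - compact_table - enable_cost_bytes;
}
```

For the tiny212 case, `cvt_net_savings(26, 2, 6)` gives 40 bytes, which is just under 2% of 2048 bytes of flash, before counting any code needed to work out which interrupt fired.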

Clock Sources

Only the DB-series and the upcoming DD and EA-series parts support an external crystal. Everything else can only use the internal oscillator or an external clock. The Dx-series parts' internal oscillator can run at 1, 2, 3, 4, 8, 12, 16, 20, 24 and (unofficially) 28 and 32 MHz, and can be "autotuned" against a connected watch crystal, but only to within around 0.5%. The internal oscillator on the Dx-series parts has virtually no dependence on supply voltage because it runs from an internal regulator, along with most of the other logic. If operating in varying temperatures the autotuning feature may be more useful; in a room at a comfortable temperature for humans, it doesn't do much.

Other parts must choose from a 16 or 20 MHz base clock (which does have temperature dependence too). They all attempt to correct for temperature via a stored temperature cal value, and they all have a prescaler. The ones with the 16/20 MHz clock option have a calibration byte with 64 different settings (128 on 2-series and likely EA-series). The 20 MHz one can typically be cranked up to over 30 MHz (though it will usually crash before reaching its maximum - if the pattern continued, 2-series parts would reach somewhere around 35-38 MHz if they didn't crash first, which they do) or down below 14, while the 16 MHz one can run from 10-11 up to the mid-high 20's (2-series parts can often just reach 30). This is much like the classic AVR OSCCAL byte - except there seems to be a shortage of cal bits now or something. The 1 or 2 read-only high bits of the calibration value now indicate which speed oscillator is in use.

DB and DD series parts have clock failure detection (DxCore uses this to generate a crude blink code to indicate a clock failure). Everything else just stops if the clock stops.

When told to switch clock sources, the SOSC bit is set, and, assuming the new clock source is working, the part will switch over to it within the startup time specified for that type of clock source. If told to use an external clock that is not present, it continues running from the old one. Cores will typically busy-wait for the switch to occur after requesting it during startup.

Timers

All parts have a Timer/Counter type A (TCA0), and some have a TCA1. This can be used as either a 3 channel 16-bit PWM timer, or a pair of 3 channel 8-bit PWM timers with very few features. DxCore and megaTinyCore configure it in that "split mode" because analogWrite() supports... 8-bit PWM with very few features. These timers can be clocked from events, and in 16-bit mode they can count in either direction, run in single or dual slope mode, and support buffering of the period and duty cycle registers, so those are updated only at the end of a cycle, preventing single-cycle glitches. On tinyAVR parts, each of the 6 channels can be pointed at one of two pins using the PORTMUX, and they can be targeted individually.

On megaAVR and Dx-series parts, TCA0 can be pointed at pins 0 - 5 of any one port. TCA1 is far less flexible - there's a PB0-5 and a PC4-6 option on 48 and 64 pin parts, with the 64-pin parts adding a PG0-5 and a PE4-6 option.

All parts have at least one Timer/Counter Type B (TCB), often 2 or more. These are 16-bit timers that excel as utility timers. Millis is more accurate and more efficient running from these. They can generate event triggered pulses of a specified time, and do input capture in a bunch of ways. They are also the timers to use for Servo and Tone, for this reason. They lack prescaling options though - you get system clock, system clock divided by 2, or the clock prescaled by a TCA. They can also output 8-bit PWM on a single pin, if you must. On Dx and 2-series parts, 2 of these can be combined for 32-bit input capture.
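What the limited prescaling means in practice is easy to work out for a 16-bit counter. The helpers below are a host-side sketch with invented names; `divider` is the total division applied to the system clock (1, 2, or whatever prescaler the TCA is set to), and the clock speeds in the usage note are example values, not recommendations.

```c
#include <assert.h>
#include <stdint.h>

/* Timer ticks per millisecond for a given system clock and divider. */
uint32_t tcb_ticks_per_ms(uint32_t f_cpu_hz, uint32_t divider)
{
    assert(f_cpu_hz > 0 && divider > 0);
    return f_cpu_hz / divider / 1000u;
}

/* Longest interval a 16-bit counter can time before it wraps,
   in whole microseconds (rounded down). */
uint32_t tcb_max_period_us(uint32_t f_cpu_hz, uint32_t divider)
{
    assert(f_cpu_hz > 0 && divider > 0);
    uint64_t us = (uint64_t)65536u * divider * 1000000u / f_cpu_hz;
    return (uint32_t)us;
}
```

At 20 MHz, dividing by 2 gives 10000 ticks per millisecond and a maximum interval of about 6.5 ms; borrowing a divide-by-64 TCA prescaler stretches that to roughly 210 ms at the cost of resolution.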

tinyAVR 1-series and Dx-series parts have a single Type D timer, TCD0. It can be clocked from the EXT_CLK input, from the system clock, or from the system clock before the prescaler. On the Dx-series, it can also be clocked at 2, 3 or (unofficially) 4 times the unprescaled internal oscillator or EXT_CLK via an on-chip PLL. It is rated for 48 MHz, but functions far higher at room temperature. In any event, TCD0 is asynchronous because of this clocking feature. This is a powerful feature if you need bizarro PWM for motor control or your homebrew DC-DC converter project (what, you don't build your own buck converters instead of using the dirt cheap just-add-inductor chips?). For all the rest of us, it's a right pain in the backside. You want to read the current count? Just write the software capture bit in the command register, and poll the status register to see when the current count has been synchronized to the main clock domain. Almost all settings are "enable protected", that is, they can't be changed without briefly disabling the timer (and don't forget to check the enable ready bit in status when you want to turn it back on). Because it's such a pain in the arse to use, it's the default timer for millis on megaTinyCore 1-series parts: the other timers are easy to use and likely to be in demand, while people cross to the other side of the road when they see TCD0 walking in their direction.

Regardless - it's a 12-bit timer with a weird two-part prescaler and 2 output channels which can each be directed to any of 4 pins (that is, you can have it generate 2 different duty cycles of the same frequency, and output each of them on 1 or more out of that set of 4 pins). Smaller tinies only have 2 of the pins, while on the Dx-series, it is supposed to support moving this around (between 4 different sets of pins), but that doesn't work in currently available parts due to a silicon bug. You can use its advanced event generation capabilities ("PROGEVENT") with the event system and CCL peripheral to squeeze another 8-bit PWM channel out of it, albeit with far less choice of prescaling factors, on any CCL output pin. It can also react to events in many different ways (all optimized for specific power conversion and motor control applications). Oh, and you can have it skip a clock some cycles - resolution is 1/16th of a clock - if you need to get PWM at an exact frequency.

Full-speed timer roles

While, as on classic AVRs, there is some overlap between what the timers are good for here, there's significantly less. The TCD, if you have it, is ill-suited to anything other than PWM (though it can be used for timing, it's clearly not meant for it). TCA is the workhorse timer, meant mainly for PWM, though it can be used for other timing and counting tasks and has some niche roles there (notably, not only is it the only one that can count on event in 0/1-series parts, it is the only one that can generate compare output based on event counts higher than 256, unless you're piping the input in as an external clock on the dedicated pin for TCD or something weird like that). The type B timers are used for all things timing that are not PWM.

RTC/PIT (the new WDT-interrupt)

Additionally, all parts have a 2-function RTC/PIT. The RTC lets you use a 32k internal oscillator or watch crystal to keep time in sleep modes - it has both compare and period registers, with interrupts and events on each, and a 15-bit prescaler, so the ticks can range from about 30us to 1 second, with 16-bit resolution. It also has a selectable interrupt driven by one bit of the prescaler, and bits 6 through 14 can drive event channels. It fills a role in sleep mode like the old WDT did - only, equipped with a crystal (or even without), it's more accurate. The PIT runs even in power-down sleep mode, giving you an analog of the classic WDT-on-longest-period configuration.
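The 30us to 1 second range follows directly from the clock and the prescaler width. The sketch below assumes the clock runs at exactly 32768 Hz (the nominal watch-crystal frequency) and that the prescaler divides by powers of two from 1 to 32768; the function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RTC_CLK_HZ 32768u  /* assumed nominal 32.768 kHz clock */

/* Tick period in whole microseconds (rounded down) for a prescaler
   of 2^log2_div, where log2_div runs from 0 to 15. */
uint32_t rtc_tick_us(unsigned log2_div)
{
    assert(log2_div <= 15);
    return (uint32_t)(((uint64_t)1000000u << log2_div) / RTC_CLK_HZ);
}

/* Seconds until the 16-bit counter wraps: 65536 ticks at the chosen
   tick rate, which simplifies to 2 * 2^log2_div. */
uint32_t rtc_overflow_seconds(unsigned log2_div)
{
    assert(log2_div <= 15);
    return 2u << log2_div;
}
```

With no prescaling the tick is about 30 microseconds and the counter wraps every 2 seconds; at the largest prescaler the tick is 1 second and the counter runs for 65536 seconds, a little over 18 hours, before wrapping.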

Watchdog Timer

These parts also have a watchdog timer, but it is actually what it says on the package - it doesn't fire interrupts, all it will ever do is reset the chip if you don't reset it for too long. It can also be set to reset the chip if it is reset too soon, so that if the code gets stuck in a tight loop where it keeps executing WDR, the watchdog can get you out of that as well. You can still use it to trigger a software reset as before (slightly faster, if you set it for windowed mode and then call watchdog reset in a loop), but these parts have a software reset function so you don't need that (though it is useful if you want to signal a different type of reset to the bootloader or startup code; for example, a watchdog reset will never run optiboot - optiboot exits by doing a watchdog reset as always, and always jumps to the app if it sees one). Best of all, the watchdog timer isn't forced on after a watchdog reset, sparing you from that headache.

Instruction Set (AVRe/AVRe+ vs AVRxt)

The classic AVR devices all use the venerable AVRe (ATtiny) or AVRe+ (ATmega) instruction set (AVRe+ differs from AVRe in that it has hardware multiplication and supports devices with more than 64k of flash; most classic ATmega devices are AVRe+, while classic tinyAVR is all AVRe). Modern AVR devices (with the exception of ones with minuscule flash and memory, such as the ATtiny10, which use the reduced core AVRrc) all use the latest iteration of the AVR instruction set, AVRxt. AVRxt has much in common with AVRxm (used in XMega parts) in terms of instruction timing - and in the few places where they differ, AVRxt is faster (SBIC, as well as LDD, and LD with predecrement, are all 1 clock slower on AVRxm vs AVRxt or AVRe), however AVRxt doesn't have the single-instruction-two-clock read-and-write instructions for memory access LAT, LAC, LAS, and XCH. The difference between subspecies of the AVR instruction set is unimportant for 99.9% of users - but if you happen to be working with hand-tuned assembly (or are using a library that does so, and are wondering why the timing is messed up), the changes are:

  • Like AVRe+ and unlike AVRe (used in older tinyAVR), these do have the hardware multiplication.
  • PUSH is 1 cycle vs 2 on classic AVR (POP is still 2)
  • CBI and SBI are 1 cycle vs 2 on classic AVR
  • LDS is 3 cycles vs 2 on classic AVR 😞. LD and LDD are still two cycle instructions.
  • RCALL and ICALL are 2 cycles vs 3 on classic AVR
  • CALL is 3 cycles instead of 4 on classic AVR
  • ST and STD are 1 cycle vs 2 on classic AVR (STS is still 2 - as any two word instruction must be)

Note that the improvement to PUSH can make interrupts respond significantly faster (since they often have to push the contents of registers onto the stack at the beginning of the ISR), though the corresponding POP's at the end aren't any faster. The change with ST impacted tinyNeoPixel. Prior to my realizing this, the library worked on SK6812 LEDs (which happened to be what I tested with) at 16/20 MHz, but not real WS2812's. However, once I discovered this, I was able to leverage it to use a single tinyNeoPixel library instead of a different one for each port like was needed with ATTinyCore (for 8 MHz, they need to use the single cycle OUT on classic AVRs to meet timing requirements, the two cycle ST was just too slow; hence the port had to be known at compile time, or there must be one copy of the routine for each port, an extravagance that the ATtiny parts cannot afford. But with single cycle ST, that issue vanished).
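To see what these changes add up to for a given instruction sequence, the cycle counts from the list above can be put in a pair of tables. This is a host-side sketch with invented names; it covers only the instructions listed, takes the classic figures as quoted there, and is no substitute for the instruction set manual when timing really matters.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    I_PUSH, I_POP, I_SBI_CBI, I_LDS, I_LD_LDD,
    I_RCALL_ICALL, I_CALL, I_ST_STD, I_STS, I_COUNT
} insn_t;

/* Cycles per instruction, in the order of the enum above. */
const unsigned char cycles_classic[I_COUNT] = { 2, 2, 2, 2, 2, 3, 4, 2, 2 };
const unsigned char cycles_avrxt[I_COUNT]   = { 1, 2, 1, 3, 2, 2, 3, 1, 2 };

/* Sum the cost of a sequence using the given table. */
static unsigned total_cycles(const insn_t *seq, size_t n,
                             const unsigned char *table)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(seq[i] < I_COUNT);
        total += table[seq[i]];
    }
    return total;
}

unsigned cycles_on_classic(const insn_t *seq, size_t n)
{
    return total_cycles(seq, n, cycles_classic);
}

unsigned cycles_on_avrxt(const insn_t *seq, size_t n)
{
    return total_cycles(seq, n, cycles_avrxt);
}
```

For an ISR that saves and restores three registers (three PUSH, three POP), this gives 12 cycles of overhead on a classic AVR and 9 on AVRxt, with all of the saving at the start of the ISR where response time is decided.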