diff --git a/ContributionGuidelines.md b/ContributionGuidelines.md index 19d3bbea..d2175c95 100644 --- a/ContributionGuidelines.md +++ b/ContributionGuidelines.md @@ -234,18 +234,18 @@ Sometimes codesize and performance are in opposition. How do you know which is m ### Constants in PROGMEM and different cores Code is constantly ported between DxCore and megaTinyCore. You need to be aware of one particularly important difference between parts with 32k of flash or less and those with more. With 32k of flash or less, all flash is mapped, and you do not have to (and should not) use the PROGMEM keyword for a table of constants which you want to stay in flash - the compiler will automatically know to leave the table in flash, and it can be accessed normally. But when the flash is larger, the compiler cannot do that, because only one section of the flash (defaulting to the highest one) is mapped at any given time, and that can be changed (though it rarely is). There are three ways to deal with this: - * Use PROGMEM keyword and pgm_read_x_near() everywhere. This initially looks to be of equal performance - lpm is 3 clocks, ld is 2, but you have to add one because it's talking to NVM. But in fact, it is slower in any case more complicated than that; for example, consider looping over a 10 element array, reading each value in sequence. If it's located in memory mapped flash, it will likely be accessed with ld X+ or ld Z+, taking 3 clocks per byte. If it's located in PROGMEM and accessed using pgm_read_byte_near(), although there is an lpm Z+ instruction, pgm_read_byte_near() can't use it - pgm_read_byte_near() is just a macro that gets transformed into a tidbit of asm that executes lpm Z. So it would turn into `ld rxx, Z; adiw r30, 1 ld rxy, Z; adiw r30, 1;` For n bytes read like that, it will take 3n + 2(n-1) or 5n-2 clocks, instead of 3n clocks (not counting the time to set the pointer up, which is the same for both methods). Or consider a one-off access to a byte; if it's in mapped flash, this can be done in 4 clock cycles and two words - the read gets rendered as lds, which runs in 3 clocks + 1 because nvm. If it's in PROGMEM, and is being accessed with pgm_read_byte_near() the address must be loaded into the Z register (2 words 2 clocks - or up to 6 words and 8 clocks if Z is already in use, counting cleanup), and then lpm executed (3 clocks, 1 word). For a total of 5-11 clocks and 3-8 words vs 4 clocks 2 words. See why the mapped flash is a big deal now? - * **Not permitted, will not be merged in this form** - Declare the array as PROGMEM_MAPPED. This places it in the mapped section of flash **provided the user does not change what that is**. If they do, they'll get garbage from elsewhere in the flash when they try to read their table, hence this isn't acceptable in the core. Application code is welcome to do this, but only because in that case the user is in control of their code, and if they are using PROGMEM_MAPPED, and changing FLMAP, they deserve what they get, and have violated an explicitly documented prohibition (see Ref_PROGMEM.md). Currently tinyNeoPixel does this on DxCore, but that will not be the case after a planned enhancement for DxCore is implemented. This enhancement will do the following: - * Add a menu to tools that will default to "No", allowing to specify whether changing FLMAP is permitted. - * If "No", we will lock the FLMAP bits before any user code runs, and #define PROGMEM_MAPPED to direct code to that section of flash, and #define FLMAP_LOCKED. - * If "Yes", we will not lock the FLMAP bits. PROGMEM_MAPPED will not be defined. - * There may one or more additional options - "No, but set FLMAP to section __" - the only time I see this being useful is on the 128k parts when address-constrained spm-from-app is enabled, and you want the memory mapped section to NOT overlap with where you can write to from the app (usually you want that overlap). This will also define PROGMEM_MAPPED appropriately and FLMAP_LOCKED. - * A solution that will be sound now and acceptable in the core or libraries without compatibility issues thus is: - * If __AVR_ARCH__ == 103, then the part has fully memory mapped flash. Just declare the variable const and access normally, it won't be copied to ram like it would be where __AVR_ARCH__ != 103. - * If FLMAP_LOCKED is defined, this change is implemented, and you are on a part has only partially mapped flash, but the user has not asked for FLMAP to be mutable, so it will be locked while app code is running. PROGMEM_MAPPED will be defined, and you can declare with PROGMEM_MAPPED and access them without the pgm_... macros (remember that this will result in flash with a big hole in the middle, and may cause programs to return inaccurate flash sizes during compilation and upload). - * If PROGMEM_MAPPED is defined but FLMAP_LOCKED is not, this change has not been implemented yet, so FLMAP is mutable. You cannot safely use PROGMEM mapped in core or library code - that code could be used by people who change FLMAP, and it would then break horribly; you must declare the array PROGMEM and access with pgm_... macros. Note that this does not apply to your application, because there you control the application, and if you use PROGMEM_MAPPED and change FLMAP, you will get exactly what you deserve - severe breakage, as you have violated the rules of the core. - * If neither PROGMEM_MAPPED not FLMAP_LOCKED are defined, then either it's a classic AVR, or it's a modern AVR where __AVR_ARCH__ != 103, and the user has told us that they intend to change FLMAP. Either way, you need to do the same thing - PROGMEM and pgm_... macros. - * Any code going into either core (unless it is specific to tinyAVR only and hence we know it won't be ported, and I agree with you about that), - as core or library - must either use only PROGMEM and pgm_... macros, or must handle all cases appropriately using #ifs to conditionally compile the appropriate version, as shown below. +* Use PROGMEM keyword and pgm_read_x_near() everywhere. This initially looks to be of equal performance - lpm is 3 clocks, ld is 2, but you have to add one because it's talking to NVM. But in fact, it is slower in any case more complicated than that; for example, consider looping over a 10 element array, reading each value in sequence. If it's located in memory mapped flash, it will likely be accessed with ld X+ or ld Z+, taking 3 clocks per byte. If it's located in PROGMEM and accessed using pgm_read_byte_near(), although there is an lpm Z+ instruction, pgm_read_byte_near() can't use it - pgm_read_byte_near() is just a macro that gets transformed into a tidbit of asm that executes lpm Z. So it would turn into `ld rxx, Z; adiw r30, 1 ld rxy, Z; adiw r30, 1;` For n bytes read like that, it will take 3n + 2(n-1) or 5n-2 clocks, instead of 3n clocks (not counting the time to set the pointer up, which is the same for both methods). Or consider a one-off access to a byte; if it's in mapped flash, this can be done in 4 clock cycles and two words - the read gets rendered as lds, which runs in 3 clocks + 1 because nvm. If it's in PROGMEM, and is being accessed with pgm_read_byte_near() the address must be loaded into the Z register (2 words 2 clocks - or up to 6 words and 8 clocks if Z is already in use, counting cleanup), and then lpm executed (3 clocks, 1 word). For a total of 5-11 clocks and 3-8 words vs 4 clocks 2 words. See why the mapped flash is a big deal now? +* **Not permitted, will not be merged in this form** - Declare the array as PROGMEM_MAPPED. This places it in the mapped section of flash **provided the user does not change what that is**. If they do, they'll get garbage from elsewhere in the flash when they try to read their table, hence this isn't acceptable in the core. Application code is welcome to do this, but only because in that case the user is in control of their code, and if they are using PROGMEM_MAPPED, and changing FLMAP, they deserve what they get, and have violated an explicitly documented prohibition (see Ref_PROGMEM.md). Currently tinyNeoPixel does this on DxCore, but that will not be the case after a planned enhancement for DxCore is implemented. This enhancement will do the following: + * Add a menu to tools that will default to "No", allowing to specify whether changing FLMAP is permitted. + * If "No", we will lock the FLMAP bits before any user code runs, and #define PROGMEM_MAPPED to direct code to that section of flash, and #define FLMAP_LOCKED. + * If "Yes", we will not lock the FLMAP bits. PROGMEM_MAPPED will not be defined. + * There may one or more additional options - "No, but set FLMAP to section `__`" - the only time I see this being useful is on the 128k parts when address-constrained spm-from-app is enabled, and you want the memory mapped section to NOT overlap with where you can write to from the app (usually you want that overlap). This will also define PROGMEM_MAPPED appropriately and FLMAP_LOCKED. +* A solution that will be sound now and acceptable in the core or libraries without compatibility issues thus is: + * If `__AVR_ARCH__` == 103, then the part has fully memory mapped flash. Just declare the variable const and access normally, it won't be copied to ram like it would be where `__AVR_ARCH__` != 103. + * If FLMAP_LOCKED is defined, this change is implemented, and you are on a part has only partially mapped flash, but the user has not asked for FLMAP to be mutable, so it will be locked while app code is running. PROGMEM_MAPPED will be defined, and you can declare with PROGMEM_MAPPED and access them without the pgm_... macros (remember that this will result in flash with a big hole in the middle, and may cause programs to return inaccurate flash sizes during compilation and upload). + * If PROGMEM_MAPPED is defined but FLMAP_LOCKED is not, this change has not been implemented yet, so FLMAP is mutable. You cannot safely use PROGMEM mapped in core or library code - that code could be used by people who change FLMAP, and it would then break horribly; you must declare the array PROGMEM and access with pgm_... macros. Note that this does not apply to your application, because there you control the application, and if you use PROGMEM_MAPPED and change FLMAP, you will get exactly what you deserve - severe breakage. + * If neither PROGMEM_MAPPED not FLMAP_LOCKED are defined, then either it's a classic AVR, or it's a modern AVR where __AVR_ARCH__ != 103, and the user has told us that they intend to change FLMAP. Either way, you need to do the same thing - PROGMEM and pgm_... macros. +* Any code going into either core (unless it is specific to tinyAVR only and hence we know it won't be ported, and I agree with you about that), as core or library - must either use only PROGMEM and pgm_... macros, or must handle all cases appropriately using #ifs to conditionally compile the appropriate version, as shown below. ```c #if __AVR_ARCH__ == 103