Skip to content

Recent pgmspace.h broke backward compability for pgm_read_xxx() macros #4572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bperrybap opened this issue Mar 26, 2018 · 13 comments · Fixed by #4619
Closed

Recent pgmspace.h broke backward compability for pgm_read_xxx() macros #4572

bperrybap opened this issue Mar 26, 2018 · 13 comments · Fixed by #4619

Comments

@bperrybap
Copy link

There were some updates to pgmspace.h in mid 2017 that changed the way pgm_read_byte() and pgm_read_word() worked and were implemented.
When the dust settled, the final implementation breaks some use cases as it is no longer fully backward compatible with the AVR version.
The AVR version did not require the address argument to be any particular type and neither did the esp version until these recent updates.
This issue can be traced back to the commit on April 17, 2017
0b67266#diff-2b6ff2e1583a8a247dc1070d846b5160
That is the first version of code where the addr argument must be a specific type.
Then after that update, future updates built on that and ended up with the macros pgm_read_byte() and pgm_read_word() simply calling functions which expected an argument type.
This can be seen in the june 5 commit:
c52b0da#diff-2b6ff2e1583a8a247dc1070d846b5160
with these two AVR compatibility macros:

// Make sure, that libraries checking existence of this macro are not failing
#define pgm_read_byte(addr) pgm_read_byte_inlined(addr)
#define pgm_read_word(addr) pgm_read_word_inlined(addr)

While those macros solved a big concern in the discussion of having the needed macros to test for PROGMEM support at compile time, they break AVR compatibility by not casting the addr argument.
This means that any code that is not using the argument type expected by pgm_read_byte_inlined() or pgm_read_word_inlined() (which is a void *) will fail from a type mismatch.
There is quite a bit of code out there that was not using void * for this argument and some of that code was previously working on esp until these updates were done.
A simple solution is add a cast.

// Make sure, that libraries checking existence of this macro are not failing
// and support any type for addr argument for backward compatibility
#define pgm_read_byte(addr) pgm_read_byte_inlined((const void*)addr)
#define pgm_read_word(addr) pgm_read_word_inlined((const void*)addr)
@bperrybap
Copy link
Author

Actually the cast needs to be like this:

// Make sure, that libraries checking existence of this macro are not failing
// and support any type for addr argument for backward compatibility
#define pgm_read_byte(addr) pgm_read_byte_inlined((const void*)(addr))
#define pgm_read_word(addr) pgm_read_word_inlined((const void*)(addr))

@igrr
Copy link
Member

igrr commented Apr 7, 2018

@bperrybap do you have a code snippet which fails to compile with the current implementation? There's a PR now open which adds the casts, and I wanted to make sure we have a test for this.

@bperrybap
Copy link
Author

Not sure why a test is needed as the issue and the code to fix it is seems pretty obvious.

Here is some sample test code that should compile without issues or warnings.
It fails to compile without the casting patch.

#define DATA_ADDR 100

int data_addr = DATA_ADDR;

uint8_t *data8_ptr = (uint8_t*) DATA_ADDR;
uint16_t *data16_ptr = (uint16_t*) DATA_ADDR;

uint8_t data8[4];
uint8_t data16[4];

void setup(void)
{
	data8[0] = pgm_read_byte(DATA_ADDR); // #define
	data8[0] = pgm_read_byte(data_addr); // int
	data8[0] = pgm_read_byte(data8_ptr); // uint8_t *
	data8[1] = pgm_read_byte(&data8_ptr[1]); // uint8_t *
	data8[2] = pgm_read_byte(data8_ptr + 2); // uint8_t * pointer math

	data16[0] = pgm_read_word(DATA_ADDR); // #define
	data16[0] = pgm_read_word(data_addr); // int
	data16[0] = pgm_read_word(data16_ptr); // uint16_t *
	data16[1] = pgm_read_word(&data16_ptr[1]); // uint16_t *
	data16[2] = pgm_read_word(data16_ptr + 2); // uint16_t * pointer math
}
void loop(void){}

@igrr
Copy link
Member

igrr commented Apr 8, 2018

The purpose of having a test is to prevent future regressions, should someone need to modify pgm_read_X macros/functions again.

About your test cases. These compile with the current code and do not produce errors or warnings:

  data8[0] = pgm_read_byte(data8_ptr); // uint8_t *
  data8[1] = pgm_read_byte(&data8_ptr[1]); // uint8_t *
  data8[2] = pgm_read_byte(data8_ptr + 2); // uint8_t * pointer math

  data16[0] = pgm_read_word(data16_ptr); // uint16_t *
  data16[1] = pgm_read_word(&data16_ptr[1]); // uint16_t *
  data16[2] = pgm_read_word(data16_ptr + 2); // uint16_t * pointer math

These do not compile, and i don't think they should. The reason that such code compiles for AVR is not strong enough justification to allow silent casts from integer to pointer type.

  data8[0] = pgm_read_byte(DATA_ADDR); // #define
  data8[0] = pgm_read_byte(data_addr); // int
  data16[0] = pgm_read_word(DATA_ADDR); // #define
  data16[0] = pgm_read_word(data_addr); // int

For example, the following is a clear usage error, and should not compile:

const uint8_t pgm_array[] PROGMEM = {1, 2, 3, 4};
void setup() {
  uint8_t data = pgm_read_byte(pgm_array[1]); // user forgot the address-of operator
}

but with the proposed change, this would compile and crash.

I don't think that applying pgm_read_byte/pgm_read_word to integer constants, as in your example, is a valid use case either. Where would such constant come from, other than from a declaration of variable with PROGMEM attribute? And if we have a variable with PROGMEM attribute, then (a pointer to) it can be silently converted to const void*.

@bperrybap
Copy link
Author

The example code was very simplistic intended to test against the old esp8266 core, vs the latest, vs the PR. While it does demonstrate the real-world argument type issue, it should not be taken as an example of the actual code that compiled and worked on AVR and used to work on esp8266 but fails now fails to compile on the latest esp8266 core.
I added some of the pointer math examples to catch a missing set of parens around the addr parameter in the pgm_read_xxx() macros as shown in my initial post. Without the parens you will get warnings.

The actual code that I have seen failing is part of larger projects that are doing much more complex operations but end up creating/using integer types that get passed to pgm_read_xxx() functions.
It works on AVR and it used to work on esp8266 until this latest core.

The main issue that that integer types can be passed in on AVR and older esp8266 cores but fail to compile on the latest esp8266 core.
The macros in AVR do explicit casts:

#define pgm_read_byte_near(address_short) __LPM((uint16_t)(address_short))
#define pgm_read_byte(address_short)    pgm_read_byte_near(address_short)

You can see the forced cast and there is no type checking.
And because of this cast you really can't force users to use pointers without breaking compatibility with existing code..
That simply wasn't part of the original AVR progmem API and because of this not all existing code used pointers.

In terms of where would these constants come from other than progmem variables?....
In some cases users are fetching addresses from progmem and moving them into integer types and then using the integer type to fetch other data.
Definitely not good a practice but that is what they did and it used to work.

But here is one that I think users might expect to work: (and it did in the past)
In some real-world cases they may nest the pgm_read_xxx() functions.
For example they may use pgm_read_byte(pgm_read_dword(address))
To get data from a table that is referenced by another table of addresses stored in progmem.
This will fail with the new core but worked on the older core and works again with the casts.
Here is a line of code from real life example of something from a graphic library that used to work on the older esp8266 cores but doesn't work with the latest core: (but works again with the casts)

width = pgm_read_byte( pgm_read_dword( &(fontdata[textfont].widthtbl ) ) + uniCode-32 );

Yes they could fix it with a cast, but they didn't have to do that in the past.

So the real issue is that integer types used to work but now they don't.
To force using pointers in the esp8266 version of the AVR progmem read_pgm_xxx() functions breaks backward compatibility with some existing code since neither the AVR API nor the older esp8266 code required arguments to be pointers.


Another thing related to pgmspace compatibility (but totally unrelated to this issue) is that pgmspace.h should really be down in a directory called avr. (other cores put it there for compatibility)
This is because on REALLY OLD IDEs pgmspace.h was not included so some sketches/libries included it.
But they would do so as:

#include <avr/pgmspace.h>

This breaks on the esp8266 core.
I should create another issue for this as it is unrelated to this pgmspace compatibility issue.

@bperrybap
Copy link
Author

I do understand the dilemma. It is a difficult question of what is more important.
Full backward compatibility with existing code or argument type checking?
One or the other has to take priority.
There is no easy answer since it is not possible to do both.
The esp8266 core can move towards the type checking direction, but it means that some existing code (that used to work on the esp8266 core) will no longer work on the esp8266 core until it is modified.

@igrr
Copy link
Member

igrr commented Apr 8, 2018

My opinion is that we should maintain compatibility in all reasonable use cases. Automatic casting of integer types to pointers is not reasonable, in my opinion.
That's just my opinion; i welcome other maintainers to comment.
/cc @devyte @d-a-v @earlephilhower.

The issue you mention (nested pgm_read calls) seems to be already fixed in the library, by the way: Bodmer/TFT_eSPI#15.

@d-a-v
Copy link
Collaborator

d-a-v commented Apr 8, 2018

While I made the PR for this in the sake of backward compatibility, @igrr came with very valid arguments.
Plus this is not clever of me, after having in the past forced people to update their code because of lwip2 coming, to propose here a way to keep bad habits.
Priority should indeed go to argument type checking.

@devyte
Copy link
Collaborator

devyte commented Apr 8, 2018

I also agree with @igrr on this. It's a similar case as we already had with the min/max macros. We switched to the std functions, and issues rained due to other code not compiling anymore. On inspection, all those cases were due to bad habits, and they had to be fixed, which was a Good Thing.
Here, silent casts should never be allowed, it is a bad practice that allows runtime bugs which turn out to be very hard to find. If other code needs to be fixed to remove these bad habits, so be it.

@bperrybap
Copy link
Author

Yes the specific example code I posted was from one of Bodmer's libraries. I talked to him about the issue a few weeks ago. But this issue was actually in his older version of the library, TFT_ILI9341_ESP and it came up because a few folks were trying to run one of his Arduino sketches that used the older library, that was not ported to his new version of the graphics library (he has just recently done that as well).
That said, this is isn't the only existing code that breaks with the newer pgmspace.

When I was younger, I did have a much more rigid approach to trying to force people to do the right thing, I've mellowed out over the decades, and while I do agree that supporting bad/sloppy programming habits is definitely not ideal, I do have much more sympathy to supporting them when it comes to trying maintain an existing programming interface behavior in order to support previously existing code.
Especially when it comes to Arduino code, which tends to have much less technical users who are often not capable of fixing even very simple issues.

While I'm all for good type checking to try to catch unintentional coding errors, this particular case does seem a bit special and a bit less clear as to which way to go given it is emulating a proprietary API defined by an alternate tool set.
My assumption/belief has always been that progmem and associated functions (as defined by pgmspace.h header file) is a proprietary AVR thing (not used on other processors) that was invented by the AVR gcc developers given the limitations in the AVR processor combined with C not supporting multiple address spaces.
As such any other Arduino processor core that chooses to support AVR progmem in their core is really simply trying to emulate the AVR progmem types, functions, and functionality provided by avr libC pgmspace.h macros/functions in order to support porting of existing AVR code that used them.
The question becomes just how far to go when trying to emulate this AVR specific functionality.

As I mentioned earlier, one of the areas of compatibility already currently not supported in the esp8266 core is the location of the pgmspace.h header. It should be in a subdirectory called avr for maximum compatibility with existing older AVR Arduino code that included <avr/pgmspace.h>
(which is where other Arduino cores place it)

But some of the functional/compilation issues around progmem stuff is in the "official" AVR pgmspace.h code itself.
The AVR gcc supplied pgmspace header has always been a bit "sloppy".
It has had issues with types and declarations and has had them for close to 10 years. Even they have had to do some backtracking through the years and even break lots of existing AVR code since the way they were doing things was not compatible with newer versions of avr-gcc and the more rigid type checking of C++.
However, they did attempt to provide a way to continue to support older existing code by providing a special define that can be turned on to enable the older types if you REALLY REALLY want/need it to work the old way.
Not "out of the box" support for the the previous behavior, but a simple way to re-enable the old behavior should it be needed.

Perhaps having some sort of conditional in the esp8266 pgmspace.h that could be enabled is a reasonable compromise here?
Maybe it could use the __PROG_TYPES_COMPAT__ for the backward compatibility, just like the AVR pgmspace.h uses, to enable better AVR pgmspace compatibility.

I wouldn't be upset if after the discussion about this issue, the decision is made to not add the casts, and change the status to "won't fix" - knowing the implications of that decision.

I'm just trying to encourage the discussion to make everyone aware that the recent changes do break some existing code, including code that used to work on previous versions of the esp8266 core.

@Makuna
Copy link
Collaborator

Makuna commented Apr 8, 2018

I think Ivan has stated a very compelling case with "Automatic casting of integer types to pointers is not reasonable". Novice or advanced programmers should not be doing this unless they know exactly what they are doing.

@bperrybap
Copy link
Author

bperrybap commented Apr 8, 2018

I think Ivan has stated a very compelling case with "Automatic casting of integer types to pointers is not reasonable". Novice or advanced programmers should not be doing this unless they know exactly what they are doing.

I don't agree that automatic casting is necessarily unreasonable - in this particular case I think it is not so clear.

IMO, the progmem stuff has always been very ugly and messy, but the entire point of having pgmspace.h in the esp8266 core is to emulate the functionality/behavior of the AVR proprietary progmem stuff, and the latest esp8266 core doesn't fully do that for some real-world use cases, while the older cores did.
And that is what I'm pointing out.

The primary issue is that it isn't possible to have type checking and full compatibility with the sloppy AVR pgmspace implementation.

In some cases advanced programmers knew exactly what they were doing and the new esp8266 pgmspace code breaks their code that used to work on the esp8266.

Also consider that pgm_read_byte() and pgm_read_word() are not functions.
There was a desire to make them functions instead of macros, but this is example of one of many odd and ugly things that must be done for compatibility reasons - in this case they are macros since some existing code uses them as conditionals to detect their existence.

It could be decided that type checking is more important than full AVR pgmspace progmem emulation, to ensure better and more reliable code moving forward, and that is ok with me, but then it brings up what I mentioned earlier and that is the possibility of creating a backward compatibility define just like what the AVR did guys when they broke their original pgmspace behavior - perhaps even a define with the same define name the AVR guys used.

@igrr
Copy link
Member

igrr commented Apr 9, 2018

#4619 now adds optional casts, gated by preprocessor macro. I'm okay with that solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants