-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove use of utf8_encode (deprecated in PHP 8.2) #598
base: MOODLE_310_STABLE
Are you sure you want to change the base?
Remove use of utf8_encode (deprecated in PHP 8.2) #598
Conversation
I'm doing some more testing on this locally so it might be best to hold off doing anything with this until I finish that next week (will comment again). Haven't found anything wrong, just realised that I don't think my previous testing actually ran the code in question :) |
Hi @sammarshallou, HTH, |
Wow, this change actually does break it. Ooops. I made a file called こんにちは.jpg Before change it sends me the file with this header: Content-Disposition: inline;filename="ã��ã��ã�«ã�¡ã�¯.jpg" The URL for this file included this parameter: ?response-content-disposition=inline%3Bfilename%3D%22%C3%A3%C2%81%C2%93%C3%A3%C2%82%C2%93%C3%A3%C2%81%C2%AB%C3%A3%C2%81%C2%A1%C3%A3%C2%81%C2%AF.jpg%22 As you can see this is not really valid but it appears to work (Firefox at least shows the right name in header). After change it is like this: ?response-content-disposition=inline%3Bfilename%3D%22%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF.jpg%22 And results in error:
So, it looks like the reason the current code works is that by calling utf8_encode it is treating UTF-8 as if it is a longer ISO-8859-1 string (because multiple bytes per character from original UTF-8, now counted as single ISO-8859-1 characters), and then encoding those single characters as UTF-8. Later on, something else converts those characters into URL-encoded format. Then browsers receive the header with ISO-8859-1 characters that are actually UTF-8 and decides to play nice and turn it into UTF-8, because there isn't a filename*. The easiest fix is probably to use a non-deprecated function to do the same hideous encoding (UTF-8 broken into single bytes and treated as ISO-8859-1 then turned to UTF-8) then it will not change behaviour. Might be nice to do filename* as well though... |
44e4ac9
to
9e0101e
Compare
Our test server deployment system chose today to throw a massive wobbly, so I haven't actually been able to test my fix yet, but I thought I'd push it anyway. I'll update once properly tested. Note: Needs testing for both CloudFront (which we actually use) and S3 (which we don't but I guess I can switch the setting) URLs, as the result of this function is used for both. |
I got roped in to fix deployment, so I've now discovered that the recommended approach with filename* doesn't work because S3/CloudFront still won't let you use non-ISO characters.
So, the best option is to go back to the original hack but with a comment to explain it, and using a non-deprecated function for the charset conversion (moodle core_text API). Testing this now. |
9e0101e
to
aeea48d
Compare
Tested as follows:
In the case of steps 2 and 3 it worked before, this was just verifying that nothing changed. As I only used a different function to do the same thing, wasn't really expecting it to. So in summary, I've actually tested this now and will deploy to our live release in a few weeks. |
To go with my other PHP 8.2 fix PR #586...
There is a use of deprecated function utf8_encode. But, the way it's used appears to be entirely pointless - it is calling utf8_encode to create a header based on a string that already came from a header which should already be in UTF-8 (or possibly ISO-8859-1, but anyway).
@marxjohnson already pointed out this was unnecessary in #455
There do seem to be some problems related to S3 and character encoding re. Content-Disposition but I find it hard to believe that calling utf8_encode would help - in fact it sounds like UTF8 isn't technically allowed there and the right answer seems to be something very complicated with URL-encoding and a filename* parmeter as well a filename, according to this blog https://engineering.resolvergroup.com/2022/02/aws-s3-utf-8-content-disposition/