-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDK repeatedly retries bad requests #126
Comments
@izaaz It seems like |
Thanks for raising this issue @theolampert - We are taking a look at this |
For me the memory overload is not gradual...it's very quick. After about 30 to 45 seconds after launch I get a massive burst of these and then they don't stop.
Some stuff is going bonkers. |
We have a repro for this issue. We'll keep this thread posted. |
@izaaz When do you expect this issue to be resolved? |
@bejewel-sangwon - there is one other issue that is currently being worked on and this will be worked on right after that. ETA is about 2 weeks from now. This should only be happening if the client is sending a bad request. Most likely because of a bad userid/device id. Can you please check if you are sending device ids/user ids < 5 characters long? If yes, can you make sure the minIdLength property is set correctly in the config? |
My user ids are always |
I actually am seeing events in a bunch of
|
user id being null is ok from Amplitide's perspective. A request to amplitude needs either a user id or device id or both. We only check for length when it's not null. |
I managed to catch in the debugger today that randomly I get 400's coming back from the backend with the error "Invalid API key". And this is in the same app instance where it's been previously happy uploading events. Then the backend decides the API key is invalid. It's at this point that the contents of the (And as a side note: Edit: I've confirmed that when this happens, Edit: I've discovered that I have something that posts an event to the Amplitude SDK before Amplitude is initialized, which causes |
Yeah, so the workaround for this problem is to not try to post events before the Amplitude SDK is initialized. LOL |
The problem with this issue is that even if a trigger occurred due to an incorrect user_id, the event does not disappear but continues to accumulate like zombies, eventually causing the app itself to crash. If it were an incorrect event, it should have disappeared from the list, but it keeps accumulating, resembling a DDos attack with a huge number of requests, eventually overwhelming the app until it crashes. I hope there will be a prompt solution. 🙏 I will also attach a video showing the related logs. default.mp4 |
@bwised Thanks for bringing that to our attention too. @bejewel-changkyu Looking at your log, it looks like your event type is empty? |
@theolampert @bwised @bejewel-changkyu I've landed a fix for the issue, would it be possible to verify this is working on your end? It may take some time for affected devices to stop sending requests, but they should eventually converge to the expected behavior. Thanks! |
@crleona Internally this appears to be working for us, we'll release tomorrow and let you know if the issue is completely resolved. |
Thanks @theolampert, this is now released in 1.4.1. Please reopen this issue if you encounter any issues with the release. |
@crleona Hey, testing this internally, unfortunately we're seeing the same issues with 1.4.1 – @schurigeln will reach out later today with more details and screenshots. |
It seems to still be related to using user IDs that are below a certain length, logging out of that particular account works correctly. There are then still two issues here, the backend has an undocumented minimum user ID length and the SDK is still able to get itself into a state in which it continuously retries bad requests. |
@theolampert we'll update the docs to make this more explicit. Wrt the SDK retrying - currently, the response from the server indicates the first event that had bad data and the SDK drops that event and retries the rest. We are working with the backend team to get the full list so that the SDK can drop all bad requests in one go and not have to retry often. The fix that was pushed should at least make sure you end up with too many duplicates event locally. It still has a slow draining issue which we'll address soon. From a principle perspective, we've always leaned towards not dropping any data on the SDK. Which is why we don't drop the data when we notice that the user id length is less than 5. Keeping this ticket open while we get both of these resolved. |
Any update here? We're probably going to switch away from Amplitude if this doesn't get resolved asap |
@kishinmanglani we consider this solved on our end in the latest versions, are there any particular issues you are still seeing? |
Network requests fail with a 400 error code referring to the length of
user_id
ordevice_id
. Both of these values appear to be set in the request data so I'm unsure why this fails. The client will then repeatedly retry this, in the sampled screenshot it's sent around 800 network requests.The client will also repeatedly call flush and fill up it's local event cache, in my case the size of this cache is around ~600mb, my colleague's is a few gb. This will peg CPU usage and gradually increase memory usage as well ad disk writes.
This is also occuring in production and reported by by performance monitoring we have in Sentry.
There's a few things going on here but the main issue I see is a bug in the retry logic of the client that I haven't been able to track down.
The text was updated successfully, but these errors were encountered: