Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
e307fa6
commit d715b3e
Showing 5 changed files with 103 additions and 3 deletions.
11 changes: 11 additions & 0 deletions
docs/2024/09/26/how-to-max-throughput-when-pulling-data-from-a-third-party-service.html
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
<!DOCTYPE html> | ||
<html lang="en"><head><meta charset="UTF-8" /><title>How to max throughput when pulling data from a third party service</title><meta content=" | ||
base-uri 'self'; | ||
form-action 'self'; | ||
default-src 'none'; | ||
script-src 'self'; | ||
img-src 'self'; | ||
font-src 'self'; | ||
connect-src 'self'; | ||
style-src 'self' 'unsafe-inline' | ||
" http-equiv="Content-Security-Policy" /><meta content="text/html; charset=UTF-8" http-equiv="content-type" /><link data-turbo-track="reload" href="/styles.css" rel="stylesheet" type="text/css" /><link href="/assets/favicon.png" rel="shortcut icon" /><script defer="defer" src="/toggle.js"></script><script defer="defer" src="/turbo.js" type="module"></script><meta content="width=device-width, initial-scale=1.0" name="viewport" /><meta content="same-origin" name="view-transition" /><meta content="A blog mostly about Clojure programming" name="description" /></head><body><header class="nav-sticky-top container-fluid"><nav class="container"><ul><li><div class="linkify" style="align-items:center;display:flex;"><img alt="portrait" height="40px" src="/assets/avatar.png" style="image-rendering:pixelated;padding:4px;" width="40px" /><h1 style="margin-bottom:0;">anders murphy</h1><a aria-label="Home" href="/"></a></div></li></ul><ul><li><a aria-label="Github" class="contrast no-chaos" href="https://github.com/andersmurphy"><svg class="icon" height="24" style="margin-bottom:6px;margin-top:6px;" viewBox="0 0 496 512" width="24" xmlns="http://www.w3.org/2000/svg"><linearGradient id="gradient-horizontal"><stop offset="0%" stop-color="var(--color-stop-1)"></stop><stop offset="33%" stop-color="var(--color-stop-2)"></stop><stop offset="66%" stop-color="var(--color-stop-3)"></stop><stop offset="100%" stop-color="var(--color-stop-4)"></stop></linearGradient><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 
8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg></a></li><li><a aria-label="RSS" class="contrast no-chaos" href="/feed.xml"><svg class="icon" height="24" style="margin-bottom:6px;margin-top:6px;" viewBox="0 0 16 16" width="24" xmlns="http://www.w3.org/2000/svg"><linearGradient id="gradient-horizontal"><stop offset="0%" stop-color="var(--color-stop-1)"></stop><stop offset="33%" stop-color="var(--color-stop-2)"></stop><stop offset="66%" stop-color="var(--color-stop-3)"></stop><stop offset="100%" stop-color="var(--color-stop-4)"></stop></linearGradient><path d="M2 0a2 2 0 0 0-2 2v12a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V2a2 2 0 0 0-2-2zm1.5 2.5c5.523 0 10 4.477 10 10a1 1 0 1 1-2 0 8 8 0 0 0-8-8 1 1 0 0 1 0-2m0 4a6 6 0 0 1 6 6 1 1 0 1 1-2 0 4 4 0 0 0-4-4 1 1 0 0 1 0-2m.5 7a1.5 1.5 0 1 1 0-3 1.5 1.5 0 0 1 0 3"></path></svg></a></li><li><div class="theme-toggle" id="toggle"><svg aria-hidden="true" class="icon theme-toggle__inner-moon" height="30" viewBox="0 0 32 32" width="30" xmlns="http://www.w3.org/2000/svg"><path d="M27.5 11.5v-7h-7L16 0l-4.5 4.5h-7v7L0 16l4.5 4.5v7h7L16 32l4.5-4.5h7v-7L32 16l-4.5-4.5zM16 25.4a9.39 9.39 0 1 1 0-18.8 9.39 9.39 0 1 1 0 18.8z"></path><circle cx="16" cy="16" 
r="8.1"></circle></svg></div></li></ul></nav></header><main class="container"><article><hgroup><h1>How to max throughput when pulling data from a third party service</h1><p><time datetime="2024-09-26T00:00:00+00:00">26 Sep 2024</time></p></hgroup><hr /><p>Say you have an app. When a user signs up to your app you need to pull a large amount of data from a third party service. The faster you can do this, the more responsive your app can be, the less your user has to wait, the better the user experience. In this post I'll break down the key things to think about when approaching this problem.</p><h2 id="how_much_data%3F_how_long%3F">How much data? How long?</h2><p>First, get a feel for how much data you will need to sync and how long it might take. As an example I'm going to draw from a recent project. On average, for each user that connected to our app we needed to get the signatures of their last 100 transactions and then pull the data for those transactions.<br /></p><p>We can get the transaction signatures in batches of 20 but we have to pull the transaction data for each transaction separately. </p><p>Each request takes 1 second. </p><p>If we do this all sequentially we need to make 5 requests to get the signatures and 100 requests to get the transaction data. </p><p>So 105 requests would take 105 seconds.</p><h2 id="don%27t_make_the_long_tail_wait_forever">Don't make the long tail wait forever</h2><p>The above assumes the average user has 100 transactions. What happens when there's an outlier, say a user with 1000 transactions? That would take 1050 seconds! Over 17 minutes. If that's a possibility you need a plan for informing users that they should come back later, or for notifying them when the results are ready. 
</p><p>Also think about what that would do to your system as a whole and the experience of other users if a single user is hogging all the requests.</p><h2 id="rate_limits">Rate limits</h2><p>So we've got an idea for how long things take if we run things sequentially. Next we need to find out the rate limit of the third party service. Let's say they say you can make 10 requests a second. So the theoretical limit for how fast you can sync the average user's 105 requests is 10.5 seconds. </p><p>Keep in mind most services are at best economical with the truth when it comes to their rate limits and/or leave out key details about their implementation. </p><h2 id="batch">Batch</h2><p>Before we dive into the complexity of getting max throughput, check to see if there's any way to batch your requests. In our example if we could get the data for 10 transactions in a single request, we could potentially go 10 times as fast. Sometimes this can be enough and you can skip all the complexity of concurrency.</p><h2 id="idempotency">Idempotency</h2><p>Now let's think about what happens when things go wrong and our sync fails halfway through the process and we have to restart it. Ideally, we don't want to have to redo a bunch of work or sync data we already have. This would eat into our time and our rate limit. </p><p>The key thing here is that we don't want to request data we already have, and ideally we want to write data as soon as we have it, or at least in the smallest batch sizes we can get away with without hammering performance.</p><p>A good rule of thumb is you want to be able to write a database query that gives you all un-synced transactions for a user.</p><h2 id="backoff_and_retry">Backoff and retry</h2><p>What happens when we exceed the rate limit? Ideally, the third party service returns a 429 and tells us when we can try again. If that's the case we need a mechanism for trying again without crashing the job. 
Sometimes the simpler approach, assuming our sync is idempotent, is to have a catch-up job that retries the sync later for that user. </p><p>Another thing to keep in mind is you don't want a single failure stopping all your other work in progress, i.e. if we fail to pull a single transaction we shouldn't stop all the other transactions from being pulled.</p><h2 id="global_concurrent_rate_limit">Global concurrent rate limit</h2><p>We need a mechanism that rate limits our requests. Most languages will have a library that handles this. Ideally, it should be thread safe so multiple threads can use it concurrently. See <a href='https://andersmurphy.com/2024/05/06/clojure-managing-throughput-with-virtual-threads.html'>this post for an implementation with semaphores in Clojure</a>.</p><h2 id="concurrency">Concurrency</h2><p>After doing the maths we've decided that we need some form of concurrency to get more throughput on our sync. Good thing we have a thread safe rate limiter.</p><p>Are threads expensive or cheap? If you're using Go, Java or Clojure, rejoice, you have cheap real threads (parallelism). If you are using Ruby, Python, Lua or Node you'll have to use a combination of async and/or queues and workers. Thankfully, in this case async is fine as requests are IO bound so mostly involve waiting rather than compute.</p><p>Which requests can be done concurrently? In our example getting the signatures needs to be done sequentially as the results are paginated and you need the last signature of your current page to get the next page. However, getting the transaction data can be done concurrently as each transaction is independent. </p><p>Here, we effectively want to split our sequence of tasks into multiple concurrent tasks. 
<a href='https://andersmurphy.com/2024/05/06/clojure-managing-throughput-with-virtual-threads.html'>See this post again for an implementation in Clojure</a>.</p><h2 id="unordered">Unordered</h2><p>Unordered concurrency can help minimise latency by dealing with results as soon as they become available. It also helps you maximise data transformation/write throughput by saturating those resources. </p><h2 id="bottlenecks">Bottlenecks</h2><p>In a perfect world you want the third party service to be the bottleneck. But this isn't always the case; sometimes it can be lack of concurrency or write speed to the database. This is where back pressure comes in. If your database is struggling to keep up you will need a mechanism to slow down the requests or your system might end up running out of resources. </p><p>In our case we found we could get much faster writes by reducing the concurrency when it came to inserts. We'd merge the results from all our requests into a single queue that would batch insert into the database rather than just hammering it with thousands of concurrent inserts.</p><p>Effectively, in simple terms, we had a queue of user jobs which would fan out to the maximum number of concurrent requests to saturate the rate limit and then fan in to a single queue to write to the database. </p><h2 id="sampling">Sampling</h2><p>If you are using an unordered concurrency/queue, i.e. you can process results out of order as soon as they are ready, then sampling can make a big difference. By showing results to the user when you have 99% of the data rather than 100% of the data you can often dramatically increase the speed at which they see results. This is because sampling protects you from outliers in the third party's response times. </p><p>Say we make 5 requests and they take 1s, 1s, 1s, 2s, 30s and we show the results after 5/5 requests complete, then the user will wait 30s. 
If we make the same requests but show the results after 4/5 requests complete, then the user will wait 2s.</p><p>Of course you can only do this in cases where you don't need all the data to show something useful to the user.</p><p>In our case sampling 99% of the results reduced the average wait time by 50%.</p><h2 id="don%27t_starve">Don't starve</h2><p>Make sure your thread safe rate limiter is FIFO to prevent starvation. If not, you can end up in a situation where a user is waiting forever because their tasks are always at the back and never processed. </p><h2 id="conclusion">Conclusion</h2><p>Hopefully, this post provided a good list of things to think about when trying to maximise throughput when pulling data from third parties. Depending on the context you might only need to implement some of these. The most important one in my experience is making sure your syncs are idempotent, as that means you can handle partial syncs and recover.</p></article><footer><p>© 2015-2024 Anders Murphy</p></footer></main></body></html> |
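The fan-out, unordered and sampling ideas from the post can be sketched in a few lines. This is only an illustration in Python's `asyncio`, not the post's Clojure implementation: `fetch_transaction` is a hypothetical stand-in that sleeps instead of calling a third party, `sync_sampled`, `max_concurrent` and `needed` are made-up names, and the semaphore caps requests in flight rather than enforcing a true requests-per-second limit.

```python
import asyncio


async def fetch_transaction(signature: str, delay: float) -> dict:
    # Hypothetical stand-in for the third party request: sleeps instead of doing IO.
    await asyncio.sleep(delay)
    return {"signature": signature, "delay": delay}


async def sync_sampled(delays: dict, max_concurrent: int, needed: int) -> list:
    # Assumption: capping in-flight requests is enough to stay under the rate limit.
    limiter = asyncio.Semaphore(max_concurrent)

    async def limited(signature: str, delay: float) -> dict:
        async with limiter:
            return await fetch_transaction(signature, delay)

    # Fan out: one task per transaction, since each one is independent.
    tasks = [asyncio.create_task(limited(s, d)) for s, d in delays.items()]

    results = []
    # Unordered: as_completed yields in completion order, not submission order.
    for next_done in asyncio.as_completed(tasks):
        results.append(await next_done)
        # Sampling: stop waiting once enough results are in.
        if len(results) >= needed:
            break

    # For brevity the stragglers are cancelled here; a real sync would more
    # likely let them finish in the background and write them when they land.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


if __name__ == "__main__":
    # The post's 1s, 1s, 1s, 2s, 30s example, scaled down by 100.
    example = {"a": 0.01, "b": 0.01, "c": 0.01, "d": 0.02, "e": 0.30}
    sampled = asyncio.run(sync_sampled(example, max_concurrent=5, needed=4))
    print(sorted(r["signature"] for r in sampled))
```

With `needed=4` the call returns once the four fast requests are done instead of waiting on the slow one, which is the 4/5 case from the sampling section. Setting `needed=5` reproduces the 5/5 case where the user waits for the outlier.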