Add write every 10K messages #12
base: master
Conversation
How did you pick ten thousand? I think a time-based threshold would make more sense, because time is what the users actually feel. Ten thousand messages could be a lot or not much; it's hard to say. |
Totally arbitrary, I wanted something between 1 and
Any thoughts on how you'd like to see this implemented? |
Would be interesting to see a performance test of this. |
Hmm, you must mean 700 megabytes of messages? @christianbundy I guess you'd have a timeout (unref it so it doesn't keep the node process going) that causes the write, and clear it if you do a write anyway because you're in sync. Hmm, funny feeling that sometimes I've seen… Hmm, also: maybe it would be best to fix this in async-single, and then call that on every write? Then flumeview-reduce could focus on just doing database stuff. If I recall correctly, the way I'm doing clearTimeout and setTimeout is gonna be a bit slow there... |
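For illustration only, here is a minimal sketch of that timeout approach; the names (`createTimedFlush`, `doWrite`) are made up and not flumeview-reduce's actual API:

```js
// A minimal sketch of the time-based flush described above: schedule a write
// on a timer, unref it so it doesn't keep the node process alive, and clear it
// whenever a write happens anyway because the view is in sync.
// `doWrite` is a hypothetical callback that persists the current reduce state.
function createTimedFlush (doWrite, interval) {
  var timer = null

  function schedule () {
    if (timer) return // a flush is already pending
    timer = setTimeout(function () {
      timer = null
      doWrite()
    }, interval)
    // don't let a pending flush keep the node process running
    if (timer.unref) timer.unref()
  }

  function wrote () {
    // a write just happened (e.g. because we're in sync), so the pending
    // timed flush is no longer needed
    if (timer) {
      clearTimeout(timer)
      timer = null
    }
  }

  return { schedule: schedule, wrote: wrote }
}
```

The unref call is the detail mentioned above: a pending flush should not be the only thing keeping the node process alive.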
This code looks good to me, seems like a smart improvement. |
@mixmix mean message size is about 700 bytes, because the most popular message types, contacts (follows) and votes (likes), are small. |
That would mean ~100 write calls over 700MB... still OK.
Could make it 20e3 and cut that down to 50. I guess another way to think about it is what the time period between saves is... but that might depend on the set of indexes you're running (to my eye they seem to increase in step, but I don't know what's governing that under the hood). |
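To spell out the rough arithmetic behind those figures (using the ~700-byte mean message size mentioned above; all numbers are ballpark):

```js
// 700 MB / ~700 B per message  ~= 1e6 messages in the log
// 1e6 messages / 10e3          ~= 100 forced writes per reduce view
// 1e6 messages / 20e3          ~= 50 forced writes per reduce view
// (multiply by the number of reduce views you run to get the total)
```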
@mixmix flumedb is meant to be generic, it's not just for ssb. That's why there is no "ssb-" at the start. Also, to get the correct write number, multiply by the number of reduces. |
Good point - forgot the general context
|
@mixmix see the organizational patterns doc https://hackmd.io/IM5_tWIfSFuNoe3jtrjrtQ#well-thought-out-good-ideas |
Been thinking about this and I should probably write it down. Are there any simple gradient descent patterns we could use here? I'm sure there are much better algorithms, but I'm imagining an implementation where you do batches of:
I'm sure we have better tools at our disposal, but I'd imagine that any solution that worked here would also work in flumedb/flumeview-level#15. |
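Purely as a hypothetical sketch of the kind of adaptive scheme being floated here (nothing like this exists in flumeview-reduce; every name below is invented): measure how long each flush takes and nudge the batch size toward a target flush time, a very crude stand-in for the gradient-descent idea.

```js
// Hypothetical adaptive batching: grow the batch while flushes are fast,
// shrink it when they get slow. Not part of flumeview-reduce.
function createAdaptiveBatch (opts) {
  var batchSize = (opts && opts.initial) || 10e3
  var targetMs = (opts && opts.targetMs) || 100
  var min = (opts && opts.min) || 1e3
  var max = (opts && opts.max) || 100e3

  return {
    size: function () { return batchSize },
    // call this after each flush with how long it took
    report: function (elapsedMs) {
      if (elapsedMs > targetMs) {
        batchSize = Math.max(min, Math.round(batchSize * 0.8))
      } else {
        batchSize = Math.min(max, Math.round(batchSize * 1.25))
      }
    }
  }
}
```

Whether flush latency is even the right signal to optimise is exactly the open question in the comments that follow.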
That's a good idea. Hmm, we've had this type of problem in a bunch of places... You could approach this by setting a relative value for latency and throughput (that value will be hard to define, of course). But also, you could try to maximize throughput alone: sending small packets would be fine, but not if there are many of them... oh hang on, let's say that each packet has a 10 byte overhead. How do you decide it's okay to send 1 packet a second, but not one packet a millisecond? Hmm, if you had a time box, then you'd maximize throughput within that time... but wouldn't that just be delaying until the end of that box? (not ideal) If you know the parameters of the system, say, in TCP the maximum transfer unit is 1500 bytes, then there's no reason to buffer bigger than that... but sometimes you don't know what the upper layers in the system will do (perhaps there is an encryption layer or other framing that adds more overhead...). Okay, maybe the right idea is to look at prior art: https://en.wikipedia.org/wiki/Nagle%27s_algorithm |
Sorry, brain splurge there. |
Turns out that Nagle's algorithm is really simple: we do a write, and if an unacknowledged packet has already been written, buffer that write until you get the ack. I already used that pattern with https://www.npmjs.com/package/pull-write: if the write hasn't called back yet, buffer until it does (pull-write also has a fixed max). The nice thing about Nagle is that it doesn't actually require a model of overhead, but that's also the problem. I found a recent comment from Nagle himself…
Here is another one: https://news.ycombinator.com/item?id=9050645 but this time he just says to disable delayed ack. |
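For reference, a rough sketch of that Nagle-style pattern (roughly what pull-write does, though this is not pull-write's API; `save(state, cb)` is a hypothetical async persistence function):

```js
// Write immediately if nothing is in flight, otherwise hold on to the latest
// state and flush it once the outstanding write calls back.
function createNagleWriter (save) {
  var writing = false
  var pending = null

  function flush (state) {
    writing = true
    save(state, function (err) {
      writing = false
      if (err) return console.error('write failed', err) // real code would retry or surface this
      if (pending != null) {
        var next = pending
        pending = null
        flush(next)
      }
    })
  }

  return function write (state) {
    if (writing) pending = state // buffer until the in-flight write acks
    else flush(state)
  }
}
```

Note how this needs no model of the overhead: the in-flight write itself is the signal to hold back.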
This is also interesting: https://en.wikipedia.org/wiki/TCP_tuning
With a half-second latency, Nagle would cause transmission to go very slowly, of course... |
https://en.wikipedia.org/wiki/Bufferbloat Some modern network devices have large buffers, but that means TCP doesn't know it's saturated: because of the extra buffering the connection stays reliable and doesn't slow down, so it just sends more data, which takes even longer for the buffer to drain. How to test for bufferbloat: https://www.bufferbloat.net/projects/bloat/wiki/Tests_for_Bufferbloat/ |
super interesting rabbit hole here: http://blog.cerowrt.org/post/net_neutrality_customers/ |
Folks, can we just merge this? You all seem to agree that for any practical purpose this works and is a net improvement for whoever is using the library. The only real argument against it that I have seen is from @dominictarr: that this might impact non-ssb applications because the value chosen is too low and would cause tons of writes. |
Resolves #11
This could probably be much more sophisticated, but given the fact that I have 700 million messages I don't think a few more save points would be too terrible.
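As a minimal illustration of the idea (not the actual diff in this PR; the names below are made up), counting messages and forcing a write every N of them looks roughly like this:

```js
// Count messages as they are processed and force a write every N of them,
// on top of whatever write-on-sync behaviour already exists.
var WRITE_EVERY = 10e3
var sinceLastWrite = 0

function onMessage (msg, write) {
  // ...update the reduce state with msg here...
  sinceLastWrite += 1
  if (sinceLastWrite >= WRITE_EVERY) {
    sinceLastWrite = 0
    write() // hypothetical: persist the current reduce state to disk
  }
}
```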