-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pluggable encoders and decoders for tagged values #3
Comments
ATM cbor.js ignores all tagged vales and lets the user handle the tags. IANA lists many other tags which are not support by this library. Do we really want to implement them all? If not, who defines the "good" subset? I'm not against supporting tags, but I would see a more flexible solution for supporting tags than putting everything into the core code. Do you have any good ideas? |
Ok, I understand your concerns and I agree it may make more sense to have a modular approach, especially since this is a JS implementation and the final code size should be minimal in regards to what someone actually needs in functionality. I would even say that the decoder should be separate from the encoder, since in certain cases you only need one of both. For example, you could create two new packages cbor-js-encode and cbor-js-decode, and the current cbor-js package becomes a meta/convenience package which just includes both as dependencies and reexports encode/decode. As for the tags, I see that the decode() function has a var TypedArrayTagger = function(val, tag) {
return ... // as typed array of correct type
}
TypedArrayTagger.tags = [];
for (var i = 64; i <= 87; i++) {
TypedArrayTagger.tags.push(i);
} I think I would be happy with that. And if someone at one point wants to create a super-cbor library, he can just import all taggers that exist, and proxy the decode function so that the taggers are supplied implicitly. What about encoding? My personal use case doesn't involve encoding on JS side so I'm less interested in that. I think it's ok to leave that out for now as it looks like it can be cleanly added in the future if there is some need. What do you think? |
What would be the benefit of splitting the cbor package into meta+encode+decode? What would you think about a decoder-registry: var typedArrayTags = [];
for (var i = 64; i <= 87; i++) {
typedArrayTags.push(i);
}
CBOR.registerDecoder(function(val, tag {
return ... // as typed array of correct type
}, typedArrayTags); an alternative call of the function could directly accept your CBOR.registerDecoder(TypedArrayTagger) If the var cbor = require('cbor');
cbor.registerDecoder(require('cbor-typedarraytagger'));
cbor.decode(...); About the encoder: I thought about using an object for that: var TaggedValue = CBOR.TaggedValue;
CBOR.encode({
normalValue: "example",
taggedValue: new TaggedValue(32, "http://example.com") // URI; see Section 2.4.4.3 in RFC7049
}); |
Splitting: If your web application only needs to decode, then why should it include the encoder as well? With separate packages you could only require what you need. As for the registry, I'm slightly against this, as it means to have global state and complicates things like testing (if you register a decoder in one test, it will appear in another test if you don't clean it up somehow). It also prevents parallel testing of different versions of a decoder (for the same tags). But I absolutely see the value of being able to have a nice interface (decode(val)) without much effort. I guess an alternative would be to make the CBOR object a class: var cbor = new require('cbor')();
var obj = cbor.decode(v); Or with some taggers registered: var cbor = new require('cbor')({
'decoders': [require('cbor-typedarraytagger'), ...]
});
var obj = cbor.decode(v); For encoding, that's possible, but not very straightforward if I think of the many typed array tags. Then I would need some extra machinery which transforms the typed arrays into such TaggedValue objects. It would be nice to have something similar as for decoders, but it may be hard to do. |
Splitting: IMHO the overhead of downloading 2 or 3 files compared to one when you need encoder and decoder is more relevant than the file size (4KB in the minified version). Global state: I completely agree, but how often do you really encounter this problems in practice? IMHO it's more a theoretical problem, since real parallel testing requires more than one interpreter (since JS is singe threaded by design) and then you can have also two global objects. IMHO having an easy API is more important than supporting rare cases. And if you really need it you could always create a copy of the CBOR object before registering the decoders. We could also think about having a Encoder: Thanks for pointing that out. What do you think about registering the constructors: CBOR.registerEncoder(Uint8Array, function(val) {
return {
tag: 64,
value: new Uint8Array(val.buffer) // this Uint8Array makes it a CBOR bytestring
};
}); |
|
Yes, the global registry should be used as default if not passes as argument. |
Cool, so we came to a common agreement :)
I will start creating a decoder package for the typed arrays soon.
|
I started writing a typedarray decoder, based on the decoder object format we discussed: https://github.com/neothemachine/cborjs-typedarrays-decoder It's not done yet and needs testing. When do you think you can integrate the registerDecoder function? |
I think I have a working implementation for the decoder now. I found it strange to work directly with a function that has a tags property attached to it. It doesn't reflect any common model to do these things. Since we don't need state, I propose to change it to a simple object with |
Sorry for the long delay, but finally I did a first implementation of it in the registerDecoder branch. |
Cool. Looks good, just one thing missing. Remember we talked about
default registries and optionally supplying a custom registry to
encode() decode()? I think it is easy to extend your code for that.
Basically rename the registries to defaultEncoderRegistry and
defaultDecoderRegistry and then add another argument to encode() and
decode(). I guess to make it consistent the register and unregister
functions should also have a registry parameter which defaults to the
default registries.
In decode(), what is the second parameter "simpleValue" good for? Looks
like some fallback for unknown "additional information", but I don't see
that this is defined in the CBOR spec really. I think this should be
removed. Or am I missing something here?
|
I implemented the UNregister functions, for doing the testing stuff you wrote about. Can you tell me more about the use-case for different registries? To me it seams strange that the same tagged value can result in different decoded values in the same project. CBOR does not use all possible values for simpleValues (e.g. true, false) for now, but a CBOR value could contain a (not yet) defined value. Instead of directly throwing an error, i call this simpleValue function, where users can throw their own value (if they want to). |
Ok, let's leave it as it is. It's not a problem, probably just me not liking global stuff again :p |
me neither, but thinking about an additional registry parameter seams even worse to me, but maybe i find a nice solution when finishing the code and documentation. |
No I think the two functions are fine. I think in most cases you don't need both encoding and decoding anyway, and if someone needs it it's not a big deal. From my side it's fine. |
Ping. How are you getting on here? Do you need some help? |
Wondering the state of this. Has the tags supported been added? If we have to encode / decode with custom tags, how to specify the tag handlers? |
There is a draft RFC to be published soon which adds support for packed/typed arrays in CBOR, see the current version. As one of the original CBOR authors is also an author of this proposal I'm sure it will gain widespread adoption in encoders and decoders, especially because it is such a useful feature and will help greatly when big packed arrays need to be efficiently encoded and also decoded both in terms of space usage and speed.
I thought it would be good putting it on the radar for implementation. I think it will be rather straightforward to implement. I currently don't have time for it myself in the next 3-4 weeks but I certainly will help where I can, reviewing code etc.
Some notes for an implementation:
The text was updated successfully, but these errors were encountered: