Pluggable encoders and decoders for tagged values #3

Open
letmaik opened this issue Jul 28, 2015 · 18 comments
@letmaik
Contributor

letmaik commented Jul 28, 2015

There is a draft RFC to be published soon which adds support for packed/typed arrays in CBOR, see the current version. Since one of the original CBOR authors is also an author of this proposal, I'm sure it will gain widespread adoption in encoders and decoders. It is a very useful feature that will help greatly when big packed arrays need to be encoded and decoded efficiently, in terms of both space usage and speed.

I thought it would be good to put it on the radar for implementation. I think it will be rather straightforward to implement. I don't have time for it myself in the next 3-4 weeks, but I will certainly help where I can, reviewing code etc.

Some notes for an implementation:

  • I would only encode actual JS typed arrays as CBOR typed arrays. Scanning regular arrays and checking if they are suitable would be too time-consuming.
  • For decoding, you may argue that you really only want JSON-compatible JS objects (without typed arrays), so there could be a decoding option that switches to a typed-array mode, in which CBOR typed arrays are mapped directly to efficient JS typed arrays.
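
A minimal sketch of the first note, assuming a cheap type check is enough to decide whether a value is encoded as a CBOR typed array. `isTypedArray` is a hypothetical helper name and not part of cbor.js:

```javascript
// Hypothetical helper (not part of cbor.js): true only for real JS typed arrays.
function isTypedArray(value) {
  // ArrayBuffer.isView is true for typed arrays and for DataView; exclude DataView.
  return ArrayBuffer.isView(value) && !(value instanceof DataView);
}

isTypedArray(new Float32Array(4)); // true
isTypedArray([1, 2, 3]);           // false, regular arrays are never scanned
```

The check is constant-time, which is the point of the note: no element of a regular array is ever inspected.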
@paroga
Owner

paroga commented Aug 19, 2015

ATM cbor.js ignores all tagged values and lets the user handle the tags. IANA lists many other tags which are not supported by this library. Do we really want to implement them all? If not, who defines the "good" subset? I'm not against supporting tags, but I would like to see a more flexible solution for supporting tags than putting everything into the core code. Do you have any good ideas?

@letmaik
Contributor Author

letmaik commented Aug 21, 2015

Ok, I understand your concerns and I agree it may make more sense to have a modular approach, especially since this is a JS implementation and the final code size should be kept to what someone actually needs in functionality. I would even say that the decoder should be separate from the encoder, since in certain cases you only need one of the two. For example, you could create two new packages cbor-js-encode and cbor-js-decode, and the current cbor-js package becomes a meta/convenience package which just includes both as dependencies and reexports encode/decode.

As for the tags, I see that the decode() function has a tagger argument which accepts the value and the tag number. So, let's assume I create my own package cbor-js-decode-typedarrays which supplies such a tagger function for the typed array tags (and which just returns the original value for unknown tags). It seems quite easy to use, just import both packages and invoke decode with the typed array tagger. So far so good, but what if other taggers should be used as well? How to combine them easily? I think to solve this, each tagger should advertise the tags it supports. Then you could accept an array of such taggers in decode(val, [tagger1,tagger2]) and internally create a map from tag number to tagger for efficient access. If many such taggers are used there is probably some overhead, so it may be good to also allow some pre-computed map as argument, like var taggers = Taggers([tagger1, tagger2]) and then decode(val, taggers). I think a tagger could then look something like:

var TypedArrayTagger = function(val, tag) {
  return ... // as typed array of correct type
}
TypedArrayTagger.tags = [];
for (var i = 64; i <= 87; i++) {
    TypedArrayTagger.tags.push(i);
}

I think I would be happy with that. And if someone at one point wants to create a super-cbor library, he can just import all taggers that exist, and proxy the decode function so that the taggers are supplied implicitly.
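
A minimal sketch of the `Taggers` idea described above, assuming each tagger is a `(val, tag)` function with a `tags` array attached. The helper is hypothetical and only illustrates the precomputed tag-to-tagger map:

```javascript
// Hypothetical helper, as proposed above: combine several taggers into one.
// Each tagger is a function (val, tag) with a `tags` array attached.
function Taggers(taggers) {
  var byTag = {};
  taggers.forEach(function (tagger) {
    tagger.tags.forEach(function (tag) {
      byTag[tag] = tagger; // a later tagger wins if two claim the same tag
    });
  });
  // The result has the same (val, tag) shape as a single tagger.
  return function (val, tag) {
    var tagger = byTag[tag];
    return tagger ? tagger(val, tag) : val; // unknown tags pass through
  };
}
```

Because the combined result is itself a tagger, it could be passed wherever a single tagger function is accepted, and the map is built only once.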

What about encoding? My personal use case doesn't involve encoding on JS side so I'm less interested in that. I think it's ok to leave that out for now as it looks like it can be cleanly added in the future if there is some need.

What do you think?

@paroga
Owner

paroga commented Aug 21, 2015

What would be the benefit of splitting the cbor package into meta+encode+decode?

What would you think about a decoder-registry:

var typedArrayTags = [];
for (var i = 64; i <= 87; i++) {
  typedArrayTags.push(i);
}
CBOR.registerDecoder(function(val, tag) {
  return ... // as typed array of correct type
}, typedArrayTags);

An alternative call of the function could directly accept your TypedArrayTagger function (and use its tags property):

CBOR.registerDecoder(TypedArrayTagger)

If the TypedArrayTagger comes from its own package, this should allow easy code reuse:

var cbor = require('cbor');
cbor.registerDecoder(require('cbor-typedarraytagger'));
cbor.decode(...);
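
A minimal sketch of how such a registry could accept both call forms, assuming a module-level map from tag number to decoder function. The names `decoders` and `registryTagger` are illustrative and not taken from the library:

```javascript
// Sketch only: a module-level registry mapping tag number -> decoder function.
var decoders = {};

// Accepts registerDecoder(fn, tags) or registerDecoder(fnWithTagsProperty).
function registerDecoder(fn, tags) {
  (tags || fn.tags || []).forEach(function (tag) {
    decoders[tag] = fn;
  });
}

// Tagger built on the registry; unknown tags return the value unchanged.
function registryTagger(val, tag) {
  var fn = decoders[tag];
  return fn ? fn(val, tag) : val;
}
```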

About the encoder: I thought about using an object for that:

var TaggedValue = CBOR.TaggedValue;
CBOR.encode({
  normalValue: "example",
  taggedValue: new TaggedValue(32, "http://example.com") // URI; see Section 2.4.4.3 in RFC7049
});

@letmaik
Contributor Author

letmaik commented Aug 21, 2015

Splitting: If your web application only needs to decode, then why should it include the encoder as well? With separate packages you could only require what you need.

As for the registry, I'm slightly against this, as it means having global state and complicates things like testing (if you register a decoder in one test, it will appear in another test if you don't clean it up somehow). It also prevents parallel testing of different versions of a decoder (for the same tags). But I absolutely see the value of being able to have a nice interface (decode(val)) without much effort. I guess an alternative would be to make the CBOR object a class:

var cbor = new (require('cbor'))(); // parentheses needed so `new` applies to the exported class, not to require
var obj = cbor.decode(v);

Or with some taggers registered:

var cbor = new (require('cbor'))({
  'decoders': [require('cbor-typedarraytagger'), ...]
});
var obj = cbor.decode(v);

For encoding, that's possible, but not very straightforward if I think of the many typed array tags. I would then need some extra machinery which transforms the typed arrays into such TaggedValue objects. It would be nice to have something similar to what we have for decoders, but it may be hard to do.

@paroga
Owner

paroga commented Aug 21, 2015

Splitting: IMHO the overhead of downloading 2 or 3 files compared to one when you need encoder and decoder is more relevant than the file size (4KB in the minified version).

Global state: I completely agree, but how often do you really encounter these problems in practice? IMHO it's more a theoretical problem, since real parallel testing requires more than one interpreter (since JS is single-threaded by design) and then you can also have two global objects. IMHO having an easy API is more important than supporting rare cases. And if you really need it you could always create a copy of the CBOR object before registering the decoders. We could also think about having a CBOR.registry object with a registerDecoder function, which could then be passed to the CBOR.encode and CBOR.decode functions as an additional optional argument.

Encoder: Thanks for pointing that out. What do you think about registering the constructors:

CBOR.registerEncoder(Uint8Array, function(val) {
  return {
    tag: 64,
    value: new Uint8Array(val.buffer) // this Uint8Array makes it a CBOR bytestring
  };
});
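
A minimal sketch of how constructor-based encoders could be looked up, assuming an `instanceof` scan over the registered entries. `encoders` and `toTagged` are hypothetical names, and tag 64 for `Uint8Array` is taken from the example above:

```javascript
// Sketch only: encoders keyed by constructor, checked with instanceof.
var encoders = [];

function registerEncoder(ctor, fn) {
  encoders.push({ ctor: ctor, fn: fn });
}

// Returns { tag, value } for a registered type, or null if none matches.
function toTagged(val) {
  for (var i = 0; i < encoders.length; i++) {
    if (val instanceof encoders[i].ctor) {
      return encoders[i].fn(val);
    }
  }
  return null;
}

registerEncoder(Uint8Array, function (val) {
  // byteOffset/byteLength keep views onto a larger buffer correct.
  return { tag: 64, value: new Uint8Array(val.buffer, val.byteOffset, val.byteLength) };
});
```

One detail worth noting: `new Uint8Array(val.buffer)` alone would cover the whole underlying buffer, so a view created with `subarray` would be encoded with extra bytes unless the offset and length are passed along as in this sketch.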

@letmaik
Contributor Author

letmaik commented Aug 21, 2015

> Splitting: IMHO the overhead of downloading 2 or 3 files compared to
> one when you need encoder and decoder is more relevant than the file
> size (4KB in the minified version).

Right, what I had in mind is that there would be a build step for the
meta library which then produces a single merged and minified file for
each release. Given the small size of 4 KB... you're right, let's
not overcomplicate things.

> Global state: I completely agree, but how often do you really
> encounter these problems in practice? IMHO it's more a theoretical
> problem, since real parallel testing requires more than one
> interpreter (since JS is single-threaded by design) and then you can
> also have two global objects. IMHO having an easy API is more
> important than supporting rare cases. And if you really need it you
> could always create a copy of the CBOR object before registering the
> decoders. We could also think about having a CBOR.registry object
> with a registerDecoder function, which could then be passed to the
> CBOR.encode and CBOR.decode functions as an additional optional argument.

What about the other way around? By default, the CBOR.encode and decode
functions could use the global registry. And if you don't want that, you
can supply your own registry as optional parameter. Maybe that's what
you actually meant?
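
A minimal sketch of that arrangement, assuming a small `Registry` type with one shared default instance. `Registry`, `globalRegistry` and `decodeTagged` are hypothetical names; `decodeTagged` stands in for the tag-handling step inside decode():

```javascript
// Sketch only: a registry object, plus one shared default instance.
function Registry() {
  this.decoders = {};
}
Registry.prototype.registerDecoder = function (fn, tags) {
  var self = this;
  tags.forEach(function (tag) { self.decoders[tag] = fn; });
};

var globalRegistry = new Registry();

// Hypothetical stand-in for the tag-handling step inside decode():
// uses the supplied registry, or the global one when none is given.
function decodeTagged(val, tag, registry) {
  var fn = (registry || globalRegistry).decoders[tag];
  return fn ? fn(val, tag) : val;
}
```

With this shape, tests can create a fresh `Registry` per case and never touch the global one, while ordinary callers keep the short `decode(val)` form.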

> Encoder: Thanks for pointing that out. What do you think about
> registering the constructors:

Interesting idea. I think it's a good start. I can't think of any
serious issues with that.

@paroga
Owner

paroga commented Aug 21, 2015

> What about the other way around? By default, the CBOR.encode and decode
> functions could use the global registry. And if you don't want that, you
> can supply your own registry as optional parameter. Maybe that's what
> you actually meant?

Yes, the global registry should be used as the default if none is passed as an argument.

@letmaik
Contributor Author

letmaik commented Aug 21, 2015 via email

@letmaik
Contributor Author

letmaik commented Sep 1, 2015

I started writing a typedarray decoder, based on the decoder object format we discussed: https://github.com/neothemachine/cborjs-typedarrays-decoder It's not done yet and needs testing. When do you think you can integrate the registerDecoder function?

@letmaik
Contributor Author

letmaik commented Sep 2, 2015

I think I have a working implementation for the decoder now. I found it strange to work directly with a function that has a tags property attached to it. It doesn't reflect any common model to do these things. Since we don't need state, I propose to change it to a simple object with tags and decode keys, where tags is the array of supported tags, and decode is the decode function.
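
A minimal sketch of the proposed object shape, assuming tag 64 denotes a uint8 array (as in the encoder example earlier in the thread) and that the decoder already hands byte strings over as `Uint8Array`. `taggerFrom` is a hypothetical adapter, not part of any package mentioned here:

```javascript
// Sketch of the proposed shape: a plain object with `tags` and `decode`.
var uint8ArrayDecoder = {
  tags: [64], // assumed: 64 is the uint8 array tag, as in the encoder example
  decode: function (val, tag) {
    return val; // assumed: the byte string already arrives as a Uint8Array
  }
};

// Hypothetical adapter: turn such objects into a single (val, tag) function.
function taggerFrom(decoderObjects) {
  var byTag = {};
  decoderObjects.forEach(function (d) {
    d.tags.forEach(function (tag) { byTag[tag] = d; });
  });
  return function (val, tag) {
    return byTag[tag] ? byTag[tag].decode(val, tag) : val;
  };
}
```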

@paroga
Owner

paroga commented Sep 24, 2015

Sorry for the long delay, but finally I did a first implementation of it in the registerDecoder branch.
If you agree on the API, I'll add the missing test and publish a new version.

@letmaik
Contributor Author

letmaik commented Sep 24, 2015 via email

@paroga
Owner

paroga commented Sep 24, 2015

I implemented the UNregister functions, for doing the testing stuff you wrote about. Can you tell me more about the use-case for different registries? To me it seems strange that the same tagged value can result in different decoded values in the same project.

CBOR does not yet assign all possible simple values (assigned ones include true and false), but encoded data could contain a value that is not (yet) defined. Instead of throwing an error directly, I call this simpleValue function, where users can throw their own value (if they want to).
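
A minimal sketch of two such callbacks, assuming the callback receives the numeric simple value and either returns a replacement or throws. Both function names are hypothetical:

```javascript
// Sketch only: callbacks for simple values the decoder does not know.
// Assumed contract: the callback receives the numeric simple value and
// either returns a replacement or throws.
function strictSimpleValue(value) {
  throw new RangeError('Unassigned CBOR simple value: ' + value);
}

function lenientSimpleValue(value) {
  return undefined; // silently map unknown simple values to undefined
}
```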

@letmaik
Contributor Author

letmaik commented Sep 24, 2015

Ok, let's leave it as it is. It's not a problem, probably just me not liking global stuff again :p

@paroga
Owner

paroga commented Sep 24, 2015

Me neither, but an additional registry parameter seems even worse to me. Maybe I'll find a nice solution when finishing the code and documentation.
Does it fit your requirements now? Do you think the two different functions are ok, or do we need a registerEncoderAndDecoder() too?

@letmaik
Contributor Author

letmaik commented Sep 24, 2015

No, I think the two functions are fine. In most cases you don't need both encoding and decoding anyway, and if someone does need both, it's not a big deal. From my side it's fine.

@letmaik
Contributor Author

letmaik commented Dec 7, 2015

Ping. How are you getting on here? Do you need some help?

letmaik changed the title from "Add support for upcoming packed arrays tags" to "Pluggable encoders and decoders for tagged values" on Dec 9, 2015
@KrishnaPG

Wondering about the state of this. Has tag support been added?

If we have to encode / decode with custom tags, how do we specify the tag handlers?
