Pluggable encoders and decoders for tagged values #3

letmaik · 2015-07-28T21:19:29Z

A draft RFC that adds support for packed/typed arrays to CBOR is due to be published soon; see the current version. Since one of the original CBOR authors is also an author of this proposal, I'm sure it will gain widespread adoption in encoders and decoders. It is a very useful feature: big packed arrays can be encoded and decoded efficiently, in terms of both space usage and speed.

I thought it would be good to put this on the radar for implementation. I think it will be fairly straightforward to implement. I don't have time for it myself in the next 3-4 weeks, but I will certainly help where I can, e.g. by reviewing code.

Some notes for an implementation:

I would only encode actual JS typed arrays as CBOR typed arrays. Scanning regular arrays and checking if they are suitable would be too time-consuming.
For decoding, you could argue that you really only want JSON-compatible JS objects (without typed arrays). So there could be a decoding option that switches to a typed-array mode, in which CBOR typed arrays are mapped directly to efficient JS typed arrays.

paroga · 2015-08-19T14:05:07Z

ATM cbor.js ignores all tagged values and lets the user handle the tags. IANA lists many other tags which are not supported by this library. Do we really want to implement them all? If not, who defines the "good" subset? I'm not against supporting tags, but I would rather see a more flexible solution than putting everything into the core code. Do you have any good ideas?

letmaik · 2015-08-21T09:33:43Z

Ok, I understand your concerns and I agree it may make more sense to have a modular approach, especially since this is a JS implementation and the final code size should be kept minimal relative to the functionality someone actually needs. I would even say that the decoder should be separate from the encoder, since in certain cases you only need one of the two. For example, you could create two new packages cbor-js-encode and cbor-js-decode, and the current cbor-js package becomes a meta/convenience package which just includes both as dependencies and re-exports encode/decode.

As for the tags, I see that the decode() function has a tagger argument which accepts the value and the tag number. So, let's assume I create my own package cbor-js-decode-typedarrays which supplies such a tagger function for the typed array tags (and which just returns the original value for unknown tags). It seems quite easy to use: just import both packages and invoke decode with the typed array tagger. So far so good, but what if other taggers should be used as well? How can they be combined easily? I think to solve this, each tagger should advertise the tags it supports. Then you could accept an array of such taggers in decode(val, [tagger1,tagger2]) and internally create a map from tag number to tagger for efficient access. If many such taggers are used there is probably some overhead, so it may be good to also allow a pre-computed map as an argument, like var taggers = Taggers([tagger1, tagger2]) and then decode(val, taggers). I think a tagger could then look something like:

var TypedArrayTagger = function(val, tag) {
  return ... // as typed array of correct type
}
TypedArrayTagger.tags = [];
for (var i = 64; i <= 87; i++) {
  TypedArrayTagger.tags.push(i);
}

I think I would be happy with that. And if someone at some point wants to create a super-cbor library, they can just import all taggers that exist, and proxy the decode function so that the taggers are supplied implicitly.
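To make the combining idea concrete, here is a minimal sketch. It assumes a hypothetical `Taggers` helper (only named, not defined, above) and two toy taggers invented for illustration; none of this is existing cbor-js API. The helper builds the tag-number map once and returns a single function with the `(value, tag)` shape that `decode()` already accepts as its tagger argument.

```javascript
// Hypothetical helper: builds a tag -> tagger lookup from taggers that
// advertise their supported tags via a `tags` array property.
function Taggers(taggers) {
  var byTag = {};
  taggers.forEach(function (tagger) {
    tagger.tags.forEach(function (tag) {
      byTag[tag] = tagger;
    });
  });
  // The returned function has the (value, tag) shape of a single tagger.
  return function (value, tag) {
    var tagger = byTag[tag];
    // Unknown tags fall through with the value unchanged.
    return tagger ? tagger(value, tag) : value;
  };
}

// Two toy taggers, for illustration only.
var dateTagger = function (value) { return new Date(value * 1000); };
dateTagger.tags = [1]; // tag 1: epoch-based date/time (RFC 7049)

var uriTagger = function (value) { return { uri: value }; };
uriTagger.tags = [32]; // tag 32: URI (RFC 7049)

var combined = Taggers([dateTagger, uriTagger]);
```

With this shape, `combined` could be passed wherever a single tagger is accepted today, so the array form would not strictly need a change to `decode()` itself.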

What about encoding? My personal use case doesn't involve encoding on the JS side, so I'm less interested in that. I think it's ok to leave that out for now, as it looks like it can be cleanly added in the future if there is a need.

What do you think?

paroga · 2015-08-21T10:11:29Z

What would be the benefit of splitting the cbor package into meta+encode+decode?

What would you think about a decoder-registry:

var typedArrayTags = [];
for (var i = 64; i <= 87; i++) {
  typedArrayTags.push(i);
}
CBOR.registerDecoder(function(val, tag) {
  return ... // as typed array of correct type
}, typedArrayTags);

An alternative call of the function could directly accept your TypedArrayTagger function (and use its tags property):

CBOR.registerDecoder(TypedArrayTagger)

If the TypedArrayTagger comes from its own package, this should allow easy code reuse:

var cbor = require('cbor');
cbor.registerDecoder(require('cbor-typedarraytagger'));
cbor.decode(...);

About the encoder: I thought about using an object for that:

var TaggedValue = CBOR.TaggedValue;
CBOR.encode({
  normalValue: "example",
  taggedValue: new TaggedValue(32, "http://example.com") // URI; see Section 2.4.4.3 in RFC7049
});
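As a rough sketch of what such a wrapper implies for the encoder: the `TaggedValue` constructor below is an assumed shape (the thread only shows it being called), and `encodeTagHead` is a hypothetical helper name. The byte layout itself follows the CBOR rules for major type 6, so the tag head is emitted first and the encoded value follows it.

```javascript
// Hypothetical wrapper pairing a tag number with the value it annotates.
function TaggedValue(tag, value) {
  this.tag = tag;
  this.value = value;
}

// Encodes just the tag head (major type 6) for tags below 65536;
// the encoded value would follow these bytes.
function encodeTagHead(tag) {
  if (tag < 24) return [0xc0 | tag];          // tag fits in the initial byte
  if (tag < 0x100) return [0xd8, tag];        // one following byte
  if (tag < 0x10000) return [0xd9, tag >> 8, tag & 0xff]; // two following bytes
  throw new RangeError("tag too large for this sketch: " + tag);
}

var uri = new TaggedValue(32, "http://example.com");
var head = encodeTagHead(uri.tag); // [0xd8, 0x20]
```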

letmaik · 2015-08-21T10:46:11Z

Splitting: If your web application only needs to decode, then why should it include the encoder as well? With separate packages you could only require what you need.

As for the registry, I'm slightly against this, as it introduces global state and complicates things like testing (if you register a decoder in one test, it will still be registered in the next test unless you clean it up somehow). It also prevents parallel testing of different versions of a decoder (for the same tags). But I absolutely see the value of having a nice interface (decode(val)) without much effort. I guess an alternative would be to make the CBOR object a class:

var cbor = new (require('cbor'))();
var obj = cbor.decode(v);

Or with some taggers registered:

var cbor = new (require('cbor'))({
  'decoders': [require('cbor-typedarraytagger'), ...]
});
var obj = cbor.decode(v);
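A minimal sketch of the instance-based idea, to show why it avoids shared state. `Codec` and `applyTag` are hypothetical names, and the sketch only covers the tag-resolution step rather than a full decoder. It assumes decoders are functions with a `tags` property, as in the examples earlier in the thread.

```javascript
// Hypothetical wrapper: each instance keeps its own tag -> decoder map,
// so registering decoders on one instance never affects another.
function Codec(options) {
  this.decodersByTag = {};
  var decoders = (options && options.decoders) || [];
  var self = this;
  decoders.forEach(function (decoder) {
    decoder.tags.forEach(function (tag) {
      self.decodersByTag[tag] = decoder;
    });
  });
}

// Resolves one already-parsed tagged value; a full decode() would call
// this for each tagged item it meets.
Codec.prototype.applyTag = function (value, tag) {
  var decoder = this.decodersByTag[tag];
  return decoder ? decoder(value, tag) : value;
};

var upper = function (value) { return value.toUpperCase(); };
upper.tags = [1000]; // arbitrary tag number, for illustration only

var a = new Codec({ decoders: [upper] });
var b = new Codec(); // no decoders registered
```

Here `a` and `b` resolve the same tag differently, which is exactly the case that is awkward with a single global registry.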

For encoding, that's possible, but not very straightforward if I think of the many typed array tags. I would then need some extra machinery which transforms the typed arrays into such TaggedValue objects. It would be nice to have something similar to what we have for decoders, but it may be hard to do.

paroga · 2015-08-21T11:18:28Z

Splitting: IMHO the overhead of downloading 2 or 3 files compared to one when you need encoder and decoder is more relevant than the file size (4KB in the minified version).

Global state: I completely agree, but how often do you really encounter these problems in practice? IMHO it's more of a theoretical problem, since real parallel testing requires more than one interpreter (JS is single-threaded by design), and then you also have two global objects. IMHO having an easy API is more important than supporting rare cases. And if you really need it, you could always create a copy of the CBOR object before registering the decoders. We could also think about having a CBOR.registry object with a registerDecoder function, which could then be passed to the CBOR.encode and CBOR.decode functions as an additional optional argument.

Encoder: Thanks for pointing that out. What do you think about registering the constructors:

CBOR.registerEncoder(Uint8Array, function(val) {
  return {
    tag: 64,
    value: new Uint8Array(val.buffer) // this Uint8Array makes it a CBOR bytestring
  };
});
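One way the constructor-based lookup could work internally, as a sketch only: `registerEncoder` here is a standalone stand-in for the proposed `CBOR.registerEncoder`, and `toTagged` is a hypothetical helper the encoder would call before writing a value. The `Date` entry is added purely for illustration.

```javascript
// Hypothetical registry: a list of { ctor, fn } entries.
var encoders = [];

function registerEncoder(ctor, fn) {
  encoders.push({ ctor: ctor, fn: fn });
}

// Returns { tag, value } from the first registered constructor matching
// `val`, or null when the value needs no tag.
function toTagged(val) {
  for (var i = 0; i < encoders.length; i++) {
    if (val instanceof encoders[i].ctor) {
      return encoders[i].fn(val);
    }
  }
  return null;
}

registerEncoder(Uint8Array, function (val) {
  return { tag: 64, value: val };
});
registerEncoder(Date, function (val) {
  // tag 1: epoch-based date/time, in seconds
  return { tag: 1, value: val.getTime() / 1000 };
});
```

Because the lookup uses `instanceof` in registration order, a subclass would need to be registered before its parent class to take precedence; an exact `val.constructor` match would be a stricter alternative.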

letmaik · 2015-08-21T11:45:35Z

> Splitting: IMHO the overhead of downloading 2 or 3 files compared to
> one when you need encoder and decoder is more relevant than the file
> size (4KB in the minified version).

Right, what I had in mind is that there would be a build step for the
meta library which then produces a single merged and minified file for
each release. But given the small size of 4KB... you're right, let's
not overcomplicate things.

> Global state: I completely agree, but how often do you really
> encounter these problems in practice? IMHO it's more of a theoretical
> problem, since real parallel testing requires more than one
> interpreter (JS is single-threaded by design), and then you
> also have two global objects. IMHO having an easy API is more
> important than supporting rare cases. And if you really need it, you
> could always create a copy of the CBOR object before registering the
> decoders. We could also think about having a CBOR.registry object
> with a registerDecoder function, which could then be passed to the
> CBOR.encode and CBOR.decode functions as an additional optional argument.

What about the other way around? By default, the CBOR.encode and decode
functions could use the global registry. And if you don't want that, you
can supply your own registry as optional parameter. Maybe that's what
you actually meant?

> Encoder: Thanks for pointing that out. What do you think about
> registering the constructors:

Interesting idea. I think it's a good start. I can't think of any
serious issues with that.

paroga · 2015-08-21T11:47:57Z

> What about the other way around? By default, the CBOR.encode and decode
> functions could use the global registry. And if you don't want that, you
> can supply your own registry as optional parameter. Maybe that's what
> you actually meant?

Yes, the global registry should be used by default if none is passed as an argument.
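A small sketch of that agreement, under assumptions: `Registry`, `defaultRegistry` and `applyTag` are hypothetical names, and `applyTag` stands in for the tag-handling step inside `decode()`. The point is only that the registry parameter is optional and falls back to the shared default.

```javascript
// Hypothetical registry type: a tag -> decoder map with a register method.
function Registry() {
  this.byTag = {};
}
Registry.prototype.registerDecoder = function (tags, fn) {
  var byTag = this.byTag;
  tags.forEach(function (tag) { byTag[tag] = fn; });
};

// Shared default, used when the caller passes no registry.
var defaultRegistry = new Registry();

// Stand-in for the tag-handling step of decode(): `registry` is optional.
function applyTag(value, tag, registry) {
  var fn = (registry || defaultRegistry).byTag[tag];
  return fn ? fn(value, tag) : value;
}

// Tag 1000 is an arbitrary number chosen for illustration.
defaultRegistry.registerDecoder([1000], function (v) { return "default:" + v; });

var custom = new Registry();
custom.registerDecoder([1000], function (v) { return "custom:" + v; });
```

Callers who are happy with global registration never see the extra argument, while tests can pass a fresh `Registry` to stay isolated.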

letmaik · 2015-08-21T11:52:55Z

Cool, so we came to a common agreement :) I will start creating a decoder package for the typed arrays soon.

letmaik · 2015-09-01T15:25:43Z

I started writing a typedarray decoder, based on the decoder object format we discussed: https://github.com/neothemachine/cborjs-typedarrays-decoder It's not done yet and needs testing. When do you think you can integrate the registerDecoder function?

letmaik · 2015-09-02T11:31:58Z

I think I have a working implementation for the decoder now. I found it strange to work directly with a function that has a tags property attached to it. It doesn't reflect any common model to do these things. Since we don't need state, I propose to change it to a simple object with tags and decode keys, where tags is the array of supported tags, and decode is the decode function.
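For illustration, a decoder in that object format might look like the sketch below. The variable name and the handling are assumptions, not the code from the linked repository. Tag 64 matches the Uint8Array encoder example earlier in this thread; tag 68 (uint8, clamped) follows the numbering of the later RFC 8746 and may differ from the draft current at the time.

```javascript
// Assumed shape: `tags` lists the supported tag numbers and `decode`
// converts the tagged value.
var uint8Decoder = {
  tags: [64, 68],
  decode: function (val, tag) {
    // `val` is assumed to be the byte string payload as a Uint8Array.
    if (tag === 68) {
      // Reinterpret the same bytes as a clamped array, without copying.
      return new Uint8ClampedArray(val.buffer, val.byteOffset, val.byteLength);
    }
    return val; // tag 64: already a plain uint8 array
  }
};
```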

paroga · 2015-09-24T11:41:16Z

Sorry for the long delay, but finally I did a first implementation of it in the registerDecoder branch.
If you agree on the API, I'll add the missing test and publish a new version.

letmaik · 2015-09-24T22:29:14Z

Cool. Looks good, just one thing missing. Remember we talked about default registries and optionally supplying a custom registry to encode() and decode()? I think it is easy to extend your code for that. Basically, rename the registries to defaultEncoderRegistry and defaultDecoderRegistry and then add another argument to encode() and decode(). I guess to make it consistent, the register and unregister functions should also have a registry parameter which defaults to the default registries. In decode(), what is the second parameter "simpleValue" good for? It looks like some fallback for unknown "additional information", but I don't really see this defined in the CBOR spec. I think this should be removed. Or am I missing something here?

paroga · 2015-09-24T22:38:10Z

I implemented the UNregister functions for the testing stuff you wrote about. Can you tell me more about the use case for different registries? To me it seems strange that the same tagged value can result in different decoded values in the same project.

CBOR does not use all possible values for simpleValues (e.g. true, false) for now, but a CBOR value could contain a not-yet-defined value. Instead of directly throwing an error, I call this simpleValue function, where users can throw their own value (if they want to).
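A standalone sketch of that fallback behaviour, not the library's actual code: `decodeSimple` and `strict` are hypothetical names. The four assigned codes (20 to 23) are the ones RFC 7049 defines for false, true, null and undefined; anything else is handed to the optional callback.

```javascript
// Stand-in for the simple-value step of a decoder: known codes map to JS
// values, anything else is handed to the optional `simpleValue` callback.
function decodeSimple(code, simpleValue) {
  switch (code) {
    case 20: return false;
    case 21: return true;
    case 22: return null;
    case 23: return undefined;
  }
  return typeof simpleValue === "function" ? simpleValue(code) : undefined;
}

// A strict caller can choose to reject unassigned simple values.
function strict(code) {
  throw new Error("Unsupported simple value: " + code);
}
```

A lenient caller could instead return a placeholder object from the callback, so unknown simple values survive decoding in some recognisable form.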

letmaik · 2015-09-24T22:47:33Z

Ok, let's leave it as it is. It's not a problem, probably just me not liking global stuff again :p

paroga · 2015-09-24T22:53:28Z

Me neither, but an additional registry parameter seems even worse to me. Maybe I'll find a nice solution when finishing the code and documentation.
Does it fit your requirements now? Do you think the two different functions are ok, or do we need a registerEncoderAndDecoder() too?

letmaik · 2015-09-24T22:59:56Z

No, I think the two functions are fine. In most cases you don't need both encoding and decoding anyway, and if someone does need both, calling two functions is not a big deal. From my side it's fine.

letmaik · 2015-12-07T18:05:37Z

Ping. How are you getting on here? Do you need some help?

KrishnaPG · 2019-07-14T14:20:13Z

Wondering about the state of this. Has tag support been added?

If we have to encode / decode with custom tags, how do we specify the tag handlers?

letmaik mentioned this issue Sep 9, 2015

Use finalised extension mechanism Reading-eScience-Centre/cborjs-typedarray-decoder#1

Closed

letmaik changed the title ~~Add support for upcoming packed arrays tags~~ Pluggable encoders and decoders for tagged values Dec 9, 2015
