Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: parser function for string access by pointer #125

Open
bergzand opened this issue May 9, 2018 · 14 comments
Open

Feature: parser function for string access by pointer #125

bergzand opened this issue May 9, 2018 · 14 comments
Assignees

Comments

@bergzand
Copy link
Contributor

bergzand commented May 9, 2018

I have a few use cases where I have a CBOR stream with a rather large byte strings. Having a function to get a pointer and length to these byte string would help saving some memory as a buffer to temporarily copy the data to is not required. Something like

CborError cbor_value_get_byte_string(CborValue *value, uint8_t **result, size_t *len);

would be perfect. This could return an CborErrorUnknownLength when a chunked string is passed.

My main use case is a COSE library where I want to retrieve the payload without having to copy the to another buffer. Simply accessing it with a pointer is preferred to save memory on a constraint device.

I'm willing to contribute the code myself, but I need a few pointers on how to get the pointer to the actual data.

@thiagomacieira
Copy link
Member

Hello Koen

I had that exact function in 0.5 but I removed it for further development. I un-reverted it in the dev branch, in edd2b9e. Please note that I do plan on changing it again, see 8b539f2.

@thiagomacieira thiagomacieira self-assigned this May 9, 2018
@thiagomacieira
Copy link
Member

The problem I have to solve is that "pointer to the actual data" may not be a simple thing, especially in constrained devices. Take the example of Zephyr: instead of one buffer containing all data (a "linear" buffer), the data received from the network is stored in a struct netbuf, which is a linked (chained) list of 128-byte chunks. So we need to return to the caller not a pointer to the beginning of the data, but instead a struct netbuf plus offset.

@bergzand
Copy link
Contributor Author

@thiagomacieira Thanks for the clarification! I understand that my request might not be as simple as I've thought it to be :)

Please let me know if I can help you in some way with testing.

@bpintea
Copy link

bpintea commented Jul 12, 2019

I'd second this FR, for the non-stream parsing use case.

The library is great for stream parsing, but for the case where I have the entire CBOR object available, extracting the byte/text strings comes with a penalty of an extra copy or allocation plus copy.

When I have the entire object, the string is in a contiguous chunk and readily available, so I only need to know where it starts and its length, which the decoder knows of already.
An attempt to get the ref should of course fail if the string is chunked (in which case the caller still has the option of having the library allocate/copy).

@thiagomacieira
Copy link
Member

That is true and the API I designed works for when the entire message is available in a contiguous chunk. It's not so good when that isn't the case. I've managed to make it work with an arbtrary Qt's QIODevice and avoid double-copy, but the API is fragile. We need a better API.

@jleni
Copy link

jleni commented Dec 17, 2019

any news with respect to this? would it be possible to expose:
CborError get_string_chunk(CborValue *it, const void **bufferptr, size_t *len)in cbor.h?

This could be a good workaround AFAIK and would allow direct access to a pointer to the start of the string.

@thiagomacieira
Copy link
Member

The problem is committing to an API that isn't long-term supportable, especially on small, RTOSes where the buffer may not be a contiguous chunk of memory.

@jleni
Copy link

jleni commented Dec 17, 2019

yes, I understand that.. but enforcing copies is even worse for small devices..
maybe get_string_chunk under conditional compilation for people that want that option would be useful.

@thiagomacieira
Copy link
Member

I know. Adding something under conditional compilation is the same as you performing the hack: I won't support you if it breaks later when we have a stable API.

@jeremyherbert
Copy link

jeremyherbert commented Jan 4, 2020

I'm also very keen to see this functionality (for me, direct pointer access is not necessary but memory must be stack/static only).

Can I suggest a string iterator which allows you to read chunks of a string up to a maximum length? For example:

// pretend that an iterator 'it' already exists
char small_buffer[32];
CborStringIterator string_it;
size_t actual_chunk_size = 0;

cbor_create_string_iterator(&it, &string_it); // 'it' currently points to a text string or byte string
while (!cbor_string_iterator_at_end(&string_it)) {
    cbor_string_iterator_copy_and_advance(&string_it, small_buffer, 32, &actual_chunk_size);
    
    for (size_t i=0; i<actual_chunk_size; i++) {
        // 0 <= actual chunk size <= 32
        do_some_work(small_buffer[i]);
    }
}

This way the alignment of any chunks is not relevant; the copy function can return any number of bytes up to a maximum.

@jleni
Copy link

jleni commented Jan 4, 2020

It is a good idea but I would open another feature request for that case.
I think the current feature request is specific about having access to a direct pointer to the CBOR buffer to avoid any kind of copies or additional memory usage. I find this specific use case very important in memory-constrained embedded environments.

@thiagomacieira
Copy link
Member

Please open a new request for that, but please explain how cbor_value_copy_{text,byte}_string won't work for you.

@letmaik
Copy link

letmaik commented Jun 21, 2021

I'm also looking for direct access to a byte string pointer with definite length. I would have thought this is a common use case somehow.

@thiagomacieira
Copy link
Member

Somewhat. I did hack around that issue for my own code...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants