-
Notifications
You must be signed in to change notification settings - Fork 801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MapArray Requires Values Array #1642
Comments
/// [MapArray] is physically a [crate::array::ListArray] that has a
/// [crate::array::StructArray] with 2 child fields.
pub struct MapArray {
data: ArrayData,
values: ArrayRef,
value_offsets: RawPtrBox<i32>,
} |
So the basic idea is to avoid the obligation of values in the map. According to
Did I get the idea right? So it should be possible to parse something like this without any errors
|
That is my understanding, but some spelunking in the C++ or Java implementations may be warranted to confirm how they choose to handle it |
@tustvold here's Arrow's spec: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L103-L131 /// A Map is a logical nested type that is represented as
///
/// List<entries: Struct<key: K, value: V>>
///
/// In this layout, the keys and values are each respectively contiguous. We do
/// not constrain the key and value types, so the application is responsible
/// for ensuring that the keys are hashable and unique. Whether the keys are sorted
/// may be set in the metadata for this field.
///
/// In a field with Map type, the field has a child Struct field, which then
/// has two children: key type and the second the value type. The names of the
/// child fields may be respectively "entries", "key", and "value", but this is
/// not enforced.
///
/// Map
/// ```text
/// - child[0] entries: Struct
/// - child[0] key: K
/// - child[1] value: V
/// ```
/// Neither the "entries" field nor the "key" field may be nullable.
///
/// The metadata is structured so that Arrow systems without special handling
/// for Map can make Map an alias for List. The "layout" attribute for the Map
/// field must have the same contents as a List.
table Map {
/// Set to true if the keys within each value are sorted
keysSorted: bool;
} Parquet seems to allow both @frolovdev I suppose a solution is to check whether a map has both key and value, then fall back to parsing it as a list.
would then be read in as |
Bumping this in light of apache/parquet-format#469. I think @nevi-me is correct. As pointed out by @wgtmac, arrow-cpp treats a |
Describe the bug
The parquet specification states that the values for a MapArray are optional - https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps.
However, the currently logic in
ArrayReaderBuilder
requires a values array, along with the current definition ofDataType::Map
To Reproduce
Try to read a MapArray without a values array, you will receive an error
Expected behavior
I'm not actually entirely sure, the arrow specification doesn't seem to describe the semantics of Map Arrays, however, our code seems to assume that the values array is required. I'm creating this to track the fact something isn't right here, but I don't actually know what.
Additional context
Map Arrays were added by @nevi-me as part of #395
The text was updated successfully, but these errors were encountered: