
Transformer fields set does not match our data #25

Closed
rolanddb opened this issue Mar 8, 2017 · 11 comments

@rolanddb commented Mar 8, 2017

Hi,
I'm trying to load the atomic.events data from Snowplow into Spark.
I'd like to do this using the EventTransformer.transform() method.

We see a mismatch between the fields in our data and the fields the transform method expects, so every event is marked as a failure (unable to parse).

This is the mismatch:
Fields in our data: 128
Fields in SDK transformer: 131
In transformer but not in data: Set(derived_contexts, unstruct_event, contexts, refr_device_tstamp)
In data but not in transformer: Set(refr_dvce_tstamp)

  • Was refr_dvce_tstamp renamed?
  • Why are contexts and unstruct_event missing from our data?
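The kind of mismatch reported above can be reproduced with a quick set comparison between the SDK's field list and the columns in the data. A minimal sketch (the field names below are an illustrative subset; the real lists have 131 and 128 entries):

```scala
object FieldDiff {
  def main(args: Array[String]): Unit = {
    // Illustrative subsets of the two field lists
    val sdkFields  = Set("app_id", "contexts", "unstruct_event",
                         "derived_contexts", "refr_device_tstamp")
    val dataFields = Set("app_id", "refr_dvce_tstamp")

    // In transformer but not in data
    println(sdkFields.diff(dataFields))
    // In data but not in transformer
    println(dataFields.diff(sdkFields))
  }
}
```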

I can make it work by forking the SDK and modifying the transform method, but ideally I'd keep using the main branch.

Thanks!

@alexanderdean (Member) commented Mar 8, 2017

Hi @rolanddb - hopefully all the answers you need are in #5. It would be great if you could open a PR adding support for your version of the enriched events!

@alexanderdean (Member)

Ah @rolanddb - I read this slightly too fast. Are you trying to parse an extract from Redshift using this SDK? That's not supported - you should be pointing this at your enriched:good:archive instead.
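A minimal sketch of pointing the SDK at the enriched archive from Spark (hedged: the S3 path is a placeholder, `sc`/`sqlContext` are the Spark shell values, and the scalaz `Validation` return type of `transform` is an assumption about this SDK version):

```scala
import com.snowplowanalytics.snowplow.analytics.scalasdk.json.EventTransformer

// Read enriched (not shredded) TSV events from the enriched:good archive.
// The bucket below is a placeholder.
val lines = sc.textFile("s3://my-archive-bucket/enriched/good/")

// transform takes one enriched TSV line and is assumed to return a
// scalaz Validation: a JSON string on success, error messages on failure
val jsons = lines
  .map(line => EventTransformer.transform(line))
  .filter(_.isSuccess)
  .flatMap(_.toOption)

val events = sqlContext.read.json(jsons)
```

Events that fail validation can be inspected via the failure side of the Validation instead of being silently dropped.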

@rolanddb (Author) commented Mar 8, 2017

Ah! So it wasn't me :)
I can make a PR tomorrow, it's a trivial change.
Thanks.

@alexanderdean (Member)

See follow-up comment! I misunderstood your situation I think...

@rolanddb (Author) commented Mar 8, 2017

@alexanderdean I'm loading the events from S3, main/shredded/good.
Should I be using 'enriched' instead?

@alexanderdean (Member)

Exactly, yes.

@rolanddb (Author) commented Mar 8, 2017

Ok, I'll give that a try tomorrow.

@alexanderdean (Member)

Re-opening as @rolanddb seems to be having an ongoing issue here.

@chuwy (Contributor) commented May 15, 2017

Hello @rolanddb,

We're about to publish Scala SDK 0.2.0, but we still haven't managed to identify any case where the transformer fails to match enriched TSV data.

I see only one possible cause: you may have tried to load events enriched with a pre-R73 Snowplow (R73 was released in December 2015), which produced more columns than it produces now. If I'm wrong here, would it be possible to provide some details (error message, sample TSV line) of what breaks the transformer?

@rolanddb (Author)

Hi @chuwy,
There was some confusion about the difference between enriched and shredded data; that explains most of the mismatch between our dataset and the fields defined in the transformer.

@alexanderdean (Member)

Okay great, closing and descheduling...

@alexanderdean alexanderdean removed this from the Version 0.2.0 milestone May 16, 2017
3 participants