Apache ORC Support in TensorFlow IO #1372
Comments
@oliverhu any update on this?
no update recently @kvignesh1420
@oliverhu can we document the current feature in the form of a tutorial?
sure, will add that!
Reference FYKI: https://github.com/tensorflow/io/tree/master/docs/tutorials
Is HDFS supported now? Loading from an HDFS path results in a core dump: dataset = tfio.IODataset.from_orc("hdfs://xxx/yy/iris.orc", capacity=15).batch(1)
HDFS (with Kerberos) is supported by #1674.
(Creating this issue for visibility so people interested can join the discussion... )
Overview
Load Apache ORC formatted data natively into TensorFlow from any file system supported by TensorFlow, e.g. HDFS, local disk, etc.
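For anyone joining the discussion, here is a rough sketch of what loading looks like from the user side. It is built on the `tfio.IODataset.from_orc(..., capacity=15).batch(1)` call quoted in the comments. The helper name `load_orc_dataset`, its default values, and the `iris.orc` path are illustrative assumptions, not part of the library.

```python
def load_orc_dataset(path, batch_size=1, capacity=15):
    """Build a batched tf.data pipeline over a single ORC file.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Imported lazily so that defining this helper does not require
    # tensorflow-io to be installed.
    import tensorflow_io as tfio

    # Same call as in the comment thread; `path` may point at any file
    # system TensorFlow can read (local disk, hdfs://..., etc.).
    dataset = tfio.IODataset.from_orc(path, capacity=capacity)
    return dataset.batch(batch_size)


# Usage (assumes tensorflow-io is installed and the file exists):
#   for batch in load_orc_dataset("iris.orc").take(1):
#       print(batch)
```

The element structure of each batch depends on the columns in the ORC file, so it is worth printing one batch before wiring the dataset into a model.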
Motivation
We have traditionally stored our datasets in Avro, but a row-based format is becoming inefficient for big data analytics processing. We have historically used ORC as our columnar storage format (not planning to argue Parquet vs ORC here ;)).
Design Discussions
Milestones
parse_example_v2