-
Notifications
You must be signed in to change notification settings - Fork 35
Support reading from and writing to Arrow files #415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This requires overriding `Arrow.DictEncoding` so that an `Arrow.DictEncoded` with a `CategoricalArray` dictionary with one entry per level is created. This is the only way to ensure that indexing the Arrow column gives `CategoricalValue` objects. In practice such columns will most often be used after conversion to `CategoricalArray` via `copy`, `DataFrame`, etc. Apparently, pandas do not allow reading the resulting file if the array allows for missing values as it does not accept `missing` in the dictionary. Instead it would need missing entries to be coded via null indices, which is less efficient.
Failure on 32-bit seems due to an |
@nalimilan feel like adding some arch checks to disable testing/the extension defining any methods on 32bit? |
Done! I also bumped the minimal Julia version to 1.6 as packages fail to install on 1.0 (which is super old anyway). |
I'm not an org member so I can't approve but LGTM! |
Co-authored-by: Phillip Alday <palday@users.noreply.github.com>
Thanks! |
This requires overriding
Arrow.DictEncoding
so that anArrow.DictEncoded
with aCategoricalArray
dictionary with one entry per level is created. This is the only way to ensure that indexing the Arrow column givesCategoricalValue
objects. In practice such columns will most often be used after conversion toCategoricalArray
viacopy
,DataFrame
, etc.Apparently, pandas do not allow reading the resulting file if the array allows for missing values as it does not accept
missing
in the dictionary. Instead it would need missing entries to be coded via null indices, which is less efficient.