package avro
import "github.com/apache/arrow-go/v18/arrow/avro"
Package avro reads Avro OCF files and presents the extracted data as records
Index ¶
- Variables
- func ArrowSchemaFromAvro(schema avro.Schema) (s *arrow.Schema, err error)
- type OCFReader
- func NewOCFReader(r io.Reader, opts ...Option) (*OCFReader, error)
- func (r *OCFReader) AvroSchema() string
- func (r *OCFReader) Close()
- func (r *OCFReader) Err() error
- func (r *OCFReader) Metrics() string
- func (r *OCFReader) Next() bool
- func (r *OCFReader) OCFRecordsReadCount() int64
- func (r *OCFReader) Record() arrow.Record
- func (r *OCFReader) Release()
- func (r *OCFReader) Retain()
- func (rr *OCFReader) Reuse(r io.Reader, opts ...Option) error
- func (r *OCFReader) Schema() *arrow.Schema
- type Option
Variables ¶
Functions ¶
func ArrowSchemaFromAvro ¶
ArrowSchemaFromAvro returns a new Arrow schema from an Avro schema
Types ¶
type OCFReader ¶
type OCFReader struct {
// contains filtered or unexported fields
}
Reader wraps goavro/OCFReader and creates array.Records from a schema.
func NewOCFReader ¶
NewReader returns a reader that reads from an Avro OCF file and creates arrow.Records from the converted avro data.
func (*OCFReader) AvroSchema ¶
AvroSchema returns the Avro schema of the Avro OCF
func (*OCFReader) Close ¶
func (r *OCFReader) Close()
Close closes the OCFReader's Avro record read cache and converted Arrow record cache. OCFReader must be closed if the Avro OCF's records have not been read to completion.
func (*OCFReader) Err ¶
Err returns the last error encountered during the iteration over the underlying Avro file.
func (*OCFReader) Metrics ¶
Metrics returns the maximum queue depth of the Avro record read cache and of the converted Arrow record cache.
func (*OCFReader) Next ¶
Next returns whether a Record can be received from the converted record queue. The user should check Err() after call to Next that return false to check if an error took place.
func (*OCFReader) OCFRecordsReadCount ¶
OCFRecordsReadCount returns the number of Avro datum that were read from the Avro file.
func (*OCFReader) Record ¶
Record returns the current record that has been extracted from the underlying Avro OCF file. It is valid until the next call to Next.
func (*OCFReader) Release ¶
func (r *OCFReader) Release()
Release decreases the reference count by 1. When the reference count goes to zero, the memory is freed. Release may be called simultaneously from multiple goroutines.
func (*OCFReader) Retain ¶
func (r *OCFReader) Retain()
Retain increases the reference count by 1. Retain may be called simultaneously from multiple goroutines.
func (*OCFReader) Reuse ¶
Reuse allows the OCFReader to be reused to read another Avro file provided the new Avro file has an identical schema.
func (*OCFReader) Schema ¶
Schema returns the converted Arrow schema of the Avro OCF
type Option ¶
type Option func(config)
Option configures an Avro reader/writer.
func WithAllocator ¶
WithAllocator specifies the Arrow memory allocator used while building records.
func WithChunk ¶
WithChunk specifies the chunk size used while reading Avro OCF files.
If n is zero or 1, no chunking will take place and the reader will create one record per row. If n is greater than 1, chunks of n rows will be read. If n is negative, the reader will load the whole Avro OCF file into memory and create one big record with all the rows.
func WithReadCacheSize ¶
WithReadCacheSize specifies the size of the OCF record decode queue, default value is 500.
func WithRecordCacheSize ¶
WithRecordCacheSize specifies the size of the converted Arrow record queue, default value is 1.
func WithSchemaEdit ¶
WithSchemaEdit specifies modifications to the Avro schema. Supported methods are 'set' and 'delete'. Set sets the value for the specified path. Delete deletes the value for the specified path. A path is in dot syntax, such as "fields.1" or "fields.0.type". The modified Avro schema is validated before conversion to Arrow schema - NewOCFReader will return an error if the modified schema cannot be parsed.
Source Files ¶
loader.go reader.go reader_types.go schema.go
Directories ¶
| Path | Synopsis |
|---|---|
| arrow/avro/avro2parquet |
- Version
- v18.3.0 (latest)
- Published
- May 6, 2025
- Platform
- linux/amd64
- Imports
- 21 packages
- Last checked
- 10 months ago –
Tools for package owners.