diff --git a/README.md b/README.md
index d076703..dd6e62b 100644
--- a/README.md
+++ b/README.md
@@ -15,27 +15,9 @@
 [![Duplicated Lines (%)](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java&metric=duplicated_lines_density)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)
 [![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java&metric=ncloc)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)
 
-This project provides a library that reads [Parquet](https://parquet.apache.org/) files into Java objects.
-
-## Installation
-
-Add this library as a dependency to your project's `pom.xml` file.
-
-```xml
-<dependencies>
-    <dependency>
-        <groupId>com.exasol</groupId>
-        <artifactId>parquet-io-java</artifactId>
-        <version>LATEST VERSION</version>
-    </dependency>
-</dependencies>
-```
-
-Please use the latest version of the library.
+## In a nutshell
 
-## Usage
-
-Here is a small example code showing the usage of the library.
+This project provides a library that reads [Parquet](https://parquet.apache.org/) files into Java objects.
 
 ```java
 final Path path = new Path("/data/parquet/part-0000.parquet");
@@ -44,85 +26,13 @@ try (final ParquetReader<Row> reader = RowParquetReader
         .builder(HadoopInputFile.fromPath(path, conf)).build()) {
     Row row = reader.read();
     while (row != null) {
-        List<Object> values = row.getValues();
-        System.out.println(values);
+        System.out.println(row.getValues());
         row = reader.read();
     }
-} catch (final IOException exception) {
-    //
-}
-```
-
-## Data Type Mapping
-
-The following table shows how each Parquet data type is mapped into Java data
-types.
-
-| Parquet Data Type    | Parquet Logical Type | Java Data Type |
-|:---------------------|:---------------------|:---------------|
-| boolean              |                      | Boolean        |
-| int32                |                      | Integer        |
-| int32                | date                 | Date           |
-| int32                | decimal(p, s)        | BigDecimal     |
-| int64                |                      | Long           |
-| int64                | timestamp_millis     | Timestamp      |
-| int64                | timestamp_micros     | Timestamp      |
-| int64                | decimal(p, s)        | BigDecimal     |
-| float                |                      | Float          |
-| double               |                      | Double         |
-| binary               |                      | String         |
-| binary               | utf8                 | String         |
-| binary               | decimal(p, s)        | BigDecimal     |
-| fixed_len_byte_array |                      | String         |
-| fixed_len_byte_array | decimal(p, s)        | BigDecimal     |
-| fixed_len_byte_array | uuid                 | UUID           |
-| int96                |                      | Timestamp      |
-| group                |                      | Map            |
-| group                | LIST                 | List           |
-| group                | MAP                  | Map            |
-| group                | REPEATED             | List           |
-
-### Parquet Repeated Types
-
-Parquet data type can repeat a single field or the group of fields. The
-parquet-io-java (PIOJ) reads these data types into Java `List` type.
-
-For example, given the following Parquet schemas:
-
-```
-message parquet_schema {
-  repeated binary name (UTF8);
-}
-```
-
-```
-message parquet_schema {
-  repeated group person {
-    required binary name (UTF8);
-  }
-}
-```
-
-The PIOJ reads both of these Parquet types into Java list of `["John", "Jane"]`.
-
-On the other hand, you can import a repeated group with multiple fields as a
-list of maps.
-
-```
-message parquet_schema {
-  repeated group person {
-    required binary name (UTF8);
-    optional int32 age;
-  }
 }
 ```
 
-The PIOJ reads it into a list of person maps:
-
-```
-[ Map("name" -> "John", "age" -> 24), Map("name" -> "Jane", "age" -> 22) ]
-```
-
+For installation, usage examples, data type mapping, and the API reference, see the [User Guide](doc/user_guide.md).
 ## Information for Users
 
 - [User Guide](doc/user_guide.md)
diff --git a/doc/changes/changes_2.0.16.md b/doc/changes/changes_2.0.16.md
index 6325535..8c7ed16 100644
--- a/doc/changes/changes_2.0.16.md
+++ b/doc/changes/changes_2.0.16.md
@@ -1,4 +1,4 @@
-# Parquet for Java 2.0.16, released 2026-05-??
+# Parquet for Java 2.0.16, released 2026-05-12
 
 Code name: Migrate Scala to Java
 
diff --git a/doc/user_guide.md b/doc/user_guide.md
index c23edf5..78c3a8c 100644
--- a/doc/user_guide.md
+++ b/doc/user_guide.md
@@ -65,3 +65,70 @@ try (final RowParquetChunkReader reader = RowParquetChunkReader
     }
 }
 ```
+
+## Data Type Mapping
+
+The following table shows how each Parquet data type is mapped into Java data types.
+
+| Parquet Data Type    | Parquet Logical Type | Java Data Type |
+|:---------------------|:---------------------|:---------------|
+| boolean              |                      | Boolean        |
+| int32                |                      | Integer        |
+| int32                | date                 | Date           |
+| int32                | decimal(p, s)        | BigDecimal     |
+| int64                |                      | Long           |
+| int64                | timestamp_millis     | Timestamp      |
+| int64                | timestamp_micros     | Timestamp      |
+| int64                | decimal(p, s)        | BigDecimal     |
+| float                |                      | Float          |
+| double               |                      | Double         |
+| binary               |                      | String         |
+| binary               | utf8                 | String         |
+| binary               | decimal(p, s)        | BigDecimal     |
+| fixed_len_byte_array |                      | String         |
+| fixed_len_byte_array | decimal(p, s)        | BigDecimal     |
+| fixed_len_byte_array | uuid                 | UUID           |
+| int96                |                      | Timestamp      |
+| group                |                      | Map            |
+| group                | LIST                 | List           |
+| group                | MAP                  | Map            |
+| group                | REPEATED             | List           |
+
+## Parquet Repeated Types
+
+A Parquet schema can mark a single field or a group of fields as `repeated`. Parquet IO Java reads these repeated types into Java `List` objects.
+
+For example, given the following Parquet schemas:
+
+```
+message parquet_schema {
+  repeated binary name (UTF8);
+}
+```
+
+```
+message parquet_schema {
+  repeated group person {
+    required binary name (UTF8);
+  }
+}
+```
+
+For data containing the names John and Jane, Parquet IO Java reads both schemas into the same Java list: `["John", "Jane"]`.
+
+A repeated group with more than one field, on the other hand, is read as a list of maps keyed by field name. For example, given the following schema:
+
+```
+message parquet_schema {
+  repeated group person {
+    required binary name (UTF8);
+    optional int32 age;
+  }
+}
+```
+
+Parquet IO Java reads it into a list of person maps:
+
+```
+[ Map("name" -> "John", "age" -> 24), Map("name" -> "Jane", "age" -> 22) ]
+```
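The following is a minimal, self-contained Java sketch of how calling code might walk the repeated values that the user guide text in this patch describes. It does not call the library: the `List` and `Map` values are built by hand, and the class `RepeatedGroupExample` and the helpers `extractNames` and `person` are hypothetical names introduced only for illustration. It assumes a repeated value arrives as a `java.util.List` whose elements are either plain values (repeated primitive or single-field group) or `Map<String, Object>` entries keyed by field name (multi-field group).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RepeatedGroupExample {

    // Hypothetical helper, not part of Parquet IO Java. Assumes the repeated
    // value is a List whose elements are either plain values (single-field
    // repeated types) or Map<String, Object> (repeated groups with several fields).
    static List<String> extractNames(final Object repeatedValue) {
        final List<String> names = new ArrayList<>();
        if (!(repeatedValue instanceof List<?>)) {
            return names; // not a repeated value, nothing to extract
        }
        for (final Object element : (List<?>) repeatedValue) {
            if (element instanceof Map<?, ?>) {
                // Multi-field group entry: look up the "name" field by key.
                final Object name = ((Map<?, ?>) element).get("name");
                if (name != null) {
                    names.add(name.toString());
                }
            } else if (element != null) {
                // Single-field repetition: the element is the value itself.
                names.add(element.toString());
            }
        }
        return names;
    }

    // Builds one hand-made person entry; LinkedHashMap keeps field order when printed.
    static Map<String, Object> person(final String name, final Integer age) {
        final Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("name", name);
        fields.put("age", age);
        return fields;
    }

    public static void main(final String[] args) {
        // Hand-built stand-ins for the described values; nothing is read from a file.
        final List<String> singleField = List.of("John", "Jane");
        final List<Map<String, Object>> multiField = List.of(person("John", 24), person("Jane", 22));
        System.out.println(extractNames(singleField)); // prints [John, Jane]
        System.out.println(extractNames(multiField));  // prints [John, Jane]
        System.out.println(multiField);                // prints [{name=John, age=24}, {name=Jane, age=22}]
    }
}
```

Handling both element shapes in one helper keeps the caller independent of whether the schema used a repeated primitive, a single-field group, or a multi-field group.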