Merged
98 changes: 4 additions & 94 deletions README.md
@@ -15,27 +15,9 @@
[![Duplicated Lines (%)](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java&metric=duplicated_lines_density)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)
[![Lines of Code](https://sonarcloud.io/api/project_badges/measure?project=com.exasol%3Aparquet-io-java&metric=ncloc)](https://sonarcloud.io/dashboard?id=com.exasol%3Aparquet-io-java)

This project provides a library that reads [Parquet](https://parquet.apache.org/) files into Java objects.

## Installation

Add this library as a dependency to your project's `pom.xml` file.

```xml
<dependencies>
<dependency>
<groupId>com.exasol</groupId>
<artifactId>parquet-io-java</artifactId>
<version>LATEST VERSION</version>
</dependency>
</dependencies>
```

Please use the latest version of the library.
## In a nutshell

## Usage

Here is a small example code showing the usage of the library.
This project provides a library that reads [Parquet](https://parquet.apache.org/) files into Java objects.

```java
final Path path = new Path("/data/parquet/part-0000.parquet");
@@ -44,85 +26,13 @@ try (final ParquetReader<Row> reader = RowParquetReader
.builder(HadoopInputFile.fromPath(path, conf)).build()) {
Row row = reader.read();
while (row != null) {
List<Object> values = row.getValues();
System.out.println(values);
System.out.println(row.getValues());
row = reader.read();
}
} catch (final IOException exception) {
//
}
```

## Data Type Mapping

The following table shows how each Parquet data type is mapped into Java data
types.

| Parquet Data Type | Parquet Logical Type | Java Data Type |
|:---------------------|:---------------------|:---------------|
| boolean | | Boolean |
| int32 | | Integer |
| int32 | date | Date |
| int32 | decimal(p, s) | BigDecimal |
| int64 | | Long |
| int64 | timestamp_millis | Timestamp |
| int64 | timestamp_micros | Timestamp |
| int64 | decimal(p, s) | BigDecimal |
| float | | Float |
| double | | Double |
| binary | | String |
| binary | utf8 | String |
| binary | decimal(p, s) | BigDecimal |
| fixed_len_byte_array | | String |
| fixed_len_byte_array | decimal(p, s) | BigDecimal |
| fixed_len_byte_array | uuid | UUID |
| int96 | | Timestamp |
| group | | Map |
| group | LIST | List |
| group | MAP | Map |
| group | REPEATED | List |

### Parquet Repeated Types

Parquet data type can repeat a single field or the group of fields. The
parquet-io-java (PIOJ) reads these data types into Java `List` type.

For example, given the following Parquet schemas:

```
message parquet_schema {
repeated binary name (UTF8);
}
```

```
message parquet_schema {
repeated group person {
required binary name (UTF8);
}
}
```

The PIOJ reads both of these Parquet types into Java list of `["John", "Jane"]`.

On the other hand, you can import a repeated group with multiple fields as a
list of maps.

```
message parquet_schema {
repeated group person {
required binary name (UTF8);
optional int32 age;
}
}
```

The PIOJ reads it into a list of person maps:

```
[ Map("name" -> "John", "age" -> 24), Map("name" -> "Jane", "age" -> 22) ]
```

For installation, usage examples, data type mapping, and the API reference, see the [User Guide](doc/user_guide.md).
## Information for Users

- [User Guide](doc/user_guide.md)
2 changes: 1 addition & 1 deletion doc/changes/changes_2.0.16.md
@@ -1,4 +1,4 @@
# Parquet for Java 2.0.16, released 2026-05-??
# Parquet for Java 2.0.16, released 2026-05-12

Code name: Migrate Scala to Java

67 changes: 67 additions & 0 deletions doc/user_guide.md
@@ -65,3 +65,70 @@ try (final RowParquetChunkReader reader = RowParquetChunkReader
}
}
```

## Data Type Mapping

The following table shows how each Parquet data type is mapped into Java data types.

| Parquet Data Type | Parquet Logical Type | Java Data Type |
|:---------------------|:---------------------|:---------------|
| boolean | | Boolean |
| int32 | | Integer |
| int32 | date | Date |
| int32 | decimal(p, s) | BigDecimal |
| int64 | | Long |
| int64 | timestamp_millis | Timestamp |
| int64 | timestamp_micros | Timestamp |
| int64 | decimal(p, s) | BigDecimal |
| float | | Float |
| double | | Double |
| binary | | String |
| binary | utf8 | String |
| binary | decimal(p, s) | BigDecimal |
| fixed_len_byte_array | | String |
| fixed_len_byte_array | decimal(p, s) | BigDecimal |
| fixed_len_byte_array | uuid | UUID |
| int96 | | Timestamp |
| group | | Map |
| group | LIST | List |
| group | MAP | Map |
| group | REPEATED | List |

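For the rows with a logical type, the Java value is derived from the stored physical value as described in the Parquet format specification. `date` counts days since 1970-01-01, `timestamp_millis` and `timestamp_micros` count units since the Unix epoch, and `decimal(p, s)` stores the unscaled integer. For `binary` and `fixed_len_byte_array`, that integer is encoded as big-endian two's-complement bytes. `uuid` is 16 big-endian bytes. The legacy `int96` timestamp layout, as used by Impala, Hive, and Spark, packs 8 bytes of nanoseconds-of-day followed by a 4-byte Julian day number, both little-endian. The sketch below shows these conversions using only the JDK. It is not Parquet IO Java's implementation, and the class and method names (`LogicalTypeSketch`, `decodeDate`, and so on) are hypothetical. It returns `java.time` values to keep the arithmetic visible, whereas the table above lists `Date` and `Timestamp`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.time.LocalDate;
import java.util.UUID;

// Hypothetical, JDK-only illustration of Parquet logical-type decoding.
// Not the Parquet IO Java implementation; return types are java.time for clarity.
final class LogicalTypeSketch {
    // Julian day number of 1970-01-01, used by legacy int96 timestamps.
    static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L;
    static final long SECONDS_PER_DAY = 86_400L;

    // int32 + date: days since the Unix epoch.
    static LocalDate decodeDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    // int64 + timestamp_millis: milliseconds since the Unix epoch.
    static Instant decodeTimestampMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    // int64 + timestamp_micros: microseconds since the Unix epoch.
    // floorDiv/floorMod keep negative (pre-1970) values correct.
    static Instant decodeTimestampMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    // int32/int64 + decimal(p, s): the stored integer is the unscaled value.
    static BigDecimal decodeDecimal(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    // binary/fixed_len_byte_array + decimal(p, s): big-endian two's-complement unscaled value.
    static BigDecimal decodeDecimal(byte[] unscaledBytes, int scale) {
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }

    // fixed_len_byte_array(16) + uuid: most significant byte first.
    static UUID decodeUuid(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("uuid needs 16 bytes, got " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    // int96: 8 bytes nanoseconds-of-day, then 4 bytes Julian day, both little-endian.
    static Instant decodeInt96(byte[] bytes) {
        if (bytes.length != 12) {
            throw new IllegalArgumentException("int96 needs 12 bytes, got " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buffer.getLong();
        long julianDay = buffer.getInt();
        long epochDay = julianDay - JULIAN_DAY_OF_UNIX_EPOCH;
        // ofEpochSecond normalizes a nanosecond adjustment larger than one second.
        return Instant.ofEpochSecond(epochDay * SECONDS_PER_DAY, nanosOfDay);
    }
}
```

For example, `decodeDate(19723)` returns `2024-01-01`, and `decodeDecimal(12345L, 2)` returns `123.45`.
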
## Parquet Repeated Types

A Parquet field, either a single primitive field or a group of fields, can be declared `repeated`. Parquet IO Java reads these repeated fields into Java `List` values.

For example, given the following Parquet schemas:

```
message parquet_schema {
repeated binary name (UTF8);
}
```

```
message parquet_schema {
repeated group person {
required binary name (UTF8);
}
}
```

Parquet IO Java reads both of these Parquet types into a Java list such as `["John", "Jane"]`.

A repeated group with multiple fields, on the other hand, is read into a list of maps, with one map per group entry.

```
message parquet_schema {
repeated group person {
required binary name (UTF8);
optional int32 age;
}
}
```

Parquet IO Java reads it into a list of person maps:

```
[ Map("name" -> "John", "age" -> 24), Map("name" -> "Jane", "age" -> 22) ]
```
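The shaping rule above can be sketched in plain Java. A repeated primitive field already yields one value per occurrence. For a repeated group, a group with a single field is unwrapped to that field's values, while a group with several fields keeps one map per occurrence. This is a minimal illustration that assumes each group occurrence has already been decoded into a `Map<String, Object>` keyed by field name. The class and method names are hypothetical and do not describe Parquet IO Java's internal reader. Storing `null` for an absent optional field is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the list-shaping rule for repeated groups; not the library's code.
final class RepeatedGroupSketch {
    // groupFields: field names of the repeated group, in schema order.
    // entries: one decoded map per occurrence of the group (an assumption of this sketch).
    static List<Object> toJavaList(List<String> groupFields, List<Map<String, Object>> entries) {
        List<Object> result = new ArrayList<>(entries.size());
        if (groupFields.size() == 1) {
            // Single-field group: unwrap each occurrence to the bare value, e.g. "John".
            String onlyField = groupFields.get(0);
            for (Map<String, Object> entry : entries) {
                result.add(entry.get(onlyField));
            }
        } else {
            // Multi-field group: one map per occurrence, keys kept in schema order.
            for (Map<String, Object> entry : entries) {
                Map<String, Object> ordered = new LinkedHashMap<>();
                for (String field : groupFields) {
                    // An absent optional field is stored as null here (assumption).
                    ordered.put(field, entry.get(field));
                }
                result.add(ordered);
            }
        }
        return result;
    }
}
```

With the `person` schema above, `toJavaList(List.of("name", "age"), entries)` returns a list whose first element prints as `{name=John, age=24}`.
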