
ReadFromAndWriteToBuffersAndImages

Olivier Chafik edited this page Nov 10, 2022 · 3 revisions

General information on how to read and write data in OpenCL memory objects (buffers and 2D/3D images).

## Introduction

This page describes the properties and principles common to buffers and images; image-specific features are covered on a separate page: [UsingImages].

## OpenCL Memory Objects

OpenCL lets you create three types of memory objects: buffers (1D arrays), 2D images and 3D images.

These memory objects can be stored in host memory (typically RAM) or in device memory (typically graphics RAM directly on the graphics card).

### Memory allocation: host vs. device

For memory objects stored in host memory, you can either ask OpenCL to allocate the memory itself or provide a pointer to memory that you allocated (and initialized) yourself.

When using a host-allocated memory object, OpenCL may let kernels access the host data directly, or it may instead cache some of that data, for reads and/or writes. This means that if you create a direct buffer of values and create an OpenCL buffer from it in "use host pointer" mode, you cannot expect changes made to your direct buffer to be visible to the OpenCL kernels that use the OpenCL buffer. Whenever you change or read data behind a host pointer, you must call CLBuffer.map beforehand and CLBuffer.unmap afterwards.

Lastly, OpenCL may let you read from and write to an OpenCL memory object directly through the map/unmap mechanism. Mapping a memory object gives you a direct pointer to its data (for instance through a DMA mechanism). However, mapping is only guaranteed to work for host-allocated pointers, so it should be used with caution.

## Creating Memory Objects with JavaCL

### Creating Buffers

While OpenCL defines buffers as arrays of bytes, JavaCL supports typed buffers that map one-to-one to NIO buffers.

If one of your kernels takes a __global float* argument, you can either use a CLBuffer<Byte> and handle the float <-> byte conversion and offset calculations yourself, or use a CLBuffer<Float> directly, which takes care of everything.
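To give an idea of what the do-it-yourself route involves, here is a minimal sketch that uses only java.nio and no JavaCL calls. The class `FloatBytes` and its methods are hypothetical names introduced for illustration; the only facts it relies on are that a float occupies 4 bytes and that the byte order of the buffer decides how those bytes are interpreted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not part of JavaCL): the manual float <-> byte
// bookkeeping that a typed CLBuffer<Float> would otherwise hide.
final class FloatBytes {
    // Byte offset of the float stored at the given element index.
    static int byteOffset(int elementIndex) {
        return elementIndex * Float.BYTES; // 4 bytes per float
    }

    // Writes one float at an element index of a raw byte buffer.
    static void putFloat(ByteBuffer bytes, int elementIndex, float value) {
        bytes.putFloat(byteOffset(elementIndex), value);
    }

    // Reads one float back from an element index of a raw byte buffer.
    static float getFloat(ByteBuffer bytes, int elementIndex) {
        return bytes.getFloat(byteOffset(elementIndex));
    }

    public static void main(String[] args) {
        int size = 100;
        // Assumption: native order is used here; with JavaCL the order
        // would come from context.getByteOrder().
        ByteBuffer bytes = ByteBuffer.allocateDirect(size * Float.BYTES)
                                     .order(ByteOrder.nativeOrder());
        putFloat(bytes, 3, 1.5f);
        System.out.println(getFloat(bytes, 3));
    }
}
```

With a typed buffer, none of this index arithmetic appears in user code, which is the main reason to prefer it.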

#### Host-allocated pointers

<table><tr><th>JavaCL / BridJ</th><th>JavaCL / JNA</th></tr><tr><td>

```
CLContext context = ... ;

int size = 100;
Pointer<Float> ptr = Pointer.allocateFloats(size).order(context.getByteOrder());

// Last 'false' argument requires that no copy of the pointer is made:
// keep ptr as the primary data source
CLBuffer<Float> clBuffer = context.createBuffer(CLMem.Usage.InputOutput, ptr, false);
```

</td><td>

```
CLContext context = ... ;

int size = 100;
FloatBuffer ptr = ByteBuffer.allocateDirect(size * 4).order(context.getByteOrder()).asFloatBuffer();

// Last 'false' argument requires that no copy of the pointer is made:
// keep ptr as the primary data source
CLBuffer clBuffer = context.createBuffer(CLMem.Usage.InputOutput, ptr, false);
```

</td></tr></table>

####  Device-allocated pointers 

Allocate a zeroed-out buffer:

```
int size = 100;
CLBuffer buffer = context.createFloatBuffer(CLMem.Usage.InputOutput, size);
```


Allocate a buffer as a copy of existing data:

```
// Last 'true' argument asks for a copy of ptr to be made in the device memory.
// The clBuffer will have no further link to ptr.
CLBuffer clBuffer = context.createBuffer(CLMem.Usage.InputOutput, ptr, true);
```


###  Reading from / Writing to a buffer 

```
CLContext context = ... ; // e.g. JavaCL.getBestContext()
CLQueue queue = ... ;     // e.g. context.createDefaultQueue()
CLBuffer clBuffer = ... ; // e.g. context.createFloatBuffer(CLMem.Usage.InputOutput, size)
```

<table><tr><th>JavaCL / BridJ</th><th>JavaCL / JNA</th></tr><tr><td>

```
// Reading everything to a new native memory location:
Pointer<Float> data = clBuffer.read(queue);

// Reading elements 3 to 9 inclusive to a new native memory location:
data = clBuffer.read(queue, 3, 7);

// Reading elements 3 to 9 inclusive to an existing memory location:
data = Pointer.allocateFloats(7).order(context.getByteOrder());
clBuffer.read(queue, 3, 7, data, true); // 'true': blocking operation
```

</td><td>

```
// Reading everything to a new buffer:
FloatBuffer data = clBuffer.read(queue);

// Reading elements 3 to 9 inclusive to a new buffer:
data = clBuffer.read(queue, 3, 7);

// Reading elements 3 to 9 inclusive to an existing buffer:
// (note that while indirect buffers will work, direct buffers will provide much better performance)
data = NIOUtils.directFloats(7, context.getByteOrder());
clBuffer.read(queue, 3, 7, data, true); // 'true': blocking operation
```

</td></tr></table>
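The comment about direct versus indirect buffers in the JNA column can be illustrated with plain java.nio. The sketch below is not the implementation of NIOUtils.directFloats; `DirectBuffers.directFloats` is a hypothetical stand-in that shows one way such a helper could allocate a direct, correctly ordered float buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical stand-in (not the JavaCL NIOUtils class).
final class DirectBuffers {
    // Allocates a direct FloatBuffer of 'count' floats in the given byte order.
    static FloatBuffer directFloats(int count, ByteOrder order) {
        return ByteBuffer.allocateDirect(count * Float.BYTES)
                         .order(order)       // must be set before creating the view
                         .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer direct = directFloats(7, ByteOrder.nativeOrder());
        FloatBuffer heap = FloatBuffer.allocate(7); // indirect: backed by a Java array

        System.out.println("direct.isDirect() = " + direct.isDirect());
        System.out.println("heap.isDirect()   = " + heap.isDirect());
    }
}
```

A direct buffer lives outside the Java heap, so native code can access it without an intermediate copy, which is a likely reason for the performance difference mentioned above.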

###  Mapping a buffer for read / write operations 

Mapping a buffer is only guaranteed to work for buffers allocated from host pointers.

<table><tr><th>JavaCL / BridJ</th><th>JavaCL / JNA</th></tr><tr><td>

```
CLBuffer<Float> clBuffer = ... ;

try {
    Pointer<Float> data = clBuffer.map(queue, CLMem.MapFlags.Write);
    data.set(10f); // write something to the buffer
    clBuffer.unmap(queue, data);
} catch (CLException.MapFailure ex) {
    // map didn't succeed: maybe use CLBuffer.write / read instead.
}
```

</td><td>

```
CLBuffer clBuffer = ... ;

try {
    FloatBuffer data = clBuffer.map(queue, CLMem.MapFlags.Write);
    data.put(0, 10f); // write something to the buffer
    clBuffer.unmap(queue, data);
} catch (CLException.MapFailure ex) {
    // map didn't succeed: maybe use CLBuffer.write / read instead.
}
```

</td></tr></table>

##  Important Note on Endianness 

Each OpenCL device can have a different [endianness](https://en.wikipedia.org/wiki/Endianness).

This makes it very hard to share the same buffers between two devices with mismatching endianness (for instance, a little-endian Intel Core device and a big-endian Radeon 4850 GPU device on the same ATI Stream platform).
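To see why a mismatch matters, consider the same four bytes decoded under each byte order. This is a minimal sketch using only java.nio; `EndiannessDemo` is a hypothetical name and no OpenCL device is involved.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class: decodes the same raw bytes under both byte orders.
final class EndiannessDemo {
    // Interprets the first four bytes of 'raw' as an int in the given order.
    static int decode(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x00, 0x00, 0x01};
        System.out.println(decode(raw, ByteOrder.BIG_ENDIAN));    // 1
        System.out.println(decode(raw, ByteOrder.LITTLE_ENDIAN)); // 16777216
    }
}
```

Data written by one device and read by a device with the opposite order would be misinterpreted in the same way unless every value is byte-swapped in between.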

CLContext.getByteOrder() can be called to check that the context has a consistent byte order: it checks that the CLDevice.getByteOrder() methods of all its devices return the same order, and returns null if they do not.

All the CLBuffer read/map methods should return properly ordered NIO buffers (using the order of the queue's device).
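The consistency check described above can be pictured as follows. This is a sketch of the described behaviour, not JavaCL's actual implementation: `ByteOrders.commonOrder` is a hypothetical helper that takes the per-device orders as a plain list.

```java
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of JavaCL).
final class ByteOrders {
    // Returns the byte order shared by all devices, or null if the
    // devices disagree (or if the list is empty).
    static ByteOrder commonOrder(List<ByteOrder> deviceOrders) {
        ByteOrder common = null;
        for (ByteOrder order : deviceOrders) {
            if (common == null) {
                common = order;          // first device sets the reference order
            } else if (!common.equals(order)) {
                return null;             // mismatch: no consistent order
            }
        }
        return common;
    }

    public static void main(String[] args) {
        List<ByteOrder> same = Arrays.asList(ByteOrder.LITTLE_ENDIAN, ByteOrder.LITTLE_ENDIAN);
        List<ByteOrder> mixed = Arrays.asList(ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN);
        System.out.println(commonOrder(same));
        System.out.println(commonOrder(mixed));
    }
}
```

Calling code should therefore test the result for null before passing it to anything that expects a byte order.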

##  Garbage Collection vs. Manual Release 

JavaCL maps OpenCL entities (allocated by the OpenCL driver, typically in the device memory) to Java objects (managed by the JVM's garbage collector).

OpenCL entities are released when their Java object counterparts are garbage collected.

In many cases this can lead to serious issues: when the OpenCL driver runs out of memory, it does not ask Java to collect unused objects (which would release a few OpenCL entities in the process). It simply fails, and JavaCL then throws a CLException.MemObjectAllocationFailure or OutOfResources exception.

To avoid that, one can manually release an unused buffer (or any JavaCL entity) by calling CLEntity.release() (CLEntity is a base class which is inherited by CLBuffer, CLImage2D, CLProgram, CLEvent... virtually all JavaCL classes of interest).
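Since an entity should be released only once, calling code that shares a resource between several owners may want a small guard. The sketch below is an assumption-laden illustration, not a JavaCL class: `ReleaseOnce` is a hypothetical wrapper around an arbitrary release action, which with JavaCL could be a call to the entity's release() method.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper (not a JavaCL class): runs the wrapped release
// action at most once, however many times release() is called.
final class ReleaseOnce {
    private final Runnable releaseAction;
    private final AtomicBoolean released = new AtomicBoolean(false);

    ReleaseOnce(Runnable releaseAction) {
        this.releaseAction = releaseAction;
    }

    // Runs the release action on the first call only; returns true if it ran.
    boolean release() {
        if (released.compareAndSet(false, true)) {
            releaseAction.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: a real action would release an OpenCL entity.
        ReleaseOnce guard = new ReleaseOnce(() -> System.out.println("released"));
        try {
            // ... use the resource ...
        } finally {
            guard.release(); // performs the release
        }
        guard.release();     // no effect: already released
    }
}
```

Putting the release in a finally block ensures the OpenCL memory is returned even when the code using the resource throws.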

JavaCL features a workaround for allocations that fail for lack of OpenCL memory (it calls the Java garbage collector and waits a while before retrying), but it is still advisable to call release() whenever possible (but only once on each object, of course).
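The collect-wait-retry idea can be sketched in a few lines. This is not JavaCL's code: `GcRetry`, `AllocationFailure`, and the attempt count and wait time are all hypothetical, and `AllocationFailure` merely stands in for an exception such as CLException.MemObjectAllocationFailure.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a "collect garbage, wait, retry" allocation loop.
final class GcRetry {
    // Stand-in for an out-of-device-memory allocation failure.
    static final class AllocationFailure extends RuntimeException {}

    // Tries the allocation up to maxAttempts times, hinting the JVM to
    // collect garbage and pausing between failed attempts.
    static <T> T allocateWithRetry(Supplier<T> allocator, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        AllocationFailure last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return allocator.get();
            } catch (AllocationFailure ex) {
                last = ex;
                System.gc(); // only a hint: the JVM may or may not collect now
                try {
                    Thread.sleep(waitMillis); // leave time for resources to be freed
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw ex;
                }
            }
        }
        throw last; // every attempt failed
    }
}
```

Because System.gc() is only a hint, a loop like this cannot guarantee that memory is freed in time, which is why explicit release() calls remain the more reliable option.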
