ALPyNA - Automatic Loop Parallelisation in Python for Novel Architectures

ALPyNA is a loop parallelisation framework that applies classical loop parallelisation techniques as popularised by Allen and Kennedy [5]. ALPyNA generates and JIT-compiles GPU kernels from linear loop nests written in imperative Python.
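To make the idea of a parallelisable loop concrete, the following minimal sketch contrasts a loop whose iterations are independent with one that carries a dependence between iterations. It is an illustration of the classical criterion only, not ALPyNA's analysis code; the function names `scale` and `running_sum` are made up for this example, and plain Python lists stand in for arrays.

```python
def scale(a, b, order):
    """b[i] = 2 * a[i]: no iteration reads a value written by another iteration."""
    for i in order:
        b[i] = 2 * a[i]
    return b


def running_sum(a, order):
    """a[i] += a[i - 1]: iteration i reads the value written by iteration i - 1."""
    for i in order:
        if i > 0:
            a[i] = a[i] + a[i - 1]
    return a


n = 5
data = [1, 2, 3, 4, 5]
forward = list(range(n))
backward = list(reversed(forward))

# Independent iterations: any execution order gives the same result,
# so the iterations could safely run concurrently.
independent_same = scale(data, [0] * n, forward) == scale(data, [0] * n, backward)

# Loop-carried dependence: reordering the iterations changes the result,
# so the loop cannot be run in parallel as written.
dependent_same = running_sum(list(data), forward) == running_sum(list(data), backward)

print(independent_same, dependent_same)  # True False
```

Running the iterations in reverse is a cheap stand-in for "arbitrary order"; a real dependence analysis reasons about the array subscripts statically instead of executing the loop.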

ALPyNA determines the loop domain sizes for each executing instance of a loop nest by runtime introspection. An analytical cost model [2] then selects the device (CPU or GPU) for which code is generated and JIT-compiled. The cost model requires a short profiling run at installation time, before the first execution.

ALPyNA has been tested with Python v3.5.

Installation

Prerequisites

ALPyNA has been tested on Ubuntu and Debian Linux with CUDA drivers. It should also run on other systems once the following dependencies are installed:

CUDA (see the NVIDIA installation instructions)
Python virtual environment
NumPy
Numba

Install a virtual environment

Create a new virtual environment:

python3 -m venv <venv-root>/alpyna-virt-env

Switch to the virtual environment:

source <venv-root>/alpyna-virt-env/bin/activate

If required, install the astor Python module:

pip install astor

Enable Numba to locate the CUDA libraries by exporting their paths. For example, on the author's Debian installation:

export NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so 
export NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice
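Before running ALPyNA it can help to confirm that both variables point at paths that exist on your machine. The sketch below is a hypothetical helper (`check_numba_cuda_env` is not part of ALPyNA or Numba) and assumes, as in the exports above, that NUMBAPRO_NVVM names a file and NUMBAPRO_LIBDEVICE names a directory.

```python
import os


def check_numba_cuda_env(env=None):
    """Return a list of problems found with the two NUMBAPRO_* variables."""
    env = os.environ if env is None else env
    problems = []
    # NUMBAPRO_NVVM should name a file; NUMBAPRO_LIBDEVICE should name a directory.
    checks = (("NUMBAPRO_NVVM", os.path.isfile), ("NUMBAPRO_LIBDEVICE", os.path.isdir))
    for name, exists in checks:
        value = env.get(name)
        if not value:
            problems.append("{} is not set".format(name))
        elif not exists(value):
            problems.append("{} points to a missing path: {}".format(name, value))
    return problems


if __name__ == "__main__":
    # An empty result means both variables are set and their paths exist.
    for problem in check_numba_cuda_env():
        print(problem)
```

The check only inspects the filesystem; it does not verify that Numba can actually load the libraries.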

Install time profiling

The ALPyNA cost model (ACM) requires a short, one-time profiling run on each hardware setup.

Add the NVIDIA GPU hardware characteristics to the hw_parm map in src/Hardware/cuda_gpu.py and set gpu_model to the model in your hardware setup.

gpu_model = 'gtx-1060'
hw_parm = {
    'gtx-1060': {
        "sm": 9,
        "ws": 32,
        "wsched": 4
    },
    'titan-xp': {
        "sm": 30,
        "ws": 32,
        "wsched": 4
    }
}

The variables are:

Number of Streaming Multiprocessors (sm)
Warp Size (ws)
Number of warp schedulers within each Streaming Multiprocessor (wsched)
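As a rough illustration of adding a new device, the sketch below builds an entry with the same three keys and rejects obviously invalid values. The helper `add_gpu` is hypothetical and not part of ALPyNA, and the "my-gpu" entry uses placeholder values, not the specification of a real device; look up the actual figures for your GPU.

```python
# Keys that every entry in the example map provides.
REQUIRED_KEYS = ("sm", "ws", "wsched")

hw_parm = {
    "gtx-1060": {"sm": 9, "ws": 32, "wsched": 4},
    "titan-xp": {"sm": 30, "ws": 32, "wsched": 4},
}


def add_gpu(table, model, sm, ws, wsched):
    """Add one GPU entry to a hw_parm-style table after basic validation."""
    entry = {"sm": sm, "ws": ws, "wsched": wsched}
    for key in REQUIRED_KEYS:
        value = entry[key]
        if not isinstance(value, int) or value <= 0:
            raise ValueError("{} must be a positive integer, got {!r}".format(key, value))
    table[model] = entry
    return entry


# "my-gpu" and its values are placeholders for illustration only.
add_gpu(hw_parm, "my-gpu", sm=20, ws=32, wsched=4)
gpu_model = "my-gpu"
print(hw_parm[gpu_model])
```

In the real source file the entry would simply be typed into the map by hand; the validation step is only there to show which fields an entry needs.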

Bug: the GPU model is currently hardcoded in the self.hw line in the constructor of class GPU_Exec_Cost, so that line must also be changed to match your GPU. Fixing this is on the to-do list.

The hardware parameters for the CPU and the GPU are passed to the profiling tool by editing the source file Static_Profile_Setup.py:

if __name__ == '__main__':
    cpu_param = CPU_Param(800, 8192)
    gpu_param = CUDA_GPU_Param(1500, 1536, 9 * 2, 8192)
    _init_profile(cpu_param, gpu_param)

CPU_Param takes two parameters:
1. CPU single-core maximum frequency
2. Last-level cache size

CUDA_GPU_Param takes four parameters:
1. GPU single-core maximum frequency
2. Last-level cache size
3. Cache ratio: the number of L1 GPU caches that share the last-level cache within the GPU
4. Data transfer bandwidth, given in MiB/s. This was calculated offline using nvprof.
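Because both constructors are called positionally, it is easy to lose track of which number means what. The sketch below labels the values from the example call using hypothetical stand-in tuples (`CpuParamSketch` and `GpuParamSketch` are not ALPyNA classes, and the field names are invented here). The README does not state the units of the frequency and cache-size values, so none are assumed.

```python
from collections import namedtuple

# Hypothetical stand-ins that only name the positional arguments listed above;
# the real CPU_Param and CUDA_GPU_Param classes live in the ALPyNA sources.
CpuParamSketch = namedtuple("CpuParamSketch", ["max_freq", "llc_size"])
GpuParamSketch = namedtuple(
    "GpuParamSketch", ["max_freq", "llc_size", "cache_ratio", "transfer_bw_mibps"]
)

# Same values as the example call, now labelled by meaning.
cpu = CpuParamSketch(max_freq=800, llc_size=8192)
gpu = GpuParamSketch(
    max_freq=1500,
    llc_size=1536,
    cache_ratio=9 * 2,  # number of L1 caches sharing the last-level cache
    transfer_bw_mibps=8192,
)

# Positional order matches CPU_Param(800, 8192) and
# CUDA_GPU_Param(1500, 1536, 9 * 2, 8192).
print(tuple(cpu), tuple(gpu))  # (800, 8192) (1500, 1536, 18, 8192)
```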

Run the installation-time profiling with the command python3 Static_Profile_Setup.py. This generates a file, .alpyna_profile.json, which is used in all subsequent execution runs of ALPyNA.
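A quick way to confirm that the profiling step produced its output is to look for the file and parse it. This is a minimal sketch under two assumptions not confirmed here: that the file is written to the directory the command was run from, and that it contains valid JSON (as its extension suggests). The helper `load_profile` is hypothetical, and the structure of the profile is not described in this README, so the sketch does not interpret its contents.

```python
import json
import os

PROFILE_NAME = ".alpyna_profile.json"


def load_profile(directory="."):
    """Return the parsed profile from `directory`, or None if it is absent."""
    path = os.path.join(directory, PROFILE_NAME)
    if not os.path.isfile(path):
        return None
    with open(path, mode="r") as fd:
        return json.load(fd)


if __name__ == "__main__":
    profile = load_profile()
    if profile is None:
        print("No profile found; run python3 Static_Profile_Setup.py first.")
    else:
        print("Profile file found and parsed as JSON.")
```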

Execution

The static_analyse() function (in Static_Analysis_Driver.py) is called once per program invocation. It takes as parameters all the source code to be analysed for GPU/CPU code generation and JIT compilation. It statically analyses the code and generates skeletal kernels and in-memory data structures for use by ALPyNA's runtime.

The static_analyse() function returns a Python module. From this point onwards, the programmer calls the required functions by name on that module. For example, the test harness looks like the following code:

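def init_test_harness(filename):
    # Load the ALPyNA options.
    _opts = flt_util.ALPyNA_Options()
    _opts.read_config()

    # Read the source code that ALPyNA should analyse.
    with open(filename, mode="r") as fd:
        code = fd.read()

    # Analyse the source once; the result is a module exposing its functions.
    _as_mod = parloop.static_analyse(code, _opts)
    return _as_mod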

if __name__ == '__main__':
    alp_obj = init_test_harness("tests.py")
    # e.g. Create numpy arrays arg_1 and arg_2
    alp_obj.my_loopy_func(arg_1, arg_2)
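The "source text in, module out" calling pattern used by the harness can be tried without ALPyNA installed. In the sketch below, `source_to_module` is a hypothetical stand-in for static_analyse(): it performs no analysis or kernel generation and simply executes the source into a fresh module. The contents of SOURCE are an invented example of what a file such as tests.py might hold, and nested lists stand in for NumPy arrays so that the example needs only the standard library.

```python
import types

# Hypothetical contents of a file such as tests.py: one function holding a loop nest.
SOURCE = '''
def my_loopy_func(a, b):
    for i in range(len(a)):
        for j in range(len(a[i])):
            b[i][j] = 2 * a[i][j]
'''


def source_to_module(code, name="alpyna_sketch"):
    """Stand-in for static_analyse(): execute source text into a fresh module.

    No analysis or kernel generation happens here; this only mirrors the
    calling pattern of the harness.
    """
    module = types.ModuleType(name)
    exec(compile(code, "<{}>".format(name), "exec"), module.__dict__)
    return module


mod = source_to_module(SOURCE)
a = [[1, 2], [3, 4]]
b = [[0, 0], [0, 0]]
mod.my_loopy_func(a, b)  # the function is looked up by name on the returned module
print(b)  # [[2, 4], [6, 8]]
```

With ALPyNA itself, the returned module would dispatch the call to generated CPU or GPU code rather than to the original Python function.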

To do

Automate probing of hardware characteristics both for installation profile generation and for normal execution.
Add support for OpenCL GPUs.

References

1. Dejice Jacob. 2020. Opportunistic acceleration of array-centric Python computation in heterogeneous environments. PhD thesis (University of Glasgow), February 16, 2021, UK. doi: 10.5525/gla.thesis.82011.
2. Dejice Jacob, Phil Trinder, and Jeremy Singer. 2020. Pricing Python Parallelism: a Dynamic Language Cost Model for Heterogeneous Platforms. In Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages (DLS ’20), November 17, 2020, Virtual, USA. doi: 10.1145/3426422.3426979.
3. Dejice Jacob, Phil Trinder, and Jeremy Singer. 2019. Python Programmers Have GPUs too: Automatic Python Loop Parallelization with Staged Dependence Analysis. In Proceedings of the 15th ACM SIGPLAN International Symposium on Dynamic Languages (DLS ’19), October 20, 2019, Athens, Greece, 42-54. doi: 10.1145/3359619.3359743.
4. Dejice Jacob and Jeremy Singer. 2019. ALPyNA: acceleration of loops in Python for novel architectures. In Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY 2019). ACM, New York, NY, USA, 25-34. doi: 10.1145/3315454.3329956.
5. Ken Kennedy and John R. Allen. 2001. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach.
