Lab work 9: Count words parallel using tbb high-level tools

Authors (team): Shevchenko Ivan
Additional tasks: implemented using both tbb::parallel_pipeline and tbb::flow_graph

Prerequisites

gcc
cmake
python3
libboost-all-dev
libtbb-dev
libfmt-dev (optionally for pretty output)

Compilation

./compile.sh

Usage: ./compile.sh [options]
  Options:
    -h      --help                  Show help message.
    -O      --optimize-build        Compile with optimization before executing (default).
    -o      --no-optimize-build     Compile without optimization before executing.
    -D      --debug-build           Compile with debug options.
    -d      --no-debug-build        Compile without debug options.
    -I      --install_prefix        Installation path.
    -R      --remove-build-dirs     Remove build dirs after the install.

Installation

xargs sudo apt-get install < dependencies/apt.txt
pip install -r dependencies/requirements.txt

Installation

xargs sudo apt-get install < dependencies/apt.txt
pip install -r dependencies/requirements.txt

Usage

C++

Usage:
        countwords_par_tbb [-h|--help] <config>
Allowed options::
  -h [ --help ]         Show help message
  -c [ --conf ] arg     Path to configuration file

Python

src/evaluate.py - is used for automatic launches of words counting multiple times

usage: evaluate.py [-h] [-c CONFIG] [-b EXEC] [-o OUT] times flush

positional arguments:
  times                 times to repeat evaluation
  flush                 buffers flushing 0 - no, 1 - yes

options:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to a config file
  -b EXEC, --bin-executable EXEC
                        path to an executable
  -o OUT, --output OUT  output file for all evaluations

Important!

All optional arguments in python script have default values that are suitable for project configuration:

Default path to a config file is data/config.cfg
Default path to an executable is bin/countwords_par_tbb
Default path to a result file is data/times.csv

Python script does not work for executable compiled in DEBUG mode.

Default compilation without using any flags will result in program using tbb::flow_graph in its implementation. Adding an option -DCOUNT_WORDS_PIPELINE=ON to cmake will cause the program to use tbb::parallel_pipeline in its implementation.

The project can use the fmt library to format the result files if CMake can find the package. Otherwise, setw will be used, which can cause the wrong displaying of data, as some Unicode characters require several bytes to be represented that are not considered in setw.

Compilation via cmake will place the executable in a bin folder.

Results

I have started my program using the config data/config.cfg. Results of the execution for both modes can be accessed seen here sorted by alphabet and here sorted by number of occurrence (they are the same as previous lab, checked using diff util). Timings for launch using tbb::flow_graph are:

Total=1047504 (17.5 mins)
Finding=72363 (1.2 mins)
Reading=225172 (3.8 mins)
Writing=33221 (33.2 secs)

Timings for launch using tbb::parallel_pipeline are:

Total=1105124 (18.4 mins)
Finding=31007 (31.0 secs)
Reading=121199 (2.0 mins)
Writing=31806 (31.8 secs)

Worth to mention that the way of calculating finding and reading times has changed since last lab. Each function now calculates its execution time and add it to the global variable. Using this approach waiting time is not counted as it did in previous labs.

That is also noticeable that both implementation are faster than implementation using low-level concurrent queue. This is easily explainable, as using thread_queue we can only delegate some exact task for each thread, but using high-level tools it is up to them to dispatch task evenly across all available threads. As a result program that uses queue at some point not fully utilizes all the resources, but both flow_graph and parallel_pipeline do, resulting in faster execution.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
cmake		cmake
data		data
dependencies		dependencies
include		include
options_parser		options_parser
src		src
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
compile.sh		compile.sh
main.cpp		main.cpp
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lab work 9: Count words parallel using tbb high-level tools

Prerequisites

Compilation

Installation

Installation

Usage

C++

Python

Important!

Results

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Lab work 9: Count words parallel using tbb high-level tools

Prerequisites

Compilation

Installation

Installation

Usage

C++

Python

Important!

Results

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages