
Coarse-grained graph compilation for embedded multiprocessors
PhD thesis by Merten Popp
This work addresses the challenges of mapping imaging and computer vision applications to embedded multiprocessor systems. Even though typical algorithms in this application domain offer an abundance of data parallelism, exploiting it is far from trivial because of complex interactions between different optimization techniques. Furthermore, a solution that is optimal on one hardware platform is optimal only on that platform. On a different platform it will not only show inferior performance; in the worst case, for example if local memories are smaller, it might not be feasible at all. Thus, cumbersome retargeting is required every time a novel platform emerges.

In this work, we propose a programming model that enables the programmer to specify imaging and vision applications as a graph of kernels, independent of hardware details and without prescribing a schedule, iteration order or buffer sizes in advance. The goal of this work is to provide a coarse-grained graph compiler that can map this application graph, in an automated process, to an embedded multi-core architecture with small, explicitly managed program and data memories and no caches. The proposed graph compiler optimizes the makespan of the application under these constraints. We call our tool a coarse-grained graph compiler because it schedules kernels and allocates data buffers in much the same way that a regular compiler schedules individual instructions and allocates individual registers. The output of our tool is a C program that is subsequently compiled by a regular "fine-grained" C compiler.

We perform several experiments on complex imaging applications, including the Local Laplacian filter. The results show that our graph compiler can adapt the given applications to a variety of hardware platforms by trying different methods of implementing parallelized solutions until it finds one that fits the characteristics of the given hardware platform.
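To make the idea of scheduling a kernel graph for a small makespan more concrete, here is a minimal sketch of greedy list scheduling on identical cores. It is not the algorithm used in the thesis. The kernel names, execution times and the function `list_schedule` are hypothetical and chosen only for illustration. The sketch also ignores the program and data memory constraints that the actual graph compiler has to respect.

```python
from collections import deque


def list_schedule(kernels, deps, num_cores):
    """Greedy list scheduling of a kernel DAG onto identical cores.

    kernels: dict mapping kernel name -> execution time (hypothetical units)
    deps:    dict mapping kernel name -> list of kernels that must finish first
    Returns (schedule, makespan); schedule maps name -> (core, start, end).
    """
    # Count unfinished predecessors and build successor lists.
    indegree = {k: len(deps.get(k, [])) for k in kernels}
    successors = {k: [] for k in kernels}
    for k, preds in deps.items():
        for p in preds:
            successors[p].append(k)

    ready = deque(sorted(k for k in kernels if indegree[k] == 0))
    core_free = [0] * num_cores  # time at which each core becomes idle
    schedule = {}

    while ready:
        k = ready.popleft()
        # Earliest time at which all inputs of this kernel are available.
        data_ready = max((schedule[p][2] for p in deps.get(k, [])), default=0)
        # Choose the core on which this kernel can start earliest.
        core = min(range(num_cores),
                   key=lambda c: max(core_free[c], data_ready))
        start = max(core_free[core], data_ready)
        end = start + kernels[k]
        schedule[k] = (core, start, end)
        core_free[core] = end
        # Release successors whose predecessors have all been scheduled.
        for s in successors[k]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    makespan = max(end for _, _, end in schedule.values())
    return schedule, makespan


# Hypothetical four-kernel application graph (assumed costs, not measured).
kernels = {"blur": 4, "downsample": 2, "edges": 3, "combine": 1}
deps = {
    "downsample": ["blur"],
    "edges": ["blur"],
    "combine": ["downsample", "edges"],
}

for cores in (1, 2):
    _, makespan = list_schedule(kernels, deps, cores)
    print(f"{cores} core(s): makespan = {makespan}")
```

The same graph description produces a different schedule for each core count, which is the retargeting property the abstract describes. A real graph compiler would additionally have to choose buffer sizes and iteration order so that the result fits into the local memories.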