Today’s computing environments are becoming increasingly complex, combining multi-core processors, reconfigurable hardware (e.g., FPGAs), digital signal processors, and graphics processing units (GPUs). Developing efficient software that works across these different architectures poses many challenges for the programmer. An application may exhibit a range of workload behaviours, from compute intensive (e.g., numerical methods, financial modelling, and iterative methods) to control intensive (e.g., searching, sorting, and parsing) to data intensive (e.g., simulation and modelling, data mining, and image processing), and in each case the overall throughput of the application depends heavily on the computational efficiency of the underlying hardware.