Challenges faced by Utility Grid computing


The ever-growing demand for computing from industry has led to excessive resource consumption, which affects the sustainability of compute and utility grids in terms of both cost and environmental impact. Because a grid spans several High Performance Computing centers in different administrative domains, the problem becomes even harder. Managing resources and scheduling computations over the grid is complex because the resources are heterogeneous, distributed, and owned by different organizations or individuals with their own policies, and they have varying load, availability, and cost models. This introduces a large number of challenges, such as a heterogeneous substrate, policy extensibility, site autonomy, online control, resource allocation, transparency, and scalability. Some of these issues are currently being addressed by toolkits such as Globus, gLite and UNICORE. For effective resource utilization, we need to address these challenges by distributing parallel applications effectively across the grid.

In most grid computing environments, a workflow specification is generated prior to run time with the aid of grid information services such as VDS (Virtual Data System) and MDS (Monitoring and Discovery Services). A workflow specification defines tasks and their data dependencies. An enactment engine then manages the execution of the workflow. An enactment engine has three major components: scheduling, data movement, and fault management. The scheduling component discovers resources and allocates tasks to suitable ones, while data movement manages the transfer of data between resources. Fault management provides mechanisms for handling faults during execution on the grid. Scheduling is one of the critical issues in workflow management.
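As a minimal sketch of what a workflow specification captures, the Python below models tasks and their data dependencies as a directed acyclic graph and derives an order that an enactment engine could follow. The task names are hypothetical, and the structure is only illustrative; it is not the format used by VDS or any particular toolkit.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks whose output it needs.
workflow = {
    "stage_in": set(),
    "preprocess": {"stage_in"},
    "simulate": {"preprocess"},
    "analyze": {"preprocess"},
    "merge": {"simulate", "analyze"},
}

def execution_waves(spec):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(spec)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are all available
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so their successors become ready
    return waves

waves = execution_waves(workflow)
for number, wave in enumerate(waves):
    print(number, wave)
```

Tasks in the same wave (here `analyze` and `simulate`) have no dependency on each other, so a scheduler is free to place them on different resources at the same time.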


Effective scheduling algorithms can have a significant impact on overall performance. Mapping tasks onto distributed resources is, in general, an NP-hard problem, so no known algorithm produces optimal solutions in polynomial time. Solutions based on an exhaustive search of resources are impractical because the overhead is very high. Scheduling decisions have to be made quickly, as many users may be competing for resources. Another challenge is that, in a heterogeneous environment, resources are never completely under the scheduler's control. When the environment changes, the scheduler has to redo its resource allocation decisions, which introduces overhead. In addition, heterogeneous resources do not execute the same task identically, so execution times differ from one resource to another. Data-intensive applications pose additional challenges, as large data sets need to be transferred to and processed on multiple systems. Finally, securing data across a networked environment is one of the biggest challenges, and it still requires a major commitment from grid computing providers.
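Because an exhaustive search is too expensive, schedulers usually rely on fast heuristics. The sketch below shows one such heuristic, a greedy minimum-completion-time rule for independent tasks on resources of different speeds. It is an illustration chosen for this post, not the algorithm of any specific grid scheduler, and the task sizes and resource speeds are made-up numbers.

```python
def greedy_schedule(task_work, resource_speed):
    """Assign each task to the resource that would finish it earliest.

    task_work: task name -> amount of work (arbitrary units)
    resource_speed: resource name -> work units processed per second
    """
    ready_at = {r: 0.0 for r in resource_speed}  # time at which each resource is free
    assignment = {}
    # Place the largest tasks first so that smaller ones can fill the gaps.
    for task in sorted(task_work, key=task_work.get, reverse=True):
        work = task_work[task]
        best = min(
            resource_speed,
            key=lambda r: ready_at[r] + work / resource_speed[r],
        )
        ready_at[best] += work / resource_speed[best]
        assignment[task] = best
    # The makespan is the time at which the last resource finishes.
    return assignment, max(ready_at.values())

# Hypothetical workload: one resource is twice as fast as the other.
tasks = {"t1": 8.0, "t2": 4.0, "t3": 4.0, "t4": 2.0}
resources = {"fast": 2.0, "slow": 1.0}
assignment, makespan = greedy_schedule(tasks, resources)
print(assignment, makespan)
```

The heuristic only looks at each task once per resource, so it is quick enough to rerun whenever load or availability changes, but it gives no guarantee that the resulting schedule is optimal.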

Paradigm solutions

An event-driven grid middleware can solve some of the issues faced by the scheduler by introducing multi-hop communication. It uses a lightweight rendezvous routing algorithm on top of a publish/subscribe system, which enables event-based communication among the systems in the heterogeneous environment and reduces the overhead of determining their current state.
A syntactic interface can be used to partition a large data set into a set of interdependent data sets that can be distributed and processed in parallel.
A pipeline parallelism paradigm can be introduced, which allows dependent steps to overlap their execution by having the output of one process chained (piped) directly into the next process as input. This can significantly reduce the overall execution time.
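
To make the pipeline idea concrete, here is a small sketch in Python that chains stages with queues, one thread per stage, so a downstream stage can start on the first item while the upstream stage is still working on later ones. The stage functions and the `run_pipeline` helper are hypothetical. On a real grid the stages would run on separate machines, and in CPython threads do not speed up CPU-bound work, so the code shows the structure of the paradigm rather than its performance.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream

def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))

def run_pipeline(items, funcs):
    """Run funcs as chained stages, one thread per stage, and collect the results."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        queues[0].put(item)  # later stages begin as soon as the first item arrives
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for thread in threads:
        thread.join()
    return results

# Hypothetical three-step job: parse, transform, format.
results = run_pipeline(["1", "2", "3"], [int, lambda x: x * x, lambda x: f"value={x}"])
print(results)
```

Each queue is first-in, first-out and each stage has a single worker, so results come out in the same order the items went in.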

Final Thoughts:
When grid computing will become ubiquitous is still an open question. Although the public sector already uses it to assist research institutes, more needs to be done before it becomes synonymous with everyday computing.


*If you find anything misleading or incorrect, please point it out.