Dathathri, Roshan and Reddy, Chandan and Ramashekar, Thejas and Bondhugula, Uday (2013) Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory. In: 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT), September 7-11, 2013, Edinburgh, Scotland, pp. 375-386.
Bandishti, Vinayaka and Pananilath, Irshad and Bondhugula, Uday (2012) Tiling stencil computations to maximize parallelism. In: SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 2012, New York.
Pouchet, Louis-Noel and Bondhugula, Uday and Bastoul, Cedric and Cohen, Albert and Ramanujam, J and Sadayappan, P and Vasilache, Nicolas (2011) Loop transformations: convexity, pruning, and optimization. In: POPL '11 Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, 2011, New York, NY, USA.
Bordawekar, Rajesh and Rao, Ravi and Bondhugula, Uday (2010) Believe it or Not! Multicore CPUs can Match GPUs for FLOP-intensive Applications. Report RC24982, IBM T.J.
Rao, Ravi and Bordawekar, Rajesh and Bondhugula, Uday (2010) Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU. Report RC25033, IBM T.J.
Ramashekar, Thejas and Bondhugula, Uday (2013) Automatic Data Allocation and Buffer Management for Multi-GPU Machines. In: ACM Transactions on Architecture and Code Optimization, 10 (4).