Jonathan Dursi


Hadoop For HPCers

My colleague Mike Nolta and I have put together a half-day tutorial on Hadoop, briefly covering HDFS, MapReduce, Pig, and Spark, for an HPC audience, and put the materials on GitHub. The Hadoop ecosystem of tools continues to grow rapidly, and now includes tools like Spark and Flink that are very good for iterative numerical computation, whether simulation or data analysis. These tools, and the underlying technologies, are (or should be) of real interest to the HPC community, but most materials...


Scalable Data Analysis in R

R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it’s not clear what to do next. I’ve put together material for a day-long tutorial on scalable data analysis in R. It covers: A brief introduction to R for those coming from a Python background; The bigmemory package for out-of-core computation on large data matrices, with a simple physical sciences example; The standard parallel package, including what were the snow and multicore facilities, using airline...


Present and Future Computing, Data, and Networks Committee of the Canadian Astronomical Society (CASCA)

This document is a whitepaper I wrote for the CASCA Computing and Data committee outlining the computing needs for the Canadian astronomy community for the coming several years. It does a fairly decent job of laying out the diverse range of large-scale R&D computing needs for the national community.

Executive Summary

Advanced research computing resources have never been so essential to the Canadian Astronomy and Astrophysics research community. In the past few years, astronomical researchers have benefited greatly from modern large-scale computing systems; a diverse...


Stopping your program at the first NaN

If you know that somewhere in your program there lurks a catastrophic numerical bug that puts NaNs or Infs into your results, and you want to know where it first happens, the search can be a little frustrating. However, as before, the IEEE standard can help you; these illegal events (divide by zero, underflow or overflow, or invalid operations which cause NaNs) can be made to trigger exceptions, which will stop your code right at the point where they happen; then if you run your...
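As a rough illustration of the idea, here is a minimal sketch in Python. It is an assumption-laden stand-in rather than the approach from the post itself: it uses the standard library's `decimal` module, which has its own signals and traps, instead of hardware binary floating point in a compiled code. The three-step computation and the names `run_steps` and `first_failure` are hypothetical, made up for this example.

```python
from decimal import Decimal, Context, InvalidOperation, DivisionByZero, Overflow

# Quiet context: no traps enabled, so bad operations silently
# produce NaN or Infinity and the program keeps going.
quiet = Context(traps=[])

# Trapping context: the same events raise an exception
# at the operation where they occur.
trapping = Context(traps=[InvalidOperation, DivisionByZero, Overflow])


def run_steps(ctx):
    """Hypothetical three-step computation; step 2 is the buggy one (0/0)."""
    zero = Decimal(0)
    step1 = ctx.add(Decimal(1), Decimal(2))   # fine
    step2 = ctx.divide(zero, zero)            # invalid operation: 0/0
    step3 = ctx.multiply(step1, step2)        # NaN propagates into the result
    return step3


def first_failure(ctx):
    """Return the name of the exception raised, or None if nothing trapped."""
    try:
        run_steps(ctx)
    except (InvalidOperation, DivisionByZero, Overflow) as exc:
        return type(exc).__name__
    return None


quiet_result = run_steps(quiet)      # finishes, but the answer is NaN
trapped = first_failure(trapping)    # stops at the divide instead
```

With traps off, the NaN only shows up in the final result; with traps on, the exception (and its traceback, if you let it propagate) points at the divide that caused it.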


Testing Roundoff

A talk has been circulating (HT: Hacker News) from a conference celebrating 50 years of scientific computing at Stanford, in which the author, William Kahan, discusses an old and sadly disused trick for testing the numerical stability of an algorithm's implementation. The trick should work with any C99 or Fortran 2003 compiler, without changing the underlying code. It’s definitely worth having in your toolbox, so it’s worth mentioning here. We’ll consider a simple numerical problem; imagine a projectile launched from height $h...


Codes as Instruments: Community Applications and Simulation Software for the Hardware Architectures of the Next Decade

It is becoming increasingly problematic that, even as computing and data become more and more fundamental to research, and the complexity and diversity of computing technologies out there grow, getting stable funding for developing high-quality research software remains so difficult. In this whitepaper for the CASCA 2010 Long Range Plan, my colleague Falk Herwig and I lay out the case for increased funding of R&D software development by professional research software developers. We make a couple of points which I genuinely believe to be strong: First,...
