Jonathan Dursi


Objections, Continued

Thanks for all of the comments about my HPC and MPI post, whether on the post itself, on Twitter, or via email. While many of the comments and discussions were positive, it won’t surprise you to learn that there were objections, too, so I thought I’d keep updating the Objections section in a new post. I’ve also posted one (hopefully last) followup. But do keep sending in your objections!

Further Objections

You’re saying we’d have to rewrite all our code!

If someone had suggested I...

Continue...

HPC is dying, and MPI is killing it

Pictured: The HPC community bravely holds off the incoming tide of new technologies and applications. Via the BBC.

This should be a golden age for High Performance Computing. For decades, the work of developing algorithms and implementations for tackling simulation and data analysis problems at the largest possible scales was obscure if important work. Then, suddenly, in the mid-2000s, two problems (analyzing internet-scale data, and interpreting an incoming flood of genomics data) arrived on the scene with data volumes and performance requirements which...

Continue...

Spark in HPC clusters

Over the past several years, as research computing centres and others who run HPC clusters tried to accommodate other forms of computing for data analysis, much effort went into trying to incorporate Hadoop jobs into the scheduler along with other more traditional HPC jobs. It never went especially well, which is a shame, because it seems that those past unsuccessful attempts have discouraged experimentation with related next-generation technologies which are a much better fit for large-scale technical computing. Hadoop v1 was always going to be...

Continue...

Machine Learning for Scientists

I recently taught a 1-day machine learning workshop for scientists for the good folks at SciNetHPC. There was enough interest (nearly forty people signed up for a day-long session near the end of term) that we had to book a large-ish classroom. There’s a lot of interest in the topic — which might even be surprising, given that a lot of the material is either familiar or pretty easy to digest for those who spend a lot of their time doing scientific data analysis. But...

Continue...

The Shell For Scientists

I’ve posted a half-day “The Shell for Scientists” tutorial that I’ve given variants of a number of times; the motivating problem, provided by Greg Wilson for a two-day set of tutorials at the University of Toronto, was cleaning up a bunch of auditory lab data on people’s cochlear implants. The focus is on productivity and automation; PDF slides are available here (although I really should translate them into a markdown-based format to make them more re-usable). Covered are a number of basic shell commands...

Continue...

Floating-Point Data Shouldn't Be Serialized As Text

Write data files in a binary format, unless you’re going to actually be reading the output, and you’re not going to be reading millions of data points. The reasons for using binary are threefold, in decreasing importance: accuracy, performance, and data size. Accuracy concerns may be the most obvious. When you are converting a (binary) floating point number to a string representation of the decimal number, you are inevitably going to truncate at some point. That’s ok if you are sure that when you...
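As a rough illustration of the accuracy and size points, here is a minimal Python sketch. It is not taken from the post. The value `x`, the six-significant-digit `%g` format, and the little-endian `struct` layout are all assumptions chosen for the example.

```python
import struct

# An arbitrary double whose decimal expansion is long (assumed example value).
x = 0.1 + 0.2

# Text with 6 significant digits, the default precision of printf-style "%g".
as_text = "%g" % x
from_text = float(as_text)

# Raw binary: the 8 bytes of an IEEE 754 double, little-endian.
as_bytes = struct.pack("<d", x)
from_bytes = struct.unpack("<d", as_bytes)[0]

# Text can round-trip exactly too, but it needs 17 significant digits.
full_text = "%.17g" % x

print("short text round-trips exactly:", from_text == x)
print("binary round-trips exactly:    ", from_bytes == x)
print("17-digit text round-trips:     ", float(full_text) == x)
print("bytes used: text(17 digits) =", len(full_text), " binary =", len(as_bytes))
```

The short text form silently loses the low-order bits, while the binary form gives back the identical value in a fixed 8 bytes. Writing 17 significant digits also recovers the value, at the cost of more than twice the storage and a decimal conversion in each direction.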

Continue...