
Sustain is a Verb: Code is Sustained or Not Sustained, not 'Sustainable'

software

(Note: This post is an excerpt from #182 of the Research Computing Teams Newsletter) So you, reader, will already understand that software is not “sustainable”. There’s no sustainability linter you can run over the code to highlight possible sustainability issues, no test suite you can run to check for sustainability regressions. Sustainability is not an inherent property of a piece of software. Same with a computing system, or a curated database, or.. Instead, these efforts are sustained, or not, by people or organizations who pay...

Gratis Offering, Risk, and our Post-ZIRP environment

strategy

(Note: This post is adapted from #170 of the Research Computing Teams Newsletter) It’s fantastic today that there’s so many free (gratis) tiers of service and packages of software, open source or otherwise, that we can use as the foundations for the computing, software, or data services we offer to our research communities. It really is! I feel that viscerally, because when I was coming of age in this community, proprietary and only barely interoperable OSes, compilers, libraries, resource managers, data platforms… were the norm,...

Buy and Lease, not Cloud vs On-Prem

strategy

(Note: This post is adapted from #130 of the Research Computing Teams Newsletter) I’d like us to move past the “cloud-vs-on-prem” debate. Right now, AWS or GCP will deliver their cloud hardware into your data centres to run there, if you want. Various commercial software can be subscribed to to manage infrastructure control. Hardware can be leased, bought, sold back. If your data centre is a co-lo, so the premises aren’t yours, is it really on premises? And… There’s a whole spectrum of options available...

The Utility vs the Professional Services Firm

strategy

As research computing and data becomes more complex and diverse, we need more professional services firms and fewer utilities (Note: This post is adapted from #127 of the Research Computing Teams Newsletter) I get to talk with a lot of research computing and data teams - software, data, and systems. Sometimes in these conversations it’s pretty clear that some teams, or the team and their funder, or a team and I, are talking a bit past each other. And that’s usually because they or we...

What I've Learned from Looking at 1,500 Jobs Leading Research Computing Teams

career

Job numbers continue to grow; lots of data and product management jobs; IR groups at Universities becoming bigger employers (Note: This post is adapted from #111 of the Research Computing Teams Newsletter) A year and a half ago I posted my observations on the first 500 jobs posted to the job board - we’re getting close to 1,500 now, and it’s worth taking a look to see what if anything has changed in research computing team leadership and management jobs. There are some trends that...

Researcher's Time Has Value, Too

strategy

..And Researchers Value Their Time (Note: This post is adapted from #102 of the Research Computing Teams Newsletter) If you followed HPC twitter in late 2021 at all, you will have seen a heartfelt thread by a well-known research software developer, one who was a key contributor to the Singularity project among others, lamenting the frankly appalling state of developer productivity in HPC - both in what tools exist, and support for them (and other tools for developers) at academic centres. A lot of people...

To Compete, Your Team Needs a Specialty

And ‘HPC’ or ‘Research Software Development’ isn’t a specialty (Note: This post is adapted from #90 of the Research Computing Teams Newsletter) Quick: what’s your team’s specialty? Your team’s specialty is its reputation for what it’s good at. Not what you think your team is good at; what matters is what specific thing your stakeholders (funders, clients, institutional decision makers) think your specialty is. What they recommend you for to peers, what they recommend funding you for to decision makers. In the post-pandemic world, researchers...

Research Computing Funding Should Mostly Just Go To Researchers

Research computing and data — supporting research efforts with software, computer and data expertise and resources — is fundamentally all of a piece. Today there’s fewer and fewer hard boundaries between where the system requirements end and where the software or data resource requirements begin; and teams supporting researchers must have expertise across the stack. This convergence is a huge opportunity for research computing, but it’s also a challenge for funders. How to know how much to allocate to software, and how much to hardware?...

Nobody Else Cares About Your Tech Stack

Focus on your researchers’ and funders’ problems, not your technical solution (Note: This post is adapted from #75 of the Research Computing Teams Newsletter) Many of us who are managing research computing and data teams come up through the ranks doing research ourselves, and have experience in grantwriting for open research calls. That can actually hold us back from succeeding with getting grants for “digital research infrastructure” — building teams and infrastructure to support research. The thing is, digital research infrastructure calls, the sort that...

When Research Infrastructure Is and Isn't Maintained

funding

(Note: This post is adapted from #53 of the Research Computing Teams Newsletter) There were two big stories in the news this week (as I write this, at the end of 2020) about what’s possible with sustained research infrastructure funding and what happens when research infrastructure isn’t sustained. In the first, you’ve probably read about AlphaFold, Google Brain’s efforts to bring deep learning to protein folding. It did very well in the 14th annual Critical Assessment of (protein) Structure Prediction (CASP) contest. Predictably but unfortunately,...

Buckle up, CPUs are going to get weirder

The M1 is a good test run, let’s get ready (Note: This post is adapted from last week’s issue 51 of the research computing teams newsletter) The big news of the past month has been Apple’s new M1 CPU. The M1’s specs in and of themselves are kind of interesting, but more important to us in research computing is that the M1 is an example of how CPUs are going to get more different as time goes on, and that will have impacts on our teams....

What will Post-Pandemic Academic Research Computing Look Like?

We’re nowhere near the endgame yet. But even now in the middle of the COVID-19 times it is not too soon to think about what research computing will look like when the threat of infection by SARS-CoV-2 no longer shapes our work lives. While the future looks good for research computing team individual contributors who are willing to learn on the fly, the coming years will be treacherous for teams as organizations, and their managers. What hath 2020 wrought There’s a few pretty unambiguous “inputs”...

Things I Learned from Looking at 500 Research Computing Manager Jobs over 10 Months

management

I write a weekly newsletter for research computing managers, team leads, or those aspiring to those roles. One of the things I’ve wanted to emphasize in the newsletter is that managing research computing teams is a profession in and of itself, and worth doing well. Part of that is emphasizing the existence of career opportunities. So since the beginning I’ve included job listings and maintained a job board, posting about 500 such jobs over the past 10 months and removing them as they become filled...

White Managers in Research Computing, We Need to be Speaking Out About Racism, then Listening and Advocating

management

Many people in our research computing community — and in the broader research community we serve — are in pain this week. There’s another video of another Black man, George Floyd, begging for his life while being murdered by a police officer in Minneapolis. Here in Toronto a Black woman, Regis Korchinski-Paquet, died when what should have been a routine call resulted in a mystifying number of police officers showing up. With only police officers present in her apartment, she went over her high-rise balcony...

COBOL, Imperial College, Bursty Maintenance, and Sustained Scientific Software

We’ve all read about the huge rise in unemployment claims causing unprecedented loads on US state software systems, with the situation so dire that the governor of New Jersey put out an urgent call for COBOL programmers. It’s worth looking at this from the point of view of research software, where we need software to be sustainable and reproducible for long periods of time. The systems that suddenly need COBOL developers have often been chugging away with maintenance and tweaks for 40–50 years. This...

How To Quickly Start One-on-Ones with your Research Computing Team: A One-Week Plan of Action

Research computing teams around the world are finding themselves working completely remotely suddenly. As a manager, you’ve gotten over the first hump and made sure everyone has the tools they need - software, VPN access, accounts on whatever chat and videoconferencing tools you’ll need. Now what? We all know that remote teams need more communication than on-site teams, so you’ll need to start communicating more. This is a perfect time to start doing one-on-ones if you haven’t been doing them already. What follows is a...

The Purpose of Research Computing is the Research, not the Computing

Absolutely everyone in research computing will agree that supporting research is their centre’s highest goal. And they’re not lying, but at many centres I’ve visited, they aren’t really correct, either. The day-to-day work in such a centre, naturally enough, is all about technical operations - keeping the computers running, updating software, making sure /scratch has enough space free, answering emails. And of course, it has to be. But without internal champions actively and continually turning the focus back to the purpose of those activities -...

Computational Science Collaborations Train Great Managers - But Trainees Might Need Help To Become Good Managers First

What I write below likely applies to fields of theoretical and observational science that involve collaborations, too. I think the experiences of trainees in laboratory science are likely significantly different, as are those of people who spent a large amount of time working in a single group in a well-defined large project. I’d certainly like to hear from colleagues from those areas; are there similarities, or are things quite different? We don’t like to talk about it much, but the natural career path in academia -...

What Should a National Research Computing Platform Be?

What is a National Research Computing Platform For in 2019? Computers are everywhere now, but computing is still hard. Canada should build on its competitive advantage by strengthening existing efforts to provide expertise, skills and training to researchers and scholars across the country, and let others provide the increasingly commodity hardware. The result will be a generation of trainees with deep research and cloud experience, and a critical mass of talent at centres focussed on building enabling technologies. As R&D becomes increasingly intertwined with computational...

A Killer Feature for Scientific Development Frameworks: An Incremental Path To Maturity

hpc

( Note: This is a bit of a work in progress; even more so than usual, comments/criticisms/additions welcome ) The Stages of Research Software Development Research software development covers a lot of ground — it’s the development of software for research, and research is a broad endeavour that covers a lot of use cases. The part of research software development that I find the most interesting is the part that is a research effort itself; the creation of new simulation methods, new data analysis techniques,...

Chapel's Home in the Landscape of New Scientific Computing Languages

I was invited to speak at this past weekend’s fourth annual Chapel Implementers and Users Workshop (CHIUW 2017). It was a great meeting, with lots of extremely high-quality talks on work being done with and on Chapel. The slides from the presentations will be up shortly, and I recommend them - the libfabric, KNL, use-after-free tracking, and GraphBLAS works were of particular interest to me. The Code Camp on the next day, working with members of the Chapel team on individual particular projects, was also a...

Compute Canadian: Building a successful and federated computational research enterprise, together

Canada is a federated nation, and this is particularly visible in areas of research funding, where both the federal and provincial orders of government play a role. In building a successful digital research infrastructure to support Canadian science and scholarship, we must recognize that reality, and rely on the successful examples of many organizations in Canada and around the world that embrace such a federated approach. In this discussion paper, my colleague Jill Kowalchuck and I lay out what we hope to be the beginnings...

Should I use Chapel or Julia for my next project?

Julia and Chapel are both newish languages aimed at productive scientific computing, with parallel computing capabilities baked in from the start. There’s lots of information about both online, but not much comparing the two. If you are starting a new scientific computing project and are willing to try something new, which should you choose? What are their strengths and weaknesses, and how do they compare? Here we walk through a comparison, focusing on distributed-memory parallelism of the sort one would want for HPC-style simulation. Both...

Beyond Single Core R: Parallel Data Analysis

I was asked recently to do a short presentation for the Greater Toronto R Users Group on parallel computing in R; my slides can be seen below or on github, where the complete materials can be found. I covered some similar things I had covered in a half-day workshop a couple of years earlier (though, obviously, without the hands-on component): How to think about parallelism and scalability in data analysis The standard parallel package, including what was the snow and multicore facilities, using airline data as...

MPI's Place in Big Computing

The organizers of EuroMPI 2016 were kind enough to invite me to give a keynote and participate in a panel at their meeting, which was held at the end of September in beautiful Edinburgh. The event was terrific, with lots of very interesting work going on in MPI implementations and with MPI. The topic of my talk was “MPI’s Place in Big Computing”; the materials from the talk can be found on github. The talk, as you might expect, included discussion of high-productivity big data...

Jupyter Notebooks for Performing and Sharing Bioinformatics Analyses

R
tutorial

I was asked to do a half-day tutorial at the Great Lakes Bioinformatics conference Workshop session. The focus was mainly on R, with some python as well. We covered: The basics of Jupyter notebooks - what they are and how they work How to install and run Jupyter notebooks on their laptop, in R and Python How to perform interactive analyses in a web browser using Jupyter Using markdown and latex to How to “Port” an R bioinformatics workflow from some scripts into a Jupyter...

Spark, Chapel, TensorFlow: Workshop at UMich

The kind folks at the University of Michigan’s Center for Computational Discovery and Engineering (MICDE), which is just part of the very impressive Advanced Research Computing division, invited me to give a workshop there a couple of months ago about the rapidly-evolving large-scale numerical computing ecosystem. There’s lots that I want to do to extend this to a half-day length, but the workshop materials — including a VM that can be used to play with Spark, Chapel and TensorFlow, along with Jupyter notebooks for each...

Approximate Mapping of Nanopore Squiggle Data with Spatial Indexing

Over at the Simpson Lab blog, I have a post describing a novel method for Directly Mapping Squiggle Data, using k-d trees to map segmented kmers; a simple proof of concept is available on github.

On Random vs. Streaming I/O Performance; Or seek(), and You Shall Find --- Eventually.

At the Simpson Lab blog, I’ve written a post on streaming vs random access I/O performance, an important topic in bioinformatics. Using a very simple problem (randomly choosing lines in a non-indexed text file) I give a quick overview of the file system stack and what it means for streaming performance, and reservoir sampling for uniform random online sampling.
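The reservoir sampling mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the standard single-pass approach (often called Algorithm R), not the code from the Simpson Lab post; the function name `reservoir_sample` and its parameters are hypothetical and chosen only for this example.

```python
import random


def reservoir_sample(lines, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length.

    The input is read once, front to back, so this works on a stream
    (for example, lines of a non-indexed text file) without any seeking.
    """
    rng = rng or random.Random()
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Item i (0-based) replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                sample[j] = line
    return sample
```

Passing an open file object as `lines` samples lines in one streaming pass, using memory proportional to `k` rather than to the file size. If the input has fewer than `k` items, all of them are returned.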

Understanding Partial Order Alignment for Multiple Sequence Alignment

Over at the Simpson Lab blog, I have an explainer on Understanding Partial Order Alignment, an under-appreciated method for multiple sequence alignment; I hope the explanation there (and explanatory implementation) is useful to those exploring graph-based approaches to alignment.

HPC+MPI on RCE Podcast

hpc
MPI

In the latest episode of the RCE podcast, Jeff Squyres, Brock Palen, and I spoke about the HPC and MPI series of blogposts and the community reaction. It was a really interesting discussion; Brock has worked closely with an enormous variety of researchers and helps run an HPC centre, while Jeff deeply understands HPC networking, from getting ones and zeros onto the wires at the lowest level of hardware up to being an extremely active member of the MPI forum. I was really pleased that...

Coarray Fortran Goes Mainstream: GCC 5.1

This past week’s release of GCC 5.1 contains at least two new features that are important to the big technical computing community: OpenMP4/OpenACC offloading to Intel Phi/NVIDIA accelerators, and compiler support for Coarray Fortran, with the communications layer provided by the OpenCoarrays Project. While I don’t want to downplay the importance or technical accomplishment of the OpenMP 4 offloading now being available, I think it’s important to highlight the widespread availability for the first time of a tried-and-tested post-MPI programming model for HPC; and one...

In Praise of MPI Collectives and MPI-IO

While I have a number of posts I want to write on other topics and technologies, there is one last followup I want to make to my MPI post. Having said what I think is wrong about MPI (the standard, not the implementations, which are of very high quality), it’s only fair to say something about what I think is very good about it. And why I like these parts gives the lie to one of the most common pro-MPI arguments I’ve been hearing for years;...

Objections, Continued

hpc
MPI

Thanks for all of the comments about my HPC and MPI post, on the post itself, or on twitter, or via email. While many of the comments and discussions were positive, it won’t surprise you to learn that there were objections, too; so I thought I’d keep updating the Objections section in a new post. I’ve also posted one (hopefully last) followup. But do keep sending in your objections! Further Objections You’re saying we’d have to rewrite all our code! If someone had suggested I...

HPC is dying, and MPI is killing it

Pictured: The HPC community bravely holds off the incoming tide of new technologies and applications. Via the BBC. This should be a golden age for High Performance Computing. For decades, the work of developing algorithms and implementations for tackling simulation and data analysis problems at the largest possible scales was obscure if important work. Then, suddenly, in the mid-2000s, two problems — analyzing internet-scale data, and interpreting an incoming flood of genomics data — arrived on the scene with data volumes and performance requirements which...

Spark in HPC clusters

hpc
spark

Over the past several years, as research computing centres and others who run HPC clusters tried to accommodate other forms of computing for data analysis, much effort went into trying to incorporate Hadoop jobs into the scheduler along with other more traditional HPC jobs. It never went especially well, which is a shame, because it seems that those past unsuccessful attempts have discouraged experimentation with related next-generation technologies which are a much better fit for large-scale technical computing. Hadoop v1 was always going to be...

Machine Learning for Scientists

I recently taught a 1-day machine learning workshop for scientists for the good folks at SciNetHPC. There was enough interest (nearly forty people signed up for a day-long session near the end of term) that we had to book a large-ish classroom. There’s a lot of interest in the topic — which might even be surprising, given that a lot of the material is either familiar or pretty easy to digest for those who spend a lot of their time doing scientific data analysis. But...

The Shell For Scientists

tutorial

I’ve posted a half-day “The Shell for Scientists” tutorial that I’ve given variants on a number of times; the motivating problem, provided by Greg Wilson for a two-day set of tutorials at the University of Toronto, was cleaning up a bunch of auditory lab data on people’s cochlear implants. The focus is on productivity and automation; PDF slides are available here (although I really should translate them into a markdown-based format to make them more re-usable). Covered are a number of basic shell commands...

Floating-Point Data Shouldn't Be Serialized As Text

Write data files in a binary format, unless you’re going to actually be reading the output - and you’re not going to be reading millions of data points. The reasons for using binary are threefold, in decreasing importance: Accuracy Performance Data size Accuracy concerns may be the most obvious. When you are converting a (binary) floating point number to a string representation of the decimal number, you are inevitably going to truncate at some point. That’s ok if you are sure that when you...
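The accuracy point can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not code from the post: the helper names `text_roundtrip` and `binary_roundtrip` are hypothetical, and six significant digits stands in for a typical fixed-precision text format.

```python
import struct


def text_roundtrip(x, digits=6):
    """Write x as decimal text with `digits` significant digits, then parse it back."""
    return float(f"{x:.{digits}g}")


def binary_roundtrip(x):
    """Pack x as an 8-byte little-endian IEEE 754 double, then unpack it."""
    return struct.unpack("<d", struct.pack("<d", x))[0]
```

For a value such as `0.1 + 0.2`, `text_roundtrip` returns a different double than it was given, because the six-digit text form is just `0.3`, while `binary_roundtrip` returns the original value bit for bit in a fixed 8 bytes. Text can be made to round-trip a double by writing 17 significant digits, at the cost of a larger file and slower parsing.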

Hadoop For HPCers

I and my colleague Mike Nolta have put together a half-day tutorial on Hadoop - briefly covering HDFS, Map Reduce, Pig, and Spark - for an HPC audience, and put the materials on github. The Hadoop ecosystem of tools continues to rapidly grow, and now includes tools like Spark and Flink that are very good for iterative numerical computation - either simulation or data analysis. These tools, and the underlying technologies, are (or should be) of real interest to the HPC community, but most materials...

Scalable Data Analysis in R

tutorial
R

R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it’s not clear what to do next. I’ve put together material for a day-long tutorial on scalable data analysis in R. It covers: A brief introduction to R for those coming from a Python background; The bigmemory package for out-of-core computation on large data matrices, with a simple physical sciences example; The standard parallel package, including what was the snow and multicore facilities, using airline...

Present and Future Computing, Data, and Networks Committee of the Canadian Astronomical Society (CASCA)

This document is a whitepaper I wrote for the CASCA Computing and Data committee outlining the computing needs for the Canadian astronomy community for the coming several years. It does a fairly decent job of laying out the diverse range of large-scale R&D computing needs for the national community. Executive Summary Advanced research computing resources have never been so essential to the Canadian Astronomy and Astrophysics research community. In the past few years, astronomical researchers have benefited greatly from modern large-scale computing systems; a diverse...

Stopping your program at the first NaN

If you know that somewhere in your program, there lurks a catastrophic numerical bug that puts NaNs or Infs into your results and you want to know where it first happens, the search can be a little frustrating. However, as before, the IEEE standard can help you; these illegal events (divide by zero, underflow or overflow, or invalid operations which cause NaNs) can be made to trigger exceptions, which will stop your code right at the point where it happens; then if you run your...

Testing Roundoff

c
ieee754

A talk has been circulating (HT: Hacker News) from a conference celebrating 50 years of scientific computing at Stanford where the author, William Kahan, discusses an old and sadly disused trick for testing the numerical stability of the implementation of an algorithm that should work with any C99 or Fortran 2003 compiler without changing the underlying code. It’s definitely a tool that’s worth having in your toolbox, so it’s worth mentioning here. We’ll consider a simple numerical problem; imagine a projectile launched from height $h...

Codes as Instruments: Community Applications and Simulation Software for the Hardware Architectures of the Next Decade

It is becoming increasingly problematic that, even as computing and data becomes more and more fundamental to research, and the complexity and diversity of computing technologies out there grows, getting stable funding for developing high-quality research software remains so difficult. In this whitepaper for the CASCA 2010 Long Range Plan, my colleague Falk Herwig and I lay out the case for increased funding of R&D software development by professional research software developers. We make a couple points which I genuinely believe to be strong: First,...

Canadian Astronomical Computing, Data And Network Facilities: A White Paper for the 2010 Long Range Plan

In this whitepaper for the CASCA 2010 Long Range Plan, I and the rest of the Computing, Data, and Network committee of CASCA lay out the state of the ecosystem for computation in support of Canadian astronomy, and suggest a path forward for the time period of the 2010-2020 long range plan. Abstract Significant investment in new large, expensive astronomical observing facilities spanning a substantial portion of the electromagnetic spectrum was a dominant theme of LRP2000 and continues to be necessary for Canadian astronomy to maintain...
