Can energy usage data tell us anything about the quality of our programming languages?
Last year a team of six researchers in Portugal, drawn from three different universities, decided to investigate this question, ultimately releasing a paper titled “Energy Efficiency Across Programming Languages.” They ran the solutions to 10 programming problems written in 27 different languages, while carefully monitoring how much electricity each one used, as well as its speed and memory usage.
Specifically, they used 10 problems from the Computer Language Benchmarks Game (CLBG), a free software project for comparing performance that includes a standard set of simple algorithmic problems, as well as a framework for running tests. (It was formerly known as “The Great Computer Language Shootout.”) “This allowed us to obtain a comparable, representative, and extensive set of programs… along with the compilation/execution options, and compiler versions,” the researchers write.
It was important to run a variety of benchmark tests because ultimately their results varied depending on which test was being performed. For example, overall the C language turned out to be the fastest and also the most energy efficient. But in the benchmark test which involved scanning a DNA database for a particular genetic sequence, Rust was the most energy-efficient — while C came in third.
Yet even within that same test, the “best” language depends on what your criterion is. For that test C also turned out to be only the second fastest language (again, placing behind Rust). But Rust dropped a full nine positions if the results were sorted by memory usage. And while Fortran was the second most energy efficient language for this test, it also dropped a full six positions when the results were instead sorted by execution time.
A faster language is not always the most energy efficient.
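One way to see why the rankings diverge: energy is average power multiplied by time, so a program that finishes sooner but draws more power can still use more energy overall. The sketch below ranks three placeholder languages by time, energy, and memory. The names (`lang_a`, `lang_b`, `lang_c`) and every number are invented purely for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    seconds: float    # execution time
    watts: float      # average power draw during the run
    megabytes: float  # peak memory usage

    @property
    def joules(self) -> float:
        # Energy (J) = average power (W) x time (s).
        return self.watts * self.seconds


# Hypothetical measurements, made up for this example only.
RESULTS = [
    Result("lang_a", seconds=1.0, watts=30.0, megabytes=200.0),
    Result("lang_b", seconds=1.2, watts=20.0, megabytes=50.0),
    Result("lang_c", seconds=2.0, watts=18.0, megabytes=20.0),
]


def ranking(results, key):
    """Return language names ordered from best (lowest) to worst."""
    return [r.name for r in sorted(results, key=key)]


by_time = ranking(RESULTS, lambda r: r.seconds)
by_energy = ranking(RESULTS, lambda r: r.joules)
by_memory = ranking(RESULTS, lambda r: r.megabytes)

print("time:  ", by_time)
print("energy:", by_energy)
print("memory:", by_memory)
```

In this made-up data set the fastest entry is not the most energy efficient, and the memory ranking reverses the time ranking entirely, which is the same kind of reshuffling the researchers observed.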
The researchers note that they “strictly followed” the CLBG project’s guidelines about compiler versions and the best optimization flags. Power consumption was measured using Intel’s Running Average Power Limit (RAPL) tool, with each program executed not just once, but 10 times, “to reduce the impact of cold starts and cache effects, and to be able to analyze the measurements’ consistency and avoid outliers.” (For this reason, they report that “the measured results are quite consistent.”) For added consistency, all of the tests were run on a desktop running Linux Ubuntu Server 16.10 (kernel version 4.8.0-22-generic), with 16GB of RAM and a 3.20GHz Haswell Intel Core i5-4460 CPU.
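For readers curious what a repeated RAPL measurement might look like in practice, here is a rough sketch. It is not the researchers’ harness. It assumes a Linux machine where the powercap driver exposes the CPU package counter at `/sys/class/powercap/intel-rapl:0/energy_uj` (the path varies between systems and often requires root to read), and the helper names `read_energy_uj`, `measure_once`, and `measure_repeated` are hypothetical. For brevity it ignores counter wraparound and captures whole-package energy rather than isolating a single process.

```python
import statistics
import subprocess
from pathlib import Path

# Assumed location of the package-level RAPL energy counter (microjoules).
RAPL_ENERGY_FILE = Path("/sys/class/powercap/intel-rapl:0/energy_uj")


def read_energy_uj(path=RAPL_ENERGY_FILE):
    """Read the cumulative energy counter, in microjoules."""
    return int(Path(path).read_text().strip())


def measure_once(command, read_energy=read_energy_uj):
    """Run a command once and return the energy it used, in joules."""
    before = read_energy()
    subprocess.run(command, check=True)
    after = read_energy()
    # Simplification: assumes the counter did not wrap during the run.
    return (after - before) / 1_000_000


def measure_repeated(command, runs=10, read_energy=read_energy_uj):
    """Run a command several times; return (mean, stdev) in joules.

    Requires runs >= 2 so that a standard deviation can be computed.
    """
    samples = [measure_once(command, read_energy) for _ in range(runs)]
    return statistics.mean(samples), statistics.stdev(samples)
```

Calling something like `measure_repeated(["./benchmark"], runs=10)` (with `./benchmark` standing in for whatever program is under test) would return an average and a spread, and the spread is what lets you judge whether the measurements are consistent from run to run.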