Many of you must have heard of the Julia programming language by now, perhaps from me or @rvezy or someone else, but may be wondering what all the fuss is about. There is plenty of material online that explains the details better than I ever could, but I thought I would share why I think Julia is the perfect language for FSPM. My idea is that anyone who has used Julia can also share their experience below (or, if you have doubts, ask questions about it!).
Below is an elevator pitch (tldr), and then you can stay in the elevator until the 50th floor and read the rest.
tldr: It is very easy to write code in Julia and make it run as fast as your machine allows, even if you are not an experienced software developer. Also, Julia is designed for collaborative projects and for maximum code reuse with minimum hassle. These two aspects challenge the classic developer-user dualism that I believe is at the root of the sustainability crisis in scientific software, allowing for cutting-edge software that is easier and cheaper to maintain and more accessible to the average user.
Julia: A solution to the two language problem
Firstly, the Julia programming language is the first serious attempt at solving the two language problem. This is a phenomenon whereby programming in many areas (e.g., research) is split into two roles: users, who employ slow, dynamic, interactive languages (R, Python, Matlab) that are easy to learn and use for prototyping code, and developers, who use fast, compiled, static languages (C/C++, Fortran, Java) to implement platforms, algorithms, models and anything that has to scale and run fast. Although there have been attempts to speed up classic dynamic languages, there is still a large performance gap between languages like Matlab, Python or R and the potential performance of a really fast language (Julia Micro-Benchmarks).
The consequence is that, for applications where significant engineering and outsourcing to compiled libraries is possible, the user will just interact with an API in Python or R without sacrificing too much (good examples would be deep learning with Keras or Bayesian modeling with brms). However, the moment the user needs that one extra feature that the developer did not account for, they are out of luck, as it may not even be possible to add it without significant engineering effort (good examples are anything involving non-linear or “black box” models: non-linear optimization, non-linear solvers, (partial) differential equations, ray tracing).
Julia is a fast, dynamic, interactive language designed specifically to get rid of the two language problem. You can write your code in Julia and, with very little effort on your part, get the performance of native C code (at that point the limitation is your skill as a programmer). Not only that, Julia tends to deliver high performance with very little code that is as readable as Python code, and performance improvements follow a clean and well-documented workflow. And this applies to any code, no matter the domain or whether it is a linear or non-linear problem. This challenges the classic paradigm: coders can become more like developers with much less time investment, and that includes maintaining the code (which is what kills most software…). The blog post "My Target Audience" illustrates this nicely. This shift matters because the developer-user model may work in industry but not in academia: the cost of maintaining and improving software that requires experienced software developers keeps increasing, while the options available to users remain limited.
This developer-user dualism is very strong in the FSPM community. In some cases, the FSPM software is not even presented as an API within a scientific language (like Keras or brms that I mentioned above) but as a standalone “studio” or “platform” that is meant to be self-contained and offer everything the user requires. This can lead to workflows that are hard or impossible to reproduce (let alone automate) without significant effort from the user. Inevitably, this also means the user will miss features they need and cannot leverage other packages like one would do in R, Julia or Python. This indicates that FSPM has a strong developer-user culture and a two-language problem, hence the potential of Julia.
Julia: Decentralized and collaborative
If you use Julia you will hear about its most important feature as a language: multiple dispatch. This may sound like an obscure technical detail but it actually enables another paradigm shift in the Julia community. Rather than the old paradigm of large, monolithic packages with complex class hierarchies, Julia favors a decentralized, functional approach to programming. The classic paradigm emerged out of object-oriented programming with reference classes (as defined in C++, later Java and Python, which have been the languages used to teach computer scientists). In this old paradigm, the emphasis is on defining classes where the data structure and functionality (methods) are bound to each other and code reuse is mostly achieved by inheriting data and methods from other classes.
In Julia, the emphasis is on generic programming and interfaces: focus on functionality or on data structures, but do not marry one to the other. This leads to a horizontal rather than vertical organization of data types and methods, which enables a massive amount of code reuse. Do you need to add functionality to an existing data type? Just define a method for it without ever touching the source code. Do you want your function to apply to any data type? Just leave the type of the input undefined and Julia will compile a specialized method when you call the function (like any function in Python or R, but actually fast).
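To make this a bit more concrete, here is a minimal sketch of what that looks like in practice. The types `Leaf` and `Internode` and the functions `surface` and `total_surface` are made up for illustration (they do not come from any real FSPM package); just imagine the two structs live in someone else's code.

```julia
# Hypothetical organ types; pretend they were defined in another package.
struct Leaf
    area::Float64      # m^2
end

struct Internode
    length::Float64    # m
    radius::Float64    # m
end

# New functionality added from the outside: one function, one method per type.
surface(organ::Leaf) = organ.area
surface(organ::Internode) = 2 * pi * organ.radius * organ.length  # lateral area of a cylinder

# No type annotation on the argument: this works for any collection whose
# elements have a `surface` method, including types defined later by others.
total_surface(organs) = sum(surface, organs)

# Extending a function that belongs to Base for our type, without touching Base.
Base.show(io::IO, organ::Leaf) = print(io, "Leaf(", organ.area, " m²)")

plant = [Leaf(0.01), Leaf(0.02), Internode(0.1, 0.005)]
println(total_surface(plant))
println(Leaf(0.01))
```

Note that nothing here inherits from anything: supporting a new organ type later only requires one extra `surface` method, and `total_surface` keeps working unchanged.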
An example of this amazing feature is how you can take any function that was written to work with “normal” arrays of numbers and, if it is written in a certain style (which is often required for performance anyway), run the code on the CPU or the GPU just by changing the type of array (e.g., just replacing Array with CuArray in your code). More examples include performing calculations with physical units (we all have been bitten by that one…), uncertainty propagation, automatic differentiation, etc. In all these cases, if you write the function as generically as possible, a simple change in the type of the inputs will greatly extend what you can do without changing the source code. Try doing any of this with a library that contains 100 classes all tangled up in an inheritance tree.
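A toy version of the automatic differentiation case can be sketched with nothing but Base Julia. Everything below is an assumption for illustration: `interception` is a hypothetical Beer-Lambert style function, and `Dual` is a bare-bones forward-mode dual number that supports only the handful of operations this function needs (real packages cover far more and are much more careful).

```julia
# A generic function: no type annotations, so it accepts any number type
# that supports unary/binary minus, multiplication and exp.
interception(k, lai) = 1 - exp(-k * lai)

# Toy dual number: carries a value and a derivative through the calculation.
struct Dual <: Number
    val::Float64
    der::Float64
end

# Differentiation rules for the operations used above.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:-(a::Dual, b::Dual) = Dual(a.val - b.val, a.der - b.der)
Base.:-(a::Dual) = Dual(-a.val, -a.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.exp(a::Dual) = Dual(exp(a.val), exp(a.val) * a.der)

# Let ordinary real numbers mix with Dual (they are constants, derivative 0).
Base.convert(::Type{Dual}, x::Real) = Dual(x, 0.0)
Base.promote_rule(::Type{Dual}, ::Type{<:Real}) = Dual

# Same source code, different input types:
println(interception(0.5, 3.0))        # Float64 in, Float64 out
println(interception(0.5f0, 3.0f0))    # Float32 in, Float32 out
println(interception(big"0.5", 3))     # arbitrary precision

# Seeding der = 1 for k yields the derivative with respect to k in `der`.
d = interception(Dual(0.5, 1.0), 3.0)
println(d.val, " ", d.der)
```

The function `interception` was never modified to know about `Dual`; the new behaviour comes entirely from the type of the input, which is the same mechanism the array, units and uncertainty examples rely on.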
Finally, collaboration in Julia is not only enabled by the way code can be combined but also by the way developers interact. Every Julia package is a git repository, it has to be, there is no option. Registering a package is pretty much a matter of adding a GitHub repository address to a central registry (correct me if I am wrong here), which itself is just a GitHub repository (try to register an R package in CRAN and it will make you cry…). The entire package development process is meant for a collaborative workflow (e.g., documentation and unit testing are designed to work with continuous integration from the start) and anyone is welcome, regardless of status or institution, as long as they bring good ideas to the table. Also, most Julia packages have a vanilla MIT license, which basically means: “do whatever you want with the code and don’t bother me about it”.
If you got here, congratulations, you are either enthusiastic about Julia or you have a lot of free time. Anyhow, thanks for reading, and I hope to see your reply below.