Shalom's Blog

Command Line Arguments Considered Harmful

March 22, 2021 | 3 minutes

I'm not even going to start with a story. Let me just give you an example. If you want to run a distributed training job with PyTorch, this is the command you have to run:

python -m torch.distributed.launch \
--nproc_per_node 8 text-classification/run_glue.py \
--model_name_or_path bert-large-uncased-whole-word-masking \
--task_name mnli \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mnli_output/

And this counts as a small set of command-line arguments for a relatively simple Python script. Command-line arguments should not be used like this. As a user, I hate that I have to constantly remember which commands run this script. I hate that whenever I need to show someone else how to run it, I have to go through the extra work of opening my shell, pressing Ctrl-R, finding the command in my history, and copying it over to send along with the instructions.

Command-line arguments stop making sense once you have more than eight of them. Nobody can remember what they all do or how to use them. When was the last time you got a command right on the first try, other than curl, git, or grep? It's been years, and I still get my docker flags wrong. So why are we expected to remember the flags for a Python script we run infrequently?

Consider how often you have to go figure out what you typed when explaining to a co-worker how to use that script. Consider what you'd do if you had to use that script on a new machine. In each case, you have to stop and go dig through the instructions.

Unless it's a command you're familiar with, command-line arguments are a hassle. And they're not even necessary: in every case where a program was written to take command-line arguments, it could have used a config file instead.
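In Python, for instance, an existing argparse parser can be pointed at a file with almost no changes, using the standard library's fromfile_prefix_chars option: an argument starting with @ is replaced by the arguments stored in that file, one per line by default. This is a minimal sketch, assuming a hypothetical script that defines a few of the flags from the command above; the file name train_args.txt is made up for illustration. Out of the box, argparse does not support comments in such files.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    # fromfile_prefix_chars="@" makes argparse expand "@path" into the
    # arguments stored in that file, one argument per line by default.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    parser.add_argument("--model_name_or_path", required=True)
    parser.add_argument("--task_name", required=True)
    parser.add_argument("--max_seq_length", type=int, default=128)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--do_train", action="store_true")
    return parser


# Hypothetical contents of train_args.txt. The --flag=value form keeps each
# argument on a single line; the backslash avoids a leading empty line,
# which argparse would otherwise read as an empty argument.
ARGS_FILE_CONTENTS = """\
--model_name_or_path=bert-large-uncased-whole-word-masking
--task_name=mnli
--max_seq_length=128
--learning_rate=2e-5
--do_train
"""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train_args.txt")
        with open(path, "w") as f:
            f.write(ARGS_FILE_CONTENTS)
        # Equivalent to running: python script.py @train_args.txt
        args = build_parser().parse_args(["@" + path])
        print(args.task_name, args.learning_rate, args.do_train)
```

With this in place, running python script.py @train_args.txt (script.py being a placeholder name) would behave like typing every flag by hand, and the file can be shared or committed like any other.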

Use Config Files

Command-line arguments and config files let a program do exactly the same thing: in both cases, the program is parameterized by user-specified input. They're identical in effect, but for the person using them there's an enormous difference. Config files are shareable, they can carry comments, they can be stored in git, they can't be lost to your bash history, and they're even easier to edit. Imagine having to configure nginx entirely through command-line arguments, or being able to run Python scripts only through python -c. You keep your nginx configs and Python scripts in files because it's simply far more convenient. The same is true for complex command-line arguments.
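To make the comparison concrete, here is roughly what the script's arguments from the training command above could look like as a commented config file, along with a loader. This is a minimal sketch, not how run_glue.py actually works: it assumes Python's standard configparser (INI format, which allows # comments), and the [training] section name and the TrainingConfig dataclass are hypothetical. The launcher's own --nproc_per_node flag is left out.

```python
import configparser
from dataclasses import dataclass

# Hypothetical config file mirroring the command above. Unlike an entry in
# your shell history, it can carry comments and be committed to git.
EXAMPLE_CONFIG = """
[training]
# Pretrained checkpoint to fine-tune
model_name_or_path = bert-large-uncased-whole-word-masking
# GLUE task to train on
task_name = mnli
do_train = true
do_eval = true
max_seq_length = 128
per_device_train_batch_size = 8
learning_rate = 2e-5
num_train_epochs = 3.0
output_dir = /tmp/mnli_output/
"""


@dataclass
class TrainingConfig:
    model_name_or_path: str
    task_name: str
    do_train: bool
    do_eval: bool
    max_seq_length: int
    per_device_train_batch_size: int
    learning_rate: float
    num_train_epochs: float
    output_dir: str


def load_config(text: str) -> TrainingConfig:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["training"]
    # configparser stores every value as a string, so convert each field
    # to the type the program expects.
    return TrainingConfig(
        model_name_or_path=section["model_name_or_path"],
        task_name=section["task_name"],
        do_train=section.getboolean("do_train"),
        do_eval=section.getboolean("do_eval"),
        max_seq_length=section.getint("max_seq_length"),
        per_device_train_batch_size=section.getint("per_device_train_batch_size"),
        learning_rate=section.getfloat("learning_rate"),
        num_train_epochs=section.getfloat("num_train_epochs"),
        output_dir=section["output_dir"],
    )


if __name__ == "__main__":
    config = load_config(EXAMPLE_CONFIG)
    print(config.task_name, config.learning_rate, config.max_seq_length)
```

Any comparable format would do here; INI is used only because Python can read it without third-party packages and it allows comments.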

Exceptions

The only time it makes sense to use command-line arguments is when the program would actually be less complicated without a config file. Simple commands, commands you use often, and commands meant to be composed together in a shell clearly should not have to take config files. But the big beefy programs, such as a server, a database, or even a small program that just happens to take ten or more parameters, should be run from a config file.
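For the big beefy case, one way to draw the line is to shrink the command line to a single argument, the path to the config file, and let the file hold everything else. A minimal sketch, assuming a JSON config; run() is a hypothetical stand-in for the program's real work, and the file name mnli.json is made up for illustration.

```python
import argparse
import json
import os
import sys
import tempfile
from typing import Any, Dict, List, Optional


def run(config: Dict[str, Any]) -> str:
    # Hypothetical stand-in for the program's real work.
    return f"training {config['model_name_or_path']} on {config['task_name']}"


def main(argv: Optional[List[str]] = None) -> str:
    # The command line stays tiny: one positional argument, the config path.
    parser = argparse.ArgumentParser(description="Run from a config file.")
    parser.add_argument("config", help="path to a JSON config file")
    args = parser.parse_args(argv)
    with open(args.config) as f:
        config = json.load(f)
    return run(config)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Normal use: python run.py mnli.json
        print(main())
    else:
        # Demo when no path is given: write a throwaway config and use it.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "mnli.json")
            with open(path, "w") as f:
                json.dump(
                    {
                        "model_name_or_path": "bert-large-uncased-whole-word-masking",
                        "task_name": "mnli",
                    },
                    f,
                )
            print(main([path]))
```

The interface you have to remember is then just "the script and a file", while the file itself carries all the parameters.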

By Shalom Yiblet
follow @syiblet for updates