Monday, February 3, 2020

On Cmake Quoting, Argument Splitting, and Variable Expansion

So, I recently made the embarrassing discovery that I did not understand a fundamental behavior of cmake - how cmake determines how many arguments a function call gets when you’re using variable expansion.

First, let’s start with the obvious - if you have no variable expansion, then how many arguments a function gets is determined by spaces. ie:

myFunction(arg_one arg_two arg_three)

If you want to have a single argument that contains spaces, you can use quotes:

myFunction(arg_one "arg two with spaces" "arg the third")

Now, I also knew that when expanding a variable, it could result in more than one arg:

myFunction(${could_be_more_than_one_arg})

…and, of course, I knew that variables could contain spaces:

set(var_with_spaces "this has spaces")

So, naturally, I thought that the way to prevent a var with spaces from being expanded to more than one arg was to quote it:

myFunction("${var_with_spaces}")

This is wrong!

Or, well, at least misleading. It’s true that the call above will only get invoked with a single arg. But that’s also the case if you don’t quote it - ie, this will only result in myFunction receiving one arg:

myFunction(${var_with_spaces})

So… what IS the point of quoting a variable? And in what situations will an unquoted variable expansion result in more than one arg?

The answer has to do with cmake lists - or, more precisely, any cmake var that contains semicolons (;) [1].

Any unquoted cmake arg will be treated as a list - that is, it is split into more arguments by (non-backslash-escaped) semicolons.

So, there are TWO different characters that are used to split inputs into function arguments - spaces and semicolons - and they’re used at two separate steps of argument resolution:

  • First, during the syntactical parsing step, arguments are divided up by spaces (except when they’re inside of quotes). At this stage, semicolons don’t matter.
  • At this point, we have quoted and unquoted arguments - quoted arguments are ALWAYS treated as a single argument (though they undergo variable expansion first).
  • Unquoted arguments also undergo variable expansion; however, they are then split by semicolons, and each non-empty list item [2] is fed into the function as a separate arg. Note that, at this point, spaces are NOT significant - only semicolons matter!
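
The steps above can be checked with a small script. This is only a minimal sketch: the print_args defined here is my own hypothetical stand-in (not a copy of the one in the gist), and the variable names are made up for illustration. Save it as a .cmake file and run it with cmake -P:

```cmake
# Hypothetical stand-in for print_args: reports how many arguments arrived
# and prints each one in brackets so embedded spaces are visible.
function(print_args)
  message(STATUS "ARGC: ${ARGC}")
  if(ARGC GREATER 0)
    math(EXPR last_index "${ARGC} - 1")
    foreach(i RANGE ${last_index})
      message(STATUS "  ARGV${i}: [${ARGV${i}}]")
    endforeach()
  endif()
endfunction()

set(with_spaces "foo bar")
set(with_semicolons "foo;bar")

# Quoted: always exactly one argument, whatever the contents.
print_args("${with_spaces}")      # -- ARGC: 1
print_args("${with_semicolons}")  # -- ARGC: 1

# Unquoted: expanded first, then split on semicolons only.
print_args(${with_spaces})        # -- ARGC: 1 (the space does not split)
print_args(${with_semicolons})    # -- ARGC: 2
```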

So, this results in 3 args: [3]

print_args("foo bar" stuff last)
# -- ARGC: 3

And this ALSO results in 3 args:

set(var_with_spaces "foo bar")
print_args(${var_with_spaces} stuff last)
# -- ARGC: 3

And this is the same - the quotes here are unneeded:

set(var_with_spaces "foo bar")
print_args("${var_with_spaces}" stuff last)
# -- ARGC: 3

However, this results in 4 args:

set(var_with_semicolons "foo;bar")
print_args(${var_with_semicolons} stuff last)
# -- ARGC: 4

…so quoting is needed to bring it back to 3:

print_args("${var_with_semicolons}" stuff last)
# -- ARGC: 3

Here’s something that may be confusing to some: given what I’ve just said, how many args will this translate to?

set(my_var foo bar)
print_args(${my_var})

The answer is 2, which may be confusing, since I just said that spaces inside of variables won’t result in more args. However, the culprit here is “set” - if it’s given multiple arguments, it joins them into a cmake list - which essentially just means that the spaces in this example become semicolons. So the value of my_var is actually foo;bar, NOT foo bar.
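
If you want to see this for yourself, printing the stored value makes the difference visible. A quick sketch (my_quoted_var, n_list and n_quoted are names I made up for the comparison), runnable with cmake -P:

```cmake
set(my_var foo bar)           # two arguments to set(): stored as a list
message(STATUS "[${my_var}]")          # -- [foo;bar]

set(my_quoted_var "foo bar")  # one argument to set(): stored verbatim
message(STATUS "[${my_quoted_var}]")   # -- [foo bar]

# list(LENGTH ...) counts semicolon-separated items, so it agrees.
list(LENGTH my_var n_list)
list(LENGTH my_quoted_var n_quoted)
message(STATUS "${n_list} vs ${n_quoted}")  # -- 2 vs 1
```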

Here’s one other tricky bit - how many args would you expect this to expand to?

print_args(foo;bar)

The answer, again, is 2. Why is that? Didn’t I just say that, during the first step, semicolons don’t matter?

I did… but the trick here is that the unquoted-argument processing happens, even for arguments that don’t have any variable expansion. So here, initial processing results in one unquoted arg (foo;bar). The unquoted arg then undergoes variable expansion (which in this case, does nothing, since there are no vars), and is then treated as a list, and split - resulting in two args!

If you’re confused, it can help to play with some examples yourself - here’s a useful set of test cases, and a handy print_args function:

https://gist.github.com/elrond79/ac6941b7337c607b10521a57cac85b70

I gave an exact description of the parsing process above, but you can mostly think of it like this:

  • Each quoted arg always results in exactly 1 arg - no exceptions!
  • Outside of quotes, and “outside” of variables, both spaces AND semicolons will result in arg-splitting
  • Outside of quotes, but “inside” of variables, ONLY semicolons will result in arg-splitting - spaces here are “ignored”!
  • Outside of quotes, after arg splitting, any empty items are thrown out - which may result in zero args
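
The last rule is the one that tends to bite in practice, so here is a short sketch of it. The count_args helper and the variable names are hypothetical, just enough to show the argument count; run it with cmake -P:

```cmake
# Hypothetical helper: only reports how many arguments it received.
function(count_args)
  message(STATUS "ARGC: ${ARGC}")
endfunction()

set(empty_var "")

count_args(${empty_var})      # -- ARGC: 0 (the empty item is thrown out)
count_args("${empty_var}")    # -- ARGC: 1 (one empty-string argument)
count_args(${never_defined})  # -- ARGC: 0 (an unset var expands to nothing)
count_args(a;;b)              # -- ARGC: 2 (the empty middle item is dropped)
```

So a function that expects exactly one argument can silently receive none if the variable is empty or unset and you left the quotes off.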

There is one more gotcha - if you’re defining cmake code, which is then evaluated, then this whole process of argument splitting + variable expansion happens more than once, and you may have to throw quotes in places you wouldn’t “normally” have to. ie, if you have something like this:

set(var_with_spaces "foo bar")
install(CODE "my_install_func(${var_with_spaces})" )

…then you might think that, when my_install_func is called, it will only get one arg. However, it gets two - and the reason is that, before my_install_func is in the picture, the args for install are FIRST evaluated - and its expansion will result in this:

install(CODE "my_install_func(foo bar)" )

…which means that, at install time, this is what’s run:

my_install_func(foo bar)

…NOT this:

my_install_func(${var_with_spaces})

…so it’s now clear that it will get two args!
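
One way to convince yourself of this without running an actual install is to build the same string into a variable and print it. This is just a sketch: code_string is a name I made up, and it only shows the text that install(CODE ...) would be handed, not the install step itself:

```cmake
set(var_with_spaces "foo bar")

# The quoted argument is expanded right here, at configure time, so the
# stored code no longer mentions the variable at all.
set(code_string "my_install_func(${var_with_spaces})")
message(STATUS "${code_string}")  # -- my_install_func(foo bar)
```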

Sadly, given cmake’s “everything-is-a-string” philosophy, these sorts of quoting issues come up more than in other languages. The good news, though, is that if you just follow some best practices, you generally don’t need to think about all this!

Personally, it’s been my habit to throw quotes around variables by “default”, and leave them off only in cases where argument splitting is expected or makes sense.

So, in this case, I always quote:

divide_things("${var1}" "${var2}")

…because I don’t expect this function to ever want more than two args. However, in this situation:

call_something("${var1}" ${more_args})

…I leave more_args unquoted, because its whole point is that it might expand into one or more things. Even in the case of the install code examples, if you were following these guidelines, then you’d probably have written:

install(CODE "my_install_func(\"${var_with_spaces}\")" )

…which would “just work” as expected.
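
Since install() isn’t available in a plain cmake -P script, a rough way to compare the two spellings is cmake_language(EVAL CODE ...), which also parses a string as cmake code. This is a sketch under assumptions: it needs cmake 3.18 or newer, it stands in for install-time evaluation rather than reproducing it, and the my_install_func below is a hypothetical dummy that only counts its args:

```cmake
cmake_minimum_required(VERSION 3.18)  # cmake_language(EVAL ...) needs 3.18+

# Hypothetical dummy for my_install_func: just reports its argument count.
function(my_install_func)
  message(STATUS "my_install_func got ${ARGC} arg(s)")
endfunction()

set(var_with_spaces "foo bar")

# In both calls the string is expanded first, then parsed again as code.
# Without inner quotes, the second parse sees two space-separated args:
cmake_language(EVAL CODE "my_install_func(${var_with_spaces})")
# -- my_install_func got 2 arg(s)

# With escaped inner quotes, the second parse sees one quoted arg:
cmake_language(EVAL CODE "my_install_func(\"${var_with_spaces}\")")
# -- my_install_func got 1 arg(s)
```

One caveat with the escaped-quote approach: if the variable’s value itself contains double quotes or backslashes, the generated code will need more careful escaping than this.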

So, if you just follow the general rule of “always quote unless you want/expect multiple args”, then for the most part, you don’t need to know the details!


  1. Since there really is no cmake “list” type, just strings with semicolons which are treated differently in some contexts.

  2. Note that since only “non-empty” items are passed through, this also means that an unquoted variable that expands to an empty string will be interpreted as ZERO args… a common cause for errors when a var doesn’t exist!

  3. You can find the definition of the print_args function in this gist (the same one I link to later).