Monday, February 3, 2020

On CMake Quoting, Argument Splitting, and Variable Expansion

So, I recently made the embarrassing discovery that I did not understand a fundamental behavior of cmake - how cmake determines how many arguments a function call gets when you’re using variable expansion.

First, let’s start with the obvious - if you have no variable expansion, then
how many arguments a function gets is determined by spaces. ie:

myFunction(arg_one arg_two arg_three)

If you want to have a single argument that contains spaces, you can use quotes:

myFunction(arg_one "arg two with spaces" "arg the third")

Now, I also knew that when expanding a variable, it could result in more than one arg:

myFunction(${could_be_more_than_one_arg})

…and, of course, I knew that variables could contain spaces:

set(var_with_spaces "this has spaces")

So, naturally, I thought that the way to prevent a var with spaces from being expanded to more than one arg was to quote it:

myFunction("${var_with_spaces}")

This is wrong!

Or, well, at least misleading. It’s true that the call above will only get invoked with a single arg. But that’s also the case if you don’t quote it - ie, this will only result in myFunction receiving one arg:

myFunction(${var_with_spaces})

So… what IS the point of quoting a variable? And in what situations will an unquoted variable expansion result in more than one arg?

The answer has to do with cmake lists - or, more precisely, any cmake var that contains semicolons (;)[1].

Any unquoted cmake arg will be treated as a list - that is, it is split into more arguments by (non-backslash-escaped) semicolons.

So, there are TWO different characters that are used to split inputs into function arguments - spaces and semicolons - and they’re used at two separate steps of argument resolution:

  • First, during the syntactical parsing step, arguments are divided up by spaces (except when they’re inside of quotes). At this stage, semicolons don’t matter.
  • At this point, we have quoted and unquoted arguments - quoted arguments are ALWAYS treated as a single argument (though they undergo variable expansion first).
  • Unquoted arguments also undergo variable expansion; however, they are then split by semicolons, and each non-empty list item[2] is fed into the function as a separate arg. Note that, at this point, spaces are NOT significant - only semicolons matter!

So, this results in 3 args[3]:

print_args("foo bar" stuff last)
# -- ARGC: 3

And this ALSO results in 3 args:

set(var_with_spaces "foo bar")
print_args(${var_with_spaces} stuff last)
# -- ARGC: 3

And this is the same - the quotes here are unneeded:

set(var_with_spaces "foo bar")
print_args("${var_with_spaces}" stuff last)
# -- ARGC: 3

However, this results in 4 args:

set(var_with_semicolons "foo;bar")
print_args(${var_with_semicolons} stuff last)
# -- ARGC: 4

…so quoting is needed to bring it back to 3:

print_args("${var_with_semicolons}" stuff last)
# -- ARGC: 3
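To see both of those rules at work at once, here's an example of my own, with a var that contains both spaces AND semicolons:

set(mixed "foo bar;baz qux")
print_args(${mixed})
# -- ARGC: 2
print_args("${mixed}")
# -- ARGC: 1

The unquoted expansion splits only at the semicolon - "foo bar" and "baz qux" each arrive as a single arg, spaces and all - while the quoted version stays one arg.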

Here’s something that may be confusing to some: given what I’ve just said, how many args will this translate to?

set(my_var foo bar)
print_args(${my_var})

The answer is 2, which may be confusing, since I just said that spaces inside of variables won't result in more args. However, the culprit here is "set" - if it's given multiple value arguments, it joins them into a cmake list - which essentially just means it converts the spaces in this example into semicolons. So the value of my_var is actually foo;bar, NOT foo bar.
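You can see that directly by just printing the variable - note the semicolon in the output:

set(my_var foo bar)
message(STATUS "my_var = ${my_var}")
# -- my_var = foo;bar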

Here’s one other tricky bit - how many args would you expect this to expand to?

print_args(foo;bar)

The answer, again, is 2. Why is that? Didn’t I just say that, during the first step, semicolons don’t matter?

I did… but the trick here is that the unquoted-argument processing happens even for arguments that don't have any variable expansion. So here, initial processing results in one unquoted arg (foo;bar). The unquoted arg then undergoes variable expansion (which in this case does nothing, since there are no vars), and is then treated as a list, and split - resulting in two args!
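Conversely, if you actually want a single arg with a literal semicolon in it, quoting it - or backslash-escaping the semicolon, as mentioned above - should both keep it as one arg:

print_args("foo;bar")
# -- ARGC: 1
print_args(foo\;bar)
# -- ARGC: 1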

If you’re confused, it can help to play with some examples yourself - here’s a useful this set of test cases, and a handy print_args function:

https://gist.github.com/elrond79/ac6941b7337c607b10521a57cac85b70
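(If you just want to follow along here, print_args can be as simple as something like this - my own sketch, not necessarily identical to the one in the gist:)

function(print_args)
    # ARGC holds the number of arguments this call received
    message(STATUS "ARGC: ${ARGC}")
endfunction()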

I gave an exact description of the parsing process above, but you can mostly think of it like this:

  • Each quoted arg always results in exactly 1 arg - no exceptions!
  • Outside of quotes, and “outside” of variables, both spaces AND semicolons will result in arg-splitting
  • Outside of quotes, but “inside” of variables, ONLY semicolons will result in arg-splitting - spaces here are “ignored”!
  • Outside of quotes, after arg splitting, any empty items are thrown out - which may result in zero args (example below)
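For instance (this is footnote 2's gotcha), an empty - or nonexistent - variable expands to nothing at all when unquoted:

set(empty_var "")
print_args(${empty_var})
# -- ARGC: 0
print_args("${empty_var}")
# -- ARGC: 1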

There is one more gotcha - if you're defining cmake code that is then evaluated later, then this whole process of argument splitting + variable expansion happens more than once, and you may have to throw quotes in places you wouldn't "normally" have to. ie, if you have something like this:

set(var_with_spaces "foo bar")
install(CODE "my_install_func(${var_with_spaces})" )

…then you might think that, when my_install_func is called, it will only get one arg. However, it gets two - and the reason is that, before my_install_func is in the picture, the args for install are FIRST evaluated - and that expansion results in this:

install(CODE "my_install_func(foo bar)" )

…which means that, at install time, this is what’s run:

my_install_func(foo bar)

…NOT this:

my_install_func(${var_with_spaces})

…so it’s now clear that it will get two args!

Sadly, given cmake’s “everything-is-a-string” philosophy, these sorts of quoting issues come up more often than in other languages. The good news, though, is that if you just follow some best practices, you generally don’t need to think about all this!

Personally, it’s been my habit to throw quotes around variables by “default”, and leave them off only in cases where argument splitting is expected or makes sense.

So, in this case, I always quote:

divide_things("${var1}" "${var2}")

…because I don’t expect this function to ever want more than two args. However, in this situation:

call_something("${var1}" ${more_args})

…I leave more_args unquoted, because its whole point is that it might expand into one or more things. Even in the case of the install(CODE) example above, if you were following these guidelines, then you’d probably have written:

install(CODE "my_install_func(\"${var_with_spaces}\")" )

…which would “just work” as expected.
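That’s because, after install evaluates its own arguments, the escaped quotes survive into the generated code - so what actually runs at install time is:

my_install_func("foo bar")

…and a quoted arg is always exactly one arg.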

So, if you just follow the general rule of “always quote unless you want/expect multiple args”, then for the most part, you don’t need to know the details!


  1. Since there really is no cmake “list” type - just strings with semicolons, which are treated differently in some contexts. ↩︎

  2. Note that since only “non-empty” items are passed through, this also means that an empty string will be interpreted as ZERO args… a common cause for errors when a var doesn’t exist! ↩︎

  3. You can find the definition of the print_args function in this gist (the same one I link to later). ↩︎

Sunday, May 6, 2018

Wavenet Paper: impressions

So, I'm going to start posting my impressions of some deep-learning papers I've read, mostly as a way of encouraging myself to do more (because, you know, if it doesn't exist online, it didn't happen!), but also to provide a place for me to make notes and ask questions.  So here goes...

Recently read the paper WaveNet: A Generative Model for Raw Audio, by Aaron van den Oord et al. (2016-09-19) - you can find a webpage describing it at a higher level, with results, here.

Quick summary - they have come up with a new method for generating convincing human speech, given an input text, that matches a given speaker identity... and the results are really impressive!  I, for one, don't know if I would be able to distinguish it from actual recorded human speech!

Of course, I should note, based on the samples, the training datasets seem to mostly consist of professional voice actors reading text samples in... well, a somewhat flat, professional sort of way... like they were going to be used to give google-maps driving directions, or something similar.  Words are all clearly enunciated, and sentences seem to have pretty well-delineated endings.  It's possible that it might be harder to emulate, say, conversational speech - which is a lot more free flowing - or speech with a more dramatic intent.  I suspect the algorithm would struggle to apply the right emotional cues to the correct situations, for instance, or might stray more from believability the longer it had to generate a continuous sample for.

Still - what it does is VERY impressive... it adds inflection and emphasis at believable places, and even adds things like audible breaths.  And the model is general enough that they were also able to use it to generate music - ie, "original" piano compositions, for instance.  Here we start to see some of the limitations of the technique: they manage to convincingly sound like a piano being played, but fail to give the impression of a larger composition.  They're a bit like a conversation with someone with severe short term memory problems, or like a dream: there's continuity / cohesion on short time scales, but if you take a step back and look at it on a larger time scale, there's large shifts and not much overall consistency.

Anyway - as for the nitty-gritty in the paper itself: the basic idea is that they use a 1-D convolutional network... with the restriction that the convolutional filter only extends backwards in time. This way, it can be used to generate new audio, one sample at a time, by generating a new sample, then shifting the convolutional input point forward one sample (to use your newly-generated sample), reapplying to generate the next sample, etc.  It's a bit like one of my favorite scenes from Wallace and Gromit, where he's riding a train, and laying down the track before him as he goes:




They call this a "causal" convolution... which makes sense, unless your poor brain for some reason keeps reading that as "casual", and you spend half the paper wondering why they think their approach is so informal...not that I would do that... 😙♫

In order to increase the "receptive area" of a node, without adding too many layers, they use a technique called "dilation," which I initially mistook as just a fancy term for "stride," but there's a key difference.  To explain the difference, they show this image for a non-dilated convolutional network:



...and this one for a dilated network:

Now, on viewing the second, I thought, "Ah, that's just a convolutional network with a stride of 2!"  However, on re-reading, I noticed this line (emphasis mine):

This is similar to pooling or strided convolutions, but here the output has the same size as the input.

I had thought that all the orange circles besides the far-rightmost one, and all the dotted lines connected to them, represented the graphs we would get if we were generating a different time sample.  All the circles that didn't have a bold arrow connecting to them, then, were not nodes that actually existed in the evaluation of the layer, but were only shown to illustrate their "place" if we were evaluating at a different time step.  This meant that the number of nodes in the layer went from 16 to 8 to 4 to 2 to 1... a classic "stride-by-2" layout.  This is incorrect - ALL the outputs are used when evaluating at the current time step, and all nodes in all the layers exist, for this time-step... which means the "dilation" really is "skipping N nodes".

This is an interesting approach, compared with classic strides... basically, it's a way of getting the receptive-area-expansion effect of strides, but without the down-rezzing.  The downside, obviously, is that it makes for a much larger / more complicated graph, which presumably takes longer to train, etc.  Still... assuming I can make graphs using this technique that will fit into my graphics card's memory, it would be interesting to experiment with using this instead of strides in some 2D pixel networks (ie, style-transfer); I feel the commonly-used strides are likely introducing some boxy artifacts, and perhaps this approach will help alleviate that.

Wednesday, April 4, 2018

L2 Regularization and neural network "simplicity"

So, this is related to the topic of my last blog post, http://neuralnetworksanddeeplearning.com - and I was initially going to bundle this in there, but it's kinda lengthy, and I didn't want to dilute my whole-hearted recommendation of the book... what follows really is something that started as nit-picking, but led to what (I think) was a better understanding of how L2 regularization works... and it by no means dampens my enthusiasm for the book!

However... for some reason, when reading the book, there was one section that made me pause, and the more I thought about it, the more I came to a different conclusion than the author. It was, as foreshadowed by the title, the section on L2 Regularization and neural network "simplicity".

In it, he essentially makes the claim that L2 Regularization results in a simpler "model".  He spends a fair amount of time discussing simplicity in a larger sense, and examples where simpler explanations are or are not more correct... but never really makes a convincing argument for why the smaller-weight models favored by L2 regularization should be considered "simpler".

He DOES make some good points about why it might favor more generalized models, vs just memorizing noise... and then implies that this therefore makes it "simpler".  The main justification here is an analogy to a situation where you have some noisy data, and can use either a linear approximation or a polynomial fitting. Intuitively, the linear model is both simpler AND more generalized, but I don't know that the two things - simplicity and generalization - always go hand-in-hand, and I would argue that in the case of neural networks and L2 regularization, they don't.

To see why, let's consider one of the ways in which his linear vs polynomial comparison differs from our regularized vs. unregularized comparison: number of variables. His polynomial model essentially has 10 different variables, while his linear model only has one, slope (or two, if you consider offset, though in his pictured example it's 0).  So, another way of looking at it might be to call the simpler network the one with fewer variables.

Ah, you say, what relevance does that have to our regularized vs. unregularized comparison? Don't both of those have the same number of parameters? And, technically, yes, that's true... but consider this: regularization is something that helps a network perform better when it's overfitting... that is, when its number of parameters is relatively large compared to the number of inputs we're training over. So, say we have a situation where regularization is helping; in that case, it's likely that if we take the unregularized version, and simply increase the size of the network (but keep the training set size the same), we'll see relatively small increases in real-world performance... but if we do the same with the regularized one, we might expect to see a bigger impact.  That implies that regularized networks are making "better use of" their parameters... that is, that even though they technically have the same number of parameters, the unregularized one has more "useless" parameters... and I think that's exactly what's happening.

To see why I think L2 regularization helps avoid "useless" parameters, let's take things down to more concrete terms: on the most basic level, if we have a set of 4 weights, then given two distributions of weights, A and B:

A = [.03, .9, .02, .05]
B = [.3, .4, .1, .2]

...then regularization will strongly favor B over A. But without any context, simply looking at the weights, I think most people would say that A is simpler than B - A is effectively saying "the second input is so much more important than the other inputs, we can effectively ignore the rest of them" - that sounds a whole lot simpler than the approach B is taking, which is to effectively say, "while the second IS more important, it's still important to consider all the others as well!"  Without regularization, there's nothing to prevent this from going to extremes, as long as it happens to fit better to the training data - ie, the A (unregularized) result might end up looking like [.0001, .99999, 1e-10, .001] - which can be pretty much modeled by a 1-parameter system, and is fairly "simple" - even though B might only give a 2% worse result on the training data, and still uses all 4 parameters.
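To put some numbers on "strongly favor" - the L2 penalty is just the sum of the squared weights, so (rounding a bit):

sum of squares for A = 0.03^2 + 0.9^2 + 0.02^2 + 0.05^2 ≈ 0.81
sum of squares for B = 0.3^2 + 0.4^2 + 0.1^2 + 0.2^2 = 0.30

A's penalty is more than twice B's, so, all else being equal, the regularized loss prefers the B-style distribution.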

To put things in a different perspective - let's look at the handwriting-recognition problem. Say we happen to notice that ALL the "9"s in our training sample have a value > .5 in a given pixel. As time goes on, without regularization, our neural network will tend to HEAVILY weight the input from that pixel when deciding if something is a 9, which effectively ends up decreasing the importance of other pixels, or larger patterns.  The regularized approach, on the other hand, will sort of be saying that, "ok, even though that one pixel seems more important on this data set, I don't want to forget the contributions of all the other pixels" - so that, when we feed the network a 9 that is < .5 in that pixel, it is able to cope with that better.  This is a more nuanced approach, and to my mind at least, more complex.

Finally, I would argue that, for most problems we want to use machine learning for, Occam's razor is reversed - the simpler solution is LESS likely to be correct! Indeed, the whole field of machine learning can be thought of as having been birthed by the desire to find more complex solutions - ie, for dealing with problems for which we can't find any simple models.  The problems are so complex that, intuitively, I'm likely to think that the more correct model is also likely more complex*... so, since regularized models tend to give better results for these problems, I'm more inclined to believe they're more complex!

Now, I know that a lot of these arguments are pretty complex and hand-wavey... but to me, they feel closer to the truth of what's happening here... and, I suppose, the real point of all this was that I think it gave me a better intuition on how L2 regularization is likely working!

*I think this heuristic - that machine-learning problems are so complex that the more correct model is also likely more complex - will often hold because the "ground" truth for many of these problems is what a human would say - ie, our basis for comparison is the model mapped in the neurons in our brains, which are incredibly complex.  Of course, there are counterexamples - handwritten digit recognition is largely solved, for instance, with relatively small networks, so the heuristic sort of fails here.  But the standard NIST handwriting recognition problem is also one with a lot of constraints and preconditions, which make it a lot easier to solve - we're presupposing that the images we're fed ARE digits, they're frequently segmented already, we're only considering digits (and not letters, and capital letters, and punctuation), we don't have to find them within larger images, etc.  The more of those preconditions are eliminated, the closer they get to the tasks our brains are actually doing, and the more complex the problem gets... and the more I will believe that a more correct network is more complex.

Neural Networks and Deep Learning

So, the title of the post serves two purposes - one, to serve notice to this blog (hah - as though anyone reads this!) that these are topics that I've recently become very interested in, and will likely be posting about a lot, and two, to let everyone know about an awesome online book of the same name, http://neuralnetworksanddeeplearning.com

It's a REALLY great resource for people looking to get started with neural networks. Of course, everyone learns in different ways, so I should clarify that I'm someone who likes to get a good mix of practical knowledge and theoretical underpinnings... but if that sounds like you, then I can't recommend this book highly enough. In a relatively short amount of text, he gives a broad enough overview of the field that I really felt I could start diving into the topic - reading papers, and tinkering with code - while still managing to go in depth enough into his topics that I felt I had a decent understanding of how (or why) they worked. It's a rare feat... kudos, Michael Nielsen!

Monday, April 18, 2016

My quick opinions of Eclipse CDT vs CLion

In my last post, I mentioned I found a neat feature in Eclipse, but that I still use CLion most of the time. This begs the obvious question: Why don't I just use Eclipse / CDT full time?

Well, I initially switched because of two main reasons: 1) annoyance with the time it always took getting the IDE set up to recognize all my various include paths, library paths, options, etc... and 2) the fact that CODAN* (the static analyzer in CDT) seems to miss a lot of situations. So I decided to give it another shot.

I mostly work with CMake projects - a given, since I'm using CLion, and its biggest downside is that it ONLY works with CMake - so I used the CMake Eclipse project generator. It seemed to work fairly well, which helped with 1)... but longer term, the fact that I would potentially need to re-run it any time the CMake changed - which in turn would mean I could lose any project settings / changes I made from within Eclipse - is a worry. Plus, the fact that I have to run a separate command-line tool frequently is a turn-off.

Still, those are things I could deal with... except it seemed that 2), CODAN's unreliability, is still a factor. I had only been using it for 5 minutes before coming across a situation which CODAN incorrectly flagged as an error, but which CLion got right. I always find it annoying to have all those extraneous red highlights in my IDE, so back I went to CLion...

...but, I still keep Eclipse open for that Shift+Alt+T action!

*...though, if this is named after the infamous Armada in "The Last Starfighter," thumbs up to that!

IDE shortcuts for adding function definitions

So when I'm writing C++ (or C) code, I constantly find myself either writing a function inside the header (.h) file, and later wanting to move it to the implementation (.cpp) file... or KNOWING I'll want it in the .cpp at the outset, but having to re-type out all the boilerplate in the .cpp again.

I mostly use CLion these days for C++ development... and while it's generally a nice IDE, I was dismayed that I couldn't find a refactor option for this*. Eclipse, however, has one: "Toggle Function Definition". You can get at it by right-clicking in the function definition, going to "Refactor" > "Toggle Function Definition"... or by just pressing Shift+Alt+T.

It has a few caveats, however:
  • You have to do each function one at a time 
  • It's a two-step process - it will first shift it to outside the class declaration, but still within the header file, as an inline definition... you then have to scroll down to it, click on it again, and move it to the .cpp file.
  • You have to have a definition; so for cases where I know I'm going to want it in the .cpp at the outset, I have to add an empty definition ({}), then click on it, before moving.
The last gripe is pretty minor, but the other two cost time... though they STILL save enough time that I find myself keeping a copy of Eclipse open, alongside my copy of CLion, just to use this feature.

Why don't I just use Eclipse / CDT full time? Good question... but my answer started to veer off topic a bit, so I put it in another post...

*Update! A friend showed me that CLion does, indeed, have an intention for this - just go to the declaration, press "Alt-Enter", then choose "Implement function 'foo'" - or, if it's already defined in the header, click on the name, hit "Alt-Enter", then choose "Move function definition to source file".  Huzzah!

Friday, October 26, 2012

How do you find where a python attribute "comes from"?

I work on pymel, a python project in the visual-effects industry, and someone recently asked me about the 'vtx' "attribute" on a mesh object - where it came from, and how they would find that out.  The answer is that it's added using the __getattr__ method on the Mesh class... but this got me thinking - is there a general way to find where a given attribute "comes from?"

When classes use tricks like __getattr__, it's hard to determine - standard methods like using dir or searching through the mro's __dict__ entries won't help.

The only way I could think of to find out whether it's from a __getattr__ would be to march up the mro chain, looking for __getattr__ methods, and testing them - to see if they return a result for the desired attribute, and at what point that result changes.

So, I wrote a function which does just that... and while it's at it, also checks the __dict__, __slots__, and __getattribute__.  It even does a last-ditch check to see if it's a c-compiled object. It's designed to generally tell you, "where the heck did this attribute come from?"

In order to get all (or at least, most) of the edge cases right, it ended up being way more complex than I'd originally imagined... but it seems to get things right nearly all of the time. Hopefully it helps someone!