Yes, good idea. It [audio clips] won't complicate the skin.
I agree. I wouldn't even put a checkbox for it in the UI - if an image has an attached audio clip, use it. Audio clips can't get there by accident, or be added by some other app.
The thing to think about is, "When a new user opens the settings for the first time, will he be scared off? Will he immediately think, 'Oh, my god, I have to learn coding to use this!!!'"
That's the problem with things that add flexibility, but at the cost of simplicity. Things like code-laden caption templates or a forest of metadata options in a fiddly-to-edit box are fine for an experienced user, but they scare the horses.
