I substituted 'peace and quiet' for 'which' in the sentences to help clarify the difference in meaning when we then used 'what' and 'why'.
However, the sentence I formed,
"Peace and quiet is what he can't enjoy ..." is not how a native speaker would say it. Also, you used the plural 'cities' and the article 'the'; both complicate the meaning. Let's make things simpler.
You are asking about the difference in meaning between "...enjoy, living in..." and "...enjoy by living in...", so let's change the sentences to what a native speaker would say:
He can't enjoy much peace and quiet, living in a big city.
What the speaker is actually saying is, "...peace and quiet, living as he does in a big city." So the meaning is that he is unable to enjoy much peace and quiet because he lives in a city (and we know from our own experience that cities are noisy).
"...peace and quiet by living..."
When we use 'by' with a verbal noun (living, going, eating, taking), 'by' indicates how something can be achieved: 'mosquitoes can be killed by spraying with insecticide'.
So: "By living in a big city, I can go to theatres and museums whenever I want."
and
"I can enjoy peace and quiet by living in the country."
Could "Country life gives him peace and quiet, which is why he can't enjoy living in big cities" be possible?
Yes. We understand already that country = peaceful and city = noisy, so the logic of the statement is fine.