"A YouTube video of a cat eating peanut butter also contains footage of a very sad boy who wanted a peanut butter and jam sandwich."
Using the word "footage" in place of "video clip" is not correct, and neither is saying "two footages", because "footage" is not a count noun. It is used with expressions such as "some footage", "the footage" or "lots of footage" (I can't think of any others).
[Not a teacher]