The study outlined in the Scientific American article examined the motor contributions to speech perception in a different way. If we are constantly simulating oral gestures when listening to speech, then manipulating our own oral motor activity during speech processing should influence perception. Ito, Tiede, and Ostry (2009) presented people with an ambiguous word (somewhere between "head" and "had") while a machine, pictured below, stretched the skin of the participants' faces in one of three directions.
[Figure: the facial skin-stretch apparatus. From Ito, Tiede, and Ostry (2009)]
Two of the stretching directions corresponded to the way the face changes during the pronunciation of "head" and "had." The third was unrelated. As the motor theory would predict, the direction of stretching biased participants toward hearing the word that matched the facial movement, even when the acoustic token was closer to the opposing word. The unrelated stretching condition had no discernible effect on perception. So, there's plenty of evidence supporting a motor component in speech recognition, but, as yet, no one has come up with an experimental design that could unequivocally refute the motor theory (although brain-imaging methods are beginning to come close). It seems the theory is largely unfalsifiable, which makes it difficult to accept wholeheartedly.
P.S. If you liked the McGurk effect, you should check out this version of it combined with the Margaret Thatcher illusion.
Ito, T., Tiede, M., & Ostry, D. J. (2009). Somatosensory function in speech perception. Proceedings of the National Academy of Sciences, 106, 1245-1248.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.
MacDonald, J., & McGurk, H. (1978). Visual influences on speech perception processes. Perception & Psychophysics, 24, 253-257.